SRIJAN - A Case Study on Database Partitioning

A case Study on Database Partitioning -
An approach to improve overall performance of system

Mr. Ateet Mehta
Lecturer

In today’s era of information and communication technology, every organization builds and maintains their vital data and information using powerful database systems that offers better performance and availability. Apart from the typical transaction processing system to tape day to day operations, most organizations uses database systems heavily for data warehousing and data mining to support different information systems of an organization like decision support systems, executive information systems, expert systems and business intelligence systems that must run 24 x 7 around the globe. Not only is the volume of the data but also the criticality and value of such data has been increasing day by day for every organization from small cap, mid cap to enterprise. The key factors for any database systems therefore are manageability, availability, performance and maintainability. In this article, we will discuss an approach of storing data called partitioning to achieve above mentioned criteria and we will discuss a small case study of a student information system (SIS) implemented at SRIMCA where the use of partitioning can improve the overall performance of the system. The implementation issues on partitioning are considered for Oracle based database systems.

Oracle normally stores data physically into data files and logically into tablespace. So every tablespace is physically associated with one data file. One tablespace can contain any number of tables. In short, every table in the tablespace is stored physically in the same data file. Partitioning is basically splitting a large chunk of data physically into small manageable pieces. It means that partitioning allows a table to split logically into different tablespaes and therefore physically in different data files. Every organization stores not only the current data but also the history. The current data is an active set and the records, which are part of an active set and a history, depend on an organization. E.g. SIS maintains information of students related to their attendance in the lecture, their performance in different exams, leave information, fine details, complaints, feedbacks etc. Lecturers and counselors normally are interested in getting the information or the current semester. First we try to figure out the volume of data generated just for the attendance of the students. There are 6 classes, 4 lectures a day, and 60 students per class. It means 6 * 4 * 60 = 1440 records of students are inserted every day just to keep track of student’s attendance. It grows to approximately 44000 a month and 1300000 in a semester. A normal query to get attendance information for the current semester takes around 1300000 records. There will be around 85 lacks records of attendance for three years of students study in the institute.

The huge volume of data puts constraints in manageability, availability, performance and maintainability. If we divide a table that stores attendance in three partitions according to their frequency of use, these four important factors can be improved. We can have one partition for storing a data for the current month, one more partition to store data for the rest of the months of the same semester and the final partition for storing data of previous semesters. So any query to find out attendance details of the current month will have to fetch just one partition, which is 44000 records in place of 1300000. In fact, these numbers may be even reduced to quite many if index is used. But any either case, partition will always boost up the overall performance. Now we briefly address the four key objectives.

Manageability

Now with partitioning, we can choose to have high-speed storage systems to accommodate data for the current semesters and we buy low speed storage systems for storing the data that do not belong to active set. It is the history of previous semesters in our case study of SIS. As the volumes of data grow like anything, there will be significant differences in the prize between low speed storage systems and high-speed storage systems. It’s also logical to move older data to low speed storage systems and to keep active data onto high-speed storage systems. Most organizations buy high-speed storage systems for storing the whole database irrespective of their use which we can easily avoid using partitioning.

Availability

Partitioning allows data to split logically as well as physically. So if we have a logical or physical corruption in some part of the database, it will still allow rest of the database to be running and usable. So if we loose one partition, other partitions are still in use and information systems are up. If we store all our data in our partition only, which is the traditional approach, loosing that partition, will immediately bring the IT Systems down.

Performance

With partitioning, we can distribute data into multiple data files physically and these files can be stored into separate hard disks. It allows parallel queries. If the table is distributed physically into four partitions and full table access is required, then oracle will use four separate process to fetch the records in place of one process fetching the great volume of data. Secondly, not all queries require all records of the table. E.g. Teachers looking for records of the current month requires just one partition to scan. It greatly improves the speed.

Maintainability

As data is distributed into separate partitions, what we manage is small partitions of Mega Bytes and not a big table of Giga Bytes. It means we can independently perform backup and recovery procedures on partitions. E.g. Attendance records of previous months and semesters are never going to be modified once entered. There are going to be READ ONLY forever. It’s not logical to get the backup of such static data every month. We can also control the availability of different partitions. E.g. We can offline data of previous month or previous semesters simply by brining the corresponding tablespace offline. It requires a single click!

Conclusion

Partitioning helps greatly in achieving the major factors of database systems. It helps from reducing the cost of storage systems to reducing the overall time in executing the query. It helps tremendously in time taken in maintenance like backup and recovery. It is completely transparent to the application programs. We can easily convert non-partition tables into partitioned table without having to write a single line in an application code. This transformation is done by Oracle itself. Transaction Processing Council (TPC) which is the IT Industry’s standards for Benchmark, has acknowledged Oracle’s approach of database partitioning to be quite effective not only for the decision support systems and data warehousing but also for the transaction processing systems.