In today’s era of information and communication
technology, every organization builds and maintains their vital data and
information using powerful database systems that offers better performance
and availability. Apart from the typical transaction processing system to
tape day to day operations, most organizations uses database systems
heavily for data warehousing and data mining to support different
information systems of an organization like decision support systems,
executive information systems, expert systems and business intelligence
systems that must run 24 x 7 around the globe. Not only is the volume of
the data but also the criticality and value of such data has been
increasing day by day for every organization from small cap, mid cap to
enterprise. The key factors for any database systems therefore are
manageability, availability, performance and maintainability. In this
article, we will discuss an approach of storing data called partitioning
to achieve above mentioned criteria and we will discuss a small
case study of a student information system (SIS) implemented at SRIMCA
where the use of partitioning can improve the overall performance of the
system. The implementation issues on partitioning are considered for
Oracle based database systems.
Oracle normally stores data physically into data files and
logically into tablespace. So every tablespace is physically associated
with one data file. One tablespace can contain any number of tables. In
short, every table in the tablespace is stored physically in the same data
file. Partitioning is basically splitting a large chunk of data physically
into small manageable pieces. It means that partitioning allows a table to
split logically into different tablespaes and therefore physically in
different data files. Every organization stores not only the current data
but also the history. The current data is an active set and the records,
which are part of an active set and a history, depend on an organization.
E.g. SIS maintains information of students related to their attendance in
the lecture, their performance in different exams, leave information, fine
details, complaints, feedbacks etc. Lecturers and counselors normally are
interested in getting the information or the current semester. First we
try to figure out the volume of data generated just for the attendance of
the students. There are 6 classes, 4 lectures a day, and 60 students per
class. It means 6 * 4 * 60 = 1440 records of students are inserted every
day just to keep track of student’s attendance. It grows to
approximately 44000 a month and 1300000 in a semester. A normal query to
get attendance information for the current semester takes around 1300000
records. There will be around 85 lacks records of attendance for three
years of students study in the institute.
The huge volume of data puts constraints in manageability,
availability, performance and maintainability. If we divide a table that
stores attendance in three partitions according to their frequency of use,
these four important factors can be improved. We can have one partition
for storing a data for the current month, one more partition to store data
for the rest of the months of the same semester and the final partition
for storing data of previous semesters. So any query to find out
attendance details of the current month will have to fetch just one
partition, which is 44000 records in place of 1300000. In fact, these
numbers may be even reduced to quite many if index is used. But any either
case, partition will always boost up the overall performance. Now we
briefly address the four key objectives.
Manageability
Now with partitioning, we can choose to have high-speed
storage systems to accommodate data for the current semesters and we buy
low speed storage systems for storing the data that do not belong to
active set. It is the history of previous semesters in our case study of
SIS. As the volumes of data grow like anything, there will be significant
differences in the prize between low speed storage systems and high-speed
storage systems. It’s also logical to move older data to low speed
storage systems and to keep active data onto high-speed storage systems.
Most organizations buy high-speed storage systems for storing the whole
database irrespective of their use which we can easily avoid using
partitioning.
Availability
Partitioning allows data to split logically as well as
physically. So if we have a logical or physical corruption in some part of
the database, it will still allow rest of the database to be running and
usable. So if we loose one partition, other partitions are still in use
and information systems are up. If we store all our data in our partition
only, which is the traditional approach, loosing that partition, will
immediately bring the IT Systems down.
Performance
With partitioning, we can distribute data into multiple
data files physically and these files can be stored into separate hard
disks. It allows parallel queries. If the table is distributed physically
into four partitions and full table access is required, then oracle will
use four separate process to fetch the records in place of one process
fetching the great volume of data. Secondly, not all queries require all
records of the table. E.g. Teachers looking for records of the current
month requires just one partition to scan. It greatly improves the speed.
Maintainability
As data is distributed into separate partitions, what we
manage is small partitions of Mega Bytes and not a big table of Giga
Bytes. It means we can independently perform backup and recovery
procedures on partitions. E.g. Attendance records of previous months and
semesters are never going to be modified once entered. There are going to
be READ ONLY forever. It’s not logical to get the backup of such static
data every month. We can also control the availability of different
partitions. E.g. We can offline data of previous month or previous
semesters simply by brining the corresponding tablespace offline. It
requires a single click!
Conclusion
Partitioning helps greatly in achieving the major factors
of database systems. It helps from reducing the cost of storage systems to
reducing the overall time in executing the query. It helps tremendously in
time taken in maintenance like backup and recovery. It is completely
transparent to the application programs. We can easily convert
non-partition tables into partitioned table without having to write a
single line in an application code. This transformation is done by Oracle
itself. Transaction Processing Council (TPC) which is the IT Industry’s
standards for Benchmark, has acknowledged Oracle’s approach of database
partitioning to be quite effective not only for the decision support
systems and data warehousing but also for the transaction processing
systems.
|