Introduction to Normalization : Normalization concept

Normalization : 

Database Normalization is a technique of organizing the data in the database. Normalization basically split a large table into smaller tables and define relationships between them to increases the clarity in organizing data. Normalization is a systematic approach of decomposing tables to eliminate or minimize data redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.

Reasons for Normalization :

In general, the motive of normalization is to generate a set of relation schema that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily. There are three main reasons to normalize a database :

    1. Minimize duplicate data or data redundancy.
    2. Minimize or avoid data modification issues.
    3. Simplify queries.

Without Normalization, it becomes difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if Database is not Normalized. To understand these let consider the following table 'student':

The first thing to notice is this table serves many purposes including:

  • Listing all students, their age, city and Subjects.
  • Listing the available Subjects to study in College
  • Associating students with their subjects of study

By observing the above table, we know that the table serve many purposes, which introduces many of the challenges; namely, data duplication, data update issues, and increased effort to query data. we identify that :

  • John, alex and Jacob is applied for two different subjects, At the same time, their addresses are repeated for each of his subject, which is unnecessary data. It unnecessarily increases the table size and introduced a redundancy in the above table. 
  •  In case if we have to update City of John, then we need to update all the entries of John in the above table. Else his data will become inconsistent, leading to update anomaly.
  •  Lets say, student's City is present in more than one table. IF we want to update any one of the student's City, just updating Student table is not enough. We need to know which all table has City field which are related to Student. Not updating any one of the table will lead to incorrect data leading to update anomaly. 
  • Similarly, if any of the students drops out any of the subject and we need to delete his entry, entire entry of his would be deleted from above table. But he is still part of that college and whole of information is lost. But this should not happen. We should have his information stored in database. This is called leading to delete anomaly. 
  •  Also, if we need to enter any new student detail, which has not yet opted for any subject, then we need to enter all of his information with subject as NULL. Also, if a particular field, say City of a student is present in more than one table, we need to insert the data in all the tables. But for some of the tables, these data would be irrelevant at that point of time. All these situations lead to insertion anomaly.

 Hence to avoid all these redundancies and anomalies, set of guidelines are introduced which is called Normalization.

Types of Normal forms : 

Normalization rule are divided into following normal form.

1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)

For Detailed explanation : Types Of Normal Forms : 1NF, 2NF, 3NF, BCNF, 4NF and 5NF 

Next Topic :  -  Pitfalls in Database Design

No comments:

Post a Comment