Normalization

Normalization

Published by: Sareena Kumari Basnet

Published date: 26 Jul 2024

Normalization

Normalization

  • Normalization is the process of organizing the data in the database.
  • Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
  • Normalization divides the larger table into smaller and links them using relationships.
  • The normal form is used to reduce redundancy from the database table.

Advantages of Normalization

  1. Reduced data redundancy
  2. Improved data integrity
  3. Enhanced data organization
  4. Efficient data retrieval
  5. Easier maintenance and updates
  6. Better database scalability
  7. Minimizes data anomalies

Disadvantages of Normalization

  1. Increased complexity of queries
  2. Potential performance issues due to joins
  3. More tables to manage
  4. Can be time-consuming to design and implement
  5. May require more storage for keys and indexes
  6. Potential for slower performance on large databases
  7. Requires a deep understanding of the data model and relationships

Types of DBMS Normal forms

  1. First Normal Form

  2. Second Normal Form

  3. Third Normal Form

  4. BCNF

  5. Fourth Normal Form

1. First Normal Form (1NF)

For a table to be in the First Normal Form, it should follow the following 4 rules:

  1. It should only have single(atomic) valued attributes/columns.

  2. Values stored in a column should be of the same domain.

  3. All the columns in a table should have unique names.

  4. And the order in which data is stored should not matter.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

The decomposition of the EMPLOYEE table into 1NF has been shown below:

2. Second Normal Form (2NF)

For a table to be in the Second Normal Form,

  1. It should be in the First Normal form.

  2. And, it should not have Partial Dependency.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.

TEACHER table

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_SUBJECT table:

3. Third Normal Form (3NF)

A table is said to be in the Third Normal Form when,

  1. It satisfies the First Normal Form and the Second Normal form.

  2. And, it doesn't have Transitive Dependency.

Example:

EMPLOYEE_DETAIL table:

In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new table, with EMP_ZIP as a Primary key.

EMPLOYEE table:

EMPLOYEE_ZIP table:

4. Boyce Codd normal form (BCNF)

  • BCNF is the advance version of 3NF. It is stricter than 3NF.
  • A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
  • For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

Example:

  • In the table above, student_id, subject form primary key, which means subject column is a prime attribute.
  •  But, there is one more dependency, professor → subject.
  • And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.

To make this relation (table) satisfy BCNF, we will decompose this table into two tables, student table and professor table.
Below we have the structure for both the tables.

  • And now, this relation satisfy Boyce-Codd Normal Form.

5. Fourth normal form (4NF)

  • A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
  • For a dependency A → B, if for a single value of A, multiple values of B exists, then the relation will be a multi-valued dependency.

Example:

STUDENT

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STUDENT_HOBBY

 

 

FAQs About Topic
Normalization is the process of organizing data in a database to eliminate redundancy and improve data integrity by partitioning it into tables and identifying relationships between them.
While both 3NF and BCNF aim to remove redundancy, BCNF is more strict than 3NF. Every determinant in BCNF must be a candidate key, whereas 3NF just needs that non-key properties be not transitively dependent on the primary key.
The key drawbacks include greater query complexity, potential performance concerns from multiple joins, and more tables to handle.