Step by step to normalize Database data

Tram Ho

It’s not that the project is developing a big feature, having to design more tables, the project brothers keep saying “Customers don’t follow DB standardization at all”, even though I still remember my university days, but to be sure of what you’re talking about, do some research on the internet; And now share it with you.”

The concept of  Normalization is a technique in DB design, used to design tables of  Relational Databases  to higher “standards”.

This technique is divided into several levels including: in order to advance to the next level the previous level requirement must be satisfied. For example, we have a data set unnormalized form  (this is the lowest level) and we are looking to achieve the highest level of normalization.

Then the first step must first make sure normalized form I  is followed, then we further design the DB to satisfy the specific requirements of normalized form II,… and so on until the data set ours reaches the sixth normalized form.


NOTE

The fourth normalized form and above are only theoretical in nature, the problems that these forms aim to solve often “rare” appear in practice.


Initialization data

Here is an example of an initialized data set – not normalized.

TitleAuthorAuthor NationalityFormatPriceSubjectPagesThicknessPublisherPublisher CountryPublication TypeGenre IDGenre Name
Beginning MySQL Database Design and OptimizationChad RussellAmericanHardcover49.99MySQL,Database,Design520ThickApressUSAE-book1Tutorial

We assume that 1 book belongs to only 1 author

1NF – Normalized Form I

To satisfy  1NF, the value in each column needs to be a single value.

Looking at the table above, the  Subject  column is containing more than one value, causing this table to not satisfy 1NF.

One way to achieve 1NF in this case is to split those values into many other Subject columns.

TitleFormatAuthorAuthor NationalityPriceSubject 1Subject 2Subject 3PagesThicknessPublisherPublisher countryGenre IDGenre Name
Beginning MySQL Database Design and OptimizationHardcoverChad RussellAmerican49.99MySQLDatabaseDesign520ThickApressUSA1Tutorial

The above results have met the standard  1NF but there is a problem. If a book is on more than 3 topics, we won’t be able to add that new topic to the table without adding another column to the table.

To solve this problem more subtile, it is necessary to first identify the  objects  that appear in the table and divide them into corresponding tables. With this topic, 3 tables will be: Book, Subject and Publisher

TitleFormatAuthorAuthor NationalityPricePagesThicknessGenre IDGenre NamePublisher ID
Beginning MySQL Database Design and OptimizationHardcoverChad RussellAmerican49.99520Thick1Tutorial1

Publisher Table

Publisher_IDNameCountry
1ApressUSA

Bảng Subject

Subject IDSubject name
1MySQL
2Database
3Design

Obviously, splitting the table like this will make the data no longer linked. Which means we will have to  somehow  represent the relationship between these tables.

You notice that the  Publisher ID  column in the Books Table is the foreign key determining the  one – many relationship between the book and the publisher.

On the other hand, a book can belong to many different subjects, just as a subject can belong to many book rights. Therefore, we need a many-to-many relation represented in the intermediate table  below:

Table Title – Subject

TitleSubject ID
Beginning MySQL Database Design and Optimization1
Beginning MySQL Database Design and Optimization2
Beginning MySQL Database Design and Optimization3

Up to this point, we have a total of 4 tables detached from the first initialization table – just to comply with the  1NF standard

Satisfying 2NF

To identify a book, for the table below we need to use the Composite Key of {Title, Format}

TitleFormatAuthorAuthor NationalityPricePagesThicknessGenre IDGenre NamePublisher ID
Beginning MySQL Database Design and OptimizationHardcoverChad RussellAmerican49.99520Thick1Tutorial1
Beginning MySQL Database Design and OptimizationE-bookChad RussellAmerican22.34520Thick1Tutorial1
The Relational Model for Database Management: Version 2E-bookE.F.CoddBritish13.88538Thick2Popular science2
The Relational Model for Database Management: Version 2PaperbackE.F.CoddBritish39.99538Thick2Popular science2

We see that the columns that are not in the composite key  {Title, Format}  all depend on  Title, each column  Price  depends on both  Title and  Format.

To satisfy the  2NF criterion, we need to ensure that no attribute (column) depends only  1 part  on the  column key. In other words, in the  2NF standard, every attribute must depend completely  on the coherent key.

For the current problem, we will split up a table so that the  Title column becomes the primary key, all the remaining attributes (columns) depend entirely on it; and the Price  column splits into another table where it depends on the Format  column (and of course the Title)

Table Books

TitleAuthorAuthor NationalityPagesThicknessGenre IDGenre NamePublisher ID
Beginning MySQL Database Design and OptimizationChad RussellAmerican520Thick1Tutorial1
The Relational Model for Database Management: Version 2E.F.CoddBritish538Thick2Popular science2

The Books table is now standard  2NF

Table Format – Price

TitleFormatPrice
Beginning MySQL Database Design and OptimizationHardcover49.99
Beginning MySQL Database Design and OptimizationE-book22.34
The Relational Model for Database Management: Version 2E-book13.88
The Relational Model for Database Management: Version 2Paperback39.99

The Formats – Price table is still not up to the  2NF  standard..

Satisfying 3NF

The  Books  table is still experiencing a phenomenon called: bắctransitional dependency, specifically: {Author Nationality} depends on {Author}, and {Author} depends on {Title}.

Its downside is: if the {Author} value is changed, then we will have to find and change all the corresponding {Author Nationality} in other rows (if any) in the table.

Same thing happens between {Genre Name} {Genre ID} and {Title}

We say that the  Books  table is not up to the  3NF standard now.

To comply with this standard, we will redesign as follows: Split Books into 3 tables with 2 new tables respectively Authors and Genres.

Table Books

TitleAuthorPagesThicknessGenre IDPublisher ID
Beginning MySQL Database Design and OptimizationChad Russell520Thick11
The Relational Model for Database Management: Version 2E.F.Codd538Thick22

Table Authors

AuthorAuthor Nationality
Chad RussellAmerican
E.F.CoddBritish

Table Genres

Genre IDGenre Name
1Tutorial
2Popular science

4NF

First, to meet this standard, you need to meet the BCNF standard, that is, every attribute (column) must be a member of the composite key; or simply having a column that is not part of the composite key is BCNF false.

Next, the 4NF standard requires that inside a relational table NO can have more than 1 many-valued relations..

Take a look at the following example:

Assume these Books are owned by Franchisees (Enterprises) based in many States in the USA. Now we need to add a new table describing where a book is present. Assumption: every Franchisee has books at all its headquarters

Table Franchisee – Book – Location

Franchisee IDTitleLocation
1Beginning MySQL Database Design and OptimizationCalifornia
1Beginning MySQL Database Design and OptimizationFlorida
1Beginning MySQL Database Design and OptimizationTexas
1The Relational Model for Database Management: Version 2California
1The Relational Model for Database Management: Version 2Florida
1The Relational Model for Database Management: Version 2Texas
2Beginning MySQL Database Design and OptimizationCalifornia
2Beginning MySQL Database Design and OptimizationFlorida
2Beginning MySQL Database Design and OptimizationTexas
2The Relational Model for Database Management: Version 2California
2The Relational Model for Database Management: Version 2Florida
2The Relational Model for Database Management: Version 2Texas
3Beginning MySQL Database Design and OptimizationTexas

We see two relationships here:

  1. Franchisee ->> Location(1 Franchisee has many headquarters)
  2. Franchisee ->> Book (1 Franchisee sells many books)

At this point, we can say that this table has 2 many-valued relations.

To comply with  4NF ta must be redesigned so that there is only one value-to-value relation in a table. We separate as follows:

Table Franchisee – Book

Franchisee IDTitle
1Beginning MySQL Database Design and Optimization
1The Relational Model for Database Management: Version 2
2Beginning MySQL Database Design and Optimization
2The Relational Model for Database Management: Version 2
3Beginning MySQL Database Design and Optimization

Table Franchisee – Location

Franchisee IDLocation
1California
1Florida
1Texas
2California
2Florida
2Texas
3Texas

Summary

The rest of the standards are many, but the programmers I know here are probably OK. If you want to read more carefully the original document, please see the sources below for more:

Reference source

https://en.wikipedia.org/wiki/Database_normalization;

https://www.geeksforgeeks.org/introduction-of-4th-and-5th-normal-form-in-dbms/;

https://www.guru99.com/database-normalization.html

Share the news now