11 important rules in Database design

Tram Ho


Rule 1: What is the nature of the application (OLTP or OLAP)?

When you start designing your database, the first thing to analyze is the nature of the application you are designing, be it Transactional or Analytical. You will find many developers by default apply normalization rules without thinking about the nature of the application and then run into performance and customization issues. With that said, there are two types of applications: transaction based and analytics based, let’s understand what these are.

Transactional : In this type of application your end users are more concerned with CRUD i.e. create, read, update and delete records. The official name of such a database is OLTP.

Analytics : In these types of applications, your end users are more concerned with analysis, reporting, forecasting, etc. These types of databases have less number of inserts and updates. The main aim here is to fetch and parse the data as fast as possible. The official name of such a database is OLAP.

image.png In other words, if you think inserts, updates and deletes are more prominent then design a normalized table, otherwise create a flat denormalized database structure.

Here is a simple diagram showing how the name and address on the left side is a simple normalized table and by applying the denormalized structure we have created a flat table structure like how.


Rule 2: Break your data into logical pieces, make life simpler

This rule is actually the first rule from the first normal form. One of the signs of this rule violation is that if your queries are using too many string parsing functions like substring, charindex, etc., then perhaps this rule needs to be applied.

For example, you can see the table below with student names; if you want to query for student names with “Koirala” and not “Haftingh” you can imagine what kind of query you will end up with.

So a better approach would be to split this field into further logical parts so that we can write clean and optimal queries.


Rule 3: Do not exceed the 2 rule

Developers are cute creatures. If you tell them this is the way, they keep doing it; Well, they abuse it leading to undesirable consequences. This also applies to rule 2 that we just talked about above. When you think about decomposing, pause and ask yourself, is it necessary? With that said, the decay must be reasonable.

For example, you might see a phone number field; rarely do you work on the ISD codes of private phone numbers (until your app asks for it). So it would be a wise decision to just leave it as it can lead to more complications.


Rule 4: Consider heterogeneous duplicate data your biggest enemy

Centralize and restructure duplicate data. My personal worry about duplicate data is not that it takes up hard disk space, but the confusion it creates.

For example, in the diagram below, you can see that “Fifth Standard” and “Fifth Standard” mean the same thing. Now you can say that data entered your system due to bad data entry or poor validation. If you want to get a report, they show them as different entities, which is very confusing from an end user’s point of view.

image.png One of the solutions would be to move the entire data to another master table and reference them via foreign keys. You can see in the image below how we have created a new primary table named “Standard” and linked the same using a simple foreign key.


Rule 5: Watch for data separated by delimiters

The second rule of 1 st normal form tells the group to avoid repetition. One of the examples of repeating groups is explained in the diagram below. If you look closely at the curriculum field, we have too much data in one field. These field types are called “Repeat Groups”. If we have to manipulate this data, the query will be complicated and I also doubt the performance of the queries.

image.png Those column types with data stuffed with delimiters need special attention and a better approach would be to move those fields to another table and associate them with keys for better management.

image.png So now let’s apply the second rule of first normal form: “Avoid repeating groups”. You can see in the image above, I created a separate syllabus table and then created a many-to-many relationship with the topic table.

With this approach, the syllabus field in the main table is no longer repeated and there are data separators.

Rule 6: Beware of partial dependencies

image.png Notice the fields that are partially dependent on the primary key. For example in above table we can see primary key generated on roll number and standard. Now let’s look at the curriculum area closely. The curriculum field is linked to a standard and not to the student directly (the number of rolls).

The curriculum is tied to the standard students are studying, not directly to the students. So if tomorrow we want to update the syllabus we have to update it for each student, which is very difficult and illogical. It makes more sense to move these fields out and link them to the Standard table.

You can see how we have moved the syllabus field and attached it to the Standards panel.

This rule is nothing but 2nd normal form: “All keys must depend on the full primary key, not the partial one”.

Rule 7: Preselect derived columns

image.png If you’re working on OLTP applications, getting rid of derived columns would be a good thought, unless there’s some pressing performance reason. In the case of OLAP where we do a lot of calculations, summarizing, these field types are necessary to gain performance.

In the image above you can see how the average field depends on the score and subject. This is also one of the forms of backup. So for those types of fields that derive from other fields, think: are they really necessary?

This rule is also called 3rd normal form: “No column should depend on other non-primary key columns”. Personally I think don’t apply this rule blindly, see the situation; it’s not that redundant data is always bad. If the redundant data is computational data, see the situation and then decide if you want a 3rd normal form implementation.

Rule 8: Don’t be hard to avoid redundancy, if performance is key

image.png Don’t make it a strict rule that you will always avoid redundancy. If there is an urgent need for performance, think about denormalization. During normalization you need to create joins with multiple tables and during denormalization the joins will decrease and thus increase performance.

Rule 9: Multidimensional data is a whole other beast

OLAP projects mainly deal with multidimensional data. For example, you can see the picture below, you want to get sales by country, customer and date. Simply put, you’re looking at a sales metric that has three intersections of dimension data.

image.png For such situations, dimensional and factual design is a better approach. Simply put, you can create a simple central sales data table that has a sales quantity field and it makes connections to all dimension tables using foreign key relationships.

image.png image.png

Rule 10: Focus on the design of the name value table

Many times I have looked through the name value table. A name and value table means it has a key and some data associated with the key. For example in the image below you can see we have a currency table and a country table. If you look closely at the data, they really only have one key and value.

image.png For such table types, it makes more sense to create a central table and differentiate the data using a type field.

Rule 11: For unlimited hierarchical data, the PK and FK are self-referencing

Many times we come across data with unlimited parent child hierarchy. For example, consider a multi-level marketing scenario where a single salesperson can have multiple sellers below them. For such cases, using self-referencing primary key and foreign key will help to achieve the same thing.

image.png This article is not meant to say do not follow the usual forms, instead do not follow them blindly, look at the nature of your project and the type of data you are dealing with first.

image.png image.png Source: Internet >>> Read more:

COURSE DATA WAREHOUSE : Synthesize, standardize and build a data warehouse in the enterprise




Share the news now

Source : Viblo