Rule 1: What is the nature of the application (OLTP or OLAP)?
When you start designing your database, the first thing to analyze is the nature of the application you are designing, be it Transactional or Analytical. You will find many developers by default apply normalization rules without thinking about the nature of the application and then run into performance and customization issues. With that said, there are two types of applications: transaction based and analytics based, let’s understand what these are.
Transactional : In this type of application your end users are more concerned with CRUD i.e. create, read, update and delete records. The official name of such a database is OLTP.
Analytics : In these types of applications, your end users are more concerned with analysis, reporting, forecasting, etc. These types of databases have less number of inserts and updates. The main aim here is to fetch and parse the data as fast as possible. The official name of such a database is OLAP.
Here is a simple diagram showing how the name and address on the left side is a simple normalized table and by applying the denormalized structure we have created a flat table structure like how.
Rule 2: Break your data into logical pieces, make life simpler
This rule is actually the first rule from the first normal form. One of the signs of this rule violation is that if your queries are using too many string parsing functions like substring, charindex, etc., then perhaps this rule needs to be applied.
For example, you can see the table below with student names; if you want to query for student names with “Koirala” and not “Haftingh” you can imagine what kind of query you will end up with.
So a better approach would be to split this field into further logical parts so that we can write clean and optimal queries.
Rule 3: Do not exceed the 2 rule
Developers are cute creatures. If you tell them this is the way, they keep doing it; Well, they abuse it leading to undesirable consequences. This also applies to rule 2 that we just talked about above. When you think about decomposing, pause and ask yourself, is it necessary? With that said, the decay must be reasonable.
For example, you might see a phone number field; rarely do you work on the ISD codes of private phone numbers (until your app asks for it). So it would be a wise decision to just leave it as it can lead to more complications.
Rule 4: Consider heterogeneous duplicate data your biggest enemy
Centralize and restructure duplicate data. My personal worry about duplicate data is not that it takes up hard disk space, but the confusion it creates.
For example, in the diagram below, you can see that “Fifth Standard” and “Fifth Standard” mean the same thing. Now you can say that data entered your system due to bad data entry or poor validation. If you want to get a report, they show them as different entities, which is very confusing from an end user’s point of view.
Rule 5: Watch for data separated by delimiters
The second rule of 1 st normal form tells the group to avoid repetition. One of the examples of repeating groups is explained in the diagram below. If you look closely at the curriculum field, we have too much data in one field. These field types are called “Repeat Groups”. If we have to manipulate this data, the query will be complicated and I also doubt the performance of the queries.
With this approach, the syllabus field in the main table is no longer repeated and there are data separators.
Rule 6: Beware of partial dependencies
The curriculum is tied to the standard students are studying, not directly to the students. So if tomorrow we want to update the syllabus we have to update it for each student, which is very difficult and illogical. It makes more sense to move these fields out and link them to the Standard table.
You can see how we have moved the syllabus field and attached it to the Standards panel.
This rule is nothing but 2nd normal form: “All keys must depend on the full primary key, not the partial one”.
Rule 7: Preselect derived columns
In the image above you can see how the average field depends on the score and subject. This is also one of the forms of backup. So for those types of fields that derive from other fields, think: are they really necessary?
This rule is also called 3rd normal form: “No column should depend on other non-primary key columns”. Personally I think don’t apply this rule blindly, see the situation; it’s not that redundant data is always bad. If the redundant data is computational data, see the situation and then decide if you want a 3rd normal form implementation.
Rule 8: Don’t be hard to avoid redundancy, if performance is key
Rule 9: Multidimensional data is a whole other beast
OLAP projects mainly deal with multidimensional data. For example, you can see the picture below, you want to get sales by country, customer and date. Simply put, you’re looking at a sales metric that has three intersections of dimension data.
Rule 10: Focus on the design of the name value table
Many times I have looked through the name value table. A name and value table means it has a key and some data associated with the key. For example in the image below you can see we have a currency table and a country table. If you look closely at the data, they really only have one key and value.
Rule 11: For unlimited hierarchical data, the PK and FK are self-referencing
Many times we come across data with unlimited parent child hierarchy. For example, consider a multi-level marketing scenario where a single salesperson can have multiple sellers below them. For such cases, using self-referencing primary key and foreign key will help to achieve the same thing.
COURSE DATA WAREHOUSE : Synthesize, standardize and build a data warehouse in the enterprise
DATA MODEL COURSE – DESIGN DATA MODEL IN ENTERPRISE
THE ROAD TO BECOME A DATA ENGINEER FOR STARTERS