Suppose you have a huge volume of transaction data accumulated over years of storage, and you want to analyze its statistics to improve the system. In this scenario, a database and a data warehouse need to complement each other.
Typical data processing flow of a data warehouse
What is a Data Warehouse?
In the data analytics field, the data warehouse is a cornerstone of the profession (a specialized and demanding job that can even require machine learning knowledge). A data warehouse is a system that stores integrated data combined from one or more sources. It is designed specifically for analysis and reporting, and it simplifies statistical analysis of a system for large organizations.
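Combining data "from one or more sources" is usually done with an extract, transform, load (ETL) step. The sketch below is a minimal illustration under stated assumptions: the two source lists, their field names, and the target schema are all hypothetical, and the "warehouse" is just an in-memory list. It shows the core idea of mapping differently shaped sources onto one common schema so they can be analyzed together.

```python
from collections import defaultdict

# Hypothetical source 1: orders from an online shop database (ISO dates, amounts as strings)
shop_orders = [
    {"order_id": 1, "customer": "alice", "amount": "120.50", "date": "2023-01-05"},
    {"order_id": 2, "customer": "bob", "amount": "80.00", "date": "2023-01-06"},
]

# Hypothetical source 2: a point-of-sale system with different field names and formats
pos_sales = [
    {"ticket": "A-17", "client_name": "Alice", "total_cents": 4500, "day": "05/01/2023"},
]

def extract():
    """Extract: yield (source_name, raw_row) pairs from every source."""
    for row in shop_orders:
        yield "shop", row
    for row in pos_sales:
        yield "pos", row

def transform(source, row):
    """Transform: map each source's schema onto one common warehouse schema."""
    if source == "shop":
        return {
            "source": source,
            "customer": row["customer"].lower(),
            "amount": float(row["amount"]),
            "date": row["date"],  # already ISO 8601
        }
    # POS dates are DD/MM/YYYY and amounts are stored in cents
    day, month, year = row["day"].split("/")
    return {
        "source": source,
        "customer": row["client_name"].lower(),
        "amount": row["total_cents"] / 100,
        "date": f"{year}-{month}-{day}",
    }

def load(rows, warehouse):
    """Load: append the cleaned rows to the warehouse table (a plain list here)."""
    warehouse.extend(rows)

warehouse_sales = []
load((transform(src, r) for src, r in extract()), warehouse_sales)

# Once the sources are combined, cross-source analysis is a simple aggregation
revenue_by_customer = defaultdict(float)
for row in warehouse_sales:
    revenue_by_customer[row["customer"]] += row["amount"]
print(dict(revenue_by_customer))  # {'alice': 165.5, 'bob': 80.0}
```

Real pipelines typically run such loads on a schedule and add validation and deduplication; this sketch only shows the shape of the process.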
Comparison table between a database and a data warehouse
To make the difference between a data warehouse and a database easier to see, I refer to the following comparison table, translated according to my own understanding.
| Criterion | Database | Data Warehouse |
|---|---|---|
| Purpose | Designed to record data | Designed for analysis |
| Processing | Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP) |
| Tables and joins | Tables and joins are complex because the tables are normalized | Tables and joins are simpler because the tables are denormalized |
| Orientation | Application-oriented: serves a specific application or product | Subject-oriented: serves many different analytical purposes |
| Storage scope | Usually limited to a single application | Stores data from many different sources |
| Availability | Data is available in real time | Data is refreshed from the various sources when needed; you must wait for the periodic loads to run and rebuild the required data |
| Modeling | ER modeling techniques are used | Dimensional data modeling techniques are used |
| Technique | Capturing data | Analyzing data |
| Data currency | Data is kept up to date | Stores both current and historical data; may not be fully up to date |
| Data storage | A flat relational approach is used to store data | A dimensional approach is used to structure data, e.g. star and snowflake schemas |
| Query type | Simple transactional queries | Complex queries for analysis |
| Data summary | Stores detailed data | Stores summarized data |
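To make the star schema and the analytical query rows of the table more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`dim_date`, `dim_product`, `fact_sales`), columns, and sample rows are hypothetical. A central fact table holds the measures and foreign keys, the dimension tables hold descriptive attributes, and a typical analytical query joins them and summarizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema: two dimension tables around one fact table
cur.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,  -- e.g. 20230105
    year     INTEGER,
    month    INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    category    TEXT               -- kept inline (denormalized), no separate category table
);
-- Fact table: one row per sale, holding measures plus keys into the dimensions
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);
""")

cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20220301, 2022, 3), (20230105, 2023, 1), (20230210, 2023, 2)])
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Phone", "Electronics"), (2, "Laptop", "Electronics"),
                 (3, "Desk", "Furniture")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20220301, 1, 2, 1000.0),
    (20230105, 1, 1, 550.0),
    (20230105, 3, 1, 200.0),
    (20230210, 2, 1, 1200.0),
])

# Analytical query: join the fact table to its dimensions and summarize by year and category
summary = cur.execute("""
    SELECT d.year, p.category, SUM(f.quantity), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_date    AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
for row in summary:
    print(row)
conn.close()
```

A snowflake schema would differ mainly by splitting `category` out of `dim_product` into its own normalized table, at the cost of one more join.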
What are the practical applications?
Suppose a business such as telecom company X serves millions of users every day. To understand user behavior, X needs to collect data such as phone location, call frequency, and internet usage frequency, and this data is enormous. To stay competitive and maximize revenue, X needs to analyze the strengths and weaknesses revealed by that data, giving the company an overall view of customer trends, customer segmentation, and so on.
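The kind of analysis described for company X usually starts by rolling raw events up into per-customer profiles and then segmenting them. The sketch below assumes a hypothetical event format (`user`, `type`, `mb`, `cell`) and hypothetical segment thresholds; it is only meant to show the roll-up and segmentation idea, not a real telecom model.

```python
from collections import Counter, defaultdict

# Hypothetical raw events; field names and values are illustrative only
events = [
    {"user": "u1", "type": "call", "cell": "HN-01"},
    {"user": "u1", "type": "call", "cell": "HN-01"},
    {"user": "u1", "type": "data", "mb": 300, "cell": "HN-02"},
    {"user": "u2", "type": "data", "mb": 2500, "cell": "SG-07"},
    {"user": "u2", "type": "data", "mb": 1800, "cell": "SG-07"},
    {"user": "u3", "type": "call", "cell": "DN-03"},
]

def profile_users(events):
    """Roll raw events up into one behavior profile per user."""
    profiles = defaultdict(lambda: {"calls": 0, "data_mb": 0, "cells": Counter()})
    for e in events:
        p = profiles[e["user"]]
        if e["type"] == "call":
            p["calls"] += 1
        else:
            p["data_mb"] += e["mb"]
        p["cells"][e["cell"]] += 1  # how often the user appears at each cell (location)
    return profiles

def segment(profile):
    """Assign a coarse segment; the thresholds are hypothetical."""
    if profile["data_mb"] >= 2000:
        return "heavy-data"
    if profile["calls"] >= 2:
        return "voice"
    return "light"

profiles = profile_users(events)
segments = {user: segment(p) for user, p in profiles.items()}
home_cells = {user: p["cells"].most_common(1)[0][0] for user, p in profiles.items()}
print(segments)    # {'u1': 'voice', 'u2': 'heavy-data', 'u3': 'light'}
print(home_cells)  # {'u1': 'HN-01', 'u2': 'SG-07', 'u3': 'DN-03'}
```

At real scale this aggregation would run inside the warehouse (or an OLAP engine) rather than in a Python loop, but the logic is the same.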
Similarly, for a healthcare company, patient data becomes valuable when it is very large: it can be the input for machine learning or AI that analyzes and reports on the data, helping the company serve customers better and understand what most of them want.
Suppose we have an ordinary relational database table whose data forms part of the input for our data warehouse.
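When rows from an operational table feed the warehouse, they are commonly split into dimension entries (each assigned a surrogate key) and fact rows that reference those keys. The sketch below assumes a hypothetical `orders` table with `customer_email` and `product_sku` as natural keys, and uses plain dictionaries as stand-ins for the dimension tables.

```python
# Hypothetical rows from an operational (OLTP) orders table
orders = [
    {"order_id": 101, "customer_email": "a@example.com", "product_sku": "P-1", "qty": 2, "price": 10.0},
    {"order_id": 102, "customer_email": "b@example.com", "product_sku": "P-1", "qty": 1, "price": 10.0},
    {"order_id": 103, "customer_email": "a@example.com", "product_sku": "P-2", "qty": 3, "price": 5.0},
]

def get_surrogate_key(dimension, natural_key):
    """Return the dimension's surrogate key for a natural key, creating one if it is new."""
    if natural_key not in dimension:
        dimension[natural_key] = len(dimension) + 1
    return dimension[natural_key]

dim_customer = {}  # natural key (email) -> surrogate customer_key
dim_product = {}   # natural key (SKU)   -> surrogate product_key
fact_sales = []

for o in orders:
    fact_sales.append({
        "customer_key": get_surrogate_key(dim_customer, o["customer_email"]),
        "product_key": get_surrogate_key(dim_product, o["product_sku"]),
        "quantity": o["qty"],
        "revenue": o["qty"] * o["price"],  # derived measure computed once, at load time
    })

print(dim_customer)   # {'a@example.com': 1, 'b@example.com': 2}
print(fact_sales[2])  # {'customer_key': 1, 'product_key': 2, 'quantity': 3, 'revenue': 15.0}
```

Surrogate keys decouple the warehouse from the source system's identifiers, which helps when several sources are merged or when a natural key changes.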
To aggregate the fields we need for the statistics and periodic reports that management cares about, we could write ordinary queries one at a time. The problem is that this is slow and cannot filter flexibly when a report on some issue X is needed urgently; this is where data warehouse analysis tools (OLAP tools) come in.
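One reason OLAP tools answer such requests quickly is that they can pre-aggregate a measure for every combination of dimensions, similar to SQL's `GROUP BY CUBE`. The sketch below is a simplified, in-memory version of that idea; the fact rows and the dimension names `year`, `region`, and `product` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical denormalized fact rows (dimension attributes already joined in)
facts = [
    {"year": 2023, "region": "North", "product": "Phone", "revenue": 100},
    {"year": 2023, "region": "South", "product": "Phone", "revenue": 50},
    {"year": 2023, "region": "North", "product": "Laptop", "revenue": 200},
    {"year": 2024, "region": "North", "product": "Phone", "revenue": 120},
]
DIMENSIONS = ("year", "region", "product")

def cube(rows, dims, measure):
    """Pre-aggregate the measure for every subset of dims (2**len(dims) groupings)."""
    result = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row[measure]
            result[group] = dict(totals)
    return result

olap = cube(facts, DIMENSIONS, "revenue")
print(olap[()])                                        # {(): 470}
print(olap[("year",)])                                 # {(2023,): 350, (2024,): 120}
print(olap[("region", "product")][("North", "Phone")])  # 220
```

Once the cube is built, an urgent report becomes a dictionary lookup instead of a new query over the raw data; the trade-off is storage and build time, which grow quickly with the number of dimensions.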
As shown in the figure, we can filter along different dimensions (Dim Date, Dim Product, Dim Customer, Dim Times, …) combined with the fact records to produce a very large number of report combinations, a flexibility that ordinary one-off queries against a transactional database cannot easily provide.
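Filtering along dimensions like this is usually called slicing (fixing one dimension to a single value) and dicing (restricting several dimensions to sets of values). The sketch below assumes hypothetical fact rows that already carry attributes from Dim Date, Dim Product, and Dim Customer; the function names are illustrative, not part of any OLAP tool's API.

```python
# Hypothetical fact rows joined with attributes from Dim Date, Dim Product, Dim Customer
facts = [
    {"year": 2023, "product": "Phone",  "customer_segment": "Retail",    "revenue": 100},
    {"year": 2023, "product": "Laptop", "customer_segment": "Corporate", "revenue": 300},
    {"year": 2024, "product": "Phone",  "customer_segment": "Corporate", "revenue": 150},
    {"year": 2024, "product": "Laptop", "customer_segment": "Retail",    "revenue": 80},
]

def dice(rows, **allowed):
    """Dice: keep rows whose value for each named dimension is in the allowed set."""
    return [r for r in rows if all(r[dim] in values for dim, values in allowed.items())]

def slice_cube(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return dice(rows, **{dim: {value}})

def total(rows, measure="revenue"):
    return sum(r[measure] for r in rows)

print(total(slice_cube(facts, "year", 2024)))                                  # 230
print(total(dice(facts, year={2023, 2024}, customer_segment={"Corporate"})))   # 450
```

Each new combination of filters is just a different call, which is what makes the number of possible reports grow so quickly.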
This introductory article on data warehouses is only a small illustration of what simple multi-dimensional query reports look like. In the next article, I will go deeper with a demo that builds the reports above.