Difference between Database and Data Warehouse

Tram Ho

Suppose you have accumulated a huge amount of transaction data over years of operation, and you now want to analyze it to improve the system. That scenario implies two things that complement each other: a database and a data warehouse.

Typical processing flow of a data warehouse

What is Data Warehouse?

In the data analytics field, the data warehouse is a cornerstone of the profession (the work is specialized and difficult, and may even require machine learning knowledge). A data warehouse is a system that stores combined information from one or more sources. It is designed specifically for analysis and reporting, and it reduces the effort of producing statistics for a large organization.
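The idea of "combined information from one or more sources" can be sketched in a few lines. This is only an illustration, not how any particular warehouse product works: the two source systems (`web_orders`, `store_sales`), their field layouts, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical source systems, each with its own record layout.
web_orders = [
    {"order_id": "W-1", "customer": "alice", "total_usd": 30.0},
    {"order_id": "W-2", "customer": "bob", "total_usd": 12.5},
]
store_sales = [
    # (receipt_no, customer_name, amount_in_cents)
    ("S-1", "ALICE", 3000),
    ("S-2", "carol", 800),
]


def load_warehouse(conn, web_rows, store_rows):
    """Convert rows from both sources to one shared layout and load them."""
    conn.execute(
        "CREATE TABLE sales "
        "(source TEXT, source_id TEXT, customer TEXT, amount_usd REAL)"
    )
    unified = [
        ("web", r["order_id"], r["customer"].lower(), r["total_usd"])
        for r in web_rows
    ] + [
        ("store", receipt, name.lower(), cents / 100.0)
        for receipt, name, cents in store_rows
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", unified)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_warehouse(conn, web_orders, store_sales)

# A single query now spans both sources.
totals = conn.execute(
    "SELECT customer, SUM(amount_usd) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The point of the sketch is the conversion step: customer names and currency units are brought to one convention before loading, so that later analysis does not need to know which system a row came from.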

Comparison table between database and data warehouse

To make the differences between a data warehouse and a database easier to see, I refer to the following comparison table, which I have translated according to my own understanding.

| Parameter | Database | Data warehouse |
| --- | --- | --- |
| Purpose | Designed to record data | Designed to analyze data |
| Processing | Online Transactional Processing (OLTP) | Online Analytical Processing (OLAP) |
| Tables and joins | Tables and joins are complex because the tables are normalized and related | Tables are denormalized |
| Orientation | Oriented toward a single application or product | Oriented toward many different analysis purposes |
| Storage limit | Usually limited to one application | Stores data from various sources |
| Availability | Data is available in real time | Data is refreshed from the various sources when needed; users must wait for the system to run and rebuild the periodic data |
| Modeling | ER modeling techniques are used | Data modeling techniques are used |
| Technique | Capture data | Analyze data |
| Data type | Stored data is kept up to date | Current and historical data are stored; the data may not be up to date |
| Data storage | A flat relational approach is used to store data | A multi-dimensional approach is used to structure data, for example star and snowflake schemas |
| Type of query | Simple transaction queries are used | Complex queries are used for analysis purposes |
| Data summary | Stores detailed data | Stores summarized data |
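The "Processing" and "Type of query" rows of the comparison can be made concrete with a small sketch. The `orders` table and its columns are made up for illustration, and SQLite is used only because it ships with Python; a real warehouse would run the analytical query on a separate system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, "
    "amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, order_date, amount, status) "
    "VALUES (?, ?, ?, ?)",
    [
        ("alice", "2023-01-05", 20.0, "paid"),
        ("bob", "2023-01-20", 35.0, "paid"),
        ("alice", "2023-02-02", 15.0, "new"),
    ],
)

# OLTP style: a simple transaction that reads or writes one row by its key.
conn.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (3,))
one_order = conn.execute(
    "SELECT customer, amount, status FROM orders WHERE id = ?", (3,)
).fetchone()

# OLAP style: scan many rows and aggregate them for a report.
monthly = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()

print(one_order)
print(monthly)
```

The first query touches a single record and must be fast for every user action. The second reads the whole table, which is why analytical workloads are usually moved off the transactional database.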

What is the actual application?

Suppose a business such as telecommunications company X serves millions of users every day. To understand user behavior, X needs to collect phone locations, call frequency, frequency of internet use, and so on. This data is enormous. To survive and maximize revenue, X needs to analyze what is going well and what is going badly in that data, so that it gains an overall view of customer trends, segmentation, …

Or consider a medical enterprise specializing in health care. Patient data becomes valuable when there is a lot of it: it can serve as input to machine learning or AI that analyzes the reported data, helping the enterprise serve its customers better and understand what most of them want.

A first look

Suppose we have an ordinary relational database table that serves as one of the data sources for our data warehouse.

To summarize the fields we need for the statistics and periodic reports that management is interested in, we could run ordinary queries one by one. The problem is that this is slow and cannot filter the data flexibly when a report on some problem X is needed urgently. This is where data warehouse analysis tools (OLAP tools) come in.

As shown in the figure, we can filter along several dimensions (Dim Date, Dim Product, Dim Customer, Dim Times, …) and combine them with the records to produce a very large number of different reports, which ordinary database queries cannot deliver as readily.
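A minimal sketch of this kind of multi-dimensional filtering, assuming a small star schema: one fact table joined to dimension tables named after the Dim Date, Dim Product, and Dim Customer dimensions mentioned above. The column names, the sample rows, and the `report` helper are hypothetical, and SQLite stands in for a real OLAP tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (
        date_id INTEGER, product_id INTEGER, customer_id INTEGER, amount REAL
    );
    INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO dim_product VALUES (1, 'phone'), (2, 'data plan');
    INSERT INTO dim_customer VALUES (1, 'north'), (2, 'south');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 100.0), (1, 2, 2, 20.0), (2, 1, 2, 150.0), (2, 2, 1, 25.0);
    """
)

# Whitelist of dimension attributes a report may group by.
ALLOWED = {
    "year": "d.year",
    "month": "d.month",
    "category": "p.category",
    "region": "c.region",
}


def report(conn, dimensions):
    """Total sales grouped by any combination of the allowed dimensions."""
    cols = ", ".join(ALLOWED[name] for name in dimensions)  # KeyError if unknown
    sql = f"""
        SELECT {cols}, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON d.date_id = f.date_id
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_customer c ON c.customer_id = f.customer_id
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


by_category = report(conn, ["category"])
by_month_region = report(conn, ["month", "region"])
print(by_category)
print(by_month_region)
```

Each new combination of dimensions yields a different report from the same fact table, which is the "very large number of combinations" idea in miniature. Column names are taken only from the `ALLOWED` whitelist, so the string formatting does not expose the query to arbitrary input.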

Conclusion

This introductory article on the data warehouse is only a small illustration of what simple multi-dimensional query reports look like. Next, I will go deeper with a demo that creates the reports above.


Source: Viblo