Data Warehouse – the core of data mining in the 4.0 era

Tram Ho

Mining anything requires tools to work efficiently (mining minerals, for example, needs excavators, drills, …), and data mining is no exception. The "excavator" of data mining is the data warehouse: like a worker, it takes raw resources (data) from their sources, files them, and then stores them carefully so that their value can be analyzed.

1. What is Data Warehouse (DW)?

A data warehouse is a system that stores data across many points in history. The data in a DW is processed, analyzed, … to produce valuable reports and predictions for the organization that owns it. => The data warehouse is an important part of any digitized business.

Characteristics of a DW:

  • Subject-oriented: Unlike a database, which is built around day-to-day operations, a DW organizes data around subjects (e.g., products, customers, suppliers) in order to model, analyze, and support decision-making.
  • Integrated: Data is integrated from heterogeneous sources such as relational databases, flat files, etc., to make data analysis more efficient.
  • Time-variant: Data is stored along a time dimension, which provides a historical perspective.
  • Non-volatile: Data can only be added, not deleted or overwritten. Because the DW is kept completely separate from the operational database, routine updates and changes in the database do not affect the DW.
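To make the time-variant and non-volatile properties concrete, here is a minimal Python sketch, assuming a toy in-memory store. The names `Snapshot` and `AppendOnlyStore` are hypothetical and do not come from any DW product. Rows can only be appended, and history is read back by date instead of being overwritten.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a stored snapshot cannot be modified after creation
class Snapshot:
    """One fact: the value of an attribute for a subject at a point in time."""
    subject_id: str   # e.g. a patient or customer ID (subject-oriented)
    attribute: str    # e.g. "weight_kg"
    value: float
    as_of: date       # the time dimension (time-variant)


class AppendOnlyStore:
    """Hypothetical toy store: rows can be appended but never updated or deleted (non-volatile)."""

    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def append(self, row: Snapshot) -> None:
        self._rows.append(row)

    def history(self, subject_id: str, attribute: str) -> List[Snapshot]:
        """Every recorded value for one subject/attribute, oldest first (returns a new list)."""
        rows = [r for r in self._rows
                if r.subject_id == subject_id and r.attribute == attribute]
        return sorted(rows, key=lambda r: r.as_of)

    def value_as_of(self, subject_id: str, attribute: str, when: date) -> Optional[float]:
        """Latest value recorded on or before `when`, or None if nothing was recorded yet."""
        past = [r for r in self.history(subject_id, attribute) if r.as_of <= when]
        return past[-1].value if past else None


store = AppendOnlyStore()
store.append(Snapshot("P001", "weight_kg", 70.0, date(2020, 1, 1)))
store.append(Snapshot("P001", "weight_kg", 72.5, date(2021, 1, 1)))
print(store.value_as_of("P001", "weight_kg", date(2020, 6, 30)))  # 70.0
print(store.value_as_of("P001", "weight_kg", date(2021, 6, 30)))  # 72.5
```

A new measurement never replaces the old one; it simply becomes the latest entry, so earlier points in time stay queryable.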

You may ask: a database also stores data, so apart from their different functions, how does a DW differ from a database?

2. Data Warehouse vs. Database

The basic difference is that a database serves routine queries, while a data warehouse has an architecture optimized for queries that process large volumes of complex, multi-dimensional, multi-level data (from the summary level down to detail). This makes it suitable for the data analysis and data mining tasks of enterprises and data scientists.

| Criterion | Database | Data warehouse |
|---|---|---|
| Purpose | Designed to record data | Designed for analysis |
| Processing | Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP) |
| Tables and joins | Complex tables and joins, because the tables are normalized | Simpler tables and joins, because the tables are denormalized |
| Orientation | Oriented toward applications and products | Oriented toward many different purposes (subjects) |
| Storage scope | Usually limited to one application | Stores data from many sources |
| Availability | Data is available in real time | Data is refreshed from the source systems when needed; users must wait for the system to run and regenerate the periodic data |
| Design technique | Entity-relationship (ER) modeling | Dimensional data modeling |
| Focus | Capturing data | Analyzing data |
| Data currency | Data is kept up to date | Current and historical data are stored; may not be fully up to date |
| Data storage | Flat relational approach | Dimensional approach, e.g., star schema and snowflake schema (the snowflake schema normalizes the dimension tables) |
| Query type | Simple transactional queries | Complex queries for analysis |
| Data summary | Stores detailed data | Stores summarized data |
| Concurrent access | Supports many simultaneous users | Limited concurrency, since it is optimized for a small group of users |
| Example (healthcare) | Storing patients' current information (height, weight, …) | Storing patients' height, weight, … at different points in time => assess nutrition, predict future trends |
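The comparison above mentions the star schema. Below is a minimal sketch of one using Python's built-in `sqlite3` module and the healthcare example; the table and column names (`fact_measurement`, `dim_patient`, `dim_date`) and all values are made up for illustration, and a production warehouse would normally run on a dedicated analytical engine rather than SQLite.

```python
import sqlite3

# In-memory database, for illustration only; names and values are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables describe "who" and "when" for each measurement.
cur.execute("CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, name TEXT, gender TEXT)")
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER)")

# The fact table holds the numeric measurements plus a foreign key to each dimension.
cur.execute("""
    CREATE TABLE fact_measurement (
        patient_key INTEGER REFERENCES dim_patient(patient_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        height_cm   REAL,
        weight_kg   REAL
    )
""")

cur.executemany("INSERT INTO dim_patient VALUES (?, ?, ?)",
                [(1, "Patient A", "F"), (2, "Patient B", "M")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20200101, "2020-01-01", 2020),
                 (20200701, "2020-07-01", 2020),
                 (20210101, "2021-01-01", 2021)])
cur.executemany("INSERT INTO fact_measurement VALUES (?, ?, ?, ?)", [
    (1, 20200101, 160.0, 55.0),
    (1, 20200701, 160.5, 57.0),
    (1, 20210101, 161.0, 58.0),
    (2, 20200101, 175.0, 80.0),
    (2, 20210101, 175.0, 76.0),
])

# Analytical (OLAP-style) query: average weight per patient per year,
# i.e. a roll-up of the fact table along the patient and time dimensions.
rows = cur.execute("""
    SELECT p.name, d.year, AVG(f.weight_kg) AS avg_weight
    FROM fact_measurement AS f
    JOIN dim_patient AS p ON p.patient_key = f.patient_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()

for name, year, avg_weight in rows:
    print(name, year, avg_weight)
```

In an operational database the same patient would more likely have a single row whose weight is overwritten at every visit; the warehouse instead keeps every measurement, so the history can be rolled up by year or drilled down to individual dates.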

3. Data Warehouse Architecture

The architecture most commonly applied to a data warehouse is a three-tier architecture:

  • Bottom tier: the data warehouse server, which extracts data from various sources and then performs transformation, cleaning, loading, and refresh operations.
  • Middle tier: the OLAP server, which converts the data into a structure suitable for analysis and complex queries.
  • Top tier: client-side tools for analysis, statistics, reporting, …
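The bottom tier's extract, transform (clean), and load steps are commonly called ETL. The sketch below illustrates the idea in plain Python with the standard `csv` and `sqlite3` modules. Both sources, their column names, and the cleaning rules (trim and lowercase customer names, skip rows without an amount, de-duplicate by order ID) are hypothetical assumptions chosen for illustration, not a prescribed pipeline.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a flat (CSV) file exported by another system.
FLAT_FILE = """order_id,customer,amount,order_date
1001, alice ,19.75,2023-03-01
1002,BOB,5.00,2023-03-02
1003,carol,,2023-03-02
"""

# Hypothetical source 2: an operational (OLTP) database that stores amounts in cents.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER, created TEXT)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(2001, "Alice", 1250, "2023-03-03"), (1002, "bob", 500, "2023-03-02")])


def extract():
    """Extract: read raw rows from both sources into one common dict shape."""
    for row in csv.DictReader(io.StringIO(FLAT_FILE)):
        yield dict(row)
    query = "SELECT id, customer, amount_cents, created FROM orders"
    for order_id, customer, cents, created in oltp.execute(query):
        yield {"order_id": order_id, "customer": customer,
               "amount": cents / 100, "order_date": created}


def transform(rows):
    """Transform/clean: normalize types and names, drop incomplete rows, de-duplicate."""
    seen = set()
    for row in rows:
        if row["amount"] in ("", None):
            continue  # incomplete record; a real pipeline might log or quarantine it
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # the same order arrived from two sources
        seen.add(order_id)
        yield (order_id, row["customer"].strip().lower(),
               float(row["amount"]), row["order_date"])


def load(rows, warehouse):
    """Load: append the cleaned rows into the warehouse fact table."""
    warehouse.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                      "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)")
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)

summary = warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"
).fetchall()
print(summary)  # [('alice', 32.25), ('bob', 5.0)]
```

Because `extract` and `transform` are generators, rows stream through the pipeline one at a time instead of being held in memory all at once; the final query hints at the kind of aggregation the middle (OLAP) tier then serves.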

4. Strengths and weaknesses of DW

4.1. Strengths

  1. Through integration, a DW gives quick, easy access to data from many sources.
  2. It provides consistent information for complex queries.
  3. It reduces the time needed for analysis and reporting.
  4. It stores data over time, helping users analyze data across many points in history and forecast the future.

4.2. Weaknesses

  1. Not suitable for unstructured data.
  2. Creating and adding data takes time and money.
  3. Can become outdated quickly.
  4. Settings such as data types, ranges, schemas, and indexes are difficult to modify.
  5. Not easy for ordinary users. However, this is not a big problem, because a DW usually serves only a small group of specialized users.

5. Summary

The data warehouse has become an important part of data analysis in data mining, helping businesses make important decisions. It is therefore widely used in many fields such as finance and banking, commerce, production management, … With the current growth of cloud computing, most data warehouses will likely be deployed in the cloud in the near future to improve usability, stability, and security.
Thank you for reading to the end of the article.

Source: Viblo