Learn more knowledge about Hadoop

Tram Ho

What is Hadoop

This is certainly not a strange term for you to do Big data, can understand Hadoop is a collection of open source programs and processes, it allows distributed processing. Big data on clusters of computers through simple programming model to more effectively operate Big data. Here are some basic concepts and features of Hadoop to help you quickly understand the term easily.

Hadoop Analysis (HD)

HD total has 4 modules:

HDFS (Hadoop Distributed File System)

HDFS is understood to be a file system capable of storing terrible data and at the same time distributing, in addition to optimizing the use of bandwidth between nodes. Therefore it is used to run on a large cluster with tens of thousands of nodes.

Besides, we can use HDFS as a drive with almost no capacity limit. It allows to access multiple drives as a single drive, so to increase capacity just add a node to the system.

MapReduce (Hadoop MapReduce)

MapReduce is a framework that helps develop distributed applications according to the MapReduce model easily and strongly, a YARN-based system for parallel processing of large data sets. In addition, distributed application MapReduce can run on a large cluster with multiple nodes.

Hadoop YARN

Hadoop YARN has the function of managing resources of data storage systems and running analysis. We can extend YARN beyond a few thousand nodes through the YARN Federation feature. This feature allows us to tie multiple YARN clusters into one large cluster. This allows the use of independent, assembled clusters.

Hadoop Common

Last but not least, this is the necessary Java library and utility for other modules to use. These libraries provide abstract file systems and OS classes, and contain Java code to start Hadoop.

Advantages of Hadoop

HD helps users write and test distributed systems quickly. This is an efficient way to deliver data and workflow across workstations thanks to the parallel processing mechanism of the CPU cores.

Besides, HD is not subject to hardware failure mechanism, so Hadoop itself has libraries designed to detect and handle application layer errors.

Servers that have been broken down many times still work without interruption. A great advantage of Hadoop in addition to open source is compatibility across all platforms due to being developed on Java.

References

All of the above knowledge is gathered from my own knowledge and experience along with reference to some sources at home and abroad, to learn more about my HD to the link below!

hadoop.apache.org/docs/current/index.html

talend.com/resources/what-is-mapreduce/

Share the news now

Source : Viblo