Introduction to BIG DATA: What is, Types, Characteristics & Example

Tram Ho

What is Data?

Data is the quantities, characters, or symbols on which a computer performs operations. It can be stored and transmitted as electrical signals and recorded on magnetic media (tapes, floppy disks, ...), optical media, or mechanical recording media.

What is Big Data?

Big Data is also data, but of enormous size. The term describes a collection of data that is huge in volume and keeps growing exponentially over time. In short, the data is so large and complex that traditional data management tools cannot store or process it efficiently.

In the ever-evolving digital era, Big Data comes from many different sources, such as websites, media, personal computers, mobile applications, data transmitters, and so on.

Here are some examples of Big Data:

  1. Stock exchange: The New York Stock Exchange generates about 1 terabyte of new trading data each day.

  2. Social media: Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly created by users uploading photos and videos, exchanging messages, posting comments, etc.

  3. Jet engine: A jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With thousands of flights a day, the data generated reaches many petabytes.
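To see how the jet engine figure adds up to petabytes, here is a rough back-of-the-envelope sketch in Python. Only the 10 terabytes per 30 minutes comes from the example above; the number of flights, the flight duration, and the one-engine-per-flight simplification are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope estimate of the data produced by jet engines in one day.
TB_PER_HALF_HOUR = 10   # from the example: about 10 TB per 30 minutes of flight
TB_PER_PB = 1000        # decimal units: 1 petabyte = 1000 terabytes


def daily_volume_pb(flights_per_day: int, hours_per_flight: float) -> float:
    """Estimate the daily data volume in petabytes, counting one engine per flight."""
    half_hours = hours_per_flight * 2
    total_tb = flights_per_day * half_hours * TB_PER_HALF_HOUR
    return total_tb / TB_PER_PB


# Hypothetical inputs: 5,000 flights a day, 2 hours each.
print(daily_volume_pb(5_000, 2))  # 200.0
```

Even with these modest assumed inputs, the estimate lands in the hundreds of petabytes per day.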

Types Of Big Data

Big Data can be found in 3 different formats as below:

  1. Structured:

Any data that can be stored, accessed, and processed in a fixed format is called ‘Structured’ data. Over time, computer science has become very good at developing techniques for working with data in this format. Today, however, more and more problems arise as the size of such data grows enormously, with typical sizes now reaching the range of zettabytes.

Do you know how big a zettabyte is? 1 zettabyte = 10^21 bytes, or 1 billion terabytes.

A typical example of structured data is a table in a relational database, such as this employee table:
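The conversion can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where each step is a factor of 1000; the `convert` helper and the unit table are illustrative names, not part of any library.

```python
# Decimal (SI) storage units, expressed in bytes.
BYTES_PER_UNIT = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a size between two units by going through bytes."""
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


print(convert(1, "ZB", "TB"))  # 1000000000.0, i.e. 1 billion terabytes
```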

Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000

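Because every row follows the same fixed format, the employee records above can be loaded into a relational table and queried directly. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name, the column names, and the `total_salary_by_department` helper are assumptions made for this example.

```python
import sqlite3

# The employee rows from the table above.
ROWS = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]


def total_salary_by_department(rows):
    """Load the rows into an in-memory table and sum salaries per department."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        "employee_id INTEGER PRIMARY KEY, employee_name TEXT, "
        "gender TEXT, department TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT department, SUM(salary) FROM employee "
        "GROUP BY department ORDER BY department"
    ).fetchall()
    conn.close()
    return result


print(total_salary_by_department(ROWS))
# [('Admin', 1150000), ('Finance', 1700000)]
```

The fixed schema is what makes a query like this possible without any preprocessing.
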
  2. Unstructured:

Any data with no defined form or structure is classified as unstructured. Besides being very large, unstructured data poses many processing challenges before users can derive value from it. A typical example is a heterogeneous data source containing a mix of unrelated text files, images, videos, audio, etc. Another example of unstructured data is the output returned by Google Search.

  3. Semi-structured:

Semi-structured data combines both forms. It is not stored in fixed tables like structured data, but it carries tags or keys that describe its elements. Typical examples are XML and JSON files, such as an XML file that stores user data.
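As a minimal sketch of what semi-structured user data can look like, the Python example below parses a small XML fragment with the standard library and prints the same records as JSON. The fragment, its element names and values, and the `parse_users` helper are all hypothetical and only serve as an illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; the element names and values are made up.
XML_FRAGMENT = """
<users>
  <user><name>Alice</name><age>30</age></user>
  <user><name>Bob</name><email>bob@example.com</email></user>
</users>
"""


def parse_users(xml_text: str) -> list:
    """Turn each <user> element into a dict of its child tags and text."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: child.text for child in user}
        for user in root.findall("user")
    ]


users = parse_users(XML_FRAGMENT)
print(json.dumps(users, indent=2))
```

Note that the two records do not share the same fields. The tags describe each value, but no fixed schema is enforced, which is what distinguishes this format from a database table.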

Figure: Data growth over the years

Note that web application data, including log files, transaction history files, and so on, is unstructured. OLTP (Online Transaction Processing) systems, by contrast, are built to work with structured data stored in relations (tables). An ATM, for example, is an OLTP system.
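Before log data can be stored in tables, its fields have to be extracted. The sketch below shows one way to do that in Python with a regular expression. It assumes the log lines follow the Common Log Format used by many web servers; the sample line and the `parse_log_line` helper are hypothetical.

```python
import re
from typing import Optional

# Pattern for a Common Log Format line (assumed format, not from the article).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # store the status code as a number
    return fields


# Hypothetical sample line.
SAMPLE = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(parse_log_line(SAMPLE))
```

Each parsed line becomes a record with named fields, which could then be inserted into a table.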

Characteristics Of Big Data

Big Data is described by the following 5 characteristics (also called the 5 Vs):

  • Volume: the size of the data

The name Big Data itself refers to enormous size. The size of the data plays a very important role in determining its value. Whether a particular data set counts as Big Data at all also depends on its volume. Volume is therefore a characteristic that must be considered when processing Big Data.

  • Variety: the abundance and diversity of data

Variety refers to the heterogeneous sources and nature of the data, both structured and unstructured. In the past, spreadsheets and databases were the only data sources considered by most applications. Today, data in the form of emails, photos, videos, surveillance device output, PDF files, audio files, etc. is also considered in analytical applications. This variety of unstructured data poses certain problems for data storage, mining, and analysis.

  • Velocity: the speed of data generation and processing

The term velocity refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines its real potential.

Velocity also covers the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

  • Variability: the inconsistency of data

Variability refers to the inconsistency that data can show over time, which can hamper the process of handling and managing the data effectively.

  • Veracity: the reliability of the data

One of the most complex properties of Big Data is the reliability and accuracy of the data. With the growth of social media and social network platforms, and the sharp increase in interaction and sharing by mobile users, determining the reliability and accuracy of data is becoming more and more difficult. Analyzing data and eliminating inaccurate or noisy records is therefore an important part of working with Big Data.
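A very simple form of this kind of cleaning can be sketched in Python as follows. The example assumes sensor readings given as (sensor_id, timestamp, value) tuples and drops missing, out-of-range, and duplicate records; the data, the valid range, and the `clean_readings` helper are all made up for illustration, and real pipelines usually need far more elaborate checks.

```python
def clean_readings(readings, low, high):
    """Drop missing, out-of-range, and duplicate readings, keeping the original order."""
    seen = set()
    cleaned = []
    for sensor_id, timestamp, value in readings:
        if value is None or not (low <= value <= high):
            continue  # missing or implausible value
        key = (sensor_id, timestamp)
        if key in seen:
            continue  # same sensor and timestamp already kept
        seen.add(key)
        cleaned.append((sensor_id, timestamp, value))
    return cleaned


# Hypothetical raw readings.
RAW = [
    ("s1", 1, 21.5),
    ("s1", 1, 21.5),   # duplicate
    ("s1", 2, None),   # missing value
    ("s2", 1, 950.0),  # outside the plausible range
    ("s2", 2, 22.1),
]
print(clean_readings(RAW, low=-50.0, high=60.0))
# [('s1', 1, 21.5), ('s2', 2, 22.1)]
```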

Benefits of Big Data Processing

The ability to process Big Data brings many benefits such as:

  • Businesses can use outside information when making decisions. For example, data from social networks such as Facebook and Twitter helps organizations shape their business strategies.
  • Improved customer service. Traditional customer feedback systems are being replaced by new systems designed with Big Data technology. In these new systems, Big Data and natural language processing technology are used to read and evaluate consumer feedback.
  • Early identification of risks to a product or service, if any.
  • Better operational efficiency.

Big Data technologies can be used to create a staging area or landing zone for new data, where it is decided which data should be moved into the data warehouse before it is stored there.

To learn more about Big Data, you can refer to the sources cited below.

https://www.guru99.com/what-is-big-data.html

https://www.datamation.com/big-data/structured-vs-unstructured-data.html

https://techblog.vn/oltp-va-olap-co-gi-khac-nhau

https://ehealth.gov.vn/Index.aspx?action=News&newsId=46156


Source : Viblo