Learn NoSQL and Elasticsearch

Tram Ho

1. NoSQL

NoSQL is a non-relational database management system. It requires no fixed schema, avoids joins, and is easy to scale out, so it is used for distributed data stores with very large storage needs. NoSQL databases fall into four main categories:

  1. Key-value stores
    • Data is held in a hash table (or a similar structure) of key-value pairs.
    • Lookup by key has average complexity O(1).
    • Typical databases: Amazon DynamoDB, Redis, Riak, …

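  2. Document database
    • Data is stored and retrieved as documents (formats: XML, JSON, BSON …).
    • Documents are self-describing and have a hierarchical, tree-like data structure.
    • A document database can be seen as a special case of a key-value store in which the value is a structured document.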
    • Typical databases: MongoDB, RavenDB, Terastore, …

  3. Wide column stores
    • Each row is identified by a key and can store data in many columns; the set of columns can differ from row to row.
    • Popular databases: Cassandra, Hypertable, and Amazon DynamoDB.

  4. Graph database
    • Stores entities and the relationships between them.
    • Objects are nodes, which have properties.
    • The organization of the graph allows data to be stored once and interpreted in various ways.
    • Typical databases: Neo4j, InfiniteGraph, …
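
To make the four categories concrete, here is a minimal sketch that expresses the same small dataset in each model using only built-in Python structures. No real database is involved; the keys, field names, and the `followers_of` helper are made up for illustration.

```python
# The same small dataset in each of the four NoSQL models,
# modeled with plain Python structures (illustrative only).

# 1. Key-value store: an opaque value looked up by key
#    (a dict lookup is O(1) on average).
key_value = {"user:1": '{"name": "An", "city": "Hanoi"}'}

# 2. Document database: the value is a structured, self-describing document.
documents = {
    "user:1": {"name": "An", "address": {"city": "Hanoi"}, "tags": ["admin"]},
}

# 3. Wide column store: each row key maps to its own set of columns.
wide_columns = {
    "user:1": {"name": "An", "city": "Hanoi"},
    "user:2": {"name": "Binh", "email": "binh@example.com"},  # different columns
}

# 4. Graph database: nodes with properties, plus edges for relationships.
nodes = {1: {"name": "An"}, 2: {"name": "Binh"}}
edges = [(1, "FOLLOWS", 2)]


def followers_of(node_id):
    """Return the names of nodes that have a FOLLOWS edge pointing at node_id."""
    return [nodes[src]["name"] for src, rel, dst in edges
            if rel == "FOLLOWS" and dst == node_id]
```

The key-value store knows nothing about the inside of its value, the document store can reach into nested fields, the wide column rows need not share columns, and the graph answers relationship questions directly.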

2. Elasticsearch

Elasticsearch is a search engine built on Apache Lucene:

  • Provides an API for storing and searching data quickly.
  • Built and developed in Java, on top of Lucene.
  • Built to operate as a RESTful server.
  • Accessible over HTTP from many programming languages, which also means security is weak unless access is restricted.
  • Well suited to searching and aggregating data.
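
As a rough illustration of the RESTful API mentioned above, the sketch below only builds the HTTP requests for storing and searching a document; it does not send them. The host `localhost:9200`, the index name `articles`, and the two helper functions are assumptions for the example, while the `_doc` and `_search` endpoints and the `match` query are part of the Elasticsearch REST API.

```python
import json

BASE_URL = "http://localhost:9200"  # assumption: a local single-node setup


def index_request(index, doc_id, document):
    """Describe the HTTP request that stores one document in an index."""
    return ("PUT", f"{BASE_URL}/{index}/_doc/{doc_id}", json.dumps(document))


def search_request(index, field, text):
    """Describe the HTTP request for a full-text match query on one field."""
    body = {"query": {"match": {field: text}}}
    return ("POST", f"{BASE_URL}/{index}/_search", json.dumps(body))


# Hypothetical index name and document, for illustration only.
store = index_request("articles", 1, {"title": "Learn NoSQL and Elasticsearch"})
find = search_request("articles", "title", "nosql")
```

Each tuple holds the method, URL, and JSON body, so it can be handed to any HTTP client; because the interface is plain HTTP and JSON, clients exist for many languages.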

Advantages of Elasticsearch

  • Fast: excels at full-text search and works as a near-real-time search platform.
  • Distributed by design: scales out to hundreds or thousands of servers and petabytes of data.
  • Integrates a number of powerful features that help store and search data more effectively.
  • The Elastic Stack simplifies data ingestion, visualization, and reporting, with high scalability and availability.

Disadvantages of Elasticsearch

  • Elasticsearch is strong mainly at search; for other tasks it is often weaker than other databases.
  • It has no database transactions, so data integrity is not guaranteed across write, update, and delete operations.
  • Older releases shipped without built-in authentication and authorization in the free distribution, so out of the box it was less secure than other databases.

Concepts in Elasticsearch

Comparison with the equivalent concepts in MySQL:

  Elasticsearch    MySQL
  Index            Table
  Document         Record

  • Index: a collection of documents with similar characteristics. An index contains many documents.
  • Document: the smallest unit of data stored in Elasticsearch; each document can be indexed.
  • Node: a single server that is part of a cluster. Each node can hold multiple shards.
  • Cluster: a collection of nodes that work together and share the same ‘cluster_name’ attribute. A primary function of the cluster is to decide which shards are allocated to which nodes.
  • Shard: a subset of the documents of an index. An index can be divided into several shards.
    • Primary shard: stores the data and is where documents are indexed first, before they are copied to replica shards. In Elasticsearch versions before 7.0, each index has 5 primary shards by default (1 since 7.0), and each primary shard has 1 replica shard.
    • Replica shard: stores a copy of the data of a primary shard.
  • Segment: an immutable piece of a Lucene index. New documents are written to new segments, which lets Lucene add documents to the index easily.

Note: The number of primary shards for an index cannot be changed after the index has been created.
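
The note above follows from how documents are routed to shards. In its simplest form, Elasticsearch picks the shard as a hash of the routing value (by default the document id) modulo the number of primary shards. The sketch below imitates that rule; `pick_shard` is a hypothetical helper, and CRC32 is a stand-in for the Murmur3-based hash Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (by default the document id) to a primary shard number."""
    # Stand-in hash: Elasticsearch uses its own Murmur3-based hash, not CRC32.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
with_5 = {d: pick_shard(d, 5) for d in doc_ids}
with_6 = {d: pick_shard(d, 6) for d in doc_ids}

# Documents that would be looked up on a different shard if the count changed.
moved = [d for d in doc_ids if with_5[d] != with_6[d]]
```

Changing the shard count from 5 to 6 sends many existing ids to a different shard number, so lookups would go to a shard that does not hold the document. This is why the number of primary shards is fixed when the index is created.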


Application of Lucene

  • A shard is actually a Lucene index, which is where the data is physically stored.
  • Each shard is therefore a self-contained search engine.
  • A Lucene index is made up of many segments (each segment is an inverted index).
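
To show what an inverted index is, here is a minimal sketch that maps each term to the documents containing it and answers a multi-term query by intersecting those lists. It is a toy model, not Lucene's implementation: tokenization is a plain whitespace split, and the function names and sample documents are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    term_sets = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*term_sets)) if term_sets else []


# Sample documents, for illustration only.
docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene stores segments",
    3: "NoSQL stores documents",
}
index = build_inverted_index(docs)
```

Looking up a term is a single dictionary access instead of a scan over every document, which is what makes full-text search fast.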

Source : Viblo