ITZone

Learn NoSQL and Elasticsearch

1. NoSQL

NoSQL refers to non-relational database management systems. They require no fixed schema, avoid joins, and are easy to scale out, which makes them suitable for distributed data stores with huge data storage needs. NoSQL databases fall into four main categories:

  1. Key-value stores
    • A hash table (or linked list) holding key-value pairs.
    • Average lookup complexity is O(1) with a hash table.
    • Typical databases: Amazon DynamoDB, Redis, Riak, …

  2. Document databases
    • Data is stored and retrieved as documents (formats: XML, JSON, BSON, …).
    • Documents are self-describing and have a hierarchical (tree) data structure.
    • A document database is a special case of a key-value store in which the value is a document.
    • Typical databases: MongoDB, RavenDB, Terrastore, …

  3. Wide column stores
    • Each row is identified by a row key and can store data in many columns; different rows may have different columns.
    • Popular databases: Cassandra, Hypertable, and Amazon DynamoDB.

  4. Graph databases
    • Store entities and the relationships between them.
    • Entities are nodes and relationships are edges; both can have properties.
    • The graph organization allows data to be stored once and interpreted in various ways.
    • Typical databases: Neo4j, InfiniteGraph, …
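To make the four categories concrete, here is a toy model of each data model using plain Python dicts and lists. It is only a sketch of how each category organizes data, not how Redis, MongoDB, Cassandra, or Neo4j are implemented; all keys, fields, and the `neighbours` helper are hypothetical.

```python
from collections import defaultdict

# 1. Key-value store: a hash table (Python dict) gives average O(1) reads and writes.
kv_store: dict[str, str] = {}
kv_store["session:42"] = "user=alice"
print(kv_store["session:42"])  # user=alice

# 2. Document store: still keyed, but the value is a self-describing, tree-shaped document.
doc_store: dict[str, dict] = {
    "user:1": {"name": "Alice", "address": {"city": "Hanoi"}, "tags": ["admin"]},
}
print(doc_store["user:1"]["address"]["city"])  # Hanoi

# 3. Wide column store: row key -> {column name -> value}; rows need not share columns.
wide_rows: dict[str, dict[str, str]] = {
    "row1": {"name": "Alice", "email": "alice@example.com"},
    "row2": {"name": "Bob", "phone": "555-0100"},
}
print(sorted(wide_rows["row2"]))  # ['name', 'phone']

# 4. Graph store: nodes and edges, both of which carry properties.
nodes = {"alice": {"label": "Person"}, "bob": {"label": "Person"}}
edges = defaultdict(list)  # node id -> [(target id, edge properties)]
edges["alice"].append(("bob", {"type": "FOLLOWS", "since": 2020}))


def neighbours(node_id: str, edge_type: str) -> list[str]:
    """Ids of nodes reachable from node_id over one edge of the given type."""
    return [dst for dst, props in edges[node_id] if props["type"] == edge_type]


print(neighbours("alice", "FOLLOWS"))  # ['bob']
```

Note how the same dict structure serves both the key-value and the document model; the difference is whether the database understands the shape of the value.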

2. Elasticsearch

Elasticsearch is a search engine built on Apache Lucene:

  • Provides an API for storing and searching data quickly.
  • Developed in Java, on top of Lucene.
  • Built to run as a server exposed through a RESTful API, suitable for cloud deployment.
  • Accessible from many programming languages, at the cost of weaker security.
  • Designed for searching and aggregating data.
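Because the API is RESTful, any language with an HTTP client can talk to Elasticsearch. The sketch below, using only the Python standard library, builds two requests: `PUT /<index>/_doc/<id>` to store a JSON document and `POST /<index>/_search` with a `match` query for full-text search. It assumes a local node with security disabled on the default port 9200 and a hypothetical index named `articles`; the requests are only constructed, not sent.

```python
import json
import urllib.request

# Assumptions: a local Elasticsearch node on the default port 9200 with security
# disabled, and a hypothetical index called "articles".
ES_URL = "http://localhost:9200"
INDEX = "articles"


def json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP request carrying a JSON body."""
    return urllib.request.Request(
        f"{ES_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Store a document under id 1: PUT /<index>/_doc/<id>
put_req = json_request("PUT", f"/{INDEX}/_doc/1", {"title": "Learn NoSQL and Elasticsearch"})

# Full-text search on the "title" field: POST /<index>/_search with a match query
search_req = json_request(
    "POST", f"/{INDEX}/_search", {"query": {"match": {"title": "elasticsearch"}}}
)

print(put_req.get_method(), put_req.full_url)  # PUT http://localhost:9200/articles/_doc/1

# With a node running, send the search and read the hits from the JSON response:
# with urllib.request.urlopen(search_req) as resp:
#     print(json.load(resp)["hits"]["hits"])
```

The same two calls could be made with curl or any official client; nothing here is Python-specific except the HTTP plumbing.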

Advantages of Elasticsearch

  • Fast: excels at full-text search and works as a near-real-time search platform.
  • Distributed by design: scales out to hundreds or thousands of servers and petabytes of data.
  • Integrates a number of powerful features that make storing and searching data more effective.
  • The Elastic Stack simplifies data ingestion, visualization, and reporting, with high scalability and availability.

Disadvantages of Elasticsearch

  • Elasticsearch is strong mainly at search; for other tasks it is often inferior to other databases.
  • No guarantee of data integrity in write, update, and delete operations …
  • Does not provide security and authorization features, so it is less secure than other databases.

Concepts in Elasticsearch

Comparison with MySQL concepts

Elasticsearch | MySQL
Index         | Table
Document      | Record (row)
  • Index: a collection of documents that share similar characteristics. An index contains many documents.
  • Document: the smallest unit of data stored in Elasticsearch (a JSON object); it can be indexed.
  • Node: a single server that is part of a cluster. Each node can hold multiple shards.
  • Cluster: a collection of nodes that work together and share the same ‘cluster_name’. A primary function of the cluster is to decide which shards are allocated to which node.
  • Shard: a subset of an index's documents. An index can be divided into several shards.
    • Primary shard: stores the data and is the source its replica shards copy from. By default, each index had 5 primary shards, each with 1 replica shard (this applies to versions before Elasticsearch 7.0; since 7.0 the default is 1 primary shard).
    • Replica shard: stores a copy of a primary shard's data, providing redundancy if a node fails and extra capacity for searches.
  • Segment: a part of a Lucene index; writing new documents into new segments lets Lucene add documents to the index easily.
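The cluster's job of deciding which shards live on which node can be illustrated with a deliberately simplified allocator. It assumes round-robin placement (Elasticsearch's real allocator also weighs disk usage, shard balance, and other factors) and keeps the one rule that matters for availability: a primary and its replica never share a node. Copies that cannot be placed stay unassigned, much as replicas do on a single-node cluster. The `allocate` function and the `P0`/`R0.1` labels are hypothetical.

```python
def allocate(num_primaries: int, num_replicas: int, nodes: list[str]):
    """Toy shard allocator (hypothetical). Returns ({node: [shard labels]}, [unassigned labels]).

    Copy 0 of each shard is the primary (P<n>); copies 1..num_replicas are replicas (R<n>.<k>).
    Copy k of shard n goes to node (n + k) % len(nodes), so copies of one shard always
    land on distinct nodes; copies beyond the number of nodes stay unassigned.
    """
    allocation: dict[str, list[str]] = {node: [] for node in nodes}
    unassigned: list[str] = []
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            label = f"P{shard}" if copy == 0 else f"R{shard}.{copy}"
            if copy >= len(nodes):
                # Every node already holds a copy of this shard.
                unassigned.append(label)
                continue
            allocation[nodes[(shard + copy) % len(nodes)]].append(label)
    return allocation, unassigned


# Pre-7.0 defaults: 5 primary shards, 1 replica each, on a 3-node cluster.
layout, pending = allocate(5, 1, ["node-a", "node-b", "node-c"])
print(layout)   # {'node-a': ['P0', 'R2.1', 'P3'], 'node-b': ['R0.1', 'P1', 'R3.1', 'P4'], 'node-c': ['R1.1', 'P2', 'R4.1']}
print(pending)  # []

# Single-node cluster: replicas cannot be placed anywhere.
print(allocate(2, 1, ["solo"]))  # ({'solo': ['P0', 'P1']}, ['R0.1', 'R1.1'])
```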

Note: The number of primary shards for an index cannot be changed after the index has been created.
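This restriction follows from how documents are routed to shards. Elasticsearch's routing documentation describes the (simplified) formula `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below uses `zlib.crc32` as a stand-in for Elasticsearch's Murmur3 hash, so the exact shard numbers differ from a real cluster, but it shows why the divisor cannot simply change: going from 5 to 6 primary shards would send most existing documents to a different shard.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified routing: shard = hash(_routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in for the Murmur3 hash Elasticsearch uses,
    so the shard numbers here will not match a real cluster.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]  # _routing defaults to the _id
before = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
after = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}  # if a 6th primary were added
moved = sum(before[doc_id] != after[doc_id] for doc_id in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would now be looked up on a different shard")
```

With a uniform hash, a value keeps the same remainder modulo 5 and 6 only about one time in six, so roughly five out of six documents would become unreachable at their old location.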


How Elasticsearch uses Lucene

  • A shard is actually a Lucene index, which is where the data is physically stored.
  • Each shard is also a self-contained search engine.
  • A Lucene index is made up of many segments (each segment is an inverted index).
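A minimal sketch of these ideas, assuming a toy analyzer that lowercases text and splits it on non-word characters: each segment is an inverted index, never modified after creation, that maps terms to document ids; new documents sit in a buffer until `refresh()` writes them out as a new segment; and a search queries every segment. This buffer-then-refresh step is also why Elasticsearch search is near real time (by default an index refreshes about once per second). Real Lucene segments also store positions, scoring statistics, and deletions, and are merged in the background; the `Segment` and `Shard` classes here are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Toy analyzer: lowercase and split on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class Segment:
    """Inverted index built once and never modified: term -> ids of documents containing it."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        for doc_id, text in docs.items():
            for term in tokenize(text):
                self.postings[term].add(doc_id)


class Shard:
    """Toy shard: documents are buffered until refresh() turns them into a segment."""

    def __init__(self) -> None:
        self.buffer: dict[str, str] = {}
        self.segments: list[Segment] = []

    def index(self, doc_id: str, text: str) -> None:
        self.buffer[doc_id] = text  # stored, but not searchable yet

    def refresh(self) -> None:
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}  # start a fresh buffer; existing segments are left untouched

    def search(self, query: str) -> set[str]:
        """Ids of documents containing every query term, collected across all segments."""
        terms = tokenize(query)
        hits: set[str] = set()
        if not terms:
            return hits
        for segment in self.segments:
            postings = [segment.postings.get(term, set()) for term in terms]
            hits |= set.intersection(*postings)
        return hits


shard = Shard()
shard.index("1", "Elasticsearch is built on Lucene")
shard.index("2", "Lucene segments are inverted indexes")
print(shard.search("lucene"))  # set()  (nothing refreshed yet)
shard.refresh()
shard.index("3", "A shard is a Lucene index")
shard.refresh()
print(sorted(shard.search("lucene")))  # ['1', '2', '3']
print(shard.search("lucene index"))    # {'3'}
```

Because segments are only ever added, indexing a new document never rewrites existing data, which is what makes adding documents cheap.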
