Elasticsearch is a real-time, distributed, open source full-text search and
analytics engine. It is a search engine based on Lucene.
Lucene is an open source information retrieval software library and is
completely free to use. It is supported by the Apache Software Foundation and
is released under the Apache License.
Elasticsearch was originally released under the Apache License, version 2.0. It
is developed in Java and used by many large organizations. It provides a way to
organize data and make it easily and efficiently accessible. It is a
distributed, scalable text search engine. With the ever-increasing size and
complexity of data, the performance of traditional systems is no longer
adequate: they fail to provide quick query execution for analysis. By using
Elasticsearch we can exploit various
advantages of document-oriented architecture such as high query performance,
the simplicity of design, simpler horizontal scaling to clusters and finer
control over availability.
Elasticsearch is a real-time,
distributed, open source full-text search and analytics engine. It is
accessible through a RESTful web service interface and uses schema-less JSON
documents to store data. It is built in the Java programming language,
which enables Elasticsearch to run on different platforms. It enables users to
explore very large amounts of data at very high speed. The algorithms for
matching text and storing optimized indexes of searchable terms are
implemented by Lucene.
Elasticsearch is a near-real-time search platform: there is a slight latency
from the time a document is indexed until the time it becomes searchable.
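As a minimal sketch of the RESTful interface described above, the following
Python snippet builds (but does not send) an HTTP request that would index one
schema-less JSON document. The host `localhost:9200`, the index name
`articles`, the document id and the document fields are assumptions chosen
purely for illustration; the `/<index>/_doc/<id>` path is the document
endpoint in recent Elasticsearch versions.

```python
import json
from urllib.request import Request

# Hypothetical document: no schema is declared before indexing it.
doc = {"title": "Intro to search", "tags": ["lucene", "search"], "views": 42}


def build_index_request(base_url, index, doc_id, document):
    """Build a PUT request that would index `document` under `doc_id`."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local node address; nothing is sent over the network here.
req = build_index_request("http://localhost:9200", "articles", 1, doc)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running node.
Because of the near-real-time behaviour, a search issued immediately afterwards
might not yet see the document; it becomes visible after the next index refresh.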
Elasticsearch is a search engine that
provides quick full-text search over various documents. It searches within
full-text fields to find matching documents and returns the most relevant and
suitable results first. It uses the Boolean model to find documents. The
Boolean model is a widely adopted information retrieval model in which a query
combines terms with the operators AND, OR and NOT.
Since the first version of Elasticsearch was
released in 2010, it has quickly become the most popular search engine, and is
commonly used for log analytics, full-text search, and operational intelligence
use cases. When coupled with a visualization tool, Elasticsearch can be used to
provide near-real-time analytics using large volumes of log data.
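The Boolean model mentioned above can be illustrated with a toy inverted
index. This is only a sketch of the idea, not how Lucene is implemented: the
three sample documents, the whitespace tokenizer and the function names are
assumptions made for the example, and relevance ranking is left out entirely.

```python
from collections import defaultdict

# Hypothetical sample documents keyed by document id.
docs = {
    1: "elasticsearch is a distributed search engine",
    2: "lucene is a search library",
    3: "elasticsearch is built on lucene",
}


def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Ids of documents that contain every one of the terms."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Ids of documents that contain at least one of the terms."""
    return set().union(*(index.get(term, set()) for term in terms))


def boolean_not(index, all_ids, term):
    """Ids of documents that do not contain the term."""
    return set(all_ids) - index.get(term, set())


index = build_inverted_index(docs)
print(boolean_and(index, "elasticsearch", "lucene"))  # {3}
print(boolean_or(index, "distributed", "library"))    # {1, 2}
print(boolean_not(index, docs, "search"))             # {3}
```

A Boolean query only decides which documents match; ordering the matches so
that the most relevant come first is a separate scoring step.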
2 BASIC CONCEPT
A few core concepts are essential
for a complete understanding of how Elasticsearch works and operates.
An index is a collection of different types of documents
and their properties. Elasticsearch stores data in one or more
indices, and it uses Apache Lucene to write data to and read data from an
index. An index is similar to a database. With the help of shards, an
Elasticsearch index can be made up of more than one Apache Lucene index.
An index is identified by a name, which must be in lowercase.
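The lowercase requirement can be checked before an index is created. The
helper below is a hypothetical sketch, not part of any Elasticsearch client:
only the lowercase rule comes from the text above, while the forbidden
characters, the forbidden leading characters and the 255-byte limit are
assumptions based on the commonly documented naming restrictions and may
differ between versions.

```python
# Assumed set of characters that may not appear in an index name.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name):
    """Return True if `name` passes the assumed index naming rules."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():  # the rule stated in the text: lowercase only
        return False
    if name[0] in "-_+":  # assumed: must not start with these characters
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255  # assumed length limit in bytes


for candidate in ["logs-2024", "Logs", "my index", "_hidden"]:
    print(candidate, is_valid_index_name(candidate))
```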
The document is the most important concept. A document
is the basic unit of information that can be indexed. It is a collection of
fields arranged in a systematic manner and defined in JSON format.
Each field is identified by a name and may hold a single value or
multiple values. Many documents can be stored within an index/type, but each
document must be assigned to a type inside an index.
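To make the field structure concrete, the sketch below walks over a JSON
document and lists every field with its values. The document content and the
helper `field_values` are hypothetical; the dotted names for nested objects
(for example `author.name`) follow the usual way object fields are addressed
and are used here as an assumption.

```python
import json

# Hypothetical document with single-valued, multi-valued and nested fields.
document = {
    "title": "Elasticsearch basics",    # single value
    "tags": ["search", "lucene"],       # multiple values
    "author": {"name": "A. Writer"},    # nested object
}


def field_values(doc, prefix=""):
    """Yield (field_name, list_of_values) pairs for a JSON-like document."""
    for name, value in doc.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted field name.
            yield from field_values(value, prefix=f"{path}.")
        else:
            values = value if isinstance(value, list) else [value]
            yield path, values


print(json.dumps(document))  # the JSON form that would be stored
for field, values in field_values(document):
    print(field, values)
```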
Each document has a defined
type. A type is a collection of documents that share similar fields and are
present in the same index. Types were used to logically partition
an index so that different kinds of documents could be stored in the same index.
Note: Types have been deprecated in later
versions of Elasticsearch (6.0.0 and higher).
A node is a single running instance of the Elasticsearch server.
A single physical or virtual server can accommodate multiple nodes. A node
is part of your cluster, stores your data, and participates in the
cluster’s indexing and search capabilities. A node is identified by a name,
which by default is a random Universally Unique IDentifier (UUID) assigned
at node start-up.
The first type of node is the data node, since Elasticsearch
is designed to index and then search data. The second type of node is the
master node, which controls the work of the other nodes. The tribe node is
also important: because it can join multiple clusters, it acts as a bridge
that lets the clusters in a distributed architecture communicate with each
other.
A single Elasticsearch node can perform
many simple operations, but a group of nodes can handle large amounts of data
more efficiently. A cluster is a
collection of one or more nodes that together hold your entire data and
provide federated indexing and search capabilities across all nodes. A cluster
is identified by a unique name, which is “elasticsearch” by default.
2.6 SHARDS AND
A single index holding a large amount of data on one node
can be slow to respond to search requests. To solve this problem, Elasticsearch
provides the ability to subdivide an index into multiple pieces called shards;
the number of shards can be defined at the time of index creation. Each shard
is fully functional and independent. Sharding is essential because it lets you
horizontally split and scale the data volume as well as distribute and
parallelize operations across shards.
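One common way to picture how documents are spread across shards is to hash a
routing key (by default the document id) and take the result modulo the number
of primary shards. The sketch below follows that idea under stated
assumptions: `zlib.crc32` is only a stand-in for the hash function that
Elasticsearch uses internally, and the shard count of 3 and the document ids
are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(routing_key, num_primary_shards):
    """Pick a shard as hash(routing_key) modulo the number of primary shards."""
    # crc32 is a stand-in hash for illustration, not the one Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 3  # hypothetical value, fixed when the index is created

# Route 1000 hypothetical document ids and count how many land on each shard.
placement = Counter(shard_for(f"doc-{i}", NUM_SHARDS) for i in range(1000))
print(sorted(placement.items()))
```

Under this scheme the same key always maps to the same shard, which also
suggests why the number of shards is chosen at index creation: changing it
would change where existing documents are expected to live.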
At times a network or node failure can occur, which may
cause requests to fail. To guard against this, Elasticsearch can keep copies of
shards, providing high availability when a shard or node fails. Such a copy is
known as a replica shard. A replica is never allocated on the same node as its