Introduction to Analyzer in Elasticsearch

If we want to build a good search engine with Elasticsearch, knowing how the Analyzer works is a must. A good search engine is one that returns relevant results: when a user queries our search engine, we need to return the documents relevant to that query.

Brilian Firdaus

One component we can tune so that Elasticsearch returns relevant documents is the Analyzer. The Analyzer is responsible for processing the text we want to index, and it is one of the components that control which documents are considered more relevant at query time.

A bit about Inverted Index

Since the Analyzer is tightly coupled to the Inverted Index, we first need to understand what the Inverted Index is.

The Inverted Index is a data structure that stores a mapping from each token to the identifiers of the documents that contain it. Besides document identifiers, the Inverted Index also stores the position of each token within those documents. Because Elasticsearch maps tokens to document identifiers, it can look up the documents matching a query easily and return them quickly.
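
To make the lookup idea concrete, here is a minimal Python sketch of querying such a structure. It is only an illustration, not how Elasticsearch is implemented internally: the sample tokens, the document ids, and the helper names documents_with and documents_with_all are all made up for this example.

```python
# Hypothetical inverted index (sample data invented for this sketch):
# token -> {document id: [positions of the token in that document]}
inverted_index = {
    "quick": {1: [0], 3: [2]},
    "brown": {1: [1]},
    "fox": {1: [2], 2: [0], 3: [3]},
}


def documents_with(token):
    """Return the sorted ids of the documents that contain the token."""
    return sorted(inverted_index.get(token, {}))


def documents_with_all(tokens):
    """Return the sorted ids of the documents that contain every token."""
    id_sets = [set(inverted_index.get(token, {})) for token in tokens]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


print(documents_with("fox"))
print(documents_with_all(["quick", "fox"]))
```

Finding the documents for a token is a single dictionary lookup, and a query with several tokens only has to intersect the matching sets of ids. No document text is scanned at query time, which is why this kind of structure is fast to search.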

Indexing documents into Inverted Index

Let’s say that we want to index 2 documents:

Document 1: “Elasticsearch is fast”

Document 2: “I want to learn Elasticsearch”

Let’s take a peek into the Inverted Index and see the result of the Analysis and Indexing process:

[Figure: Inverted Index]

As you can see, the terms are counted and mapped to document identifiers and to their positions in each document. The reason we don’t see the full documents “Elasticsearch is fast” or “I want to learn Elasticsearch” is that they go through the Analysis process, which is the main topic of this article.
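
The indexing of these two documents can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the analyze function below only lowercases the text and splits it on whitespace, which stands in for a real Elasticsearch analyzer, and positions are counted from 0. The names analyze and build_inverted_index are hypothetical helpers, not part of any Elasticsearch API.

```python
from collections import defaultdict


def analyze(text):
    """Toy analysis step (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each token to {document id: [positions of the token]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, token in enumerate(analyze(text)):
            index[token].setdefault(doc_id, []).append(position)
    return dict(index)


documents = {
    1: "Elasticsearch is fast",
    2: "I want to learn Elasticsearch",
}

inverted_index = build_inverted_index(documents)

# Print each token with the documents and positions where it occurs.
for token in sorted(inverted_index):
    print(token, inverted_index[token])
```

In this sketch the token elasticsearch points to both documents (position 0 in Document 1 and position 4 in Document 2), while every other token points to a single document. Notice that the index stores the lowercased token rather than the original sentences, which is the effect of the analysis step.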