One component we can tune so that Elasticsearch returns relevant documents is the Analyzer. The Analyzer is responsible for processing the text we want to index, and it is one of the components that control which documents are considered more relevant when querying.
A bit about Inverted Index
Since the Analyzer is tightly coupled to the Inverted Index, we first need to understand what an Inverted Index is.
An Inverted Index is a data structure that stores a mapping from each token to the identifiers of the documents that contain it. Besides document identifiers, the Inverted Index also stores the position of each token within those documents. Because Elasticsearch maps tokens to document identifiers, a query can look up the matching documents directly and return them quickly.
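To make the shape of this structure concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Elasticsearch (or Lucene underneath it) actually stores data. The tokens, document identifiers, and the helper names `search` and `positions` are all made up for this example.

```python
# Toy inverted index: token -> {document id -> positions of the token in that document}.
# The tokens and document ids are hypothetical and written by hand.
inverted_index = {
    "quick": {1: [0], 3: [2]},
    "brown": {1: [1]},
    "fox": {1: [2], 2: [0]},
}


def search(token):
    """Return the sorted ids of the documents that contain the token."""
    postings = inverted_index.get(token, {})
    return sorted(postings)


def positions(token, doc_id):
    """Return the positions of the token inside one document."""
    return inverted_index.get(token, {}).get(doc_id, [])


print(search("fox"))          # [1, 2]
print(positions("quick", 3))  # [2]
print(search("missing"))      # []
```

The lookup is a single dictionary access per token, which is why finding the matching documents does not require scanning every document.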
Indexing documents into Inverted Index
Let’s say that we want to index 2 documents:
Document 1: “Elasticsearch is fast”
Document 2: “I want to learn Elasticsearch”
Let’s take a peek into the Inverted Index and see the result of the Analysis and Indexing process:
As you can see, the terms are counted and mapped to document identifiers and to their positions in each document. The reason we don’t see the full documents “Elasticsearch is fast” or “I want to learn Elasticsearch” is that they go through the Analysis process, which is the main topic of this article.
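If you want to reproduce this kind of result by hand, the sketch below builds a toy inverted index for the same two documents. It rests on a big assumption: the `analyze` function here only lowercases the text and splits it on whitespace, which is a simplified stand-in for a real Analyzer, so the terms Elasticsearch actually stores may differ depending on the analyzer you configure. The names `analyze` and `build_index` are hypothetical helpers, not part of any Elasticsearch API.

```python
from collections import defaultdict


def analyze(text):
    # Simplified stand-in for an Analyzer (an assumption for this sketch):
    # lowercase the text and split it on whitespace.
    return text.lower().split()


def build_index(documents):
    # term -> {document id -> list of positions of the term in that document}
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


documents = {
    1: "Elasticsearch is fast",
    2: "I want to learn Elasticsearch",
}

index = build_index(documents)

# Print each term with its total count and where it occurs.
for term in sorted(index):
    postings = index[term]
    count = sum(len(p) for p in postings.values())
    print(term, count, postings)
```

Notice that the index contains the term `elasticsearch` in lowercase, pointing to both documents, rather than either original sentence. That is the effect of the (simplified) analysis step.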