How to Handle Typos in Elasticsearch Using Fuzzy Query

Typos happen often and can hurt the user experience. Fortunately, Elasticsearch can handle them easily with Fuzzy Query. Handling typos is a must if you’re building an advanced autocomplete system with Elasticsearch.

Brilian Firdaus

If you want to create a simple one instead, you can read my other article, “Create a Simple Autocomplete With Elasticsearch”.

What is fuzzy logic

Fuzzy logic is a form of mathematical logic in which the truth value of a variable can be any number between 0 and 1. This differs from Boolean logic, in which the truth value can only be 0 or 1.

In Elasticsearch, a fuzzy query means the terms in the query don’t have to match the terms in the inverted index exactly.

To calculate the distance between a term in the query and a term in the index, Elasticsearch uses the Levenshtein Distance Algorithm.

How to calculate distance using Levenshtein Distance Algorithm

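Calculating a distance with the Levenshtein Distance Algorithm is easy.

The distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn the first word into the second.

When both words have the same length and differ only by substituted characters, as in the example that follows, you can simply compare them character by character and add one to the distance for every position where the characters differ.

Let’s see an example: how to calculate the distance between the common typo “Gppgle” and the correct word “Google”.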

[Figure: Levenshtein distance between “Gppgle” and “Google”]

After calculating the distance between “Gppgle” and “Google” with the Levenshtein Distance Algorithm, we can see that the distance is 2: each of the two “p” characters has to be replaced with an “o”.
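
The calculation above can be sketched in a few lines of Python. This is a minimal illustration of the classic dynamic-programming form of Levenshtein distance, not the code Elasticsearch runs internally; the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # previous[j] = distance between the prefix of `a` processed so far
    # (initially the empty string) and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of `a` into "" takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("gppgle", "google"))  # 2
```

Unlike a plain character-by-character comparison, this version also handles words of different lengths, where a missing or extra character counts as one edit.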

Fuzzy Query in Elasticsearch

Handling typos in Elasticsearch with Fuzzy Query is also simple.

Let’s start with an example that searches for the typo “gppgle” using a normal Match Query.

Request

curl --request POST \
  --url http://localhost:9200/fuzzy-query/_search \
  --header 'content-type: application/json' \
  --data '{
	"query": {
		"match" : {
			"text": {
				"query": "gppgle"
			}
		}
	}
}'

Response

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}

When we use a normal Match Query, Elasticsearch analyzes the query “gppgle” first and then looks up the resulting term in the inverted index.

The only term in the inverted index is “google”, and it doesn’t match the term “gppgle”. Therefore, Elasticsearch won’t return any results.

Now, let’s try Elasticsearch’s fuzzy matching in a Match Query.
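
To make the exact-match behavior concrete, here is a toy Python sketch of an inverted index. It is a heavy simplification and not how Elasticsearch or Lucene is implemented: the `analyze`, `build_inverted_index`, and `match` helpers are hypothetical, the analyzer only lowercases and splits on whitespace, and the single indexed document (ID 1, text “Google”) is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    # Hypothetical stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    # Map every analyzed term to the set of document IDs that contain it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def match(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # Exact lookup: a query term only hits if the very same term is indexed.
    hits: Set[int] = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


# Assumed sample data: one document whose only term is "google".
index = build_inverted_index({1: "Google"})

print(match(index, "gppgle"))  # set()
print(match(index, "Google"))  # {1}
```

Because the lookup is an exact dictionary access, “gppgle” finds nothing even though it is only two edits away from “google”.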