Create a Simple Autocomplete With Elasticsearch

Creating an autocomplete might sound daunting at first if you’ve never created one. But with the help of the features in Elasticsearch, it’s actually a simple thing to do.

Things You Should Know

If you’re new to Elasticsearch, I suggest that you read my other articles first. It isn’t required, but knowing how an analyzer and a text field work will definitely help you understand this article.

The article “Basics of Elasticsearch for Developer” will introduce you to Elasticsearch. The article “Elasticsearch: Text vs. Keyword” will teach you the difference between text and keyword in Elasticsearch and also will explain how Elasticsearch’s analyzer works.

Setup

Creating the index

First, let’s create an index called autocomplete-example. We will use this index for the examples in this article.

Request:

curl --request PUT \
  --url http://localhost:9200/autocomplete-example/ \
  --header 'content-type: application/json'

Response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "autocomplete-example"
}

Defining a mapping

Before indexing a document, let’s first define a mapping. We only need one field, simple_autocomplete, with the text field data type and the standard analyzer.

Since Elasticsearch uses the standard analyzer as default, we need not define it in the mapping.

Request:

curl --request PUT \
  --url http://localhost:9200/autocomplete-example/_mapping \
  --header 'content-type: application/json' \
  --data '{
 "properties": {
  "simple_autocomplete" : {
   "type":"text"
  }
 }
}'

Response:

{
  "acknowledged": true
}

Indexing a document

Let’s index a document. For the examples in this article, we will only need one document, containing the text “Hong Kong.”

Request:

curl --request POST \
  --url http://localhost:9200/autocomplete-example/_doc \
  --header 'content-type: application/json' \
  --data '{
 "simple_autocomplete": "Hong Kong"
}'

Response:

{
  "_index": "autocomplete-example",
  "_type": "_doc",
  "_id": "aFAbznQBPNT8JhPaDhND",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Querying the Index With match Query

Let’s start with the query that we normally use, match query.

The standard analyzer lowercases your indexed text and splits it into tokens on word boundaries (whitespace and most punctuation) before storing them in the inverted index. It does not remove stop words by default.

By default, the match query analyzes the query text with the same analyzer that was used to index the field, which here is the standard analyzer.

Let’s see how our “Hong Kong” text looks in the inverted index by using the _analyze API provided by Elasticsearch:

Request:

curl --request GET \
  --url 'http://localhost:9200/_analyze?pretty=' \
  --header 'content-type: application/json' \
  --data '{
 "analyzer": "standard",
 "text": "Hong Kong"
}'

Response:

{
  "tokens": [
    {
      "token": "hong",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "kong",
      "start_offset": 5,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
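
To make the token output above more concrete, here is a minimal Python sketch that approximates what the standard analyzer does to this text. It is only an illustration: the function name simple_standard_analyze is made up for this article, and the real analyzer follows Unicode text segmentation rules that are more involved than this regular expression.

```python
import re

def simple_standard_analyze(text):
    """Rough approximation of the standard analyzer (an assumption,
    not the real implementation): split on non-word characters and
    lowercase each token."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for token in simple_standard_analyze("Hong Kong"):
    print(token)
```

For “Hong Kong,” the sketch produces the same two lowercase tokens, offsets, and positions as the response above.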

When we search the index with the match query, we only get a result when the text we type contains the full word “Hong” or “Kong.” This is because Elasticsearch only returns a result when an analyzed query term exactly matches a token in the inverted index.

Request:

curl --request POST \
  --url 'http://localhost:9200/autocomplete-example/_search?pretty=' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match": {
   "simple_autocomplete": "Hong"
  }
 }
}'

Response:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "autocomplete-example",
        "_type": "_doc",
        "_id": "aFAbznQBPNT8JhPaDhND",
        "_score": 0.5753642,
        "_source": {
          "simple_autocomplete": "Hong Kong"
        }
      }
    ]
  }
}
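
The whole-token behavior of the match query can be sketched in a few lines of Python. This is a simplified model, not how Elasticsearch is implemented: analyze and match_query are hypothetical helpers, scoring is ignored, and the analyzer is reduced to lowercasing and splitting on non-word characters.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer: lowercase word tokens.
    return [t.lower() for t in re.findall(r"\w+", text)]

def match_query(indexed_text, query_text):
    """Return True if any analyzed query term equals an indexed token.
    The match query combines terms with OR by default, so one whole-token
    hit is enough."""
    indexed_tokens = set(analyze(indexed_text))
    return any(term in indexed_tokens for term in analyze(query_text))

print(match_query("Hong Kong", "Hong"))     # True
print(match_query("Hong Kong", "Hon"))      # False
print(match_query("Hong Kong", "Hon Kon"))  # False
```

A partial word such as “Hon” never equals the stored token “hong,” so the model returns no match for it.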

If the user types “Ho” or “Kon” or “Hon Kon,” Elasticsearch won’t return any hits.

For an autocomplete, this isn’t very helpful to the user, right? At the least, an autocomplete needs to show something even if we haven’t typed the full words.

Request:

curl --request POST \
  --url 'http://localhost:9200/autocomplete-example/_search?pretty=' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match": {
   "simple_autocomplete": "Hon"
  }
 }
}'

Response:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
    ]
  }
}

To fix it, we can use a match_phrase_prefix query provided by Elasticsearch.

Using match_phrase_prefix Query

The match_phrase_prefix query allows the user to get a result without typing all the words. With the usual match query, we won’t get any result from Elasticsearch if we type “Hon” or “Kon,” but with match_phrase_prefix, we can get a result.

Request:

curl --request POST \
  --url 'http://localhost:9200/autocomplete-example/_search?pretty=' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match_phrase_prefix": {
   "simple_autocomplete": {
    "query": "Hon"
   }
  }
 }
}'

curl --request POST \
  --url 'http://localhost:9200/autocomplete-example/_search?pretty=' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match_phrase_prefix": {
   "simple_autocomplete": {
    "query": "Kon"
   }
  }
 }
}'

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "autocomplete-example",
        "_type": "_doc",
        "_id": "aFAbznQBPNT8JhPaDhND",
        "_score": 0.2876821,
        "_source": {
          "simple_autocomplete": "Hong Kong"
        }
      }
    ]
  }
}
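
In an application, the request body above would usually be built from whatever the user has typed so far. The following Python sketch shows one way to do that. The function name build_autocomplete_query is hypothetical; max_expansions is an optional parameter of match_phrase_prefix that limits how many terms the last word may expand to, and 50 is its documented default.

```python
import json

def build_autocomplete_query(field, user_input, max_expansions=50):
    """Build a match_phrase_prefix request body for the given field.
    Hypothetical helper for illustration; it only creates the JSON body
    and does not send anything to Elasticsearch."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": user_input,
                    "max_expansions": max_expansions,
                }
            }
        }
    }

body = build_autocomplete_query("simple_autocomplete", "Hon")
print(json.dumps(body, indent=1))
```

The printed JSON can be sent as the --data payload of the curl requests shown above.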

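This autocomplete still has a shortcoming: If the user types “Hon Kon,” it won’t return any result. This is because match_phrase_prefix only treats the last term as a prefix. Every earlier term, here “Hon,” must match a whole token, and “hon” is not a token in the inverted index.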

Request:

curl --request POST \
  --url 'http://localhost:9200/autocomplete-example/_search?pretty=' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match_phrase_prefix": {
   "simple_autocomplete": {
    "query": "Hon Kon"
   }
  }
 }
}'

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
    ]
  }
}
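
The matching rule behind these results can be modeled with a short Python sketch. It is a simplification under stated assumptions: analyze and match_phrase_prefix are hypothetical helpers, the analyzer is reduced to lowercasing and splitting on non-word characters, and options such as slop and max_expansions are ignored.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer: lowercase word tokens.
    return [t.lower() for t in re.findall(r"\w+", text)]

def match_phrase_prefix(indexed_text, query_text):
    """Model of match_phrase_prefix: every term except the last must match
    a whole token, in order and adjacent; the last term only needs to be a
    prefix of the token that follows."""
    doc = analyze(indexed_text)
    terms = analyze(query_text)
    if not terms:
        return False
    *full_terms, last = terms
    n = len(terms)
    # Slide a window of n tokens over the indexed tokens.
    for start in range(len(doc) - n + 1):
        window = doc[start:start + n]
        if window[:-1] == full_terms and window[-1].startswith(last):
            return True
    return False

for query in ["Hon", "Kon", "Hong Kon", "Hon Kon"]:
    print(query, match_phrase_prefix("Hong Kong", query))
```

In this model, “Hon,” “Kon,” and “Hong Kon” match the document, while “Hon Kon” does not, because “hon” is not a complete token.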

The Pros and Cons

An autocomplete with a text field data type and the standard analyzer is very simple, but it has pros and cons that you should consider before using this type of autocomplete.

Pros

  • Little to no setup: You don’t even have to define a mapping because, by default, a string value indexed into Elasticsearch is dynamically mapped as a text field with a keyword sub-field.
  • Fast index time: Because this type of autocomplete is using the standard analyzer, it doesn’t process your text much when saving it to the inverted index, which translates to fast index time.
  • Enough most of the time: Most of the time, you don’t need a complex autocomplete. This autocomplete type will be enough.

Cons

  • Can’t handle typos: This type of autocomplete can’t handle typos, so if the user misspells a word, it won’t return any result.
  • The query can’t start from the middle of a word: The text queried with this type of autocomplete can’t start from the middle of a word. In the previous example of “Hong Kong,” if we query with the text “ong kong,” Elasticsearch won’t return anything.
  • Can’t handle a missing space: If we had mistakenly typed “HongKong” in the previous example, Elasticsearch wouldn’t have returned anything with this type of autocomplete.

When to Use

I recommend an autocomplete with only the standard analyzer when you only need a simple autocomplete. You can also use this type of autocomplete if the index you want to build it on is already in production and already contains documents. Since this autocomplete uses the default analyzer and default mapping for text, it will work for most text documents.

Conclusion

An autocomplete with the text field data type and the standard analyzer is the simplest and easiest autocomplete that we can build with Elasticsearch. It requires almost no setup and can usually be added to an existing index.

Even if it’s enough for most use cases, it still has many weaknesses because it can only handle simple queries. To overcome that, we can use a custom-defined analyzer or the Suggesters feature in Elasticsearch, which I plan to write about. Please wait for it!

Lastly, thank you for reading this article to the end. I hope it will help you with your project.

References

https://opster.com/elasticsearch-glossary/elasticsearch-auto-complete-guide/

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase-prefix.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html

Brilian Firdaus

Indonesia
A Software Engineer based in Indonesia.