Elasticsearch: Text vs. Keyword

People who are new to Elasticsearch often confuse the Text and Keyword field data types. The difference between them is simple but crucial.

Brilian Firdaus

In this article, I will talk about the difference between the two, how to use them, how they behave, and which one to choose.

The Differences

The crucial difference between them is that Elasticsearch analyzes a Text field before storing it in the Inverted Index, while it does not analyze a Keyword field. Whether a field is analyzed determines how it behaves when it is queried.

If you’re just starting to learn Elasticsearch and don’t yet know what an Inverted Index and an Analyzer are, I recommend reading a basic guide to Elasticsearch first.

How to Use Them

If you index a document containing a string without first defining a mapping for its fields, Elasticsearch will create a dynamic mapping that uses both data types: a Text field with a Keyword sub-field. Even though dynamic mapping works, I suggest defining the mapping that fits your use case before you index any documents, to save space and increase write speed.
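
For reference, here is a rough sketch of the shape that dynamic mapping gives a new string field, written as a Python dictionary. The field name `title` and the helper `field_paths` are made up for this illustration; they are not part of Elasticsearch or any client library.

```python
# Shape of the mapping that dynamic mapping generates for a new string field.
# "title" is a hypothetical field name used only for this illustration.
dynamic_string_mapping = {
    "title": {
        "type": "text",
        "fields": {
            "keyword": {"type": "keyword", "ignore_above": 256},
        },
    }
}


def field_paths(properties: dict) -> dict:
    """Return every queryable field path and its type, including sub-fields."""
    paths = {}
    for name, spec in properties.items():
        paths[name] = spec["type"]
        for sub_name, sub_spec in spec.get("fields", {}).items():
            paths[f"{name}.{sub_name}"] = sub_spec["type"]
    return paths


print(field_paths(dynamic_string_mapping))
# {'title': 'text', 'title.keyword': 'keyword'}
```

Every string ends up indexed twice under this mapping, once as text and once as keyword, which is where the extra space and write cost come from.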

These are examples of the mapping settings for the Text and Keyword types. Note that the examples use an Index named “text-vs-keyword”, which I created beforehand.

Keyword Mapping

curl --request PUT \
  --url http://localhost:9200/text-vs-keyword/_mapping \
  --header 'content-type: application/json' \
  --data '{
 "properties": {
  "keyword_field": {
   "type": "keyword"
  }
 }
}'

Text Mapping

curl --request PUT \
  --url http://localhost:9200/text-vs-keyword/_mapping \
  --header 'content-type: application/json' \
  --data '{
 "properties": {
  "text_field": {
   "type": "text"
  }
 }
}'

Multi Fields

curl --request PUT \
  --url http://localhost:9200/text-vs-keyword/_mapping \
  --header 'content-type: application/json' \
  --data '{
 "properties": {
  "text_and_keyword_mapping": {
   "type": "text",
   "fields": {
    "keyword_type": {
     "type":"keyword"
    }
   }
  }
 }
}'
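
A sub-field of a multi-field is addressed as `parent.sub_field` in queries. As a minimal sketch, the request bodies for querying the mapping above could be built like this. The helper names `match_body` and `term_body` and the search string "quick fox" are my own for illustration, not part of any Elasticsearch client.

```python
import json


def match_body(field: str, text: str) -> dict:
    """Full-text query: the query string is analyzed before lookup."""
    return {"query": {"match": {field: text}}}


def term_body(field: str, value: str) -> dict:
    """Exact query: the value is looked up as a single, unanalyzed term."""
    return {"query": {"term": {field: value}}}


# The text part is queried through the parent field name ...
full_text = match_body("text_and_keyword_mapping", "quick fox")
# ... and the keyword part through "<parent>.<sub-field>".
exact = term_body(
    "text_and_keyword_mapping.keyword_type",
    "The quick brown fox jumps over the lazy dog",
)

print(json.dumps(full_text))
print(json.dumps(exact))
```

Either printed body can be passed as the --data payload of a search request like the curl commands used in this article.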

How They Work

The two field types are indexed differently in the Inverted Index. That difference in indexing affects the results you get when you query Elasticsearch.

For example, let’s index a document:

curl --request POST \
  --url http://localhost:9200/text-vs-keyword/_doc/example \
  --header 'content-type: application/json' \
  --data '{
 "keyword_field":"The quick brown fox jumps over the lazy dog",
 "text_field":"The quick brown fox jumps over the lazy dog"
}'

After executing the curl command above, if you fetch all of the documents in the index, you should get:

{
    "_index": "text-vs-keyword",
    "_type": "_doc",
    "_id": "example",
    "_score": 1.0,
    "_source": {
        "keyword_field": "The quick brown fox jumps over the lazy dog",
        "text_field": "The quick brown fox jumps over the lazy dog"
    }
}

Keyword

Let’s start with the simpler one, Keyword. Elasticsearch doesn’t analyze Keyword fields, which means the string you index is stored as it is.

So, with the example above, what would the string look like in the Inverted Index?

Yes, you’re right: it’s stored exactly as you wrote it, as the single term “The quick brown fox jumps over the lazy dog”.
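
To picture this, here is a toy model of an inverted index for a keyword field. It is only a sketch of the idea (a dictionary from term to document IDs) and not how Elasticsearch stores data internally; `keyword_index` and `index_keyword` are names I made up.

```python
from collections import defaultdict

# Toy inverted index: term -> set of document IDs. This models the idea only;
# it is not how Elasticsearch stores data internally.
keyword_index = defaultdict(set)


def index_keyword(doc_id: str, value: str) -> None:
    """Keyword fields are not analyzed, so the whole value is one term."""
    keyword_index[value].add(doc_id)


index_keyword("example", "The quick brown fox jumps over the lazy dog")

print(dict(keyword_index))
# {'The quick brown fox jumps over the lazy dog': {'example'}}
```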

Text

Unlike a Keyword field, a string indexed into a Text field goes through the analysis process before it is stored in the Inverted Index. By default, Elasticsearch’s standard analyzer splits the string into individual words and lowercases them. You can learn more about the standard analyzer in Elasticsearch’s documentation.

Elasticsearch has an API to check what the text looks like after analysis. We can try it with:

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_analyze?pretty' \
  --header 'content-type: application/json' \
  --data '{
  "analyzer": "standard",
  "text": "The quick brown fox jumps over the lazy dog"
}'

The response lists the tokens the, quick, brown, fox, jumps, over, the, lazy, and dog, so the Inverted Index for text_field holds the terms the, quick, brown, fox, jumps, over, lazy, and dog.

Only a little different from the keyword one, right? But pay attention to what is stored in the Inverted Index, because it greatly affects the query process.
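
As a rough sketch of what happens to a text field, the toy model below splits and lowercases the sentence and records where each term occurs. The function `toy_standard_analyze` is my own simplified stand-in that only approximates the standard analyzer for plain English text like this example; it is not Elasticsearch’s implementation.

```python
import re
from collections import defaultdict


def toy_standard_analyze(text: str) -> list:
    """Rough approximation of the standard analyzer for plain English text:
    split on non-word characters, then lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Toy inverted index: term -> {doc_id: [positions of the term in the text]}
text_index = defaultdict(dict)


def index_text(doc_id: str, value: str) -> None:
    """Analyze the value and store every resulting term with its position."""
    for position, term in enumerate(toy_standard_analyze(value)):
        text_index[term].setdefault(doc_id, []).append(position)


index_text("example", "The quick brown fox jumps over the lazy dog")

print(sorted(text_index))
# ['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'the']
print(text_index["the"])
# {'example': [0, 6]}
```

Note that “The” and “the” collapse into the same lowercase term, so the sentence’s nine words produce eight distinct terms.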

Querying Text and Keyword

Now that we understand how Text and Keyword behave when indexed, let’s look at how they behave when they’re queried.

First, we need to know that there are two types of query for strings:

  • Match Query
  • Term Query

Just as with Text and Keyword, the difference between Match Query and Term Query is analysis: the query string in a Match Query is analyzed into terms first, while the query string in a Term Query is not.

Querying Elasticsearch works by matching the queried terms against the terms in the Inverted Index. A queried term must be exactly the same as a term in the Inverted Index, or it won’t match. This means that whether a string is analyzed at index time and at query time can produce very different results.
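
To make the rule concrete, here is a minimal sketch of which terms each query type ends up looking for. The function names are hypothetical, and `toy_standard_analyze` is only a simplified approximation of the standard analyzer.

```python
import re


def toy_standard_analyze(text: str) -> list:
    """Simplified stand-in for the standard analyzer: split and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def terms_for_term_query(query: str) -> list:
    """A term query looks up the input as-is, as one single term."""
    return [query]


def terms_for_match_query(query: str, analyzer=toy_standard_analyze) -> list:
    """A match query analyzes the input first, then looks up each term."""
    return analyzer(query)


print(terms_for_term_query("The Quick"))   # ['The Quick']
print(terms_for_match_query("The Quick"))  # ['the', 'quick']
```

A query only hits when at least one of these looked-up terms is character-for-character identical to a term stored in the Inverted Index.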

Querying keyword field with Term Query

Because neither the field nor the query is analyzed, the query must be exactly the same as the indexed value to produce a result.

If we query with exactly the same string:

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_search' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "term": {
   "keyword_field": "The quick brown fox jumps over the lazy dog"
  }
 }
}'

Elasticsearch will return a result:

{
    "_index": "text-vs-keyword",
    "_type": "_doc",
    "_id": "example",
    "_score": 0.2876821,
    "_source": {
        "keyword_field": "The quick brown fox jumps over the lazy dog",
        "text_field": "The quick brown fox jumps over the lazy dog"
    }
}

If we query with something that is not an exact match, even a word that appears in the indexed sentence:

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_search' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "term": {
   "keyword_field": "The"
  }
 }
}'

It returns no result because the term in the query doesn’t match any of the terms in the Inverted Index.

Querying keyword field with Match Query

Let’s first try querying keyword_field with the same string, “The quick brown fox jumps over the lazy dog”, using a Match Query and see what happens:

curl --request POST \
  --url http://localhost:9200/text-vs-keyword/_search \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match": {
   "keyword_field": "The quick brown fox jumps over the lazy dog"
  }
 }
}'

The result should be:

{
 "_index": "text-vs-keyword",
 "_type": "_doc",
 "_id": "example",
 "_score": 0.2876821,
 "_source": {
  "keyword_field": "The quick brown fox jumps over the lazy dog",
  "text_field": "The quick brown fox jumps over the lazy dog"
 }
}

Wait, shouldn’t this produce no result? If the query were analyzed with the standard analyzer, the resulting terms would not be an exact match for “The quick brown fox jumps over the lazy dog” in the Inverted Index. So why is it producing a result?

The query was analyzed, because we’re using a Match Query, but instead of the standard analyzer, Elasticsearch used the analysis associated with the field being queried. A Keyword field is not analyzed, so its analysis is effectively a no-op that keeps the whole input as a single term, and Elasticsearch changed nothing in the query.

Now, let’s try it with the standard analyzer:

curl --request POST \
  --url http://localhost:9200/text-vs-keyword/_search \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match": {
   "keyword_field": {
    "query": "The quick brown fox jumps over the lazy dog",
    "analyzer":"standard"
   }
  }
 }
}'

No result is produced because the query is analyzed into separate terms, and none of them is an exact match for the term in the Inverted Index.
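
The two outcomes can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: `keyword_analyze` stands in for the no-op analysis of a keyword field, `toy_standard_analyze` approximates the standard analyzer, and `match_query` is a hypothetical helper, not an Elasticsearch API.

```python
import re

# The single term stored for keyword_field in this toy model.
keyword_terms = {"The quick brown fox jumps over the lazy dog"}


def keyword_analyze(text: str) -> list:
    """No-op analysis: the whole input stays one term."""
    return [text]


def toy_standard_analyze(text: str) -> list:
    """Simplified stand-in for the standard analyzer: split and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def match_query(indexed_terms: set, query: str, analyzer) -> bool:
    """Analyze the query, then hit if any resulting term is in the index."""
    return any(term in indexed_terms for term in analyzer(query))


query = "The quick brown fox jumps over the lazy dog"
print(match_query(keyword_terms, query, keyword_analyze))       # True
print(match_query(keyword_terms, query, toy_standard_analyze))  # False
```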

Querying text field with Term Query

A document indexed into a text field has many terms, as we saw in the previous section. To show how the query gets matched with the terms in the Inverted Index, let’s try two queries. The first query sends the entire sentence to Elasticsearch:

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_search?pretty' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "term": {
   "text_field": "The quick brown fox jumps over the lazy dog"
  }
 }
}'

The second one sends only “The”:

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_search?pretty' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "term": {
   "text_field": "The"
  }
 }
}'

Neither query produces a result.

The first query produced no result because the entire sentence was never stored in the Inverted Index. The indexing process only stores the terms that the analyzer has split out of the text.

The second query also produced no result. There is a “The” in the indexed document, but remember that the analyzer lowercased the word, so in the Inverted Index it is stored as “the”.

Let’s try the Term Query again with “the”:

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_search?pretty' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "term": {
   "text_field": "the"
  }
 }
}'

Yep! It produced a result because the queried “the” is an exact match for the “the” in the Inverted Index.
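
The three Term Queries in this section can be replayed against a toy set of the terms stored for text_field. This is a sketch of the matching rule only; `term_query` is a hypothetical helper and the set is a stand-in for the real Inverted Index.

```python
# Terms stored for text_field after analysis in this toy model.
text_terms = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def term_query(indexed_terms: set, value: str) -> bool:
    """A term query does no analysis: the value must equal a stored term."""
    return value in indexed_terms


for value in ["The quick brown fox jumps over the lazy dog", "The", "the"]:
    print(repr(value), "->", term_query(text_terms, value))
# 'The quick brown fox jumps over the lazy dog' -> False
# 'The' -> False
# 'the' -> True
```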

Querying text field with Match Query

Now it’s time for a text field with a Match Query. Since both the indexed text and the query are analyzed, it is easy to get results. Let’s try two queries.

The first query sends “The” to Elasticsearch. We know that with a Term Query it produces no result, but what about a Match Query?

The second query sends “the LAZ dog tripped over th QUICK brown dog”. Some of its words are in the Inverted Index and some are not. Will Elasticsearch produce any result from it?

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_search?pretty' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match": {
   "text_field": "The"
  }
 }
}'

curl --request POST \
  --url 'http://localhost:9200/text-vs-keyword/_search?pretty' \
  --header 'content-type: application/json' \
  --data '{
 "query": {
  "match": {
   "text_field": "the LAZ dog tripped over th QUICK brown dog"
  }
 }
}'

Yep! Both of them produced a result:

{
    "_index": "text-vs-keyword",
    "_type": "_doc",
    "_id": "example",
    "_score": 0.39556286,
    "_source": {
        "keyword_field": "The quick brown fox jumps over the lazy dog",
        "text_field": "The quick brown fox jumps over the lazy dog"
    }
}

The first query produced a result because “The” in the query was analyzed and became “the”, which is an exact match for the term in the Inverted Index.

The second query still produces a result even though not all of its terms are in the Inverted Index. By default, a Match Query returns a document if at least one of the queried terms exactly matches a term in the Inverted Index.
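
Here is a small sketch of that “at least one term” behaviour, next to a stricter variant that requires every term to match. The helper `match_query` and the analyzer function are my own simplified stand-ins, not Elasticsearch code.

```python
import re

# Terms stored for text_field after analysis in this toy model.
text_terms = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def toy_standard_analyze(text: str) -> list:
    """Simplified stand-in for the standard analyzer: split and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def match_query(indexed_terms: set, query: str, operator: str = "or"):
    """Return (is_hit, matched_terms) for a toy match query."""
    query_terms = toy_standard_analyze(query)
    matched = [term for term in query_terms if term in indexed_terms]
    if operator == "and":
        # Every query term must be present in the index.
        return len(matched) == len(query_terms), matched
    # Default: one matching term is enough.
    return bool(matched), matched


query = "the LAZ dog tripped over th QUICK brown dog"
print(match_query(text_terms, query))
# (True, ['the', 'dog', 'over', 'quick', 'brown', 'dog'])
print(match_query(text_terms, query, operator="and")[0])
# False
```

In Elasticsearch itself, the stricter behaviour corresponds to setting "operator": "and" in the body of the Match Query.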

If you look at the result, there is a _score field. How many of the query’s terms exactly match terms in the Inverted Index is one of the things that affects the score, but let’s save score calculation for another day.

When to Use One or the Other

Use keyword field data type if:

  • You want exact-match queries
  • You want Elasticsearch to behave like a conventional database, matching values exactly as stored
  • You want to run wildcard queries on the field

Use text field data type if:

  • You want to build an autocomplete feature
  • You want to build a full-text search system

Conclusion

Understanding how the text and keyword field data types work is one of the first things you will want to learn in Elasticsearch. The difference seems simple, but it matters a lot.

You will want to choose the field data type that suits your use case. If you need both, you can use the Multi Fields feature when creating the mapping.

Lastly, I hope this article helps you learn Elasticsearch and understand the differences between the text and keyword field data types. Thanks for reading!