ElasticSearch for dummies, or why you can’t find what you just indexed

So, let’s say you are starting to use ElasticSearch. You have created a new index and put some data there. Then you go and try to search by one of the fields… and can’t find a thing.

How can this happen?

Well, one likely reason is that you haven’t defined any mappings.

If you just created an index with no specific settings, then the mappings are inferred from the data, and so are the analyzers. For a string field, the index option of the mapping can take one of three values:

  • analyzed. This is the default. It means that the field content is run through an analyzer and the resulting tokens are available for full-text search.
  • not_analyzed. This means that the field content won’t be analyzed, but it is still indexed. It is kept “as is”, as a single term, and you can query it by its exact value.
  • no. It means the field isn’t indexed at all and you can’t use it for search.

So, the default value is analyzed, which means that all the fields are searchable by default. Well then, why can’t we find anything?

To answer the question, one should understand how analyzers work, which isn’t always obvious. You can use the _analyze endpoint to see what the standard analyzer does with a piece of text:

POST /_analyze
{
    "analyzer": "standard",
    "text": "Text to analyze"
}

For example, suppose you have a field whose value is Base64-encoded. You search using this value:

POST /mytestindex/_search
{
    "query": {
        "term": {
            "base64Id": "LTIxMDk4NzY3NTQ="
        }
    }
}

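Nothing is found! Well, fine. A term query does not analyze the value you pass in: it looks for exactly that term in the index. So let’s see how the field was analyzed when it was indexed. This is how we check what the default analyzer actually does with our value: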

POST /_analyze
{
    "analyzer": "standard",
    "text": "LTIxMDk4NzY3NTQ="
}


// output
{
    "tokens": [
        {
            "token": "ltixmdk4nzy3ntq",
            "start_offset": 0,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

Guess what, the standard analyzer tokenized it! You can see that the token has lost its trailing = sign and is lowercased, so it no longer matches the original value. Well, what if we try to search by this token?
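
The request is not shown in the original post. A minimal sketch, assuming the same index and field as in the earlier search and simply swapping in the token returned by the analyzer, would be:

```json
POST /mytestindex/_search
{
    "query": {
        "term": {
            "base64Id": "ltixmdk4nzy3ntq"
        }
    }
}
```

This time the term exists in the index, and the response looks like this: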

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 1.4054651,
        "hits": [
            {
                "_index": "mytestindex",
                "_type": "mytestfeed",
                "_id": "2109876754",
                "_score": 1.4054651,
                "_source": {
                    "id": 2109876754,
                    "base64Id": "LTIxMDk4NzY3NTQ=",
                    "name": "Entity #2109876754"
                }
            }
        ]
    }
}

Great. Now we found it. But that’s not what we wanted, right? It’s an ID field, so it should be queried as is.

Well then – not_analyzed is your friend. You can define the mapping for a field either when creating the index or at some later time.
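
As a sketch of what this could look like at index creation time, here is a mapping that marks base64Id as not_analyzed. It assumes the string field syntax of Elasticsearch 1.x/2.x and reuses the index, type and field names from the examples above; adjust it to your version and data.

```json
PUT /mytestindex
{
    "mappings": {
        "mytestfeed": {
            "properties": {
                "base64Id": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
```

With a mapping like this, the original term query for "LTIxMDk4NzY3NTQ=" should match, because the value is indexed as a single unmodified term. From Elasticsearch 5.0 on, the string type is split into text and keyword, so the equivalent there would be "type": "keyword".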

Unfortunately, if you do it when the index already has data, you will most certainly have to reindex it all, because documents that are already indexed keep the tokens they were indexed with. So, it is better to think of the mappings you want in advance!
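
If you do end up in that situation, one option is the _reindex API, which is available from Elasticsearch 2.3. A minimal sketch, assuming you have already created a second index with the mapping you want (the name mytestindex_v2 is hypothetical):

```json
POST /_reindex
{
    "source": {
        "index": "mytestindex"
    },
    "dest": {
        "index": "mytestindex_v2"
    }
}
```

This copies the documents from the old index into the new one, where they are indexed according to the new mapping. Afterwards you would point your application at the new index.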

About Maryna Cherniavska

I have productively spent 10+ years in IT industry, designing, developing, building and deploying desktop and web applications, designing database structures and otherwise proving that females have a place among software developers. And this is a good place.