Learn Advanced Full-Text Searches With MongoDB Atlas Search | by Lynn Kwong | Apr, 2022

Photo by Daniel Lerman on Unsplash.

As we learned in the previous post, we can now run full-text search queries with MongoDB Atlas Search. If you are implementing a search engine and haven’t decided which tool to use, MongoDB Atlas Search can be a good alternative to traditional tools like Elasticsearch. In this post, we will introduce some advanced settings and search queries for MongoDB Atlas Search. You will find that they are pretty similar to their counterparts in Elasticsearch. This is because, under the hood, both Elasticsearch and MongoDB Atlas Search are built on Apache Lucene.

It should be noted that, at the time of writing, MongoDB Atlas Search is only available if your MongoDB databases are hosted on MongoDB Atlas. It’s not available if you manage your own MongoDB servers locally.

We will continue to use the laptops collection in the products database we’ve been working on because the product names and attributes are good examples for demonstrating full-text searches.

If you haven’t used MongoDB Atlas before, it’s recommended to start with this quick-start article. If you want to follow along, download this JSON file and use the following command to import the data into MongoDB Atlas. If you don’t have mongoimport installed yet, follow this link to download and install the MongoDB Database Tools.

Note that you need to replace the username, password, and cluster name with your own. If you run into network connection issues, check that your IP address has been added to the IP Access List in the Atlas UI.

When the above command is run, you will have the products database and laptops collection created automatically in MongoDB Atlas, which contains 200 laptop documents as follows:

Let’s now create an Atlas Search Index that will be used for full-text searches. You can create an Atlas Search Index with the Atlas UI, Atlas Search API, or MongoDB CLI. The simplest way is to use the Atlas UI because you don’t need to specify the cluster metadata such as public/private keys, cluster name, group id, etc.

In the Atlas UI, find your organization, project, and cluster. If you have only one cluster, then it’s already shown there when you open the Atlas UI. Click the cluster name to open the control panels where you can find the “Search” tab. Click it to open the page for creating Search Indexes:

Now click the “Create Search Index” button to create one Search Index for our laptops collection in the products database. A page like this will be opened:

It’s recommended to choose the “JSON Editor” because you can have more advanced configurations there. Besides, almost all the documentation of MongoDB Atlas Search uses JSON configurations. Therefore, it’s better to get familiar with the JSON settings from the very beginning.

We will use static mapping in this advanced tutorial. In practice, static mapping is also recommended because it lets you choose which fields are indexed and how they are indexed. With static mapping, you can also configure advanced settings like autocomplete and synonyms. The mapping document for the laptops collection is:

Some important notes regarding the index definitions:

  • The analyzer is used when documents are indexed, and the searchAnalyzer is used to analyze search queries. They are normally the same. The default and most commonly used analyzer is lucene.standard, which lowercases text and splits it into tokens at word boundaries such as spaces and punctuation. Unlike language-specific analyzers such as lucene.english, it does not remove common stop words such as the, for, and that by default.
  • We set dynamic to false, so we need to explicitly specify the mapping for each field we want to index.
  • The name field is indexed with two types: the regular string type and the autocomplete type, which supports search-as-you-type queries. The autocomplete type has its own settings, but the defaults should be enough in most cases.
  • The attributes field is an array of documents. For arrays, Atlas Search only needs the data type of the array elements (here, document), not a separate array type.
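As a rough sketch, a static mapping that follows the notes above could look like the following. It is written as a JavaScript object so it can be printed as JSON and pasted into the Atlas UI JSON Editor. The brand and quantity fields, the attribute_name/attribute_value sub-fields, and the synonym mapping name laptop_synonyms are assumptions for illustration and may differ from the real dataset and index.

```javascript
// Hypothetical static-mapping index definition for products.laptops.
// Field names under "attributes" and the synonym mapping name are assumptions.
const laptopsIndexDefinition = {
  analyzer: "lucene.standard",       // used at index time
  searchAnalyzer: "lucene.standard", // used for query strings
  mappings: {
    dynamic: false, // only the fields listed below are indexed
    fields: {
      // Indexed twice: as a regular string and for search-as-you-type.
      name: [{ type: "string" }, { type: "autocomplete" }],
      brand: { type: "string" },
      quantity: { type: "number" },
      // Array of documents: only the element type ("document") is declared.
      attributes: {
        type: "document",
        fields: {
          attribute_name: { type: "string" },  // hypothetical sub-field
          attribute_value: { type: "string" }, // hypothetical sub-field
        },
      },
    },
  },
  synonyms: [
    {
      name: "laptop_synonyms",            // referenced by queries via `synonyms`
      analyzer: "lucene.standard",
      source: { collection: "synonyms" }, // must exist in the same database
    },
  ],
};

// Print as JSON for the Atlas UI "JSON Editor".
console.log(JSON.stringify(laptopsIndexDefinition, null, 2));
```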

Importantly, please note that we need to create a separate collection in the same database for the synonym definitions. You can run the following commands with mongosh to create the synonyms collection:

Note that the synonyms collection must be created in the same database as the collection to be searched, and its name must match the source collection specified in the synonyms mapping of the index definition.
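A minimal sketch of the synonym source documents, assuming the MacBook and Lenovo/ThinkPad examples used later in this post. Each document uses the mappingType, synonyms and (for explicit mappings) input fields that Atlas Search reads from a synonym source collection. The exact entries are illustrative assumptions, not the author’s original commands.

```javascript
// Synonym source documents for a collection named "synonyms" in the
// products database. The entries are illustrative assumptions.
const synonymDocs = [
  {
    // Any of these terms matches documents containing any of the others.
    mappingType: "equivalent",
    synonyms: ["MacBook", "Macintosh", "Mac"],
  },
  {
    // One direction only: a query for "Lenovo" also matches "ThinkPad".
    // "Lenovo" is repeated in synonyms so it still matches itself.
    mappingType: "explicit",
    input: ["Lenovo"],
    synonyms: ["Lenovo", "ThinkPad"],
  },
];

// In mongosh, connected to your Atlas cluster:
//   use products
//   db.synonyms.insertMany(synonymDocs)
```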

There are two types of synonym mappings, namely equivalent and explicit. With the equivalent type, all the synonyms are equivalent to each other and interchangeable. With the explicit type, the mapping works in one direction only: the words in the input field are expanded to those in the synonyms field, but not the other way around. Therefore, if you search for “Lenovo”, laptops containing only “ThinkPad” but not “Lenovo” will be returned. However, if you search for “ThinkPad”, those containing only “Lenovo” but not “ThinkPad” will not be returned. We will see this with an example later.
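To make the direction of each mapping type concrete, here is a toy, in-memory model of how a query term gets expanded. It is only an illustration of the semantics described above, not how Atlas Search (or Lucene) actually implements synonyms; lowercasing stands in for the analyzer.

```javascript
// Toy model of synonym expansion. NOT the Atlas Search implementation.
const mappings = [
  { mappingType: "equivalent", synonyms: ["MacBook", "Macintosh", "Mac"] },
  { mappingType: "explicit", input: ["Lenovo"], synonyms: ["Lenovo", "ThinkPad"] },
];

function expandTerm(term, synonymMappings) {
  const t = term.toLowerCase(); // mimic a lowercasing analyzer
  const expanded = new Set([t]);
  for (const m of synonymMappings) {
    const syns = m.synonyms.map((s) => s.toLowerCase());
    if (m.mappingType === "equivalent" && syns.includes(t)) {
      syns.forEach((s) => expanded.add(s)); // any member expands to all members
    } else if (m.mappingType === "explicit") {
      const inputs = m.input.map((s) => s.toLowerCase());
      if (inputs.includes(t)) syns.forEach((s) => expanded.add(s)); // input -> synonyms only
    }
  }
  return [...expanded].sort();
}

console.log(expandTerm("Lenovo", mappings));   // [ 'lenovo', 'thinkpad' ]
console.log(expandTerm("ThinkPad", mappings)); // [ 'thinkpad' ]
console.log(expandTerm("Mac", mappings));      // [ 'mac', 'macbook', 'macintosh' ]
```

The asymmetry between the first two calls is exactly the explicit-mapping behavior: “Lenovo” reaches “ThinkPad”, but not the reverse.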

After the synonyms collection is created, you can continue to create the Search Index as shown above.

Now that the Search Index has been created, we can write and run full-text search queries. With static mappings that use advanced settings like autocomplete and synonyms, we cannot use the “Search Tester” in the Atlas UI to run complex queries. Instead, we need to run aggregation pipelines with a $search stage, using mongosh or a driver. We will use mongosh in this post because it is not tied to a specific programming language and is thus more generic.

Search 1: Use the autocomplete feature to search-as-you-type:

[
{ _id: 2, name: 'Lenovo IdeaPad Y700-15', score: 1 },
{ _id: 8, name: 'Lenovo ThinkPad T470s', score: 1 }
]

We use the autocomplete operator to perform search-as-you-type queries. The path field specifies the field to search against, which should have the autocomplete type defined as shown above. If you want to learn a bit more about the basic syntax for the MongoDB Atlas Search aggregation queries, please check this post.
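A minimal sketch of such an autocomplete pipeline, written as a function that returns the pipeline array so it can be passed to db.laptops.aggregate() in mongosh. The index name "default", the $limit value, and the sample input "lenov" are assumptions; the $project stage returns the relevance score via $meta: "searchScore".

```javascript
// Builds an aggregation pipeline for a search-as-you-type query on "name".
// Assumptions: the Search Index is named "default"; limit is arbitrary.
function autocompletePipeline(partialInput, { indexName = "default", limit = 5 } = {}) {
  return [
    {
      $search: {
        index: indexName,
        autocomplete: {
          query: partialInput, // what the user has typed so far
          path: "name",        // must be indexed with type "autocomplete"
        },
      },
    },
    { $limit: limit },
    { $project: { _id: 1, name: 1, score: { $meta: "searchScore" } } },
  ];
}

// In mongosh (use products):
//   db.laptops.aggregate(autocompletePipeline("lenov"))
```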

Search 2: Use the autocomplete plus fuzzy search feature.

[
{ _id: 2, name: 'Lenovo IdeaPad Y700-15', score: 1 },
{ _id: 8, name: 'Lenovo ThinkPad T470s', score: 1 }
]

This query has the same result as the one above, which means that the fuzzy search and autocomplete features are working properly. Note that we specify the maxEdits and prefixLength options for fuzzy so we won’t get too many irrelevant results with this query.
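The sketch below adds the fuzzy option to the autocomplete pipeline. The maxEdits and prefixLength values are illustrative assumptions, not necessarily the ones used above. To show what the two options mean, it also includes a toy check on whole terms: prefixLength characters at the start must match exactly, and the rest may differ by at most maxEdits single-character edits. The toy uses plain Levenshtein distance as an approximation (Lucene’s fuzzy matching also counts a transposition of two adjacent characters as one edit), and the real autocomplete operator compares against indexed prefixes rather than whole terms.

```javascript
// Autocomplete with fuzzy matching. Option values are illustrative assumptions.
function fuzzyAutocompletePipeline(partialInput, { maxEdits = 1, prefixLength = 1 } = {}) {
  return [
    {
      $search: {
        index: "default", // assumed index name
        autocomplete: {
          query: partialInput,
          path: "name",
          fuzzy: { maxEdits, prefixLength },
        },
      },
    },
    { $project: { name: 1, score: { $meta: "searchScore" } } },
  ];
}

// Toy illustration of maxEdits/prefixLength on whole terms (not Lucene's code).
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return dp[a.length][b.length];
}

function fuzzyMatches(term, candidate, maxEdits, prefixLength) {
  // The first prefixLength characters must match exactly.
  if (term.slice(0, prefixLength) !== candidate.slice(0, prefixLength)) return false;
  return levenshtein(term, candidate) <= maxEdits;
}

// In mongosh: db.laptops.aggregate(fuzzyAutocompletePipeline("lenvo"))
```

A larger prefixLength rejects typos near the start of the word, which is one way to keep irrelevant matches down.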

Search 3: Search with synonyms.

Let’s first try to search with equivalent synonyms.

[
{ _id: 134, name: 'Apple MacBook Pro', score: 1.738951921463012 },
{ _id: 184, name: 'Apple MacBook Pro', score: 1.738951921463012 }
]

You can try to search with “MacBook”, “Macintosh” or “Mac” and will always get the same results because they are equivalent synonyms and are interchangeable.
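A minimal sketch of a text query that applies the synonym mapping, assuming the index is named "default" and the synonym mapping is named laptop_synonyms (both hypothetical; use the names from your own index definition). The text operator’s synonyms option takes the name of the mapping, not the name of the collection.

```javascript
// Text query on "name" that applies a named synonym mapping.
// "default" and "laptop_synonyms" are hypothetical names.
function synonymTextPipeline(query, synonymMappingName = "laptop_synonyms") {
  return [
    {
      $search: {
        index: "default",
        text: {
          query,
          path: "name",
          synonyms: synonymMappingName, // mapping name from the index definition
        },
      },
    },
    { $project: { name: 1, score: { $meta: "searchScore" } } },
  ];
}

// In mongosh (use products):
//   db.laptops.aggregate(synonymTextPipeline("Macintosh"))
```

The same builder can be reused with “Lenovo” and “ThinkPad” to compare the behavior of the explicit mapping.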

Now let’s try to search with explicit synonyms:

[
{ _id: 97, name: 'ThinkPad T480' },
{ _id: 40, name: 'Lenovo T480' },
{ _id: 117, name: 'Lenovo ThinkPad T480' }
]
[
{ _id: 97, name: 'ThinkPad T480' },
{ _id: 117, name: 'Lenovo ThinkPad T480' }
]

These two examples demonstrate that “ThinkPad” can be searched by “Lenovo”, but not vice versa.

Search 4: Search with a phrase.

Sometimes we want to search for an ordered sequence of terms that must appear exactly as specified in the input query. This can be achieved with the phrase operator. Let’s search for “Lenovo T480” with the text and phrase operators separately, and you will see the difference immediately:

[
{ _id: 40, name: 'Lenovo T480' },
{ _id: 117, name: 'Lenovo ThinkPad T480' },
{ _id: 97, name: 'ThinkPad T480' },
...
]
[ { _id: 40, name: 'Lenovo T480' } ]

Only one result is returned with the phrase operator, as it requires the tokens in the search string to appear in the same order, and next to each other (the default slop is 0), in the matching documents.
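The two queries differ only in the operator name, so a sketch can build both from one helper (the index name "default" is assumed). The second half is a toy, in-memory approximation of why the results differ. It uses a crude stand-in for lucene.standard (lowercase, split on non-alphanumeric characters), not the real analyzer: text matches a document that contains any query term, while phrase needs all terms in order and adjacent.

```javascript
// Builds the text and phrase queries compared above ("default" index assumed).
function termPipeline(operator, query) {
  return [
    { $search: { index: "default", [operator]: { query, path: "name" } } },
    { $project: { name: 1 } },
  ];
}
const textPipeline = termPipeline("text", "Lenovo T480");
const phrasePipeline = termPipeline("phrase", "Lenovo T480");

// Toy approximation (crude tokenizer, not lucene.standard).
const tokenize = (s) => s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

function textMatches(doc, query) {
  const docTokens = new Set(tokenize(doc));
  return tokenize(query).some((t) => docTokens.has(t)); // any term is enough
}

function phraseMatches(doc, query) {
  const d = tokenize(doc);
  const q = tokenize(query);
  // All query tokens must appear consecutively and in order (slop 0).
  for (let i = 0; i + q.length <= d.length; i++) {
    if (q.every((t, k) => d[i + k] === t)) return true;
  }
  return false;
}

const names = ["Lenovo T480", "Lenovo ThinkPad T480", "ThinkPad T480"];
console.log(names.filter((n) => textMatches(n, "Lenovo T480")));   // all three names
console.log(names.filter((n) => phraseMatches(n, "Lenovo T480"))); // [ 'Lenovo T480' ]
```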

Search 5: Combine multiple operators together.

Finally, let’s learn to use the compound operator to combine multiple operators. If you have some background in Elasticsearch, you will find the syntax pretty similar: MongoDB Atlas Search also uses must, mustNot, should, and filter clauses.

Let’s find all the HP laptops that are still in stock:

[
{ _id: 200, name: 'HP ZBook 14u G6', quantity: 8 },
{ _id: 196, name: 'HP ZBook 14u G6', quantity: 4 }
]

Oddly, at the time of writing, the equals operator only works with boolean and ObjectId values. Therefore, we need to use the range operator to check whether the quantity is 0.
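One way to express “HP laptops that are still in stock” is sketched below. It assumes the brand is indexed as a string field named brand and the stock level as a number field named quantity (both hypothetical names), and it tests the stock with range gt: 0 inside filter, so the stock check does not affect the score.

```javascript
// "HP laptops that are still in stock" as a compound query.
// Assumptions: "brand" is an indexed string field, "quantity" a number field.
function hpInStockPipeline() {
  return [
    {
      $search: {
        index: "default", // assumed index name
        compound: {
          // must: has to match and contributes to the score.
          must: [{ text: { query: "HP", path: "brand" } }],
          // filter: has to match but does not affect the score.
          filter: [{ range: { path: "quantity", gt: 0 } }],
        },
      },
    },
    { $project: { name: 1, quantity: 1 } },
  ];
}

// In mongosh (use products):
//   db.laptops.aggregate(hpInStockPipeline())
```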

Let’s write a more complex query to find laptops meeting the following conditions:

  • brand is not Apple.
  • still in stock.
  • memory is 32GB or storage capacity is 1TB

Whoa! The query gets really complex for such a simple search. Similar to Elasticsearch, it’s easy to run into this kind of “bool/compound hell” when a field contains an array of nested documents. However, it’s actually not that complex once you know the pattern; it’s just a bit cumbersome to write.

Key points for this compound search query:

  • The filter clause has the same effect as must. However, it’s not used to calculate the final search score. If you want to boost the score for some field, you would need to use the must clause and not filter. Also, it’s worth pointing out that mustNot does not contribute to the search score, either. It works like the negation of a filter clause.
  • The should clause, as the name indicates, specifies conditions that should be met but are optional. We can use the minimumShouldMatch option to specify how many of these optional conditions must be met for a document to be returned.
  • We can use the compound operator inside a compound operator. This is where things start to get complex. However, we should just keep in mind that the nested compound operator works exactly the same as the top-level one. It’s normally used for the fields that contain an array of nested documents, like the attributes field in this example.
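The pattern from these key points can be sketched as follows. Each attribute condition is a nested compound, and the two alternatives sit in should with minimumShouldMatch: 1. The field names brand, quantity, attributes.attribute_name and attributes.attribute_value are hypothetical, as is the helper attributeClause.

```javascript
// Hypothetical helper: one attribute condition as a nested compound operator.
function attributeClause(name, value) {
  return {
    compound: {
      must: [
        { text: { query: name, path: "attributes.attribute_name" } },   // hypothetical path
        { text: { query: value, path: "attributes.attribute_value" } }, // hypothetical path
      ],
    },
  };
}

// Not Apple, still in stock, and (32GB memory OR 1TB storage).
function nonAppleInStockPipeline() {
  return [
    {
      $search: {
        index: "default", // assumed index name
        compound: {
          mustNot: [{ text: { query: "Apple", path: "brand" } }], // excluded, not scored
          filter: [{ range: { path: "quantity", gt: 0 } }],       // required, not scored
          should: [attributeClause("memory", "32GB"), attributeClause("storage", "1TB")],
          minimumShouldMatch: 1, // at least one attribute condition must match
        },
      },
    },
    { $project: { name: 1, brand: 1, quantity: 1, attributes: 1 } },
  ];
}

// In mongosh (use products):
//   db.laptops.aggregate(nonAppleInStockPipeline())
```

Note that with the document type, the two conditions inside one attributeClause are not guaranteed to match the same array element, because the array is flattened at index time. Newer Atlas Search versions offer an embeddedDocuments index type for that case.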

In this article, we have demonstrated how to create a MongoDB Atlas Search Index with static mapping. Some special settings are introduced such as autocomplete for search-as-you-type search and search with synonyms. We have also introduced some common search queries, from basic to advanced, which can be adapted and used directly in your practical work.

When searching nested fields, namely those whose value is an array of documents, don’t be intimidated by the seemingly complex queries. As long as you know how the compound operator works, you can build powerful queries yourself, using the must, mustNot, should, and filter clauses as building blocks.
