How Bitset Enables Versatility of Vector Search

Various new essential features of a vector database are provided together with the release of Milvus 2.0. Among the new features, Time Travel, attribute filtering, and delete operations are correlated as these three features are achieved by one common mechanism: bitset.

Therefore, this article aims to clarify the concept of bitset in Milvus and explain how it works to support delete operations, Time Travel, and attribute filtering with three examples.

What Is Bitset?

A bitset is an array of bit numbers (“0” and “1”) that can be used to represent certain data information. With bitsets, you can store certain types of data compactly and efficiently as opposed to storing them in Ints, floats, or chars. Bitsets work on boolean logic, according to which the value of output is either valid or invalid, usually denoted by “1” and “0” respectively. “1” stands for valid, and “0” for invalid. Since bitsets are highly efficient and can save storage, they can also be used to achieve many features such as attribute filtering, delete operations, Time Travel, and more.

Starting from version 0.7.0, the concept of bitset has been introduced in Milvus to enable the delete function. More specifically, the bitset is used to mark if each row in the segment is deleted. Deleted entities are marked with “1” in the corresponding bitset, and as a result, the deleted entities will not be computed during a search or query.

In the Milvus 2.0 version, the application of bitset is extended to enable more features, like attribute filtering and Time Travel. The general principle in a bitset remains the same. That is, if an entity is marked with “1” in the corresponding bitset, the entity will be ignored during a search or query. Bitsets are used to enable 3 features in Milvus:

  • Attribute filtering
  • Data deletion
  • Query with Time Travel

How Does Bitset Work in Milvus?

The examples below are used to illustrate how bitset works in Milvus.

Prerequisites

Suppose there is a segment with eight entities and a series of data manipulation language (DML) events that happen in the order shown in the figure below.

  • Four of the entities, whose primary_keys are [1, 2, 3, 4] respectively, are inserted when the timestamp ts equals 100.
  • The remaining four entities, whose primary_keys are [5, 6, 7, 8]are inserted when the timestamp ts equals 200.
  • Entities whose primary_keys are [7, 8] are deleted when the timestamp ts equals 300.
  • Only entities, whose primary_keys are [1, 3, 5, 7]satisfy the conditions of attribute filtering.

Order of DML events

Case One

Suppose the value a user sets for time_travel is 150. In other words, the user conducts a query on the data stored in Milvus when ts = 150. The bitset generation process is illustrated in Figure 1 below.

During the initial filtering stage, the result of the filter_bitset should be [1, 0, 1, 0, 1, 0, 1, 0] as entities [1, 3, 5, 7] are valid filtering results and marked as “1” in the bitset. However, entities [4, 5, 6, 7] were not even inserted into the vector database when ts equals 150. Therefore, these four entities should be marked as “0” regardless of the filtering condition. Now the bitset result should be [1, 0, 1, 0, 0, 0, 0, 0]. Since in Milvus, the general principle of bitset computing is that entities marked with “1” in the bitset are ignored during a search or query, the bitset result after Time Travel and attribute filtering needs to be flipped in order to be combined with the deletion bitmap. The flipped result of filter_bitset should be [0, 1, 0, 1, 1, 1, 1, 1].

As for the deletion bitset del_bitsetthe initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 are not deleted until ts is 300. Therefore, when ts is 150, entities 7 and 8 are still valid. As a result, the del_bitset value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].

Now we have two bitsets after Time Travel and attribute filtering: filter_bitset [0, 1, 0, 1, 1, 1, 1, 1] and del_bitset [0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the “OR” binary logic operator. The ultimate value of result_bitset is [0, 1, 0, 1, 1, 1, 1, 1]. That is to say, only entities 1 and 3 will be computed in the following search or query stage.

Figure 1. Search with Time Travel = 150.

Case Two

Suppose the value the user sets for time_travel is 250. In other words, the user conducts a query on the data stored in Milvus when ts = 250. The bitset generation process is illustrated in Figure 2 below.

Like in case one, the resultant filter_bitset of the initial attribute filtering stage should be [1, 0, 1, 0, 1, 0, 1, 0].

All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted to the vector database when ts= 250. Therefore, the previous result of filter_bitset remains the same. Again, we need to flip the result of the filter_bitsetand we will get [0, 1, 0, 1, 0, 1, 0, 1].

As for the deletion bitset del_bitsetthe initial value should be [0, 0, 0, 0, 0, 0, 1, 1]. However, entities 7 and 8 were not deleted until ts is 300. Therefore, when ts is 250, entities 7 and 8 are still valid. As a result, the del_bitset value after Time Travel should be [0, 0, 0, 0, 0, 0, 0, 0].

Now we have two bitsets after Time Travel and attribute filtering: filter_bitset [0, 1, 0, 1, 0, 1, 0, 1] and del_bitset [0, 0, 0, 0, 0, 0, 0, 0]. Combine these two bitsets with the “OR” binary logic operator. The ultimate value of result_bitset is [0, 1, 0, 1, 0, 1, 0, 1]. That is to say, only entities [1, 3, 5, 7] will be computed in the following search or query stage.

Figure 2. Search with Time Travel = 250.

Case Three

Suppose the value the user sets for time_travel is 350. In other words, the user conducts a query on the data stored in Milvus when ts = 350. The bitset generation process is illustrated in Figure 3 below.

We see the same as in cases one and two: the resultant filter_bitset of the initial attribute filtering stage is [0, 1, 0, 1, 0, 1, 0, 1]. All entities [1, 2, 3, 4, 5, 6, 7, 8] are inserted into the vector database when ts= 350. Therefore, the final flipped result of the filter_bitset is [0, 1, 0, 1, 0, 1, 0, 1]the same as in case two.

As for the deletion bitset del_bitsetsince entities 7 and 8 are already deleted when ts=350, therefore, the result of del_bitset should be [0, 0, 0, 0, 0, 0, 1, 1].

Now we have two bitsets after Time Travel and attribute filtering: filter_bitset [0, 1, 0, 1, 0, 1, 0, 1] and del_bitset [0, 0, 0, 0, 0, 0, 1, 1]. Combine these two bitsets with the “OR” binary logic operator. The ultimate value of result_bitset is [0, 1, 0, 1, 0, 1, 1, 1]. That is to say, only entities [1, 3, 5] will be computed in the following search or query stage.

Figure 3. Search with Time Travel = 350.

What’s Next?

In this series of blogs, we have previously introduced the design of data deletion in Milvus 2.0. In the upcoming articles in this series, we will introduce the design of data compaction, and dynamic load balance in Milvus 2.0. Please stay tuned.

.

Leave a Comment