How to Fix AWS OpenSearch Inconsistent K-NN Results Beyond Segment Replication

If you’ve been running k-NN vector searches in Amazon OpenSearch Service and have noticed that the number of results varies even when you run the same query under the same conditions, you’re not alone. This can be frustrating, especially when your application depends on consistent results.

Here’s what’s happening. When you run identical k-NN searches using the HNSW algorithm, you might see different numbers of hits depending on whether your request is routed to a primary shard or a replica. Setting the search preference to _primary or _replica, or leaving it unspecified, only changes which copy answers; the result counts can still differ between copies. The inconsistency arises because, under the default document replication, each shard copy, replicas included, indexes the documents itself and builds its own HNSW graph independently. Since graph construction isn’t deterministic, small differences between the graphs can lead to variations in search results.

In practical terms, this means your graph structures might vary slightly between primary and replica shards, resulting in different traversal paths during the search process. This can cause the number of results, or even the top results, to differ slightly between shards.
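To see the effect on your own domain, it can help to send the same query once per shard copy and diff the hits. The sketch below only builds the request and compares two responses; it does not send anything. The index name, field name, and helper names are hypothetical, and the response shape assumed is the standard `hits.hits[]._id` layout of a search response.

```python
from urllib.parse import urlencode


def knn_search_request(index, field, vector, k, preference=None):
    """Return (path, body) for a k-NN search, optionally pinned via ?preference=."""
    path = f"/{index}/_search"
    if preference:
        # e.g. "_primary" or "_replica", as discussed above
        path += "?" + urlencode({"preference": preference})
    body = {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }
    return path, body


def hit_ids(response):
    """Extract document IDs, in ranked order, from a search response dict."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def compare_hits(resp_a, resp_b):
    """Summarize how two responses to the same query differ."""
    a, b = hit_ids(resp_a), hit_ids(resp_b)
    return {
        "count_a": len(a),
        "count_b": len(b),
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "same_order": a == b,
    }
```

In use, you would send the body to the path returned for `preference="_primary"` and again for `preference="_replica"` with whatever HTTP client you already use, then pass both parsed JSON responses to `compare_hits`.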

One effective way to achieve consistent results is to use segment replication. With segment replication, the primary shard creates and indexes the data, including the HNSW graph, and then the replicas simply copy this exact segment. This ensures that all shard copies share an identical graph structure, leading to more consistent search outcomes.
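As a rough illustration, a create-index body that turns on segment replication for a k-NN index might look like the following. This is a sketch under assumptions: the field name, dimension, engine, and parameter values are placeholders, and you should confirm that your domain’s OpenSearch version supports `index.replication.type`. The replication type is chosen when the index is created, so an existing index would typically need to be reindexed into a new one.

```python
import json


def knn_index_body(field="embedding", dimension=4, m=16, ef_construction=128):
    """Build a create-index body for an HNSW k-NN index using segment replication.

    All argument defaults are illustrative placeholders, not recommendations.
    """
    return {
        "settings": {
            "index.knn": True,
            "index.number_of_replicas": 1,
            # Replicas copy segments (including the HNSW graph) from the primary.
            "index.replication.type": "SEGMENT",
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "space_type": "l2",
                        "engine": "faiss",  # assumed engine; adjust to your setup
                        "parameters": {"m": m, "ef_construction": ef_construction},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON you would send with PUT /<index-name>
    print(json.dumps(knn_index_body(), indent=2))
```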

However, segment replication isn’t without its drawbacks. It can increase network usage during replication, introduce some delay if replicas are slow to catch up, and add extra load on the primary node since it handles building and maintaining the index segments.

While adjusting parameters like m, ef_search, and ef_construction can improve the recall and accuracy of your searches, they don’t make the graph construction deterministic. Therefore, tweaking these settings alone may not fully solve the inconsistency issue.
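For reference, m and ef_construction are fixed in the field mapping when the index is created, while ef_search can be raised afterwards. A minimal sketch of a settings-update body follows, assuming the index-level `index.knn.algo_param.ef_search` setting applies to your engine and version (some engines take the value per query instead). The helper name is hypothetical.

```python
def ef_search_settings(ef_search, k):
    """Build a body for PUT /<index-name>/_settings that raises ef_search.

    ef_search is the size of the candidate list explored at query time;
    it should not be smaller than the number of neighbors requested (k).
    """
    if ef_search < k:
        raise ValueError("ef_search should not be lower than k")
    return {"index.knn.algo_param.ef_search": ef_search}
```

A larger ef_search makes each copy explore more of its own graph, which tends to shrink the gap between copies, but the graphs themselves remain different.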

If you’re seeking other ways to reduce this variability without resorting to segment replication, options are limited. The core challenge is the inherent non-determinism in how the graphs are built during index creation. Unfortunately, as of now, segment replication remains the most reliable method for achieving consistent search results across shards.

In summary, for the most stable and predictable k-NN search results in Amazon OpenSearch Service, enabling segment replication is the most reliable option. Adjusting HNSW parameters may improve recall, but it won’t completely eliminate discrepancies caused by differences in graph construction.