tl;dr; fvh is marginally faster than unified and unified is a bit faster than plain.

When you send a full-text search query to Elasticsearch, you can specify whether, and how, it should highlight the matching terms with HTML tags. E.g.


The correct way to index data into <mark>Elasticsearch</mark> with (Python) <mark>elasticsearch</mark>-dsl

Among other configuration options, you can pick one of 3 different highlighter algorithms:

  1. unified (default)
  2. plain
  3. fvh

The last one, fvh (the Fast Vector Highlighter), requires that you store more at index time; in particular, you have to add term_vector="with_positions_offsets" to the mapping. In a previous benchmark I did, the total document size on disk, as reported by http://localhost:9200/_cat/indices?v, grew by 38%.
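To make the index-time cost concrete, here is a minimal sketch of what the mapping change looks like as a raw index-creation body for Elasticsearch 7.x. The field names title and body come from this post; the helper name build_index_body and the rest of the mapping are assumptions for illustration, not the mapping actually used in the benchmark.

```python
import json


def build_index_body(use_fvh: bool) -> dict:
    """Build an index-creation body for Elasticsearch 7.x.

    `build_index_body` is a hypothetical helper. The real mapping behind
    this benchmark almost certainly has more fields and settings.
    """
    text_field = {"type": "text"}
    if use_fvh:
        # fvh needs positions and offsets stored in term vectors at index time.
        text_field["term_vector"] = "with_positions_offsets"
    return {
        "mappings": {
            "properties": {
                # Copy the dict so the two fields don't share one object.
                "title": dict(text_field),
                "body": dict(text_field),
            }
        }
    }


if __name__ == "__main__":
    # This is the JSON you would PUT to http://localhost:9200/<index-name>
    print(json.dumps(build_index_body(use_fvh=True), indent=2))
```

Changing term_vector on an existing field generally means reindexing, which is part of why fvh is something you commit to up front.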

I bombarded my local Elasticsearch 7.7 instance with thousands of queries collected from logs. Some single-word, some multi-word. The fields it highlights are things like title (~5-50 words) and body (~100-2,000 words).
Basically, I edited the search query to test one highlighter type at a time. For example:


search_query = search_query.highlight(
-   "title", fragment_size=120, number_of_fragments=1, type="unified"
+   "title", fragment_size=120, number_of_fragments=1, type="plain"
)

...etc.
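For readers not using elasticsearch-dsl, the same switch can be sketched as a raw request body. The fragment_size and number_of_fragments values are the ones from the snippet above; the multi_match query, the <mark> tags as pre_tags/post_tags, and the helper name build_search_body are assumptions for illustration (the real query also involved a scoring function, which is left out here).

```python
HIGHLIGHTER_TYPES = ("unified", "plain", "fvh")


def build_search_body(q: str, highlighter: str) -> dict:
    """Build a _search request body that highlights `title`.

    `build_search_body` is a hypothetical helper; the query part is a
    stand-in, not the query used in the benchmark.
    """
    if highlighter not in HIGHLIGHTER_TYPES:
        raise ValueError(f"unknown highlighter type: {highlighter!r}")
    return {
        "query": {"multi_match": {"query": q, "fields": ["title", "body"]}},
        "highlight": {
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            "fields": {
                "title": {
                    "type": highlighter,
                    "fragment_size": 120,
                    "number_of_fragments": 1,
                }
            },
        },
    }


if __name__ == "__main__":
    # One body per highlighter type, identical apart from "type".
    for name in HIGHLIGHTER_TYPES:
        print(name, build_search_body("elasticsearch", name)["highlight"])
```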

After running 1,000 searches, 3 separate times for each highlighter type, and timing every query, I recorded the following:

(milliseconds per query, lower is better)

UNIFIED:
  MEAN  18.1ms
  MEDIAN 19.0ms

PLAIN:
  MEAN  24.5ms
  MEDIAN 27.5ms

FVH:
  MEAN  16.1ms
  MEDIAN 17.6ms

A thin win for fvh over unified, with both clearly ahead of plain.
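The measurement procedure itself can be sketched roughly as below. This is not the script that produced the numbers above, just a minimal harness under stated assumptions: run_query is a hypothetical callable that sends one search to Elasticsearch, and the timing is done client-side with time.perf_counter, so it includes network and serialization overhead.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_queries(
    run_query: Callable[[str], object],
    queries: Iterable[str],
    rounds: int = 3,
) -> Dict[str, float]:
    """Run every query `rounds` times and summarize the timings in ms."""
    queries = list(queries)  # allow generators; we iterate more than once
    if not queries or rounds < 1:
        raise ValueError("need at least one query and one round")
    timings_ms = []
    for _ in range(rounds):
        for q in queries:
            t0 = time.perf_counter()
            run_query(q)
            timings_ms.append((time.perf_counter() - t0) * 1000)
    return {
        "count": len(timings_ms),
        "mean": statistics.mean(timings_ms),
        "median": statistics.median(timings_ms),
    }


if __name__ == "__main__":
    # Stand-in for a real search call, so the sketch runs without a server.
    def fake_search(q: str) -> None:
        time.sleep(0.001)

    print(time_queries(fake_search, ["foo", "foo bar", "baz"], rounds=3))
```

Repeating the whole query set several times, rather than each query back to back, spreads out cache warm-up effects a little, though it doesn't eliminate them.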

Conclusion

Conclusion? Or should I say "Caveats" instead? There's a lot more to it than raw speed. In this benchmark, it takes ~20 milliseconds to search 2 different indexes, each with a scoring function and each containing between 1,000 and 5,000 documents with hundreds of thousands of words. So the difference is pretty minor.

Each highlighter also produces slightly different output, so you'd have to study the results a bit more carefully to get a better feel for whether it works the way you and your team prefer.

If there's any conclusion, other than the boring usual "it depends on your setup and preferences", it's that the performance difference is noticeable but won't blow you away. It makes sense that fvh is a bit faster because you've paid for it by indexing more upfront (the offsets) at the expense of disk space.
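To put "noticeable but not blowing you away" into numbers, the relative differences can be worked out from the means reported above. The helper name percent_faster is just for illustration; the figures are the measured means from this post.

```python
# Mean milliseconds per query, as reported in this post.
MEANS_MS = {"unified": 18.1, "plain": 24.5, "fvh": 16.1}


def percent_faster(fast: str, slow: str) -> float:
    """How much less time `fast` takes per query than `slow`, in percent."""
    return (MEANS_MS[slow] - MEANS_MS[fast]) / MEANS_MS[slow] * 100


if __name__ == "__main__":
    for fast, slow in [("fvh", "unified"), ("unified", "plain"), ("fvh", "plain")]:
        print(f"{fast} vs {slow}: {percent_faster(fast, slow):.1f}% less time per query")
    # fvh vs unified: 11.0% less time per query
    # unified vs plain: 26.1% less time per query
    # fvh vs plain: 34.3% less time per query
```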
