tl;dr ...


SELECT 
  indexrelname, 
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size 
FROM pg_stat_user_indexes 
WHERE relname LIKE 'search_terms%';

I'm working on a new table that is "search terms". A search term is constructed from extracting combinations of words, from titles, on my blog. For example, if the title is "Inspecting the index size in PostgreSQL" the search terms for that are:

  • inspecting
  • inspecting index
  • inspecting index size
  • inspecting index size postgresql
  • index
  • index size
  • index size postgresql
  • size
  • size postgresql
  • postgresql

It's created by lower casing the input and taking one, two, three, and four non-stopwords.

To search these in PostgreSQL I use trigrams using pg_trgm. For example,


SELECT id, term
FROM search_terms
WHERE
  term LIKE 'conf%'
  OR term LIKE '% conf%'
LIMIT 10

There's more to it, but this is a good start.

Anyway, I was curious how much space this is going to take up.
By indexing trigrams, you're indexing a lot of small individual strings. So lots of combinations.

The index is defined like this:


CREATE INDEX search_terms_term_trgm 
ON search_terms 
USING GIST (term gist_trgm_ops) 

After inserting a bit over 10,000 of these "search terms" the size of the index becomes:


peterbecom=#
SELECT
  indexrelname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname LIKE 'search_terms%';
      indexrelname      | index_size
------------------------+------------
 search_terms_pkey      | 248 kB
 search_terms_term_key  | 488 kB
 search_terms_term_trgm | 856 kB
(3 rows)

Curiously, if you set a higher siglen (e.g. GIST (term gist_trgm_ops(siglen=24))), it actually becomes smaller.

Conclusion(?)

If anything, it's not that large. In this particular example, there are only 10k of these smaller word combinations. It would be curious to see what the numbers become at 100k and 1M.

Comments

Your email will never ever be published.

Previous:
Adding client-to-server sync to PissueTracker March 20, 2025 React, JavaScript, Bun
Next:
How to SSG a Vite SPA April 26, 2025 Web Performance, React, JavaScript, Bun
Related by category:
The 3 queries I use with pg_stat_statements to analyze slow PostgreSQL queries September 30, 2024 PostgreSQL
Select all relations in PostgreSQL December 10, 2015 PostgreSQL
How much faster is Redis at storing a blob of JSON compared to PostgreSQL? September 28, 2019 PostgreSQL
How to sort case insensitively with empty strings last in Django April 3, 2022 PostgreSQL
Related by keyword:
The correct way to index data into Elasticsearch with (Python) elasticsearch-dsl May 14, 2021 Python, MDN, Elasticsearch
DateIndex in Zope doesn't have indexed attributes October 28, 2007 Zope