Comment
Instead of deleting the index and breaking search for 3.5 minutes every day, couldn’t you just store the last index date with each document and use `delete_by_query` to delete documents not updated in the latest run? Alternatively, put that date in the index names (mdn_YYYYMMDD, for example) and use index aliases (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html) to point clients to the current index. Both are simple solutions that don’t cost much resource-wise, and having slightly stale data (how often are pages actually deleted?) for a few minutes is better than no data at all.
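To make the two options concrete, here is a rough sketch of the request bodies involved. It only builds the JSON payloads and sends nothing; the field name `indexed_at`, the alias name `mdn`, and the helper names are my own assumptions, not anything the current indexer uses.

```python
from datetime import date, datetime, timezone
from typing import Optional


def stale_docs_query(run_started: datetime, field: str = "indexed_at") -> dict:
    """Body for POST /<index>/_delete_by_query.

    Matches documents whose (hypothetical) `indexed_at` field is older than
    the start of the latest indexing run, i.e. pages that were not re-indexed.
    """
    return {"query": {"range": {field: {"lt": run_started.isoformat()}}}}


def dated_index_name(day: date, prefix: str = "mdn") -> str:
    """Index name in the mdn_YYYYMMDD style suggested above."""
    return f"{prefix}_{day:%Y%m%d}"


def alias_swap_actions(alias: str, old_index: Optional[str], new_index: str) -> dict:
    """Body for POST /_aliases.

    Elasticsearch applies all actions in one request atomically, so clients
    querying the alias never see a moment with no index behind it.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    run_started = datetime(2021, 3, 2, tzinfo=timezone.utc)
    print(stale_docs_query(run_started))
    old = dated_index_name(date(2021, 3, 1))
    new = dated_index_name(date(2021, 3, 2))
    print(alias_swap_actions("mdn", old, new))
```

With the alias approach, the old index can be deleted after the swap, so the only extra cost is holding two copies of the index for the duration of one indexing run.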
Replies
The idea of using aliases is discussed here: https://github.com/mdn/yari/issues/3098