One optimization you could make is to send the sentences as they come: watch for periods, and when a sentence is complete, send it to the LLM and get the response. Then when you hit the edit button it has results you can view almost immediately, and the ones that haven't finished yet can stream in.
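A minimal sketch of that buffering idea, assuming a hypothetical `send_to_llm` stand-in for the actual model call: accumulate keystrokes into a buffer, split off any completed sentences at terminator-plus-whitespace boundaries, and dispatch each one as it finishes.

```python
import re

# A sentence counts as complete once . ! or ? is followed by whitespace;
# a terminator at the very end of the buffer stays buffered until more
# text arrives, since the user may still be typing (e.g. "e.g.").
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def feed(buffer, chunk):
    """Append a chunk of typed text; return (completed_sentences, remainder)."""
    buffer += chunk
    parts = SENTENCE_END.split(buffer)
    # Everything but the last part is a finished sentence; the last part
    # may still be mid-sentence, so it stays in the buffer.
    return parts[:-1], parts[-1]

def send_to_llm(sentence):
    # Placeholder, not a real API: imagine the correction request here.
    return f"corrected: {sentence}"

buffer = ""
results = []
for chunk in ["This is the firs", "t sentence. And a secon", "d one. Still typ"]:
    done, buffer = feed(buffer, chunk)
    results.extend(send_to_llm(s) for s in done)
# results now holds corrections for the two finished sentences,
# while "Still typ" waits in the buffer for its ending.
```

Each completed sentence could be fired off concurrently so the corrections are already cached by the time the user opens the edit view.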
I bet that's how Grammarly does things. Lots of clever caching of tokenized sentences because it's "just" a (very fancy) application on top of an LLM.
(In my case, I don't actually correct blog post comments until long after they've been created)