Mar 25
Server-side transcription + polish chaining for hosted batch models (eliminate post-stop latency)
Current behaviour in batch (non-realtime) mode. When I use a hosted batch voice model such as Ultra or plain Scribe, I speak and then stop recording. The full audio file is sent to Superwhisper servers. The servers return the raw transcription to my Mac. My Mac then sends that raw text in a separate request to the hosted polish language model. Only after that second step does the final polished text return and trigger Paste Result Text plus Hold Shift to Auto-Send.
This creates a noticeable 2 to 3 second delay after I stop speaking even though both the transcription and the polish model are fully hosted by Superwhisper.
What I would like. When a hosted batch voice model is paired with a hosted language or polish model, the server should pipe the raw transcription output directly into the polish model on the same servers. Only the final polished text should be sent back to the client, in a single response.
This would eliminate or dramatically reduce the post-stop delay while keeping all existing polish prompts exactly as they are today.
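To make the difference concrete, here is a minimal sketch of the two flows. Everything in it is hypothetical: `transcribe`, `polish`, `current_flow`, and `chained_flow` are stand-in names I made up for illustration, not Superwhisper's actual API, and the fake models just return canned text. The only point is that the polished result is identical while the number of client round trips drops from two to one.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the hosted models; not Superwhisper's real API.
def transcribe(audio: bytes) -> str:
    """Pretend batch voice model: returns a canned raw transcription."""
    return "um so this is a test"

def polish(raw_text: str, prompt: str) -> str:
    """Pretend polish model: drops fillers, capitalizes, adds a period."""
    words = [w for w in raw_text.split() if w not in {"um", "uh"}]
    return " ".join(words).capitalize() + "."

@dataclass
class Trace:
    client_round_trips: int = 0  # client -> server -> client exchanges

def current_flow(audio: bytes, prompt: str, trace: Trace) -> str:
    # Round trip 1: upload the audio, receive the raw text on the client.
    trace.client_round_trips += 1
    raw = transcribe(audio)
    # Round trip 2: send the raw text back up, receive the polished text.
    trace.client_round_trips += 1
    return polish(raw, prompt)

def chained_flow(audio: bytes, prompt: str, trace: Trace) -> str:
    # Single round trip: the server feeds the raw text straight into polish.
    trace.client_round_trips += 1
    return polish(transcribe(audio), prompt)

if __name__ == "__main__":
    before, after = Trace(), Trace()
    print(current_flow(b"", "clean up", before), before.client_round_trips)
    print(chained_flow(b"", "clean up", after), after.client_round_trips)
```

The polish prompt is passed through unchanged in both flows, which is what I mean by keeping existing prompts exactly as they are.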
Why it matters. Right now the only truly instant option is to disable polish entirely, but many of us want the cleanup: removing fillers, fixing grammar and punctuation, adding blank-line paragraphs, etc.
This is especially valuable for the common LLM or chat workflow: long dictation into a Chrome or other browser text box, then stop recording, then immediately switch focus to other apps on my desktop.
It would also be more efficient for Superwhisper’s servers: one fewer network hop.
Nice-to-have. A per-mode toggle called “Server-side polish chaining when available” so users who prefer the current behaviour can keep it.
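As a rough sketch of how such a toggle might be evaluated (all names here, including `ModeSettings` and `use_chaining`, are hypothetical and not taken from the app), chaining would only kick in when the user has left it enabled and both models in the mode are hosted, with the voice model running in batch mode. Otherwise the mode falls back to the current two-step behaviour.

```python
from dataclasses import dataclass

@dataclass
class ModeSettings:
    # Hypothetical per-mode settings; field names are illustrative only.
    voice_model_hosted: bool
    voice_model_batch: bool
    polish_model_hosted: bool
    server_side_chaining: bool = True  # the proposed per-mode toggle

def use_chaining(mode: ModeSettings) -> bool:
    """Chain on the server only when the toggle is on and chaining is available."""
    available = (
        mode.voice_model_hosted
        and mode.voice_model_batch
        and mode.polish_model_hosted
    )
    return mode.server_side_chaining and available

if __name__ == "__main__":
    hosted = ModeSettings(True, True, True)
    opted_out = ModeSettings(True, True, True, server_side_chaining=False)
    local_voice = ModeSettings(False, True, True)
    print(use_chaining(hosted), use_chaining(opted_out), use_chaining(local_voice))
```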
This feels like a natural optimization for hosted batch models. Happy to provide logs or do a quick screen-share if it helps.
Pending
