From Voice Dictation to Voice Interface: Product Engineering at Wispr

written by

Sahaj Garg

CTO, Wispr Flow

Date

Sep 25, 2025

READ TIME

3 mins

Our goal at Wispr is to make interacting with your devices feel as effortless as talking to a close friend. We want people to have a voice interface that they trust, that’s capable, and that really understands them on the first try.

Wispr’s approach to voice interfaces centers on building habits that fit right into people’s lives. This is essential for building interfaces that stick and for avoiding the tarpit that has claimed the companies that tried and failed to build voice assistants. Our first product, Wispr Flow, focuses entirely on voice dictation for this reason: it’s sticky. People type 100 times a day on their computers, so when they can speak in place of typing, it creates an incredibly powerful habit.

For Wispr Flow, our primary goal is to build habits and trust. We already see this happening: over time, our median user speaks to their computer more and more, to the point where, 6 months in, they use the keyboard for only 28% of their input. We’re able to start breaking the mental model that voice doesn’t work, and give users something that is reliable on the first try. We still have a long way to go here: as users trust voice more and more, they push its limits, which leads to many of the R&D challenges outlined here.

As we build this habit and user trust, we want to expand Wispr Flow into a tool that people can use for all of the ways in which they interface with their computer. We can go from building the one habit that people do 100 times a day, to the 10 things they do 10 times a day, and eventually build an extension store (once we define how voice UX should work) for the 100 things people might do once a week.

We plan to start by solving the workflows that people do frequently. Some illustrative examples below:

Sending messages in the background. Ever opened your work messaging tool to send a message, seen 20 unreads, and immediately forgotten what you planned to do? That happens to me every day. Voice interactions make this intuitive and straightforward, but to make this work, it has to succeed on the first try. If it contacts the wrong Matt, or uses the wrong platform, even 20% of the time, I’d be so frustrated as a user that I wouldn’t bother using it.
Polishing and editing what people write. One of the most common use cases for ChatGPT is to copy-paste an email, dictate a reply, go back and forth to improve it, and finally send it. This workflow takes me out of my flow, and is such a primitive way to use AI. We want all of these workflows to be embedded right in a user’s flow, and to be personalized to the way in which they communicate (no more AI slop!).
Asking questions without context switching. In the middle of work and want to understand some terms in the context of what you’re reading? Got a cryptic message? Stuck on a confusing UI that you don’t know how to navigate? Why switch apps? Voice interaction on that page (and across your device) can make using your computer feel frictionless.
Communicating without speaking precisely. So many times, we know what we want to say, but not quite how to say it. This might take a little bit of back and forth with a tool to help craft your brilliant idea into a usable piece of communication, but finally lets you share your insight with the world.
Communication coaching. So many people have the goal of upleveling their communication at any given time — and better communication makes people more effective and fulfilled across the board. Given we slot right into how people communicate, Flow can help people get the outcomes they want from what they’re sharing.
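
To make the "wrong Matt" problem from the first example concrete, here is a minimal sketch of one way a system could refuse to guess: it sends only when a single recipient clearly stands out, and otherwise hands the candidates back so the user can choose. This is an illustration, not a description of how Wispr Flow works. The `Contact` fields, the `recent_messages` signal, and the `min_ratio` threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Contact:
    name: str
    platform: str         # hypothetical platform label, e.g. "slack" or "email"
    recent_messages: int  # hypothetical signal: messages exchanged recently


@dataclass(frozen=True)
class Resolution:
    contact: Optional[Contact]       # set only when the match is unambiguous
    candidates: Tuple[Contact, ...]  # what to show the user when we must ask


def resolve_recipient(
    spoken_name: str,
    contacts: Sequence[Contact],
    platform: Optional[str] = None,
    min_ratio: float = 3.0,  # assumed threshold, not a tuned value
) -> Resolution:
    """Pick a recipient only when one candidate clearly dominates."""
    first = spoken_name.strip().lower()
    matches = [
        c for c in contacts
        if c.name.lower().split()[:1] == [first]
        and (platform is None or c.platform == platform)
    ]
    matches.sort(key=lambda c: c.recent_messages, reverse=True)

    if len(matches) == 1:
        return Resolution(matches[0], tuple(matches))
    if len(matches) > 1:
        top, runner_up = matches[0], matches[1]
        # Act only if the top candidate is far more active than the runner-up.
        if top.recent_messages >= min_ratio * max(runner_up.recent_messages, 1):
            return Resolution(top, tuple(matches))
    # Ambiguous or no match: do not send; ask the user instead.
    return Resolution(None, tuple(matches))


contacts = [
    Contact("Matt Lee", "slack", 40),
    Contact("Matt Cruz", "slack", 35),
    Contact("Matt Lee", "email", 2),
]
print(resolve_recipient("Matt", contacts).contact)                    # ambiguous, so None
print(resolve_recipient("Matt", contacts, platform="email").contact)  # the single email match
```

The design choice here is to trade a little speed for trust: asking a clarifying question occasionally costs less than messaging the wrong person, which is the failure mode that makes users give up on voice.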

There are SO many ways we hear that people want to use voice interfaces — these are just five of the hundreds of workflows we’ve identified.

You’ll notice we skipped the most common demo of a voice assistant: booking an Uber or a flight with voice. We’ve built that before, and while it seems cool, it’s basically… useless? People book flights once in a while, and using voice doesn’t make the workflow much faster (especially if you have strong preferences). Instead, we focus on the workflows above because people repeat them many times a day, and because our understanding of how people want to communicate allows us to build a better product than anyone else in the world.

When we look at people interacting with their devices today, what we see is effort, strain, distraction, and lack of presence. It’s cumbersome. There’s so much context switching. Software grabs our attention and distracts us. It’s so far from how we relate to the people around us.

If you’d like to help us make voice interfaces a reality and invent the next generation of HCI, reach out at jobs.wisprflow.ai!
