Lately, I’ve been working on something new - an eye-tracking keyboard designed for everyday communication.

Eye-tracking keyboards are remarkable devices. They give people who can’t speak the ability to communicate. I can think of few things more terrifying than being fully conscious and aware, but unable to express yourself. Eye-tracking keyboards offer a bridge to those who would otherwise be cut off from the world. In practice, however, they are painfully slow and difficult to use.

## The speed gap

Good eye tracking requires specialized hardware. The gold-standard devices - dedicated eye-tracking cameras from companies like Tobii - cost thousands of dollars, which puts them out of reach for many people who need them. Software-based eye tracking is improving, and in some cases is now built directly into operating systems. But its accuracy is still low, while most eye-tracking keyboards are designed with hundreds of tiny tap targets that demand a precise gaze. This makes reliable communication using software-based tracking extremely difficult.

And then there’s the problem of speed. A decent typist on a physical keyboard can write around 50-80 words per minute. Even highly skilled eye-tracking users often average 15-30 words per minute. This may be enough for writing a message, but conversation is different. Speech is fluid, dynamic, and *fast*. Most conversations happen at around 150 words per minute. A delay of even a few seconds can mean missing the moment - the timing of a joke, a chance to disagree, or the opportunity to participate naturally.

## Predicting what you’re going to say

The core idea behind **EyeSpeak** is pretty simple. Instead of selecting letters or even words, what if you could select entire sentences?

Think of it a little like the dialog options in a role-playing game. Instead of seeing a keyboard, the interface shows you a few very large, simple gaze targets - maybe only five or six at a time. Each one represents a full response. If none of them fit, you give the system a hint, and it regenerates new options based on your intent.
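To make that loop a little more concrete, here is a rough Python sketch of how I picture the prediction step. Everything in it is illustrative rather than the project’s actual code: `llm_complete` stands in for whatever language model backend you plug in, and the prompt format is just one possible way to ask for a handful of candidate replies.

```python
# Illustrative sketch only - names here are hypothetical, not the EyeSpeak API.
# `llm_complete` is assumed to be any function that takes a prompt string and
# returns generated text.

from dataclasses import dataclass, field


@dataclass
class ConversationState:
    history: list[str] = field(default_factory=list)  # recent utterances, newest last
    hint: str | None = None                           # the user's steer, e.g. "decline politely"


def build_prompt(state: ConversationState, n_options: int) -> str:
    """Assemble a prompt asking for a handful of full-sentence replies."""
    lines = [
        f"Suggest {n_options} short, distinct replies the user might want to say next.",
        "Return one reply per line.",
        "",
        "Conversation so far:",
        *state.history[-6:],  # keep only the most recent turns to stay small and fast
    ]
    if state.hint:
        lines.append(f"The user hinted that the reply should be: {state.hint}")
    return "\n".join(lines)


def predict_replies(state: ConversationState, llm_complete, n_options: int = 6) -> list[str]:
    """Produce the large gaze targets: a few complete candidate sentences."""
    raw = llm_complete(build_prompt(state, n_options))
    options = [line.strip() for line in raw.splitlines() if line.strip()]
    return options[:n_options]


def refine(state: ConversationState, hint: str, llm_complete) -> list[str]:
    """If none of the options fit, take a hint from the user and regenerate."""
    state.hint = hint
    return predict_replies(state, llm_complete)
```

The property that matters here is the regenerate step: rejecting every option and giving a hint is a cheap interaction, so the system can start generic and converge on what you actually meant.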
Over time, the system could learn more about you - how you speak, what you are interested in, who you are speaking to, and your personality. What might start off as a generic prompt could eventually sound like you. And if you have voice samples from before your loss of speech, it could even synthesize your own voice, so that the responses *literally* sound like you.

This is what excites me - the idea that we might be able to restore conversation, not just communication, to anyone who has lost the ability to speak.

## The challenges - a.k.a. The reality

Predicting what someone wants to say is hard. My early prototypes work just well enough to make me feel like this is possible, but the predictions are still rough, slow, and expensive to generate. The goal I have set - conversational speed even with low-quality eye tracking - requires predictions that are both accurate and sound like you, and it’s hard to even measure how close we are to that target. Still, what I’ve seen so far makes me confident it can be done.

## Privacy, consent, and control

To make good predictions, we need context - what you’re talking about, who you’re speaking to, and access to past conversations. Getting consent from the people you are speaking with, and being able to disable the prediction systems when consent is not granted, is a challenge we need to address. It also means we need to provide a good enough keyboard experience when we don’t have access to advanced prediction models.

The other challenge is making sure you feel comfortable providing an AI with this data. This is why I believe open source is the only way to build this kind of solution. It lets us build something you can inspect, so you can understand exactly what the tool is doing with your data. It lets you self-host and choose the AI models you want to work with. In the future, I’d love for everything to run fully offline and on-device with local language models. That would make it private, cheap, and available everywhere.

## Who I need help from

I’d love to hear from anyone who finds this idea interesting - your encouragement, skepticism, questions, and any feedback you have. I’m especially hoping to connect with people who:

- Rely on eye-tracking daily
- Support or care for someone who does
- Work in speech therapy, accessibility tech, UX design, or linguistics
- Have experience designing similar systems
- Have familiarity with language models or conversational AI

Your input could have a meaningful impact on this project. If any of this resonates, please feel free to start a discussion in our GitHub project or to reach out to me directly:

- Email: [email protected]