We have a family friend who is blind and a programmer. It's interesting to hear his perspective. His hope and expectation is that AI will greatly increase usability.
I've been thrown into the usability deep end: my wife is losing her sight to an autoimmune disorder, and my dad is losing his to Macular Degeneration. Honestly, it sucks, and I mean rage-quitting, phone-throwing sucks. (Try it. Turn on voice assist and close your eyes.) If Apple can improve it through AI, so that someone can just talk to the phone to carry out a series of tasks, it will honestly change everything. The number of aging people in the U.S. who are going to lose their vision is set to rise sharply in the coming years. This could be an unprecedented win for them, if this issue is solved with AI.
A few days ago, OpenAI released live video integration with Advanced Voice mode for ChatGPT—point your phone at something and ask what it sees, and it will tell you pretty accurately. I thought it was just a cool trick until I read the top comment on their YouTube video announcement: “I'm screaming. As a visually impaired person, this is what I was eagerly waiting for. Still screaming! Thank you, Sam, Kev and the entire team over at OpenAI.”
Google released a similar feature with Gemini 2.0 last week. While it doesn’t seem to be integrated with a smartphone app yet (at least on iOS), it can be used through the AI Studio browser interface.
I don't have experience with this kind of problem. But I don't think GenAI is the best tool for this, at least not until it's so rock-solid trustworthy that everyone uses such an interface. Even leaving aside AI questions, if I'm looking for a human personal assistant for someone who's blind, and that person will have unlimited access to their electronic life, I'm going to vet that person very, very carefully.
My point is that the user is adding another layer of abstraction, and that layer of abstraction itself needs to be trusted. When UI elements are really concrete and you can clearly see that you pressed a particular button and the thing you wanted happened, then the UI layer, at least, is a nonissue.
But in retrospect I don't know if my point was that good. The UI problem hasn't actually been solved, and an LLM-based chatbot may actually be more reliable for non-tech users since the user has to do less translation.
Sorry to hear about what is happening in your family.
I think your perspective is spot on. VUIs (voice user interfaces) will absolutely change the way we interact with computers. After all, talking comes naturally to humans.
The digital divide (the elderly, the very young, the illiterate) still exists, and it will likely grow if VUIs don't see widespread adoption.
> Sorry to hear about what is happening in your family.
Non sequitur, but I can't be the only person who finds this sort of performative empathy odd and out of place in the context of an HID accessibility discussion.
While I use LLMs, I also consider myself an LLM skeptic when it comes to their role in upending the world and delivering the value promised by the people hyping them most aggressively.
However, using ChatGPT voice mode and considering its impact on accessibility is very exciting, especially if that quality of interactive voice functionality can be integrated well into the operating systems of the devices we use every day.
In order to cure Macular Degeneration, we have to develop many different technologies that can also be used for power and control. That's inevitable: history is cyclical, and human behavior is predefined throughout it, because conceptually the same ideas and thoughts are encoded, rehashed, and decoded by newer generations.
LLM-based AI is not needed, or even useful. We know how to make voice interfaces that work, and work well: have done since the 80s. It's just expensive; and it's an expense that nobody in the industry is willing to pay, therefore nobody needs to do it in order to differentiate their product.
What you're missing is that AI solves the expense problem. As the OS vendor, you already have an overview of, and easy access to, all the interfaces you expose, and it's straightforward to feed that into an integrated AI agent. Add a bit of glue code here and there and a simple implementation is nearly free. Of course, the real value lies in ironing out all the edge cases, but compared to doing all of that manually, it should still be orders of magnitude cheaper.
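To make the "glue code" claim concrete, here is a minimal, hypothetical sketch of what feeding exposed interfaces into an agent could look like: the same element metadata a screen reader already consumes gets turned into function-call style tool specs for an LLM. The names (AccessibilityNode, nodes_to_tools, activate_*) are invented for illustration and don't correspond to any real OS or vendor API.

```python
# Hypothetical sketch: turn an OS accessibility tree into LLM tool definitions.
# All names here are illustrative, not a real platform API.
from dataclasses import dataclass


@dataclass
class AccessibilityNode:
    element_id: str
    role: str    # e.g. "button", "text_field"
    label: str   # human-readable label screen readers already require


def nodes_to_tools(nodes: list[AccessibilityNode]) -> list[dict]:
    """Describe on-screen UI elements as function-call style tool specs."""
    return [
        {
            "name": f"activate_{n.element_id}",
            "description": f"{n.role}: {n.label}",
            "parameters": {"type": "object", "properties": {}},
        }
        for n in nodes
    ]


# Example: the metadata already exposed for assistive tech becomes the
# agent's action space, which is why the marginal cost looks low.
screen = [
    AccessibilityNode("send_btn", "button", "Send message"),
    AccessibilityNode("to_field", "text_field", "Recipient"),
]
print(nodes_to_tools(screen))
```

The hard part, as the reply below points out, isn't producing these specs; it's making the agent's behavior reliable across every screen and edge case.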
It's not, because "ironing out all the edge-cases" is orders of magnitude more expensive than just designing a system without edge-cases in the first place. What's cheap is getting away with not bothering: but then you end up with a tech demo, rather than a usable product.