We have a family friend who is blind and a programmer. It's interesting to hear his perspective. His hope and expectation is that AI will greatly increase usability.
I've been thrown into the usability deep end: my wife is losing her sight to an autoimmune disorder, and my dad is losing his to Macular Degeneration. Honestly, it sucks, and I mean rage-quitting, phone-throwing sucks. (Try it. Turn on voice assist and close your eyes.) If Apple can improve it through AI, so that someone can just talk to the phone to carry out a series of tasks, it will honestly change everything. The number of aging people in the U.S. who are going to lose their vision is set to rise sharply in the coming years. This could be an unprecedented win for them, if this issue is solved with AI.
A few days ago, OpenAI released live video integration with Advanced Voice mode for ChatGPT—point your phone at something and ask what it sees, and it will tell you pretty accurately. I thought it was just a cool trick until I read the top comment on their YouTube video announcement: “I'm screaming. As a visually impaired person, this is what I was eagerly waiting for. Still screaming! Thank you, Sam, Kev and the entire team over at OpenAI.”
Google released a similar feature with Gemini 2.0 last week. While it doesn’t seem to be integrated with a smartphone app yet (at least on iOS), it can be used through the AI Studio browser interface.
I don't have experience with this kind of problem. But I don't think GenAI is the best tool for this, at least not until it's so rock-solid trustworthy that everyone uses such an interface. Even leaving aside AI questions, if I'm looking for a human personal assistant for someone who's blind, and that person will have unlimited access to their electronic life, I'm going to vet that person very, very carefully.
My point is that the user is adding another layer of abstraction, and that layer of abstraction itself needs to be trusted. When UI elements are really concrete and you can clearly see that you pressed a particular button and the thing you wanted happened, then the UI layer, at least, is a nonissue.
But in retrospect I don't know if my point was that good. The UI problem hasn't actually been solved, and an LLM-based chatbot may actually be more reliable for non-tech users since the user has to do less translation.
Sorry to hear about what is happening in your family.
I think your perspective is spot on. VUIs (voice user interfaces) will absolutely change the way we interact with computers. After all, talking comes naturally to humans.
The digital divide (the elderly, the very young, the illiterate) still exists, and it will likely grow if VUIs don't see widespread adoption.
> Sorry to hear about what is happening in your family.
Non sequitur, but I can't be the only person who finds this sort of performative empathy odd and out of place in the context of an HID accessibility discussion.
While I use LLMs, I also consider myself an LLM skeptic when it comes to their role in upending the world and delivering the value promised by the people hyping them most aggressively.
However, using ChatGPT voice mode and considering its impact on accessibility is very exciting, especially if that quality of interactive voice functionality can be integrated well into the operating systems of the devices we use every day.
In order to cure Macular Degeneration, we have to develop many different technologies that can also be used for power and control. That's inevitable: history is cyclical, and human behavior is predefined throughout it, because conceptually the same ideas and thoughts are encoded, rehashed, and decoded by newer generations.
LLM-based AI is not needed, or even useful. We know how to make voice interfaces that work, and work well: have done since the 80s. It's just expensive; and it's an expense that nobody in the industry is willing to pay, therefore nobody needs to do it in order to differentiate their product.
What you're missing is that AI solves the expense problem. As the OS vendor, you already have an overview of, and easy access to, all the interfaces you expose, and it's straightforward to feed that into an integrated AI agent. Add a bit of glue code here and there and a simple implementation is nearly free. Of course, the real value lies in ironing out all the edge cases, but compared to doing all of that manually, it should still be orders of magnitude cheaper.
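To make the "glue code" claim concrete, here is a minimal, hypothetical sketch of what feeding exposed interfaces into an agent could look like: the same element metadata a screen reader already consumes gets turned into function-call style tool specs for an LLM. The names (AccessibilityNode, nodes_to_tools, activate_*) are invented for illustration and don't correspond to any real OS or vendor API.

```python
# Hypothetical sketch: turn an OS accessibility tree into LLM tool definitions.
# All names here are illustrative, not a real platform API.
from dataclasses import dataclass


@dataclass
class AccessibilityNode:
    element_id: str
    role: str    # e.g. "button", "text_field"
    label: str   # human-readable label screen readers already require


def nodes_to_tools(nodes: list[AccessibilityNode]) -> list[dict]:
    """Describe on-screen UI elements as function-call style tool specs."""
    return [
        {
            "name": f"activate_{n.element_id}",
            "description": f"{n.role}: {n.label}",
            "parameters": {"type": "object", "properties": {}},
        }
        for n in nodes
    ]


# Example: the metadata already exposed for assistive tech becomes the
# agent's action space, which is why the marginal cost looks low.
screen = [
    AccessibilityNode("send_btn", "button", "Send message"),
    AccessibilityNode("to_field", "text_field", "Recipient"),
]
print(nodes_to_tools(screen))
```

The hard part, as the reply below points out, isn't producing these specs; it's making the agent's behavior reliable across every screen and edge case.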
It's not, because "ironing out all the edge-cases" is orders of magnitude more expensive than just designing a system without edge-cases in the first place. What's cheap is getting away with not bothering: but then you end up with a tech demo, rather than a usable product.