
Something I've noticed about chat AIs vs direct search is that because a chat AI is a black box, I can't dig into an answer's source at all.

With a search, I can read the Wikipedia sources, or I know the forum or the poster. But with an AI it's a dead end, and that sucks. I can ask people how they know something, but AI training data is invisible. It doesn't know how it knows something.

There is this oracle with a 10% chance of being wrong, and if I'm not already an expert in whatever it's talking about, I have no idea when that is and no way to dig into things. It's the only source of truth, and it never gives me any other threads or rabbit holes to go down.

The only recourse is asking follow-up questions, so you're trapped in a learning box entirely under the bot's control. Not sure how I feel about that. I like that Google sends me other places, so I get exposed to different things.



A great observation, and I share the feeling.

From some other AI demonstrations, I recall there's usually a bunch of surface-level tags with associated probabilities produced alongside the output. Not sure how this looks for GPT-3, but if it could provide - alongside the answer - a list of the top N tokens or concepts with associated probabilities, with N set to include both those that drove the final output and those that barely fell below the threshold - that would be something you could use to evaluate the result.

In the example from the article, imagine getting that original text, but also token-probability pairs, including: "Hobbes : 0.995", "Locke : 0.891" - and realizing that if the two names are both rated so highly and so close to each other, it might be worth altering the prompt[0] or doing an outside-AI search to check whether the AI is mixing things up.
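
Not the ChatGPT internals, of course, but here's a rough sketch of what surfacing that metadata could look like, using GPT-2 via Hugging Face as a stand-in (the model choice and prompt are purely illustrative; GPT-3's own API exposes something similar through its logprobs option):

    # Sketch: print the top-N next-token candidates with probabilities.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = "The social contract theorist who wrote Leviathan was"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # scores for the next token

    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(idx):>12} : {p.item():.3f}")

Two candidates landing close together - the Hobbes/Locke situation above - would be exactly the signal to go verify elsewhere.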

Yes, I'm advocating exposing the raw machinery to the end-users, even though it's "technical" and "complicated". IMHO, the history of all major technologies and appliances shows us that people absolutely can handle the internal details, even if through magical thinking, and it's important to let them, as the prototypes of new product categories tend to have issues, bugs, and "low-hanging fruit" improvements, and users will quickly help you find all of those. Only when the problem space is sufficiently well understood does it make sense to hide the internals behind nice-looking shells and abstractions.

--

EDIT: added [0].

[0] - See https://news.ycombinator.com/item?id=33869825 for an example of doing just that, and getting a better answer. This would literally be the next thing I'd try if I got the original answer and metadata similar to my example.


Here's an example of that with a smaller BERT model: https://pair.withgoogle.com/explorables/fill-in-the-blank/
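
For anyone who wants to poke at the same thing locally, the Hugging Face fill-mask pipeline does that kind of probing in a few lines (the model and prompt here are just examples):

    # Sketch: top guesses and probabilities for a masked-out word.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for guess in fill("The author of Leviathan was [MASK]."):
        print(f"{guess['token_str']:>12} : {guess['score']:.3f}")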


"view source"... Oh how I miss thee.


There are plenty of retrieval-based models that do cite sources. They just didn't want to deal with it for this release.[1] I'm sure it's already on the roadmap.

[1] In fact, some snooping suggests they specifically disabled that feature, but do have it in test environments. See the "browsing disabled" flag they have in the hidden prompt. That could easily be used for citations. Source: (https://twitter.com/goodside/status/1598253337400717313)
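
The basic shape is simple enough that you can sketch it with off-the-shelf parts: retrieve supporting passages first, then hand them to the generator along with their URLs, so the answer arrives pre-cited. A toy version (the corpus, URLs, and model choice are all made up for the demo):

    # Sketch: nearest-neighbour retrieval so answers can carry citations.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    corpus = {
        "https://example.com/hobbes": "Thomas Hobbes wrote Leviathan in 1651.",
        "https://example.com/locke": "John Locke wrote Two Treatises of Government.",
    }
    urls = list(corpus)
    doc_vecs = encoder.encode(list(corpus.values()))
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

    query = "Who wrote Leviathan?"
    q_vec = encoder.encode([query])[0]
    q_vec /= np.linalg.norm(q_vec)

    best = urls[int(np.argmax(doc_vecs @ q_vec))]
    # The matched passage plus its URL would be prepended to the chat
    # prompt, so the model can quote it and cite the source.
    print(best, "->", corpus[best])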


You're not trapped in there, because you're entirely free to go and research yourself. You can look up what it's telling you.

It's no more trapping than talking to a stranger who seems to be knowledgeable about a subject but doesn't hand you a list of references.


Not exactly, though. With a human stranger, I can still stereotype based on their appearance, background, accents, etc., and apply whatever mental adjustments my societal upbringing taught me. With an "AI" bot, the "strangers" are faceless people who curated the training sets and wrote the obscure statistical algorithms.


I'm not sure "yes but I can judge them on their appearance and accent" is a great reason, but regardless you could view it the same as an internet comment if you want.


> With an "AI" bot, the "strangers" are faceless people who curated the training sets and wrote the obscure statistical algorithms.

I think this is a feature over:

> I can still stereotype based on their appearance, background, accents, etc., and apply whatever mental adjustments my societal upbringing taught me.


> AI training data is invisible. It doesn't know how it knows something

You should be accustomed to being surprised by AI. There is, of course, a new kind of transformer that takes a query as input and outputs document IDs - like a search engine retriever and ranker all packed into one neural net, very fast and efficient. So you can take any paragraph generated by the model and attribute it to the training set. This could be used to implement verification or retrieval-augmented generation.

A Neural Corpus Indexer for Document Retrieval

https://arxiv.org/abs/2206.02743
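
For a feel of the idea, here's a toy of the "generative retrieval" shape the paper describes - a seq2seq model fine-tuned to emit document ID strings directly. This is not NCI itself; the docid, query, and single training step are purely illustrative:

    # Sketch: seq2seq "query -> docid" training and inference, the core
    # trick behind generative retrieval. Real systems add hierarchical
    # docids, constrained decoding, and lots of training data.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    query = "who wrote leviathan"
    docid = "doc-017-3"  # hypothetical structured document ID

    inputs = tokenizer(query, return_tensors="pt")
    labels = tokenizer(docid, return_tensors="pt").input_ids
    model(**inputs, labels=labels).loss.backward()  # one training step

    # After enough of this, generate() maps new queries to docids:
    ids = model.generate(**inputs, max_new_tokens=8)
    print(tokenizer.decode(ids[0], skip_special_tokens=True))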


It's safe to assume it's always wrong. Most of the code I've had it write so far has minor bugs. In some ways it's like a child with access to immense knowledge: happy to make mistakes as it tries to establish connections, some of which are surprising and interesting.


Given that most people never check the source of what they read, this is really scary. Because now everyone has the ability to write and say things that sound plausible and are likely to be convincing, and the truth will be harder to access.


I think everyone has been able to tell convincing lies for quite some time before language models even existed.


But now it's an almost zero-effort activity.


> Something I've noticed about chat AIs vs direct search is that because a chat AI is a black box, I can't dig into an answer's source at all.

Did you try asking it for a source?


ChatGPT deflects questions about sources automatically, on purpose, as part of its pre/post prompt processing. If you ask for a source, it explains that it is a large language model, that it is not connected to the internet, and thus that it cannot give you sources for its information beyond the fact that it was trained on a large amount of text from the internet. It then says that if it were finding sources, it would check to make sure they are reputable.

It is a decision from OpenAI to intervene and give this disclaimer. IMO this is one of the worst parts of this phase of the tech: it is way too confident, and then, when pressed, it doesn't have the ability to cite sources, because that simply isn't how deep learning works on a model like this.


I tried many ways but it will not reveal sources.

> As an AI assistant, I do not have access to external sources. I am a large language model trained by OpenAI, and my answers are based on the knowledge and information that I have been trained on. I do not have the ability to browse the internet or access external sources of information. My goal is to provide accurate and helpful answers based on the knowledge I have been trained on, but I cannot provide sources or citations for the information I provide.


This is called hallucination in the world of NLP. See "ChatGPT Hallucinations": https://youtu.be/dtLsrLoopl4


I think another important, overlooked aspect is that there will probably be AI engine optimization consultants once any AI engine gains popularity - similar to Search Engine Optimization consultants. The original Google PageRank system worked well in the beginning - prior to SEO - but now is largely not used.

AI engine optimization consultants will figure out how to game the system - likely targeting the training data sources.


So instead of gaming backlinks, you're gaming the volume of content online (i.e. the training data).


> because a chat AI is a black box, I can't dig into an answer's source at all.

It's not quite the same thing, but I've been impressed by the results when you ask ChatGPT to back up its answers, provide concrete examples for things it claims, or explain a point. While it doesn't manage it every time, it has surprised me multiple times with how good it is at this.



