Look at how Google does spell checking: it's not based on dictionaries; it's based on word usage statistics of the entire Internet, which is why Google knows how to correct my misspelled name and Microsoft Word doesn't.
It's also why Google knows "Jews should be wiped out" and "Muslims should be exterminated" and "blacks are ruining America" and "whites are neanderthals", all suggestions based on the first two words of each phrase. Yes, I'm provoking Google - but surely people also encounter these by accident.
If Microsoft shipped a version of Office that suggested any of the above as "corrections," they would be lambasted for it, and rightly so. Why does Google get a pass? Is it because our standard of decency is so much lower on the Internet? Or is it because we know that Google is merely reflecting popular sentiment on the Internet, and so the true villain is ourselves?
(In fairness, Bing happily suggests equally monstrous ideas.)
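For the curious, the mechanics are roughly this: generate candidate strings near the typo and rank them by how often they were actually seen in real text. The sketch below is a toy illustration of that idea only (the frequency table and the single-edit candidates are my own simplifications; Google's actual system obviously also uses query logs, context, and much more):

    # Toy sketch of statistics-based correction (Norvig-style), not Google's pipeline.
    # CORPUS_COUNTS is assumed to be a word -> frequency table built from crawled text.
    from collections import Counter

    CORPUS_COUNTS = Counter("the usage statistics of the entire internet go here".split())  # stand-in

    LETTERS = "abcdefghijklmnopqrstuvwxyz"

    def edits1(word):
        # All strings one edit (delete, transpose, replace, insert) away from word.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
        inserts = [a + c + b for a, b in splits for c in LETTERS]
        return set(deletes + transposes + replaces + inserts)

    def correct(word):
        # Pick the most frequently observed candidate; no dictionary involved.
        candidates = ({word} & CORPUS_COUNTS.keys()) or (edits1(word) & CORPUS_COUNTS.keys()) or {word}
        return max(candidates, key=lambda w: CORPUS_COUNTS[w])

    print(correct("statistcs"))  # -> "statistics", because the corpus has seen that word

A word never seen in any dictionary still gets corrected, as long as enough people have written it; that's the whole trick, and also the whole problem.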
Because unethical sentences aren't a criminal offence? I mean, if you gave me the first two words from those sentences and asked "what are the most probable next words", I would give you pretty much the same answer. Let me try some more and put my answers next to google's:
"gays should" -> "be killed" (actual suggestion: be killed)
"marijuana should" -> "be legal" (actual suggestion: be legalized)
"drug users should" -> "get help" (actual suggestion: be shot)
"macs are" -> "better than windows" (actual suggestion: better than pcs)
As you can see, google's guess at the largest cluster of opinions on the subject aligns with what I expect in 4 out of 5 cases. Also note how you're getting both the suggestion that marijuana should be legal and that people who smoke it should be shot. These suggestions aren't google's opinion on the matter; they're what google expects you to think. Thankfully, they're wrong most of the time.
In the end, it's not impossible to write an ethical filter the same way google has a spam filter; the only reason they haven't done it is that there's no pressure to do so. If you don't want to see such suggestions on google, you have two options:
1) Make people not talk about killing gays or how all lawyers are scum
2) Pressure google into actually building that ethical filter
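To make the "ethical filter" idea concrete, here's a rough sketch of what I mean. The patterns and the pass/fail check below are entirely my own stand-ins; a real system would presumably use a trained classifier, the same way spam filters do, rather than a hand-written list:

    # Illustrative sketch of filtering autocomplete suggestions before showing them.
    # SENSITIVE_PATTERNS and the rule-based check are made-up stand-ins; a real
    # system would presumably use a trained classifier, like spam filters do.
    import re

    SENSITIVE_PATTERNS = [
        r"\bshould be (shot|killed|exterminated|wiped out)\b",
        r"\bare ruining\b",
    ]

    def is_acceptable(suggestion):
        # Reject any suggestion matching one of the sensitive patterns.
        return not any(re.search(p, suggestion, re.IGNORECASE) for p in SENSITIVE_PATTERNS)

    def filter_suggestions(candidates):
        # Keep only suggestions that pass the filter, preserving their ranking order.
        return [s for s in candidates if is_acceptable(s)]

    print(filter_suggestions(["drug users should get help",
                              "drug users should be shot",
                              "marijuana should be legalized"]))
    # -> ['drug users should get help', 'marijuana should be legalized']

The hard part isn't the filtering step itself, it's deciding what belongs on the list, which is exactly why nobody does it without pressure.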
It's not necessarily what google expects you to think, but rather what you are most likely to be searching for.
Sometimes people search for content that they might not agree with, because they want to see what is being said there out of curiosity. Not every search is someone submitting their opinion to google, I'd expect that most are not.
You're right, I didn't phrase that right. I should have written "what google expects you to think of", like you said. Still, isn't that the same as "if we divide people into groups based on what they think about topic A, what does the largest group think?" Which to me sounds the same as "what you're most likely to be thinking".
All the people who want the addicts dead, let's say they're 30% of all people who think seriously about drug addicts, will happily rally under "should be shot", while the ones who want them rehabilitated would form many smaller groups around specific kinds of rehabilitation programs, how those should be administered, and what the best program for fixing these people really is. Though, when you look at it like that, you're really most likely to think "should be rehabilitated" or maybe "should... I don't really have an opinion one way or the other". But then, if google actually did high-level clustering, that is, extracting opinions that are all at the same level of specificity, would those suggestions be useful for a search engine?
I guess the really right way to put it is -- That's what the google crawler has seen written most frequently -- and assume it doesn't really mean what you or I think about things.
> I guess the really right way to put it is -- That's what the google crawler has seen written most frequently -- and assume it doesn't really mean what you or I think about things.
Not what the crawler has seen most, but what people typing the same thing as you have ended up searching for most frequently. (We may be thinking the same thing and just confusing the words.)
I don't believe it's supposed to be "what you're most likely to be thinking"; it's just a commonly searched-for phrase. I don't think Google's trying to autocomplete with your opinion, because people aren't just searching for their own opinion; they're searching for words that will hopefully return the information they want.
Exactly. The "drug users" one, for example, leads to an article explaining how a police officer said that. Anyone hearing secondhand about that story and wanting to learn more about the incident would probably Google that phrase.
There's a difference between suggesting corrections (like Office) and trying to guess your next word, though. Google does both, but unless you misspell "blacks are ruining America" it's not going to suggest that as a correction. Since they apparently expect it to be searched, they suggest it as you're typing, but I don't think Office does any sort of word prediction as you type? As you said, Bing does the same. The standard is higher for Office and other actual spell checkers because they shouldn't be changing something that isn't meant to say "black people are ruining America" into that. Prediction is entirely different.
Very neat. I wonder if Office online does the same using Bing's predictive features. It's still not the same as filling in "are ruining America" if you type "black men", though; Google Docs doesn't do prediction like that. It's 'just' a way more clever spell checker. My point is that comparing Google search to Office isn't a sensible comparison to make, and the fact that Google's version of Office behaves similarly to Office, while Microsoft's search works similarly to Google in prediction/spell checking, is pretty much exactly what I was getting at.
It would be neat to see prediction as a feature in Office/Docs etc., though; there's a pretty huge corpus of essays available, and it'd be interesting to see how accurately a new one can be predicted.
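At its crudest, that kind of prediction is just counting which word tends to follow which. The tiny bigram sketch below is purely illustrative (the corpus string is a placeholder, and a real feature would train on an enormous document collection and use far more context than one preceding word):

    # Crude sketch of next-word prediction from a corpus using bigram counts.
    # The corpus string is a placeholder; a real feature would train on a huge
    # collection of documents and use much longer context than one word.
    from collections import Counter, defaultdict

    def train_bigrams(text):
        # Count how often each word is followed by each other word.
        words = text.lower().split()
        counts = defaultdict(Counter)
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
        return counts

    def predict_next(counts, word, k=3):
        # Return the k continuations seen most often after `word`.
        return [w for w, _ in counts[word.lower()].most_common(k)]

    corpus = "the cat sat on the mat and the cat ate the fish"  # stand-in corpus
    model = train_bigrams(corpus)
    print(predict_next(model, "the"))  # -> ['cat', 'mat', 'fish']

Feed it a pile of student essays instead of that toy string and you'd get a rough sense of how predictable the next essay really is.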
Both. Office has the enterprise as its target market, people know this, and they set their expectations in that context; the same thing happens on the Internet, where the context is "everyone in the world" and it's assumed that most things are popularity-driven (likes, re-tweets, etc).