
It's really code red for the Web

Google and the Web are tightly coupled. The old contract, that good content would rank highly on Google, was a convenient, organic, economic one: you were rewarded with visitors to your site, Google with better search results, and readers with good content.

This unfortunately created a race to the bottom.

The intense, years-long effort to DDoS Google Search with spammy, low-quality, but superficially polished SEO-curated content has destroyed both the Google and the Web experience. I'm not sure Google can ever truly keep up with the volume of spam and low-quality content. Google tried to react by putting more information on its search results page, but alas, this just meant that content creators valued their destination pages that much less.

The prevailing digital-marketing wisdom created a tragedy of the commons and a crisis in quality web content: "put a pop-up here", "add 3 pages of boilerplate to mention all the right keywords", and so on, when what people really want is to read informational text in a non-obtrusive format.

ChatGPT does this simple thing well (informational text without the glaring headaches of random websites). So it wins in these contexts.



> I'm not sure Google can ever truly keep up with the volume of spam and low quality.

Are they really even trying? I see low-quality scraper domains that have ranked highly in their search results for years but never seem to get de-ranked, despite just redisplaying content from GitHub or Stack Overflow. What those sites have in common is that they’re loaded with AdWords ads, which suggests to me that there’s little willingness to act against them unless profits dip.


Yeah, I don’t think they are trying any more. I got a Google search result the other day that was a word-for-word copy of a Stack Overflow answer; it even had “stack overflow” in the page title. The spammers weren’t even trying to hide it.

Nevertheless, the copy ranked a couple of lines higher than the SO answer it was taken from. I mean, come on.


The amazing part is that you’ll probably get the same site a month or two from now. I’m disappointed that Google doesn’t have a blocklist feature; you’d think they’d want the training data (although I imagine abuse could be a severe challenge).


They could at least let their users blacklist sites from their personal search results.

There are extensions that do this though, thank God.


Which extensions would you recommend for this?


uBlacklist allows manual blacklisting from Google Search. Somewhat useful for me.
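
For example, uBlacklist rules can be plain match patterns or /regex/ expressions tested against the URL; the domains here are placeholders, not real offenders:

    *://*.examplescraper.com/*
    /^https?:\/\/so-mirror\.example\.(net|org)\//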


uBlock origin in combination with a list like this one:

https://github.com/levymetal/filter-lists
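
If you'd rather hand-roll it, uBlock Origin's procedural cosmetic filters can hide individual results. A sketch (the .g container class is what Google's result markup used at the time of writing and may change; the domain is again a placeholder):

    google.*##.g:has(a[href*="examplescraper.com"])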


One possible reason Google doesn't offer a blocklist is that it only returns a limited number of results per query. Since Google typically shows more than one result from the same domain, honoring a blocklist would mean devoting extra resources to finding results from different domains.


Considering the lack of competition I can't blame them for not trying. Why spend the money? Nobody is going to use Bing lol.


You’re right of course, despite the fact that I personally use Bing indirectly via DDG.

That said, my personal anecdata is that, a few years ago, a g! (DDG’s bang for Google) search would often find things that DDG itself didn’t; but today, those results rarely help (hence my OP).

I am not super confident DDG has got better, but it certainly feels like Google’s quality has diminished.


Of course we can blame them for not trying. Why spend the money? Because it's the right thing to do for humanity. But hey, Google is not a charity; what matters is stock growth. So we're excused for throwing ethics and empathy out of the window, as long as, in the end, shareholders have the comfortable monopoly that allows their investments to generate more wealth with minimum effort.


This is surprisingly wise. Lack of viable competition led google to become decadent. They’re now facing a change in the environment that threatens their business model. Will they change or die? Probably a combination of the two.


That definitely worked for a decade, but I have to say it's now rare that I find better results on Google than on DuckDuckGo, which is a big change.


Google could probably cut most of the SEO spam we see with something like 10 FTEs doing manual review.

It takes time to build up good ranking, and it takes five seconds to five minutes of research to manually downrank a site. It's a battle Google could keep winning.

But they don't care at all.

With blacklist extensions you can improve Google search a lot with almost no effort.


They don’t care because they know they’re the only real option on the market. Outside of nerdy circles, no one knows about DuckDuckGo, for instance. They also have a stranglehold on the developing-world market through the dominance of Android and the fact that smartphones are the only computing device for most of those users.

Plus, poor search results means more searches and clicks, which means more revenue.

The day ChatGPT can roll its product up into an easy-to-use Android app, that’s the day Google would be truly scared.


It's a metrics driven beast beset by interdepartmental rivalries. Even with competition I don't think they'd solve this problem.

It mirrors Amazon's absolute lack of care about fraud and spam on their platform.


Very good chance that it goes the way of Kodak - they have access to better tech, but it never gets implemented because it eats into their existing business.


They used to have a team called Search Quality. It was well respected inside the company, and probably the reason Google was good. I hear it has been disbanded.


I wonder if splitting Google into two companies could help. One doing just search and one doing just ads, with the condition that payments between them must be fixed and independent of the volume of anything.


I think so: Google’s stagnation started after the DoubleClick merger. Getting the ad people out of calling the shots seems like a key step for the long-term future of the company.


> I see low-quality scraper domains which have ranked highly in their search results for years but never seem to be de-ranked despite just displaying content from GitHub or Stack Overflow.

I flat-out don't even bother doing "X vs Y" type searches anymore when looking for a compare-and-contrast between two things. It's just not useful anymore, for exactly the reasons you've stated.


The key is to add 'reddit' to your 'X vs Y' search query. That's all Google is to me these days. My Reddit search engine.
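
If you want to be strict about it, Google's site: operator limits results to a single domain, e.g.:

    mirrorless vs dslr site:reddit.com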


I'd guess that 50% of my google searches have 'wiki' in them. Thanks for the tip about adding 'reddit'.


Surely ChatGPT will be subject to the same issues that Google is.

* Right now, ChatGPT has a disclaimer that it doesn't know much about the world after 2021. This implies much of its training set excludes recent data.

* ChatGPT is brand new. Nobody has had a chance to reverse engineer it or game it.

* Most "search" consists of relatively simple queries. There's only so much you can do to differentiate a bunch of sites that offer the exact same facts.

I suspect ChatGPT will reduce the lag in its training data, and people will figure out how to rank well against the algorithm. Then it will be no different from just another search engine.


It’s a matter of scale. What percentage of market share does ChatGPT have to take away from Google for Google shareholders to panic?

What would a 10% drop in market share do to Google’s stock price? What would a steep drop do to employee compensation? What would a drop in employee compensation at Google do to the rest of the tech industry?


Yeah, if anything Google is better positioned thanks to their years of experience dealing with spammers who try everything to game the system.


That is a very insightful comment. I wonder if OpenAI will offer different checkpoints for the models, so that at least the 2021 one would be permanently unaffected by attempts to game it.


Its results are based on probabilities. It must surely be much harder to game, except perhaps via very new words, slang, etc.


The web is bloated. Hopefully OpenAI will pay the information providers directly. That also means the information will not make it to the public web. Which is just as well: after 30 years of giving out our information for free, thinking we were contributing to some democratized web, we ended up with a web of four monopolies competing to lock down access.


> Hopefully openAI will pay directly the information providers

It’d be nice if they at least told the information providers they used their data. I think expecting them to pay is wishful thinking.


I was thinking about this a while back: Google should have put a significant fraction of its advertising revenue towards the websites it directs traffic to, similar to YouTube.

I keep hearing that the web is dying and that organic content keeps being replaced by ad farms.

Maybe if it were possible to make a living off small-site content the way YouTubers do, the web would be more resilient, people would use it more instead of retreating to walled gardens like Facebook and Instagram, and Google would have had more cash in the long run.

The current state of the web is as if every YouTube video needed a sponsored ad in it to make money, and Google put its own ad on top of that.


I don’t think SEO destroyed the web.

Rather, it seems people prefer videos.

There are now plenty of YouTube videos on a lot of topics.

Wikipedia cleans house with the rest.

Honestly, there is access to way more good content today than 10 or 20 years ago. Just maybe not in the same form as 10 or 20 years ago.


I sometimes have to support Windows machines.

When I google things for Windows that require three bullet-point sentences to answer, inevitably I get a 12-minute video of an Indian guy whose accent prevents me from comprehending the audio at 2x speed, who spends 6 minutes telling me how common the problem is and 6 minutes teaching me how to download and install spyware that does what I want, and much more.


I love it when you google for something and, in order to stretch the content out, 80% of it is completely useless, barely relevant filler.

"How do you change the sandpaper in a sander" and most of the article is explaining why sander X is the best sander on the market.


I wonder if the OpenAI models have video transcripts in their training data.

Or whether GPT-4 or -5 (whatever the upcoming model is) will understand video or visual information and its relation to text.

Over the next few years, with grounded language understanding and other capabilities, no one will be able to pretend that these systems aren't intelligent. I mean, some people always will, but it's going to be a very small percentage. Right now I'd guess about 50-70% of people are convinced that this stuff is cognitive rather than regurgitative.

But also I think the abilities of these models clarify the nature of intelligence and the relationship between intelligence, compression and computation.


Yeah, the hobbyists who used to make niche sites on subjects they loved have all moved to YouTube to publish their content.

They were the people who made the web truly great back in the day.


It’s the death of the part of the web that exists to drive ads with information. It’s a rebirth of the web from before everything was crazy ad-driven.


Interesting perspective.


Perhaps OpenAI should provide an API such that web browsers can instantly look up any text and see if it was created by one of their GPT models. Then browsers could highlight and mark the text as such.

Perhaps it should even be a law that any AI should record its own output for later queries.
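
A minimal sketch of the lookup half, assuming a hypothetical /v1/provenance endpoint that indexes hashes of recorded model output (the URL, parameter names, and response shape are all made up; no such API exists today):

    import hashlib

    import requests  # third-party: pip install requests

    def was_generated_by_gpt(text: str) -> bool:
        # Hash whitespace-normalized text so the provider never sees raw content.
        digest = hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()
        # Hypothetical endpoint: does this hash match any recorded model output?
        resp = requests.get(
            "https://api.openai.example/v1/provenance",  # made-up URL
            params={"sha256": digest},
            timeout=5,
        )
        resp.raise_for_status()
        return bool(resp.json().get("match", False))

Exact hashing breaks as soon as anyone paraphrases a sentence, though, so a real service would need fuzzy matching, which is a big part of why this is hard.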


That’s about as effective as checking whether a picture was made by DALL-E 2: it might work for a month or two, but soon enough there will be 5 commercial models, and soon after that you’ll be able to self-host.


ChatGPT trained with current methods and used as a search tool might have the "spamdexing" problem too. People can still find ways, perhaps not as easy as they are now, to generate content and get it included in the training data.

But I think LLMs like GPT can be a great weapon against spamdexing because of their ability to "understand" text; applied as a filter while the spiders crawl web content, they could improve today's search engines, like Google, a lot.
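
A rough sketch of that crawl-time filter, using the current OpenAI Python client (the prompt, model choice, and truncation length are arbitrary assumptions, not a production design):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def looks_like_seo_spam(page_text: str) -> bool:
        # Ask the model for a one-word verdict on a truncated page body.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any cheap model works for a yes/no label
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "You label web pages. Answer SPAM if the page is "
                            "keyword-stuffed, scraped, or machine-generated "
                            "filler; otherwise answer OK."},
                {"role": "user", "content": page_text[:4000]},
            ],
        )
        return resp.choices[0].message.content.strip().upper() == "SPAM"

In practice you'd only run this on pages that cheaper heuristics already flag, since an LLM call per crawled page would get expensive fast.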



