
It's really code red for the Web

Google and the Web are tightly coupled. The old contract, that good content would rank highly on Google, was a convenient, organic, economic one: you were rewarded with visitors to your site, Google with better search results, and readers with good content.

This unfortunately created a race to the bottom.

The intense, years-long effort to DDoS Google Search with spammy, low-quality, but superficially polished SEO-curated content has destroyed both the Google and the Web experience. I'm not sure Google can ever truly keep up with the volume of spam and low-quality content. Google tried to react by putting more information on its search results page, but alas, this just meant that content creators valued their destination pages that much less.

The prevailing digital-marketing wisdom created a tragedy of the commons and a crisis in quality web content: "put a pop-up here", "add 3 pages of boilerplate to mention all the right keywords", and so on, when what people really want is to read informational text in a non-obtrusive format.

ChatGPT does this simple thing well (informational text without the glaring headaches of random websites). So it wins in these contexts.



> I'm not sure Google can ever truly keep up with the volume of spam and low quality.

Are they really even trying? I see low-quality scraper domains that have ranked highly in their search results for years but never seem to get de-ranked, despite just redisplaying content from GitHub or Stack Overflow. What those sites have in common is that they’re loaded with AdWords ads, which suggests to me that there’s little willingness to act against them unless profits dip.


Yeah, I don’t think they are trying any more. I got a Google search result the other day that was a word-for-word copy of a Stack Overflow answer; it even had “stack overflow” in the page title. The spammers weren’t even trying to hide it.

Nevertheless, the copy ranked a couple of lines higher than the SO answer it was taken from. I mean, come on.


The amazing part is that you’ll probably get the same site a month or two from now. I’m disappointed that Google doesn’t have a blocklist feature; you’d think they’d want the training data (although I imagine abuse could be a severe challenge).


They could at least let their users blacklist sites from their personal search results.

There are extensions that do this though, thank God.


Which extensions would you recommend for this?


uBlacklist allows manual blacklisting from Google Search. Somewhat useful for me.
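
For example, uBlacklist rules can be plain match patterns or /regex/ expressions tested against the URL; the domains here are placeholders, not real offenders:

    *://*.examplescraper.com/*
    /^https?:\/\/so-mirror\.example\.(net|org)\//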


uBlock origin in combination with a list like this one:

https://github.com/levymetal/filter-lists
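
If you'd rather hand-roll it, uBlock Origin's procedural cosmetic filters can hide individual results. A sketch (the .g container class is what Google's result markup used at the time of writing and may change; the domain is again a placeholder):

    google.*##.g:has(a[href*="examplescraper.com"])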


One possible reason Google doesn't offer a blocklist is that it only returns a limited number of results per query. Since Google typically shows more than one result from the same domain, honoring a blocklist would mean devoting extra resources to finding results from different domains.


Considering the lack of competition I can't blame them for not trying. Why spend the money? Nobody is going to use Bing lol.


You’re right of course, despite the fact that I personally use Bing indirectly via DDG.

That said, my personal anecdata is that, a few years ago, a g! (DDG’s bang for Google) search would often find things that DDG itself didn’t; but today, those results rarely help (hence my OP).

I am not super confident DDG has got better, but it certainly feels like Google’s quality has diminished.


Of course we can blame them for not trying. Why spend the money? Because it's the right thing to do for humanity. But hey, Google is not a charity; what matters is stock growth. So we're excused for throwing ethics and empathy out of the window, as long as, in the end, shareholders have the comfortable monopoly that allows their investments to generate more wealth with minimum effort.


This is surprisingly wise. Lack of viable competition led google to become decadent. They’re now facing a change in the environment that threatens their business model. Will they change or die? Probably a combination of the two.


That definitely worked for a decade, but I have to say it's now rare that I find better results on Google than on DuckDuckGo, which is a big change.


Google could probably cut most of the SEO spam we see with something like 10 FTEs doing manual review.

It takes time to build up good ranking, and it takes five seconds to five minutes of research to manually downrank a site. It's a battle Google could keep winning.

But they don't care at all.

With blacklist extensions you can improve Google search a lot with almost no effort.


They don’t care because they know they’re the only real option on the market. Outside of nerdy circles, no one knows about DuckDuckGo, for instance. They also have a stranglehold on the developing-world market through the dominance of Android and the fact that smartphones are the only computing device for most of those users.

Plus, poor search results means more searches and clicks, which means more revenue.

The day ChatGPT can roll its product up into an easy-to-use Android app, that’s the day Google would be truly scared.


It's a metrics driven beast beset by interdepartmental rivalries. Even with competition I don't think they'd solve this problem.

It mirrors Amazon's absolute lack of care about fraud and spam on their platform.


Very good chance that it goes the way of Kodak - they have access to better tech, but it never gets implemented because it eats into their existing business.


They used to have a team called Search Quality. It was well respected inside the company, and probably the reason Google was good. I hear it has been disbanded.


I wonder if splitting Google into two companies could help. One doing just search and one doing just ads, with the condition that payments between them must be fixed and independent of the volume of anything.


I think so: Google’s stagnation started after the DoubleClick merger. Getting the ad people out of calling the shots seems like a key step for the long-term future of the company.


> I see low-quality scraper domains which have ranked highly in their search results for years but never seem to be de-ranked despite just displaying content from GitHub or Stack Overflow.

I flat-out don't even bother doing "X vs Y" type searches anymore when looking for a compare-and-contrast between two things. It's just not useful anymore, for exactly the reasons you've stated.


The key is to add 'reddit' to your 'X vs Y' search query. That's all Google is to me these days. My Reddit search engine.
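
If you want to be strict about it, Google's site: operator limits results to a single domain, e.g.:

    mirrorless vs dslr site:reddit.com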


I'd guess that 50% of my google searches have 'wiki' in them. Thanks for the tip about adding 'reddit'.


Surely ChatGPT will be subject to the same issues that Google is.

* Right now, ChatGPT has a disclaimer that it doesn't know much about the world after 2021. This implies much of its training set excludes recent data.

* ChatGPT is brand new. Nobody has had a chance to reverse engineer it or game it.

* Most "search" consists of relatively simple queries. There's only so much you can do to differentiate a bunch of sites that offer the exact same facts.

I suspect ChatGPT will reduce the lag in its training data, and people will figure out how to rank well against the algorithm. Then it will be no different from just another search engine.


It’s a matter of scale. What percentage of market share does ChatGPT have to take away from Google for Google shareholders to panic?

What would a 10% drop in market share do to Google’s stock price? What would a steep drop do to employee compensation? What would a drop in employee compensation at Google do to the rest of the tech industry?


Yeah, if anything Google is better positioned thanks to their years of experience dealing with spammers who try everything to game the system.


That is a very insightful comment. I wonder if OpenAI will offer different checkpoints for the models, so that at least the 2021 one would be permanently unaffected by attempts to game it.


Its results are based on probabilities. It must surely be much harder to game, except perhaps via very new words, slang, etc.


The web is bloated. Hopefully OpenAI will pay the information providers directly. That also means the information will not make it to the public web. Which is just as well: after 30 years of giving out our information for free, thinking we were contributing to some democratized web, we ended up with a web of four monopolies competing to lock down access.


> Hopefully openAI will pay directly the information providers

It’d be nice if they at least told the information providers they used their data. I think expecting them to pay is wishful thinking.


I was thinking about this a while back: Google should have put a significant fraction of its advertising revenue towards the websites it directs traffic to, similar to YouTube.

I keep hearing that the web is dying and that organic content keeps being replaced by ad farms.

Maybe if it were possible to make a living off small-site content the way YouTubers do, the web would be more resilient, people would use it more instead of retreating to walled gardens like Facebook and Instagram, and Google would have had more cash in the long run.

The current state of the web is as if every YouTube video needed a sponsored ad in it to make money, and Google put its own ad on top of that.


I don’t think SEO destroyed the web.

Rather, it seems people prefer videos.

There are now plenty of YouTube videos on a lot of topics.

Wikipedia cleans house with the rest.

Honestly, there is access to way more good content today than 10 or 20 years ago. Just maybe not in the same form as 10 or 20 years ago.


I sometimes have to support Windows machines.

When I google things for Windows that require three bullet-point sentences to answer, inevitably I get a 12-minute video of an Indian guy whose accent prevents me from comprehending the audio at 2x speed, who spends 6 minutes telling me how common the problem is and 6 minutes teaching me how to download and install spyware that does what I want, and much more.


I love it when you google for something and, in order to stretch the content out, 80% of it is completely useless, barely relevant filler.

"How do you change the sandpaper in a sander" and most of the article is explaining why sander X is the best sander on the market.


I wonder if the OpenAI models have video transcripts in their training data.

Or whether GPT-4 or -5 (whatever the upcoming model is) will understand video or visual information and its relation to text.

Over the next few years, with grounded language understanding and other capabilities, no one will be able to pretend that these systems aren't intelligent. I mean, some people always will, but it's going to be a very small percentage. Right now I'd guess about 50-70% of people are convinced that this stuff is cognitive rather than regurgitative.

But also I think the abilities of these models clarify the nature of intelligence and the relationship between intelligence, compression and computation.


Yeah, the hobbyists who used to make niche sites on subjects they loved have all moved to YouTube to publish their content.

They were the people who made the web truly great back in the day.


It’s the death of the part of the web that exists to drive ads with information. It’s a rebirth of the web from before everything was crazy ad-driven.


Interesting perspective.


Perhaps OpenAI should provide an API such that web browsers can instantly look up any text and see if it was created by one of their GPT models. Then browsers could highlight and mark the text as such.

Perhaps it should even be a law that any AI should record its own output for later queries.
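
A minimal sketch of the lookup half, assuming a hypothetical /v1/provenance endpoint that indexes hashes of recorded model output (the URL, parameter names, and response shape are all made up; no such API exists today):

    import hashlib

    import requests  # third-party: pip install requests

    def was_generated_by_gpt(text: str) -> bool:
        # Hash whitespace-normalized text so the provider never sees raw content.
        digest = hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()
        # Hypothetical endpoint: does this hash match any recorded model output?
        resp = requests.get(
            "https://api.openai.example/v1/provenance",  # made-up URL
            params={"sha256": digest},
            timeout=5,
        )
        resp.raise_for_status()
        return bool(resp.json().get("match", False))

Exact hashing breaks as soon as anyone paraphrases a sentence, though, so a real service would need fuzzy matching, which is a big part of why this is hard.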


That’s about as effective as checking whether a picture was made by DALL-E 2: it might work for a month or two, but soon enough there will be 5 commercial models, and soon after that you’ll be able to self-host.


ChatGPT trained with current methods and used as a search tool might have the "spamdexing" problem too. People can still find ways, perhaps not as easy as they are now, to generate content and get it included in the training data.

But I think LLMs like GPT can be a great weapon against spamdexing because of their ability to "understand" text; applied as a filter while the spiders crawl web content, they could improve today's search engines, like Google, a lot.
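
A rough sketch of that crawl-time filter, using the current OpenAI Python client (the prompt, model choice, and truncation length are arbitrary assumptions, not a production design):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def looks_like_seo_spam(page_text: str) -> bool:
        # Ask the model for a one-word verdict on a truncated page body.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any cheap model works for a yes/no label
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "You label web pages. Answer SPAM if the page is "
                            "keyword-stuffed, scraped, or machine-generated "
                            "filler; otherwise answer OK."},
                {"role": "user", "content": page_text[:4000]},
            ],
        )
        return resp.choices[0].message.content.strip().upper() == "SPAM"

In practice you'd only run this on pages that cheaper heuristics already flag, since an LLM call per crawled page would get expensive fast.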



