Bots can complete CAPTCHAs quicker than humans (theregister.com)
71 points by magoghm on Aug 15, 2023 | 100 comments


> New approaches are needed, like more dynamic approaches using behavioural analysis

Does this set off alarm bells for anyone else? Of course the best way to know if a visitor is a human or a bot is to deeply analyze their behavior. But that's at odds with the right of us humans not to be analyzed by every website we visit. What happens if we do reach a standoff where the bots become good enough at mimicking human behavior that the only way to tell us apart is unacceptable and illegal behavioral analysis?


I recently wondered whether the good reviews on WWW shopping sites are actually written by bots. The market for astroturfing is so competitive that the paid reviewers probably learned a long time ago that you need to leave quality reviews to get repeat customers.

They also 'care' more than actual customers in many cases. Real customer -> "Stop sending me review reminders. It was a comb. Block." Bot -> "Dutifully review all kinds of products. 500 words on the life changing experience of hair brushing with this comb. A+ reviewer."

I find it difficult to believe that the bot networks would not have just immediately rolled every single generative AI advance into their networks (write convincing reviews, generate convincing product examples without buying, beat captchas more reliably, automated screen clicking, human eye scan impersonation). Need to be better than every other group doing paid reviews. Need to be better than actual humans. They might write critical reviews.

Also, a lot of sites are already doing some behavioral analysis. Get a popup every time you consider clicking 'leave' on websites lately? "Before you go..."


> I recently wondered whether the good reviews on WWW shopping sites are actually written by bots.

Probably, but if these are just like, Amazon reviews/etc, they likely violate FTC regulations. Enforcement is lacking, but I'd still be very hesitant to break the law.


Maybe the ultimate solution is to make people pay. As in microtransactions

Human or bot is not really the problem; spam is the problem, and bots make spam so cheap that admins can't deal with it. So, bots are banned. Human spammers can still get in, and you can pay people to solve captchas, but humans are more expensive, so there are fewer of them and moderators can deal with them.

If we had people (or bots) pay a few cents to access a service, it could be enough to keep spam to a manageable level.

The problem is, people don't like to pay, and unlike with phone numbers, the web doesn't have a good microtransaction architecture so behavioral analysis it is.


Any payment system will be used to track and unmask users. Then they'll double dip selling your even better identified user data to whoever wants it. Probably while still showing you ads.

You can slow down bots with proof of work. A crypto miner seems like the only possible payment method that would resist tracking; I think Brave tried something similar. Not sure I like that idea!


I think “slowing down the bots with proof of work” is essentially what captchas were, back when they were easy for us and hard for computers.

But now, when I see a captcha, I hit back, press unsubscribe, and find a new vendor. The work is harder for me than it is for a computer and so I won’t do it.

Accordingly, we can see that proof of work is the opposite of a solution.


Well... no. Proof of work doesn't mean you the human have to do work. Bitcoin is a proof of work system but you don't calculate the numbers by hand.

The question there is if an acceptable amount of work (e.g. cpu/gpu work) on a mobile phone is a large enough deterrent for a bot.
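As a rough illustration of the kind of work involved, here is a minimal hashcash-style sketch; the challenge format and difficulty value are assumptions for illustration, not any real vendor's scheme:

    import hashlib
    import secrets
    import time

    def solve_challenge(challenge: str, difficulty: int) -> str:
        """Find a nonce so that SHA-256(challenge + nonce) starts with
        `difficulty` zero hex digits; expected work grows as 16**difficulty."""
        while True:
            nonce = secrets.token_hex(8)
            digest = hashlib.sha256((challenge + nonce).encode()).hexdigest()
            if digest.startswith("0" * difficulty):
                return nonce

    challenge = secrets.token_hex(16)   # in practice, issued by the server per request
    start = time.monotonic()
    nonce = solve_challenge(challenge, difficulty=5)
    print(f"solved in {time.monotonic() - start:.1f}s")

Difficulty 5 means around a million hashes on average, i.e. a pause of a few seconds on phone-class hardware, while the server verifies the result with a single hash. Whether that pause deters a botnet whose compute is effectively stolen is exactly the open question.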


I’ve been rereading your post here and I think I’m coming around. Captcha is not proof that I did the work; it’s an inference that because I was able to do the work, it must have been easy for me (because I’m a human).

I let my cryptoskepticism run a little too freely here. Thanks for making me think harder.


Yeah, I don't think anyone is going to pay 5 cents to see a website. That number is likely to get big real fast if you like news aggregation sites.


If the point is to stop spam it doesn't need to be implemented like that. One way to do it could be, pay $5 to create an account. Your money will be returned to you as you post on the site. If someone determines that you're spamming, any money that hasn't been returned to you is lost.
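A toy sketch of that deposit idea, with made-up numbers and names just for illustration:

    from dataclasses import dataclass

    DEPOSIT_CENTS = 500      # $5 up front when the account is created
    REFUND_PER_POST = 25     # trickled back for each post in good standing

    @dataclass
    class Account:
        held_cents: int = DEPOSIT_CENTS

        def on_good_post(self) -> int:
            # Return a slice of the deposit for every non-spam post.
            refund = min(REFUND_PER_POST, self.held_cents)
            self.held_cents -= refund
            return refund

        def on_spam_verdict(self) -> int:
            # A spammer forfeits whatever hasn't been refunded yet.
            forfeited, self.held_cents = self.held_cents, 0
            return forfeited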


The idea is not to make people pay 5 cents to see a website, but to replace captchas with a toll of roughly what it would cost to have a human solve the captcha for you.


If you just open HN once per day and open all promising links in a new window you're already out a lot of cents.

This will only work with mechanisms that let you pay like 0.05 cents. Should be enough to deter bots that practically run for free these days.

Too bad any intermediary will want 0.30 dollars per transaction for the 0.05 cents :)


I mean... you just batch those cents and have cash out limits. Preload accounts, etc.

* note that the word just does an obscene amount of lifting in that sentence.


There are already 'solve captcha as a service' sites all over, with highly developed APIs for their use. Lots of people use them for sneaker bots and ticket bots etc.

This is a losing idea out of the gate.


No fundamental reason a micro transaction has to be more than a tiny fraction of a penny.


How do you combine microtransactions with the need to be indexed by search engines?

Microtransactions solve the issue of bad bots, and possibly websites monetization. But then do you want to give free pass to search engine crawlers? The big ones will be strong enough to refuse to crawl your site if you don't. The small ones will be financially unable to crawl if you don't. If you allow them all, you're back to step 1. If you allow only one or a few, you basically freeze search engine innovation.


Isn't the big issue not bots reading but bots generating? You could allow bots in a read only fashion.


Spammers will have more disposable funds than me and more utility from making payments to spread their message than I will. Essentially this is more likely to exclude poor people while waving spammers on in through the ticket gate.

Not to mention credit card fees making sub-$1 payments a no-go and crypto being its own barrel of nightmares.


There are the recently announced hardware root of trust solutions[1] that aren’t behavior driven.

But they just trade one privacy issue for potentially another, depending on your view.

It does seem sadly unavoidable. Perhaps the internet has to go full circle and we need real identities if we want to ensure we’re not talking to machines?

[1] https://blog.cloudflare.com/eliminating-captchas-on-iphones-...


It is not only a privacy issue; "certified by entrenched gatekeeper mega-corporation" is a nightmare for user freedom. The first ones impacted will be the minority who root their devices and compile their own software, but in the long run a monopoly at the gate has detrimental effects, like the ease of implementing surveillance, censorship, and DRM, that will apply to everyone.


Yeah, at second glance it actually (as proposed - huge caveat) might even be better for privacy.

I can’t begin to theorize how the future will play out if you need a PAT to access most web destinations. Do Cloudflare or Apple engineers ever use Linux machines? Surely they do, and either know this is bad or have some plan to make it work?


> Perhaps the internet has to go full circle and we need real identities if we want to ensure we’re not talking to machines?

If that's the case, then I'll be done using the web entirely.


Yeah, I think I might be too. I mean, I’m headed that way to a degree anyhow.

Although arguably most major web players already 100% know who you are - just maybe not your name.


> I’m headed that way to a degree anyhow

As am I, which is why it's not that big of a deal to take this stance.

> most major web players already 100% know who you are

I have no doubt about this, but there's also a whole internet full of others who I want to remain pseudonymous with. I've been using a handful of online identities for over 30 years now, and have never tied them to my real world identity.

The reasons for avoiding that hold more true now than ever before. Having to tie my online identities to my actual identity is unthinkable.


George Hotz is correct in that we all need our own AI. This is the only way to give us a fighting chance against bad actors.


It's the new Gun.

You don't need one, but if you don't have one, you're at a disadvantage against those who do.


How would that help?


He explains that it's like having a bodyguard. It will protect you from spam, psyops, scams, etc.


They are working on that: see the Web Integrity API. One of its goals was to separate the humans from the robots.


Then we require a phone number for everything (it's not easy to make unlimited new phone numbers) and use OIDC to authenticate to one of a couple providers. You won't be able to do anything on the internet without logging in first, but the login is safe at the identity provider.

If you think about it this is no different than showing your ID to get into a bar.


The problem is that this time, it's not a bar, it's just any store (or really, anywhere) you go.


and they're recording your ID number and sharing that with other stores to track your purchases.


I trust any bar more about privacy than anyone on the internet. Their incentive to maximize consent to stalk me and my behavior is close to nonexistent.


Phone numbers are a cheap and reusable resource. Pushing the problem on another site with OIDC doesn't help either, if their CAPTCHAs have the same limitation.


"Real" cellular phone numbers are a very finite pool and require nontrivial amounts of money, a physical phone (with a burned-in hardware identifier), an in-person interaction, and government ID that validated against a state database.


"Real" cellular phone numbers are a very finite pool and require nontrivial amounts of money, a physical phone (with a burned-in hardware identifier), an in-person interaction, and government ID that validated against a state database.

You'd think so, but no.

I signed up for T-Mobile service early this year with no ID, and paid cash.

The store is so eager to complete the transaction that it keeps a government ID document in a drawer and the sales people whip it out whenever anyone looks queasy about providing information.

I didn't resist giving my information. All I did was pause because I wasn't sure if I brought my ID with me. Even that little hesitation was enough for the clerk to say, "Don't worry about it. I got you covered" and he pulled out the ID.

So I have a T-Mobile account that I can pay for with cash and no ID on file, and someone else's address.

Now, if a government was really interested in me, it could probably pull the security camera video or follow the signal around or whatever. But it turns out that KYC is easily bypassed when the incentives are right.


Some places might require all that, but a lot don't require an in-person visit or an ID card. And there are also SMS verification services that charge you a few cents per verification, and they use "real" numbers.


In the US, you can walk in a T-Mobile store and get a SIM card for cash, no questions asked.

Also, once you are sitting on a few dozen phone numbers, you can use them again and again to spam or abuse different services (possibly for sale). It's not like CAPTCHA solving, which you have to do every time.


> Then we require a phone number for everything (it's not easy to make unlimited new phone numbers)

There are services that allow you to verify for as low as $0.03/activation, and their stock is massive and diverse, so that's not a solution.


It is though. It's how every major company validates new customers without captcha.


> What happens if we do reach a standoff where the bots become good enough at mimicking human behavior that the only way to tell us apart is unacceptable and illegal behavioral analysis?

Sophisticated bots are already good enough at this that a variety of behavioral-based bot analysis tools exist and are in semi-widespread use. They're not illegal.


Can you give any examples of these tools?



It's also short-sighted because any behavior analysis that is stored in a database somewhere could be used to train a new AI model.



Thankfully, soon we will have the Web Integrity API to verify that a visitor is human.

Apple devices already support something like this when connecting to websites behind Cloudflare and Fastly, and as Cloudflare explains, this "vastly improves privacy by validating without fingerprinting"[1].

https://blog.cloudflare.com/eliminating-captchas-on-iphones-...


Please tell me you are not honestly cheering that atrocity on.


I tried to register for twa^Hitter yesterday and couldn't fill in all the captchas it was throwing at me. First it was some innocent "point the train to the station with the letter shown in the picture on the left", and after I barely completed that, it threw 6 pictures of different string knots at me, where I was supposed to point out which picture has two strings or maybe just one. Funny thing is that after the first 5 it throws 20 more, and after 40 or 50 filled in total I just gave up. No Twitter for me I guess, and that is probably for the best.


The best one is NPM. You have to pick two identical icons that are overlaid on other images. But if the username you picked is taken, then the whole form resets and you have to solve the captcha again. There is no way to check if a username is taken before solving the CAPTCHA, even though npm usernames are public.


I almost never get captchas right on the first try any longer. It seems obvious that AIs will get better than humans at these things. It’s just a question of model cost and timing.


> [it was] 6 pictures of different string knots where I was supposed to point out which picture has two strings or [one]

What?! I've never seen this, is this proprietary to The Site Formerly Known As Twitter?

Users with cognitive impairments would struggle with this, I speculate. (All humans, really.)


Same thing happened to me yesterday. I switched to my phone after it said 20 more, and it verified without a captcha.


You appear to have trouble spelling xitter...


You're _seriously_ not missing anything


> twa^Hitter

genuinely, what did you mean by this?


They made the joke of having first typoed the name of the company as "Twatter", before fixing it up to "Twitter". Visualizing a backspace in text as ^H is an old joke rooted in the backspace control character being ASCII 8 (0x08), which also maps to Ctrl-H.

The joke is intended to be funny because "twat" is a vulgar and generally derogatory term, and the author almost but not quite applied it to either a large company or (transitively) to its users.
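You can watch the control character at work in any VT-style terminal; chr(8) is the same byte that Ctrl-H sends:

    # "\b" is ASCII 8 (backspace): the cursor steps back over the "a",
    # and the following "itter" overwrites it on screen.
    print("twa\bitter")   # a VT-style terminal renders: twitter
    assert "\b" == chr(8) == "\x08"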

I hope explaining the joke made it even funnier!


Thanks for the excellent explanation!


ah, I was unfamiliar with the connection between Ctrl-H and backspace.


Another cliched^Wcommon joke relates to the werase ("word erase") character in TTYs. As you'd guess, it kills the previous word, and is typically bound to ctrl-W.


Possibly they’re implying they wanted to write Twatter but corrected it.

“Twat” is a mild-to-medium insult (idiot, asshole, etc., depending on tone).


The article being reported on does not draw this conclusion. The study authors are interested only in the time it takes for (human) users to complete CAPTCHAs, and did not examine the speed at which bots solve them.

That bots can solve them -- and solve them fast -- is apparently well established in the literature. There is a table in the article comparing its (human) participants' solve times to a number of previous studies which examined how fast/accurate bots can be.

The Register (and the New Scientist, which most of this is cribbed from) is looking for a headline, so whatever. But the study's authors say that the "surprising" part is that "solving time and user perception are not always correlated" for human users. Game-based CAPTCHAs with sliders may take longer, but the users in the study still enjoyed them more than image-selection-based ones.


If you ask me, the whole idea of trying to prevent bad actors from acting badly by throwing up barriers to EVERYONE trying to get access to your system is... weird.

Better to deploy some light measures (tarpitting, RBLs etc.) on entry, then weed out the bad actors once they start acting bad inside the system, no? I mean CAPTCHA for everyone? Come on.


CAPTCHAs exist precisely because those were inadequate 15 years ago.

You may not have been around for it, but it's not like everyone was super duper excited to put these things on their web sites. It was something people were dragged into kicking and screaming, and even today a lot of those older technologies are still deployed alongside captchas.

You are probably underestimating the willingness of bad actors to make efforts to avoid these things. Is your model of a "bad actor" on the web some malicious guy writing a program and running it on his personal laptop from his home connection? Because in 2023, your threat model should be something more like a guy who rents a botnet out with millions of computers of all sorts on it (the difficulty of this rental being somewhat higher than AWS, but only somewhat so, it's not that hard at all really), collaborates with other bad actors to work out how to best bypass filtering, creates websites to do things like CAPTCHA proxying so that humans fill out the CAPTCHAs in return for free porn or something, trades rootkits and other exploits around both for home computers and for compromising web servers for their campaigns (for the URL cred), and so on. You're not up against some guy, you're up against a honed and tuned machine with years of experience, internal division of labor and skillsets, basically an entire parallel predator economy.

Tarpitting and RBLs are not dead, but they became just one layer a long time ago.


In my experience, captchas are used a lot by inexperienced developers. As you stated, they are not particularly hard to circumvent, but they are incredibly easy to implement.

So developers just install a captcha and outsource the problem to Google.

I think the primary way to deal with the problem should be to design services in a way to make them unsuitable for spammers.


"For distorted text fields, humans took 9-15 seconds with an accuracy of just 50-84 percent. Bots, on the other hand, beat the tests in less than a second with 99.8 percent accuracy."

I'm guessing part of the answer (most likely already implemented in things like reCAPTCHA) is rate limiting and detecting bots when they solve these too quickly.
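A crude sketch of that timing signal; the threshold is an assumption loosely based on the study's numbers, not anything a real product documents:

    import time

    MIN_HUMAN_SOLVE_SECONDS = 3.0   # humans took 9-15s in the study; bots under 1s

    class CaptchaSession:
        def __init__(self) -> None:
            self.issued_at = time.monotonic()

        def looks_automated(self) -> bool:
            # Flag solves faster than any plausible human reaction.
            return time.monotonic() - self.issued_at < MIN_HUMAN_SOLVE_SECONDS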


The bots would just slow down then. Their time is ~free.

reCAPTCHA is one of the better captchas because they do a decent amount of browser fingerprinting and their captchas are interactive.

Still, there are services for solving them. The fun thing is you only need to pay those services for the first ~50K captchas, and then you can train your own solver using the data you collected.

Ultimately, captchas only serve to increase the cost of running bots. If whatever you're trying to protect is worth more, you will fail.


I have a suspicion that truly effective browser fingerprinting breaks GDPR along with other privacy laws.


only if you can tie together the fingerprint and the identity.

;)


Note that an IP address counts as an identity for the GDPR's purposes.


Modern captchas are mostly about letting google or cloudflare track you anyway aren't they? When I am forced to actually click on stoplights (most of that stuff should be easy object detection) I usually just get stuck in an infinite loop. Actually mastering the cognitive task is not really important now.


No, of course they aren't mostly about that.

You can tell because it's not actually Google or Cloudflare installing captchas on third-party websites; in fact, they cannot. It's done by the people operating the websites, who desperately need to protect them against abuse, and for whom letting a company track you is not even a hypothetical motive.


I mean that determining if it's a human is based primarily on tracking.


Ah, fair enough! Sorry for the misunderstanding.

Even in the case where you're getting some kind of behavior or reputation verdict from past behavior (and possibly across multiple surfaces), you probably want a progressive set of outcomes rather than just a binary allow or deny. Even if some requests are clearly best just blocked and others should obviously be allowed, there's always going to be a grey area where you're not sure. You need something to do with those requests. Making an arbitrary choice is one option, but pretty harsh on the legit users. A captcha is another.
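For concreteness, a sketch of what such a progressive ladder might look like; the thresholds and action names are invented for illustration:

    def outcome_for(risk_score: float) -> str:
        # Map a bot-likelihood score in [0, 1] to an escalating response
        # instead of a binary allow/deny.
        if risk_score < 0.2:
            return "allow"            # clearly human: no friction
        if risk_score < 0.5:
            return "captcha"          # grey area: collect a cheap extra signal
        if risk_score < 0.8:
            return "sms_challenge"    # expensive, high-dropoff fallback
        return "block"                # clearly automated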

Sometimes you have options for that grey area that are much better than captchas, e.g. request a phone number and do an SMS challenge. But that's both expensive and will lead to a massive dropoff for most sites, as people won't be willing to give out their phone number to every site.

(Also, the act of solving the puzzle can give you additional signals of whether the request is from a bot or not. Signal collection is kind of the entire point of the slider captchas in the first place.)


How else do you do it? There's significant overlap between the smartest AIs and the dumbest humans.


We're going to need to build AI-resistant bins at some point, aren't we?


Most CAPTCHAs today also look at meta signals.

You can significantly increase your success rate by adding some "random human" actions:

- wiggle the mouse a bit, don't go in a straight line

- click multiple times in quick succession on an image

- wait a bit before clicking the Submit button after selecting matching images

- match a wrong image and then immediately unmatch it (as if a mistake)

- click randomly on the page around the CAPTCHA

- resize the window a bit, scroll the mouse wheel a bit

And don't do the exact same thing every time; improvise, picking 2-3 actions from the list at random (see the sketch below).
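A sketch of what "don't go in a straight line" can mean in practice: a jittered cursor path plus randomized delays that an automation driver could replay (the noise parameters here are invented):

    import random

    def humanized_path(x0, y0, x1, y1, steps=25):
        # Interpolate between two points with per-step Gaussian jitter,
        # so the cursor wanders slightly instead of moving in a perfect line.
        points = []
        for i in range(1, steps + 1):
            t = i / steps
            points.append((
                x0 + (x1 - x0) * t + random.gauss(0, 3),
                y0 + (y1 - y0) * t + random.gauss(0, 3),
            ))
        points[-1] = (x1, y1)  # land exactly on the target
        return points

    def human_delay():
        # A "thinking" pause instead of a fixed sleep between actions.
        return random.uniform(0.2, 1.4)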


I regard these as more or less slave labor, and in the interest of polluting training data I've been intentionally making a minority of incorrect selections on these for years.

I was surprised how rarely I have to make more than one submission in spite of the intentionally incorrect selections.

I'm looking forward to Google getting sued after a Waymo tries to make a right on red at a pontoon boat.


Interesting. The authors of the captcha-solving bot papers claim 100% accuracy for reCAPTCHA and 98% accuracy for hCaptcha.

That Google does this is not really a surprise, since they earn money by letting bots through (bots are then counted as humans and Google can bill for ads shown to the bot), but hCaptcha at least advertises the fact that their interest is actually detecting bots.


I personally know someone who has specifically designed bots to defeat both, and also the ones that defeat the "drag the puzzle piece" and similar "bot-defeating" technologies!


at this point I think your best bet is security through obscurity. with the state of generally available AI tools and processing power, is there any general format that can't be solved?

surely the best option nowadays is to make your own or find an obscure one, and hope it's unusual enough that ready-made software doesn't exist that can easily solve it. then if and when it gets cracked to the degree it's impacting your content, move on to another one


> with the state of generally available AI tools and processing power, is there any general format that can't be solved?

putting in a credit card?

I'm wondering if paid services have problems with bots, or if they mind, since they are being paid.


Ah! Here is the funny thing: Now your issue is card testers blowing up your dispute rates!


Putting in a CC for a trial to access your service doesn't end up deterring anybody because you can just buy a massive CSV with stolen details for relatively cheap. Even if they're frozen/cancelled, the card numbers are still valid numbers. If you want to pay to authenticate each and every one of them, you'll probably run yourself dry.

Can also just use old empty VISA gift cards, ethically sourced from relatives and friends of course.


with that, besides the issues other commenters raised, you're also putting off a very significant portion of legitimate users, who for any number of reasons may not want or be able to put a credit card into your site, even if they completely trust you


> Google's implementation, reCAPTCHA, eventually did away with much of these shenanigans to make the browser identify low-risk human users in the background, but the image verification method still pops up occasionally if risk cannot be ascertained.

Can’t remember the last time I clicked reCAPTCHA and didn’t have to do the challenge. Come to think of it, I can’t remember a single time, so if it has happened it is very rare, whereas the Cloudflare one always lets me through.


Try using a browser which resists tracking more, you'll see it. And a lot of the time you see it it's not actually solvable: the captcha system has already decided you're a bot and will just try to tarpit you with ever slower-appearing challenges which will always fail.


The worst captchas are the ones on 4chan. I don't think I've ever gotten one right on my first try. It really discourages participation from all but the most dedicated of people. I swear it was added with the ultimate goal of reducing activity and getting regulars to pay for gold.


Really? Half of /g/ was using an auto-solver on it. I think that one is still broken, but the one that slides the letters works.

When you get used to it, it doesn't seem that hard... Twitter, on the other hand, is incredibly confusing with the visual one, but the audio is easy.


I prefer the 4chan captcha to the bicycle selection, though. Once you get an eye for it, it’s almost impossible to get it wrong.


I find them easier than the stoplight clicking or puzzle piece dragging captchas, which is not saying much.

>getting users to pay for gold

Gold... a 4chan pass?


CAPTCHAs have wasted a lot of my time, which is why I choose not to enable them on my websites.


What about proof of work based CAPTCHA like https://github.com/mCaptcha/mCaptcha ? Since CAPTCHAs can be solved by bots, at least make it more costly for them.
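The appeal is that verification is the cheap half of the bargain. A generic hashcash-style check, as a sketch (this is illustrative, not mCaptcha's actual protocol):

    import hashlib

    def verify(challenge: str, nonce: str, difficulty: int) -> bool:
        # One hash for the server to verify what cost the client
        # roughly 16**difficulty hashes to find.
        digest = hashlib.sha256((challenge + nonce).encode()).hexdigest()
        return digest.startswith("0" * difficulty)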


Say, is there a Firefox extension that will solve captchas for me when I click on them? I don't need auto-solving, as in never seeing the captcha; I just don't want to figure out which of these blurry photos has palm trees and which has hills.


Sometimes the value of the action a bot is trying to perform is simply so low that even a simple obstacle is effective. Like, how much compute do you want to spend to write one spam message through a contact form?


has this not been the case for a very long time? besides a sneaky way for Google to train its models, I had the impression that Captchas are more like a way to increase the friction rather than anything that would actually stop a determined actor?

they increase the processing/energy cost/set-up time/difficulty to the point where it may no longer be profitable to access that content, but no one really thought a powerful computer with the right software couldn't actually solve them at pace, right?


Yes and no. They used to be much harder to solve, but modern AI has made them much easier in the last couple of years.

Profitability. That is what it comes down to. The price of solving these has plummeted in the last couple of years.


powerful image recognition software has been around longer than the last couple of years though, right?


I’ve noticed that in order to defeat the captcha, sometimes I have to slow down and click the crosswalks one at a time, pausing between each, then count to 3 and click submit.


Source, submitted a few days back:

https://news.ycombinator.com/item?id=37071490


"Please submit a DNA Sample to continue"



