Ask HN: Why isn't there a 100% undetectable automatable browser? (off the shelf)
10 points by throwawayadvsec on July 21, 2023 | 16 comments
I've tried plenty of automatable browsers over the years (nightmare, puppeteer+puppeteer-stealth, playwright, fakebrowser, selenium, secret-agent...)

Unless you tweak them a lot, they're all detectable.

I'm not talking about headless browsers, because they're all detectable no matter what you do (at least with timing attacks and the way things are rendered... you need to run them in a real window to make them undetectable)

Why is there not one 100% undetectable browser out of the box?

Just a regular chrome/firefox with some way to fake clicks and inputs without changing any of the behaviors of the browser besides this?

I've had multiple ideas, like running a real browser with a fake uBlock/adblock extension that gets the coordinates/values of elements, transmits those to another process, and then moves a real mouse cursor across the screen and sends keyboard events to the actual OS. Or just use OCR on the screen without doing anything to the browser. (Of course, clicks and keyboard events need to emulate the timings and movements of a human, but that's another problem entirely.)
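For the input side, a minimal sketch of that idea, assuming robotjs for OS-level mouse and keyboard events (the coordinates are hardcoded stand-ins for whatever the extension or OCR step would find):

    // Drive the OS input devices instead of the browser. Assumes robotjs;
    // coordinates are placeholders for what the extension/OCR process finds.
    import robot from "robotjs";

    const rand = (min: number, max: number) => min + Math.random() * (max - min);
    const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

    function humanClick(x: number, y: number) {
      // moveMouseSmooth interpolates the cursor path instead of teleporting it
      robot.moveMouseSmooth(x + rand(-3, 3), y + rand(-3, 3));
      robot.mouseClick();
    }

    async function humanType(text: string) {
      for (const ch of text) {
        robot.typeString(ch);
        await sleep(rand(50, 200)); // randomized inter-keystroke gap
      }
    }

    humanClick(640, 412);     // e.g. a login button located by the extension
    await humanType("hello"); // top-level await assumes an ESM entry point

The browser never sees an automation protocol; from its point of view these are ordinary OS events.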

But it would seem really painful to make those reliable.

Do you know of a package/browser that is 100% undetectable and just exactly behaves like a real browser?



Selenium, Puppeteer, Playwright, Cypress, etc. can all drive "real" browsers, although note that the default config is altered for performance/testing reasons, and this can be detected in some cases.

But assuming you're driving a real browser you're probably being "detected" because of your behaviour.

Humans are slow. They follow certain browsing patterns. They interact with the site in a certain way (scrolling, moving the mouse cursor, keyboard presses).

If a client is hitting a site many times a minute without much scrolling or mouse movement, and seemingly doing things in an unusual or systematic way, it will often trigger security measures like captchas.

What you're describing here is the product of an arms race between people who want to scrape and exploit websites using automated tools and the websites themselves who want to offer the best service possible to legitimate users.

It's partly why so much of the internet today requires people to log in and verify phone numbers / email addresses. It's also why we see captchas and other tactics to slow users down and screen out bots.

If you built something completely undetectable the bad guys would love it and sites would then need to find new ways to detect / stop whatever you're doing.


“Of course, clicks and keyboard events need to emulate the timings and movements of a human, but that's another problem entirely.”

I don’t think that’s a different problem. As you know, real browsers are operated by humans, and humans have limited processing capacity and somewhat predictable navigation patterns.

To be undetectable, you have to mimic both (almost) perfectly.

So, you have to both limit the speed at which a fake user requests pages _and_ request them in a believable sequence.

The speed issue can be circumvented by using multiple machines to make requests from, but the second is the real problem. A collection of fake users would have to either stay under the radar by not making many requests or follow human patterns.

Combined, I would think it’s not possible to efficiently scrape lots of pages in a way that cannot be detected.


From my experience, it's a completely different problem.

For 99% of websites there is no serious behavioral analysis, and when there is, it's easily bypassable.

It's as simple as:

wait a random 500-3000 ms before clicking, and always land clicks slightly off-center of the element

wait a random 50-200 ms between keystrokes

use ghost-cursor to make realistic mouse movements along bezier curves
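For illustration, a sketch of those three tricks with Puppeteer and ghost-cursor (the URL and selector are placeholders; assumes an ESM entry point for top-level await):

    // Human-ish interaction: random pre-click delay, off-center clicks via
    // ghost-cursor's bezier-curve movements, randomized inter-keystroke gaps.
    import puppeteer from "puppeteer";
    import { createCursor } from "ghost-cursor";

    const rand = (min: number, max: number) => min + Math.random() * (max - min);
    const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto("https://example.com/login");

    const cursor = createCursor(page); // moves the cursor along bezier curves

    await sleep(rand(500, 3000));    // random think time before the click
    await cursor.click("#username"); // ghost-cursor picks a point inside the
                                     // element rather than the exact center

    for (const ch of "alice") {      // type with 50-200 ms between keystrokes
      await page.keyboard.type(ch);
      await sleep(rand(50, 200));
    }

    await browser.close();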

Most of the time you get spotted because some obscure value in some random part of the browser or request headers isn't what it's supposed to be, not because your actions aren't realistic.
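For context, those checks run in page JavaScript on the site's side; a few of the well-known tells look roughly like this (not an exhaustive list):

    // A handful of classic automation giveaways:
    const signals = {
      webdriver: navigator.webdriver === true,            // set by WebDriver/CDP automation
      headlessUA: /HeadlessChrome/.test(navigator.userAgent),
      noPlugins: navigator.plugins.length === 0,          // headless builds often report none
      noLanguages: navigator.languages.length === 0,
    };
    const suspicious = Object.values(signals).some(Boolean);
    // a real site would feed `suspicious` into its captcha/blocking logic

Stealth plugins work by patching exactly these kinds of values, which is why a single unpatched one gives you away.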


Or, save yourself the hassle of pretending to be using a mouse, by pretending to be using a touchscreen.
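For instance, with Puppeteer's built-in device descriptors (a sketch; the device preset and URL are arbitrary):

    // Emulate a touch device so interactions arrive as touch events.
    import puppeteer, { KnownDevices } from "puppeteer";

    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.emulate(KnownDevices["iPhone X"]); // viewport, UA, touch support

    await page.goto("https://example.com");
    await page.tap("button"); // dispatches touch events, no mouse trail to fake

    await browser.close();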


well, it's also pretty hard to simulate a phone, arguably even harder than a desktop browser imo

for example I'm pretty sure the GPU/canvas fingerprint can't be faked
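That's because a canvas fingerprint hashes actual rendered pixels, which depend on the GPU, driver, and font stack. A minimal sketch of what fingerprinting scripts do:

    // Render text to an offscreen canvas and read the result back.
    const canvas = document.createElement("canvas");
    const ctx = canvas.getContext("2d")!;
    ctx.textBaseline = "top";
    ctx.font = "14px Arial";
    ctx.fillText("fingerprint me", 2, 2);
    const fp = canvas.toDataURL(); // pixel-exact output varies per GPU/driver/OS
    // the site hashes `fp` and checks it against what the claimed device should produce

Overriding a JavaScript property is easy; producing pixel output consistent with the hardware you claim to have is not.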


I've been thinking about this lately.

I would think that a fully automatable browser might be an accessibility need, and therefore a legitimate use whose free access may be protected under ADA regulation. Of course, you might need someone with a disability to push a legal threat in the direction of one of the offending companies in order to get them to play nicely.

I admit I don't like that approach. I would rather just find a solution that works (or perhaps build one and market it).


AppleScript + accessibility mode + Safari should get you pretty far, I'd think. AppleScript is a terrible language to work with, but if you want a real browser, you want a real browser.


interesting, but I should have probably mentioned that something scalable would be nice

Mac instances are 30 bucks a day on AWS


I thought you wanted to look like a person, so why are you running on AWS?

Pick up a used Mac somewhere for a week's worth of AWS rental and you can run it from home on residential internet.


You can use proxies from AWS, so that's not an issue.

I need to scale; using a physical machine is not feasible.


What issues are you running into with mentioned browsers? Captcha?

You likely need, in the order of importance:

- a signed-in google account

- mobile proxy

- captcha solver plugin

- randomized offsets and delays

- virtual screen, screenshot, and OCR tools for specific cases
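For illustration, a sketch of how the proxy, captcha solver, and stealth patches slot together with puppeteer-extra (the proxy host, credentials, and 2captcha token are placeholders):

    // puppeteer-extra with stealth + captcha-solver plugins, behind a proxy.
    import puppeteer from "puppeteer-extra";
    import StealthPlugin from "puppeteer-extra-plugin-stealth";
    import RecaptchaPlugin from "puppeteer-extra-plugin-recaptcha";

    puppeteer.use(StealthPlugin());
    puppeteer.use(
      RecaptchaPlugin({ provider: { id: "2captcha", token: "YOUR_API_KEY" } })
    );

    const browser = await puppeteer.launch({
      headless: false,
      args: ["--proxy-server=http://mobile-proxy.example.com:8000"],
    });
    const page = await browser.newPage();
    await page.authenticate({ username: "proxyuser", password: "proxypass" });

    await page.goto("https://example.com");
    await page.solveRecaptchas(); // added by the recaptcha plugin

The signed-in Google account and the randomized offsets/delays sit on top of this and aren't shown.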


I knew about all that besides the google account

do a lot of websites try to check if you have a google account?

The worst thing I came across was chaff bugs[0] that only happened in puppeteer.

I'm not really looking for a method to not be blocked; I'm looking for a browser controllable from code that is indistinguishable from a regular browser out of the box. There is a slight difference.

[0]: https://www.csoonline.com/article/566227/what-is-a-chaff-bug...


What is your use case for this? Is there a specific website that you're targeting?


I do a lot of scraping/automation for many different websites, so any hard-to-automate websites: websites protected with Cloudflare/DataDome, FAANGs, Microsoft websites, or websites with advanced custom protections...


It is more about the IP address range. There are apps that give users cash in exchange for using their home connectivity.

If you want 100% undetectable, use a real browser with image recognition and an auto clicker.
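For the image-recognition half, a sketch with tesseract.js over a saved screenshot (the path and target word are placeholders, and the exact output shape varies across tesseract.js major versions); the returned point would be handed to an OS-level auto clicker:

    // OCR a screenshot and return coordinates for a target word.
    import { createWorker } from "tesseract.js";

    async function findWordOnScreen(screenshotPath: string, target: string) {
      const worker = await createWorker("eng");
      const { data } = await worker.recognize(screenshotPath);
      await worker.terminate();

      // `data.words` follows the v4-style output; newer versions may require
      // enabling word-level boxes explicitly
      const word = data.words.find((w) => w.text === target);
      if (!word) return null;

      // center of the word's bounding box, in screenshot pixel coordinates
      return {
        x: (word.bbox.x0 + word.bbox.x1) / 2,
        y: (word.bbox.y0 + word.bbox.y1) / 2,
      };
    }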


IPs aren't enough in a lot of cases.

"If you want 100% undetectable, use a real browser with image recognition and an auto clicker." Do you know an open source package that does this reliably?




