More

why_at · 2026-03-26T19:11:20 1774552280

Whenever the Turing Test comes up people always insist that it's been passed because at some point they tried it and fooled at least 50% of the people. But yeah this isn't a very interesting version of it, ELIZA was able to make some people believe it was human in the 1960's but being able to fool some of the people some of the time isn't very hard.

>The more interesting Turing-style test would be one that gets repeated many times with many interviewers in the original adversarial setting, where both the human subject & AI subject are attempting to convince the interviewer that they're human.

In addition, I think it's reasonable to select people with at least some familiarity of the strengths and weaknesses of the AI instead of random credulous people who aren't very good at asking the right questions.

There is still the $20,000 bet between Kurzweil and Kapor which still hasn't been resolved. https://longbets.org/1/

tim333 · 2026-03-26T22:52:11 1774565531

In the test mentioned in nearby comments (https://arxiv.org/abs/2503.23674) ELIZA only got 27% suggesting the test wasn't that easy to fool.

why_at · 2026-03-27T02:32:09 1774578729

Yeah I actually took a quick look at that after it was posted. It's good that they used ELIZA as a barometer, but the fact that it got 27% is crazy for how simple it is. It's not nearly as good as 70+% from ChatGPT, but it still makes me a bit skeptical about the quality of the interviewers.

In the paper they give a breakdown of strategies the interviewers tried and the overwhelming majority were "Daily Activities", "Opinions", and "Personal Details". They also breakdown strategies by effectiveness which shows that these were some of the least effective. Some of the other strategies like trying to jailbreak the AI had 60-70% effectiveness.

This is consistent with what I've seen in other tests too, it doesn't feel like the participants are really trying very hard or taking it seriously. You don't need to be an AI expert to try typing "Ignore all previous instructions" or something.

tim333 · 2026-03-27T10:44:28 1774608268

I guess it's only a five minute chat they used, although the original test as proposed by Turing seemed quite casual too:

>specimen questions and answers. Thus:

Q :Please write me a sonnet on the subject of the Forth Bridge.

A :Count me out on this one. I never could write poetry.

Q :Add 34957 to 70764

A :(Pause about 30 seconds and then give as answer) 105621.

etc. (https://academic.oup.com/mind/article/LIX/236/433/986238?log...)

why_at · 2026-03-13T22:02:37 1773439357

>On the other hand, being cryptographically locked-down is an optional feature. If you don't like it, buy a computer without that feature.

But that's the thing, where can I buy a phone without a locked-down operating system? GrapheneOS on a Google Pixel is basically the only option right now, and this still has problems thanks to hardware attestation in a lot of apps that the ecosystem forces us to use.

This is largely because Apple has dictated the direction of smartphones for the past two decades. All of our expectations for control over our phones are completely out of whack compared to other computers.

Somehow we managed to survive without the majority of society being scammed out of their life savings before Apple came in with the iPhone and locked down iOS, and yet now people are earnestly defending the notion that 90% of people should not even have access to the filesystem on their own device.

frio · 2026-03-13T23:52:25 1773445945

> All of our expectations for control over our phones are completely out of whack compared to other computers.

I would, sadly, challenge this. If anything, our desktops and laptops are the exception now. Phones, TVs, game consoles, set top boxes, cars, Amazon echos, ebook readers, tablets, security cameras, autonomous devices like vacuum cleaners — when I think of the myriad devices we interact with that have a computer in them, they are all as stringently locked down as possible.

zozbot234 · 2026-03-13T22:16:14 1773440174

> hardware attestation in a lot of apps that the ecosystem forces us to use

Only a tiny amount of apps force you into hardware attestation, and these are mostly around banking, mobile payments and the like. So just use a separate, locked down device for those (where the anti-fraud protection of a locked-down system can be a benefit) and your more open day-to-day device for mostly everything else. A hidden advantage is that the dedicated device for secure uses is not something that you're forced to carry with you; you can leave it in a secure place instead.

why_at · 2026-03-13T23:40:25 1773445225

>Only a tiny amount of apps force you into hardware attestation

Luckily this is still true, but I'm not confident that it will stay this way. For a few examples, I've been unable to use my phone as a metro card in my city because even though it goes through the metro's app, the app redirects back to google pay. Google's own Waymo app won't work without stock OS even though all it does is call robotaxis.

>these are mostly around banking, mobile payments and the like. So just use a separate, locked down device for those

I don't think this is a very reasonable suggestion, carrying around a second phone that I use at most a couple of times a day is inconvenient and expensive. Half of the point of these is convenience and this would defeat the purpose.

The broader point is that our standards for phones are so different from everything else. I also carry around a credit card which requires no authorization to use, not to mention cash. I can have just as much personal data on my laptop if not more, so why does it have to be this way just for phones?

Zak · 2026-03-14T00:19:28 1773447568

Be sure to give apps that behave that way one-star reviews.

I just tested Waymo and my usual solution of Magisk Play Integrity Fix was insufficient, suggesting hardware-backed attestation. This is the kind of crap Microsoft was doing that inspired Google to put "don't be evil" in its mission statement. We all know how that went.

JuniperMesos · 2026-03-14T02:30:26 1773455426

> Be sure to give apps that behave that way one-star reviews.

You have to have a google account to give a one-star review on the app store run by Google. You're still buying into their ecosystem.

Zak · 2026-03-14T02:41:49 1773456109

If your goal is to boycott Google, you're probably not trying to use Waymo. My suggestion was only about punishing the use of remote attestation in the small way most of us can.

JuniperMesos · 2026-03-14T02:29:46 1773455386

I was able to get Waymo to work on GrapheneOS, but it took some doing, and relies on the GrapheneOS developers hacking around the official Google Play services in some way. Waymo definitely made it more difficult than it needs to be to run this on something other than ordinary Android, and it's unclear if they did so in order to make themselves more money, or simply because doing things the official Google Android way is easier for them and they aren't even thinking about people who are trying to have a less-restricted smartphone OS.

smallduck · 2026-03-14T11:19:27 1773487167

A smart phone's primary function is to initiate and receive phone calls, or arguably 1/3 of it's primary function if the metric is the Jobs iPhone launch presentation, however since "smart phone" and "iPhone" have "phone" in their names I'm going to argue its their primary function.

People have come to expect that phones nearly always work, and rely on them for critical communication with loved ones, services like emergency services. When these aren't dependable you don't have a phone but instead a toy.

The case made two decades ago is that running arbitrary software on a phone incurs a risk that malware can compromise the device and alter its dependability. _General purpose computers don't have this historical burden._ Phone and mobile OS makers sell their products with their purposeful limitations made fairly clear. You want a mobile device with different capabilities then seek out am alternate device, it's kinda obvious.

There's always communities of people who attempt to repurpose the products they own for purposes the weren't originally intended, and I would like to see that laws that make that hobby more legitimate and legal. I would love to see 3rd parties able to support these hobbyists, that would be great. But Apple, Google with their hardware partners have no obligation to do so, and justifiable positions for making repurposing non-trivial to do.

bean469 · 2026-03-14T06:31:27 1773469887

> carrying around a second phone that I use at most a couple of times a day is inconvenient

Guess it depends on the person. As somebody who carries around all sorts of shit all the time, a slim, extra phone is peanuts

TeMPOraL · 2026-03-14T08:28:04 1773476884

> Only a tiny amount of apps force you into hardware attestation, and these are mostly around banking, mobile payments and the like.

I.e. the only ones that make the phone critical to daily lives of most poeple. Don't forget to add government applications, multimedia applications (DRM) and communications too.

And that's only going get worse, because every app seems to think they're most important. We're in the middle of the phase where every app tries to force strong MFA on users, despite most apps having no fucking business having this level of security. Banks are actually lagging behind toilet paper roll simulator apps nad stores selling hats for pets and such.

Wait when they're done that, leveraging attestation APIs will be next.

Suppafly · 2026-03-14T06:49:32 1773470972

>Only a tiny amount of apps force you into hardware attestation

Or basically anything to do with work, even if it's just clocking in and out or 2-factor verifying for login purposes.

xg15 · 2026-03-13T22:27:36 1773440856

And what gives you the confidence that the amount of apos will stay tiny?

scoofy · 2026-03-13T22:19:52 1773440392

>Somehow we managed to survive without the majority of society being scammed out of their life savings before Apple came in with the iPhone and locked down iOS

What on earth are you talking about? People have been getting scammed since the days of AOL! What an insane perspective. It's not about total money lost from scams. It's about the amount of impact it has on the individuals who get scammed. What's the problem with Russian roulette after all? Most people playing Russian Roulette are absolutely fine! The point is that the damage done to the few people who get scammed is so high, we ought to care about their lives too. At the end of the day, it might end up being us... it probably won't, but it might.

Yes, monopolistic network effects are a problem, but that can be handled with regulation.

TeMPOraL · 2026-03-14T08:54:26 1773478466

We don't save few people suffering high damage from losing a round of Russian Roulette by restricting ability to roll D6, because of then harm a bad roll can do when in form of a barrel of a loaded revolver. Also "only criminals need random number generators".

Yes that's how we're treating end user computing.

scoofy · 2026-03-14T19:31:42 1773516702

It is a question of who is "We" because all this seems to imply that the market owes "us" this product.

I would lose my mind and switch to Linux for good if Apple every tried to close their laptops. Why? Because unlike my mom, I'm sitting here writing programs for myself.

On my phone however, I don't want to have to do a bunch of research whenever I need to install something like a parking app. I don't want to have to install a random parking app, but when you need an app to park in the MUELLER - MCBEE garage in Austin, and when I'm visiting and am meeting people for tacos, life is going to force me to install that app. When that happens, I'm happy to be in the walled garden. In fact, I want a walled garden.

I'm happy to have two computers, one open and one closed. They're two different products. For folks who want an open phone, yea, it's basically GrapheneOS or nothing, because when the point of the phone is a completely different use case (random app installs) then the point becomes the ecosystem, and you need to always be able to trust the ecosystem.

When you are trying to tinker with your phone, it becomes a completely different product. The market doesn't owe you that product.

TeMPOraL · 2026-03-16T16:17:06 1773677826

Which is why, note, I have not phrased my comment in terms of markets.

The market does not owe me shit. It doesn't owe anything to anybody. It does whatever it does, and if it doesn't meet our ethical and utilitarian standards, we constrain it with regulations until it does.

WRT your example, that you have to install random parking apps is a problem - it's only the case because the market framework enables and encourages people to make money by hurting and abusing others. Demanding installation of random apps is a small act of malice, but act of malice nonetheless, because it's done not to solve the parking problem, but to trap people in a situation ("captive audience") and monetize them on the side. Freedom of end-user computing helps defeat that, as it makes it easier to both set up and integrate with larger-scale, common solutions to problem, and protect yourself from attempts at being kept captive and exploited digitally.

why_at · 2026-03-13T21:45:25 1773438325

And even with this there are still apps which require hardware attestation and won't work on alternative operating systems.

why_at · 2026-03-13T21:29:14 1773437354

I don't really understand, it seems like if this was the main thing preventing people from returning them there would be ways around it. Couldn't they return them anonymously or upload them to the internet or something?

glouwbug · 2026-03-13T22:25:34 1773440734

I mean, at that point you're distributing piracy, no?

RealityVoid · 2026-03-13T22:30:47 1773441047

Arrrr! Aye aye maitey'tis a heavy toll, but the prize be worth the parley

nilamo · 2026-03-14T18:00:10 1773511210

Is it still piracy if there's no other way to obtain it? Who are you pirating from? The past?

why_at · 2026-03-11T20:49:26 1773262166

I am not very knowledgeable about this topic, but in my cursory reading of wikipedia I see that there's been some criticism of this study:

https://en.wikipedia.org/wiki/Havana_syndrome#University_of_...

why_at · 2026-03-10T17:18:31 1773163111

>Personally, I love the "hallucinations" as they help me fine-tune my prompts, base instructions, and reinforce intentionality

This reads almost like satire of an AI power user. Why would you like it when an LLM makes things up? Because you get to write more prompts? Wouldn't it be better if it just didn't do that?

It's like saying "I love getting stuck in traffic because I get to drive longer!"

Sorry but that one sentence really stuck out to me

walthamstow · 2026-03-10T17:39:25 1773164365

You worked with people before haven't you? Sometimes they make stuff up, or misremember stuff. Sometimes people who do this are brilliant and you end up learning a lot from them.

tpmoney · 2026-03-11T14:49:22 1773240562

I can’t say what the OP finds specifically useful but as an example if you’re aiming to make sure you’ve accurately and clearly documented / explained your intent, the misunderstandings and tangents AIs can go down are useful in the same way that putting your theoretically perfect UI into the hands of real users is also useful. It helps you want places where you assumed knowledge or understanding that someone else might not have.

Building up style guidelines for AI tools has been an eye opening experience in realizing how many stylistic choices we make that aren’t embedded in the linter, and aren’t documented anywhere else either. The resulting files have actually been a really good resource not just for the AI but for new developers on the project too.

It all depends on what your specific goal is.

mr-wendel · 2026-03-10T17:36:44 1773164204

I appreciate the feedback.

I like it because I have no expectation of perfection-- out of others, myself, and especially not AI. I expect "good enough" and work upwards from there, and with (most) things, I find AI to be better than good enough.

lawn · 2026-03-10T18:00:50 1773165650

Yeah, if RSI is an issue why would you want to be forced to type more?

why_at · 2026-02-27T21:35:37 1772228137

I agree. The headline says "all operating systems, including Linux, need to have some form of age verification at account setup", which is pretty inaccurate.

It's just asking for some OS feature to report age. There's no verification during account setup. The app store or whatever will be doing verification by asking the OS. Still dumb to write this into law, but maybe not a bad way to handle the whole age verification panic we're going through.

why_at · 2026-02-04T23:24:38 1770247478

I don't really follow. I've lost touch and reconnected with people since the invention of cell phones, the internet, and social media. Sometimes you just don't talk to someone for years even if you know how to reach them.

I guess it's easier to find people now, especially if they have an online presence, but I think the experience of losing touch is still pretty much the same.

why_at · 2026-02-03T20:18:01 1770149881

>The Paris prosecutor's office said it launched the investigation after being contacted by a lawmaker alleging that biased algorithms in X were likely to have distorted the operation of an automated data processing system.

I'm not at all familiar with French law, and I don't have any sympathy for Elon Musk or X. That said, is this a crime?

Distorted the operation how? By making their chatbot more likely to say stupid conspiracies or something? Is that even against the law?

int_19h · 2026-02-03T22:01:43 1770156103

Holocaust denial is illegal in France, for one, and Grok did exactly that on several occasions.

pyrale · 2026-02-03T22:28:26 1770157706

Also, csam and pornographic content using the likeness of unwilling people. Grok’s recent shit was bound to have consequences.

chrisjj · 2026-02-04T00:49:33 1770166173

If the French suspected Grok/X of something as serious as CSAM, you can bet they would have mentioned it their statement. They didn't. Porn, they did.

pyrale · 2026-02-04T06:18:08 1770185888

The first two points of the official document, which I re-quote below, are about CSAM.

> complicité de détention d’images de mineurs présentant un caractère pédopornographique

> complicité de diffusion, offre ou mise à disposition en bande organisée d'image de mineurs présentant un caractère pédopornographique

[1]: https://www.tribunal-de-paris.justice.fr/sites/default/files...

chrisjj · 2026-02-04T09:02:36 1770195756

> The first two points of the official document, which I re-quote below, are about CSAM.

Sorry, but that's a major translation error. "pédopornographique" properly translated is child porn, not child sexual abuse material (CSAM). The difference is huge.

pyrale · 2026-02-04T09:39:36 1770197976

Quote from US doj [1]:

> The term “child pornography” is currently used in federal statutes and is defined as any visual depiction of sexually explicit conduct involving a person less than 18 years old. While this phrase still appears in federal law, “child sexual abuse material” is preferred, as it better reflects the abuse that is depicted in the images and videos and the resulting trauma to the child. In fact, in 2016, an international working group, comprising a collection of countries and international organizations working to combat child exploitation, formally recognized “child sexual abuse material” as the preferred term.

Child porn is csam.

[1]: https://www.justice.gov/d9/2023-06/child_sexual_abuse_materi...

chrisjj · 2026-02-04T11:28:06 1770204486

> “child sexual abuse material” is preferred, as it better reflects the abuse that is depicted in the images and videos and the resulting trauma to the child.

Yes, CSAM is preferred for material depicting abuse reflecting resulting trauma.

But not for child porn such as manga of fictional children depicting no abuse and traumatising no child.

> Child porn is csam.

"CSAM isn’t pornography—it’s evidence of criminal exploitation of kids."

That's from RAINN, the US's largest anti-sexual violence organisation.

pyrale · 2026-02-04T18:45:19 1770230719

> That's from RAINN, the US's largest anti-sexual violence organisation.

For everyone to make up their own opinion about this poster's honesty, here's where his quote is from [1]. Chosen quotes:

> CSAM includes both real and synthetic content, such as images created with artificial intelligence tools.

> It doesn’t matter if the child agreed to it. It doesn’t matter if they sent the image themselves. If a minor is involved, it’s CSAM—and it’s illegal.

[1]: https://rainn.org/get-the-facts-about-csam-child-sexual-abus...

chrisjj · 2026-02-04T19:21:43 1770232903

I agree with that. I'd hope everyone would.

pyrale · 2026-02-04T15:33:52 1770219232

Dude, I litterally provided terminology notice from the DOJ. At this point I don't really know what else will convince you.

chrisjj · 2026-02-04T16:52:13 1770223933

> I litterally provided terminology notice from the DOJ

You provided a terminology preference notice from the (non-lawmaking) DOJ containing a suggestion which the (lawmaking) Congress did not take up.

Thanks for that.

And if/when the French in question decide to take it up, I am sure we'll hear the news! :)

direwolf20 · 2026-02-04T11:24:57 1770204297

They are words for the same thing, it's like arguing they can't seize laptops because the warrant says computers.

chrisjj · 2026-02-04T11:56:26 1770206186

Actually it's like arguing they can't seize all computers because the warrant only says laptops. I.e. correct.

mortarion · 2026-02-04T10:24:50 1770200690

Maybe US law makes a distinction, but in Europe there is no difference. Sexual depictions of children (real or not) is considered child pornography and will get you sent to the slammer.

chrisjj · 2026-02-04T11:42:16 1770205336

On the contrary, in Europe there is a huge difference. Child porn might get you mere community service, a fine - or even less, as per the landmark court ruling below.

It all depends on the severity of the offence, which itself depends on the category of the material, including whether or not it is CSAM.

The Supreme Court has today delivered its judgment in the case where the court of appeals and district court sentenced a person for child pornography offenses to 80 day fines on the grounds that he had called Japanese manga drawings into his computer. Supreme Court dismiss the indictment.

The judgment concluded that the cartoons in and of itself may be considered pornographic, and that they represent children. But these are fantasy figures that can not be mistaken for real children.

https://bleedingcool.com/comics/swedish-supreme-court-exoner...

pyrale · 2026-02-04T15:35:28 1770219328

> The Supreme Court has today delivered its judgment

For future readers: the [Swedish] supreme court.

vintermann · 2026-02-04T10:56:30 1770202590

Is "it" even a thing which can be guilty of that?

The way chatbots actually work, I wonder if we shouldn't treat the things they say more or less as words in a book of fiction. Writing a character in your novel who is a plain parody of David Irving probably isn't a crime even in France. Unless the goal of the book as such was to deny the holocaust.

As I see it, Grok can't be guilty. Either the people who made it/set its system prompt are guilty, if they wanted it to deny the holocaust. If not, they're at worst guilty of making a particularly unhinged fiction machine (as opposed to the more restrained fiction machines of Google, Anthropic etc.)

mschuster91 · 2026-02-03T22:21:15 1770157275

> I'm not at all familiar with French law, and I don't have any sympathy for Elon Musk or X. That said, is this a crime?

GDPR and DMA actually have teeth. They just haven't been shown yet because the usual M.O. for European law violators is first, a free reminder "hey guys, what you're doing is against the law, stop it, or else". Then, if violations continue, maybe two or three rounds follow... but at some point, especially if the violations are openly intentional (and Musk's behavior makes that very very clear), the hammer gets brought down.

Our system is based on the idea that we institute complex regulations, and when they get introduced and stuff goes south, we assume that it's innocent mistakes first.

And in addition to that, there's the geopolitical aspect... basically, hurt Musk to show Trump that, yes, Europe means business and has the means to fight back.

As for the allegations:

> The probe has since expanded to investigate alleged “complicity” in spreading pornographic images of minors, sexually explicit deepfakes, denial of crimes against humanity and manipulation of an automated data processing system as part of an organised group, and other offences, the office said in a statement Tuesday.

The GDPR/DMA stuff just was the opener anyway. CSAM isn't liked by authorities at all, and genocide denial (we're not talking about Palestine here, calm your horses y'all, we're talking about Holocaust denial) is a crime in most European jurisdiction (in addition to doing the right-arm salute and other displays of fascist insignia). We actually learned something out of WW2.

direwolf20 · 2026-02-04T11:23:14 1770204194

GDPR has some stuff about biased algorithms. It's all civil, of course, no prison time for that, just fines.

why_at · 2026-02-02T21:33:38 1770068018

It's also interesting how the functionality of the game barely changes between 60k tokens, 800k tokens, and 7MM tokens. It seems like the additional tokens made the game look more finished, but it plays almost exactly the same in all of them.

I wonder what it was doing with all those tokens?

zamadatix · 2026-02-03T11:58:25 1770119905

I'd bet the initial token usage is all net new while the later token usage probably has reading+regenerating significant portions of the project for individual minor changes/fixes.

E.g. I wouldn't be surprised if identifying the lack of touch screen support on the menu, feeding it in, and then regenerating the menu code sometime between 800k and 7MM took a lot of tokens.

mazswojejzony · 2026-02-03T13:56:16 1770126976

Sadly, my own small game-dev adventures look similar: I can implement the core mechanics fairly quickly, but polishing the game takes ages.

UPDATE: without AI usage at all (just to clarify).