Hacker News | FeepingCreature's comments

My own view is, I thought we were all agreed that the idea that Microsoft can restrict Wine from even using ideas from Windows, such that people who have read the leaked Windows source cannot contribute to Wine, was a horrible abuse of the legal system that we only went along with under duress? Now when it's our data being used, or more cynically when there's money to be made, suddenly everyone is a copyright maximalist.

No. Reading something, learning from it, then writing something similar, is legal; and more importantly, it is moral. There is no violation here. Copyright holders already have plenty of power; they must not be given the power to restrict the output of your brain forever more for merely having read and learnt. Reading and learning is sacred. Just as importantly, it's the entire damn basis of our profession!

If you do not want people to read and learn from your content, do not put it on the web.


> No. Reading something, learning from it, then writing something similar, is legal; and more importantly, it is moral.

Machines aren’t human. Don’t anthropomorphize them. The same morals and laws don’t apply.


If you want people to read and learn from each other, you should incentivize people to make content worth reading and learning from. Making LLM training a viable loophole for copyright law means there won’t be incentives to produce such work.

I don't think that's the case.

People getting better at writing is only going to increase the quality of the output.

Increasing both competition and tooling (by providing every writer with the world's greatest encyclopedia/thesaurus/line-editor/brainstormer/planner/etc) is only going to make writers better.

Will there be lots of people who misuse the system? Are there lots of people who use thesaurus words without knowing what they're talking about? Can't you tell the difference?

I see in LLMs a lowering of the ground floor making it easier for people to get in. This will increase the total availability of content.

I also see in LLMs a raising of the top bar making it harder to be the best. If more people are writing and more people are trying to be the best, the best is going to get better.

Consider chess. Have we suddenly stopped playing chess now that a phone can beat 95+% of people? No. The market is stronger than ever and still growing. The greatest players in the world use the chess algorithms to refine their play and the play keeps expanding in new and interesting ways.

In both writing and chess, yes, there is an explosion of low and middling play. But when have we not had people producing content and playing chess that, compared to the masters of the field, is generally viewed as substandard?

But here's the kicker. Some people's favorite genre is badly edited fanfic. Some people genuinely derive actual pleasure from things that you or I might call garbage. And what's wrong with that? Who am I to say that you can't love klutzy firecop loves suburban housewife paperbacks? Or Zelda/Harry Potter crossfics or whatever.


Re-reading your comment, I think we’re both generally anti-corporate-fuckery. I view the current batch of copyright pearl clutching as an argument about whether VCs are allowed to steal books to make their chatbots worth talking to, and the Wine/MSoft debate as one about whether it should be legal to engage in anticompetitive behavior through restrictive use of copyright. In both of these cases the root of the issue isn’t really copyright in the abstract; it’s the bludgeoning of the person with less money by use of overwhelming legal costs to have a day in court.

I agree that's bad at any rate. However, I genuinely think that reading and learning without literal reproduction is not (should not be) a violation of copyright and does not (should not) require an additional grant for content that has been made publicly available. I think that regardless of whether a company is the subject or the actor.

you're totally right about not being theft, but we have a term. you used it yourself, "distributed denial of service". that's all it is. these crawlers should be kicked off the internet for abuse. people should contact the isp of origin.

Firstly, since this argument is about semantic pedantry anyways, it's just denial-of-service, not distributed denial-of-service. AI scraper requests come from centralized servers, not a botnet.

Secondly, denial-of-service implies intentionality and malice that I don't think is present from AI scrapers. They cause huge problems, but only as a negligent byproduct of other goals. I think that the tragedy of the commons framing is more accurate.

EDIT: my first point was arguably incorrect because some scrapers do use decentralized infrastructure and my second point was clearly incorrect because "denial-of-service" describes the effect, not the intention. I retract both points and apologize.


ah, no fun, I was going to continue the semantic deconstruction with a whole bunch of technicalities about how you're not quite precisely accurate and you gotta go do the right thing and retract your statements.

boo. took all the fun out of it ;)


Sufficiently advanced negligence is indistinguishable from malice. There is a point where you no longer gain anything from treating them differently.

The first is incorrect: these scrapers are usually distributed across many IPs, in my experience. I usually refer to them as "distributed, non-identifying crawlers (DNCs)" when I want to be maximally explicit. (The worst I've seen is some crawler/botnet making exactly one request per IP -_-)

I think the second is incorrect too. DDoS is a DDoS no matter what the intent is.

I think one could argue that one. Is a DDoS a symptom? In which case the intent is irrelevant. Or is a DDoS an attack/crime? In which case it is. We kind of use it to mean both. But I think it's generally the latter. Wikipedia describes it as a "cyberattack", so actually I think intent is relevant to our (society's) current definition.

The semantics that make sense to me is that "DDoS" describes the symptom/effect irrespective of intent, and "DDoS attack" describes the malicious crime. But the terms are frequently used interchangeably.

> So as of today, the Copyright system does not have a way for the output of a non-human produced set of files to contain the grant of permissions which the OpenBSD project needs to perform combination and redistribution.

This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright! You can distribute it at will, not due to any sort of legal grant but simply because you have the ability and the law says nothing to stop you.


This all relies, as the article points out, on everyone looking directly at code that both looks like and works like the only extant codebase for EXT4 and nonetheless concluding that in fact the computer conjured it from the aether. If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.

Under the premise advanced in the quote, copyright is not being violated because there is none. Thus, the quote makes no sense as stated. It may be that, additionally, copyright is in fact being violated (I don't believe it myself), but if so that's a separate argument.

The premise of the quote does not contain the assumption that there is no copyright to the code. In fact the various contributors do not advance an opinion about whether code written by an AI can be granted copyright. Rather they are saying that it is obviously derivative of code that is under copyright, that is only distributed under terms which, however many dry cleaners process it, will still conflict with the license under which they publish their software.

Different people advance different arguments in the thread. The BSD argument is "we cannot distribute it because it is not copyrightable, thus we cannot put it under a BSD license." This is simply incoherent.

> Rather they are saying that it is obviously derivative of code that is under copyright

Derivatives are not subject to copyright, unless they are close to, and contain substantial verbatim copies from, the original. It's a virtual certainty that a vibe-coded Ext4 FS is none of the above.

Redefining copyright as some weird patenting of similar ideas is absurd.



> If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.

That's not the case here. A re-implemented piece of software that does not contain meaningful verbatim excerpts from the original is not subject to the copyright of the original.


that is not certain. if you read code and then reimplement it using the original code as reference, the claim has been made that this falls under the copyright of the original because the new code is derived from the old code. unfortunately this particular situation has not yet been tested in court. but clean room implementations are done specifically to avoid the risk reading the original code poses. if this was clear cut then clean room development would not be needed.

this is similar to creating an extension to some program, because the extension could not be written without the original even if the interface the extension is using is a public API. the claim has been made that the copyright of the original program applies. i think the linux kernel is an example here.

see also these questions on stackexchange:

https://softwareengineering.stackexchange.com/questions/2087...

https://softwareengineering.stackexchange.com/questions/8675...


> this is similar to creating an extension to some program

There's no such thing as "an extension to some program". A derivative work is a work that contains the original. Using the privileges provided by copyright law, the creator may impose licensing restrictions on how the original work is used - but that's contract law, not copyright.

For example, the GPL and the AGPL define different sets of use restrictions; none of that matters in this case because the original work is not being reproduced or used per se.

As I already said in my other, down-voted comment - copyright is only about verbatim, or near verbatim copies, in whole or in part - it's the spirit that both judgment and the letter of the law are supposed to follow. Copying of functionality is not subject to copyright.

For example, one can use the same topic for a work of poetry for a similar aesthetic effect and that doesn't infringe other poems.

The GPL used a hack to stretch copyright law into a near opposite but stretching it further goes into absurd territory, achieving the opposite of what the GPL claims to protect.


a kernel driver is an extension to the kernel. yet, even with a clearly defined API it is a derived work of the kernel.

> one can use the same topic for a work of poetry for a similar aesthetic effect and that doesn't infringe other poems

because the new poem does not depend on the original.

the kernel driver is useless without the kernel


> a kernel driver is an extension to the kernel. yet, even with a clearly defined API it is a derived work of the kernel.

Maybe, in some alternative universe, that could be correct but it isn't anywhere on Earth.

You can write a BSD-licensed driver as a Linux module and distribute it separately all you want - copyright law is OK with that.

The moment you insert the module into the kernel the whole thing, kernel + driver becomes a derivative work and you're forbidden from using it by the GPL - the license, not copyright... Copyright only gives the creators of the kernel the privileged power to impose that contractual restriction.

Long time ago, some BSD guys were trying to convince me that the GPL was primarily a weapon against BSD and other less restrictive licenses but I didn't believe it back then... boy, was I wrong.

You showed me how the GPL can be used for threats against the free modification of software by arguing for the addition of new, absurd powers to copyright - the opposite of what the GPL proponents are promoting it for. It's indeed a license that must be avoided at all cost.


not in an alternate universe, but it's a claim made by some free software people. i don't have time to search for a quote right now.

yes, it is disputed, and the claim has not been tested in court. but it is an argument being made.

> the GPL was primarily a weapon against BSD.

> It's indeed a license that must be avoided at all cost.

well, it depends on whose side you support. i am on the side of protecting the rights of the user to modify their software. BSD licenses don't do that. they give me the right, but they don't protect it.

more importantly, i am also on the side of the developer to protect their ability to make a living. for that the BSD license is completely useless. GPL is better, AGPL even more, but even those are not restrictive enough to prevent unfair competition by large corporations.

i am not interested in allowing those companies to benefit from my work if they are not required to pass that forward.


> but it's a claim made by some free software people.

In other words, you don't know what you're talking about... Everything I write is verifiable, have you heard of AI chat bots? Why are you going around asking old ladies for the latest gossip?

> yes, it is disputed, and the claim has not been tested in court.

Why don't you test in court? Do it, let's see what happens. Why did Linus wave middle fingers like a confused clown when Nvidia's lawyers stuffed the GPL2 with their driver? There was no lawsuit, only buffoonery in place of the promised "protection".

> but it is an argument being made.

There are millions of "arguments being made", 99.9% of them are BS, if you can't defend your arguments with facts, logic and court decisions don't waste e-space by regurgitating useless gossip, especially on HN.

> BSD licenses don't do that. they give me the right, but they don't protect it.

So, that's your reason to go on a crusade against the rights provided by BSD licenses.

Oh, that's sneaky - "Let's protect people from a license that gives them more rights than ours"

Your "protection" amounts to shilling for an absurdly extended interpretation of copyright powers while it's being sold as a defense against these very powers - this kind of diabolical nonsense is the opposite of protection.


you may want to read the discussion here: https://lwn.net/Articles/998382/ "Is the GPL actually viral across dynamic linking?"

to quote one commenter there:

> there's something socially wrong with taking someone's gift and ignoring the terms under which it was given. If you want a system where you can load any modules, use a BSD kernel. [...] If the creators of a GPL kernel label some items as an external API for anyone's use, and other items as GPL hooks for functionally internal code loaded externally, respect that.

me talking about "protection" is a call for solutions. if the interpretation of the GPL here is absurd, then the problem is not that it is wrong, but that the GPL does not provide enough protection. if you don't want that protection, fine, that's your choice. i do want that protection, and i am looking for solutions. if you are not interested in solving that problem then we don't need to continue this discussion.


in response to - https://news.ycombinator.com/item?id=47568753

> you may want to read the discussion "Is the GPL actually viral across dynamic linking?"

What does that have to do with the price of tea in China? We are talking about an independent implementation of a BSD driver for Ext4-structured storage but you keep bringing up unrelated random pieces of chatter from around the web.

> but that the GPL does not provide enough protection.

But you don't understand the difference between copyright and contract. The GPL, or any other license based on copyright, cannot prevent the creation of the driver in question because it doesn't involve any copying of the kind protected by copyright law.

> if you are not interested in solving that problem then we don't need to continue this discussion.

Except, that's not the problem we are discussing.

Indeed, there's no point in continuing this discussion, you don't understand the basics, cannot follow the line of reasoning and keep getting lost in hallucinations.


What if one reverse engineered the original logic, for example translating the assembly code into a higher level language. They didn't use or look at the original code. Does that still count as "clean room"? What's the legal difference between that and deriving the logic just from observing how the running program acts?

there is no legal precedent that clarifies what clean room development is. clean room development is a precaution to stay away as far as possible from the original code in order to reduce the risk of infringement. clearly, not looking at the assembly code is better than looking at it.

Just because you can distribute something doesn't mean you aren't violating someone else's copyright. You cannot assume that just because a language model popped out some code for you that it is clear of any other claims.

This is just lazy copyright whitewashing.


Eh … the argument will likely be that things created by Thing at the behest of Author are owned by the Author. It’ll take a few cases going through the courts, or an Act of Congress to solidify this stuff.

Just like we settled on photographers having copyright on the works created by their camera. The same arguments seem to apply.

The US Copyright Office has published a piece that argues otherwise, but a) unless they pass regulation their opinion doesn't really matter, and b) there is way too much money resting on the assumption code can be copyrighted despite AI involvement.


It's not settled. The monkey selfie copyright dispute ruled that a monkey that pressed the button to take a selfie does not and cannot own the copyright to that photo, and neither does the photographer whose camera it was. How that extends to AI generated code is for the courts to decide, but there are some parallels to that case.

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...


But with the monkey there are two levels of separation from the artist: the human makes the creative decision to hand the camera to a monkey, who presses the trigger, and the camera makes the picture. Compared to the single layer of separation of a photographer choosing framing and camera parameters, pressing the trigger and the camera taking the picture. Or the zero levels of separation when the artist paints the picture.

A programmer writing code would be like the painter, and the programmer writing a prompt for Claude looks a lot like the photographer. The prompt is the creative work that makes it copyrightable, just like the artistic choices of the photographer make the photo copyrightable

You could argue that the prompt is more like a technical description than a creative work. But then the same should probably be true of the code itself, and consequently copyright should not apply to code at all

The copyright office's argument is that the AI is more like a freelancer than like a machine such as a camera. Which you might equate to the monkey, who's also a bit freelancer-like. But I have my doubts that holds up in court. Monkeys are a lot more sentient than AIs.


The copyright office is pretty clear on this if you read: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell....

There is case law establishing that commissioning a work from another entity doesn't by itself give you co-authorship; the entity doing the work and making the creative decisions is the entity that gets copyright.

In order for you to have co-authorship of the commissioned work you have to be involved and pretty much giving instruction level detail to the real author. The opinion shows many cases that it's not the case with how LLM prompts work.

The monkey selfie case is also relevant because it solidifies that non-persons cannot claim copyright. That means the LLM cannot claim copyright, and therefore it does not have copyright that can be passed on to the LLM operator.


The law is whatever it needs to be to satisfy monied interests, with the degree of acceptable adaptation being a function of the unity of those interests and the political ascendancy of those in favor.

Overwhelmingly this is in favor of treating ai as a tool like Photoshop.

Even those against AI disagree on different matters and will overwhelmingly want a cut not a different interpretation.


This filesystem driver was made by a human using AI, not a monkey.

Haven't there already been a few cases, each of which found that mechanically-produced works are not copyrightable?



> This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright!

This opinion is simplistic. LLMs are trained with pre-existing content, and their output directly reflects their training corpus. This means LLMs can generate output that matches verbatim existing work. And that work can very well be subjected to copyright.


Language models are good at translation and retrieval. This also extends to computer languages. LLMs translate from GPL to other licenses the same way Google Translate turns French to English, except that the source material is implicitly stored in the LLM.

this is disputed. see my comment here, especially the stackexchange links: https://news.ycombinator.com/edit?id=47557250

fun fact, you can kill all firefox background processes and basically hand-crash every tab and just reload the page in the morning. I do this every evening before bed. `pkill -f contentproc` and my cpu goes from wheezing to idle, as well as releasing ~8gb of memory on busy days.

("Why don't you just close firefox?" No thanks, I've lost tab state too many times on restart to ever trust its sessionstore. In-memory is much safer.)
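For anyone who wants to script that nightly cleanup, here is a minimal sketch in Python. It only wraps the `pkill -f contentproc` command quoted above; the function names and the dry-run default are assumptions added for illustration, and it presumes a system where `pkill` is on the PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(pattern: str = "contentproc") -> List[str]:
    """Return the pkill invocation; -f matches against the full command line."""
    return ["pkill", "-f", pattern]


def kill_content_processes(dry_run: bool = True) -> Optional[int]:
    """Kill processes whose command line matches the pattern.

    With dry_run=True (the default, a hypothetical safety switch) the command
    is only printed. Otherwise pkill's exit status is returned: 0 if at least
    one process matched, 1 if none did.
    """
    cmd = build_command()
    if dry_run or shutil.which("pkill") is None:
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    kill_content_processes(dry_run=True)
```

Call `kill_content_processes(dry_run=False)` to actually send the signal. Every open tab will show the crash page until it is reloaded, as described above.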


Yeah, I found this out the other day when my laptop was toasting. In hindsight, probably related to archive.today or some Firefox extension.

You have to close Firefox every now and then for updates though. The issue you describe seems better dealt with on filesystem level with a CoW filesystem such as ZFS. That way, versioning and snapshots are a breeze, and your whole homedir could benefit.


FWIW: the Tab Stash extension has worked well for me.

"your favorite band's favorite band"

This is how you end up with a Big Star album.

that's Pile!

Source: a bad study from 2023.


Because they exclusively used a model that was about as big as the original GPT-2.

Which, I mean, fair enough within these constraints, but it's cited like it's a universal law.

Really all that can be taken away from the study is "we trained a very small model on data generated from it in a particular way, and this was eventually harmful for the model."

Also note that models are nowadays trained on massively self-generated data (task RL post-training) and it seems to significantly improve their performance.


The whole problem with wayland is this mistaken absurd belief that the security standards of a desktop are equivalent to those of a website.

LLMs massively reduce the cost of "let's just try this". I think trying to migrate your entire repo is usually a fool's errand. Figure out a way to break the load-bearing part of the problem out into a sub-project, solve it there, iterate as much as you like. Claude can give you a test gui in one or two minutes, as often as you like. When you have it reliably working there, make Claude write up a detailed spec and bring that back to the main project.


Claude is surprisingly good at GUI work, I've been learning: not just getting stuff working but also creating reasonably tasteful and practical designs. Asking claude in the browser to mock up a GUI and then having claude code implement it is a surprisingly powerful workflow.

I’m far away from a web developer or a web designer. But I think I intuitively understand how to put myself in the shoes of the end user when it comes to UX.

I noticed that Claude is awful at understanding what makes good UX, even for something as simple as this: if you have a one-line input box and a button that submits the line of text, you should wire it up so a user can press return instead of pressing the button. The same goes for making sure users can tab through inputs in a decent order.


yeah as it's not using its own flow you have to give it a bit of feedback. so it goes with any dev work... I think you underestimate how bad programmer uis are.

the goal would be to write it a reusable prompt. this is what AGENT.md is for.


IMO the biggest omission in Unicode is game controller button and keyboard emojis, which arise very frequently in game tutorials.


U+20E3 COMBINING ENCLOSING KEYCAP


Oh shit I didn't know! Amazing! Is there a COMBINING ENCLOSING BUTTON too? We'd also need SHOULDER, SHOULDER BIG and THUMBSTICK, then we'd have something.


Most games aren't shipping with full-fat unicode support or typefaces that could display those icons, though. Plus it'd start to break down with controllers that aren't simple A/B/X/Y.


By "game tutorials", I think they mean modern successors to the role GameFAQs used to play.

There is a combining character that, by its description, sounds like it should be implemented to do the desired thing (U+20DD Combining Enclosing Circle), but my fonts don't render it very well when I stuff geometric characters matching the PlayStation buttons into it.

Without spaces: △⃝□⃝×⃝○⃝

With two spaces between each one so you can see how "enclosing" is getting interpreted: △⃝ □⃝ ×⃝ ○⃝
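To reproduce those sequences yourself, here is a small Python sketch that appends an enclosing combining mark to each base character and prints the code points involved. The helper names `enclose` and `describe` are made up for illustration; the code points are the standard ones (U+20DD, U+20E3, and the geometric shapes used above). How the result renders still depends entirely on your font.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE
ENCLOSING_KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

# White up-pointing triangle, white square, multiplication sign, white circle
PLAYSTATION = "\u25B3\u25A1\u00D7\u25CB"


def enclose(base: str, mark: str = ENCLOSING_CIRCLE) -> str:
    """Append the enclosing combining mark after every base character."""
    return "".join(ch + mark for ch in base)


def describe(text: str) -> list:
    """List each code point with its official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    print(enclose(PLAYSTATION))                             # without spaces
    print("  ".join(enclose(ch) for ch in PLAYSTATION))     # two spaces apart
    print("\n".join(describe(enclose("A", ENCLOSING_KEYCAP))))
```

Both marks have general category `Me` (enclosing mark), so a renderer is supposed to draw them around the preceding character. Whether it sizes the circle sensibly is up to the font and shaping engine.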

For the Markdown renderer I'm working on to replace WordPress for my blog, I resorted to shortcodes which resolve to CSS styling the `<kbd>` tag with `title` attributes to clarify and the occasional bit of inline SVG for things where I didn't want to specify a fixed font to get sufficient consistency, like PlayStation button glyphs.

https://imgur.com/a/1EPm7QV
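As a rough idea of what such a shortcode pass can look like, here is a sketch in Python. The `{{key:LABEL|TITLE}}` syntax and the `render_keys` name are invented for this example and are not the syntax of the renderer described above; the sketch only shows the general shape of turning a shortcode into a `<kbd>` tag with an optional `title` attribute.

```python
import html
import re

# Hypothetical shortcode: {{key:LABEL}} or {{key:LABEL|TITLE}}
SHORTCODE = re.compile(r"\{\{key:([^|}]+)(?:\|([^}]*))?\}\}")


def render_keys(text: str) -> str:
    """Replace key shortcodes with <kbd> elements, escaping label and title."""

    def repl(match):
        label = html.escape(match.group(1).strip())
        title = match.group(2)
        if title:
            # quote=True also escapes quotes so the attribute stays well-formed
            safe_title = html.escape(title.strip(), quote=True)
            return f'<kbd title="{safe_title}">{label}</kbd>'
        return f"<kbd>{label}</kbd>"

    return SHORTCODE.sub(repl, text)


if __name__ == "__main__":
    print(render_keys("Press {{key:X|Cross}} to jump, {{key:Esc}} to pause."))
```

Styling the resulting `<kbd>` elements (or swapping the label for inline SVG) would then be handled in CSS or a later pass.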

(In all fairness, it's a nerd-snipe made based on the idea that I'll be more willing to blog about things I have nice tools for. I don't currently typeset button presses in any form.)


I was thinking of ingame tutorials, but now that you mention it GameFAQs and forums would be a great usecase.


*nod* As-is, we're stuck with hacks like custom shortcodes and emoji.

...though, given the inconsistent naming of consistently laid-out buttons, I think anything that makes its way into Unicode should include something that follows the lead of what Batocera Linux does on their Wiki and with custom emojis in their Discord.

See https://wiki.batocera.org/configure_a_controller for an example of how they look inline but the gist is that it's an outline of the SNES-originated diamond of action buttons that pretty much everyone but Nintendo uses these days and which is embodied in XInput and the SDL Gamepad API, with one of the circles filled in to represent the button in question.


With more and more players expecting emoji support in text entry boxes, more games are starting to ship with full Unicode support. Also, I've noticed the optional ligatures that OpenType supports have become a big style thing in certain game genres/by certain studios. HarfBuzz's "Old MIT" license shows up more often in the credits of games and game engines.

I don't know if that's a good reason to push to standardize controller glyphs to Unicode, though.

(ETA: Plus the other obvious reason more games and game engines are bringing in full Unicode support is localization, especially for Arabic and CJK. We're a bit past the point where AAA games only feel a need to support EFIGS. The Middle East and Asia are huge markets, especially for mobile games.)

