
Suppose you, the human, are working on a clean room implementation of a C compiler. How do you go about doing it? Will you need to know about a) the C language and b) the inner workings of a compiler? How did you acquire that knowledge?




Doesn’t matter how you gain general knowledge of compiler techniques as long as you don’t have specific knowledge of the implementation of the compiler you are reverse engineering.

If you have ever read the source code of the compiler you are reverse engineering, you are by definition not doing a clean room implementation.


Claude was not reverse engineering here. By your definition no one can do a clean room implementation if they've taken a recent compilers course at university.

Claude was reverse engineering gcc. It was using it as an oracle and attempting to exactly match its output. That is the definition of reverse engineering. Since Claude was trained on the gcc source code, that’s not a clean room implementation.

> By your definition no one can do a clean room implementation if they've taken a recent compilers course at university.

Clean room implementation has a very specific definition. It’s not my definition. If your compiler course walked through the source code of a specific compiler, then no, you couldn’t build a clean room implementation of that specific compiler.


There is no specific definition of clean room implementation. Please provide a source for your claim otherwise.

There are many well known examples of clean room implementation. One example that survived lawsuits is Sony v. Connectix:

During production, Connectix unsuccessfully attempted a Chinese wall approach to reverse engineer the BIOS, so its engineers disassembled the object code directly. Connectix's successful appeal maintained that the direct disassembly and observation of proprietary code was necessary because there was no other way to determine its behavior - [0]

That practice is similar to GCC being used here to verify the output of the generated compiler, arguably even more intrusive.

[0] - https://en.wikipedia.org/wiki/Clean-room_design


“clean room implementation” is a term of art with a specific meaning. It has no statutory definition though so you’re technically right. But it is a defense against copyright infringement because you can’t infringe on copyright without knowledge of the material.

>During production, Connectix unsuccessfully attempted a Chinese wall approach to reverse engineer the BIOS, so its engineers disassembled the object code directly.

This doesn’t mean what you think it means. They unsuccessfully attempted a clean room implementation. What they did do was later ruled to be fair use, but it wasn’t a clean room implementation.

Using gcc as an oracle isn’t what makes it not a clean room implementation. Prior knowledge of the source code is what makes it not a clean room implementation. Using gcc as an oracle makes it an attempt to reverse engineer gcc, it says nothing about whether it is a clean room implementation or not.

There is no definition of “clean room implementation” that allows knowledge of source code. Otherwise it’s not a clean room implementation. It’s just reverse engineering/copying.


Again, reverse engineering is a valid use case of clean room implementation as I posted above, so you don't have a point there.

> “clean room implementation” is a term of art with a specific meaning.

What is the specific meaning you are talking about? If I set out to do a clean room implementation of some software, what do I need to do specifically so that I will prevail against any copyright infringement claims? The answer is that there is no such surefire guarantee.

Re: Sony v. Connectix, clean room is meant to protect against copyright infringement, and since Connectix was ruled not to be infringing on Sony's copyrights, their implementation is practically clean room under the law, despite all the pushback. If Connectix prevailed, I'm sure the C compiler in question would have prevailed as well if its makers were sued.

Finally, take Phoenix vs. IBM re: the former's BIOS implementation of the latter's PC:

Whenever Phoenix found parts of this new BIOS that didn't work like IBM's, the isolated programmer would be given written descriptions of the problems, but not any coded solutions that might have hinted at IBM's original version of the software - [0]

That very much sounds like using GCC as an online known-good compiler oracle to compare against in this case.

[0] - https://books.google.com/books?id=Bwng8NJ5fesC&pg=PA56#v=one...


You’re getting confused because you are substituting the goal of a clean room implementation for its definition. And you are not understanding that “clean room implementation” is one specific type of reverse engineering.

The goal is to avoid copyright infringement claims. A specific clean room implementation may or may not be successful at that.

This does not mean that any reverse engineering attempt that successfully avoids copyright infringement was a clean room implementation.

A clean room implementation is a specific method of reverse engineering where one team writes a spec by reviewing the original software and the other team attempts to implement that spec. The entire point is so that the 2nd team has no knowledge of proprietary implementation details.

If the 2nd team has previously read the entire source code that defeats the entire purpose.

> That very much sounds like using GCC as an online known-good compiler oracle to compare against in this case.

Yes and that is absolutely fine to do in a clean room implementation. That’s not the part that makes this not a clean room implementation. That’s the part that makes it an attempt at reverse engineering.


Why do you say it reverse engineered gcc instead of llvm? If you read the code, it has many more llvm concepts than gcc ones.

Because they used gcc output as a reference spec.

> you are by definition not doing a clean room implementation.

This makes no sense. Reverse engineering IS an application of clean room implementation. Citing Wikipedia:

“Clean-room design (also known as the Chinese wall technique) is the method of copying a design by reverse engineering and then recreating it without infringing any of the copyrights associated with the original design”

https://en.wikipedia.org/wiki/Clean-room_design


There are many ways to reverse engineer a piece of software.

A clean room implementation is one such method of reverse engineering.

A clean room implementation is always reverse engineering. Reverse engineering is not always done using a clean room method.



