Hacker News
Wikileaks To Leak 5000 Open Source Java Projects (steve-yegge.blogspot.com)
487 points by rimantas on July 28, 2010 | 126 comments


In a global header, a long time ago, a friend of mine (a big C lover) once did this to his best buddy (a C++ and especially Boost lover):

#define class struct

#define private public

#define protected public


This gets even more fun in languages which allow unicode names. Did you know there are dozens of identical-looking ways to spell "true"?


Continued "fun" (read: dangerous) redefines:

  #define true false
  #define false true

  #define true 0
  #define false !true

  #define if while
  #define continue break
:)


  #define true false
  #define false true
Did you know that if you do both of those, they cancel each other out? True story.


My head hurts. Explanation, please?


There's a rule in the preprocessor to prevent infinite recursion: while a macro is being expanded, its own name is not expanded again. When it encounters 'true', it replaces it with 'false'. That 'false' is in turn replaced with 'true'. But 'true' is already being expanded, so it throws up its hands and leaves it alone.

The consequence of this rule is that cycles of #defines don't do anything. Edit: Well, that's only true if there aren't other symbols introduced during expansion. Here's a counterexample that prints 3:

   #include <stdio.h>

   int x = 1;

   #define x 1 + y
   #define y 1 + x

   int main(int argc, char *argv[])
   {
     printf("%d\n",x);
     return 0;
   }


Ah .. Makes sense. Thanks.


Dunno about explanation, but experiment shows it to work at least with two preprocessors - Microsoft's and GCC:

First make a file a.c that has in it:

  #define true false
  #define false true
  int A = true;
  int B = false;

Then from the command line, run just the preprocessor (with the Microsoft compiler):

  D:\>cl -E a.c
  int A = true;
  int B = false;

Now with gcc (I'm using cygwin's gcc):

  D:\>sh -c "gcc -E a.c"
  int A = true;
  int B = false;

I've stripped the other stuff.


  #define TRUE random()%2


You are truly devious


#define struct union


That is by far the evilest of the bunch. }:-D


I have an innate fascinated horror of the C preprocessor already, but this really drives home exactly how horrible it can be!

(While being quite funny, mind you!)


Immense power and immense horror go hand in hand.


"With Great Power Comes Great Responsibility" :)


I once opened Squeak and said "True := False"

The image halted immediately.


"If I buy you a house and put the title in your name, but I mark some of the doors 'Employees Only', then you're not allowed to open those doors, even though it's your house. Because it's really my house, even though I gave it to you to live in."

Love it!


This actually reminded me of Apple...


The difference between phones, and all other products, is that phones don't work unless you're currently in a business relationship with some telecom or another. It's a bit more like selling someone a nuclear reactor—they're beholden to your rules as long as they have no other source of fissile materials.


In 15 minutes, with 15 euros, I could be out the door and have a new 'business relationship' with a different telecom provider. It's not that big a deal: you just pop in a new sim card.

Sure, in the US people have all these locked up phones, but that's the price you pay for getting subsidized hardware.


Indeed, that is the elephant in the corner of the room. One might almost say the typical consumer doesn't understand the difference between free as in speech and free as in beer...


Not to mention the higher rates, thanks to virtually zero regulation of the telcos here in the US.


Higher rates, my foot. Cell phones in Switzerland and Germany are crazy expensive. It costs people like 0.20 cents to call someone with a phone, and I pay like 0.25 cents or something. When I had a phone in Germany, I used to spend upwards of 90 euros a month calling like I was used to in the States, where I basically have as much calling as I need for $50 (including tax and all that sneaky bullshit). Cell phones in Europe are a little cheaper if you don't talk much, but if you really want to use one as a primary phone, they are insanely expensive. That's why everybody texts like crazy here, but even texts are expensive.


How much do you call to get your bill up to 90 euros?

In Finland, 3000 minutes of talk time and 3000 text messages costs about 38 € on a major operator. You can often get a switcher discount to bring the price further down.


Is that 0.25 cents (as written) or 25 cents ?


I hate it when I do that! Obviously 25 cents. http://verizonmath.blogspot.com/2006/12/verizon-doesnt-know-...


He probably meant 0.25 hectacents :)


Having worked on both telephones and nuclear reactors, I can say for sure they have nothing in common.


They're both the subject of vast public ignorance in regard to the hazards. :D


On one hand clever, on the other hand, misjudging the probabilities of personal danger is subject to 2 or 3 common cognitive biases, so the public everywhere always has been ignorant of the true hazards.


They both produce harmful radiation!


That is not even wrong. Look up non-ionising radiation. Either that, or great troll.


Can't we have one conversation on HN that doesn't turn into an Apple / iPhone conversation...

This article was about something totally unrelated.


What about:

"I worked hard to deprecate that code that I worked hard to create so I could deprecate some other code that I also worked hard on,"

Why would you want to work on features that are superseded and should be avoided?


Yeah, it's fine for our software but not their music


Is open source music a particularly big phenomenon?


Open-source is irrelevant here. It's (1) "I own this copy of music/software and can do anything I wish with it" vs (2) "I licensed this copy of music/software from you on the terms we have agreed on, and I will adhere to these terms".

Both music and software are usually licensed (2), but people often disrespect music licenses, thinking they are entitled to more rights for some reason.


It's getting there. Check out http://ccmixter.org/ .


[deleted]


As far as I know, sheet music is usually licensed under traditional copyright and remixing/sampling is usually licensed under fairly complicated closed licensing terms, for the cases where samples & remixes are cleared legitimately.

Creative Commons licensing of sheet music & samples would count, but I don't think that's what the parent comment was alluding to - it was a misunderstanding between the concept of "open sourcing" something and the concept of licensing it from the author under more restrictive traditional copyright terms.

Wow, this is a lot less snappy when you have to spell it out.


...run through a Perl script that removes all 'final' keywords except those required for hacking around the 15-year-old Java language's "fucking embarrassing lack of closures."

This guy feels my pain! :)


I like it, but making fun of Java (and Java's "thought leaders") is just too easy. It's like shooting fish in a barrel. With a grenade launcher.

Eclipse is another matter: far more ridicule is called for. Far, far more.


Exactly: Or Smalltalk or Ruby for that matter, which both facilitate information hiding via private access. It's great to feel superior, isn't it?


Yeah, except in Ruby (I don't know about Smalltalk) private access has the loophole of the send method: FreeHouse.send(:open_employees_only_door)

Matz would never deny us our right to snoop around where we don't belong.


Christ, this guy never ceases to amaze. His talk at I/O a couple years ago was absolutely fantastic: http://www.youtube.com/watch?v=BttI-y9VzXQ


One of the points he makes is that VMs are obvious for the purpose of language interop. Seems like C and Lisp (and D, another language he mentions) have been interoperating with each other without a VM in common for quite a while.


But it's not a many to many mapping, that's the point he was making. Lots of languages interop with C, but you can't just import a ruby module into a python script very easily. In fact if you want to link to C libraries from D you have to modify the C header files to turn them into D modules.


I think Microsoft actually did a reasonable job with the "original" COM (ie OLE 2.0 - the stuff that let you put an Excel 95 spreadsheet into a Word 95 document).

It had a bit too much bolted on, probably due to its origins in OLE, and I think "dynamic OLE" was a tragic mistake. And its reference counting approach was a lot more appropriate in the early 90s than it is today (where a compacting GC may be the smart choice, performance-wise). But for making calls between objects in C, C++, VB6, Delphi, and pretty much every other language that lived on the Windows platform, it worked really well.


For a compacting GC, you'd have to hunt down a complete set of references from running programs written in unknown languages, with strange errors if the set is underapproximated even once. I'd rather have reference counting.


Sure, but it doesn't seem intractable to be able to design native languages to be able to interop. Lisp has been around since the late 50s, C since the 70s. The fact that they interop at all is kinda awesome.


It works, but it is WAY HARD. Try interacting from Ruby to Common Lisp. On a VM it's easy: a call from JRuby to Clojure is not too hard. It's stupid to implement your own GC with every new language, and there are way more things that every C-based language has to reimplement. That's why it always took around 10 years until a language got really good and fast.

Today you can have a language in 2 years with the biggest set of libs out there.


It's interesting to see how important compiler-enforced access specifiers were to C++ and Java. It seems they are going out of fashion. In the Python world they were always considered a bit silly.


The languages have fundamentally different goals.

If you explained Unix file permissions to a Windows guy, he'd probably think it was silly. They exist for a purpose though- just not a purpose the Windows guy has spent much time with.


This reads to me as "let's make all state mutable and remove all indication as to which bits of the code are the interface, and which are implementation details". I like immutability and minimal interfaces, they make it easier to reason about what the code will do, freeing brain cycles to work on more interesting things than "but what if someone modifies foo between invocations of bar".

If you need access to a library's private or final fields, the library API is badly designed. Or, if you're that keen on wanting to change how the library works, fork it. That's kind of the point of open source.


You've got a point, but I think the final class modifier, which prevents extension, often hinders unit testing and never adds much value.


final class modifier is pretty useless.

private final member variables? Invaluable.


Can someone please explain what this means to people not familiar with Java? I'm confused because I thought open source meant open source, as in "all of the source code is available." What does "open source" mean in this context, if not "open"? When I read the title I thought it was a joke. Thanks, and sorry for my ignorance; I'm a lowly Perl hacker.


A big part of OO languages is encapsulation. Private fields encapsulate state and private methods encapsulate implementation details. The intention there is to make the code more robust by restricting the way collaborating classes can interact with it.

Example, if you have a type representing a game score, you may want to implement a public Increment() method instead of letting other types access the score value directly.

The author of the post is pointing out the irony of people saying "this code is totally open" yet forcing you, ostensibly for your own good, to interact with it in a prescriptive way.
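To make the game-score example above concrete, here is a minimal Java sketch. The names GameScore, increment, and getPoints are hypothetical and chosen only for illustration; they do not come from the article or from any library.

```java
// Hypothetical illustration of the game-score example; not from any real library.
class GameScore {
    // Private state: code in other classes cannot assign to it directly.
    private int points = 0;

    // The only way to change the score is one point at a time.
    public void increment() {
        points++;
    }

    // Read-only view of the current score.
    public int getPoints() {
        return points;
    }
}

class GameScoreDemo {
    public static void main(String[] args) {
        GameScore score = new GameScore();
        score.increment();
        score.increment();
        System.out.println(score.getPoints()); // prints 2
        // score.points = 1000000; // does not compile here: points is private to GameScore
    }
}
```

Because callers only ever see increment() and getPoints(), the class could later store the score differently without breaking them, which is the robustness argument being made.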


Thanks for the response, but can you not see the source code of these private methods? And wouldn't that allow you to re-write them yourself to behave however you want?


A sign of experience with open source: You become incredibly reluctant to hack directly on the source of a popular library. With great power comes great responsibility.

In Drupal, for example, there is a saying: "every time you hack core (or a module you didn't write) god kills a kitten." In the spirit of open source, we probably borrowed that saying from some earlier project, because it is generally true.

The standard Java String library is the same for everyone. [1] If you download Random Java Library X, and X works with the String library, you can probably be assured that X has been tested with the standard String library. As soon as you change one line of the library this is no longer the case. You must now face the possibility that your "minor" change will lead to side effects when combined with other things, and the responsibility for finding those side effects is now entirely yours.

Plus, the sheer mechanical tedium of preserving your patch, making sure to apply it to every new version of the library as it comes out, relearning how the patch works every few months, porting the patch when it fails to apply cleanly to a new version, figuring out how to distribute your personalized package to others because they can no longer simply `apt-get` your package from the canonical repository, dealing with the fact that the standard docs and the published books might not cover your variation...

Tools like Github have made all this stuff much easier, but it's still a bad idea to tinker with others' libraries without a good reason.

The more typical advantage of open source is that you can read exactly what your library is doing, which makes it easier to figure out how to work around it without actually editing it.

---

[1] Until it isn't. But at that point it will generally get a different version number, and an official release notice, and it will have a community that is aware of the change and will promptly coordinate to find and fix any new incompatibilities with other libraries.


Yes.

If you over-analyze the joke too much it becomes less fun :)


I agree, this is a worrying possibility. Luckily, with a good reflection library, you can have other obfuscated parts of the code that check for such malicious tampering and abort.


And then you will have to accept that either you've now forked it and won't be able to benefit from future releases as drop-in replacements, or you'll have a hard time implementing the same changes over and over again for each new release.


I sympathize, yet encapsulation is appropriate for some projects. Not all users of a library want to frequently update their own code. Forbidding encapsulation and deprecation would increase this cost. Also, it's comforting to be able to refactor the guts of a class without harming users.


The issue is less about encapsulation and more about language-enforced encapsulation. Encapsulation is good, but encapsulation that's enforced by the language is debatable.


If the language isn't enforcing the encapsulation, how is it encapsulated?

For me, encapsulation comes down to "What I hide, I can change. What I expose, other types may couple to in an inappropriate way."


> If the language isn't enforcing the encapsulation, how is it encapsulated?

What do you mean by "enforce"? Java's private modifier doesn't enforce encapsulation. Javascript's objects do not have a private modifier, but still provide encapsulation via closures. It's hard to have a meaningful discussion when loose terms like "enforce" are thrown around.


Sure, the private access modifier doesn't strictly "enforce" encapsulation. Perhaps the term should be "language supported".

I guess, for me, the point of private is to clearly communicate the intent of the interface (small "i" interface) of a type. That intent is generally "don't use this, use this other part instead" or "if you couple to this, it may break on you".

There are other ways of expressing that intent; I just really like having the compiler keep me and my collaborators from making stupid mistakes.


Most dynamic languages use convention, which seems to work pretty well in practice. My take is that if other developers are accessing the encapsulated parts of your library, then your API or your documentation is broken - possibly both - and you should, like, fix that. Not use language features to lock them out.


One of the things that I have used access controls for is to simplify the exposed surface that collaborators work with. This isn't a condescending "I don't trust you" intention. It's more along the lines of "of all the types and methods here, you only need to know about this small subset". It's customer service, I tell ya!

And if the API isn't sufficient by not exposing enough, that's fine. It's always easier to expose something later than to make it private later.


Maybe easier for you (very debatable), but never easier for the guy who needed something exposed today to get his job done.

I have never been bitten by over-promiscuous code entries in Python. The times I've gone beyond the published API, I knew I was doing it so I knew I had to keep track of it. And I've gone deep here (replacing Django's database handling in their unit testing framework).

On the other hand, I can't count how many times I've been stuck in Java figuring out how to get around somebody's final class or private method that I really needed to tweak just a little bit or, worse, I needed access to a field I can see in my debugger. Needing to reflect through to get at it is STUPID.


You may have a planes-that-crash bias there. You don't notice all the times you benefit from a well encapsulated service because it just works the right way. The rare handful of times when something that would be useful is marked private is what sticks out in your mind.

I know that I would prefer to work with an interface with 10 public classes with 50 public methods than an interface with 100 public classes with 5000 public methods.

The conceptual weight of wading through all of that stuff has a cost. There is often value in not knowing or caring about implementation details.

Although I do agree that final/sealed is generally just mean-spirited and pointless.


Your argument assumes that you need language-enforced mechanisms in order to define a well-encapsulated API. That is simply not the case, as the many well designed APIs in Python, Ruby, and Lisp can attest (and many poorly designed APIs in Java as well). You can use documentation, conventions, and other mechanisms to define the publicly exposed API so one doesn't have to know the guts.

But when I need to deviate from that API, for whatever reason is important to me but not to the library designer (from a bug to a weird environmental issue specific to me), if I can't get done what I need to, the language is getting in my way instead of helping me.


I totally agree that language-enforced mechanisms aren't the only way to get meaningful encapsulation. People can (and do) write good code or bad code in any language.

I was just trying to point out that there are both benefits and drawbacks to language-enforced encapsulation, and the benefits may be less obvious than the drawbacks, which are generally more painful.


You have a good point there. You generally have to think about what you are exposing if the language demands it. And I think it is great when a language guides you into doing the right thing. Guides being the operative word.

Given the choice, I'll choose consenting adults. :)


> It's always easier to expose something later than to make it private later.

...if you're using Java, since (AFAIK) you can't switch between public fields and getters/setters. Python, however, has properties: http://www.python.org/download/releases/2.2.3/descrintro/#pr... . Point 4 of http://dirtsimple.org/2004/12/python-is-not-java.html covers why they're a good idea, though you can probably figure it out from their description.


Yes, I do a lot of work in C#, which has properties that are code-compatible with public fields. The last time I did a Java project, I was surprised at how much I missed them.


Encapsulation is completely orthogonal to access modifiers.


Your view differs from that of some of the OO language designers. For example, Stroustrup defines encapsulation as "the enforcement of abstraction by mechanisms that prevent access to implementation details of an object or a group of objects except through a well-defined interface. C++ enforces encapsulation of private and protected members of a class..."

http://www2.research.att.com/~bs/glossary.html#Gencapsulatio...

Since this idea is not obvious, would someone mind explaining?


Every useful software engineering term is actually undefined, until you define it for the purpose of some discussion. Encapsulation, strong typing, object orientation, you name it, it's undefined. By undefined I do not mean "completely without meaning", but that the term is used so many ways that the information content of pointing at something and calling it "encapsulation" is actually very low.

Some OO traditions choose to combine encapsulation with language-enforced access control. Some schools then teach that if you don't have enforced access control, you don't have encapsulation. They're right... by their definition. They are not right by all definitions. If you don't lay out the definitions you are using when you explain whether one is necessary to the other, you're just making undefined statements. And usually one will be related to the other by definition, which means the other basic alternative is to make a vacuous argument.

I say that like a lot of other things that are mistaken as language features, encapsulation is an attribute of the program, not the language. Encapsulation is when there is a clear boundary of code that accesses a certain data structure. I have seen many C programs that have perfectly well encapsulated data structures, despite the lack of language support for access control enforcement. But that's just my definition. It is not the definition.


> Your view differs from some of the OO language designers.

Oh come on, that's not true at all. C++ just happens to use access modifiers to provide its brand of encapsulation. However, languages without the `private` reserve word can still provide encapsulation -- they're not the same thing.

What if Java had no notion of private? It would be very difficult to provide data hiding (not that they're hidden anyway, but that's beside the point), so instead you would be forced to put little flags on your names and warnings in your documentation delineating the parts that people shouldn't touch. If they did then that's their fault no?


Encapsulation can be achieved quite efficiently if you have closures ... you can have good encapsulation (i.e. preventing access to the implementation details of an object) even in Javascript.
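The comment talks about Javascript, but the same closure idea can be sketched in Java with a lambda. This is only an illustration under that assumption; ClosureCounter and newCounter are hypothetical names.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch: state hidden by a closure instead of by an access modifier.
class ClosureCounter {
    // Returns a function that yields 1, 2, 3, ... on successive calls.
    // The count lives in a local array captured by the lambda. There is no
    // field, getter, or 'private' keyword, yet ordinary calling code has no
    // name by which to reach the count.
    static IntSupplier newCounter() {
        int[] count = {0}; // one-element array, since captured locals must be effectively final
        return () -> ++count[0];
    }

    public static void main(String[] args) {
        IntSupplier next = newCounter();
        System.out.println(next.getAsInt()); // prints 1
        System.out.println(next.getAsInt()); // prints 2
    }
}
```

Each call to newCounter() creates its own captured array, so two counters never share state.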

Many dynamic languages also have tools you can use to make your life easier ... with Python I'm using pylint/pychecker to keep me honest. My Emacs instance screams at me whenever I access a protected field of some object.

Also ... private/protected fields or final classes have caused much trouble for me. Overriding the behavior of a class is the easiest way to workaround various bugs without modifying the original source ... which in some cases is a PITA, while in other cases is impossible.

I once worked on a Java project that used a commercial library with no source code ... to fix a stupid bug I had to manipulate the bytecode at runtime. Which shows again that private/protected/final access modifiers are pretty useless as guarantees ... a determined developer can get past them.

It's just that you begin to hate life a little bit more.


Some context:

WalterGR commented on reddit that this rant is actually in response to this tweet by Marco Tabini:

-------------------------

"@ijansch Private has absolutely no useful role in open-source code." ( http://twitter.com/mtabini/status/18867470296 )

-------------------------

Marco is the co-founder of "a consulting firm that specializes in information architecture, code and security auditing, large-scale deployments and optimization".

More information:

http://www.reddit.com/r/programming/comments/cusyw/wikileaks...


First note: I have never dealt with the internal workings of the Java VM, so this is just speculation. I also haven't tested any of this, but it's still fun to speculate.

In Java, private variables and functions are not accessible from outside the class. This means that the compiler would be able to make some assumptions about the nature of these members for the purpose of optimization. When calling a public function of another class in Java, I suspect that the name of the function is mapped to the actual bytecode at invocation time after being looked up in some sort of trie/tree/hashtable. So, for every function call or variable access, there would have to be a lookup. On the other hand, if the members were declared private, the compiler could directly link a caller to the function and skip the lookup. If this is the case, then setting all these projects' sources to use only public would mean a substantial performance loss.

I would love to be corrected if this is not the case, as I haven't taken a course in OO compilers yet.


In java private variables and functions are not accessible from outside the class.

Not quite - using reflection you can dig into private fields and do your worst to them; you just need to call field.setAccessible(true) on each private member's Field object before accessing it reflectively.

The only hitch is that you might get a SecurityException, but you can avoid that if you're running your code on your own JVM (by default, it should work just fine, it's if you're deploying applets or something like that where you might get the exception due to the different sandboxing rules).
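A minimal sketch of the reflection approach described above. Vault and its secret field are hypothetical, made up for this example; getDeclaredField, setAccessible, and get are the standard java.lang.reflect calls.

```java
import java.lang.reflect.Field;

// Hypothetical class used only for illustration.
class Vault {
    private String secret = "employees only";
}

class PrivateFieldDemo {
    // Reads a private field by name, bypassing the normal access check.
    static Object readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            // This is the call that can fail with a SecurityException
            // when a restrictive security policy is in place.
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPrivateField(new Vault(), "secret")); // prints "employees only"
    }
}
```

This sketch only reads a field of a class defined alongside it; reaching into classes from other libraries or from the JDK itself may be subject to additional restrictions depending on the Java version and runtime configuration.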


All Java method calls use the virtual function table, if for no other reason than to be able to throw a "MethodNotImplementedException" (or whatever the actual exception is, can't remember now). You can also inject code into the call stack using aspect programming.


Clojure has tools to make it easy to bypass private/protected :

http://richhickey.github.com/clojure-contrib/java-utils-api....


Granted it's just using Java calls (no magic), but still fun.


Are those called wall-hack for the reason I think they are? If so, awesome.


It's a spoof article, a joke, and "private" and "protected" were never about the legality of letting other parties access things. They were just conveniences for the developers and the tools to make it clear what could be optimized how, and what should and shouldn't talk to what.


Hm? I was referring to the Clojure link in the comment above. (And the functions linked are no joke.)


I have to say, for me private methods and fields are not a security thing. They are a way to keep the public interface clean and only confront the user with the information they need.


Not just for cleanliness but more importantly change protection: denying direct access to the internals assures that you can rewrite those internals and not break any external dependencies.


I thought Java had interfaces for public APIs?


And that's the problem.


Why? It conveys a piece of information. Would you feel better about it if instead of writing "private" in front of it, I would write a comment "it is probably a bad idea to use this method in your own code"? What would be the gain - it is simply more verbose?


There is no reason to use a private/protected method. If the library/class you are using doesn't fulfill the qualifications you want, expand the library, not the code that uses the library.


The doubly great thing about this article is that maybe Steve Yegge will start blogging again. :)


Personally I would prefer a script that made all variables private and removed setters, but I suppose that would be more useful than funny.


The problem with blocking the intended side-effects in side-effect-driven programming is that the programs will break. A script to re-write all Java into Clojure would be useful, too, but really kind of not the point.


And all classes final by default.


Awesome. Tweaks Java, Agile League of Agile Methodology Experts (LAME), Wikileaks, Oracle on one short non-steve-yegge-length post.


It wasn't until I re-read this that I noticed the two fake 'more news' entries.


If you really try to make OO as beautiful as humanly possible, you neither make the member variables public nor offer getters and setters. Everything is realized by functions which have a meaning; you really give the class the power to manage itself, not be managed from the outside with getters and setters! If you successfully adopt this thinking pattern then you have so many fewer problems...


"If I make something private, it means that no matter how desperately you need to call it, I should be able to prevent you from doing so, even long after I've gone to the grave."

I am pretty sure this can be defeated without taking private out. Reflection will get you there with the security checking turned off.


I can't decide whether this is serious or not. I don't doubt the actual Wikileaks announcement, just the comments from the various sources, which seem too funny to be true.

favorite: "But use it exactly how I tell you to use it, because fuck you, it's my code. I'll decide who's the goddamn grown-up around here."


Really? Why would Wikileaks care about the source to open-source Java applications?


For a second I was actually hoping someone leaked Google's internal stack and all of it was GPL.


Even if it was GPL, it doesn't mean they would have to give anyone the source code.


I think this could be a good example for making a program that creates jokes for programmers. In essence, substitute words:

  Government => Java
  Privacy => Private Method
  Public Alarm => ...


This is the best thing to happen to Java. Hopefully, this gives that community a ladder to catch up to the innovation around true open source communities.


"and turns all fields without getters/setters into public fields."

Why not all fields, period? Even those with getters/setters?


Does Java have something similar to .Net's reflection?


I'm not sure how .NET's reflection works, but it does have reflection, where you can say stuff like "tell me all of String's methods" and for each of those you can find out public vs private vs protected, final vs not, etc.

See http://download-llnw.oracle.com/javase/1.3/docs/api/java/lan...
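The "tell me all of String's methods" query above might look roughly like this. ListStringMethods and describe are hypothetical names for this sketch; getDeclaredMethods, getModifiers, and Modifier.toString are standard java.lang.reflect APIs.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class ListStringMethods {
    // Describes one method as "<modifiers> <name>", for example "public length".
    static String describe(Method method) {
        return Modifier.toString(method.getModifiers()) + " " + method.getName();
    }

    public static void main(String[] args) {
        // getDeclaredMethods() includes private and protected methods declared
        // by the class itself; getMethods() would return only public ones
        // (including inherited ones).
        for (Method method : String.class.getDeclaredMethods()) {
            System.out.println(describe(method));
        }
    }
}
```

The exact list printed depends on the JDK version, so no sample output is shown here.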


It looks very similar, albeit a bit tidier than .net's implementation.


Considering the history, don't you mean "Does .NET have something similar to Java's reflection?"


No, if I meant that, I would have written that. I know much about .net, and not a great deal about java.


I don't understand the up-votes on this. He asked a reasonable question and phrased it with respect to his knowledge base. Your reply assumes he knows the answer to the question he's asking. Huh?


My comment was a joke. (Which I thought appropriate given the root of this thread.)


As a (mostly) .Net developer, I was curious, so I looked it up. Yes, you can use reflection in Java to invoke private methods and access private fields in much the same way you can in .Net
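For the private-method half of that claim, a minimal Java sketch follows. House and openEmployeesOnlyDoor are hypothetical, borrowed from the house analogy quoted earlier in the thread; getDeclaredMethod, setAccessible, and invoke are the standard reflection calls.

```java
import java.lang.reflect.Method;

// Hypothetical class, named after the house analogy quoted in the thread.
class House {
    private String openEmployeesOnlyDoor() {
        return "door opened";
    }
}

class PrivateMethodDemo {
    // Invokes a private, no-argument method by name.
    static Object callPrivate(Object target, String methodName) {
        try {
            Method method = target.getClass().getDeclaredMethod(methodName);
            method.setAccessible(true); // lift the access check for this Method object
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callPrivate(new House(), "openEmployeesOnlyDoor")); // prints "door opened"
    }
}
```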


Or... did the .NET/C# folks copy reflection from Java, or invent it themselves?


This reads like an Onion article.


Modulo the funny.


You don't think the Onion is funny?


He's saying this reads like the Onion without the funny.


[deleted]


Easy. Because I'm laughing at it.


Ahhh... That was the sigh of vindication after reading this post and confirming my opinion of Steve Yegge.



