
Yeah, basically everything has to be that way.

At the super fast speeds you start running into things like:

* why won’t the devirtualizer trigger?

* these object headers are sure wasting a lot of cache

* there’s a lot of forced pointer indirection

And you just end up spending vast amounts of time and effort trying to shave off those few microseconds you’re wasting in the JVM. Once you add in all the effort spent trying to get around the GC, it’s a bleak picture for competing with quality C++ without spending far more effort.



Thanks for the insights - I must admit I've only ever briefly used Java, so I am not an expert. Do you have any recommendations for documents or a PDF covering the JVM memory model and scheduler in enough depth to be useful here, possibly with applications to hard-realtime systems?

Again, thanks for the response, this was insightful and I'd love to learn more.

Also, an aside - do you have any thoughts on the use of Rust in these same systems? It's a little bit more bleeding edge, but I'm curious to hear an expert's thoughts!


What do you mean by ‘devirtualizer’, and why would you want that triggered? It sounds like something you wouldn’t want to trigger.


The devirtualizer (maybe this is the wrong JVM terminology; it’s basically what clang/GCC call it) is the part of the optimizer which sees that you have a virtual function call (like most calls in Java) with a unique possible target, so you can replace the virtual call with a direct one and possibly inline it.

In the JVM I think this can only be done speculatively (you have to double check the type), but it still matters.


Ah right yes conflict of terminology.

In the JVM devirtualising means making a virtual object a full object again, so the opposite of what you want to be happening.

I don't think the JVM really names the optimisation you're talking about, but it does do it, through either a global assumption based on the class hierarchy, or an inline cache with a local guard.


What do you mean by “virtual object”, an object which has been broken up on the stack instead of a “real” object living as a single entity on the heap?


> What do you mean by “virtual object”, an object which has been broken up on the stack instead of a “real” object living as a single entity on the heap?

Yes... but let's not say 'broken up on the stack' - the object's fields become dataflow edges. The object doesn't exist reified on the stack - fields may exist on the stack, or in registers, or not at all.


LLVM calls the process of breaking a struct/object into dataflow variables "Scalar Replacement of Aggregates".

https://llvm.org/doxygen/classllvm_1_1SROA.html#details


Yes, that's what the JVM calls it - and then the node that ties together the state is the virtual object.


Remember back in cs101 where you had class animal, with subclasses dog and cat. You can call speak() on animal and get either ruff or meow depending on what animal is. Animal is a virtual object. The system needs to do some work to figure out which speak() to call every time; the work isn't too bad, but it turns out that CPUs basically never have the right thing in cache, so it ends up being very slow compared to most of what you are doing.

When we say devirtualize we mean that we know animal is always a cat so we don't have to look up which speak to use.


Please don't take others for complete morons.

And that makes literally no sense in context: you're mapping those words directly to the concept of virtual methods, but that's the opposite of the way chrisseaton uses them (and they're clearly familiar with the concept of virtual calls - it's not what they're using "virtual" for), hence asking them what they mean specifically by this terminology.


the jvm names the optimization exactly like he said


IIRC it calls it monomorphisation.


See Aleksey Shipilёv on JVM method dispatch: https://shipilev.net/blog/2015/black-magic-method-dispatch/


Why not just write it in C? And pre-allocate all the data structures, make them global, put them into a queue, and constantly reuse them. No more need to instantiate objects.


This is why C++ is used over Java for very fast stuff, or “just” fast stuff if you don’t want to bother fighting the JVM


Why not just write it in assembly while you’re at it?

The answer to both questions is that many people prefer writing in higher-level languages with more safety guarantees.


This is a bit of a false equivalence -- there aren't many perf benefits in rewriting a C program in asm, and the cost you pay for that rewrite is much greater.

Yes, of course $HIGH_LEVEL_LANG is preferable for many many use cases. In this context, we're discussing "high speed trading systems", for which native implementations are going to be the favorite.


> for which native implementations are going to be the favorite

...but the point of the article is they aren't always the favourite.


They mention not even using most of the Java features (exceptions, built in GC) that make it safe. So I think it's a pretty fair question, since they are essentially using a very stripped down version of the language that removes most of the compelling reasons to use it, and seemingly fighting the language runtime along the way.


The reason these people still use Java is that everything else can be nice high-level Java code.

So they use regular Java for the build system, deployment, testing, logging, loading configuration, debugging, etc etc. There's a small core written in this strange way... but everything else is easier. And things like your profiler and debugger still work on the core as well.


I've been writing C++ for almost 30 years now, always for workloads where the cost of allocations was plainly visible, especially in the critical parts of the applications. So I have never stopped writing "non-idiomatic" C++, at least from the point of view of typical language lawyers (and C++ has attracted a lot of them through the years). And I'm surely not the only one: in different environments it was recognized, over time, first that creating and destroying many objects was very bad, and later, with the growth of the C++ standard library, that not everything in the standard library should be treated the same, that there are better solutions than what's already available, and that third-party libraries often bring even more potential danger.

Depending on the goal, one has to be very careful when choosing what one uses. The good side is, C++ kept its C features: if I'm deciding how I'll do something, I don't have to follow the rules of the "language lawyers." I can do my work producing what is measurably efficient. And the compiler can still help me avoid some types of errors -- others can anyway be discovered only with testing (and additional tools). In the end, knowing well what one wants is the most important aspect of the whole endeavor.


> I don't have to follow the rules of the "language lawyers."

I'm not exactly sure what specifically you are trying to imply here. When you proclaim to ignore language lawyers, it sounds like you are knowingly breaking the rules of the C++ standard. That takes a lot of faith in compilers doing what you meant to do, despite writing code that is incompatible with the standard those compilers implement...


Yep. I always say start with the "better C" part of C++ to get stronger type checking, and then add other features only as needed. All abstractions should have minimal overhead, with a strict pay-only-for-what-you-use policy.


Yep, the same thing I do - just write C++ as if it were good old C, with very occasional use of templates, containers and exceptions.


To those who miss the point: nobody here is denying that C++ has something to bring. It's just that what it brings isn't what those who promote some fashion claim must be universally used, and, honestly, there's no actual reason to believe such claims; they are no more true now than at the time when "making complex OOP hierarchies" was the most popular advice -- I remember those times too. Or the times when managers wanted to believe that everybody would just use Rational Rose to draw nice diagrams and actual programming wouldn't be needed at all. Every time has its hypes:

http://www.jot.fm/issues/issue_2003_01/column1/

https://wiki.c2.com/?UmlCaseVultures

One size doesn't fit all. Some solutions to some problems could be and are provably better than those typically promoted or "generally known" at some point of time.

If that all still doesn't mean anything to you, please read carefully and very, very slowly "The Summer of 1960", seen on HN some 9 years ago:

https://news.ycombinator.com/item?id=2856567

Edit: Answering the parallel post writing "When you proclaim to ignore language lawyers, it sounds like you are knowingly breaking the rules of the C++ standard."

No. The language lawyers, in my perception, religiously follow everything that enters the standard and proclaim that all of it has to be used, because it's standardized -- including the standard libraries and some specific things there that aren't optimal for the problem I'm trying to solve. And especially the idea that whatever is newer and has more recently entered the standard is automatically better. It's understandable that they support their own existence by doing all that -- it's about becoming "more important" just by following/promoting some book or some obligatory rituals (an easy and time-proven strategy through the centuries, which is why I call it "religiously" -- and I am also not surprised that somebody who identifies with being one of the "lawyers" wouldn't like this perspective; you are free to suggest a better name). But it should also be very obvious that following them is not necessarily optimal for me, as soon as I can decide what I'm doing. And yes, it's different in an environment where the "company policy" is sacred. There one has the company "policy lawyers," and typically every attempt at change can die if one isn't one of them.


I guess this is for you, in case you haven't seen it already,

"Orthodox C++"

https://gist.github.com/bkaradzic/2e39896bc7d8c34e042b


Haven't seen it, but thanks. I've lived and worked following some of the ideas mentioned there.

E.g. at the very bottom of the page, in one of the comments, there's a link to:

"Why should I have written ZeroMQ in C, not C++ (part II)"

https://250bpm.com/blog:8/

where the author writes "Let's compare how a C++ programmer would implement a list of objects..." and then "The real reason why any C++ programmer won't design the list in the C way is that the design breaks the encapsulation principle" etc.

I have indeed more than once used intrusive data structures in non-trivial C++ code, and the result was easy to read and very efficient. There I really didn't care about some "thou shalt not break the encapsulation principle," because whoever thinks at that level of prohibition 100% of the time is just wrong.

The "encapsulation principle" is an OK principle for some levels of abstraction, but nobody says that one has to hold to it religiously (exactly my point before). I would of course always make an API which hides what's behind it. But where I implement something ("the guts" of it), I of course have the freedom to use intrusive elements, if that solves the problem better. I have even created some "extremely intrusive" structures (with a variable number of intrusive links in them). It worked perfectly. Insisting on doing everything as "ritual" all the time is just so, so wrong.



