All the sub-2 second ones are. There is an impressive 3.2s solution that uses ne...

anonymoushn · on Jan 29, 2024

It's sort of wild that there's no vector load or store that doesn't include bounds checks. Even if you make a MemorySegment from 0 to 2^63-1 (~2^15 times larger than addressable memory on x64!) you'll pay for bounds checks on every load and every store.

pron · on Jan 29, 2024

Only for random access. The compiler will hoist bounds checks out of more regularly-accessed loops (when the inlining is right).

In fact, given that the top safe result was very close to the top unsafe result (especially considering the standard deviation), it looks like you need to write truly exceptional code to feel the cost of the random-access bounds checks. So it doesn't seem wild at all that the default is to give everyone safety at the expense of a performance cost that will only be felt by very few (and who can then enable unsafe code if they feel they need that last extra boost).

anonymoushn · on Jan 30, 2024

Bounds checks in a MemorySegment that contains the whole addressable range 2^15 times provide no safety. Obviously.

You've posted that users can use unsafe to avoid the bounds checks, but you're posting specifically about methods provided for MemorySegment that have no equivalent for Unsafe, so you're just incorrect.

"Only for random access" is probably also incorrect. We could maybe have a look at the codegen for any of the top openjdk submissions which currently perform sequential access but increment the read pointer by ctz(movemask(eq(*ptr, c)))+k at each step. It's trivial for a person to prove that the bounds checks are redundant with the loop termination condition given the MemorySegment's length and given that ctz is at most 64, but it seems unlikely that the runtime manages it. It is of course also trivial for a person who knows that the MemorySegment is so large that it only disallows addresses that differ from allowed addresses by up to two bits not used for addressing to prove that all possible addresses are allowed, or at least that they address memory which is allowed to be accessed by a different address if the user masks off the high two bits first.

pron · on Jan 30, 2024

> Bounds checks in a MemorySegment that contains the whole addressable range 2^15 times provide no safety.

That's why safe code can't obtain such a MS in the first place. To get such a segment you have to use a flag (enable-native-access) that enables unsafe operations. We're contemplating offering a MS without bounds checks in those cases (for off-heap only), but that depends on how much that would be useful.

> you're posting specifically about methods provided for MemorySegment that have no equivalent for Unsafe

I'm talking about unsafe code, not the Unsafe class specifically. The internal vector intrinsics don't perform bounds checks.

> but it seems unlikely that the runtime manages it

We're doing okay and, as always, we expect to improve.

Anyway, my point is that the top safe result was so far ahead of most submissions, that it's clearly not "wild" to prioritise safety given how few manage to get to the point where unsafety can buy them a significant boost. It also allows us to trust more invariants and perform more optimisations (once we complete "integrity by default") so that the overall performance impact is clearly positive.

anonymoushn · on Jan 30, 2024

> The internal vector intrinsics don't perform bounds checks.

VECTOR_ACCESS_OOB_CHECK controls some bounds checks performed by ByteVector.rearrange and similar methods. With these bounds checks enabled, ByteVector.rearrange costs ~40x more than pshufb rather than only ~30x more.

This comment of mine is pedantic and silly. Of course I agree there aren't any bounds checks on the memory addresses you access if you are using the intrinsics that don't access memory.

pron · on Jan 30, 2024

It's not a matter of pedantry but about a fundamental view of performance. Operation X can be 100x fasterthan operation Y yet its impact on performance may be anywhere between -3% and 10000%. The precise makeup of trainers' soles may make the difference between Usain Bolt breaking the world record or not, and yet have negligible impact on people's running speed. Saying that it's "wild" that not every trainer has the exact same soles as those of Usain Bolt's confuses performance with specialists' gear. Making safety stronger to allow, say, better constant folding can improve performance overall and at the same time not be the best choice in some special circumstances.

Improving performance demands that there would be no non-delineated code that could result in miscompilation or undefined behaviour. We don't want to tell people, well, we've introduced a new default optimisation but it means that it's possible that some transitive dependency you may not even know about could result in undefined behaviour. That might be okay for C (where deep dependency graphs are very, very rare), but it's certainly not what people want from Java.

anonymoushn · on Jan 30, 2024

I don't think you're replying to anything in the comment under which you posted.

pron · on Jan 30, 2024

Oh, I'm sorry; perhaps I've gone on too many tangents. The original comment said something about it being "wild" that the vector operations perform bounds checks, and I explained why that's the most sensible thing to do in Java (indeed, we currently offer no standard API for any on- or off-heap operation without bounds checks; this isn't unique to vectors), and added that we're considering adding a bounds-checks-free MemorySegment for off-heap access that would turn them off (that would, of course, require opt-in because it would be unsafe, just as the "universal" MS you mentioned requires opt-in as it, too, is unsafe [1]).

Is there anything else you wanted to know and I didn't answer?

[1]: You may ask, why that universal MS doesn't disable bounds checking, and the answer is that it complicates the implementation, but a specialised MS could do it.

anonymoushn · on Jan 31, 2024

No questions. I'll try to bug the mailing list about SIMD kernels that use a lot of rearranges. That MS sounds exciting!

pron · on Jan 29, 2024

To me, the more interesting aspect is not the fastest solutions, but the variance. There's an almost 2-orders-of-magnitude difference between the fastest safe solution and the baseline entry. When looking at an earlier run (before people started copying off solutions from one another), the standard deviation was 10s, which is 3x bigger than the fastest safe solution; even the standard deviation for the top 20 was more than 4s. In other words, clever algorithms and "mechanical sympathy" makes a much bigger difference than unsafe code, and only few of even the top performers got to the point where unsafe code makes a significant difference.