Has anyone experimented with this yet? I'd like to know how this is resolved when the architecture doesn't support Intel's SIMD approach; the objects map pretty closely to the instructions (SIMD.float32x4.sub and the like).
I'm trying to figure out what happens when you port this to ARM NEON, and how you catch it with architectures that don't support NEON (they often lack them in Marvell and Allwinner).
I'm a Mozilla engineer involved in this. NEON support is very important and we're designing the spec to support it well.
CPUs that lack SIMD units can still support the functionality (though not the performance, of course), and there's even a polyfill library that lowers this API into scalar operations for SIMD-less browsers.
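The kind of scalar lowering the polyfill does can be sketched in plain JavaScript. The names below (`f32x4`, `f32x4add`) are illustrative, not the real polyfill's API:

```javascript
// A float32x4 value can be modeled as a 4-element Float32Array.
// Sketch of lowering a SIMD.float32x4.add-style operation into
// scalar operations; names here are illustrative, not the real API.
function f32x4(x, y, z, w) {
  return Float32Array.of(x, y, z, w);
}

function f32x4add(a, b) {
  // Element-wise scalar addition: same semantics as a hardware
  // SIMD add, just without the data parallelism. Math.fround
  // keeps each lane in float32 precision.
  const r = new Float32Array(4);
  for (let i = 0; i < 4; i++) r[i] = Math.fround(a[i] + b[i]);
  return r;
}

const sum = f32x4add(f32x4(1, 2, 3, 4), f32x4(10, 20, 30, 40));
console.log(Array.from(sum)); // [11, 22, 33, 44]
```

Because the lanes live in a typed array, the fallback does the same float32 arithmetic a SIMD unit would, just one lane at a time.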
It would be great if you could detect SIMD-able operations in classic JS (e.g. in loops) and use SIMD to execute them. I think that adding low-level features to a high-level language is not good practice.
We will probably do that too at some point, but it won't replace explicit SIMD, just as widely available auto-vectorization support in C++ hasn't eliminated the need for explicit SIMD extensions there either.
One thing to keep in mind is that most programmers probably won't want to use this feature directly; it'll be used in libraries that expose higher-level APIs. It's still true that every feature we add increases overall clutter, but SIMD seems sufficiently useful and sufficiently self-contained that it's worth the tradeoff.
The primitives are pretty generic, just a few new vector types based on typed arrays. Operations on those types are supported on CPUs without a SIMD unit; they're just slower, though no slower than code written with non-SIMD operations.
What about 8- and 16-bit ints? How about signed vs. unsigned? Or pixel-like data that clamps instead of overflowing? What about 64-bit IEEE floats? What if the SIMD unit is 64 bits wide? Or 256? It just doesn't seem future-proof or robust across varying implementations.
> architectures that don't support NEON (they often lack them in Marvell and Allwinner).
I'm probably nitpicking here, but:
* All Allwinner SoCs have NEON[0]
* Most current ARMv7 processors have NEON. Of the current ARM cores, only Cortex-A5 and Cortex-A9 don't have mandatory NEON support (it's optional for them). Cortex-A5 is intended for embedded applications. Of the existing Cortex-A9 processors, AFAIK the only somewhat popular one without NEON support is NVIDIA Tegra 2, which is retired. Of the third-party cores, all Qualcomm and Apple ones have NEON support.
Yes, and as jmpe said they have quite a few SoCs without NEON (I think basically everything apart from ARMADA 1500 plus). They seem to be targeting devices like smart TVs and STBs nowadays, so I guess it's not a big deal.
You're right, it was a mistake on my part to point to "architectures" that lack NEON; I should have mentioned specific implementations. My experience with Allwinner chips without NEON indeed comes from smart TVs and STBs. You know your stuff ;)
This happens in the browser, right? I think ultimately there just needs to be a unified API, or maybe more domain-specific APIs (like BLAS) that map to NEON or SSE instructions as appropriate, and do everything the slow way if those aren't available.
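The "map to hardware if present, fall back otherwise" idea amounts to a dispatch at load time. A rough sketch in plain JS; the `SIMD.float32x4.load`/`store` calls on the fast path are assumptions about the proposed API shape, and the scalar branch is what actually runs in engines without it:

```javascript
// Pick a fast path if the engine exposes SIMD types, otherwise
// fall back to a plain scalar loop. The feature test and the
// SIMD.float32x4.load/store signatures are illustrative.
function makeVecAdd() {
  if (typeof SIMD !== "undefined" && SIMD.float32x4) {
    // Hypothetical fast path using a SIMD.js-style API.
    return function (a, b, out) {
      for (let i = 0; i < a.length; i += 4) {
        const v = SIMD.float32x4.add(
          SIMD.float32x4.load(a, i),
          SIMD.float32x4.load(b, i)
        );
        SIMD.float32x4.store(out, i, v);
      }
    };
  }
  // Scalar fallback: same result, done "the slow way".
  return function (a, b, out) {
    for (let i = 0; i < a.length; i++) out[i] = a[i] + b[i];
  };
}

const vecAdd = makeVecAdd();
const a = Float32Array.of(1, 2, 3, 4);
const b = Float32Array.of(5, 6, 7, 8);
const out = new Float32Array(4);
vecAdd(a, b, out);
console.log(Array.from(out)); // [6, 8, 10, 12]
```

A BLAS-style library would hide exactly this kind of branch behind a stable function signature, so callers never see which path ran.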
SIMD.float32x4 and SIMD.int32x4 classes are available in Firefox Nightly, but without Float32x4Array and Int32x4Array, loads and stores are horribly slow: about 100x slower than normal JavaScript in my tests.
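The slowdown being described boils down to access patterns: without a dedicated vector-array type, every load gathers four scalar lanes one at a time instead of reading one 128-bit value. A plain-JS illustration of the two shapes (the `Float32x4Array` type named above is part of the proposal, not something you can rely on; `loadLaneByLane` and `loadAsView` are made-up names):

```javascript
// The difference described above, in plain-JS terms: without a
// vector-array type, each load gathers four scalar lanes
// individually instead of reading one whole vector.
const data = Float32Array.of(1, 2, 3, 4, 5, 6, 7, 8);

// Slow shape: lane-by-lane gather into a fresh object per vector.
function loadLaneByLane(arr, i) {
  return { x: arr[i], y: arr[i + 1], z: arr[i + 2], w: arr[i + 3] };
}

// What a Float32x4Array-style load would make cheap: one indexed
// access yielding a whole vector. Approximated here with subarray,
// which at least returns a view without copying the lanes.
function loadAsView(arr, i) {
  return arr.subarray(i, i + 4);
}

console.log(loadLaneByLane(data, 4)); // { x: 5, y: 6, z: 7, w: 8 }
console.log(Array.from(loadAsView(data, 4))); // [5, 6, 7, 8]
```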