Uh, that's not what NIMBY means. "NIMBY" means we all agree that something is a Good Thing™, but I want it placed in someone else's neighborhood instead of mine. It doesn't apply to every possible objection to something in one's neighborhood.
That's what amuses me about this whole saga. Of the successful tech companies, Amazon has by far the worst reputation as an employer (mediocre pay compared to other top tech companies, and poor work-life balance for engineers - never mind work conditions for warehouse workers), yet all these cities were bending over backwards to entice Amazon to set up shop there.
Totally, I don't know why anyone would accept a pitiful $140,000 in total compensation as a new grad software development engineer these days. That's practically slave labor... \s
People can still be taken advantage of even with a mighty sum of money, surprise surprise. Money doesn't always make everything right. It's all about opportunity cost (how long do one's 20s and early 30s last?), the value this young fresh labor provides, how much life gets sucked out of their enthusiasm in the process (jaded expectations carry on for a long time, emotionally), and the naivete, combined with ample time to devote to the Bezos cult, that gets exploited with perks and status that amount to, sincerely, working for the man in the clearest sense possible. (You could up that game with something like Palantir.)
If one's real happy working for Amazon though, I wish them luck and continued prosperity, and a bit of willful ignorance to carry on. For sure it looks real good on the resume for the next gig.
Exactly. NIMBY has become one of those largely meaningless cheap shots like "politically incorrect" that gets used to tar anyone who wants something different from what you do. It's much more properly applied to things that no one really wants next to them but that have to go somewhere.
Your proof is flawed. The CPU has access to the complete current program state, and also complete knowledge of its own hardware. A static compiler has neither. Therefore, it's not at all clear that a compiler can do whatever the CPU can do.
Example: the best order to run a sequence of instructions could depend on which inputs happen to be in the L1 cache at the time. This could differ from one execution to the next. There's no way for a static compiler to get this right.
You don't need access to the full current program state; most of what OOE buys you can be done with simple graph coloring and knowledge of the number of registers. A static compiler knows the number of registers available, as well as several other intrinsic hardware features (after all, it has to use them).
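For concreteness, here's a toy sketch of the graph-coloring idea: greedily color a register-interference graph knowing only how many physical registers exist. The graph, variable count, and register count are invented for illustration; real allocators (Chaitin/Briggs style) add spill heuristics, coalescing, and a smarter elimination order on top of this.

```c
/* Toy greedy coloring of a made-up register-interference graph. */
#include <stdio.h>

#define NVARS 6   /* virtual registers in this invented example */
#define NREGS 3   /* physical registers the compiler knows about */

/* interfere[i][j] == 1 means variables i and j are live at the same time */
static const int interfere[NVARS][NVARS] = {
    {0,1,1,0,0,0},
    {1,0,1,1,0,0},
    {1,1,0,1,0,0},
    {0,1,1,0,1,0},
    {0,0,0,1,0,1},
    {0,0,0,0,1,0},
};

int main(void) {
    int color[NVARS];
    for (int v = 0; v < NVARS; v++) color[v] = -1;   /* -1 = not yet assigned */

    for (int v = 0; v < NVARS; v++) {
        int used[NREGS] = {0};
        /* mark registers already taken by interfering, already-colored vars */
        for (int u = 0; u < v; u++)
            if (interfere[v][u] && color[u] >= 0)
                used[color[u]] = 1;
        for (int r = 0; r < NREGS; r++)
            if (!used[r]) { color[v] = r; break; }
        if (color[v] < 0)
            printf("v%d: spill to memory\n", v);     /* no free register */
        else
            printf("v%d -> r%d\n", v, color[v]);
    }
    return 0;
}
```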
On a VLIW, a lot more of these features are necessarily exposed, and the compiler has to take advantage of them.
A compiler can optimize your example trivially by optimizing for cache locality, something compilers already do. It simply means that if you access memory address X in one place and access it again elsewhere, the compiler will try to keep those two accesses closer together.
Making a simple prediction about cache contents is trivial for compilers and, as mentioned, already happens. You can build a graph of memory dependencies and then reduce the distance between connected nodes along the execution path. Since this is VLIW, and we may be able to tell the CPU which branch is likely, we can even skip this in favor of optimizing the happy path better.
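A rough sketch of the locality part of that: group independent loads that fall in the same (hypothetical) 64-byte cache line so they end up adjacent in the schedule. The addresses and the instruction list are invented; a real compiler would also honor data dependences, aliasing, and register pressure.

```c
/* Toy "schedule for cache locality" pass over six independent loads. */
#include <stdio.h>
#include <stdint.h>

#define LINE 64   /* assumed cache-line size */
#define N 6

int main(void) {
    /* addresses touched by six independent loads, in program order */
    uint64_t addr[N] = {0x1000, 0x2040, 0x1008, 0x3000, 0x2048, 0x1010};
    int emitted[N] = {0};

    /* emit each not-yet-scheduled load, then pull forward every later load
     * that lands in the same cache line */
    for (int i = 0; i < N; i++) {
        if (emitted[i]) continue;
        for (int j = i; j < N; j++) {
            if (!emitted[j] && addr[j] / LINE == addr[i] / LINE) {
                printf("load [0x%llx]\n", (unsigned long long)addr[j]);
                emitted[j] = 1;
            }
        }
    }
    return 0;
}
```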
A modern optimizer is a very complex beast; it can certainly infer some things about the likely runtime state of the program and will make assumptions about it (enable -O3 if you want to test). It is almost certainly able to optimize your example in at least a minimal fashion on more aggressive settings.
To my knowledge, the CPU pipeline does not optimize based on L1 cache contents, since checking the L1 cache is still rather expensive and the lookahead in the instruction queue is usually limited to a few hundred instructions. Hitting L1 is still an order of magnitude slower than hitting a register, and far too expensive to do for every memory-access instruction. The pipeline instead tends to rely on branch predictors and register dependencies, which are simpler and faster, plus some historical data from previously executed code.
> FENCE.I does not ensure that other RISC-V harts’ instruction fetches will observe the local hart’s stores in a multiprocessor system. To make a store to instruction memory visible to all RISC-V harts, the writing hart has to execute a data FENCE before requesting that all remote RISC-V harts execute a FENCE.I
Yikes. That sounds cumbersome for multithreaded code patching systems, like modern JIT compilers. (A "hart" here is a hardware thread.) Sounds like all threads must poll periodically to check whether they should run a FENCE.I, and when they do so, they report that they've done it. Doesn't sound like a lot of fun to implement, though maybe better in software than hardware?
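For concreteness, here's a minimal sketch of that dance in C with RISC-V inline assembly, assuming a polling scheme like the one described above. The patch_epoch flag and the overall structure are my invention (a real system would more likely use an IPI, or an OS-provided icache-flush call, rather than polling), but the ordering matches the quoted spec text: a data FENCE on the writing hart, then FENCE.I on every hart that will execute the new code.

```c
/* Hypothetical cross-hart code-patching sketch; RISC-V target only. */
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t patch_epoch = 0;   /* bumped after each code patch */

/* Writing hart: patch instruction memory, then make the stores visible. */
void publish_patch(uint32_t *slot, uint32_t new_insn) {
    *slot = new_insn;                                /* store to instruction memory */
    __asm__ volatile ("fence rw, rw" ::: "memory");  /* data FENCE first            */
    atomic_fetch_add(&patch_epoch, 1);               /* ask other harts to sync     */
}

/* Executing harts: poll at a safe point; FENCE.I before running patched code. */
void maybe_sync_icache(uint64_t *seen_epoch) {
    uint64_t cur = atomic_load(&patch_epoch);
    if (cur != *seen_epoch) {
        __asm__ volatile ("fence.i" ::: "memory");   /* local instruction fence */
        *seen_epoch = cur;
    }
}
```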
I don't know enough to say whether this is accurate or not. However, there are working groups reviewing the memory model[1] (and implementing fast ISRs[2]), so if there are performance problems in this area, they're being looked at.
No. The point of the article is that it's OK to undo an abstraction that has done more harm than good. Abstraction and DRY are still a good default mode of thinking.
Agreed. To me, the real point is not to be afraid to undo an abstraction that has proven to be more trouble than it's worth. DRY is still a good default mode of thinking.