Hacker News | infberg's comments

Hey, big fan of the Vantage instance finder. Would it be possible to add instance-type labels matching what AWS uses on their website - "storage-optimized", "general-purpose", etc.?

I'd often find this useful for quickly comparing similar instance types, e.g. m7g vs. m8g vs. m9g.


Good feedback! I'm having someone on our team get this done for you.


Not sure whether you're aware, but this issue seems to affect other vendors such as Lenovo as well, and they haven't found a solution either.

The same issue is being reported all over the Lenovo Linux forums, and on https://github.com/erpalma/throttled/issues/255 you can see lots of Lenovo users reporting the same thing.

I personally suffer from the same issue on my X1G9. The best fix so far is unloading and reloading all the related kernel modules, which means throttling at least only drops to 1.2 GHz.


https://devblogs.microsoft.com/dotnet/loop-alignment-in-net-...

is a good post on the effects of code alignment.


That's an article worthy of its own HN post. Thanks


Care to explain? Does -O3 generate larger code than -O2?


Yes, -O3 enables a number of optimizations that increase code size, such as aggressive loop unrolling. If execution jumps around a large amount of code, -O3 generally performs worse than -O2, but if you are running a tight loop (like HPC code), -O3 is usually better.

In the past, when I worked on a very performance-sensitive codebase that was also limited in scope, we compiled with -Os and did all the loop optimizations we wanted by hand (and with pragmas). That produced faster code than -O2 or -O3.


Regarding unrolling, -O3 enables -funroll-and-jam but not -funroll-loops. You may want one or the other, maybe both, depending on circumstances. I don't see much benefit from the available pragmas on HPC-type code, except for OpenMP, and "omp simd" isn't necessary to get vectorization in the places I've seen people say it is. Mileage always varies somewhat, of course. (Before second-guessing anything, use -fopt-info.)


Modern x86 CPUs have micro-op caches that hold small loops (about 50 instructions) and medium ones (~2k instructions). Also, the bottleneck is usually instruction decoding (Alder Lake made huge changes there, so this might change).

In other words, loop unrolling is, more often than not, harmful.


It’s a shame that -Os can sometimes produce truly awful code. There are a few optimisations in there that trade a byte of size for a massive slowdown.


You asked for minimum size, and that's what you got. I'd say that's working as it should.

More granular control over optimisation would be good, however.


Probably some tweaks to -O2 would be enough. After all, people are choosing -Os over -O2 because they see better performance, and that shouldn't be happening.


You can enable/disable individual optimizations. How much more granular do you need?


Surely a profile-guided build should be able to apply -Os only to those functions where it doesn't cause a lot of problems.


In the application I referred to, PGO was also used. However, PGO only applies -Os to cold code, and if what you're doing is very branchy, -Os can help even in the hot path.


I agree that one can very often get distracted by single events; however, knowing that you are frontend- or backend-bound isn't all that much more helpful either.

For frontend-bound code you can guess that PGO, BOLT, or huge pages might help, but it's still a blind guess without knowing what to look at next.

Intel's TMA (top-down microarchitecture analysis) is the only really helpful thing here. It's a bit sad that AMD and ARM don't provide a way to calculate something TMA-like themselves.

