infberg's comments

infberg · 2025-12-11T10:51:21

Hey, big fan of the Vantage instance finder. Would it be possible to add instance category labels like the ones AWS uses on its website, e.g. "storage-optimized", "general-purpose", etc.?

I often find these useful for quickly comparing similar instance types, e.g. m7g vs. m8g vs. m9g.
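A minimal sketch of how such labels could be derived, assuming the category can be read off the family prefix in AWS's naming scheme (the letters before the generation digit, e.g. "m" in m7g). The prefix table and the `instance_category` helper are hypothetical and not exhaustive; how the Vantage site actually stores its data is not known here.

```c
#include <string.h>

/* Hypothetical prefix -> category table, based on AWS's public naming
   convention. Not exhaustive; unknown families fall through to "unknown". */
struct category_rule {
    const char *prefix;
    const char *label;
};

/* Longer prefixes must come before shorter ones ("inf" before "i",
   "dl" before "d", "hpc" before "h", "trn" before "t"). */
static const struct category_rule RULES[] = {
    {"inf", "accelerated-computing"},
    {"trn", "accelerated-computing"},
    {"hpc", "hpc-optimized"},
    {"mac", "general-purpose"},
    {"dl",  "accelerated-computing"},
    {"vt",  "accelerated-computing"},
    {"m",   "general-purpose"},
    {"t",   "general-purpose"},
    {"a",   "general-purpose"},
    {"c",   "compute-optimized"},
    {"r",   "memory-optimized"},
    {"x",   "memory-optimized"},
    {"z",   "memory-optimized"},
    {"u",   "memory-optimized"},
    {"i",   "storage-optimized"},
    {"d",   "storage-optimized"},
    {"h",   "storage-optimized"},
    {"p",   "accelerated-computing"},
    {"g",   "accelerated-computing"},
    {"f",   "accelerated-computing"},
};

/* Returns the label of the first rule whose prefix matches the family. */
const char *instance_category(const char *family) {
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        size_t n = strlen(RULES[i].prefix);
        if (strncmp(family, RULES[i].prefix, n) == 0)
            return RULES[i].label;
    }
    return "unknown";
}
```

With this, m7g, m8g and t4g would all land under "general-purpose", which is the kind of grouping the comparison above needs.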

StratusBen · 2025-12-11T15:58:58


Good feedback! Having someone on our team get this completed for you.

infberg · on Feb 14, 2022

Not sure whether you are aware, but this issue seems to affect other vendors as well, like Lenovo, who can't find a solution either.

The same issue is being reported all over the Lenovo Linux forums, too; for example, on https://github.com/erpalma/throttled/issues/255 you can see lots of Lenovo users reporting the same thing.

I personally suffer from the same issue on my X1G9. The best fix so far is unloading and reloading all the related kernel modules, which at least makes throttling drop only to 1.2 GHz.

infberg · on Nov 25, 2021

https://devblogs.microsoft.com/dotnet/loop-alignment-in-net-...

That is a good post on code alignment effects.

hyperman1 · on Nov 25, 2021

That's an article worthy of its own HN post. Thanks

infberg · on Oct 17, 2021

Care to explain? -O3 generates larger code than -O2?

pclmulqdq · on Oct 17, 2021

Yes, -O3 tends to include a lot of features that increase code size, like aggressive loop unrolling. If you are jumping around a large amount of code, -O3 generally performs more poorly than -O2, but if you are running a tight loop (like HPC code), -O3 is better.
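To make the code-size trade-off concrete, here is a hand-written sketch of 4x unrolling in C. It only illustrates the shape of the transformation; what GCC or Clang actually emit at -O2 or -O3 depends on compiler version, target and flags (the rolled loop may simply be vectorized instead). Function names are illustrative.

```c
#include <stddef.h>

/* Rolled loop: one add and one loop-control branch per element,
   small code footprint. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: fewer loop-control branches per element and four
   independent accumulators, at the cost of a larger loop body plus a
   remainder loop. */
long sum_unrolled4(const int *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* remainder: up to 3 leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Multiply that body growth across every hot loop in a large, branchy program and the instruction-cache pressure described above follows.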

In the past, when I worked on a very performance-sensitive codebase that was also limited in scope, we compiled with -Os and did all the loop optimizations we wanted manually (and with pragmas). That produced faster code than -O2 or -O3.
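A rough sketch of the "-Os plus manual loop optimization" approach, assuming GCC 8 or later, where `#pragma GCC unroll N` requests unrolling of the loop that immediately follows it. The `scale` function is a made-up example, not code from the codebase described above.

```c
#include <stddef.h>

/* Assumed build: gcc -Os -c scale.c
   The translation unit is optimized for size, but the pragma asks GCC
   to unroll this one loop 4 times anyway. */
void scale(float *restrict dst, const float *restrict src, float k, size_t n) {
#pragma GCC unroll 4
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```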

gnufx · on Oct 17, 2021

Regarding unrolling, -O3 enables -floop-unroll-and-jam but not -funroll-loops. You may want one or the other, maybe both, depending on circumstances. I don't see much benefit from the available pragmas on HPC-type code except for OpenMP, and "omp simd" isn't necessary to get vectorization in the places where I've seen people say it is. Mileage always varies somewhat, of course. (Before second-guessing anything, use -fopt-info.)
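For readers unfamiliar with unroll-and-jam, here is a hand-written C sketch of the transformation on a matrix-vector product: the outer loop is unrolled by 2 and the two copies of the inner loop are fused ("jammed"), so each `x[j]` is loaded once for two rows. Whether GCC's own pass fires on the naive version is not guaranteed; -fopt-info output is where to check. Function names are illustrative.

```c
#include <stddef.h>

/* Naive y = A*x, with A stored row-major as rows x cols. */
void matvec(const double *A, const double *x, double *y,
            size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += A[i * cols + j] * x[j];
        y[i] = s;
    }
}

/* Same computation after unroll-and-jam by 2 on the outer loop. */
void matvec_unroll_jam(const double *A, const double *x, double *y,
                       size_t rows, size_t cols) {
    size_t i = 0;
    for (; i + 2 <= rows; i += 2) {
        double s0 = 0.0, s1 = 0.0;
        for (size_t j = 0; j < cols; j++) {
            double xj = x[j];               /* one load feeds two rows */
            s0 += A[i * cols + j] * xj;
            s1 += A[(i + 1) * cols + j] * xj;
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
    for (; i < rows; i++) {                 /* leftover odd row */
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += A[i * cols + j] * x[j];
        y[i] = s;
    }
}
```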

boibombeiro · on Oct 18, 2021

Modern x86 CPUs have micro-op caches and buffers that hold small loops (about 50 instructions) and medium-sized loops (~2k instructions). Also, the bottleneck is usually instruction decoding (Alder Lake made huge changes there, so this might change).

In other words, loop unrolling is, more often than not, harmful.

jleahy · on Oct 17, 2021

It’s a shame that -Os can sometimes produce truly awful code. There are a few optimisations in there that trade a byte for a massive slowdown.

userbinator · on Oct 17, 2021

You asked for minimum size, and that's what you got. I'd say that's working as it should.

A more granular control over optimisation would be good, however.

jleahy · on Oct 18, 2021

Probably some tweaks to -O2 would be enough. After all, people are choosing -Os over -O2 because they see better performance, and that should not be happening.

kevin_thibedeau · on Oct 17, 2021

You can enable/disable individual optimizations. How much more granular do you need?
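As an illustration of that granularity in GCC: individual passes can be toggled on the command line (e.g. `-O2 -fno-unroll-loops`), and the GCC-specific `optimize` function attribute overrides settings for a single function. GCC's documentation recommends that attribute for debugging rather than production use, and Clang does not support it, hence the guard. The helper below is hypothetical.

```c
#include <stddef.h>

/* Per-function override: build this one function as if with -Os,
   whatever global -O level is used. Falls back to the global flags on
   compilers without the GCC `optimize` attribute. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_SIZE __attribute__((optimize("Os")))
#else
#define OPT_SIZE
#endif

/* Counts zero bytes in a buffer. */
OPT_SIZE size_t count_zeros(const unsigned char *buf, size_t n) {
    size_t z = 0;
    for (size_t i = 0; i < n; i++)
        z += (buf[i] == 0);
    return z;
}
```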

jhgb · on Oct 17, 2021

Surely a profile-guided build should be able to only apply -Os to those functions where it doesn't cause a lot of problems.

pclmulqdq · on Oct 17, 2021

In the application I referred to, PGO was also used. However, that only applies -Os to cold code, and if what you're doing is very branchy, it can help even in the hot path.
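A small sketch of how hot and cold paths can be communicated to GCC or Clang without a profile, assuming the error path is rare: GCC documents that functions marked `cold` are optimized for size and that calls to them are treated as unlikely. With PGO (`-fprofile-generate`, a training run, then `-fprofile-use`), the compiler derives a similar hot/cold split from measured counts instead. Names are hypothetical.

```c
#include <stdio.h>

/* Rare error reporter: `cold` asks for size-optimized code, kept out of
   line (`noinline`) so it does not bloat the hot loop. */
static __attribute__((cold, noinline)) void report_bad_input(long v) {
    fprintf(stderr, "bad input: %ld\n", v);
}

/* Sums the non-negative entries, reporting and skipping negative ones. */
long checked_sum(const long *v, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (__builtin_expect(v[i] < 0, 0)) {   /* branch hint: rare */
            report_bad_input(v[i]);
            continue;
        }
        s += v[i];
    }
    return s;
}
```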

infberg · on July 12, 2021

I agree with you that one can very often get distracted by single events; however, knowing that you are frontend- or backend-bound isn't all that much more helpful either.

For frontend-bound code you can guess that PGO, BOLT, or huge pages might help, but it's still a blind guess without knowing what to look at next.

Intel's TMA is really the only helpful thing here. It's a bit sad that AMD and ARM don't provide a way to calculate something TMA-like themselves.
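For reference, TMA's level-1 breakdown is just arithmetic over a handful of counters. The sketch below assumes the formulas Intel published for 4-wide cores of the Sandy Bridge through Skylake generations, with SMT disabled; newer cores (and any AMD or Arm equivalents) use different events and slot widths, so treat the event names and the width as assumptions.

```c
/* Level-1 Top-down fractions; the four values sum to 1. */
typedef struct {
    double frontend_bound, bad_speculation, retiring, backend_bound;
} tma_level1;

tma_level1 tma_compute(double clk_unhalted_thread,      /* CPU_CLK_UNHALTED.THREAD */
                       double idq_uops_not_delivered,   /* IDQ_UOPS_NOT_DELIVERED.CORE */
                       double uops_issued_any,          /* UOPS_ISSUED.ANY */
                       double uops_retired_slots,       /* UOPS_RETIRED.RETIRE_SLOTS */
                       double int_misc_recovery_cycles) /* INT_MISC.RECOVERY_CYCLES */
{
    const double width = 4.0;                 /* issue slots per cycle (assumed) */
    double slots = width * clk_unhalted_thread;
    tma_level1 r;
    r.frontend_bound  = idq_uops_not_delivered / slots;
    r.bad_speculation = (uops_issued_any - uops_retired_slots
                         + width * int_misc_recovery_cycles) / slots;
    r.retiring        = uops_retired_slots / slots;
    /* Backend bound is whatever is left over. */
    r.backend_bound   = 1.0 - (r.frontend_bound + r.bad_speculation + r.retiring);
    return r;
}
```

For example, 1,000 unhalted cycles with 800 undelivered uop slots gives a frontend-bound fraction of 800 / 4000 = 0.2; the next TMA level then splits that figure further, which is what tells you where to look next.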