>"run this daemon that tries to beat the kernel OOM to the punch"
Considering that the kernel OOM killer tends to be way too late in doing its thing, I don't see how this is inelegant. Maybe there's a reason you can't just have the kernel kill processes earlier in the face of memory pressure.
The kernel OOM is just plain broken. I can't understand how it can be that, on Windows, whenever I run out of RAM the OS kills whatever process is consuming too much and the computer keeps running flawlessly. However, on Linux, my computer just... freezes. It freezes and stops responding. Not even the mouse moves. Having to use a userspace OOM killer is the most inelegant thing I've seen. So I need to have two OOM killers so that the good one can beat the bad one? It's so redundant, it's plain stupid. How come NOBODY is doing anything? If I knew C++ I would for sure send a patch.
The reason the OOM killer never kicks in is that you never, or almost never, actually reach a true OOM.
What usually happens is that in near-OOM conditions, the kernel starts reclaiming file-backed memory pages, which leads to "thrashing". This manages to keep some extra memory available, but it makes the system almost unresponsive because it is constantly copying pages back and forth between RAM and disk. It may take anywhere from minutes to hours before the system finally OOMs and the OOM killer is invoked.
This problem has been there forever, but it has been made worse by the improved speed of modern storage: fast I/O lets the kernel sustain thrashing for much longer, whereas with slower disks the actual OOM condition was reached sooner.
There are several solutions:
- Buy more RAM: if your system routinely comes close to OOM, something is not right.
- Add a (small) swap. It doesn't have to be a partition: nowadays most filesystems support swap files. Just create a file of the right size, format it as swap, and enable it (see the sketch after this list).
- Limit the amount of thrashing, or protect some pages from being reclaimed. Google proposed this first, and several other people have since, but AFAIK it has never been merged into the mainline kernel.
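For the swap-file route, the usual recipe looks something like this (the path and the 2G size are just examples, it all needs root, and on btrfs the file needs extra handling):

fallocate -l 2G /swapfile                               # allocate the backing file
chmod 600 /swapfile                                     # swap must not be world-readable
mkswap /swapfile                                        # write the swap signature
swapon /swapfile                                        # enable it immediately
echo '/swapfile none swap defaults 0 0' >> /etc/fstab   # enable it at boot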
Regarding the last solution, there is a patchset called le9-patch[1] that is included in some alternative Linux kernels; it should be relatively safe to use.
I'm not sure if you misread the previous comment, but the problem they describe is quite commonly experienced by people who use Linux alongside macOS/Windows.
With all hardware being the same (RAM, SSD, CPU), as OOM is approached Linux will freeze, whereas Windows continues to run smoothly. All OSes try to reclaim memory pages; Linux just seems to hang userspace while doing so.
As someone who has dual-booted Windows and Linux for a decade, I can 100% attest to this glaring problem.
I am sure that this distinction between a near-OOM condition and an actual OOM condition matters to someone familiar with the current kernel implementation. You seem confident describing what happens when the memory gets closer to full, so I believe you. But the user experiences the PC freeze during certain conditions, however you choose to name them, and it is during that freeze period that the user needs a program to be killed to free some memory and prevent the freeze. I would take one crashed program over power cycling the entire PC any day of the week.
> I am sure that this distinction between a near-OOM condition and an actual OOM condition matters to someone familiar with the current kernel implementation. You seem confident describing what happens when the memory gets closer to full, so I believe you.
I'm not a kernel developer or anything like that; I've just spent some time investigating why this issue happens, and it has been happening for more than 10 years now.
> the user experiences the PC freeze during certain conditions, however you choose to name them
I'm not trying to defend the Linux kernel, I just described how it works. In particular, it's not true that the OOM killer "takes too long" or doesn't work: it's just not invoked at all. If you invoke it manually (enable the magic SysRq with `sysctl kernel.sysrq=1` and press `alt-sysrq-f`), it does its job and resolves the OOM instantly.
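To have that survive reboots, something like this should do (1 enables all SysRq functions; a stricter bitmask such as 64, which only permits signalling processes, still covers the OOM-kill key):

echo 'kernel.sysrq = 1' | sudo tee /etc/sysctl.d/90-sysrq.conf
sudo sysctl --system    # reload all sysctl drop-ins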
So, if you don't want to deal with lockups and don't like a userspace OOM daemon (I don't), those are the possible solutions.
> I would take one crashed program over power cycling the entire PC any day of the week.
On a laptop or desktop PC, you don't need to power cycle in a near-OOM: use the magic SysRq key.
> On a laptop or desktop PC, you don't need to power cycle in a near-OOM: use the magic SysRq key.
Thanks for the tip! If my Linux ever starts locking up regularly, I will apply it.
But right now (so I don't have to give up what my SysRq key is currently used for) I would prefer some method for determining, after I forcefully power the computer down and back up again, whether the lockup or slowdown that motivated the forced power-down was caused by a near-OOM condition.
I don't think so, sorry. The kernel emits a few messages when an OOM is detected, including which tasks were killed to free memory, but in a near-OOM there is probably nothing: the system is technically still working normally, just very slowly.
> Add a (small) swap. It doesn't have to be a partition: nowadays most filesystems support swap files. Just create a file of the right size, format it as swap, and enable it.
My experiences with swap on Linux have been similarly bad. If even brief memory pressure forces the kernel to move things to swap, the only way to revert that in any reasonable timeframe is to disable the swap with swapoff or to restart the machine.
Meanwhile, Windows with a page file twice the size of physical RAM runs smooth as butter. I have a 200GB page file right now and no problems.
> I can't understand how it can be that, on Windows, whenever I run out of RAM the OS kills whatever process is consuming too much and the computer keeps running flawlessly.
I've long wondered this too. How does Windows handle memory pressure differently?
I avoid swap, since it would need to be encrypted to protect sensitive data written out from memory to disk. Instead I reserve more memory for the kernel via vm.min_free_kbytes based on the installed RAM; following some Red Hat suggestions, I also reserve more memory in vm.admin_reserve_kbytes and vm.user_reserve_kbytes, adjust vm.vfs_cache_pressure based on the server's role, and finally set vm.overcommit_ratio to 0. This worked well on over 50k bare-metal servers with no swap. OOM was extremely rare outside of dev; it basically only happened when automation had human-induced bugs that deployed too many Java instances to a server. All of the servers had anywhere from 512GB to 3TB of RAM, and nearly all of the memory was in use at all times.
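Roughly, the shape of it as a sysctl drop-in (the values here are illustrative placeholders, not our production numbers):

# /etc/sysctl.d/90-mem.conf
vm.min_free_kbytes = 1048576       # keep ~1 GiB free for kernel/atomic allocations
vm.admin_reserve_kbytes = 262144   # headroom so root can still log in and recover
vm.user_reserve_kbytes = 262144    # headroom for userspace during recovery
vm.vfs_cache_pressure = 50         # reclaim dentry/inode caches less aggressively
vm.overcommit_ratio = 0            # only meaningful with vm.overcommit_memory = 2

Load it with `sysctl --system` after editing.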
The kernel OOM killer is only concerned about kernel survival. It isn't designed to care about user perception of system responsiveness.
That's what resource control via cgroups is about. Fedora desktop folks (both GNOME and KDE) are working on ensuring minimum resources are available for the desktop experience, via cgroups, which then applies CPU, memory, and IO isolation when needed to achieve that. Also, systemd-oomd is enabled by default. The resource control picture isn't completely in place yet, but things are much improved.
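If you want to experiment with that kind of resource control by hand, systemd exposes it directly; a rough example (the values are made up, and MemoryMin requires cgroup v2):

# reserve memory and boost CPU weight for the user session's cgroup
sudo systemctl set-property user.slice MemoryMin=1G CPUWeight=200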
cgroups often make the situation worse, not better, by forcing a small memcg to drop its caches because that control group is full, even while the system overall has plenty of resources. This can lead to a system swapping heavily for no apparent reason.
Putting desktop apps into individual cgroups is one of the more counter-productive ideas that have cropped up lately.
Huh? I have never seen desktop Windows kill a process due to out-of-memory -- does it even do that?
It does thrash much more gracefully than Linux, though. In fact, the "your computer is low on memory" prompt can actually show up even while severely thrashing, something all but impossible on Linux (where even starting something like zenity may take hours...).
You can already disable the Linux memory overcommit feature if you want Linux to never allow more memory to be allocated than exists. However, you may run into problems with programs that rely on the ability to allocate more memory than they need, or if your computer has a low amount of memory.
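For reference, these are the knobs (with overcommit disabled, the commit limit becomes swap plus overcommit_ratio percent of RAM; 100 is just an example value):

sudo sysctl vm.overcommit_memory=2    # 2 = never overcommit
sudo sysctl vm.overcommit_ratio=100   # commit limit = swap + 100% of RAM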
The reason is that Windows doesn't have fork(), and therefore doesn't have to promise huge multiples of the available memory only to be left holding the bag when that fiction fails. Look up "overcommit" if you're interested.
Not at all. I'm aware that it has the potential to put people off, but sometimes you have to shake people up to get the discussion going. There is actually a huge difference between "forcing people to do something" and merely "asking a question, albeit in a tough way," and people don't really seem to get this difference around here.
> The kernel is different from userspace projects - more difficult in some respects (we use a lot of very odd header files that pushes the boundary of what can be called "C"), but easier in many other respects (mainly in the sense that the kernel is fairly self-contained, and then doesn't rely on other projects for the final binary).
I'm interested in what Torvalds meant by these odd header files; does anyone know?
/*
 * This returns a constant expression while determining if an argument is
 * a constant expression, most importantly without evaluating the argument.
 * Glory to Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
 */
#define __is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
I can try... In short, this is about how a C compiler is supposed to deduce the type of a ternary expression. If x is a constant expression, the compiler can perform the multiplication by 0l, so the second operand is a null pointer constant; the type of the ternary is then whatever the third operand says (a pointer to int), the sizeof of the pointed-to type is sizeof(int), and the comparison succeeds. Otherwise the compiler cannot fold the multiplication, takes the type of the second operand at its "face value", i.e. as a pointer to void, and converts the type of the third operand to it; sizeof(void) is 1 under GCC's extension, which of course makes the comparison false in this case.
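If you want to see it for yourself, here's a quick throwaway test (my own, not from the kernel; note it leans on the GNU extension that sizeof(void) is 1, so build it as GNU C):

cat > /tmp/constexpr_test.c <<'EOF'
#include <stdio.h>

#define __is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

int main(void)
{
	int n = 42;
	printf("%d\n", __is_constexpr(5)); /* prints 1: a constant expression */
	printf("%d\n", __is_constexpr(n)); /* prints 0: a runtime value */
	return 0;
}
EOF
gcc -std=gnu11 -o /tmp/constexpr_test /tmp/constexpr_test.c && /tmp/constexpr_test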
Small nit: the PRO versions of Ryzen APUs do support ECC[0], and ASRock has been quoted as saying that all of their AM4 motherboards support ECC, even the low-end offerings with the A320 chipset.
This version has got to be the worst kernel released in a while in terms of regressions, from an AMDGPU null-pointer-dereference crash[0] to an f2fs data-corruption bug[1] and now this. Fixes for these are on their way as far as I can tell, but since the stable team is probably on Christmas vacation it might take a while.
These are some attitude goals for me. It's so easy to take things personally. Being able to take things constructively even when they might be personal is a great skill.
5.10 is a Long Term Support release that is going to get used by many distros for a long time. Maintainers might have tried (unsurprisingly) to get some interesting features merged.
On the bright side, updating to 5.10 fixed a regression from a 5.4-to-5.8 kernel upgrade for me. The fix might have been in 5.9, but I only got the idea of upgrading after the 5.10 release.
Anyways, Linux needs some more CI so that such bugs can be found during the RC phase.
Where is the current CI that we have today lacking, and what needs to be improved? We always want more testing and testers; what is preventing everyone from helping with this?
I'm a software engineer who's not involved in Linux Kernel Dev... but I've got a stack of old laptops that I'd be happy to set up to run automated CI if that'd be helpful.
Is there a webpage or doc somewhere I can look at?
(I'm not trying to snark - the fact that you're you and you're here asking for help is making me want to dip my toe in).
Simplest thing to do: just run Linus's latest releases (the -rc releases), or builds from his git tree, on your machine and report any problems.
Second-simplest thing to do is to run the linux-next branch/tree on your machines and report any build warnings and runtime issues you find. That's what will be the "next" kernel releases and is where all of the developer/maintainer trees are merged together before they are sent to Linus.
Both of those should be very easy to do, and any problems found there should be easy to fix and resolve before they get to a "real" release.
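For anyone unfamiliar, the mechanics are roughly this (a sketch only; config handling varies by distro, and keep a known-good kernel around to fall back to):

git clone https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
cd linux-next
cp /boot/config-"$(uname -r)" .config   # seed from the running kernel's config
make olddefconfig                       # accept defaults for any new options
make -j"$(nproc)"
sudo make modules_install install       # most distros update the bootloader for you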
I haven't been following kernel dev for years; what does the CI setup look like? Did the Phoronix Test Suite ever find its way into widespread use?
Back when I was building kernels for embedded hardware (Sheevaplug) in the 2.6.33 timeframe, I found a USB audio regression between 2.6.33.7 and later versions. If there were a semi-turnkey way to set up a testbench that could automatically reboot hardware into every new kernel, run through some basic tests, and report any deviation, I probably would have been more likely to do so. At the time I was working solo, trying to release a polished consumer product (sadly, though the product was released, the business didn't work out), and didn't have time to dig into and report bugs.
We have so many different CI systems running on the kernel on an hourly basis.
We have the 0-day bot from Intel that runs so many things on all developer trees, we have kernelci running on many, many different hardware platforms, and we have Linaro test systems also running on many different branches and hardware platforms.
If you want to tie your own hardware into the system, kernelci is the best place to start, I recommend looking into that.
> How is the kernel tested? Weren't there any tests covering any of this?
Despite appearances, "the kernel" is not a single monolithic thing. There is a core of about 100 kLOC (but I haven't looked up that number in years). The rest (hardware drivers, network protocols, file systems, crypto, RAID, ...) bolts on as modules.
Those modules are maintained by separate teams. They are about as related to the kernel as the phone dialler app is to Android. The quality of each module is the responsibility of that team, not "the kernel" team, and that applies to testing the module as well.
In a sense, "the kernel" team is more like Debian or Red Hat than like an ordinary development team. What they have done is develop a framework that lets them take bits created and maintained by a cast of thousands and bolt them together into what appears, from the outside, to be a single coherent thing. So the answer to "how is the kernel tested" is: it's complex, and not centrally planned.
The other answer is that what you are seeing is in fact part of the testing process. Most people use kernels packaged by their distribution; kernel.org releases are more like Microsoft's pre-releases of Windows. Most Debian users, for example, won't see 5.10 until it gets to Debian testing. To get there it must pass through Debian experimental (which is where 5.10 sits now), then sit in Debian unstable without bug reports for a while. Those release names should give you a hint about the anticipated stability of the kernel version. I personally won't use it until it takes one more step, from Debian testing to Debian backports (which is when it becomes available to Debian stable users who are willing to risk compatibility issues).
This means that, for most users, 5.10 isn't done yet, as it has barely begun its testing regime.