No, that's not how it works. The page tables get duplicated and copy-on-write takes care of the pages. As long as they are identical they will be shared; there is no way that 10GB of RAM will be allocated to the forked process with all of the data copied.
This is the only right answer. What actually happens is you instantly have two 10G processes which share the same address space, and:
3. A microsecond later, the child calls exec(), decrementing the reference count on the memory shared with the parent[1] and faulting in a 36k binary, bringing our new total memory usage to 1,048,612KB (1,048,576K + 36K)
CoW has existed since at least 1986, when CMU developed the Mach kernel.
What GP is really talking about is overcommit, which is a feature (on by default) in Linux which allows you to ask for more memory than you have. This was famously a departure from other Unixes at the time[2], a departure that fueled confusion and countless flame wars in the early Internet.
> 2. We could overlook the memory usage increase and pretend that we have enough memory, and only really panic if the second process truly needs its own 10GB RAM that we don't have. That's what Linux does
"pretend" → share the memory and hope most of it will be read-only or unallocated eventually; "truly needs to own" → CoW
It will never happen. To begin with, all of the code pages are going to be shared because they are never modified.
Besides that, the bulk of fork calls are just a preamble to starting up another program and exiting the current one. It's mostly a hack to ensure continuity for stdin/stdout/stderr and some other resources.
It will most likely not happen? It's absolutely possible to write a program that forks and both forks overwrite 99% of shared memory pages. It almost never happens, which is GP's point, but it's possible and the reason it's a fragile hack.
What usually happens in practice is you're almost OOM, and one of the processes running in the system writes to a page shared with another process, forcing the system to invoke the good ol' OOM killer.
Sorry, but no, it can't happen: you cannot fork a process and end up with twice the memory requirements just because of the fork. What you can do is allocate more memory than you were using before and keep writing.
The OOM killer is a nasty hack, it essentially moves the decision about what stays and what goes to a process that is making calls way above its pay grade, but overcommit and OOM go hand in hand.
It does not happen using fork()/exec() as described above. For it to happen we would need to fork() and continue using old variables and data buffers in the child that we used in the parent, which is a valid but rarely used pattern.
Please read the parent comments. Overcommit is necessary precisely because the kernel has to reserve memory for both processes, and overcommit allows it to reserve more memory than is physically present.
If the kernel did not have to reserve memory for the forked process, overcommit would not be necessary.
This is a misconception you and parent are perpetuating. fork() existed in this problematic 2x memory implementation _way_ before overcommit, and overcommit was non-existent or disabled on Unix (which has fork()) before Linux made it the default. Today with CoW we don't even have this "reserve memory for forked process" problem, so overcommit does nothing for us with regard to fork()/exec() (to say nothing of the vfork()/clone() point others have brought up). But if you want you can still disable overcommit on linux and observe that your apps can still create new processes.
What overcommit enables is more efficient use of memory for applications that request more memory than they use (which is most of them) and more efficient use of page cache. It also pretty much guarantees an app gets memory when it asks for it, at the cost of getting oom-killed later if the system as a whole runs out.
I think you've got it backwards: With overcommit, there is no memory reservation. The forked process gets an exact copy of the parent's page table, but with all writable memory marked copy-on-write instead. The kernel might well be tallying these up to some number, but nothing important happens with it.
Only without overcommit does the kernel need to start accounting for hypothetically-writable memory before it is actually written to.
But a large fraction, if all you do afterwards is an exec call. Given 8 bytes per page table entry and 4k pages, that's 1/512 of the memory wasted, so if your process uses 8GB, it's 16MB. Copying that still takes noticeable time if you spawn often.
I've never seen page tables be the cause of out-of-memory issues. They are usually pre-allocated to avoid recursive page faults, but nothing would stop you from making the page tables themselves copy-on-write during a fork.
Aren't page tables nested? I don't know if any OS or hardware architecture actually supports it, but I could imagine the parent-level page table being virtual and copy-on-write itself.