Even with internal include guards, just the time to read through (and skip) the whole file on every later inclusion can come to dominate cpp processing time (i.e. it's mostly waiting on I/O, reading the same headers many times over). That's why Lakos advises external include guards for large systems.
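For anyone unfamiliar with the distinction, here is a minimal sketch of an external (redundant) guard next to the usual internal one. The names `widget.h`, `INCLUDED_WIDGET`, `Widget` and `widget_id` are made up for illustration, and the two "files" are inlined into one so the snippet compiles on its own:

```cpp
// ---- contents of a hypothetical widget.h, inlined for illustration ----
// Internal guard: the file is still opened and scanned on every
// inclusion, even though its body is skipped the second time around.
#ifndef INCLUDED_WIDGET
#define INCLUDED_WIDGET
struct Widget { int id; };
#endif

// ---- contents of a hypothetical client header ----
// External guard: the guard macro is tested at the point of inclusion.
// INCLUDED_WIDGET is already defined here, so the preprocessor skips
// the #include directive and never opens widget.h at all.
#ifndef INCLUDED_WIDGET
#include "widget.h"
#endif

// Uses the type declared by the (already seen) header.
inline int widget_id(const Widget& w) { return w.id; }
```

The cost is that every includer has to know the header's guard macro, which is why this is usually only considered for very large code bases.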
Unless your project is gigantic, its source code is likely to sit mostly in memory in the kernel IO buffers nowadays, so IMO even this is not true anymore.
EDIT: just for kicks, I compiled numpy (100 kLOC) twice, the first time right after flushing the IO buffers (`echo 1 > /proc/sys/vm/drop_caches` on recent Linux kernels):
- cold case: 24 seconds
- hot case: 16 seconds
Of course, it is hard to say what contributes to the slowness in the cold case (reading the source files themselves, loading into memory all the programs needed for compilation, etc.).
If you use `ccache` (or similar), the recompilation time will drop to ~0. One-off compilation time doesn't matter that much, and developers have many other tools they can use (precompiled headers?).
Partial rebuilds are obviously quite important - but in that case, IO is even less of an issue, because it is the hot case, assuming you have enough memory. I forgot about one case where you may not have enough memory: a large C++ program built with multiple compilations in parallel - this can easily take GBs of memory for template-heavy code.
One easy way to slow down compiles on Windows is to include windows.h everywhere: this pulls in about 2 MB of declarations that the cpp front end has to read through just to get past the internal guards... It all depends on how large your header files are (and, recursively, how large all the headers they pull in are).