You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.
Think how many lines are now duplicated (and therefore need to be updated in twice as many places, with bugs introduced whenever a copy is missed).
Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.
If the 100k lines were in their own subdirectory and added a major new feature, it would be worth it. But spread across the whole codebase, introducing more duplication and more chances for bugs, I think it outweighs the benefit of a 'neater' header-file system.
The 'dependency hell' is only an issue for people who only want to build part of the kernel anyway. If you're building the whole lot, you might as well just include everything and it'll work great.
> [This patch series] decouples much of the high level headers from others, uninlining of unnecessary functions, a decoupling of the type and API headers, automated dependency handling of header files, and a variety of other changes
That's a lot more than "shaving the build time."
As far as I understand it, the goal of this patchset isn't to improve the build time; that's just a nice consequence. The goal was to refactor the header-file hierarchy to make it more maintainable and less "brittle." Sometimes, increasing maintainability requires more code. (Almost always, if the current version is a terse mess of mixed concerns.)
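To make "decoupling of the type and API headers" concrete, here is a minimal sketch of the general technique; the widget names and files are hypothetical, not taken from the actual patchset:

    /* widget_types.h -- only the type definitions. Cheap to include. */
    #ifndef WIDGET_TYPES_H
    #define WIDGET_TYPES_H

    struct widget {
            int id;
            unsigned long flags;
    };

    #endif

    /* widget_api.h -- the function declarations. Only files that
     * actually call these functions need to pay for this header. */
    #ifndef WIDGET_API_H
    #define WIDGET_API_H

    #include "widget_types.h"

    int widget_register(struct widget *w);
    void widget_unregister(struct widget *w);

    #endif

A file that merely stores a struct widget * can then get away with a one-line forward declaration instead of including either header.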
Think of it this way: take an IOCCC entry, and de-obfuscate it. You're "increasing the size of the codebase." You might even be "duplicating" some things (e.g. magic constants that were forcefully squashed together because they happened to share a value, which are now separate constants per semantic meaning.) But doing this obviously increases the maintainability of the code.
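A tiny, made-up example of the magic-constant case:

    /* Obfuscated: one name for two unrelated meanings that happen
     * to share the value 8. */
    #define EIGHT           8

    /* De-obfuscated: more lines, but each constant now carries its
     * meaning and can be changed without silently breaking the
     * other use. */
    #define MAX_RETRIES     8       /* retry budget for a request */
    #define PKT_HDR_LEN     8       /* packet header size, bytes */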
I'd say cycle time is also important anyway: we spend a lot of time building the Linux kernel in various forms. The savings across the board in man-hours spent waiting for compilation, or for git bisect, are not insubstantial.
I wonder what the memory savings during compilation look like? Lower memory use also potentially means more workers in automated testing farms for the same cost.
I've never tried it myself, but presuming you build the kernel in modular rather than monolithic mode, wouldn't the incremental compiles during git-bisect et al already be pretty quick? You'd only be rebuilding the modules whose source files changed, and those (presumably) wouldn't be pulling in the entire header tree. (Or would they, with that being "the problem"?)
> You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.
When you're talking about a project of half a million lines, sure.
The Linux kernel has around 27.8 million lines of code.
An increase of about 0.35%.
> Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.
Why add features at all? Code has a purpose. Sometimes bringing code into a static context is a net good. It was going to be generated at runtime anyway.
> If you're building the whole lot, you might as well just include everything and it'll work great.
That's not strictly true, but it is true for these features, which is part of the stated reasoning.
> The Linux kernel has around 27.8 million lines of code. An increase of about 0.35%
This is horribly misleading; most of those lines are drivers, which this patchset doesn't even touch.
It's still a massive change that only a handful of developers will ever be able to review in its entirety. The size of the project is completely irrelevant to that fact; if anything, the implied complexity urges even more caution. Which I believe was (at least in part) the parent comment's point: given the importance and ubiquity of the Linux kernel, this may be concerning.
That said, I am very confident in the structures the kernel devs have put in place, in their competence, and in the necessity of such a change. But trivializing a 100k LoC patchset because the project it's intended to land in is even more colossally complex isn't the approach I'd choose.
> You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.
In Ingo's post, he points out that the main speedup comes from the fact that the expansion after the C preprocessor step is a LOT smaller.
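One classic way that expansion shrinks is replacing includes with forward declarations wherever a header only handles a struct by pointer. A hedged sketch, with made-up names:

    /* Before: every user of this header preprocesses the whole
     * huge_subsystem.h include tree, even if it never looks inside
     * struct huge_state. */
    #include "huge_subsystem.h"
    void frobnicate(struct huge_state *st);

    /* After: a forward declaration suffices, because the struct is
     * only ever passed around by pointer here. */
    struct huge_state;
    void frobnicate(struct huge_state *st);

You can eyeball the difference on any translation unit with something like `gcc -E foo.c | wc -l` before and after.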
That's a lot of decoupling. As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep. Having those data structures in files actually mentioned by the file you're working on would be a huge cognitive load improvement as well as make tool assistance much more plausible.
Even if this particular patch doesn't land, it lights the path. What types of changes need to be made are now clear. How many changes are required before you see the improvement is now clear. With those, the required changes can be pushed down to the individual maintainers and rolled out incrementally, if desired.
> As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep.
It has always amazed me how finding what originally seemed a trivial little thing usually meant going through a chain of #defines and typedefs across many header files. It's the same with glibc, by the way. It's a bit like hiking to a summit along a crest path: you always think the next hump in sight is the right one, your destination, the promised land; and when you reach it, dammit, it wasn't, your goal is actually the next one. Or perhaps the one after that. Or...
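A made-up miniature of that summit hike, compressed into a single listing (real chains like this span several files and directories):

    /* layer 3: some arch-specific header */
    typedef unsigned long __kernel_size_t;

    /* layer 2: a compatibility wrapper header */
    typedef __kernel_size_t __size_t;

    /* layer 1: the header you actually included */
    typedef __size_t size_t;        /* the "summit", three humps later */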
Yes. When I first came to C, I asked on the Unix & Linux Stack Exchange how I could find which libc header to include for a given macro, or where a type was declared, without resorting to a web search. My question was shot down within the hour, and I was told to either do a web search or grep, and good luck.
That's how desperate we are: we can't handle newbies putting that truth in front of us.
> As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep.
As someone who tried poking around: Oh good; I assumed I was just missing something. Alternatively, oh no; I had assumed I was missing something and there was a more elegant tool out there.
> You have to have an awfully good reason to add 100k lines of code to a project
Yes, and this is a very, very good reason. As another poster said, you add 0.35% more lines to make it compile almost twice as fast, and you're not happy about that?
> Think how many lines are now duplicate (and therefore need to be updated in twice ...
OK, how many? None! That's how many. Adding proper header dependencies to the .c module doesn't duplicate anything. Unless you think adding #include <stdio.h> in every module somehow creates unmaintainable duplication.
> Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.
OK. Hrm... I think a lot, lot less is how much. That's the whole point of a major cleanup like this: proper decoupling. The headers you use are included where they obviously belong, not brought in by some action-at-a-distance accident.
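Put differently, the cleanup moves things toward an include-what-you-use discipline, along these (hypothetical) lines:

    /* gadget.c, before: compiles only because usb.h happens to drag
     * in list.h and spinlock.h from several layers down. */
    #include "usb.h"

    /* gadget.c, after: every dependency this file actually uses is
     * named right here, so the list is visible at a glance and
     * survives refactoring of usb.h's internals. */
    #include "usb.h"
    #include "list.h"       /* struct list_head */
    #include "spinlock.h"   /* spinlock_t */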