You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.
Think how many lines are now duplicated (and therefore need to be updated in twice as many places, with bugs introduced whenever a copy is missed).
Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.
If the 100k lines were in their own subdirectory and added a major new feature, it would be worth it. But spread across the whole codebase, introducing more duplication and more chances for bugs, I think it outweighs the benefit of a 'neater' header-file system.
The 'dependency hell' is only an issue for people who only want to build part of the kernel anyway. If you're building the whole lot, you might as well just include everything and it'll work great.
> [This patch series] decouples much of the high level headers from others, uninlining of unnecessary functions, a decoupling of the type and API headers, automated dependency handling of header files, and a variety of other changes
That's a lot more than "shaving the build time."
As far as I understand it, the goal of this patchset isn't to improve the build time; that's just a nice consequence. The goal was to refactor the header-file hierarchy to make it more maintainable and less "brittle." Sometimes, increasing maintainability requires more code. (Almost always, if the current version is a terse mess of mixed concerns.)
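To make "decoupling of the type and API headers" concrete, here is a minimal sketch of the general technique; the widget names and files are hypothetical, not taken from the actual patchset:

    /* widget_types.h -- only the type definitions. Cheap to include. */
    #ifndef WIDGET_TYPES_H
    #define WIDGET_TYPES_H

    struct widget {
            int id;
            unsigned long flags;
    };

    #endif

    /* widget_api.h -- the function declarations. Only files that
     * actually call these functions need to pay for this header. */
    #ifndef WIDGET_API_H
    #define WIDGET_API_H

    #include "widget_types.h"

    int widget_register(struct widget *w);
    void widget_unregister(struct widget *w);

    #endif

A file that merely stores a struct widget * can then get away with a one-line forward declaration instead of including either header.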
Think of it this way: take an IOCCC entry, and de-obfuscate it. You're "increasing the size of the codebase." You might even be "duplicating" some things (e.g. magic constants that were forcefully squashed together because they happened to share a value, which are now separate constants per semantic meaning.) But doing this obviously increases the maintainability of the code.
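A tiny, made-up example of the magic-constant case:

    /* Obfuscated: one name for two unrelated meanings that happen
     * to share the value 8. */
    #define EIGHT           8

    /* De-obfuscated: more lines, but each constant now carries its
     * meaning and can be changed without silently breaking the
     * other use. */
    #define MAX_RETRIES     8       /* retry budget for a request */
    #define PKT_HDR_LEN     8       /* packet header size, bytes */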
I'd say cycle time is also important anyway: we spend a lot of time building the Linux kernel in various forms. The savings across the board in man-hours spent waiting for compilation, or for git bisect, are not insubstantial.
I wonder what the memory savings during compilation look like? Lower memory use also potentially means more workers in automated testing farms for the same cost.
I've never tried it myself, but presuming you build the kernel in modular rather than monolithic mode, wouldn't the incremental compiles during git-bisect et al already be pretty quick? You'd only be rebuilding the modules whose source files changed, and those (presumably) wouldn't be pulling in the entire header tree. (Or would they, with that being "the problem"?)
> You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.
When you're talking about a project of half a million lines, sure.
The Linux kernel has around 27.8 million lines of code.
An increase of about 0.35%.
> Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.
Why add features at all? Code has a purpose. Sometimes bringing code into a static context is a net good. It was going to be generated at runtime anyway.
> If you're building the whole lot, you might as well just include everything and it'll work great.
That's not strictly true, but it is true for these features, which is part of the stated reasoning.
> The Linux kernel has around 27.8 million lines of code. An increase of about 0.35%
This is horribly misleading; most of those lines are drivers, which this patchset doesn't even touch.
It's still a massive change that only a handful of developers will ever be able to review in its entirety. The size of the project is completely irrelevant to that fact; if anything, the implied complexity urges even more caution. Which I believe was (at least in part) the parent comment's point: given the importance and ubiquity of the Linux kernel, this may be concerning.
That said, I am very confident in the structures the kernel devs have put in place, in their competence, and in the necessity of such a change. But trivializing a 100k LoC patchset because the project it's intended to land in is even more colossally complex isn't the approach I'd choose.
> You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.
In Ingo's post, he points out that the main speedup comes from the fact that the expansion after the C preprocessor step is a LOT smaller.
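One classic way that expansion shrinks is replacing includes with forward declarations wherever a header only handles a struct by pointer. A hedged sketch, with made-up names:

    /* Before: every user of this header preprocesses the whole
     * huge_subsystem.h include tree, even if it never looks inside
     * struct huge_state. */
    #include "huge_subsystem.h"
    void frobnicate(struct huge_state *st);

    /* After: a forward declaration suffices, because the struct is
     * only ever passed around by pointer here. */
    struct huge_state;
    void frobnicate(struct huge_state *st);

You can eyeball the difference on any translation unit with something like `gcc -E foo.c | wc -l` before and after.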
That's a lot of decoupling. As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep. Having those data structures in files actually mentioned by the file you're working on would be a huge cognitive load improvement as well as make tool assistance much more plausible.
Even if this particular patch doesn't land, it lights the path. What types of changes need to be made are now clear. How many changes are required before you see the improvement is now clear. With those, the required changes can be pushed down to the individual maintainers and rolled out incrementally, if desired.
> As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep.
It has always amazed me how finding what originally seemed a trivial little thing usually meant going through a chain of #defines and typedefs across many header files. It's the same with glibc, by the way. It's a bit like hiking to a summit along a crest path: you always think the next hump in sight is the right one, your destination, the promised land; and when you reach it, dammit, it wasn't, your goal is actually the next one. Or perhaps the one after that. Or...
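A made-up miniature of that summit hike, compressed into a single listing (real chains like this span several files and directories):

    /* layer 3: some arch-specific header */
    typedef unsigned long __kernel_size_t;

    /* layer 2: a compatibility wrapper header */
    typedef __kernel_size_t __size_t;

    /* layer 1: the header you actually included */
    typedef __size_t size_t;        /* the "summit", three humps later */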
Yes. When I first came to C, I asked on the Unix & Linux Stack Exchange how I could find which libc header to include for a given macro, or where a type was declared, without resorting to a web search. My question was shot down within the hour, and I was told to either do a web search or grep, and good luck.
That's how desperate we are: we can't handle newbies putting that truth in front of us.
> As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep.
As someone who tried poking around: Oh good; I assumed I was just missing something. Alternatively, oh no; I had assumed I was missing something and there was a more elegant tool out there.
> You have to have an awfully good reason to add 100k lines of code to a project
Yes, and this is a very, very good reason. As another poster said, you add 0.35% more lines to make it compile almost twice as fast, and you're not happy about that?
> Think how many lines are now duplicate (and therefore need to be updated in twice ...
OK, how many? None! That's how many. Adding proper header dependencies to the .c module doesn't duplicate anything. Unless you think adding #include <stdio.h> in every module somehow creates unmaintainable duplication.
> Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.
OK. Hrm... I think a lot, lot less is how much. That's the whole point of a major cleanup like this: proper decoupling. The headers you use are included where they obviously belong, not brought in by some action-at-a-distance accident.
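Put differently, the cleanup moves things toward an include-what-you-use discipline, along these (hypothetical) lines:

    /* gadget.c, before: compiles only because usb.h happens to drag
     * in list.h and spinlock.h from several layers down. */
    #include "usb.h"

    /* gadget.c, after: every dependency this file actually uses is
     * named right here, so the list is visible at a glance and
     * survives refactoring of usb.h's internals. */
    #include "usb.h"
    #include "list.h"       /* struct list_head */
    #include "spinlock.h"   /* spinlock_t */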