
I suspect that most people would be better off favoring inlined code over modules and microservices.

It's okay to not organize your code. It's okay to have files with 10,000 lines. It's okay not to put "business logic" in a special place. It's okay to make merge conflicts.

The overhead devs incur worrying about code organization may vastly exceed the time they'd lose floundering in messy programs.

Microservices aren't free, and neither are modules.

[1] Jonathan Blow rant: https://www.youtube.com/watch?v=5Nc68IdNKdg&t=364s

[2] John Carmack rant: http://number-none.com/blow/john_carmack_on_inlined_code.htm...



I’ve said this before about applying Carmack’s architectural input on this topic:

Games are highly stateful, with a game loop that iterates over the same global in-memory data structure as fast as it can. You have (especially in Carmack era games) a single thread performing all the game state updates in sequence. So shared global state makes a ton of sense and simplifies things.

Most web applications are highly stateless with request-oriented operations that access random pieces of permanently-stored data. You have multiple (usually distributed) threads updating data simultaneously, so shared global state complicates things.

That game devs gravitate towards different patterns for their code than web service devs should not be a surprise.
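To make the contrast concrete, here is a minimal sketch of that single-threaded loop-over-global-state shape (all names hypothetical):

    #include <stdio.h>

    /* One global, in-memory world that every system mutates in
       sequence, once per frame -- safe because there is one thread. */
    typedef struct {
        float player_x;
        int   score;
    } World;

    static World g_world;

    static void update(float dt) {
        g_world.player_x += 3.0f * dt;  /* each system touches g_world */
        g_world.score    += 1;
    }

    int main(void) {
        for (int frame = 0; frame < 3; frame++) {  /* the game loop */
            update(1.0f / 60.0f);
            printf("frame %d: x=%.3f score=%d\n",
                   frame, g_world.player_x, g_world.score);
        }
        return 0;
    }

A request handler doing the same thing, with many concurrent threads touching one shared global, would need locking or per-request state instead.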


I felt suspicious as soon as I saw John Carmack's website being mentioned in a conversation about microservices.


I hope this quote from the Carmack essay shows that his argument isn't necessarily restricted to game development:

    ---------- style C:

    void MajorFunction( void ) {
            // MinorFunction1

            // MinorFunction2

            // MinorFunction3
    }

> I have historically used "style A" to allow for not prototyping in all cases, although some people prefer "style B". The difference between the two isn't of any consequence. Michael Abrash used to write code in "style C", and I remember actually taking his code and converting it to "style A" in the interest of perceived readability improvements.

> At this point, I think there are some definite advantages to "style C", but they are development process oriented, rather than discrete, quantifiable things, and they run counter to a fair amount of accepted conventional wisdom, so I am going to try and make a clear case for it. There isn't any dogma here, but considering exactly where it is and isn't appropriate is worthwhile.

> In no way, shape, or form am I making a case that avoiding function calls alone directly helps performance.
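For reference, "style A" and "style B" in the essay are the same sequence as real function calls rather than comments; roughly (my reconstruction, not a quote from the essay):

    /* ---------- style A: minor functions defined above the caller,
       so no prototypes are needed: */
    static void MinorFunction1( void ) { /* ... */ }
    static void MinorFunction2( void ) { /* ... */ }
    static void MinorFunction3( void ) { /* ... */ }

    void MajorFunction( void ) {
            MinorFunction1();
            MinorFunction2();
            MinorFunction3();
    }

    /* ---------- style B: MajorFunction comes first and the three
       definitions follow it, which requires forward prototypes. */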


I love Carmack but I always thought his conclusion there was unsatisfying. I think the issue with really long functions is that they increase scope - in both the code and mental sense.

Now your MinorFunction3 code can access all the local variables used by MinorFunction1 and MinorFunction2, and there's no easy list of the things it might access, which makes it harder to read.

Separate functions do have a nice list of things they might access - their arguments!

Of course this technically only applies to pure functions, so maybe that's why it doesn't matter to Carmack - he's used to using global variables willy-nilly.

Also, sometimes the list of things a function might need to access gets unwieldy, which is when you can reach for classes. So there's no hard and fast rule, but I think increased scope is the core issue.
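A contrived sketch of that scope point (hypothetical names):

    #include <stdio.h>

    /* Inlined: by the time you reach the "MinorFunction3" block,
       every earlier local is still in scope, so a reader has to scan
       everything above to know what this block might depend on. */
    static void major_inlined(void) {
        int a = 1;              /* "MinorFunction1" scratch */
        int b = 2;              /* "MinorFunction2" scratch */
        /* ...hundreds of lines later... */
        printf("%d\n", a + b);  /* quietly reaches back to a and b */
    }

    /* Split out: the parameter list is an explicit, compiler-checked
       inventory of everything this code can touch (globals aside). */
    static void minor3(int a, int b) {
        printf("%d\n", a + b);
    }

    int main(void) {
        major_inlined();
        minor3(1, 2);
        return 0;
    }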


I used to be a fan of style C, but these days, I prefer either A or B, with the condition that no MinorFunction should be less than 5 lines of code. If a function is that small, and it's called from only one place, then it doesn't need to be a function.

Using A or B results in self-documenting code and I think DOES (or at least, CAN) improve readability. It also can help reduce excessive nesting of code.


Counterpoint: https://github.com/microsoft/TypeScript/blob/main/src/compil... - your task is to just make it a little bit faster. Where do you begin with a 2.65 MB source file?

It’s easy to miss the point of what the OP is saying here and get distracted by the fact that this file is ridiculously huge. This file used to be a paltry 1k file, a 10k file, a 20k SLOC file… but it is where it is today because of the OP's suggested approach.


Counterpoint: you have 2650 files, with a couple of 10-line functions in each. Your task is to just make it a little bit faster. Where do you start?

Answer: the same place - with a profiler. Logical code organization matters a lot more than the “physical” split into specific files.

I have inherited a Java class-spaghetti project in the past, with hundreds (perhaps thousands) of short classes, each of which doesn't do much but sits in its own file - and I would much prefer to work on an SQLite-style codebase, even if I have to start with the amalgamation file.


> your task is to just make it a little bit faster. Where do you begin

With a trace from a profiler tool, which will tell you which line number is the hot spot. If run from any modern IDE, you can jump to the line with a mouse-click.

In essence, the file boundaries ought not to make any difference to this process.


>> In essence, the file boundaries ought not to make any difference to this process.

I find this the strongest counter argument. File boundaries shouldn’t matter - but in practice they do.

The IDE will give up doing some of its analysis because this is not an optimised use case; the IDE vendors don't optimise for it. Some IDEs will even give up on simple syntax highlighting when faced with a large file, never mind the fancy line-by-line annotations.


Do you have a feel for how large a file has to be for an IDE to choke on it? My vim setup can open a 13GB file in about ten seconds, with all syntax highlighting, line numbers, etc working. Once the file is open, editing is seamless.

Bitbucket is the biggest offender for me. It starts to disable syntax highlighting and context once the file hits about 500KB, which should be trivial to deal with.


Where do you begin if the code was in hundreds of separate modules? It's not clear if it's easier. It would take time to load the code into your brain regardless of the structure.

By the way, the JavaScript parser in esbuild is a 16-thousand-line module too:

https://github.com/evanw/esbuild/blob/0db0b46399de81fb29f6fc...


Yeah, I think this is very much a worst-case scenario though.

<rant> You will (for most statements) be able to find both a best and a worst case. That's the catch with most generalized statements/principles, e.g., DRY. The challenge is to find a good "enough" solution, because perfection is usually either unfeasibly expensive or impossible (different viewpoints, ...) </rant>

Though it's kinda hilarious that the source code of a MS project is not natively viewable on a MS platform.


If GitHub (or whatever you use) says 'sorry we can't display files that are this big', that should be your hard limit...


I do not agree. I believe that professionally "most people" don't work on the same code every day and don't work alone. Modules are a means of abstraction and a classic application of "divide et impera", and you'll need them pretty soon to avoid keeping the whole thing in your head. But different cultures of programming have different meanings for what a module is, so, maybe, I'm misunderstanding your point.


I strongly disagree on this one. 10+K line files are absolutely unreadable most of the time. Separating business logic from the other parts of the application helps with maintaining it and lets everything evolve in parallel, without mixing things up. It also helps you clearly see where business logic happens.


I'm in between. 10K-line files are usually extremely messy, but they don't have to be - a large number of well-organized, well-encapsulated <100 LOC classes can be very readable even when smashed together in one file. It just so happens that people who tend to write readable self-contained classes don't put them in 10 KLOC files, but rather split them. And vice versa - creating the association "10 KLOC files are unreadable", when it's not the length of the file but the organization itself.

Same for business logic - very clear separation can sometimes be cumbersome, but without it things become messy if you're not careful. And careful people just tend to separate it.


I disagree with this stance. Creating a file and naming it gives it a purpose. It creates a unit of change that tools like git can report on.


A line is a unit of change that git can report on.

If it's a separate file that is scoped to some specific concern, sure. But it's that grouping by concern that is key, not the separation into another file. Extracting random bits of code into separate files would be _worse_.


> A line is a unit of change that git can report on.

Yes and no.

Git doesn't store lines, it stores files. Git diff knows how to spit out line changes by comparing files.

So to run git blame on a 10k-line file you're reading multiple versions of that 10k file and comparing them. It's slow. Worse still, trying to split said file up while preserving history won't make the git blame any faster.


Yes and yes. While I agree with the general points, note that they didn't say "unit that git stores", but "unit git can report on". Git can totally report on lines as a unit of change.


git diff understands function boundaries, and for many languages will “report” equally well on a single file.

It’s a good idea to break things down into files along logical boundaries. But git reporting isn’t a reason.

edit: "got diff" -> "git diff". DYAC and responding from mobile!


Git diff absolutely does not understand function boundaries; its diff algorithms routinely confuse things like adding a single new function, thinking that the diff should begin with a "}" instead of a function definition.


It varies a bit with language and tooling, but 10k lines is around the point where the size of the file by itself becomes a major impediment to finding anything and understanding what is important.

A 10k-line file is not something that will completely destroy your productivity, but it will have an impact, and you'd better watch out for it growing further, because completely destroying your productivity is not too far away. It is almost always good to organize your code when it reaches a size like this; the exceptions are contexts where you can't, never contexts where it's worthless.


I generally agree. My argument is that 10K lines written one way can certainly be more readable than 10 files x 1K lines written in a different way, so the real differentiator is the encapsulation and code style, not KLOC/file per se.


What muddies the waters here is languages like Java, where "10k lines" means "you've got a 10 kLOC class there", and ecosystems like PHP's where, while nothing in the language requires it, people and teams will insist on one class per file, because hard rules are easier to understand and enforce than "let's let this area evolve as we increase our understanding".

As long as what's there is comprehensible, being able to evolve it over time is a very useful lever.


Honest question: do you think the same exact 10+K lines of code are easier to read spread across 1,000 files? And why do you think the overhead of maintaining the extra code for module boundaries is worth it?

EDIT: And what editor do you use? I'm wondering if a lot of these differences come down to IDEs haha


The right answer is 20 files with 500 lines each - i.e. a few pages of clean/readable/logical/well-factored code. Obviously it depends on the code itself - it's fine to have longer files if the code is highly correlated. Stateful classes should be kept short, however, as the cognitive load is very high.

I also find that updating code to take advantage of new/better language features, coding styles, etc. is impossible to do on a large code base all at once. However, sprinkling these kinds of things around randomly leads to too much inconsistency. A reasonable sweet spot is to make each file self-consistent in this regard.

My experience stems from larger 500+ person-year projects with millions of lines of code.


The right answer is that there is no right answer. You shouldn't divide your code based on arbitrary metric like size, you should divide it based on concepts/domains.

If a particular domain gets big enough it probably means it contains sub-domains and can benefit from being divided too. But you cannot make that decision based on size alone.


Sure, but no problem domain is going to lead you to 10,000 single-line files. Similarly, it will likely lead to very few 10K-line files. There will likely be a way to factor things into reasonably sized chunks. File sizes are not that highly coupled to the problem domain, as there are multiple ways to solve the same problem.


Sure, but it can lead to 1000 lines which some people still think is too much.

The point is that a numeric upper bound on LoC is inherently subjective and pointless. Instead of measuring the right thing (concepts) you're measuring what's easy to measure (lines).

In fact, it usually makes things worse. I've seen it over and over: you have N tightly coupled classes in a single file which exceeds your LoC preference.

Instead of breaking the coupling you just move those classes into separate files. Boom, problem solved. Previously you had a mess, now you have a neat mess. Great success!


10+K lines spread across 1k files is equally as bad as a 10+K line file IMO.

I tend to ensure each file serves exactly one purpose (e.g. in C# one file = one class, with only a few exceptions).

I use VS Code, but in every IDE with a file-opening palette it's actually really fast: say you want to look for the code that generates an invoice - just search for "invoice" in the list of files and you'll find it immediately.

(Also, modules have their own problems; I was mainly talking in a general way since that's what the parent comment was talking about.)


> Honest question: do you think the same exact 10+K lines of code are easier to read spread across 1,000 files?

This is a fair point, but it assumes one particular use case. It is easier if you are just concerned with a bit of it. If you need to deal with all of it, yeah, good fucking luck - 10k LOC file or 1k 100-LOC files.


> EDIT: And what editor do you use? I'm wondering if a lot of these differences come down to IDEs haha

Yes, Java developers can't do anything without their IDE. It helps them mask the 30,000 nested directories they've created to "organize" the code.


And of course it's a lot easier to read 200k+ LoC shattered across twenty repos.


The problem arises when you need to read the code of other modules or services. If you can rely on them working as they should, and interact with them using their well-defined and correctly-behaving interfaces, you won't need to read the code.

I'm a proponent of keeping things in a monolith as long as possible. Break code into files. Organize files into modules. A time may come when you need to separate out services. Often, people break things out too early, and don't spend enough effort on thinking how to break things up or on the interfaces.


> and interact with them using their well-defined and correctly-behaving interfaces, you won't need to read the code.

Don't you want determinism and deterministic simulations? If you do, you'll also need stub implementations (mocks, dummies) for your interfaces.

Some notes on that: https://blog.7mind.io/constructive-test-taxonomy.html

> A time may come when you need to separate out services.

Most likely it won't if you organise properly. For example, if each of your components is an OSGi module.


> The problem arises when you need to read the code of other modules or services. If you can rely on them working as they should, and interact with them using their well-defined and correctly-behaving interfaces, you won't need to read the code.

You can say the exact same thing about C-headers though.
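And arguably that's the same discipline at a different granularity: the header is the well-defined interface, and the translation unit hides everything else. A minimal sketch (hypothetical invoice module, 20% tax rate assumed):

    /* invoice.h -- the public interface; callers read only this. */
    double invoice_total(const double *amounts, int n);

    /* invoice.c -- internals are static, so no other file can call
       (or quietly grow to depend on) them. */
    #include "invoice.h"

    static double apply_tax(double subtotal) {  /* file-private */
        return subtotal * 1.20;
    }

    double invoice_total(const double *amounts, int n) {
        double subtotal = 0.0;
        for (int i = 0; i < n; i++)
            subtotal += amounts[i];
        return apply_tax(subtotal);
    }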


No, to me that's equally bad. But 100k lines split across 500 well-named files is a lot easier to work with than 10+K line files or multi-repo code.


With an IDE you can just look at the class hierarchy/data types rather than the files. As long as those are well organized, who cares how they span files?

For instance, in C# a class can span multiple files using "partial" or you can have multiple classes in a single file. It's generally not an issue as long as the code itself is organized. The only downside is the reliance on an IDE, which is pretty standard these days anyway.


Respectfully, rants by niche celebrities are not something we should base our opinions on.

If you're a single dev making a game, by all means, do what you want.

If you work with me in a team, I expect a certain level of quality in the code you write that will get shipped as a part of the project I'm responsible for.

It should be structured, tested, malleable, navigable and understandable.


I feel like this is a knee-jerk reaction to the hyperbole of the parent comment rather than to the contents of the actual linked talks. I'm watching Jonathan Blow's talk linked above and your comment does not seem relevant to it. Jonathan's points so far seem very reasonable. Rather than arguing for 10,000-line files, he's arguing that there is such a thing as a premature code split. Moving code into a separate method has potential drawbacks as well.

One suggested alternative is to split reusable code into a local lambda first and lift it into a separate piece of code only once we need it elsewhere. It seems to me that such an approach would limit the complexity of the code graph that you need to keep in your head. (Then again, when I think about it, maybe the idea isn't really that novel.)


So you think it’s easier to keep in your head lambdas in a 10k line file vs methods split by functionality across a number of smaller files?


> It should be structured, tested, malleable, navigable and understandable.

Great comment!

I personally find that most codebases are overstructured and undertested :)

In my experience, module boundaries tend to make code less malleable, less navigable, and less understandable.


>It should be structured, tested, malleable, navigable and understandable.

People have different thresholds for when their code reaches these states though, especially "understandable".

You can meaningfully define these (and test for them) on a small scale, in a team, but talking about all developers everywhere, these are all very loosely defined.


It's a valid hypothesis, but without empirical data the question is not easy to settle (and neither Carmack nor Blow are notable authorities on systems that have to be maintained by changing teams of hundreds of people that come and go, maintaining a large codebase over a couple of decades; if anything, most of their experience is on a very different kind of shorter-lived codebases, as game engines are often largely rewritten every few years).

Also, the question of inlined code mostly applies to programming in the small, while modules are about programming in the large, so I don't think there's much relationship between the two.


> neither Carmack nor Blow are notable authorities on systems that have to be maintained by changing teams of hundreds of people that come and go

Most systems don't have to be maintained by hundreds of people. And yet they are: maybe because people don't listen to folks like Carmack?

We like stories about huge teams managing huge codebases. But what we should really be interested in is practices that small teams employ to make big impact.


I didn't mean a team of hundreds maintaining the product concurrently, but over its long lifetime. A rather average codebase lifetime for server software is 15-20 years. The kind of codebase that Carmack has experience with has a lifetime of about five years, after which it is often abandoned or drastically overhauled, and it's not like games have exceptional quality, or that their developers report an exceptionally good experience, such that other domains would do well to replicate what games do. So if I were a game developer I would definitely be interested in Carmack's experience -- he's a leading expert on computer graphics (and some low-level optimisation) and has significant domain expertise in games, but he hasn't demonstrated some unique know-how in maintaining a very large and constantly evolving codebase over many years. Others have more experience than him in domains that are more relevant to the ones discussed here.


I can't say I agree with all your "okays", although if you prefix them with "In some cases it's okay", then I understand where you're coming from.

The problem is when it's OK, and for how long. If you have a team of people working with a codebase with all those "okays", then they have to be really good developers who know the code inside out. They have to agree on when to refactor the business logic out instead of adding a hacky "if" condition nested in another hacky "if" condition that depends on another argument and/or state.

I guess what I'm trying to say is that if those "okays" are in place, then a whole bunch of unwritten rules come into play.

But I agree that microservices certainly aren't free (I'd say they are crazy expensive) and modules aren't free either. But all those "okays" can end up costing you your codebase also.


> It's okay not to put "business logic" in a special place

It's not. This is the kind of thing where you start thinking "YAGNI", yadayada, but you inevitably end up needing it. Layering with at least a service and a database/repository layer is a no-brainer for any non-toy app, considering the benefits it brings.

> It's okay to have files with 10,000 lines

10,000 lines is a LOT. I consider files to become hard to understand at 1,000 lines. I just wc'd the code base I work on; we have about 5 files with more than 1,000 lines and I know all of them (I cringed reading the names), because they're the ones we have the most problems with.


As someone who has personally dealt with files as large as 60K lines, I disagree completely. I believe instead that structure and organization should be added as a business and codebase scale. The mistake I think most orgs make is that, as they grow more successful, they don't take the time to reorganize the system to support the growth, so, as the business scales 100x in employee count, employee efficiency is hampered by a code organization that was optimized for being small and nimble.

It gets worse when the people who made the mess quit or move on, leaving the new hires to deal with it. I've seen this pattern enough times to wonder if it gets repeated with most companies or projects.

I do agree that microservices and/or modules aren't magical solutions that should be universally applied. But they can be useful tools, depending on the situation, to organize or re-organize a system for particular purposes.

Anecdotally, I've noticed that smart people who aren't good programmers tend to be able to write code quickly that can scale to a particular point, like 10k-100k lines of code. Past that point, productivity falls rapidly. I do believe that part of being a skilled developer is being able to both design a system that scales to millions of lines of code across an organization, and to operate well on one designed by someone else.


Well said. You can tell very quickly whether a dev is experienced by looking at code organization and naming. Although I deal with experienced ones who just like to live in clutter. You can be both smart and stupid at the same time.


> It's okay to have files with 10,000 lines.

Ever since my time as a mathematician (I worked at a university) and using LaTeX extensively, I never understood the "divide your documents/code into many small files" mantra. With tools like grep (and its editor equivalents), jumping to definition, ripgrep et al., I have little problem working with files spanning thousands of lines. And yet I keep hearing that I should divide my big files into many smaller ones.

Why, really?


I think what goes wrong with dividing is people literally just splitting the code into multiple files.

The code should be split conceptually: not just copy/pasting parts of the main file into submodules, but splitting the code along functional boundaries or concepts, so that each file does one thing and those concepts compose into more abstract concepts.

That way, when I try to debug something, I can decide where I want to zoom in and quickly identify the needed files.


I think the big benefit is not the actual split into files, but the coincidental (driven both by features of some languages and also mere programmer convenience) separation of concerns, somewhat limiting the interaction between these different files.

If some grouping of functions or classes is split out into a separate file where the externally used 'interface' is just a fraction of those functions or classes, and the rest are used only internally within the file, then this segmentation makes the system easier to understand.


One big reason is source control. Having many smaller files with well-defined purposes reduces the number of edit collisions (merges) when working in teams.

Also, filenames and the directory tree act as metadata that helps create a mental map of the application. The filesystem is generally well represented in exploratory tools like the file browser and the IDE. While the same information can be encoded within the structure of a single file, that requires an editor that can parse and index the format, which may not be installed on every system.


> Also, filenames and the directory tree act as metadata that helps create a mental map of the application

Correct. But this is an argument against splitting: once your folder structure reflects your mental model, you should no longer split, no matter how big individual files get. Splitting further will cause you to deviate from your mental model.

Also, it seems like we're arguing against a strawman: saying "big files are okay" is not the same as "you should only have big files". What people mean is that having a big file does not by itself provide enough justification to split it. But it is still a signal.


Well, Git is pretty good at merging when the changes are in different places of the same file. Though your second point is a very good one.

FWIW, sometimes when I worked on a really large file, I put some ^L's ("new page" characters) between logical sections, so that I could utilize Emacs' page movement commands.


It's not just git, it's pull requests and code reviews that block on concurrent file updates. The workflow friction multiplies with the number of devs sharing the same code area. Small files minimize this by making more granular work units.


At the risk of getting lost in a swamp of not particularly good answers, it's most useful if you have scope control: If you have language keywords that allow you to designate a function as "The context/scope of this function never escapes this file," then multiple files suddenly become very useful, because as a reader you get strong guarantees enforced by a compiler, and a much easier time understanding context. The same can be said of variables and even struct fields. In very large programs it can also be useful to say, "The scope of this function/variable/etc never escapes this directory".

If everything is effectively global to begin with, you're right, it might as well all be in one file. In very large programs the lack of scope control is going to be a significant problem either way.

This is where object-oriented programming yields most of its actual value - scope control.
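In C, for example, internal linkage gives exactly that single-file guarantee (a sketch; C has no directory-level equivalent):

    /* counter.c */

    /* "This never escapes this file": no other translation unit can
       name hit_count, and the linker enforces it, so a reader can
       find every mutation by searching this one file. */
    static int hit_count = 0;

    /* Same guarantee for a function: a caller outside this file is a
       link error, not something a code review has to catch. */
    static void record_hit(void) {
        hit_count++;
    }

    /* The file's entire public surface: */
    int handle_request(void) {
        record_hit();
        return hit_count;
    }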


This is one of alarmingly few sane, articulate comments amid an extremely weird conversation about how maybe a giant mess of spaghetti is good, actually.

Yeah sure overengineering is a thing, but you're way off the path if you're brushing aside basic modularization.


Because one day someone else may need to read and understand your code?


But is it generally easier to read and understand, say 10 files of 1000 lines or 100 files of 100 lines or 1000 files of 10 lines, compared to one 10,000 lines file? (I don't know the answer, and don't have any strong opinion on this.)


Navigating between files is trivial in most real IDEs. I can just click through a method and then go back in one click.


But navigating between functions / classes / paragraphs / sections is also trivial in most real editors.


But this is precisely my question: in what way does splitting the code into many small files help with that? Personally I find jumping between many files (especially when they are located in various directories at various levels of the fs) pretty annoying...

Of course, that probably depends on the nature of the project. Having all backend code of a web app in one file and all the frontend code in another would be very inconvenient. OTOH, I have no problems with having e.g. multiple React components in one file...


Obviously you can split your code into many files in a way that obfuscates the workings of the program almost completely. And I think you can write a single 10 kloc file that is so well organized that it is easy to read. I just have never seen one...

I believe that files offer one useful level of compartmentalizing code: they make it easier for other people to understand what is going on just by looking at the file structure, before opening a single file. The other guy can't grep anything before they have opened at least one file and found some interesting function/variable.


Mathematicians rarely collaborate by forking and improving other papers. They rewrite from scratch, because communicating ideas isn't considered the important part, getting credit for a new paper is.


Fair enough, though I've been working as a programmer for over 6 years now (and I've been programming on and off, as a hobby, for over 3 decades).


I often want to have multiple parts of the code open at once. Sometimes 5-10 different parts of the code (usually not so many as that, but it depends what I'm doing) so I can flip between them and compare code or copy and paste. Most editors I've used don't have good support for that within a single file (and having 5 tabs with the same name isn't going to make it very easy to tell which is which).


Very good point, though this is really a deficiency of "most editors". ATM I have 300+ files open in my Emacs (no wonder, given that I work on several projects, and my current Emacs uptime is over 6 days, which is actually fairly short). Using tabs is bad enough with 5+ files; tabs with 300+ would be a nightmare.

That said, Emacs can show different parts of the same file in several buffers, a feature which I use quite a bit (not every day, but still). And of course I can rename them to something mnemonic (buffer name ≠ file name). So I personally don't find this convincing at all.


Because often you don't know what to grep for, or the search is too general and returns lots of irrelevant results, or perhaps you're just in the process of onboarding onto a new project and you want to browse the code and follow different logical paths back and forth...

And when dealing with code that's well organized and grouped into logically named files and dirs, you can simply navigate down the path, and when you open a file all the related code is there in one place, without an extra 10k lines of misc code noise.


Just one of many reasons: parallel compilation.


Another excellent point, but only applicable to compiled languages (so still not my case).


Just two of many reasons: parallel linting, parallel parsing.


Ah. Silly me. Still, linting/parsing 5000 sloc is almost instantaneous even on my pretty old laptop.


5000 LoC isn't that much.

There are many other reasons why separation is better for humans (separation is organization) but these arguments about parallelism are valid for androids, humanoids and aliens.


It is a kind of cult, really. Along with the rise of modern editors which somehow crap out on relatively large files. So small files are sold as a well-organized, modular, logically arranged codebase. You see a lot of these adjectives deployed to support short files. None of them seem obviously true to me.


Generally speaking, those 10k-line files are shit code, regardless of context.

Generally speaking, it's only Enterprise(tm) code that has minuscule (not small, minuscule) files with shit code.

As long as it's good code, no one cares. If you aren't working on Enterprise(tm) code, odds are every piece of bad code you look at has way too many lines in whatever abstraction unit is being used.


If you change one .c file, your compiler only needs to recompile that file's .o and link it with the unchanged .o files.

If you have everything in one .c file, you need to recompile the whole thing on every change.
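A sketch of the layout (hypothetical file names, build commands in the comments):

    /* physics.h -- the shared interface */
    int simulate(void);

    /* physics.c -- a one-line change here only reparses this unit:
         cc -c physics.c               (rebuild physics.o)
         cc -o game main.o physics.o   (relink, nothing else recompiles)
       With a single 10,000-line .c file, the compiler re-reads all of
       it for the same one-line change. */
    #include "physics.h"
    int simulate(void) { return 42; }

    /* main.c -- untouched, so main.o is reused as-is */
    #include "physics.h"
    int main(void) { return simulate(); }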


If the file only contains static methods, it doesn't matter too much. However, a class with mutable state should be kept pretty small IMO.


This only applies to OOP, no?


Or module-level state (Go, Python), which is in many ways even worse.


grep can search multiple files. Tags can jump between files.

LaTeX is an obvious case: why pay to recompile the whole huge project, for each tiny change?


Actually, LaTeX has to recompile everything even after a tiny change (there are rare exceptions like TikZ, which can cache pictures, but even then it can put them into a file of its own, AFAIR, so I can still have one big source file).

Now that I think about it, LaTeX is a very bad analogy anyway, since LaTeX documents usually have a very linear structure.


Broadly, my heuristic for this is: "Would it make sense to run these functions in the other order?"

If you split MegaFunction(){} into Func1(){}, Func2(){}, etc., but it never makes sense to call Func2 except after Func1, then you haven't actually created two functions; you've just created one function in two places.

Refactoring should be about logical separation not about just dicing a steak because it's prettier that way.


I think that's a reasonable heuristic, but I'd also say you have to take into account the human readability aspect of it. It sometimes does make sense IMO to split solely for that, if it allows you to "reduce" a complicated/confusing operation to a string name, leading to it being easier to understand at a glance.


You shouldn't split MegaFunction into Func1 and Func2; you should do:

  MegaFunction()
  {
      Func1();
      Func2();
      ...
      FuncN();
  }

People will call MegaFunction(), but it will be logically split internally.


This is how it's taught in school, but the argument is that you shouldn't do that, instead just inline Func1 and Func2 and comment it better.

By chopping up MegaFunction like this, you've not actually separated Func1 and Func2 if they aren't really independent; they're just MegaFunction in disguise, now split across two places, making it more difficult, not easier, to reason about.

If you need the state (implied or explicitly passed in and out) from having executed Func1 to run Func2 then you're just creating spaghetti code. You're taking a big ball of mud and smearing it around instead of actually tackling the abstraction.

This is how you end up with Func(a, b, c, d, &e, &f) which ends up changing your MegaFunction state.

Or, just as bad, Func1, 2, ..., N are all private functions, never called anywhere outside the class MegaFunction is in, so logically (and from the point of view of testability) it's no different from having them all inline.

If you're creating a function that's only ever called once, then instead of a function call you're probably better off with a comment to "name" that block and explain the process.
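Concretely, something like this (a sketch with hypothetical steps), where bare blocks keep each step's scratch variables from leaking into the next, and the comments do the "naming":

    #include <stdio.h>

    void mega_function(void) {
        double subtotal = 0.0;

        /* accumulate line items */
        {
            double items[] = { 9.99, 5.00 };
            for (int i = 0; i < 2; i++)
                subtotal += items[i];
        }

        /* apply tax -- reads subtotal directly; no
           Func(a, b, c, d, &e, &f) state threading */
        {
            subtotal *= 1.20;  /* assumed 20% rate */
        }

        printf("total: %.2f\n", subtotal);
    }

    int main(void) {
        mega_function();
        return 0;
    }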


> you shouldn't do that, instead just inline Func1 and Func2 and comment it better.

Frankly, It Depends(tm). Sometimes you can do this and not pass too much state, sometimes you cannot. Sometimes your state is in a class, or global, and you just pass the class around.

I have somewhere a 3000-line state machine with the app state in a class - I just pulled logically connected states out into auxiliary files when it went over 1k lines. In my case it's easy to comprehend because the groups of states are more or less logically separated.


As someone who works at a place that previously lived by:

>It's okay to not organize your code. It's okay to have files with 10,000 lines. It's okay not to put "business logic" in a special place. It's okay to make merge conflicts.

I absolutely disagree. It's "okay" if you're struggling to survive as a business and worrying about the future 5+ years out is pointless since you don't even know if you'll make it the next 6 months. This mentality of there being no need for discipline or craftsmanship leads to an unmanageable codebase that nobody knows how to maintain, everybody is afraid to touch, and which can never be upgraded.

You don't see the overhead of throwing discipline out the window because it's all being accrued as technical debt that you only encounter years down the road.


I think we're all confused over the definition. Also, one might understand what all the proponents are talking about better by thinking about this more as a process than as some technological solution:

https://github.com/tinspin/rupy/wiki/Process

All the input I have is: you want your code to run on many machines - in fact you want it to run the same on all the machines you need to deliver to, and preferably more. Vertically and horizontally at the same time, so your services only call localhost, but in many separate places.

This in turn mandates a distributed database. And later you discover it has to be capable of async-to-async = no blocking ever, anywhere in the whole solution.

The way I do this is I hot-deploy my applications asynchronously to all servers in the cluster. This is what a cluster node looks like in practice (the name next to Host: is the node): http://host.rupy.se - if you click "api & metrics" you'll see the services.

With this you get not only scalability, but also redundancy, and development stays at live-coding speed.

This is the async JSON-over-HTTP distributed database: http://root.rupy.se (2000 lines, hot-deployable, and it has replaced every database I've needed so far)


How can I sell your idea?

I easily find my way around messy codebases with grep. With modules, I need to know where to search to begin with, and in which version.

Fortunately, I have never had the occasion to deal with microservices.


I'm unsure how to "sell" this idea. I don't want to force my view on others until I truly understand the problem that they're trying to solve with modules/microservices.

For searching multiple files, ripgrep works really well in neovim :)

[1] https://github.com/BurntSushi/ripgrep


> With modules, I need to know where to search to begin with

You can just grep / search all the files.

> in which version.

grep / search doesn't search through time, whether you're using one file or many modules. You probably want git bisect if you can't find where something is anymore (or 'git log' if you have good commit messages).


grep can search multiple files at once.


I actually use ripgrep.

But modules can be in different repositories I haven't cloned yet.


Just use a proper IDE. It doesn't care how your code is structured and can easily show you what you're looking for, in context.

(And other tools like symbol search https://www.jetbrains.com/help/idea/searching-everywhere.htm...)


And it also doesn't magically know what is in modules you haven't fetched yet. (I use LSP when I can.)


Of course it can't know about non-existent sources. But when the sources are there, it's light-years ahead of a simple text search like grep/ripgrep.


Repo search.


> It's okay to have files with 10,000 lines. It's okay not to put "business logic" in a special place.

Couldn't disagree more. As usual, it's a tradeoff. You could spend an infinite amount of time refactoring already-fine programs. But complex code decreases developer productivity by orders of magnitude. Maybe it's not always worth refactoring legacy code, but you're always much better off if your code is modular with good separation of concerns.


... then before you know it your code base is 10 million lines long and you have no idea where anything is, what the side effects of calling X are, or what's been deprecated; onboarding is a nightmare, retention of staff is difficult, etc. You may be right for the smallest of applications, or whilst you're building an MVP, but if your application does anything substantial and has to live forever (a web app, for example), then you will have to get organised.


I think we are conflating (at least) two different issues here, because we don't have a good way to deal with them separately: closures on the one hand and implicit logic on the other.

Most of the time when I split out a method, what I want is a closure that is clearly separated from the rest of the code. If I could have such closures, with clearly scoped input and output, without defining a new method, that might be preferable. Lambdas could be one way to do it, but they still inherit the surrounding scope, so they're typically not as isolated as a method.


> I think we are conflating (at least) two different issues

For sure, I have absolutely no idea how your comment relates to what I wrote.


Oops, it looks like I answered the wrong comment here. :/


It happens to the best of us! :)


> It's okay to have files with 10,000 lines.

I find there are practical negative consequences to having a 10,000-line file (well, okay; we don't have those at work, we have one 30k-line file). It slows both the IDE and git blame/history when doing stuff in that file (and I do look at the history of the code when making some decisions). These might not be a factor depending on your circumstances (e.g. a young company where git blame is less likely to be used, or something not IDE-driven). But they can actually hurt, apart from pure code concerns.


I think you are missing one point: mental load. I doubt people keep all their files in one directory, or all their emails in one folder, or just have one large drawer with every possible kitchen utensil in a pile. The same is true for code. Organizing your code around some logical divisions allows for chunking. I will agree that some people take it too far and focus too much on "perfect". But even some rudimentary directories to separate "areas" of code can save a lot of unnecessary mental gymnastics.


Not sure I totally agree, but here's one strong point against factoring inline code out into a function:

You have to understand everything that calls that function before you change it.

This is not always obvious. It takes time to figure out where it's called. IDEs make this easier, but they're not bulletproof. Getting it wrong can cause major, unexpected problems.


Organizing code and services so that a rotating crew of thousands of engineers can be productive is critical to companies like Amazon and Google and Netflix. Inlining code is a micro-optimization on top of a micro-optimization (that is, choosing to [re]write a service in C/C++), not an architectural decision.


I agree, but the problem is that eventually it becomes not okay. So it requires a bit of nuance to understand when it is and when it isn't.

Unfortunately most engineers don't like nuance, they want one-size-fits-all solutions.


Modules are a lot cheaper if you have a solver for them.

Microservices are the same modules, though they force-add distribution even where it can be avoided, which is fundamentally worse. And they make integration and many other things a lot harder.


What is a "solver"? Do you have any resources where I can learn about them?


Well, anything that allows you to express the wiring problem in formal terms and solve it. A dependency injection library. The implicit resolution mechanism in Scala. These can solve the basic wiring problem.

distage can do more, like wire your tests, manage component lifecycles, and take configuration into account while solving the dependency graph, altering it in a sound manner.


Blow and Carmack are game programmers. They are brilliant, but their local programs and data are tiny compared to distributed systems over social graphs, where N^2 user-user edges interact.


Their programs interact with different APIs and constraints on many different sets of hardware: 5 different major hardware targets, 8 operating systems, and hundreds of versions of driver software. It's hard to do all that well and keep things validated and running with minimal ability to update and support it. Web programs barely work on two browsers. Server targets are generally reduced to a single target (Docker or a specific distribution of Linux).

Their "tiny" data is rendering millions of polygons with real-time lighting, asset loading in the background, low-latency input handling, etc., at 60fps+. If my API responds to the user in 100ms, they're ecstatic. If their game responds to the user in 100ms (5-10 dropped frames) even a few times, they're likely to ask for a refund or write a bad review, hurting sales.

The constraints are different, but game programmers do the same kinds of optimization social networks do, just for different problems. They avoid doing the N^2 version of lighting, path finding, physics simulations, etc. Social networks avoid it when searching through their graphs.

I think the web and other big tech companies should try thinking about problems the way game programmers do.


Seems like you're talking about algorithmic issues, not code complexity. If your code's complexity needs to scale (at all, never mind quadratically) with the size of the data, you're doing something very wrong.


You say that like real-time multiplayer gaming doesn't exist or something. Both of them have worked on those.... I think Carmack invented a lot of the techniques we use for those.

Yeah sure, the scale is smaller, but you can't get away with making the user wait 10 seconds to render text and images to a screen either. I think the software world might be a lot better place if more developers thought like game developers.


Let's not turn this into a penis-measuring contest, please. Code organization and requirements differ significantly between programming niches, and today's accepted practices are not some randomly invented caprices, but the result of the slow (and painful) evolution we've been fighting through over the past decades. Each niche has optimized over time for its own needs and requirements. My web APIs have hundreds of controllers, and keeping them in separate files makes them way easier to manage. I know that because we used to keep it all in a single file and it sucked, so over time I learned not to do it anymore. Does that mean embedded systems devs should organize their code the same way? I have no idea; that's up to them to decide, based on their specific environment, code and experience.


> Each niche has optimized over time for its own needs and requirements.

Sure, but those "needs and requirements" aren't necessarily aligned with things that produce good software, and I think a lot of software development these days is not aligned. Further, I think the evolutionary path of the web in particular has produced a monstrosity that we'd be better off scrapping and starting over with at this point, but that's a tangential discussion.


> My web APIs have hundreds of controllers

That you probably don't even need, but there's a paradigm of "every function should be its own class" that some devs seem to follow that I will never understand.


I don't follow the one-function-one-class mantra, but I absolutely need the separate controllers to group methods, because each set of them does different things and returns different data. If in my 20+ years of web dev I've learned one thing, it's that trying to be too smart with code optimizations and mixing different logic together is never a good idea - it will always backfire on you, and everything you "saved" will be nullified by the extra time and effort when you're forced to untangle it in the future. The whole point of what I wrote was that there are no recipes that can just be uncritically applied anywhere; you need to adapt your style to your particular needs and experience. If you don't need many controllers, great for you... but don't presume you can just copy/paste your own experience onto every other project out there, and that we're all stupid for doing it differently...


You greatly underestimate the complexity of games, and greatly overestimate the complexity of working with distributed systems over social graphs.


I've worked extensively in both. Both are complex. Games typically have a complexity that requires more careful thinking at the microscopic level (we must do all this game state stuff within 16ms). Web service complexity requires careful thinking about overall systems architecture (we must be able to bring up 10x normal capacity during user surges while avoiding cost overruns). The solutions to these problems overlap in some ways, but are mostly rather different.



