-2000 Lines of Code (2004) (folklore.org)
317 points by chanux on April 2, 2014 | 139 comments


My favorite anecdote on code removal comes from using Sonar, a static code analysis tool.

At a previous job, a pretty smart architect decided that Sonar was a good tool to gauge code quality. He wasn't wrong. However, as a way to improve code quality, management decided to make the continuous integration server demand a minimum of 90% test coverage, including branches, and no drop in coverage of more than half a percent from the maximum ever reached, as measured by Sonar.

So I inherited a large, rather terrible application, which included a whole lot of UI code using Swing. Swing has its qualities, but being easy to write meaningful tests for isn't one of them. My predecessors had therefore often left chunks of the frontend relatively untested, while making sure large amounts of business logic were well tested. To keep the application in order, I started doing a bunch of refactoring. I cut the size of the backend in half, and all the tests kept passing. But my commits were rejected by the build.

So if you have 10K lines of untested code and 90K lines of very ugly, badly factored, but tested code, you simply can't remove 10K of the bad, well-tested code, because that would sink the coverage metric.
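
To make the arithmetic concrete, a rough sketch (illustrative numbers only, mirroring the ones above):

    # Coverage = tested lines / total lines.
    before = 90000.0 / (90000 + 10000)   # 0.900 -- exactly at the 90% gate
    # Deleting 10K of the *tested* code leaves 80K tested, 10K still untested:
    after = 80000.0 / (80000 + 10000)    # ~0.889 -- the gate fails the commit
    # Deleting untested code instead would raise the ratio, but here the
    # untested code (the UI) was the part that had to stay.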

After much arguing for relaxing the rules (I didn't think that adding after-the-fact tests to bad UI code was a good use of our time), I fixed it in the only sensible way I could: I added an extra ten thousand lines of well-tested code that didn't actually run in production, but made the metrics happy. Only then could I get the build to pass again.


Terrible, but a necessary evil I guess. Sometimes you gotta just let sleeping dogs lie and start removing unused code as you go.


Haha. Test coverage requirements + no code review = just begging for this to happen.


There's a saying that goes something like "Measuring the progress of developing software by lines of code is like measuring the progress of manufacturing an airplane by kilograms of mass."


Also reminiscent of this quote from Dijkstra [0]:

  My point today is that, if we wish to count lines of code,
  we should not regard them as "lines produced" but as
  "lines spent": the current conventional wisdom is so
  foolish as to book that count on the wrong side of the
  ledger.
[0] http://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EW...


I believe it was Bill Gates:

"Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs."


And yet:

    Windows 8: 20 GB (vs 1 GB Ubuntu)
    Visual Studio: 2 GB (vs 300 MB IntelliJ)
    SQL Server: 5 GB (vs 100 MB Postgres)
    .NET: 2 GB (vs 20 MB Python)

(All guesses and silly comparisons, admittedly.)

There seems to be quite a lot of lines of code there.


To be fair, even Microsoft doesn't use the binary size of their applications as a measure of their capabilities. Yes, they are ginormous, but they aren't advertising "Windows 8: The Binaries are Bigger, so It Must Be Better!"


Bill Gates has about as much to do with Windows 8 as William E. Boeing has to do with the 787.


You can say that about all of Microsoft's operating systems.


Didn't he have some contribution to MS DOS?


He and Paul Allen wrote Altair BASIC, a BASIC interpreter for the Altair 8800. I believe they just bought MS-DOS.

http://en.wikipedia.org/wiki/Altair_BASIC


Other than writing the check?


In some circles, that's the most important contribution. ;)


Windows binaries (the PE file format) store more than just executable code. You can embed any resource data you wish, from string tables to images and icons through to zip archives (hence how self-extracting archives work).


More on that:

http://ianmurdock.com/platforms/on-the-importance-of-backwar...

The original ASP.NET blog entry is gone; this was the best I could find.


> There seems to be quite a lot of lines of code there.

Oh yeah, that's Moore's Law at work here too!


Tangent: I once stopped logging visits on a website of mine because the actual metric I care about is user interaction through comments (and the use of certain features on my site). I can highly recommend this if anonymous visits to your site really don't matter to you.


But then how do you know whether your lack of comments is because there isn't a large enough pool of potential commenters or if your content isn't engaging enough to incite them to post?

You're throwing away the top end of the funnel so it's impossible to know whether it's leaky or just completely dry all the way through.


> "But then how do you know whether your lack of comments is because there isn't a large enough pool of potential commenters or if your content isn't engaging enough to incite them to post?"

The solution to either problem in those examples is the same. More engaging content will drive more visitors to the site as well as stimulate more conversation.

However, I do agree with you in theory that it's good to keep a log of traffic, with the caveat that it's not used as the primary measure of activity when conversation is the end goal of the site.


At my last job I wrote a script to walk over git-log and add up all the added/deleted line counts (skipping merges). By the time I left, my net code contribution was something like -290kloc. I was the only person with a net negative, though it was a smallish team.

With the sheer amount of bad code people write, I expect to do a lot of deleting, refactoring, and rewriting, and I'd hope managers/fellow team members would be able to see the value in that. But sadly, they usually don't.


People also leave dead code around instead of deleting it. That dead code is poison to the maintenance programmer, because he'll spend plenty of time trying to understand what role it plays in the system while tracing a complicated bug.

I'd say it is open season on commented-out code. If code has been commented out for a few weeks, it is good to delete it just as a matter of course.


You should never commit commented-out code. That's what version control is for. If you need that code back, just fetch it from the repo.


Agreed, but I have seen one time (and only one time) where deleting a piece of code instead of leaving the comment triggered a compiler bug that generated some weird code. Put the comment in: good code. Delete the comment: bad code. It was really, really odd. Add an ignored assignment (a=1): good code. So we left the comment with a "DO NOT DELETE". The next version of the compiler fixed the bug and we deleted the comment. Weirdest damn thing I ever saw.


Yes, but in my book, that's not commented out code, that's an (undocumented) compiler directive.

And I've done similar things with "undocumented compiler directives" for XSLT processors before. If you didn't leave the comments in then you got the dreaded "GregorSamsaException". I refer to it as "dreaded" because folks on our team dreaded it... the exception occurred at unpredictable times and who the hell was Gregor Samsa anyway, and what did that have to do with our Java application?

Turns out, Gregor is the main character of Kafka's "The Metamorphosis" and the exception was the brilliant idea of someone who wrote the XSLT transformer and probably thought it was cute. (It wasn't.) It was an internal exception in the transformer. (Get it? Metamorphosis? Transformer? Never mind.) It occurred when a certain buffer filled up, and the exact length of the XSLT input file affected that, so adding a few lines of commented-out content would make the error appear or disappear.

I wrote a lengthy essay explaining the above facts, and used commented-out excerpts of that essay as padding in the file. Yes, I was trying to be "cute" also, but I was too young to realize that was a bad idea.


What language was it?


C compiler from a unix vendor in the late 90's.


Maybe the dead code was preventing the C compiler from carrying out a buggy "optimisation"?


Could have been. Although a comment or a printf kept the code working, and removing that line broke the line above it. The next compiler release fixed it. It was just odd, and a lesson that compilers are not perfect.


Occasionally I'll leave useful debugging log calls in there. It's a lot quicker to just uncomment the code when it's needed, rather than pulling that bit out of source control.

Also, even more occasionally, I'll leave some incomplete code commented out as an obnoxious reminder to complete it later. This is especially useful if the code wasn't ever committed before, so checking it out via source-control isn't an option. The very fact that it's not really supposed to be there is a good motivator to implement it!


Ideally you use log-levels and log-sources for this. Leave the logging calls in the production binary, but on DEBUG level. If there's a problem, bump the log-level up to DEBUG for just the subsystem you care about. Then you don't even need to recompile to get your logging information, you can just flip a flag - sometimes without even restarting the app.

I'll admit to just commenting and uncommenting log lines before, though - learning how the logging systems of major programming languages work takes some time, and there's an up-front cost to starting to use them.
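
For what it's worth, a minimal sketch of that idea with Python's standard logging module (the logger names are made up; log4j, logback, and friends offer the same per-source levels):

    import logging

    # Each subsystem gets its own named logger -- the "log source".
    log = logging.getLogger("myapp.billing")

    def process(order):
        # Ships in production, but is a no-op unless DEBUG is enabled.
        log.debug("raw order payload: %r", order)
        return order

    # App-wide default: INFO and above only.
    logging.basicConfig(level=logging.INFO)

    # Problem in billing? Flip just that subsystem to DEBUG at runtime:
    logging.getLogger("myapp.billing").setLevel(logging.DEBUG)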


The other thing is, you have a bunch of wasted calls to a logger if you don't want any logging to take place. Minor, but if you are working on embedded systems etc... you won't be using a logger, you'll more than likely just comment out debug statements and leave them in.

It all really depends on the application and how much performance is an issue.
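
For scale, a sketch of what a disabled call actually costs, again using Python's logging (the device state and dump function are hypothetical): arguments are only formatted if the level check passes, so the waste per call is small unless building the arguments is itself expensive.

    import logging

    log = logging.getLogger("device.dma")

    def expensive_hex_dump(buf):              # hypothetical costly formatter
        return " ".join("%02x" % b for b in buf)

    head, tail, dma_buf = 3, 7, [0] * 64      # hypothetical device state

    # Formatted only if DEBUG is enabled; when disabled this is
    # little more than a level check:
    log.debug("ring buffer head=%d tail=%d", head, tail)

    # If even *building* an argument is expensive, guard it explicitly:
    if log.isEnabledFor(logging.DEBUG):
        log.debug("full dump: %s", expensive_hex_dump(dma_buf))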


Typically you wouldn't put a logging statement inside a hot inner loop anyway, as you'll get so much output when logging is turned on that it makes the logs worthless. For heavy algorithmic-intensive stuff, you'd often just record your loop counters, key invariants, and exit conditions, and then log them all at once when the algorithm exits.

I have little embedded systems experience, but what I gather from talking to folks who do is that they also use logging APIs, but they avoid logging from inside a hot inner loop (their log statements are usually around startup/initialization and when the system receives certain inputs), and they use custom log writers that write the logs to a host server or desktop system when plugged in for debugging rather than taking up storage space on the device itself.
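
A sketch of that "record counters and invariants, then log once on exit" pattern (all names here are illustrative, not from any real codebase):

    import logging

    log = logging.getLogger("solver")

    def find_root(f, lo, hi, tol=1e-9):
        """Bisection search; logs one summary line instead of one per step."""
        steps = 0
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
            steps += 1
        log.debug("bisection done: steps=%d bracket=[%.12g, %.12g]", steps, lo, hi)
        return (lo + hi) / 2.0

    # e.g. find_root(lambda x: x * x - 2, 0.0, 2.0) -> ~1.41421356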


> Typically you wouldn't put a logging statement inside a hot inner loop anyway, as you'll get so much output when logging is turned on that it makes the logs worthless. For heavy algorithmic-intensive stuff, you'd often just record your loop counters, key invariants, and exit conditions, and then log them all at once when the algorithm exits.

Debugging is a highly interactive art and sometimes operates inside a much faster feedback loop than that. Put those few log calls in the innermost loop, run it, scan the 20MB dump for a weird entry, fix, remove logger calls, and you're done in less time than it takes to consider which key invariants and exit conditions are worth tracking.


One of these days, I really need to post my C++ logger class. Don't know if you're doing embedded C++, but I've used this class in VxWorks, and if you're not logging anything (the default build with NDEBUG does nothing), it should optimize away.

For now, I can post a link to my inspiration: http://wordaligned.org/articles/cpp-streambufs

Extending from there is pretty straightforward, though you can hit some dark corners of C++ (I spent a few weeks tracking down a double link error caused by not templatizing an addition to the Logger that gave it the capability to output to MSVS's debug window). This is also one of those very few cases in which I have justified using multiple inheritance, virtual inheritance, private inheritance, and templates.


OpenOffice.org went through contortions as it changed version control systems over the years. It got to the stage where devs would comment out code rather than deleting it because they didn't trust the VCS.

LibreOffice put the code into git and went mad with an axe deleting all the commented-out code.

Apache OpenOffice, on the other hand, still commits new commented-out code. http://mail-archives.apache.org/mod_mbox/openoffice-commits/... Possibly nostalgia for the good old days at StarDivision.


This is a bit of a tangent but I've been on teams that demanded you rebase everything so that there's a linear history.

Of course this leads to nasty conflict resolutions, so they solve that by squashing all of their commits before rebasing.

Now your code is no longer in the repo.


But if it never appeared in the main branch then it doesn't really matter, does it? You don't want to clutter up the history with experiments and false starts.


Weirdly: yes, I do. It's handy, and those experiments show what I've done to tackle a problem. Numerous times, experiments I've left in the repo somewhere have come in handy.

Hell, we keep our feature branches around here. *shrugs


I'm not sure that you don't. Having a history of how you got to the current version could help future maintainers understand why you made the decisions you did, and prevent them from making similar mistakes.


I tend to work out of a private branch, and commit early and often. So I end up with a lot of commits that say "fixed foo to be 1", then "oops, I meant it to be 2", then "No, it should be -7", etc. I'll drop a tag, then rebase a group of commits to re-order them (so all the fix, to a fix, to a typo mistake that shouldn't have been there) are together. Then sometimes I'll drop another tag, then squash, then rebase onto the main branch. Those temporary tags then get pushed only to a separate repo that only I write to (the one that everyone else uses gets the cleaned-up presentable code).


Well, I definitely prefer the rebasing route, but with the caveat that everyone's working on the same branch, so there's not much divergence. The only rebasing occurs when somebody commits before you do, and it's only your own local commits that need conflict resolution.


That seems like a git anti-pattern. On the other hand, one or two repos where I work have developed some, uh, interesting network graphs, as various branches have merged and evolved.


What do you mean by "your code is no longer in the repo"?


I suppose (s)he means that, if you squash commits, you lose the intermediate states containing code that was later deleted. It's not necessarily bad - squashing is a useful tool to clean up your changes before you merge them in.


If you squash together the commits that added and deleted a line, that line never existed.


Disagree wholly.

I don't think that a commit should be littered with commented out code, but there clearly are positive reasons to have some.

Besides, if the comment is not there, how can you be expected to know that there is old but still relevant code in the repo?


Out of curiosity: What are the positive reasons in having commented out code?


Sometimes just the presence of code triggers a bug because of dynamic dispatching / whatever, and a hotfix is necessary. You could remove the code, but then later programmers wouldn't have the context to fix the bug properly.

    # TODO: Figure out why do_foo is triggering a bug
    # http://my-company.org/issues/42
    # def do_foo(self):
    #   ...


If the code is commented out, how can it be relevant?


"We cannot enable this yet, because foo isn't implemented yet, or because bar doesn't work correctly."


I don't agree. You should probably have a weak preference for storing unused code only in VCS, but if you think something will be needed soon, I don't think it makes sense to delete it; cf. "You should never keep values in cache. That's what disk is for."


Working on projects that have existed for years, and have been maintained by several different people, you don't always know what code there is that you could possibly get back out of the repo, without exploring it version by version.

I can imagine there being instances where leaving commented out code could be helpful -- including comments about why you thought it could be helpful, and why it is commented out!


I can almost guarantee that, if your code base's version history is too messy to find old code in, it will be even worse if you favor commenting out over deleting.

That said, I do see where you're coming from. Part of the issue is that we (well, most of us) don't have good ways of searching old code. There is Codeq ( https://github.com/Datomic/codeq ), which is prettydamncool™ ...hopefully we'll start to see more systems like it.


Ok, I can understand that a good version history will allow you to find what you are looking for.

But how will you know that you should look for it in the first place?


Yes, well -- and I suppose I should add that I'm not advocating for an environment in which keeping commented-out code around is a good idea. Just suggesting that there could exist situations in which it is a reasonable thing to do.

I've increasingly been noticing that a lot of really good development practices make sense if you're starting from scratch and can employ them right away, but sometimes if you've got years or decades of legacy code and legacy process (and code that was written as the result of legacy process) to deal with, the right thing to do isn't always so clear.

(I've unfortunately/fortunately been working some recently on a very large, very old code base that mostly doesn't need updating. Trying to unravel its mysteries enough to add a new feature has been an interesting experience.)


This is actually one of my favorite things about version control. If I know it's been committed at one point, I can delete anything and breathe the sweet air of clean code.


But how does the next guy know it's been committed at one point?


Or better yet, ban commented out code from getting into the source tree altogether.

If the code is in past commits, there isn't a reason to muddy the source tree with it.


Some people just don't understand that it's harder to maintain code than to write new code. They probably don't appreciate your refactorings because they have never reached the point where it is almost impossible, or at least super hard, to write new code because the codebase is a large mess (and they probably haven't reached that point precisely because of your great refactorings). But one day, when they do reach that point, I think they will miss and appreciate your past work.


I'd like to see that script.


Here is a sample implementation:

    git log --author=rav --numstat --no-merges --pretty=format: |
        awk '{ a += $1; b += $2 } END { print a; print b; print a - b }'
produces this output (sum additions, sum deletions, net line additions):

    112529
    85383
    27146


To count by author:

    git log --numstat --no-merges --pretty=format:%an |
        awk '
          /^$/ { next }                  # skip the blank line after each author
          $1 ~ /^([0-9]+|-)$/ && $2 ~ /^([0-9]+|-)$/ {
            # numstat line: <added> <removed> <path> ("-" for binary files)
            added[author] += $1; removed[author] += $2; next
          }
          { author = $0 }                # any other line is an author name
          END {
            # print the net count first, so that sort -n works even when
            # author names contain spaces
            for (author in added)
              print added[author] - removed[author], "net:",
                added[author], "added,", removed[author], "removed --", author
          }' |
        sort -n


That is great! Thank you.


Ha, nice, there's two options that definitely would've simplified my code (--numstat and --no-merges).


Unfortunately I forgot to grab it off my work laptop when I got laid off. I would've liked to have stuck it up on github...

It wasn't really all that tricky though, it took me a few hours to write. git-log has options for only displaying the status line of diff-stat for each commit, and then displaying the parents of each commit, and the author. You look to see that there's only one parent (so it's not a merge), parse out the X added, Y deleted numbers, and stick them in a dictionary keyed by name.

A lot of the script was just getting statistics like average/min/max/stddev line counts, and printing them nicely.


Though I've never used an Apple product, I adore folklore.org. So many great tales of engineering. When I hear the word "hacker," these people come to mind. A shame they didn't play a larger role in the company's direction (see "Diagnostic Port"[1]).

[1]: http://www.folklore.org/StoryView.py?project=Macintosh&story...


Thinking about the modern Mac Pro, I'm shocked to see people haven't referenced this story more.

No internal slots; isn't this Steve Jobs' dream machine?


Sometimes, less is more.

Someone said: "A piece of art is only finished when there is nothing left that can be taken away."

The art in computer programming is to find ways to reduce the problem to its essence, to find out what is really necessary to solve it (and no more). This reduces (often, though not always) the runtime and the amount of memory needed -- and, most importantly, the amount of maintenance required. The maintenance burden of a program is directly dependent on its number of lines.

Many companies start fast with a superior product, but then comes the time of growth and rising demand, new employees rush in ... and the number of lines explodes. That is the point of danger: the company is about to strangle itself, and the number of errors rises.

I remember an old, but once famous, database product. The first 2 or 3 versions were great, and the company grew from 4 developers into a horde. The next version came out much, much later than expected and was at first a bug-ridden chaos. The problem: the number of employees and the number of lines of code grew faster than the company could manage them.

It is also said: "Adding new members to a late project makes it later." That's because of the overhead of managing those people, and because the added code does not always add to project speed.


That first quote is from Antoine de Saint Exupéry. http://en.wikiquote.org/wiki/Antoine_de_Saint_Exup%C3%A9ry

Then again, "A work of art is never finished, it is abandoned." (http://www.quoteyard.com/art-is-never-finished-only-abandone... )

The database product sounds like dBase. If so, management focus was certainly also a factor. http://en.wikipedia.org/wiki/Ashton-Tate#dBASE_IV:_Decline_a... attributes it to a push for Diamond (its source material is unattributed), while http://www.dbase.com/Knowledgebase/dbulletin/bu03_b.htm says it was a management push towards OS/2.

Your last quote is from "The Mythical Man-Month" by Fred Brooks (1975) and is called "Brooks's Law".


Both statements about art are true: you will never achieve perfect simplicity, but you should try to get as close as you can!


Minimalism is but one of many art styles, so I disagree with the goal of achieving perfect simplicity.


Sure, but in the context of code and complexity, minimalism is your best bet.


You are against Easter Eggs, I gather? ;)

More seriously, of these two implementation for Python's str.lower(), which is "minimal"?:

    for (i = 0; i < n; i++) {
        int c = Py_CHARMASK(s[i]);
        if (isupper(c))
            s[i] = _tolower(c);
    }


    for (i = 0; i < n; i++) {
        int c = Py_CHARMASK(s[i]);
        s[i] = _tolower(c);
    }
I suspect most would say the second is minimal, but the first is what Python uses, because it's measurably and distinctly faster than the second, and that extra performance is worth the extra maintenance overhead.

That's one reason I believe that programming-as-art metaphors don't really apply. Minimalism as an art form is not the same as minimizing a cost function over various uncertain factors.

Anyway, this reminds me of a story from the Tao of Programming, http://www.canonical.org/~kragen/tao-of-programming.html :

There was once a programmer who was attached to the court of the warlord of Wu. The warlord asked the programmer: ``Which is easier to design: an accounting package or an operating system?''

``An operating system,'' replied the programmer.

The warlord uttered an exclamation of disbelief. ``Surely an accounting package is trivial next to the complexity of an operating system,'' he said.

``Not so,'' said the programmer, ``when designing an accounting package, the programmer operates as a mediator between people having different ideas: how it must operate, how its reports must appear, and how it must conform to the tax laws. By contrast, an operating system is not limited by outside appearances. When designing an operating system, the programmer seeks the simplest harmony between machine and ideas. This is why an operating system is easier to design.''

The warlord of Wu nodded and smiled. ``That is all good and well, but which is easier to debug?''

The programmer made no reply.


Or, as Einstein put it, "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience."

Rephrasing for the software world: "...without having to make appreciable sacrifices in user experience."


BTW, there is a literary work which contains only one character (a space). It was written by the Russian author Баян Ширянов (http://ru.wikipedia.org/wiki/Баян_Ширянов ). AFAIK there is no English translation, but you can read it in Russian at http://lib.ru/PROZA/KLOCHKO/probel.txt


I love people who can leave a program functionally better with fewer lines of clear code. That last bit is important: obfuscated code is clever, but not useful.


> That last bit is important

Recently I've been learning that from two angles. First, I'm realizing that being a bit more verbose is often clearer and safer. Second, I've been reading and learning a lot about compilers, and I'm realizing that my "efficient" (short) code and other trickery does diddly-squat: the compiler will produce the same code either way.


My favorite optimization: replaced 10,000 lines of code that marshaled structures in a Broadcom driver for transmission to the embedded processor, with a single template of ~20 lines.


I once had the opposite happen: I wrote a ~12 line hack to get something running, which was replaced by ~10,000 lines of code that added no new functionality but were less 'ugly'.


Awesome. I once replaced an intern's 540 lines of code with 12 lines of my own. Feels great.


An intern? Don't pat yourself on the back TOO hard... ;-)

I once sped up an intern's code by just deleting a 30-line function, and doing nothing else. Doofus didn't realize our language had a highly optimized built-in sort, and so he wrote his own inefficient sort (insertion) that overrode the existing one. Poor little guy was so proud of having chosen exactly the right sort, and then implementing it based on his recollection of college... I told him to spend the next few hours just reading the documentation of the core API for our language.


A quote I'm somewhat fond of: "A couple months in the laboratory can save a couple hours in the library."

Unfortunately, I forget who said it.


The programming equivalent that I've heard is: "Weeks of programming can save you hours of planning".



Thanks!


> A quote I'm somewhat fond of: "A couple months in the laboratory can save a couple hours in the library."

Isn't that backwards?


No. It's correct.

I can say this having spent months in the laboratory to learn things which were already known. Who knew that precision quartz pressure gauges were called 'manometers' in the late eighteen-hundreds? I didn't, and thought I'd invented something new for a couple months in my first year of grad school.

The adage is a good one. Reading a lot and reading widely pays off.


That idea might imply that future Googles (in the sense of Google's old self-definition) are going to play an increasingly critical role in human progress.


No. It's for effect.


No, it's quite correct; it especially holds true for other sciences such as biology and chemistry.


Calling an intern Doofus is not a good attitude to take. Remember we were all programming noobs at some point.


> An intern? Don't pat yourself on the back TOO hard... ;-)

Who's to say an anecdotal 'employee' is any better than his 'intern'?


I can confirm that this is so true. At one place I worked I wasn't allowed to program in C, but did the C code reviews[1].

I was riding in a car with buddies who were much better C programmers than myself and making notes on some code I needed to do the review of the next morning. This is the code pattern that almost crashed our car:

  a = some_function(i);
  a1 = &a;
  a2 = &a1;
  calc(a2);
with calc() reversing it out

I was cussing a bit much[2], and the front-seat passenger had to look; then the driver got too curious. Sadly, this was the least "wrong" thing about the code. It's very hard to do a code review where you suggest 500 lines of C can be reduced to 50.

1) cannot have the guy doing code reviews actually coding, that would be improper

2) cussing in private allows positive, motivational tone in public


He had sorting algorithms and data structures drilled into him at school. They didn't tell him it's probably a bad idea to ever actually implement them in production code.


What language?


"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."

― Antoine de Saint-Exupéry, Airman's Odyssey


No one seems to have pointed out yet that code size as a predictor of error rates (and as a measure of complexity) is one of the few findings from software engineering research that has been replicated many times. It's not clear that the studies are very good, but at least they exist. We've discussed this on HN more than once.

I'm longing for the day when people realize how significant this finding potentially is.


Every functional programmer I've met has pointed things like this out to me, along with the ability to reason more easily about smaller code-bases.


Oh, yes. Me too. But they/we are a tiny minority with little influence over industry practice. What I long for is the day that software organizations run dramatically differently because of this principle. I think it's one of the most profound things we know about software development, and it has huge implications, but they're ignored.

Like any other powerful ignored truth—if that's what this is—its path to acceptance will likely be through somebody doing something impressive with it that hasn't been done before.


I think startups doing impressive things at scale will help with this. For example, WhatsApp serving 500M people with code written and maintained by fewer than 50 engineers is an impressive story. What did they do differently? The post-hoc stories about success rarely discuss the leverage achieved through technology choices.


One of the professors I had didn't like paperwork any more than we did, so he shared a little [paraphrased] advice-story:

"Hey, I know you don't like this paperwork. I don't either, but I get asked to do it all the time. My strategy? Give them as much paperwork as I can. These forms, those memos, CCs on emails... eventually, sometimes, they ask me to stop. 'You're giving us too much paperwork', they say. 'You don't want me to give you paperwork? Works for me.'. So fill these out, and I'll take care of the rest. Now, for today's class..."


There's a quote I saw in a users' forum signature that's stuck with me; "My best code was written with the delete key."


Reminds me of this quote:

"One of my most productive days was throwing away 1000 lines of code." - Ken Thompson


In my opinion the best code is the shortest code/documentation that can be understood in one or two reads by someone who is new to the codebase. Less or more than that is bad.

Exceptions are things like game programming, where certain hairy-looking tricks can be necessary and have a real benefit. Can't say the same for your average web app.


I came to a company last year where they were forecasting how they were progressing on a project by counting the number of objects that had been written. As in, the tech lead estimated the project would require like 200 object classes and some PM was crossing off objects as they were being written.

Anyone who has completed a complicated functional program would probably understand how that could be a misleading measure of progress and, worse, lead people to spend time writing poorly engineered untested code. I think the people who implemented this system understood that counting lines of code was a dumb idea, but somehow thought this was better.


A couple of years ago, while still studying CS, I did a telephone screen (with a recruiter, not an engineer) with Microsoft, and they asked me "how many lines of code have you written in your career?" (at this point I didn't have a career in SE). That was followed up by "what is the most lines of code you've written for a single project?". Other than scanning the diffs/commits or using some reporting/metric tool, why would you care to keep count? Seems like you are focusing on the wrong data points for recruitment and software quality.


The irony here is that when Microsoft was developing OS/2 for IBM, the latter started requiring updates (similar to the OP) showing KLOCs written in a week, with the then-Microsoft engineers lamenting how stupid that was.

Congratulations Microsoft. You've now become the bloated mid-80's IBM you used to hate.


I worked for Andy Hertzfeld once, it was awesome in many ways, but the anecdotes... so good.


I'd like to get your opinions on this...

I think programmers nowadays are a bit overly obsessed with getting lines of code down.

It's certainly possible to write something in one line where you could have used six, if you use the more obscure and "fancy" tools available in your language. One thing that randomly springs to mind is LINQ in .NET. Some reasons why more lines might be better:

1. The longer, more explicit code might end up shorter and more performant after it is compiled (as in the old-school performance trick of unrolling loops).

2. When you come back to your fancy code later you may have forgotten about that particular fancy trick, and now you don't understand your code.

3. Other people are less likely to understand your code.

4. By using more specialised features of a language, your code is now less portable to other languages.

Personally I think the obsession with fewer lines quickly becomes counter-productive, and the main reason it is done is to show off your knowledge of these fancy things.

Note: I'm not talking about the guy in the article, btw. Just talking about a general trend I've noticed in modern programming.


> 1. The longer, more explicit code might end up shorter and more performant after it is compiled (as in the old-school performance trick of unrolling loops).

Or it might be less performant. If you actually care about this, you should have an automated test running regularly that will tell you one way or the other. But most of the time it's not worth worrying about such things.

> 2. When you come back to your fancy code later you may have forgotten about that particular fancy trick, and now you don't understand your code.

> 3. Other people are less likely to understand your code.

Or using fancy tricks more frequently can help you remember them. I think you should use every feature available in the language - developers need to be able to understand the language so that they can read third-party library code. Or else have an automated system that flags usage of particular features.

> 4. By using more specialised features in a langauge your code is now less transportable to other langauges.

Who cares? Seriously, how likely is this to actually come up? If you've chosen language X you presumably had a good reason for doing so; you should write language X, not try and write language Y in language X.

Fewest lines of code is not the perfect metric, but I think it hits the sweet spot: it's very simple to calculate, and captures a good proportion of the difference between good code and bad code.


I haven't seen much of this myself, although I can imagine it happening.

From what I've seen, bloated code tends to be code that is copied (either literally or in style) from somewhere else, and which doesn't fit the task at hand. So you have functions that take lots of options but only ever get called once, with one set of parameters. The fix is to rewrite everything so that the abstractions are tailored to the task at hand.

What you are describing is a kind of excessive elegance. But this is fairly independent of the problem I described (and the article is describing).

I'll also say that there is a limit to bloat, and LOC is often a good guide to functionality. E.g. if I'm interested in re-implementing something, I'll first look at the weight of the code. If I thought something would be a half-hour job and the code is 5000 lines, I'll probably re-evaluate. Sometimes I've pondered reimplementing certain libraries, only to find they weigh in at a million lines of code.


What's funny about this is that the Y Combinator application actually asks you how many lines of code your app is. I took the question as a chance to rant about the irony of a software accelerator, geared toward making you a "software expert", asking this question.


If you've written 0 LOC for your revolutionary artificial intelligence system, or 1,000,000 LOC for your "Instagram for Squirrels" app, I'm likely to be equally suspicious. Otherwise there's a wide range of acceptable answers for that question.

A rant probably isn't one of them.


Somehow I imagine that Instagram for Squirrels necessarily includes a revolutionary AI system.


"Instagram for Squirrels!" - great title for a pet project!


Why would you rant about this? You don't even know what they're looking for in an answer. How many LOC your app has is relevant, but it doesn't necessarily affect your application in any way. I think the history of some of the accepted groups speaks to this.


I seriously doubt they "rank" applications by KLOC, they just want to see if you have enough for a decent demo/prototype. Something that shows some implementation effort. And I suspect there's nothing wrong with zero if it makes sense in the context of that particular application.


Why is it ironic? Lines of code is a good estimate of complexity. The point of the article is that increase in complexity is not a good estimate of progress.


The "problem size"/"program size" ratio is very informative.


folklore.org is amazing, but it's been around for 10 years now. I kinda want to append to the title: "(2004) (1982)". Heh.


Take a look at the contributor stats for the Bitcoin project on GitHub: https://github.com/bitcoin/bitcoin/graphs/contributors Gavin Andresen, the lead developer of the project, has removed more lines than he has added.


At my last job, I was usually the one that cleaned up old or unused code after we had gone through a design change or such. It was always very satisfying. Not so much, though, for the poor chap whose code was usually the target of deletion (through no fault of his own).


What do people think of commits as a metric? It's becoming more popular for managers thanks to github's graphs.


I hypothesize that any numbers-based code metric management comes up with can also be gamed by adequately-motivated developers. In fact, if you come up with such a metric and have a developer who can't figure out a way to game it, perhaps you want to consider letting that developer go.

Were I to work in a place with a 'git commits' metric for productivity, I'd happily commit each line individually, to ensure that any potential data loss was as limited as possible, of course.

To paraphrase Wally from Dilbert: "I'm writing myself a mini-van."


"I'd happily commit each line individually, to ensure that any potential data loss was as limited as possible, of course."

Bah. While where I work has never even flirted with that metric, general agreement in my team was that the first thing to be done in the event that ever changed was write a git filter that committed one character at a time. I suppose you could take it further down to one bit at a time if you really wanted to; once you had the code for one character at a time, one bit at a time would be a trivial extension.

Less humorously, I've been complaining that our email notification system sends out a separate email for each commit in every new git branch... that is, if you are on a branch with 100 commits, and you create a new branch and push it to the server, our emailer seems to believe you just made 100 commits, and sends 100 emails. Or 1,000, as the case may be. One of these days that thing is going to take out the entire corporate email system... of course, one never fixes the problem until it reaches that state, so I'm just waiting...


http://www.klocwork.com/blog/wp-content/uploads/2009/10/dilb...

>if you come up with such a metric and have a developer who can't figure out a way to game it, perhaps you want to consider letting that developer go.

Maybe that would be a good interview question (for a software manager): "How would you game metric X"?


Probably not. It seems to me you'd be forcing the candidate to look either stupid or dishonest.


Hm, how about, "How can metric X be gamed?", then?


Only acceptable if each commit results in valid code without regression (if it doesn't build/parse or breaks everything it will make bisecting a terrible experience).

Making small atomic commits is actually good practice.


"The more any quantitative social indicator (or even some qualitative indicator) is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."


PHB: "NO METRIC HAS BEATEN ME YET!!" http://dilbert.com/strips/comic/1994-11-21/


It's not just managers: plenty of people go by this number and the recentness of activity, which rewards coding before thinking.


Does anyone feel that those days are over?


Nope, there's more bad code being written now than at any other point in history (I totally just made that fact up and I stand by it). If ever there was a time to be a hero through code deletion, now is the time. Carpe diem!


If I didn't have so many bills to pay, I'd say that my current project would work better on 20% of the current size of the code base. And yes, lines of code is used as a team performance metric.

And since I was formally banned from using generics and lambdas and from refactoring old code without explicit permission, I am no longer hurting the team! Yay!

This is the worst job I have ever had.


This is why one of my greatest joys as a sysadmin (where "is it working wonderfully?" is the metric) is my occasional foray into the build infrastructure codebase with an axe. Less is always, always, more.


What do you mean?


Well, another way to look at it is that you've improved 2000 lines of code. That is, you have touched 2000 lines of code and modified them in a way that makes the software better. Yes, you did delete all of them, but as a professional expert you've discovered that these 2000 lines serve the project better when not existing. Then, you wrote another 1000 lines of fresh code. In the end, you've achieved a net betterment worth 3000 lines of code.



