Hacker News

Number 6 (DRY) and number 9 (use libraries) both lead to problem 16 (leaking abstractions). Personally, I follow a rule I read in a blog some time ago and don't try to de-duplicate code until I've copy and pasted it three times. That way I know what pattern is actually being repeated, not just what I think the pattern is.

As for number 9, I'll agree with this, so long as it's the standard library (which is slow to change and unlikely to break functionality when it does). But when you use 3rd party libraries, you are still responsible for that code in the long run. You'll have to keep it up to date, test that it's doing what you intend, and when it's eventually compromised or removed, you'll have to deal with that too.

See: Left Pad.
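For context, the unpublished left-pad npm package was only about a dozen lines of JavaScript. A rough Python equivalent (a sketch, not the original code) shows how little functionality the dependency actually provided:

```python
def left_pad(s, length, ch=" "):
    """Pad s on the left with ch until it is at least `length` characters."""
    s = str(s)
    while len(s) < length:
        s = ch + s
    return s

# The standard library already covers this case:
# str.rjust(length, ch) produces the same result.
```

Which is the point on both sides: trivial to own yourself, and painful when thousands of builds depend on someone else owning it.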



> Personally, I follow a rule I read in a blog some time ago and don't try to de-duplicate code until I've copy and pasted it three times

Known as "Three Strikes and You Refactor"[0]:

- The first time you do something, you just do it.

- The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway.

- The third time you do something similar, you refactor.

[0] http://wiki.c2.com/?ThreeStrikesAndYouRefactor
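A toy sketch of the rule (hypothetical names): the first two occurrences stay duplicated, and only the third makes the shared shape clear enough to extract safely.

```python
# First and second occurrences: just duplicate, wincing.
def report_orders(orders):
    lines = [f"{o['id']}: {o['total']}" for o in orders]
    return "\n".join(lines)

def report_users(users):
    lines = [f"{u['id']}: {u['name']}" for u in users]
    return "\n".join(lines)

# The third occurrence reveals the actual pattern -- "format each
# record's id plus one field" -- so the refactor is now well-informed.
def report(records, field):
    lines = [f"{r['id']}: {r[field]}" for r in records]
    return "\n".join(lines)
```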


The problem with this arises when you don't remember that you (or someone else) already used this same snippet twice. In large codebases with a lot of people working on them, this can expand to dozens of copies.


If you have dozens of instances of nearly identical code and not a single person knows of more than 2 instances of them, you've got much bigger problems.


This would be true if everyone on the team had been equally involved from the very start on a greenfield project and weren't all already fighting other forms of tech debt.

Sadly the real world isn't so accommodating.


So, instead of dozens of copies of a snippet (which can typically be identified with tooling), the repo ends up with half as many partial abstractions? I guess it could also end up with a Frankenstein's Monster of an abstraction with more arguments and conditionals than the original code.

Like walking through a doorway, descending into a function can make you lose the context surrounding that function, making it difficult to see what is actually common between the 2, 3, or 4 different invocations of that seemingly common code.
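A hypothetical sketch of that Frankenstein's Monster: three similar-but-different snippets merged into one function, where every difference between call sites becomes another flag the caller has to understand.

```python
# One "shared" function absorbing every call site's quirks as a parameter.
def render(items, uppercase=False, skip_empty=False, separator=", ",
           prefix="", numbered=False):
    out = []
    for i, item in enumerate(items):
        if skip_empty and not item:   # only caller B wanted this
            continue
        text = str(item).upper() if uppercase else str(item)  # only caller A
        if numbered:                   # only caller C
            text = f"{i + 1}. {text}"
        out.append(prefix + text)
    return separator.join(out)
```

Each branch exists for exactly one caller, so reading the function tells you less than reading the three original snippets would have.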


Typically, in the teams I've been on that did the best job of keeping the code clean, we'd handle this by just being good at code review: The person submitting the change might not remember any duplicates, but there's a good chance that one of the reviewers will, if that's one of the things they're watching for.

Alternatively, if you really want to be exacting about this, there are code analysis tools that will do it for you.


Couldn’t agree more about the negative impact of this rule on a big team / large old codebase. The “copy once” rule pushes back against abstractions rather than the real problem of wrong abstractions, too big abstractions, or too complex interfaces to abstractions.

We instinctively abstract by fitting n use cases into 1 abstraction, when in fact we should be inverting the dependency graph and writing or reusing n small abstractions, one per use case.
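One way to sketch that inversion (hypothetical names): rather than one function parameterized for every use case, each use case composes only the small helpers it actually depends on.

```python
# Small, single-purpose helpers with no knowledge of their callers.
def nonempty(items):
    return [i for i in items if i]

def numbered(items):
    return [f"{n}. {i}" for n, i in enumerate(items, 1)]

# Use case A only needs filtering; use case B only needs numbering.
# Neither pays for the other's requirements.
def render_tags(tags):
    return ", ".join(nonempty(tags))

def render_steps(steps):
    return "\n".join(numbered(steps))
```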


I think that's okay, because those large, multi-person codebases are precisely the ones which pay a huge cost for premature abstraction.

Too much copy-pasta is a great reason to refactor.


If your code base is this big and you're adding a new feature, you really should be doing an impact analysis prior to making any changes.


Or worse, you need to change some behavior and it only gets changed in one of those places...


That's why you have your code reviewed by other people in the team before it hits master.


I wouldn't automatically refactor on the 3rd time.

It's not enough for the three passages to happen to be identical at this moment in time. You've also got to be sure that, going forward, they will need to evolve identically and in lockstep.


I agree, nothing is a hard and fast rule. And sometimes, an attempt to generalize copied-but-customized code can result in much less readable code than just leaving the originals in place.


+1 on "premature generalization"

A similar issue I've seen a few times: people implement an API and its first consumer in two separate changes, and as a result create an over-generalized, harder-to-test API that tries to anticipate lots of use cases, instead of a much simpler API that only exposes (and tests) what's actually needed.
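A hypothetical before/after of that over-generalization: the speculative version carries knobs no caller asked for, while the version scoped to the one real consumer is trivial to test.

```python
# Speculatively general: every parameter anticipates a use case
# that does not exist yet, and each one needs tests and docs.
class Cache:
    def __init__(self, max_size=None, ttl=None, eviction="lru",
                 serializer=None, on_evict=None):
        ...

# What the single current caller actually needed:
class SimpleCache:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
```

The knobs can still be added later, once a second caller demonstrates which ones are real.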


Sandi Metz did a great talk on this that can be summed up in one really good quote:

> Code duplication is far cheaper than the wrong abstraction.

https://youtu.be/8bZh5LMaSmE


"But when you use 3rd party libraries, you are still responsible for that code in the long run."

Sure, but you very rarely have to do it alone, and you'll only have to do it for a very small fraction of libraries.



