I have never worked in an environment where new employees didn't show up and declare that the existing infrastructure sucks. Knowing this, I caught myself doing the same thing. Reflecting on it, for me it was a coping mechanism. I was overwhelmed by all the new things and didn't understand why things were set up the way they were, etc.
The reason I bring this up is that I am surprised that architectural choice is overwhelmingly the top technical debt concern. That doesn't seem right. I would expect it to be poor code due to time constraints.
In most places I have been, the infrastructure DOES suck.
First question: "Backups?" followed by "When was the last time you did a restore?"
Second question: "What's your revision control?" For all of my slagging on git and the people who use it, I am thrilled if I hear "CVS" or "Subversion" because it means they're using version control. "git" or "mercurial" tells me I have a team that has some level of clue.
Third question: "Build system?" Good luck. Never yet seen it.
I took a development job once, where I learned on the first day:
1. No backups. Emergency plan = several older copies of the product's source spread among hard drives of various former employees' systems.
2. No source/revision control. Current version of the product's source code was literally whatever was on the lead developer's hard drive at the moment.
3. No deployment process. When a release had to happen, above mentioned developer would build an EXE out of whatever he had (if it built) and directly deploy it to customers.
4. No dependency management. What versions of vendor-provided libraries do we use? Whatever is on the guy's hard drive. Oh, and they're all binaries because we either lost or can't build the source anymore.
5. No bug tracker. The list of stuff that needed to get worked on was whatever the CEO complained about last. There was no concept of a backlog of existing bugs or technical debt.
6. No documentation. But are you surprised at this point?
7. Setting up an environment to build the code was a manual process that relied on searching through past emails for lost tribal knowledge, which basically amounted to: try different include and library paths until there are no errors. Once it built, I was greeted with 2500+ compiler warnings... (out of 200 or so C++ files).
>took a development job once, where I learned on the first day: No backups, No source/revision control, No deployment process, No dependency management, No bug tracker.
And that is why one should always ask about these important things before the first day, i.e. during the interviews, before you decide if you want the job or not. Fortunately I'm in a position where the above would be a near-automatic "no" from me.
Edit: "near" automatic, as there might be a way out if the company knows that it needs to improve drastically and has the will to do it. The other response, "what were you able to do to improve the situation?", is key.
I'd like to think I left the situation far better than I found it.
Tackled source control and issue tracking first since they were low-hanging fruit. Next came basic (manual) backups. My background is not in putting together general office-wide IT solutions, so I did not feel qualified to recommend any specific package. After that, it was on to a sane build and deployment process that involved a dedicated build machine and included QA. After a long time I could start safely making code changes to clean the actual codebase up.
I think you'll see this a lot in manufacturing companies who sell physical hardware. The embedded software is a critical part of the product, yet it's not treated with much more care or seriousness than anything else on the BOM. There's a night-and-day difference between working at a company where software _is the product_ and one where software is just one of the many components that go inside the product.
Are you a consultant? It would make sense that companies that need to hire a consultant for these types of things are the companies that aren't currently doing them.
My favorite right now is pom.xml files declaring dependencies that don't exist in the corporate Maven repository, nor in any public one. Kinda defeats the whole purpose of using Maven if I have to go download jars manually.
"What's your revision control?"
"Git." fine "And Mercurial." Okay "And SVN. And CVS. Some are hosted locally, some are accounts on Bitbucket and GitHub".
"Build system?"
"Cmake. And Scons. Depending on which version of the software you're building. And Make. And some bash scripts."
A few years ago we introduced svn in our company to replace PVCS. After 4 years, we managed to hunt down most of our source code and get it committed somewhere. Now we have loads of contractors, some good, some bad, most cheap, and almost none stay longer than a few months. After seeing what insane things some of these guys managed to do in Subversion, I simply don't dare introduce git: history-rewriting features like rebase in their hands scare the hell out of me. At least with Subversion, I know that what's in there stays in there. git is great if you can trust your people, but svn is a better choice in a large political company that does not care about what happens to its code.
And don't get me started on the abuse of backups or build systems.
You need to look harder, then: we tick all three and even use Gitflow. Backups are used all the time for testing live state locally, and we have a CI 1-click (well, 5 or 6 actually...) deploy to Test/Stage/Prod.
I came from an awful IT environment and even my team met your criteria. I think you might be primarily exposed, due to the nature of your work, to the worst of the worst. We had/have a lot of technical debt, but the architectural ones are the hardest to fix (because they cost real capital). You wouldn't believe how hard it is to get sign off on new servers (at some companies). My team nearly made an executive decision to switch to GCP or AWS just to escape our own internal infrastructure group.
Believe me, there are 10.000 million dollar non-software companies that barely use version control and have extremely awful IT departments. And there are companies making a killing selling software to said companies.
I think, at least in sfbay, it's safe to assume git. And probably some CI tool. This is true across my personal sample of perhaps 15 companies that I know well enough.
The Bay Area is better than most places, but the moment you aren't dealing with a purely software shop, the probability they don't have this stuff goes through the roof.
The PCB design tool Altium only has provision for using Subversion (which is fine, thanks), but the number of people who actually USE it is minuscule.
As a first-year programmer a project manager said to me, "I've never heard a programmer praise another's code. It's always, 'This is crap!'" So I always try to be slow to criticize. There is a lot of crap, but things might not be as simple as you think. I really believe we should take more time to read and understand before we write. Or think about G.K. Chesterton's fence:
Oh no, I always get rid of stuff when I don't know why it's there.
Because if it does not break when I run "make check", it's not covered by the testing and it shouldn't have been there in the first place anyway.
And if it had a use for someone, but nobody knows, removing it is also a very efficient way to figure out :)
This all depends on the nature of the business, the app, codebase, customers, processes of dev/QA/ops, etc. For example, this cavalier attitude might work at a social media startup but not at a bank.
Thanks for posting - I hadn't really considered Chesterton's Fence as a technical metaphor but as soon as you posted I immediately thought of a person who has a habit of this very act. I'm adding this term to my lexicon.
> The reason I bring this up is I am surprised that architectural choice is overwhelmingly the top technical debt concern. That doesn't seem right. I would expect it to be poor code due to time constraints.
1. The system grew alongside the devs' understanding of the problem space. There was never time to change decisions that were later found to be not the best (and the older such a decision is, the harder it is to fix).
2. The problem space changes over time. Either because the outside world changes, or because of scope or scale increases. This means that even good architectural decisions can become bad over time.
Isolated pieces of bad code can be fixed when/if they need to be touched for some other reason. Fixing currently-inappropriate architecture takes a bit more effort.
You try to make the best architectural decision that you can at the time, with the knowledge and resources you have available. Time passes and you learn new things -- maybe the problem changes, or your understanding of it improves, or your understanding of alternative implementation strategies improves. For whatever reason, you can now imagine a new architecture that would be superior, if only it were implemented to replace the old architecture. Now you have technical debt. This doesn't necessarily mean that the best decision now is to pay off the debt by reworking the architecture -- that depends upon a cost/benefit/opportunity cost analysis.
edit:
Tangentially, there's a pretty interesting presentation by Kevlin Henney titled "The Architecture of Uncertainty" [1]. My poor summary: When designing the initial architecture of a system, Kevlin suggests that the team brainstorm to identify which parts of the system have a lot of uncertainty. Each region of uncertainty then becomes a subsystem. Put interfaces between the subsystems that need to be connected. Hopefully you now have an architecture with stable interfaces, even if individual subsystems need to be completely rewritten during the course of the project.
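To make that summary concrete, here is a minimal Python sketch of one region of uncertainty isolated behind an interface. It assumes, purely for illustration, that pricing rules are the uncertain part of the system; `PricingPolicy`, `FlatPricing`, `BulkDiscountPricing` and `checkout_total` are hypothetical names, not anything taken from the talk.

```python
from typing import Protocol


# Hypothetical example: pricing is assumed to be the "region of uncertainty",
# so it sits behind a small interface the rest of the system depends on.
class PricingPolicy(Protocol):
    def price(self, base_cents: int, quantity: int) -> int: ...


class FlatPricing:
    """First implementation: no discounts at all."""

    def price(self, base_cents: int, quantity: int) -> int:
        return base_cents * quantity


class BulkDiscountPricing:
    """Later rewrite of the subsystem: 10% off orders of 10 or more units."""

    def price(self, base_cents: int, quantity: int) -> int:
        total = base_cents * quantity
        if quantity >= 10:
            total = total * 9 // 10  # integer cents, rounded down
        return total


def checkout_total(policy: PricingPolicy, base_cents: int, quantity: int) -> int:
    # Callers only know the interface, so replacing the pricing subsystem
    # does not ripple into the rest of the code.
    return policy.price(base_cents, quantity)
```

For example, `checkout_total(FlatPricing(), 250, 10)` gives 2500 while `checkout_total(BulkDiscountPricing(), 250, 10)` gives 2250; the calling code is identical in both cases, which is the point of keeping the interface stable while the subsystem behind it is rewritten.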
Disclaimer: I haven't watched the presentation, though I do have first-hand experience refactoring large systems that grew slowly over time.
Based on your edit's description, this technique feels like a patch that will inevitably fall apart. It relies on the assumption that your team can correctly identify the centers of uncertainty, and that that uncertainty model will continue to apply. Thinking about such things is an excellent idea, but it is not sufficient - uncertainty is, after all, uncertain.
I think in many cases the more important thing is to create an abstraction that allows you to perfectly represent your existing business logic in the most concise way possible. This should let you cut down on the number of edge cases outside the model, and generally simplify the system. Simplicity in specification is important, because it allows newcomers to quickly understand the inner workings of your code and correlate business logic with real code - if they can understand it, and can work within it, then they will not be tempted to hack around it (which is the root of code deterioration). I strongly believe that human friendliness and understandability should be key design goals in ANY new system, not an afterthought.
So long as no one breaks the abstraction, the 99%, day-to-day changes should be easy. When you finally do hit a case that requires a significant abstraction change, your concise code will make it obvious that it's outside your abstraction model, and you can evaluate options at that point.
I have found that the #1 source of architecture/reality mismatches over time is that "business" has deliberately kept developers only partially informed, on a "need to know" basis.
To be clear, there was nothing malicious about it, it's just that many stakeholders only give their short term needs as input, not the long term strategy that's discussed behind closed doors in the board room. The big problem here is that non-engineers don't get that some strategies don't simply add to the problem space, but fundamentally change it.
"Why the fuck didn't you tell us sooner?" is one of the most common phrases in software development.
Quite a lot of the Scrum process is the valiant attempt to extract coherent requirements from an end user. You should be so lucky as for them to have a coherent plan at all, let alone a secret one. Insofar as they do have one, in my experience there's always a few bits with magical flying unicorn ponies as a requirement they'll get to in time.
The thing that is very frustrating in the "There was never time to change decisions..." bit is that there often is time.
If you are curious about what I mean, I invite you to try a little experiment. For a few weeks (2-4 should be sufficient) tell people, "If you see something wrong with the design, refactor it as soon as you see it. If you need some extra time on your story to compensate for that work, just bring it up at standup and we will modify the sprint commitment".
If your experience is like mine, you will find that the vast majority of people will not refactor the design at all. About 10% of the people will try to refactor something, and will end up trying to rewrite the entire app. They will do more damage than good and will probably give up half way through. If you are lucky, maybe 5% will actually refactor something and be successful.
Because I have tried this many times, I've interviewed people and asked them why they do what they do. For the people who don't refactor, the reason they usually give is: "There is no time to refactor". Which is really odd because they have explicitly been given time. What I have come to realize, though, is that people do not want time; they want absolution of responsibility. If you ask them to make the judgement call, they do the math in their head (unconsciously) and determine that they will be rewarded more and criticised less if they do feature work without refactoring. This forces the decision up to the PM/PGM/BM/BA/Whatever, for whom refactoring has no direct benefit. The result is that refactoring is rarely done, and if it is done it is the result of a large political process.
For the people who try to rewrite everything, training seems to be the overwhelming issue. They pull on a thread and the whole sweater comes apart. For some people in this category, though, giving them carte blanche to decide what to do means that they feel they can finally "do it right". Doing it right in this context means that they can replace all the code that they didn't personally write and therefore don't like. Since everyone on the team only writes a small portion of the code, rewriting everybody else's portion is basically rewriting the app. Again, training seems to help because even if a person's goal is to replace everything, if they learn how to do it piece by piece they can be successful. Also, if they do that, they will be required to have many conversations and may eventually learn how to work with others.
Finally, you may get one or two people who naturally know how to refactor well. It is useful to find out who these people are and to encourage them. Unfortunately, this often enrages the "my way or the highway" people. The people who are good at refactoring, if encouraged, will naturally dominate the design of the application. Often these people are suppressed by political means because they are so effective at driving the design. In order to enable these people you will need to make some tough decisions on the business end of things.
> For the people who don't refactor, the reason they usually give is: "There is no time to refactor".
My #1 reason to avoid refactoring code is that the code has been there, is battle tested and hasn't had a bug filed against it in months. That is not my first instinct, either. There are lots of things I wrote months ago that I badly want to rewrite every time I see them. I have to restrain myself because it's not just the time spent refactoring. It's writing tests (if you're into that sort of thing). It's getting QA to hammer on it some more. It's fixing all the little bugs that you thought you had fixed that you re-broke. It's all the little bugs you've never seen before because this is a new design and you're not perfect.
Most of the time, it's not worth the refactor, even if it does slow down adding future features to that particular area. There are, of course, two exceptions: 1) if you're constantly playing bug whack-a-mole on a particular section and 2) if a section of the code you're working on is constantly changing. I grasp every opportunity to hold a meeting, stand on the nearest chair, strike a dramatic pose and shout "We are rewriting the loading system... FROM SCRATCH!" My project manager then tells me to get down and lay off the coffee, but that's ok, I've had my moment.
Certainly it is always a judgement call. I was just saying to my colleague yesterday that where a developer really makes a difference on a project is by consistently being able to make the right judgement calls. One can say there is no silver bullet, and while it is impossible to make a project go faster than it can, it's very possible to make it go orders of magnitude slower ;-)
On that note, a few things I try to keep in mind: As you say, don't gratuitously change code. You may hate the design, but if it ain't broke, don't fix it. This is probably the biggest mistake that "change the world" programmers make.
Second, try not to rewrite code -- ever. Usually there is no business case (see "don't gratuitously change code"). Even if you think there is a business case, it dramatically increases risk. My rule of thumb: anything that lasts longer than 2 weeks is very likely to be cancelled. If you must rewrite, it has to take much less time than that.
Finally, keep in mind that refactoring is not rewriting, nor redesigning (though it is closer to the latter than the former). Refactoring is transforming the code so that it performs exactly the same function (bugs and all!!!) with a different "shape". Ideally you will have tools to help you refactor in such a way that you can prove that the resultant code executes in exactly the same way as the original code. With or without the tools, you should have a suite of unit/integration tests that will alert you when you have made a mistake.
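A minimal Python sketch of the "bugs and all" point, assuming a hypothetical shipping-fee function: the refactored version changes the shape (a lookup table instead of nested ifs) but must reproduce the original's output exactly, including its boundary behavior at exactly 5 kg, whether or not that boundary was intended. A simple characterization check compares the two over a set of sample inputs. The function names, fees and sample values are made up for illustration.

```python
from itertools import product


def shipping_fee_legacy(weight_kg: float, express: bool) -> int:
    # Hypothetical legacy code. Note that exactly 5 kg counts as "heavy";
    # the refactoring must keep that, intended or not.
    if express == True:
        if weight_kg >= 5:
            fee = 20
        else:
            fee = 12
    else:
        if weight_kg >= 5:
            fee = 10
        else:
            fee = 5
    return fee


# Fee table keyed by (express, heavy); same values as the nested ifs above.
_FEES = {(False, False): 5, (False, True): 10, (True, False): 12, (True, True): 20}


def shipping_fee_refactored(weight_kg: float, express: bool) -> int:
    # Different shape, same behavior: the ">= 5" boundary is preserved.
    heavy = weight_kg >= 5
    return _FEES[(express, heavy)]


# Characterization inputs: include values on and around the boundary.
SAMPLES = list(product([0, 4.99, 5, 5.01, 100], [False, True]))


def behaves_identically(samples) -> bool:
    """True if old and new code agree on every sample input."""
    return all(
        shipping_fee_legacy(w, e) == shipping_fee_refactored(w, e)
        for w, e in samples
    )
```

Running `behaves_identically(SAMPLES)` before and after each small step is the kind of safety net described above: if a "cleanup" quietly changes the 5 kg boundary, the check fails instead of a customer noticing.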
Refactoring allows you to slowly migrate code from one "shape" to another over time while not breaking it. People who are skilled at refactoring can evolve efficient design even starting with really badly written code.
Why do you want to do refactoring? While, as you point out, the cost of working around a suboptimal design is low, the cumulative cost of these work-arounds over time can be quite substantial. A work-around introduces complexity to the code. This complexity makes everything slightly more difficult and slightly more risky. It also makes further work-arounds more likely. These work-arounds compound the problem. Because poorly designed code usually has high coupling, problems in one area of the code can manifest in other areas without warning. As the work-arounds increase, the complexity can increase exponentially. As an example of a worst-case scenario, I once worked on a project where the programmers averaged 1 line of code per day (yes, LOC is a poor measure of productivity, but no matter how you slice it, that's just incredibly bad).
My experience has been that teams which refactor effectively outperform similarly skilled teams who don't refactor by a very large margin. Although productivity is practically impossible to measure, the difference is quite startling. Interestingly, I have also had some experience with teams that have very high test coverage, but which don't refactor consistently (or effectively). These teams also do not seem to benefit from dramatically improved performance. My current theory is to write tests to support refactoring and don't worry about any of the other benefits.
We're saying the same things, just in different ways. I was attempting to be humorous in my previous comment, preferring comedy to precision. :)
> Interestingly, I have also had some experience with teams that have very high test coverage, but which don't refactor consistently (or effectively). These teams also do not seem to benefit from dramatically improved performance. My current theory is to write tests to support refactoring and don't worry about any of the other benefits.
That's something I had never considered. Do you have any general guidelines for testing for refactoring?
As a developer I've had some very good and very bad experiences, usually centered around two issues: 1. I am not a morning person and am consistently late; I get work done, but not at 8 or 9am on the dot. 2. I take the time to understand what I am working on, which usually results in some refactoring and generally less code in the end. I once refactored a piece of a project's client side to use lodash and eventemitter, and it cut enough code to make up for the size of including both libraries. I was, of course, chastised because it took longer than expected. I left when the writing on the wall said I would be fired anyway. That was over a year ago, and my understanding is that the project I was working on, which had a 3-month delivery time, still isn't done because no effort was made to resolve technical debt and they kept throwing more people/teams at it.
From the outside, it's rather hard to tell the difference between someone getting problems that are "inherently simple" and someone spending time finding simpler solutions to problems.
That's actually an interesting question. Aside from problems that are easily observable as difficult because lots of smart people have tried and failed, how can one objectively know whether it was difficult or easy?
Even within "just a web app" one can run into some gnarly issues or odd old code that renders otherwise simple tasks difficult, but since it's only one person working on it you can't really test whether or not it's actually difficult.
This might impact both the dev/manager disconnect as well as imposter syndrome.
> they do the math in their head (unconsciously) and determine that they will be rewarded more and criticised less if they do feature work without refactoring
Great point. They know they've been told they have time but they don't actually believe they have time.
Half of the time, the build system of companies I join is unnecessarily complicated. I often end up simplifying things once I get a handle on how it works, creating nice one-command deploys or workflows that don't require chaining multiple things or memorizing a gazillion abstractions. Meanwhile, people often waste lots of time on inefficient build tooling and multiple steps to get to the end goal.
None of the companies I have been at created a good model system (one came close, but the employee responsible left due to the startup acting like a consultancy instead of focusing on a quality product). This has often bitten companies hard as data structures start mutating due to changing APIs and changing requirements, making certain things such as offline caching more difficult.
Architectural choice tends to roll into poor time constraints, since poor time constraints put even more pressure on the code base when poor architecture is involved.
It does suck. Most code sucks. We just get used to it. Doesn't mean it doesn't work, doesn't mean it's not valuable. But it still sucks.
As for architecture being the biggest technical debt problem that should be obvious. Architecture is often the most difficult thing to change after the fact.
I know someone who says all the time that what they want is "The cheapest, fastest (least time to complete), most predictable thing to get something done" and "that if it needs fixing, it can be done when the customer pays for it"
Even greenfield projects there wind up with significant technical debt within a year or two.
Notice that nowhere in the article is there any mention of what a good architectural choice is, nor any mention of the perception of debt in relation to the size of the system[0]. My personal experience with architectural choices is that all of them are great early on when the code base is small. As that code base grows, they all suffer under the weight.
[0]: 100k-1M is a very large range. Kudos to them for filtering out small code bases, but a graph of these perceptions measured against code size would have been helpful.
(author here)
We cross-tabbed system age vs perception of amount of TD. There was a moderate association between older systems (> 6 yrs) and more perceived debt.
I did not explicitly look at size as this was not one of the original research questions, but it's a good point. I suspect older systems will tend to be larger (in the domains we studied, anyway). And your point about arch choices being great "early on" can, I think, be captured in the "system age" variable. I guess I'm trying to think of a system that might be young and yet quite large in LOC; that would be an interesting outlier to look at.