A Dangerous Obsession with Primitives (swanson.github.com)
42 points by swanson on May 31, 2012 | hide | past | favorite | 28 comments


This is what refactoring is for. Start out simple (maybe a primitive is all you need) and then, if and when the need arises, refactor and introduce a value object. That's where the failure is in this story.

I've seen a lot of premature abstraculation blow up the opposite way: incomprehensible, elaborate object models attempting to be The Best Abstraction Evar around fundamentally simple concepts. These things can be hard to work with and change, add difficult-to-discern complexity to applications, and cause huge performance headaches.

You aren't gonna need it, worse is better, simplicity of implementation above all else.


Yes Yes Yes, this. The world is littered with code that never made a dent in the world but was uber-prepared for the day that we'd suddenly need a new way to talk about numbers/dates/strings.


It's not quite that simple. Primitive obsession is a form of duplication, and it's particularly difficult to spot without practice. Once found, it can be a real bear to remove. Chapter 4 of my screencast [1] shows this; it took me an hour and a half to refactor a trivial application to remove its primitive obsession. (And there's a strong case to be made that some of the difficulty I had in earlier episodes was due to primitive obsession as well.)

I agree that premature abstraction is a pain, too, so I wouldn't want to go down that route. However, preventing primitive obsession can be as trivial as using a two-method wrapper class. More methods and abstraction can be added on an as-needed basis. Also, removing the wrapper class is automatable with the "Inline Method" refactoring in some IDEs.
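A "two-method wrapper class" like the one described above can be sketched in a few lines. This is a hypothetical illustration, not code from the screencast; the names and the 12-grams-per-bread-unit factor are assumptions:

```python
class Carbs:
    """Minimal wrapper around a raw integer gram count (illustrative only)."""

    GRAMS_PER_BREAD_UNIT = 12  # assumed conversion factor for illustration

    def __init__(self, grams):
        self._grams = grams

    def grams(self):
        return self._grams

    def bread_units(self):
        return self._grams / self.GRAMS_PER_BREAD_UNIT


c = Carbs(36)
print(c.grams())        # 36
print(c.bread_units())  # 3.0
```

The point is that the wrapper starts this small; more behavior migrates into it only as the need appears, and inlining it back out later is mechanical.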

So the design tradeoffs I see are:

1- Primitive obsession: high duplication burden, high complexity burden, hard to remove

2- Wrapping primitives that don't need it: low complexity burden, easy to remove

So I prefer to err on the side of wrapping primitives. That said, I've never actually encountered a case where I wrapped a primitive that didn't need it, and I still accidentally create the primitive obsession smell a lot. It seems so innocuous at first.

[1] http://jamesshore.com/Blog/Lets-Play/


That's language-specific advice. Go handles this much more elegantly. You can create a Celsius type very simply:

type Celsius float64 // Temperature in degrees Celsius

It's no harder to use than 'typedef' in C and similarly it doesn't get in your way, but you get a real type and nice errors when you screw up. There's no reason not to do it right away.


Thanks for the comment, I definitely agree that this should have been picked up in a refactoring. I've also been on other projects that go the opposite direction of abstracting everything to the point that you don't realize some function has already been abstracted so you write a new one!


This is also what experience is for. It teaches you the probability that you will need the abstraction in the future. As a beginner, you should err on the side of YAGNI; as an expert, you probably already know what to do.


I don't see the problem with primitives here. Realistically, the alternatives are either to store a standardized float, or a struct that contains a float and a type value. Floats have issues themselves with accuracy that need to be accounted for, so it's pretty easy to see why they tend to be avoided.

Internally, you want a standard set of units. It doesn't matter what they are: grams, bread units, etc. It only matters that they're standard across the application. That reduces the complexity of the actual functionality, which is presumably totaling up the information and comparing it to whatever limits the user inputs. You don't want to have to switch all your internal logic based on user input type, you want to convert user inputs to a standard field type and work from there. Honestly, different calculations based on multiple unit types is as big a code smell as converting the input data.
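The "convert at the boundaries" approach described above can be sketched as follows. This is an assumed illustration: the unit names and the 12-grams-per-bread-unit factor are made up for the example.

```python
GRAMS_PER_BREAD_UNIT = 12  # assumed factor, for illustration only

def to_internal(value, unit):
    """Convert user input to the internal standard unit (grams)."""
    return value * GRAMS_PER_BREAD_UNIT if unit == "BU" else value

def to_display(grams, unit):
    """Convert the internal value back to the user's preferred unit."""
    return grams / GRAMS_PER_BREAD_UNIT if unit == "BU" else grams

# All internal logic works in one unit; no switching on input type:
entries = [to_internal(2, "BU"), to_internal(30, "g")]
total_grams = sum(entries)
print(total_grams)                    # 54
print(to_display(total_grams, "BU"))  # 4.5
```

Everything between `to_internal` and `to_display` (totals, comparisons against user-entered limits) deals with exactly one unit.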

So realistically, it's the same effort no matter what input type the user has: Either you're converting it at the base, converting it before you do your calculations, or writing several specific cases for each input type. It's the same number of test cases, too.


It's the same effort to the machine, not the developer. If the objects manage their own units, you can just use them naively and let them handle their conversions during arithmetic if they need it. You don't need to write the code for conversion explicitly anywhere outside the class. This is a material advantage of object-oriented programming, consolidating the knowledge about one aspect of your program to the right place to deal with it and keeping it out of the rest of the codebase.

How's your standard units solution going to work when the users consider their input units to be part of the data? Are you going to store their units alongside the converted units and convert them back whenever you want to display? If so, you've just replaced Matt's example problem with an identical problem. This is a clue that you have insufficient abstraction: a small change of requirements can force a lot of code to be written across your application. And don't assume you can dictate units to your users, there are plenty of situations in which the users and their units are vastly more important than the programmer's time and ease.


If they "consider their input units to be part of the data", by which I assume you mean something like "if I use carbs here, and BU there, keep it that way", then you need to store the unit regardless. In that case, since value is always paired with a unit and both are variable, I totally agree. Encapsulate. Two related values that cannot meaningfully be separated should be treated as a single object.

But I've never seen a system like that for something like carb counting - it's always normalized, and everything is displayed in a single unit at a time, maybe changeable by user preference. Maybe the problem is that so far nobody has built such a system. Maybe it's that doing so means more complex SQL queries and larger indexes, because you need to consider two fields instead of one when summing / averaging / etc. Maybe it's because users don't want it.

The argument for ints in this case is also an argument that you should only show a single unit at a time. So long as that's true, the encapsulation argument carries a lot less weight - at that point, I'm not sure which I'd choose. The benefits are a lot less real, though.


In my case (astronomy) we have to store the units regardless and they're always together, but in light of changing requirements, I'm not sure it would be overkill to make this explicit up-front even if there weren't explicit requirements to do so.


Oh. You could have mentioned that. :P

Astronomy is the one field where it makes perfect sense to store the units, because basically nobody agrees on what the standard should be. My professor in college joked that if you had 5 astronomers in the room, you'd get 10 different sets of "standard" units.


>How's your standard units solution going to work when the users consider their input units to be part of the data? Are you going to store their units alongside the converted units and convert them back whenever you want to display? If so, you've just replaced Matt's example problem with an identical problem.

That's kind of my point. The problem isn't primitives. The problem exists no matter what you do: you still need to do conversions based on the user's inputs and the program's output.

>How's your standard units solution going to work when the users consider their input units to be part of the data?

In which case, the user will be inputting the unit and the amount, and telling me what units they want to use as an output. Internally, once this is passed from the interface, you convert the unit to your standardized unit. Similarly, when printing it out, you convert it back to the user's selected unit. Again, this is simply calling the convert function when the data is initialized, and calling it again when it's printed.

OO does not magically solve this problem. If you're storing it in the user's selected type, then every time any function accesses the data, they're either going to have to code around the different types, or call convert() to get it into the type they want. So from a developer's perspective, this is the difference from 2 guaranteed conversions(at input and output time), versus conversions whenever the data is accessed, and at output time.

>And don't assume you can dictate units to your users, there are plenty of situations in which the users and their units are vastly more important than the programmer's time and ease.

Actually, as a developer, you DO dictate units to your users if you're doing anything more than just storing and regurgitating the data. If the user selects a salbartifast unit, then you absolutely need to know how it relates to grams, BU, or whatever else you're using in order to use it in any of your calculations.


I think using OO properly would solve this. You have a carbs class that you can instantiate (and update) with two arguments: value and units. This gets converted into whatever representation you use internally, and then whenever you want to access the value, you would do carbs.grams() or carbs.breadUnits() or whatever other units are needed. I don't see a need for a convert() method (maybe as a static function if you want to be able to convert units without instantiating a carbs object, but now we're moving away from the discussion at hand).
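A sketch of the class described above, with Python naming conventions (`bread_units` rather than `breadUnits`). The class name, unit labels, and conversion factor are assumptions for illustration:

```python
class Carbs:
    """Accepts any supported unit at construction, normalizes internally
    to grams, and exposes an accessor per display unit."""

    _TO_GRAMS = {"g": 1, "BU": 12}  # assumed conversion factors

    def __init__(self, value, unit):
        self._grams = value * self._TO_GRAMS[unit]

    def grams(self):
        return self._grams

    def bread_units(self):
        return self._grams / self._TO_GRAMS["BU"]


print(Carbs(2, "BU").grams())        # 24
print(Carbs(24, "g").bread_units())  # 2.0
```

All knowledge of unit conversion lives inside the class; callers just ask for the unit they want.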


Excellent point.


Actually, Matt and I are both independently saying that OO does magically solve this, because it has. It doesn't look like either of us have made it clear enough for you that this is the case, but it isn't a matter of debate: the codebase I work on at work has greatly benefited from it, and this fact isn't really vulnerable to your conception of why it must be the same amount of work. You're not seeing that it isn't. This is a communication problem.


I agree completely about the standard units - I was able to get that right in the production code fortunately. The biggest gain I can see from moving away from the primitive is from a developer's point of view.

By leaving it as an int, I (and the rest of the team) have to know that the value is not suitable for display. Moving to a value object is not the only way to solve this issue; I could have introduced a `CarbsView` that handled the conversions and it would have solved the issue as well. A lot of this kind of stuff just comes down to personal (and team) preferences for designing your app.


You might be interested in Ward Cunningham's "Whole Value" pattern [1] which is part of his awesome CHECKS pattern language [2]:

[1] http://c2.com/ppr/checks.html#1

[2] http://c2.com/ppr/checks.html


I'm reminded of Bjarne Stroustrup's keynote "C++11 Style" that describes C++11's vision of "Type-rich Programming."

Slides: http://ecn.channel9.msdn.com/events/GoingNative12/GN12Cpp11S... -- "Type-rich Programming" comes in at slide 17.

Video (and audio downloads): http://channel9.msdn.com/Events/GoingNative/GoingNative-2012... -- slide 17 comes in around a fifth of the way through; the player does not appear to have a minutes/seconds indicator, but one should be able to match the slides (PDF) against the video to find the spot.

I haven't had time to review the intro to those slides to check how relevant they are, but the keynote is generally good anyhow, so I'd recommend viewing the whole piece.


I started to write a C++ template class that would implement strongly-typed ints (so Celsius and Fahrenheit types could behave like ints, but have distinct types).

I gave up after this "simple" idea approached 200 lines of code implementing all the operator overloads. I guess the lesson is that primitives are complex, even if you just want to give them a new name. Also, the expression int/int produces an int, but what should the expression FahrenheitInt/FahrenheitInt produce? A unitless int? A FahrenheitInt?


* Dividing or multiplying a fahrenheit temperature is meaningless, because 0°F is arbitrary.

* Ideally, you want separate types for absolute temperatures and temperature-deltas. Absolutes cannot be added, multiplied, or divided; subtracting absolutes produces a delta; adding or subtracting an absolute from a delta produces an absolute; deltas can be added and subtracted, and multiplied by unitless values.

* Multiplying a MeterNum by a MeterNum would ideally produce a SquareMeterNum.

* If you don't have to worry about precision, MeterNum and FootNum should both be replaced by LengthNum.

* You can convert a LengthNum into, say, meters by dividing by whatever value represents "1 meter" to produce a unitless int.

* Often operators don't make sense at all. (a << b) only makes sense if b is a unitless integer, for example.

I implemented most of this (but not absolute temperatures) as a Python library. The source is at <http://timmaxwell.org/svn/src/pychem/units.py>; and <http://timmaxwell.org/svn/src/pychem/std_units.py>. (Warning: these will download rather than opening in the browser.) I also found this: <http://pypi.python.org/pypi/units/>.
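The absolute-vs-delta rules from the list above can be sketched with two small classes. This is an illustrative sketch, not the linked library; integer kelvin values are used to keep the arithmetic exact:

```python
class TempDelta:
    """A temperature difference: may be added, subtracted, and scaled."""

    def __init__(self, kelvin):
        self.kelvin = kelvin

    def __add__(self, other):
        if isinstance(other, TempDelta):
            return TempDelta(self.kelvin + other.kelvin)
        return NotImplemented

    def __mul__(self, scalar):
        # delta * unitless value is meaningful
        return TempDelta(self.kelvin * scalar)


class Temp:
    """An absolute temperature: no addition or multiplication of absolutes."""

    def __init__(self, kelvin):
        self.kelvin = kelvin

    def __sub__(self, other):
        if isinstance(other, Temp):
            return TempDelta(self.kelvin - other.kelvin)  # abs - abs = delta
        return NotImplemented

    def __add__(self, other):
        if isinstance(other, TempDelta):
            return Temp(self.kelvin + other.kelvin)  # abs + delta = abs
        return NotImplemented


boiling, freezing = Temp(373), Temp(273)
print((boiling - freezing).kelvin)  # 100
```

Adding two `Temp` values raises `TypeError`, which is exactly the point: the meaningless operation is rejected before it produces a wrong number.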


Always use base SI units, no matter what the locals do. Convert at the point of input and output.


This sounds like the kind of problem that puts really expensive robots into craters.

http://youtu.be/OB-bdWKwXsU

There's a salient point in that talk where he describes a mission that failed due to a human error. Someone forgot to convert an internal representation of a unit in some algorithm which led to a miscalculation and a loss of some millions of dollars. Apparently the error made it through all of the code reviews and tests that were written. Bjarne then shows some pretty simple code that might have avoided such a disaster by specifying your types up front and letting the compiler do the work of converting them. Seems like a pretty simple concept and I wonder why it's so often missed.


The solution is actually worse than the initial problem. Unit conversion is a more complicated issue. The solution introduces all sorts of horrid conversions and rounding issues.

Consider a third of an inch. It can't be represented accurately in decimal. The same is true with other units.

You need to store in the source format, which may vary based on the type of unit. As a rule, I tend to keep convertible unit values as ratios of two bignums with the unit attached as an enumeration. These can be represented and manipulated as value types (structs) in C#.
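The parent comment's C# approach (exact ratio plus a unit tag) can be sketched in Python using `fractions.Fraction`, which is precisely a ratio of two bignums. The class and unit names are assumptions for illustration:

```python
from enum import Enum
from fractions import Fraction


class Unit(Enum):
    INCH = "in"
    CM = "cm"


class Length:
    """Stores the value exactly, in its source unit, as a bignum ratio."""

    _CM_PER = {Unit.INCH: Fraction(254, 100), Unit.CM: Fraction(1)}

    def __init__(self, value, unit):
        self.value = Fraction(value)  # exact, no binary rounding
        self.unit = unit

    def to(self, unit):
        cm = self.value * self._CM_PER[self.unit]
        return Length(cm / self._CM_PER[unit], unit)


third_inch = Length(Fraction(1, 3), Unit.INCH)
print(third_inch.to(Unit.CM).value)    # 127/150
print(third_inch.to(Unit.INCH).value)  # 1/3 -- round-trips exactly
```

A third of an inch survives the round trip exactly, which a binary float or a fixed decimal cannot guarantee.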


I can't even count the number of times I've seen the exact opposite. Types that were so over engineered that a simple int would have sufficed.


I also don't quite see how a "value object" would have helped (or made much difference) there?


The main "win" would be to keep the logic for converting between units within the Carbs class. Otherwise every screen in the app is going to have to know that the integer value needs to be converted before displaying it.


Value objects are strongly (and semantically) typed.


Of course, if you use F# you can just have units of measure on your types...



