On the criteria to be used in decomposing systems into modules (1972) [pdf]

tcgv · on Jan 30, 2022

I always bring up this paper when I read a post criticizing OOP. It's a bit old but still very relevant and practical. In it the concept of information hiding, closely related to encapsulation, was first described. This concept plays a central role in the strategy for effective system modularization and is IMHO the basis of OOP.

Shameless plug: A couple of years ago I wrote a post about this paper trying to expand it based on my personal experience and providing a more simpler example to elucidate the concepts presented in it [1]

[1] https://thomasvilhena.com/2020/03/a-strategy-for-effective-s...

yaantc · on Jan 30, 2022

I don't remember seeing a criticism of OOP that is a criticism of encapsulation. Everybody tends to agree that encapsulation is desirable. It seems to me criticisms of OOP are that it rolls too many things into one construct, the class: it's a type, with encapsulation of methods and data structure, plus inheritance, all into one. And it can lead to not so efficient data placement on modern CPU (AoS vs SoA).

Languages like Ada and Ocaml have object orientation extensions to the initial languages (Ada 83, and Caml/SML) that can very often be ignored. And still are very good for encapsulation. With them, modules (packages for Ada) and types are separate. It is very natural to group close types together into one module, while exporting opaque abstract types only usable through a module services.

Rolling different concepts into the class may give a more intuitive result at first, particularly when simulating real world entities. But it's also a bit limiting compared to keeping those concepts orthogonal.

mvc · on Jan 30, 2022

I think Clojure quite explicitly rejects "encapsulationism". As does the "data on the outside" philosophy of event driven systems.

rmbyrro · on Jan 31, 2022

Many times one wants to encapsulate is to protect. Data is immutable in Clojure, which goes away with problems arising from everything accessible in wild west.

almostdeadguy · on Jan 31, 2022

This is notably something haskell has largely been worse at than other languages in the ML family (only recently adding the backpack module system, which appears to be much less capable than the module systems in extended versions of SML and OCaml and is still not supported by tools like stack if I understand it right).

Haskell has modules, but until recently it had no module interfaces, so you could not write your code to depend on an abstract type and associated function definitions that could be swapped out dynamically (for ex: w/ mocks in testing).

Type classes are a related concept (i.e. an abstract definition of functions that may be implemented for a given type), but they enforce additional restrictions like coherency (i.e. only a single instance of class may be implemented for a given type). While this is advantageous in many situations, it's a huge pain in the ass when you need to do something like just mock network IO somewhere (as every combination of things you want to change out is going to need a newtype wrapper + class instances for all effects).

The preferred techniques for handling this the last time I used it were:

1. Transformer classes (usually w/ the "mtl" library + your own custom ones). Basically you could categorize types of useful effects into classes, use those class constraints in your function definitions + push any concrete implementations as high up the stack as possible, and use different monad transformer stacks to swap out the implementations of those effects. There's no getting around the single instance restriction of classes, but this allows you to only change out one "layer" of effects in code (say you have a transformer for network IO, you could change out only that concrete type), which reduces some of the labor involved. But there are subtle implications about the way transformers stack that change the meaning of your code, and the use of functional dependencies in MTL means you can only really use one instance of a class in your stack, so for using MTL to do something like things supply your functions w/ context/config values via MonadReader, you end up needing to smuggle around some unholy god object of everything you'd ever want to inject. Enjoy trying to write legible test cases w/ that.

2. Roll your own classes. This bypasses a lot of the weirdness you run into w/ transformer stacks, but its frankly a pain having to break out every possible effect you'll make use of into classes + defining instances for different use cases. And then you'll run into the reality that many libraries are making use of MTL-style classes so you can't entirely avoid them anyways (though you usually won't need to use MTL class constraints in your own code, and thus sidestep the aforementioned god objects).

3. Free monads. Basically describe your program as functions that don't actually run effects, but instead produce data structures describing a program that can be interpreted in several ways where the effects actually occur. This sounds awful but it's very easy to reason about as you have full control over the evaluation of effects in your interpreter functions, has a lot of nice advantages for testing + debugging as everything that could possibly cause a side effect is now inspectable as data, and is usually the least laborious of all solutions in my experience? This is known to be suboptimal from a performance perspective though.

All that said, backpack seems like it will be an improvement upon all these options, despite not having the full flexibility you get from first-class modules in other languages in the ML family tree.

One other note: 1ML looks like a very cool as a refinement of SML modules [1]. I have no real expertise w/ the design of programming languages to know if there are other problems with this approach, but I'd love to see a production-ready implementation.

[1]: https://people.mpi-sws.org/~rossberg/1ml/1ml-extended.pdf

_0w8t · on Jan 30, 2022

The problem with OOP is that the code using it has tendency to produce way too many objects thus putting into the code rather rigid assumptions how the system may evolve. This essentially contradicts what the paper advocates which is more like a toolbox approach.

thewarrior · on Jan 30, 2022

Loved this post

pixelmonkey · on Jan 30, 2022

Modern analysis of this classic David Parnas paper by Adrian Colyer (in The Morning Paper) here:

https://blog.acolyer.org/2016/09/05/on-the-criteria-to-be-us...

This is the paper known for introducing/elaborating concepts like "modularization" and "information hiding".

User23 · on Jan 30, 2022

> Conclusion

  We have tried to demonstrate by these examples that it is almost always incorrect to begin the decomposition of a system into modules on the basis of a flowchart. We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others. Since, in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing. To achieve an efficient implementation we must abandon the assumption that a module is one or more subroutines, and instead allow subroutines and programs to be assembled collections of code from various modules.

This aligns well with my experience, even if the terminology feels a bit dated.

nerdponx · on Jan 30, 2022

This is a great piece of advice and definitely applies to software development in 2022.

It probably even applies to system and organization design in general.

toss1 · on Jan 31, 2022

Yes, isolating design decisions and modules that can change independently is fundamental.

I also found one of the essential keys in the first paragraph, which unfortunately didn't seem well discussed in the paper: >> At implementation time each module and its inputs and outputs are well-defined, there is no confusion in the intended interface with other system modules

I've found it extremely useful to clearly and fully define the interfaces before coding. When this is done well, it should be the case that separate (module) teams can really develop and even test independently, as they are really just "throwing it over the wall" when communicating with another module. This also enables far easier scaling, as one can focus scaling (e.g., adding more hardware) to just the essential parts of the system. So, it ends up being worth it to even either deeply analyze the range of possible interactions between the modules, or even build small simulators or throwaway versions just to really understand at the outset the elements needed in the interface.

_0w8t · on Jan 30, 2022

There are clear cases when the approach advocated in the paper just does not work. For example, one cannot hide behind an abstraction the difference between a reliable local and unreliable network storage.

Another problem is that designing module boundaries to minimize the future changes require to anticipate what may change. But as the saying goes, “it is really hard to predict especially about the future”. If the prediction was wrong, then the initial split into components was hiding the wrong thing.

For example, in the example in the paper they assumed that the task would stay the same, only hardware or the size of data set would change. But allow to change the task, and the whole proposed module split becomes wrong while design based on what was called flowcharts in the paper could require less rewrites.

lkxijlewlf · on Jan 31, 2022

Take a POS system. If at first you have a screen, keyboard and bar code scanner, you might think, "okay, what if we can't get this bar code scanner anymore?" so you modularize your HAL in a way that makes swapping out the bar code scanner easy.

However, if you decide that sales should walk around with tablets to enter transactions into and those go to the cloud for processing, well you just have to rewrite the whole thing.

The point is, you look for the things that could change right now. But you know that if too much changes, you have to redesign it. It is not about predicting the future. It is not about writing software that can absorb any change.

Jtsummers · on Jan 30, 2022

> There are clear cases when the approach advocated in the paper just does not work.

The paper calls for being deliberate in how you modularize your code. So is your alternative to have, what, no modules? To just arbitrarily divide your code between modules?

If you have modules (whatever that means in your language(s)) and you don't consider how to divide your code across them, you're inviting trouble because now you're just being cavalier and avoiding the task of thinking. Not being deliberate is careless.

> For example, in the example in the paper they assumed that the task would stay the same

Yes, he did do that. Both versions, however, largely make this task independent of the modules other than the master control module by putting the KWIC task into master control module itself. All the other modules facilitate the KWIC task and the master control module plumbs it all together. In either design, a change in the task will require changes to a variety of modules but will require changing at least the master control module in both cases. How many other changes are needed? Who knows! Depends on how big a change we're making to the task.

The second design, though, leaves the facilitating modules more independent of each other, in a "what do they have to know about each other" sense. The line storage is presented as an interface, none of the other modules have to know how it works, just how to work it. This is in sharp contrast to the first design, where the line storage model is explicitly known by each of the several modules, and any change to it requires changing most of the program.

_0w8t · on Jan 30, 2022

My experience is that anticipating future requirement changes do not work in general. So do not reflect in the design including the design of module boundaries the current assumptions about the future. Focus instead on other criteria, like reducing complexity or making the design more transparent.

User23 · on Jan 31, 2022

I think I get what you're saying and there's merit to it. However, it appears to me that you are arguing against something the submission doesn't say. If I were you I would reread it carefully and without prejudice, because based on your comments here I believe that you would enjoy learning the concepts it's presenting.

_0w8t · on Jan 31, 2022

I did read the article. My reply was about its conclusion. The article itself advocated a toolbox approach of creating useful tools first that could be written and tested independently and then assembling the application from those. But this has little to do with anticipating future changes, more about of flexibility of assembling the application itself. I.e. the conclusion was not warranted.

goto11 · on Jan 30, 2022

This also pertains to the much-misunderstood "Single Responsibility" principle. The article argues that modules should encapsulate decisions which may change, i.e shield other modules from effect of such changes. The SRP argues that each module should only encapsulate one such decision.

gandalfgeek · on Jan 30, 2022

If you're into short (10 min) video summaries: https://www.youtube.com/watch?v=NF5tRQb0Dpc

goto11 · on Jan 30, 2022

Why are those PDF's so ugly? It would be so much more readable if it was just a regular web page.

(And sorry for the negativity since this is a fascinating article. But I'm genuinely curious why the this PDF is so ugly, since I'm sure the original published version looked significantly better, and the text seems to have been OCR'ed)

Jtsummers · on Jan 30, 2022

Google can help you out: http://sunnyday.mit.edu/16.355/parnas-criteria.html (4th result for me)

goto11 · on Jan 30, 2022

Thanks!

_Microft · on Jan 30, 2022

Have a look at the footer of a page of the PDF. It says „Communications of the ACM, Decembre 1972, Volume 15, Number 12“ there.

It is a scan of an old article from a journal.

Jtsummers · on Jan 30, 2022

I really thought there had been a previous submission with more comments, but maybe it was another Parnas paper. The only past submission with comments:

https://news.ycombinator.com/item?id=8849468 - Jan 7, 2015 (5 comments)

jonjacky · on Jan 30, 2022

A Rational Design Process: How and Why to Fake It https://news.ycombinator.com/item?id=11101358

From Feb 2016, with just 2 comments. The paper is much better than the comments would suggest.

Jtsummers · on Jan 31, 2022

Maybe we need a Parnas day/week here on HN. A lot of his papers have shown up here, but seem to be light on comments in a lot of cases.

dang · on Jan 30, 2022

I thought so too, but couldn't find one.