Sure, but semantic versioning really is the wrong kind of versioning to use for a language. The major version should represent major language changes, not whether it's a breaking change or not; semantic versioning isn't somehow magically a "good" way to version. It's useful for libraries/dependencies, where you're dealing with many different libraries and just want to know you can upgrade without having to deal with breaking changes. For a language? Silliness. Your version is not really telling you the main things you care about.
It's much, much more useful to users to say "2.0 introduced generics"; it's distinct. If Go is like other languages, generics will change the code people write a lot, and libraries will start looking significantly different. It's very distinct, and if that simply lands in version 1.18.0 or whatever, that is super bad usability from a language perspective.
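To make that concrete (a minimal sketch of post-1.18 code; the helper is mine, not from any real library): generic code looks quite different from the interface{}-and-cast style that preceded it.

```go
package main

import "fmt"

// Map is the kind of helper that only became expressible in Go 1.18.
// Before generics it would have taken interface{} arguments and
// forced callers into runtime type assertions.
func Map[T, U any](in []T, f func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, f(v))
	}
	return out
}

func main() {
	doubled := Map([]int{1, 2, 3}, func(n int) int { return n * 2 })
	fmt.Println(doubled) // [2 4 6]
}
```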
> Sure, but semantic versioning really is the wrong kind of versioning to use for a language.
A language or an API (things you program against) is pretty much exactly the kind of thing SemVer makes sense for.
> The major version should represent major language changes, not whether it's a breaking change or not
I don’t care if changes are “major”, I care if the code I wrote for version X is expected to need modification to work correctly in version Y. SemVer gives me that, Subjective Importance Versioning does not.
Couldn't agree more. One important piece of information the version number gives me is whether upgrading to a newer version will break my code. SemVer gives me that.
Arguably a language should NEVER have breaking changes large enough to warrant a SemVer major version update. Rename or fork the language if you want to do that. Such major language overhauls in the past have been a huge waste of developer time as they go back to rewrite affected code.
Those aren't the only two choices available. C# also introduced var back in version 3 (2008), but it did so as a "contextual keyword", meaning that it remains a valid identifier in any position where it used to be one, even to this day. Pretty much all new C# keywords since the very first version are of this nature.
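Go 1.18 did something similar with any: it's a predeclared identifier (an alias for interface{}), not a reserved keyword, so code that already used the name keeps compiling. A small sketch to illustrate (mine, not from the comment above):

```go
package main

import "fmt"

func main() {
	// Since Go 1.18, "any" is a predeclared alias for interface{}...
	var x any = "hello"

	// ...but because it's not a keyword, it can still be shadowed
	// and used as an ordinary identifier, so old code isn't broken.
	any := 42
	fmt.Println(x, any) // hello 42
}
```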
Whether something is breaking or not is not always clear-cut. For one thing, changes can become breaking in retrospect sometimes. For example, in C#, renaming a method argument wasn't breaking until the language introduced named arguments in calls.
There are also changes which are breaking in a very non-obvious way, to put it mildly. For example, in C# again, adding any new member to a class can break existing code that happens to pass implicitly typed lambdas to overloaded methods, because overload resolution involves checking whether the lambda body is valid for a given candidate - thus, adding a member can make a lambda ambiguous. I'm not aware of anyone actually treating this as a breaking change for semver purposes, though.
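Go has a non-obvious case of its own here, for what it's worth: because methods are promoted through embedding, a library that adds a method in a "minor" release can make a downstream selector ambiguous. A contrived sketch (all names hypothetical):

```go
package main

// Imagine these two types come from two different libraries.
type Logger struct{}

func (Logger) Close() error { return nil }

type Conn struct{}

// Suppose Close was added to Conn in a later "minor" release.
func (Conn) Close() error { return nil }

// Downstream code that embeds both compiled fine before that
// release; afterwards, the bare selector becomes ambiguous.
type Session struct {
	Logger
	Conn
}

func main() {
	s := Session{}
	_ = s.Conn.Close() // explicit selection still works
	// _ = s.Close()   // compile error: ambiguous selector s.Close
}
```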
Nearly every bug fix in an API is technically a breaking change if you want to be pedantic. This kind of collateral damage, which requires multiple points of failure, doesn't usually count as a major change in the SemVer sense.
> Nearly every bug fix in an API is technically a breaking change if you want to be pedantic.
A fix is not a breaking change in the API, because “breaking” refers to expected behavior (so, yes, code that relies on a bug can be broken by a fix; presumably, if you've coded to an observed behavior differing from the spec you are aware of having done so.)
The solution clearly can't be to never ever fix bugs though.
But depending on the kind of bug (especially the "gotcha"/"UX" kind), often it's better to just create a new API version with the corrected behaviour and let existing software keep relying on the old behaviour as best it can.
But clearly, for many other kinds of bugs (security, etc.), we are better served by a fix in the old API, even if that implies a possibility of breaking somebody.
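In Go terms, the first approach usually means adding the corrected function next to the old one rather than changing it in place; something like this entirely hypothetical sketch:

```go
package dates

import "time"

// Parse keeps the old, surprising behaviour (it interprets the
// date in the machine's local time zone) so that existing callers
// keep working exactly as before.
//
// Deprecated: use ParseUTC, which fixes the time-zone gotcha.
func Parse(s string) (time.Time, error) {
	return time.ParseInLocation("2006-01-02", s, time.Local)
}

// ParseUTC is the corrected behaviour, shipped as a new API
// instead of as a breaking fix to the old one.
func ParseUTC(s string) (time.Time, error) {
	return time.Parse("2006-01-02", s)
}
```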
Does an example of semver used in a non-pedantic way even exist?
The perfect simple world that a naive interpretation of semver dreams of leads to those endless streams of major.0.0 increments, fueled by better-safe-than-sorry: "you have been warned, your code might break, no promises". After all, there could always be some xkcd-1172-style "every change breaks someone's workflow". In an infinite universe of monkeys on typewriters, someone would have set up mission-critical infrastructure based on an elaborate log4jndi deploy mechanism.
On the other extreme you have something like Java, which has been in the process of dropping its frozen first digit ever since 1.2. Sure, this predates semver by quite some time, but if we tried to assign meaningful major.minor.patch names in hindsight, we'd certainly not go exclusively by the occasional new keyword in the syntax, like var, but by new feature groups like generics, lambdas and the like, most of which were introduced without invalidating any old syntax.
"We're awesome at backwards compatibility, but this change is noteworthy enough to warrant a major, better introduce some artificial incompatibility" is something that should never happen.
It's clear the py2 to py3 migration was painful but I'm curious to hear how you would apply "they should fork / rename the language" here.
To me it just feels like semantics. They said "here's a new major version of python" when they could also have said "we have forked python 2 and we're calling it python 3. We think it's better and we will probably abandon python 2 at some point".
(FWIW, "semantics" would be "what it means", so I figure that's not what you wanted to say: from your example, I guess you are saying it's more that it's a different wording — syntax? — for the same meaning)
But it's about setting expectations.
The only problem I ever had with the py2-to-py3 migration was that it was even possible to have the same codebase run against both, when the languages are incompatible to such a degree (most notably, the basic string type changed). It basically forced people to make the worst use of Python's dynamic nature (and as soon as the stdlib started doing that, there was no going back).
Semantics refers specifically to meaning of words/language. If you say "it's only semantics" then it probably means you both understand and agree on the concepts but not the meaning of the words surrounding those concepts. That applies in this case, with the concept being breaking changes to a language along the lines of Python 2->3, and the terms being "version change" and "new language".
Was? It still is painful, as some companies have decided to keep and support Python 2 for another 5+ years. I have programs which require the same dependency from different Pythons.
To me, Python represents how not to do language versioning.
In Java people regularly refer to a particular JDK version as a Java 17 or Java 11, even though they actually refer to versions 1.17 and 1.11, respectively[0]. In Clojure land they just say 1.x, even when large new features are added.
I like this because it emphasizes the community's commitment to backwards compatibility, which I greatly value. I've spent a good deal of time writing Javascript, where library developers seem to have very little respect for their users and constantly break backwards compatibility. In ecosystems like that, upgrading fills me with dread. When I see a library on version 4, I have learned to keep looking - if they weren't thoughtful enough about their API design for the first 3 major releases, I shouldn't expect it to be much better going forwards.
For an application, I'm pretty open to version numbers signifying big features - Firefox and Chrome do this, and it's helpful with marketing. But for a programming language? A programming language is a tool, and when upgrading you need to carefully read the changelog anyways. A programming language is no different from a library (in Clojure it literally is a library), and backwards compatibility is /literally/ the main thing I care about. Is my tool going to intrude on /my/ schedule, and force me to make changes /it/ wants instead of being able to spend my time making changes /I/ care about? I want to know that.
[0] This is apparently an awful example, as I've just learned that Java is actually doing the major-version-only thing. It still sort of works, because the only reason they can do that is that they Will Not Break Compatibility.
> In Java people regularly refer to a particular JDK version as a Java 17 or Java 11, even though they actually refer to versions 1.17 and 1.8, respectively.
17 -> 1.17, 11 -> 1.8; this is bothering me way too much for no good reason.
I don't think 11 ever referred to 1.8 generally, but for the longest time the `openjdk-11-*` packages in one of the Ubuntu LTSes (18.04?) actually installed Java 8 for some reason.
> Sure, but semantic versioning really is the wrong kind of versioning to use for a language.
I don't agree. I usually don't care so much when a particular feature was introduced into a language (and if I do, it's usually a Wikipedia search away). I mostly care whether or not code written assuming version X can be compiled with version Y of the compiler. Semantic versioning can tell me the latter. Making versioning arbitrarily depend on what someone considers a "big" feature doesn't help me.
> I don't agree. I usually don't care so much when a particular feature was introduced into a language
I care very much when a feature was introduced into a language, because maintaining compatibility with earlier versions of the language determines what features may be used. If I'm working on a library that needs to be compatible with C++03, then that means avoiding smart pointers and rvalues. If I'm working on a library that needs to be compatible with C++11, then I need to write my own make_unique(). If I'm working on a library that needs to be compatible with C++14, then I need to avoid using structured bindings.
If a project allows breaking backwards compatibility, then SemVer is a great way to put that information front and center. If a project considers backwards compatibility to be a given, then there's no point in having a constant value hanging out in front of the version number.
> I mostly care whether or not code written assuming version X can be compiled with version Y of the compiler.
Semantic versioning can only tell that for the case where X < Y (old code on new compiler). In order to determine it for X > Y (new code on old compiler), you need to know when features were introduced.
> In order to determine it for X > Y (new code on old compiler), you need to know when features were introduced.
I think this is a deliberate reduction of dimensionality. Go says that you don't need to worry (for long) about this case, because the toolchain must be updated regularly - and promises that updating will be as pain-free as possible. This simplifies things for the Go team, for library authors, and for library users in most cases, at the expense of having to maintain a recent toolchain.
Not saying this tradeoff is for everyone, and I've never used C++ professionally so I'm probably ignorant. But are you saying it's common with production projects that use a compiler from 2003 or earlier? What's the use case?
> But are you saying it's common with production projects that use a compiler from 2003 or earlier? What's the use case?
Modern C++ compilers are not necessarily available on all platforms. For example, Solaris, AIX or old RedHat versions. Go doesn't have this problem yet, but it will.
> Not saying this tradeoff is for everyone, and I've never used C++ professionally so I'm probably ignorant. But are you saying it's common with production projects that use a compiler from 2003 or earlier? What's the use case?
Let's start with the fact that newer doesn't mean better. With an already-deployed compiler, you have tested it and know that it works well enough (the code it generates, the bugs you have workarounds for, etc.). With a new compiler you are back at step one. You must do that work again.
Or vendors just support the particular version they have patched.
> Not saying this tradeoff is for everyone, and I've never used C++ professionally so I'm probably ignorant. But are you saying it's common with production projects that use a compiler from 2003 or earlier? What's the use case?
The first difference is that there isn't just a single compiler, but rather a standard that gets implemented by different compiler vendors. It's gotten better over time, but typically it would be a while between the updated standard being released and the standard being supported by most compilers. (And even then, some compilers might not support everything in the same way. For example, two-phase lookup was added in C++03, but MSVC didn't correctly handle it until 2017 [0].)
The second difference is that the C++ compiler may be tightly coupled to the operating system, and to the glibc version used by the operating system. Go avoids this by statically compiling everything, but that comes with its own mess of security problems. (e.g. When Heartbleed came out, the only update needed was to the OpenSSL shared library. If a similar issue occurred in statically compiled code, every single executable that used the library would need to be updated.) So in many cases, in order to support an OS, you need to support the OS-provided compiler version [1].
As an example: physics labs, because that's where I have some experience. Labs tend to be pretty conservative about OS upgrades, because nobody wants to hear that the expensive equipment can't be run because somebody changed the OS. So "Scientific Linux" is frequently used, based on RHEL, and used up until the tail end of the life-cycle. RHEL6 was in production use until Dec. 2020, and is still in extended support. It provides gcc 4.4, which was released in 2009. Now, gcc 4.4 did support some parts of early drafts of C++11 (optimistically known at the time as C++0x), but didn't have full support, due to lack of a time machine.
So when I was writing a library for use in data analysis, I needed to know the language and stdlib feature support in a compiler released a decade earlier, and typically stay within the features of the standard from almost two decades earlier.
[1] You can have non-OS compilers, but then you may need to recompile all of your dependencies rather than using the package manager's versions, keep track of separate glibc versions using RPATH or LD_LIBRARY_PATH, and make sure to distribute those alongside your library. It's not hard for a single program, but it's a big step to ask users of a library to take.
Disagree with this. Most middle managers wouldn't understand the difference. One canonical version is enough, any more and there's just confusion, not enlightenment.
>It's useful for libraries / dependencies where you are dealing with many different libraries and just want to know you can upgrade without having to deal with breaking changes. For a language? Silliness.
A language update comes with the most fundamental set of libraries and APIs: the standard library (doubly so in Golang, which has a lot of batteries included).
It also potentially affects the behavior (if there are breaking changes) of all other third party libs.
The "silliness" part is a non sequitur from what proceeded it (and the following arguments don't justify it either).
>Your version is not really telling you the main things you care about.
The main thing (nay, only thing) I care about (for my existing code) from a language update is whether there were breaking changes.
I couldn't care less whether a big non-breaking feature is reflected in the version number.
I can read about it and adopt it (or not) whether there's an accompanying big version-number change or not.
>It's much much more useful to the users to say, 2.0 introduced generics, it's distinct.
That's quite irrelevant, isn't it?
It's not useful to users that follow the language (page, forums, blogs, etc.) and would already know which release introduced generics.
And it's also not useful to new users that get started with generics from day one of their Go use either.
So who would it be useful to?
Such a use would make the version number the equivalent of a "we got big new feature for you" blog post.
> The major version should represent major language changes, not whether it's a breaking change or not
Why?
Old code still works, and unless you are purposefully maintaining an old system you are expected to use the latest version anyway. What does it actually change that generics were introduced in version 1.18 rather than 2.0? From now on, Go has generics. As there is no breaking change, it's not like you had to keep using the previous version to opt out.
To play devil's advocate: many people are forced by circumstances beyond their control to use various old versions, or provide libraries and want to support people who are forced to use various old versions.
They're not using 2.0 because 2.0 might have breaking changes, and they don't want to burn the version number for something that doesn't break the Go 1.0 compatibility promise. Makes sense.
I thank the Go team that the major version was not increased. You can't imagine how much wider the adoption of generics will be compared to what it would have been if Go had gone to version 2.0. There are thousands of under-educated and overly-cautious software development managers who would prevent their teams from upgrading to a major version of Go until it is "proven".
When it comes to developers, there are two types: those who read changelogs, know their tools well, and take advantage of every small change in each minor version; and those who are there for the money and will find out about a new feature only if a manager instructs them to use it.
> There are thousands of under-educated and overly-cautious software development managers who would prevent their teams from upgrading to a major version of Go until it is "proven".
If semantic versioning is used correctly, like here, that's actually a reasonable-ish attitude.
It makes no sense to replace a meaningful and helpful criterion - whether it breaks code or not - with some purely subjective assessment of what counts as a "major change". That just leads to the usual version creep, from "Go 2.2" to "Go 3.0", to "Go 4.0", to (inevitably) "Go 10", "Go 11", "Go 11 Pro", "Go 11 Ultimate Edition", ...
Wait, why wouldn’t it matter for a language? I want to know if I can compile my existing project with the new version… isn’t that an important thing to know?
Because in most languages, maintaining backwards compatibility is absolutely sacrosanct. It's not that you check the version number to know if your code will still compile. You check the name of the language to know if your code will still compile. (Yes, there are exceptions to this, but those tend to be cautionary tales. Python 3 is a better language than Python 2, but took a decade to gain adoption because it broke backwards compatibility.)
Since backwards compatibility is already a given for languages, you can then have the major version number indicate feature additions, rather than always being a constant value as semantic versioning would require.
I don't know if backwards compatibility is absolutely sacrosanct in most languages... for C/C++, sure... but I know Rust has broken backwards compatibility before, as has Python, as you mention, and Ruby, too. I don't think it is as sacrosanct as you think it is, especially for relatively new languages under heavy development.
Sounds like you think this is some kind of "Web 2.0" situation, but that was just a marketing term.
Languages are software; they are dependencies of other software (the only unavoidable dependency!) and as such should absolutely be versioned.
Versioning isn't for marketing or providing easy ways for users to remember when features were released. It's a tool for change management. Exciting features often come with breaking changes, but not vice versa.
Not really, from my perspective. I want to know if there is any reason not to upgrade, due to my existing code base breaking. Extra new features I could use are not such a reason. New reserved words are such a reason. If someone is using a new feature, I can just say "make sure you have the newest version" - and also, here's my reason for keeping close to the edge of current versions: you amortize the labor of keeping up to date, rather than pinning and then having some big migration project in 5 years.
It is pretty subjective, but not entirely so, and I would guess there's a pretty clear consensus around whether this is a big deal or not in the community?
I agree. It's a bit odd that I just learned generics were added. I expect minor bumps in language versions to be uneventful. Additionally, I expect a language to essentially never introduce breaking changes, so semantic versioning isn't really telling me anything.
I get your point, but I kind of like it. It tells me some really important info, and they are hardly alone. Python 3.x was a breaking change; until then, they stayed on 2.x versioning for a long time. And we will never see a Python 4.x.
In my opinion, significant additive changes are addressed more poorly than any other aspect of semver, regardless of the project scope.
Additive changes can become breaking changes quite easily: as those additions are adopted within a minor version range, as automated tooling needs to distinguish their presence, and as documentation fragments.
My next biggest gripe with semver—that 0.y.z has entirely different semantics from any other major version—may actually be semantically better if adopted wholesale. If your interface changes, major version bump. Else you’re fixing bugs or otherwise striving to meet extant expectations.
> Sure, but semantic versioning really is the wrong kind of versioning to use for a language. The major version should represent major language changes, not whether it's a breaking change or not
Major language changes almost always imply breaking changes. Python 2 to 3, for example, was a set of major changes that broke everything from how modules were written and where they lived to various syntactic and fundamental behaviours as well.
I think we should go back to years (for most software, in fact, not just languages). Languages change slowly enough (or should) that, e.g. "Ada 2021" should be unambiguous enough. For language implementation versions, then we can add semantic numbers afterwards.
Ruby introduces new language features all the time in minor versions while maintaining backward compatibility. I can pretty much upgrade a minor Ruby version to get access to new language features without worrying about breaking anything.
I would definitely consider a language to be an API. Using semantic versioning makes perfect sense, because it communicates to consumers of the language which versions will require them to change their source code and which will not.
There are already effectively 2 ways to deal with language changes:
1. Min version in go.mod
2. Build tags choosing what to do on new/old versions of Go (the release tags are set automatically by the toolchain; you just add the constraints to your files) - see the sketch below.
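A rough sketch of the second mechanism (package and function names are made up): a Go 1.N toolchain automatically satisfies the build tags go1.1 through go1.N, so a library can ship two files and let the toolchain pick one.

```go
// max_go118.go - compiled only by Go 1.18 and newer toolchains.
//go:build go1.18

package mylib

// Generic version, for toolchains that support type parameters.
func Max[T int | float64](a, b T) T {
	if a > b {
		return a
	}
	return b
}
```

```go
// max_old.go - the fallback that pre-1.18 toolchains compile instead.
//go:build !go1.18

package mylib

// Pre-generics version with the same name; exactly one of the two
// files is included in any given build, so there is no conflict.
func Max(a, b int) int {
	if a > b {
		return a
	}
	return b
}
```

(Toolchains older than Go 1.17 don't understand the //go:build form and need the legacy // +build lines as well, so in practice you'd include both.)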
Semantic versioning solves one specific problem that's worth solving - whether you can (expect to) automatically upgrade. That is a problem people have with languages just as much as libraries, and it is a problem that affects both big changes and small just as much as it affects libraries. It is not the only way to solve this problem, but that problem very much needs to be addressed.
When a language adds any features, if your dependencies (whether real library dependencies or just things you're copying from Stack Overflow) start using the new features, you must upgrade to the new language version. That is an inherent usability constraint, and every time a language designer chooses to add a feature, they're making a tradeoff. But if upgrading to the new language version is trivial, then it's generally a worthwhile tradeoff.
For instance, suppose I find some code that uses Python's removeprefix() method on strings. I need to use Python 3.9 or newer to use that code. It doesn't matter that this is a very small feature.
However, I can generally expect to upgrade my Python 3.8 code to Python 3.9 without trouble. It's different from, say, code that uses Unicode strings. For that code, I need to upgrade from Python 2 to Python 3, which I can expect to cause me trouble. The version numbers communicate that. It's true that Python 3 was a "big" change - but "big" isn't really the point. The point is that I can't use Python 2 code directly with Python 3 code, but I can use Python 3.8 code directly with Python 3.9 code. There are plenty of "big" changes happening within the Python 3 series, such as async support, that were made available in a backwards-compatible manner.
As it happens, Python does not use semantic versioning. But they have a deprecation policy which requires issuing warnings for two minor releases: https://www.python.org/dev/peps/pep-0387/ It's technically possible, I think, that a change like Unicode strings could happen within the Python 3.x series, but that's okay, provided they follow the documented versioning policy. This policy addresses the same question that semantic versioning does, but it provides a different answer: you can always upgrade to one or two minor versions newer, but at that point you must stop and address deprecation warnings before upgrading further.
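The same reasoning carries over to Go directly. For instance (my example, not anything from the release notes): strings.Cut was added to the standard library in Go 1.18, so even this trivial program quietly raises the minimum toolchain version.

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// strings.Cut exists only since Go 1.18; building this with
	// Go 1.17 or older fails with an "undefined" error, even
	// though the feature itself is tiny.
	user, domain, found := strings.Cut("user@example.com", "@")
	fmt.Println(user, domain, found) // user example.com true
}
```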
You are, of course, free to also have a marketing version of your project to communicate how big and exciting the changes are. Windows is a great example here: Windows 95 was 4.0 (communicating both backwards incompatibility with 3.1 and major changes) and Windows 7 was 6.1 (communicating backwards compatibility with Vista but still major changes).
Windows (and VS) version numbers are a pain in the ass, though. You constantly need to check that the version you need to put in some config file actually corresponds to the version you really mean.
I haven't seen anyone do a good job of it, but it seems to me that if you wanted to try, the way to do it would be to make the marketing versions and compatibility versions so different that they can't be confused (like Windows 95 or better yet XP, not Windows 7) - and then make sure that your configuration files can accept marketing versions and silently transform them to compatibility versions.
I have sympathy for your sentiment, but you don't have to come up with a proper definition nor get everyone to agree on that definition.
The Go people can just make up reasonable version numbers without having an all encompassing theory with definitions, and they only have to convince themselves, not everyone on earth.
> The Go people can just make up reasonable version numbers without having an all encompassing theory with definitions
but "breaking change" IS the criteria for reasonable version numbers that they have chosen.
"breaking change" is easily tested and well defined.
"big change" is as far from well defined as you can get, because "big" is unquantifiable and subject to judgement and interpretation; i.e. a poor candidate for drawing boundaries.