
Python 3 was a disaster and enterprises were still undertaking pointless 2->3 upgrade projects 10 years later




A month ago I had to fix a small bug in Python 2.6 code in one of our internal systems. It will never be migrated: no capacity and no value.

It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

Organizations struggled with it but they struggle with basically every breaking change. I was on the tooling team that helped an organization handle the transition of about 5 million lines of data science code from Python 2.7 to 3.2. We also had to handle other breaking changes like Airflow upgrades, Spark 2->3, Intel->AMD->Graviton.

At that scale all those changes are a big deal. Heck even the pickle protocol change in Python 3.8 was a big deal for us. I wouldn't characterize the python 2->3 transition as a significantly bigger deal than some of the others. In many ways it was easier because so much hay was made about it there was a lot of knowledge and tooling.
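To illustrate why a default pickle protocol change can bite at scale, here is a minimal sketch. The `payload` data and the `pickle_protocol` helper are made up for the example, and the helper is simplified: it only recognizes the header that protocol 2 and later write. The idea is that a writer which pins `protocol=` explicitly produces the same format no matter which interpreter version runs it, while a writer relying on the default does not.

```python
import pickle

# Hypothetical payload, standing in for whatever the pipeline serializes.
payload = {"model": "example", "weights": [0.1, 0.2]}

# Relies on the interpreter's default protocol, which has changed between
# Python releases (pickle.DEFAULT_PROTOCOL reports the current one).
default_blob = pickle.dumps(payload)

# Pins the protocol, so older readers can still load the result.
pinned_blob = pickle.dumps(payload, protocol=2)


def pickle_protocol(blob: bytes) -> int:
    """Return the protocol version recorded in a pickle's header.

    Simplification: protocols 0 and 1 write no header, so anything
    without the PROTO opcode (0x80) is reported as 0.
    """
    if blob[:1] == b"\x80":
        return blob[1]
    return 0


print(pickle_protocol(default_blob), pickle_protocol(pinned_blob))
```

Pinning trades newer-protocol features for compatibility with older readers, which may or may not be the right call for a given pipeline.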


> It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

They should've just used Python 2's strings as UTF-8. No need to break every existing program, just deprecate and discourage the old Python Unicode type. The new Unicode type (Python 3's string) is a complicated mess, and anyone who thinks it is simple and clean isn't aware of what's going on under the hood.

Having your strings be a simple array of bytes, which might be UTF-8 or WTF-8, seems to be working out pretty well for Go.


I can't say I've ever thought "wow I wish I had to use Go's unicode approach". The bytes/str split is the cleanest approach of any runtime I've seen.

With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Imagine if the same interpreter supported both Python 3 and Python 2. Python 3 code could import a Python 2 module, or vice versa. Codebases could migrate somewhat more incrementally. Python 2 code's idea of a "string" would be bytes, and python 3's idea of a "string" would be unicode, but both can speak the other's language, they just have different names for things, so you can migrate.


That split between bytes and unicode made better code. Bytes are what you get from the network. Is it a PNG? A paragraph of text? Who knows! But in Python 2, you treated them both as the same thing: a series of bytes.

Being more or less forced to decode that series into a string of text where appropriate made a huge number of bugs vanish. Oops, forgot to run `value=incoming_data.decode()` before passing incoming data to a function that expects a string, not a series of bytes? Boom! Thing is, it was always broken, but now it's visibly broken. And there was no more having to remember if you'd already .decode()d a value or whether you still needed to, because the end result isn't the same datatype anymore. It was so annoying to have an internal function in a webserver, and the old sloppiness meant that sometimes you were calling it with decoded strings and sometimes the raw bytes coming in over the wire, so sometimes it processed non-ASCII characters incorrectly, and if you tried to fix it by making it decode passed-in values, it started breaking previously-working callers. Ugh, what a mess!

I hated the schism for about the first month because it broke a lot of my old, crappy code. Well, it didn't actually. It just forced me to be aware of my old, crappy code, and do the hard, non-automatable work of actually fixing it. The end result was far better than what I'd started with.
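A small sketch of the "visibly broken" behaviour described above, under Python 3. The `greet` function and the sample name are invented for illustration; only `incoming_data.decode()` comes from the comment itself.

```python
def greet(name: str) -> str:
    # Hypothetical internal function that expects decoded text.
    return "Hello, " + name


# Bytes as they might arrive over the wire (UTF-8 encoded "Zoë").
incoming_data = "Zo\u00eb".encode("utf-8")

error = None
try:
    # Forgetting to decode fails loudly in Python 3 instead of
    # silently mangling non-ASCII characters later on.
    greet(incoming_data)
except TypeError as exc:
    error = str(exc)

# Decoding once at the boundary gives a different type (str), so there is
# no need to remember whether a value was already decoded.
value = incoming_data.decode("utf-8")
greeting = greet(value)

print(error)
print(greeting, len(incoming_data), len(value))
```

The byte length and the character length differ here, which is the kind of mismatch that used to surface only for non-ASCII input.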


That distinction is indeed critical, and I'm not suggesting removing that distinction. My point is that you could give all those types names, and manage the transition by having Python 3 change the defaults (e.g. that a string is unicode).

I'm a little confused. That's basically what Python 3 did, right? In py2, "foo" is a string of bytes, and u"foo" is Unicode. In py3, both are Unicode, and bytes() is a string of bytes.
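For reference, a quick sketch of how those literals behave on the Python 3 side (3.3 or later, where the `u` prefix is accepted again as a no-op). The variable names are only for illustration; the Python 2 behaviour is as described in the comment and is not exercised here.

```python
text = "foo"        # str: Unicode text
also_text = u"foo"  # same type; the u prefix changes nothing in Python 3
raw = b"foo"        # bytes: a sequence of integers in range(256)

print(type(text), type(also_text), type(raw))

# The two text literals are interchangeable.
same_text = text == also_text

# bytes and str never compare equal; conversion has to be explicit.
mixed_equal = text == raw
round_trip = raw.decode("ascii") == text

# Indexing bytes yields an int, indexing str yields a one-character str.
first_byte = raw[0]
first_char = text[0]
```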

The difference is that the two don't interoperate. You can't import a Python 3 module from Python 2 or vice versa; you have to use completely separate interpreters to run them.

I'm suggesting a model in which one interpreter runs both Python 2 and Python 3, and the underlying types are the same, so you can pass them between the two. You'd have to know that "foo" created in Python 2 is the equivalent of b"foo" created in Python 3, but that's easy enough to deal with.


OK, who would suggest this when the community could take a modicum of responsibility?

> With the benefit of hindsight, though, Python 3 could have been done as a non-breaking upgrade.

Not without enormous and unnecessary pain.


It would absolutely have been harder. But the pain of going that path might potentially have been less than the pain of the Python 2 to Python 3 transition. Or, possibly, it wouldn't have been; I'm not claiming the tradeoff is obvious even in hindsight here.

I think you have causation reversed: it would have been at least two orders of magnitude greater to act like moving to python 3 was harder than staying. But you do you boo :emoji-kissey-face:

Pain on whose part? There was certainly pain porting all the code that had to be ported to Python 3 so that the Python developers could have an easier time.

Yes, exactly. customers need to stop acting like a bitch if they wanna be taken seriously

It was not a disaster in any way. People just complained about having to do something to upgrade their codebases.

Except that Python took the other path when migrating from Python 1 to Python 2 and ... guess what? That was a "disaster" too.

The only difference was that by the time of Python 3, Python programs were orders of magnitude bigger so the pain was that much worse.


Differences of scale do make a qualitative difference and must be considered when doing a migration.


