I can make things run more than 10 times faster in assembly. My main job is as a manager/entrepreneur, but thanks to my experience I can read and write assembly, and I can easily help guide other people.
In the real world, 10 times faster is nothing. You should spend the time understanding the problem mathematically and, VERY IMPORTANT, documenting your work using images, text, voice and video.
This way you can make things go 100, 1000, 10000 times faster, because most algorithms can be indexed or ordered in some way that makes them extremely fast: doing logarithmic operations instead of quadratic, cubic, or even fourth- or fifth-power ones (which is what you get when you handle several dimensions, like 3D plus time, video analysis or medical tomography).
More important than that, 10 years from now it will keep working on new devices and OSs, and it will be something that supports the company instead of a debt burden because the original developer is gone (or because you don't have the slightest idea what you did so long ago and never documented it).
The main problem is that people are not aware that they forget things. And your brilliant idea that makes everything go 3 times faster is nuts if it makes everything much harder to understand, or if it could be forgotten even by you.
That's not good advice. Whether you need to use assembly depends on the particular situation at hand.
Here's a practical example: as a result of redesigning the algorithm to use fixed-point and implementing it in assembly, I got it to run 600x faster than the initial C version. Big O complexity was the same, the difference was in the constant factor. But the constant factor matters! In my case, it meant that you could get your computation done in half a day instead of a year.
Yes, it took me 3 weeks to get the algorithm implemented instead of a single day, but even so, it was definitely worth it. And in many cases even a 3-fold improvement in speed is important if you have long-running calculations.
Not knowing much about processor architecture, I don't understand how fixed point can be much faster, since floating-point operations are implemented in hardware. I presume you used integer operations on your fixed-point values, but could you explain a bit why that ends up being much faster than floating point?
It all depends on how precise your fixed point values need to be. If you can squeeze them into 8 bits (I could), you can use SSE 128-bit registers to operate on 16 values at a time. It gets even better with AVX, although that wasn't available to me at the time.
So the speedup is not just from going to fixed point, but from managing to use the vector instructions.
You're probably right that in most cases you should not write assembly code to try to make some code run faster. However, it is a very valuable skill to know how to read assembly code and spot the inefficiencies.
For a low-level hacker, it is a very valuable skill to be able to write and especially read assembler code. I need that skill regularly in my day job. And I would not have acquired that skill if I had not written some assembly code. And besides, writing assembly code is fun!
Sometimes you need that 10x speed improvement to be able to do what you need to: to get your game running smoothly or your video playback working. You need to know when and how to optimize for performance.
> More important than that, 10 years from now it will continue working in new devices or OSs
They said that about x86... 20 years ago. I have applications written in Asm that still work on the latest CPUs today. The same binaries, not even needing recompilation, now run several orders of magnitude faster. I still see a lot of potential in extracting performance from x86 and although I hesitate slightly to make this prediction, I think it'll be the dominant architecture for at least 10 more years.
> I think it'll be the dominant architecture for at least 10 more years.
That needs to be qualified as "for desktops/servers" or similar. x86 hasn't been the dominant architecture for at least a decade, if ever, in terms of units shipped. It's being outsold in number of units by ARM at a 10:1 ratio, and MIPS and PPC are shipped in higher volume as well, or at least were as of a year or two ago. Possibly even the 6502 and various microcontrollers, though numbers are harder to get.
Keep in mind how many CPUs are around you. Our servers have an ARM core per hard drive, and several of our RAID controllers have multiple PPC cores, for example. We have some servers with dozens of non-x86 CPUs per x86 CPU. Even some SD cards have ARM cores on them.
Now consider your car, microwave, washing machine, dishwasher, TV, set-top box, phones, camera, music player, digital radio. A lot of stuff that was semi-mechanical or used discrete logic a few years back now has CPUs that are ridiculous overkill, but they're used because they're so cheap there's no reason not to.
x86 is a diminishing niche if you look at electronics as a whole.
There are exceptions... things like color space conversions (though technically the last time I did that was in Cg), where a constant-factor speedup like that is about all you can get.