Resurrecting the SuperH architecture

rwmj · on July 1, 2015

RISC-V seems like a better bet (http://riscv.org/). It is a clean, patent-free modern architecture. It already has kernel support, and supposedly there will be both FPGAs and ASICs "soon" (http://www.lowrisc.org/). Plus you can run it under qemu: https://rwmj.wordpress.com/2015/06/11/booting-risc-v-linux-w...

unwind · on July 2, 2015

This:

There have been some minor additions, he said: the J2 adds four new instructions. One for atomic operations, one to work around the barrel shifter, "which did not work the way the compiler wanted it to [...]

Is so intriguing! Does anyone know what was wrong with the original barrel shifter design? I tried reading up on it but failed to find much reference material. I followed the link to the J-core community site to read the code, but it wasn't immediately browsable, just available for download.

I assume there were compilers for SuperH back in the day; didn't they use the shifter? Why not fix the compiler by teaching it the existing instruction, rather than adding an instruction just for this? How wrong can a shifter be, really? The questions just heap up.

TapamN · on July 2, 2015

Compilers did use the shifter. I don't know if this is exactly what he was referring to, but one oddity with the SH4's dynamic shift instruction is that it only shifts to the left (there are also a limited number of instructions that shift by a small constant amount: 1, 2, 8, or 16). To shift to the right, you have to first negate the shift amount, then perform a left shift. So if you did a right shift by a non-constant amount, you would always see a negation of the shift amount before the shift. My guess as to why it was implemented like this is that, since the SH4 had a fixed-length, 2-byte instruction set, running out of possible instructions for future expansion was a real hazard, and not encoding both directions was done to save space.
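To make the negate-then-shift sequence concrete, here is a rough C model of a dynamic logical shift of this kind. It is a simplified sketch written from my understanding of the published instruction descriptions, not taken from the J-core or Hitachi sources; the names `shld_model` and `shift_right_dynamic` are made up for illustration, and the handling of negative amounts is an assumption about how the sign of the amount selects the direction.

```c
#include <stdint.h>

/* Hypothetical model of a single dynamic logical shift instruction.
 * Assumption: a non-negative amount shifts left by its low 5 bits,
 * a negative amount shifts right by 32 minus its low 5 bits. */
uint32_t shld_model(uint32_t value, int32_t amount)
{
    uint32_t low5 = (uint32_t)amount & 0x1F;

    if (amount >= 0)
        return value << low5;      /* plain left shift */
    if (low5 == 0)
        return 0;                  /* negative, low bits zero: shift right by 32 */
    return value >> (32 - low5);   /* negative: logical right shift */
}

/* What a compiler has to emit for "value >> n" with a run-time n
 * (n in 0..32): negate the amount first, then use the one shift. */
uint32_t shift_right_dynamic(uint32_t value, uint32_t n)
{
    int32_t negated = -(int32_t)n; /* the extra negate instruction */
    return shld_model(value, negated);
}
```

In this model every variable right shift costs two operations, the negation and the shift, which is the pattern described above.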

On the original SH4 implementation, under certain conditions, there had to be one cycle between when a shift amount was generated and when it was used; otherwise there would be a one-cycle CPU stall. A real right shift would avoid the need to schedule around this stall. This isn't necessarily something that needs an extra instruction to fix; the implementation could be designed to not need the stall, but it might be difficult to work around. I don't do circuit design, but dynamic shift instructions typically look at as few bits of the shift amount as possible, to simplify and speed up the design of the shifter. The reason for the delay in the original SH4 is probably that it analyzes and tags each register with information for the correct shift direction and amount, and certain units won't have this information ready for the shifter in time, hence the stall if the shift is too close to the shift amount generation. (I've read that certain CPU implementations have done similar work in tagging whether a register is zero or not, in order to help keep branch-on-zero/not-zero instructions quick.) If the instruction talked about is a dedicated right shift, it could be defined in a way that doesn't need a negation and extra tagging, and would be much more compiler friendly, and faster.

unwind · on July 8, 2015

Thanks!

Does that mean that the shifter is actually capable of doing rotates? Otherwise the negation part doesn't make any sense.

If you have 0xf0 and want to shift it three bits to the right to get 0x1e, no amount of negated-amount left-shifting is going to do that unless the instruction is a rotate.

If, on the other hand, you can do an 8-bit rotate left of 8-3 = 5 bits, that would produce the same result and need that "negation" (which is actually an inversion).
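The 0xf0 example can be checked with a small C sketch. The helpers `rotl8` and `shr_via_rotl8` are hypothetical names used only for illustration; nothing here claims to describe what the SuperH shifter does internally. The sketch also shows that a rotate by itself matches a right shift only when the bits that wrap around are zero (as they are for 0xf0), so in general a mask is needed as well.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n bits (n is taken modulo 8). */
uint8_t rotl8(uint8_t x, unsigned n)
{
    n &= 7;
    return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
}

/* Logical right shift by n (1..7) built from a left rotate by 8 - n.
 * The mask clears the low-order bits that wrapped around to the top. */
uint8_t shr_via_rotl8(uint8_t x, unsigned n)
{
    uint8_t rotated = rotl8(x, 8 - n);
    uint8_t mask = (uint8_t)(0xFF >> n);
    return (uint8_t)(rotated & mask);
}
```

With these definitions, `rotl8(0xf0, 5)` gives 0x1e, the same as `0xf0 >> 3`, while a value such as 0xf1 needs the mask to get the shifted result.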

__david__ · on July 1, 2015

We used an SH2 for the main processor of a DDS (DAT) tape drive at a company I worked at. We had a prototype that used an SH3 and I remember spending a few days hacking Linux to boot on our hardware (I think we made it to user space and then the project petered out).

GCC has supported the SH series since at least the 2.7 era (though Hitachi's compiler seemed to produce better code in those days, but only ran under DOS).

kjs3 · on July 1, 2015

I do like SuperH. Especially the later ones. The SH4 had the most delightfully odd, fully pipelined 4x4 matrix X 4x1 vector instructions. If you could fit your problem in that box, you could get some remarkable speed for the clock.
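For anyone unfamiliar with the operation, this is the arithmetic such an instruction performs, written as plain scalar C. It is only a reference sketch of the math: the function name `mat4_mul_vec4` and the row-major layout are assumptions made for the example, and it says nothing about how the SH4 stores the matrix in registers or pipelines the work.

```c
/* out = M * v for a 4x4 matrix and a 4-element vector.
 * Assumption for this sketch: m is stored row-major, so the
 * element at row i, column j is m[4 * i + j]. */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        float acc = 0.0f;
        for (int j = 0; j < 4; j++)
            acc += m[4 * i + j] * v[j];   /* 16 multiplies, 12 adds in total */
        out[i] = acc;
    }
}
```

Problems that "fit in that box" are things like 3D vertex transforms, where every point is multiplied by the same 4x4 matrix.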

spydum · on July 1, 2015

mm superH. reminds me of the old HP Jornadas.. they also ran on an SH3 processor (well, some did). back in the day these were mindblowing to me.. a real pocket-sized PC, with a modem no less!

hoggle · on July 1, 2015

I always had high admiration for the SuperH architecture and now to read about its potential to fuel the much-needed open hardware movement is fantastic news.

nickpsecurity · on July 1, 2015

Good to see them doing it. I included SuperH in my list [1] of non-Intel architectures and old hardware to use post-Snowden. Additionally, I proposed that it falling out of favor despite Japanese chip-makers backing it might make it a nice candidate for trying to get them to open the design but still sell it. Concept is a proven design which can be verified by third parties, masked by whoever, and taped out at fab of their choice. Although, there's work in moving to new nodes and that cost would be on whoever did it. The precedent is Gaisler's SPARC-based processors and I.P. [2] that are dual-licensed as commercial and GPL with tools for easy customization.

Alternatively, I proposed the security enhancements for processors showing up in academia be applied to this or another processor with low market share as a differentiator. Some of these enhancements take almost no chip real-estate, esp simple tags & tag-checks. The chip designer could also make money for the semi-custom work. Time has passed, that didn't happen for low market chips, and did happen for AMD+Intel for non-security applications [that I know of]. Matter of fact, even though my scheme didn't happen, AMD is making so much money off the other half of my proposal that they could be cited when trying to convince chip-makers to do it for security enhancements with mass-market availability. So long as they don't bear the cost of failure (huge in ASIC's) they might go for it.

Finally, anyone wanting to deploy this or other things, remember the Structured ASIC's with FPGA conversions. eASIC [3] has a long track record in this with offers down to 28nm. Gigoptix [4] does S-ASIC's down to 28nm. Tekmos [5] offers a similar product at 350nm (good for budget masks). Just make sure you design in FPGA's with ASIC transition in mind from the start & follow published advice on that (available with Google or consultation). The result is you prove it in FPGA's, even use it on FPGA boards, and then move it to S-ASIC later for reduced costs/power + maybe speed increase. Authors are right that 180nm is a sweet spot although proving it at 350nm or higher first might be smarter given costs.

[1] https://www.schneier.com/blog/archives/2013/09/surreptitious...

[2] http://www.gaisler.com/

[3] http://www.easic.com/products/28-nm-easic-nextreme-3/

[4] http://www.gigoptix.com/products/asics/asic-type/structured-...

[5] http://www.tekmos.com/products/asics/process-technologies

nickpsecurity · on July 1, 2015

EDIT to add: Triad Semiconductor [1] has a mixed-signal ASIC take on S-ASIC's. Interesting stuff. Just found it.

[1] http://www.triadsemi.com/vca-technology/

listic · on July 1, 2015

Has the Xilinx bitstream format been reverse-engineered and reimplemented in open source? I haven't heard about it.

bri3d · on July 1, 2015

I don't think so - at least according to their site, the J2 build chain uses Xilinx ISE: http://0pf.org/j-core.html . The only fully open FPGA toolchain I'm aware of targets Lattice/SiliconBLUE iCE40: https://github.com/cseed/arachne-pnr

cbd1984 · on July 1, 2015

From the comments there, early MIPS architectures are also patent-expired at this point.

They might have had more actual work done with them back in the old days, so their code might be in better shape now.

kevin_thibedeau · on July 1, 2015

The key point is that this is the first widely deployed 32-bit RISC platform with a 16-bit instruction set to come off patent. That has advantages for the embedded applications being targeted in this case. You won't get that with a MIPS or ARM clone because MIPS16 and Thumb are still under patent.

thrownaway2424 · on July 1, 2015

"Resurrecting" something that's not actually dead. SuperH is still a commercially-used CPU that you can buy off the shelf, as the CPUs themselves or inside many devices.