I have always wondered why assembler is written the first way
MOVE src, dst
ADD src, dst
rather than the far more intuitive (and slightly more compact) second, something like
dst := src
dst += src
This also completely eliminates questions about which direction the data moves: is 'mov a, b' a := b or b := a, for example?
I can't see any reason for not using the established C-type notation, so why is the original style always perpetuated?
I'm aware that C approximately maps onto the original PDP ISA and has been called a high-level assembler, true or not that's irrelevant, but why the higher-level syntax has never made its way to lower level ASM has baffled me.
> why the higher-level syntax has never made its way to lower level ASM has baffled me
Traditionally, assembler was used both for bootstrap processes and on older, heavily resource-constrained systems, so there was a lot of emphasis on making it as simple to parse as possible: opcode first, then arguments, because the opcode comes first in the byte stream of almost all variable-length-instruction systems.
And of course no support for complex expressions, so no point in building a full mathematic expression parser.
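As a concrete illustration of how little machinery opcode-first syntax needs, here's a toy sketch (in Python; the function and its behavior are mine, not any real assembler's) — one split per line, no precedence rules, no expression grammar:

```python
def parse_line(line: str):
    """Parse 'OPCODE arg1, arg2' — one split, no expression grammar needed."""
    line = line.split(";")[0].strip()   # drop comment and surrounding whitespace
    if not line:
        return None                     # blank or comment-only line
    mnemonic, _, rest = line.partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    return mnemonic.upper(), operands

print(parse_line("add r1, r2   ; r2 += r1"))   # ('ADD', ['r1', 'r2'])
```

An infix syntax like `dst += src` needs at least operator tables and lookahead; this needs neither, which mattered when the assembler itself had to fit in a few KB.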
We don't have such resource-constrained systems now, and I doubt parsing asm ever took up many resources anyway.
I'm not suggesting we permit complex (i.e. general opcode-combining) expressions. I deliberately never suggested it.
As for straightforward FMAC-type instructions, that's even clearer in my notation: a += b * c
As for the proliferation of those add types you linked to: OK, possibly valid, but how much of this are you going to be writing by hand compared to the very mundane non-SIMD, non-packed instructions? My guess is very little, as you have relatively few such instructions (although they do a great deal of work over streams of data, but that's not relevant).
Oddly, I think it's the weird instructions that people are going to write more often: hardly anyone writes assembler compared to the far more likely use case of reading disassembly, and when people do write it, it's usually specifically to do something that's difficult or impossible to get a high-level language to emit.
I'll be damned, it still exists, in all its glory: http://terse.com/
> TERSE represents a whole new concept in low-level programming and is the first real advance in assembly language programming since the invention of the Macro Assembler.
> TERSE is an x86 specific programming language compatible with the entire processor family from the 8088 through the Pentium 4 and beyond. It is a machine-level language that gives you all of the control available in assembly language with the ease-of-use and the look-and-feel of a high-level language like C.
> TERSE is a very mature language. Conceived in 1986, implemented in 1987, proven in real world applications for over a decade, and used by Fortune 250 corporations, universities, and programmers on six continents since 1996! TERSE has virtually replaced assembly language for time and/or space critical applications in embedded x86 based and PC applications.
Good ol' Terse. Terse & HLA [0] always tended too far toward the ASM side, which made writing them feel more like messy ASM. I wanted something more like C. Then I realized I already had working C.
I like to say that AT&T syntax follows natural English flow and the byte order of the finally assembled binary code, while Intel followed a more mathematically oriented syntax. It might be a bit of baby duck syndrome, since I learned math and Intel first, but Intel always seemed more intuitive to me as well.
However, ISAs developed later that are primarily MIPS-like follow the Intel convention and extend it, because it aligns better with 3AC [0] and SSA [1] based optimizations.
It's very useful to have the actual instruction mnemonics spelled out, instead of relying on an implicit (often overloaded) definition of "+". E.g. there might be a 8-bit, 16-bit, 32-bit integer add, a 32-bit, 64-bit floating point add, a 16x8 vector integer add, etc.
So an annotated operator for a long type? Or have a non-annotated opcode for the native machine word size (just '+') and annotated ones for anything smaller (edit: anything other) - '+B' for byte addition.
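A quick sketch of what that annotation scheme could look like (Python; the operator table and mnemonic names are hypothetical, just to show the idea that a bare '+' defaults to the native word size):

```python
# Hypothetical mapping from annotated operators to mnemonics:
# bare '+' is the native machine word, suffixes pick other widths.
OP_TABLE = {
    "+":  "ADD",    # native machine word size
    "+B": "ADDB",   # byte
    "+W": "ADDW",   # 16-bit word
}

def lower(dst: str, op: str, src: str) -> str:
    """Translate 'dst <op>= src' notation to a src-first mnemonic form."""
    return f"{OP_TABLE[op]} {src}, {dst}"

print(lower("r1", "+B", "r2"))   # ADDB r2, r1
print(lower("r1", "+", "r2"))    # ADD r2, r1
```

The width ambiguity the parent raises doesn't go away, but it moves into a small, explicit suffix rather than a whole separate mnemonic.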
An alternative, and perhaps much better, higher-level solution is typed ASM. I believe there is work in this area (a quick search turns up http://www.cs.cornell.edu/talc/), which would allow the type to be implicit but checkably safe, except for the occasional explicitly needed coercion.
Randall Hyde created a higher-level assembler back in the '90s. It supported writing C-like control statements, etc. It doesn't look like it has been updated to x86-64; it's still just 32-bit.
Very recently I picked up an FPGA dev board and started playing with implementing my first toy soft-CPU. For fun, and definitely not for profit, I decided to design my own ISA for it.
I decided to see if I could make a RISC-y MISC (minimal instruction set computer) design. From what I could see a lot of MISC-based computers had quite complex instructions.
While my resulting ISA is likely quite crap, being my very first ISA ever, it's been quite a fun exercise so far. I programmed quite a lot of asm back in the day, but thinking about which instructions are needed and why was something else.
Oh yeah, I also forgot to add that once you have some hardware you'll most likely want a cheap logic analyzer that supports sigrok[1], like this[2] one. Some LEDs, a breadboard, etc. are useful too.
I've so far been quite happy with the iCE40UP5k[1] based dev kit I got, though there are a lot of options out there[2].
The iCE40 FPGAs are a bit wimpy compared to Altera and Xilinx offerings from what I understand, but I really liked the idea of an open-source toolchain[3][4] being available.
To get a taste without committing cash you could just use a simulator[5], which I imagine you'd be using a fair bit anyway as it allows you easier access to the internal state.
As to actually programming, I've found the following resources useful. I started with Verilog mainly because code-gen tools like nMigen[6] generate Verilog, but for writing by hand it seems VHDL is preferred.
Anyway, links, first off some exercises to get going[7]. Introduction to Verilog[8], also has a nice general overview of HDL. Details of how non-blocking vs blocking statements in Verilog works[9], quite specific but was very informative for me.
There's also quite a lot of activity over at Reddit[10], and good experiences over at the EEVBlog forums[11].
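For anyone skimming, the classic illustration of blocking vs. non-blocking is a two-register swap (a Verilog sketch; clk, a, and b are assumed declared elsewhere). Non-blocking assignments sample both right-hand sides at the clock edge before either register updates, so the swap works; blocking assignments clobber the value the second statement needs:

```verilog
// Non-blocking (<=): both RHS values sampled at the edge, swap works.
always @(posedge clk) begin
  a <= b;
  b <= a;
end

// Blocking (=): a is overwritten first, so both registers
// end up holding b's old value -- no swap.
always @(posedge clk) begin
  a = b;
  b = a;
end
```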
Additionally, I'd like to clarify that nMigen generates Verilog indirectly; it actually generates RTLIL, which is the intermediate representation of Yosys, and then Yosys turns it into Verilog after some cleanup passes.
I'll happily admit to being biased, but nMigen is so much easier for me to work in than Verilog ever was.
Ah, I saw the redirect but figured it was better to use the plain URL. Sadly too late for an edit now.
Hadn't quite gotten the connection with Yosys right yet, as I haven't yet started with nMigen, but yeah I definitely want to go there. But like I said, I prefer getting a good handle on Verilog first so I know what to look for when things go wrong.
I was inspired to all of this by the YouTube video series by Robert Baruch[1], in which he recreates the 6800 CPU on an FPGA using nMigen with formal verification.
Verilog for writing RTL is fine, especially if you use the synthesizable subset of SystemVerilog. There used to be a bit of a religious war between VHDL and Verilog, as VHDL had some syntax that prevented certain errors, but with SV and some basic coding guidelines it's fine. While I'm sure some will still be hanging on to VHDL, I'd say most of the industry is going the SV way.
SystemVerilog actually improves things a lot, but the problem is that there is (currently) no freely available, robust synthesis frontend that supports the majority of the useful SV features (e.g. interfaces). In fact I don't think there are any robust, ~complete FOSS SystemVerilog simulators either -- though Icarus and Verilator support SV to varying degrees...
So if you want to stick with FOSS tools, then you're stuck with synthesizable Verilog-2005 at best, for the moment. And standard Verilog very much sucks in a lot of ways, I would argue, synthesizable subset or not. It's an understandable choice though, in a field full of awful options. One day I'm hopeful Yosys will support most of the necessary SystemVerilog features people want for synthesis... (Then, it can also serve as an effective SystemVerilog -> Verilog translation tool, which would be very useful on its own.)
Also it's very important to keep in mind that HDLs describe hardware, the HDL code is not executed on the hardware. Think of it as writing C++ templates.
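For example (a Verilog sketch, module and signal names mine): a generate loop doesn't "run" at all — it elaborates into eight parallel XOR gates, much like a C++ template expands at compile time rather than executing at run time:

```verilog
module xor8 (input  [7:0] a, b,
             output [7:0] y);
  genvar i;
  generate
    for (i = 0; i < 8; i = i + 1) begin : g
      assign y[i] = a[i] ^ b[i];   // all eight gates exist simultaneously
    end
  endgenerate
endmodule
```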
I really like conditional subroutine calls. I wish x86 had them. They make it really easy to inline the fast path of some safety check (e.g. null check, bounds check, write barrier, etc.) and have the slow path factored out to a common place. What PL implementations like the JVM and JavaScript engines typically do without this is insert a conditional branch to "deferred code" at the end of the function (statically predicted not-taken), but that deferred code can't be shared, because it needs to branch back to the mainline code. That costs code size. A conditional subroutine call is exactly the right mechanism to solve this!
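To make the contrast concrete (a hand-written sketch; label and function names are mine): on x86 the deferred slow path needs its own jump back, so each call site needs its own copy, whereas 32-bit ARM's conditional branch-and-link (BLEQ is a real encoding) folds the whole pattern into one instruction and returns to the next instruction automatically:

```asm
; x86: branch out to deferred code, which must branch back
        test  rdi, rdi
        jz    .slow            ; statically predicted not-taken
.resume:
        ; ... fast path continues ...
.slow:
        call  null_check_fail
        jmp   .resume          ; return point baked in -> not shareable

; 32-bit ARM: conditional branch-and-link
        cmp   r0, #0
        bleq  null_check_fail  ; called only if r0 == 0; slow path shareable
```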
> 16 registers divided into two areas: R0 to R7 are in fact a window to a register bank containing 256 times 8 registers while R8 to R15 are fixed. This architecture makes subroutine calls and saving registers very easy (just increment/decrement the register bank pointer which is part of the status register). All in all QNICE features 256 * 8 + 8 = 2056 registers.
This is indeed nice and the kind of thing whose availability can affect the design of low-level languages, e.g. non-ISO local variables in a Forth dialect.
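A toy Python model of the windowing scheme from the quote (my own sketch, not from the QNICE docs): R0-R7 index into the current bank, R8-R15 are fixed, and a call/return is just a bank-pointer bump.

```python
class WindowedRegs:
    """Toy model: R0-R7 are a window into 256 banks of 8; R8-R15 are fixed."""
    def __init__(self):
        self.banks = [[0] * 8 for _ in range(256)]  # 256 * 8 windowed registers
        self.fixed = [0] * 8                        # R8-R15, shared by all windows
        self.bank = 0                               # bank pointer (in the status reg)

    def read(self, r):
        return self.banks[self.bank][r] if r < 8 else self.fixed[r - 8]

    def write(self, r, v):
        if r < 8:
            self.banks[self.bank][r] = v
        else:
            self.fixed[r - 8] = v

    def call(self): self.bank = (self.bank + 1) % 256  # fresh R0-R7, caller's preserved
    def ret(self):  self.bank = (self.bank - 1) % 256  # caller's R0-R7 restored

regs = WindowedRegs()
regs.write(0, 42)    # caller's R0
regs.call()          # subroutine gets a fresh window
regs.write(0, 7)     # does not clobber caller's R0
regs.ret()
print(regs.read(0))  # prints 42
```

No register saving on the stack at all — which is exactly the appeal, and also why window overflow (a 257th nested call here) needs special handling in real designs.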
I got it from a very reliable source, someone who was involved in evolving that hardware, that this feature caused Sun "an awful lot of pain" (his words, best I can recall).
My understanding is that it badly hampered the ability to do out-of-order execution, on top of the original designers' failure to understand what the compiler could do with inlining, which would largely negate the value of the feature. From what I've read over the years, the SPARC hardware guys didn't talk to the compiler guys - a trap the designers of the DEC Alpha very carefully did not fall into.
Since this is a teaching project, I'm sure that feature is fine, but just saying.
I heard it said by a guy at a conference. This guy: https://en.wikipedia.org/wiki/Ivan_Sutherland Sutherland was interested in the SPARC, which is surprising given that he's better known for graphics, but there you go.
As it happened, Steve Furber was there too. Irrelevant but bragging rights & all that.
Sutherland was something like a cofounder of Sun Labs, so I guess he was likely to develop some interest. I'm afraid I couldn't track down anything more specific, but I'll keep my eyes peeled. An interesting point I did find: the SPARC architecture was later extended so that register windows could be saved and restored other than via SUB calls, allowing instruction reordering and their use for context switching.
The Register window article doesn't talk about Sun's experience or the issue of reordering instructions.
I didn't know he was a Sun co-founder. I remember him saying that he had an interest in geometry rather than graphics, which related to chip design - but that link is to me a bit nebulous, and it was a long time ago anyway.
Sun would not advertise a horrifically expensive design mistake, so it's no surprise you can't find much. I've picked up a fair bit from random reading over the years, so I can't remember where much of it came from.
Perhaps email Mr. Sutherland and just ask? Worst that can happen is he doesn't respond.
(thanks for the bit about using other than SUB calls, I didn't know).
Not Sun but Sun Labs. Ivan's firm, Sutherland, Sproull and Associates, was bought by Sun in 1990 to become the seed of the new Sun Labs.
As for register windows: overflows and underflows generated traps to the operating system, and for the combination of SPARC version 8 and SunOS that meant thousands of clock cycles. That was improved in later products.
Berkeley's RISC I through IV all had register windows, but RISC-V doesn't, with the argument that we have far better compilers now. Altera's NIOS processor had register windows, which were dropped in NIOS II because doing so made the processor smaller without reducing performance too much.
The AMD 29000 had a more flexible register window scheme and the Itanium a very complex scheme.
Computer history is often more like a spiral than a line and old ideas that have become bad might be good again in the future. With out-of-order execution and register renaming you might once again get better performance out of a binary with register windows.
It doesn't surprise me that having more visible registers and doing a good job of register allocation in the compiler is a better strategy (i.e., both more efficient and more flexible), but doing so means you are not so close to the metal. From the point of view of an efficiency feature that doesn't ask for complexity in the compiler, I see some appeal to register windows.
It's not just that. If you have a huge internal register file and register renaming, along with a few other tricks, then it's at least as efficient (and potentially more efficient) to just push and pop the registers that actually get used.
Is 'efficient' what you want though? For embedded, yes. For maximum speed where you are willing to blow large power use, you can be much faster but it will cost you efficiency, badly (think Xeons). Without quantifying what you're trying to do, 'efficient' is only a metric not a target.
Have there ever been any attempts to systematically generate processor instruction set designs, evaluate them against 'real' code, and measure the results?
Edit: You'd need to generate compiler back ends for each design as well, which might be fun...