
Having implemented it in unoptimized Verilog (OK, I wrote a Verilog generator to generate it), I found it requires about 30% fewer LUTs on an FPGA than the Berkeley HardFloat implementation.


Is the variable-length regime handling really that much easier to deal with (space-wise) than the NaN and subnormal handling needed for IEEE floats? I'd think the regime scheme would effectively be equivalent to creating a multitude of different-width subnormal routes. Is it really the NaN handling that kills IEEE float performance?


It's basically a barrel shifter, and for addition you're going to need one anyway. Multiplication is a bit nastier, but most of a multiplier's gates are adder gates anyway. One useful insight I had: negative numbers are basically the same as positives, just with a "minus two" invisible bit.
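To make the "minus two" remark concrete, here is a minimal Python sketch, not the Verilog from the linked repo. It assumes the format under discussion is an 8-bit posit with no exponent field (es = 0), and the function names (`fields`, `decode_direct`, `decode_by_negation`) are made up for illustration. It decodes a negative bit pattern two ways: by taking the two's complement first and decoding the positive value, and by reading the fields directly with a hidden bit of -2 instead of +1, following the general form of the value formula in the 2022 Posit Standard.

```python
N = 8  # posit width in bits; es = 0 (no exponent field) is an assumption for this sketch


def fields(bits):
    """Split a raw N-bit pattern into (sign, regime, fraction) without negating it."""
    s = bits >> (N - 1)
    body = bits & ((1 << (N - 1)) - 1)   # the N-1 bits after the sign
    r0 = (body >> (N - 2)) & 1           # first regime bit
    k = 0                                # run length of bits equal to r0
    i = N - 2
    while i >= 0 and ((body >> i) & 1) == r0:
        k += 1
        i -= 1
    regime = k - 1 if r0 else -k
    nfrac = max(i, 0)                    # bits remaining below the terminating bit
    frac = (body & ((1 << nfrac) - 1)) / (1 << nfrac)
    return s, regime, frac


def decode_direct(bits):
    """Decode without negating: the hidden bit is +1 for positives, -2 for negatives."""
    if bits == 0:
        return 0.0
    if bits == 1 << (N - 1):
        return float("nan")              # NaR
    s, regime, frac = fields(bits)
    hidden = 1 - 3 * s                   # +1 when s == 0, -2 when s == 1
    scale = (1 - 2 * s) * (regime + s)
    return (hidden + frac) * 2.0 ** scale


def decode_by_negation(bits):
    """Conventional route: two's-complement negatives first, then decode the magnitude."""
    if bits == 0:
        return 0.0
    if bits == 1 << (N - 1):
        return float("nan")
    negative = bits >> (N - 1)
    mag = (-bits) & ((1 << N) - 1) if negative else bits
    _, regime, frac = fields(mag)
    value = (1 + frac) * 2.0 ** regime
    return -value if negative else value
```

For example, `0b10110000` is the two's complement of `0b01010000` (+1.5 under these assumptions), and `decode_direct(0b10110000)` gives -1.5 without ever negating the pattern. The run-length count in `fields` is the part that turns into a leading-bit counter plus barrel shifter in hardware.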

Here is a sample 8-bit multiplier. All of the code was generated using a Verilog DSL I wrote in Julia for this specific purpose. All of the Verilog is tested by transpiling it to C with Verilator and loading the resulting shared object into a Julia runtime to run against a Julia implementation.

https://github.com/interplanetary-robot/mullinengine/blob/ma...



