
If I have one wish for AMD, it's that they would make FPGAs (and Xilinx) a more open and diverse platform, like the PC. Not that the PC is perfect (there's still closed-source firmware), but any improvement over the current state of FPGAs would be welcome.


You've hit on the fundamental problem the FPGA industry has been trying to solve for 30+ years - how to get an FPGA into the hands of every developer, the way GPUs have spread to become essential tools.

Nobody has come up with a good answer yet. Developing for an FPGA still requires domain-specific knowledge, and because place & route (the "compile" step for an FPGA) is a couple of intertwined NP-hard problems, development cycles are necessarily long. Small designs might take an hour to compile; the largest designs deployed these days take around 24 hours.

All this to say: while they are neat, nobody has found the magic-bullet use case that makes everyone want one badly enough to put up with the pain of developing for them (à la machine learning for GPUs). Simultaneously, nobody has found the magic bullet that makes developing for them any easier, whether by reducing the knowledge required or by improving the tooling.

Effort has been made in places like High-Level Synthesis (HLS, compiling C/C++ code down to an FPGA), open-source tooling, and (everyone's favorite) simulation, but they all still kinda suck compared to developing software, or even the ecosystem that exists around GPUs these days. You'll often hear FPGA people saying stuff like "just simulate your design during development, compiling to hardware is just a last step to check everything works" - but simulation still takes a long time (large designs can take hours) and tracking down a bug in waveforms is akin to Neo learning to see the Matrix.
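
For a flavor of what HLS involves, here's a minimal sketch of a vector-add kernel in the style of Vitis HLS - the pragma spellings follow Vitis HLS conventions, and the function and argument names are just made up for illustration:

    // Minimal HLS-style kernel sketch: the tool synthesizes this C++ into RTL,
    // and the pragmas guide the hardware that gets generated.
    extern "C" void vadd(const int *a, const int *b, int *out, int n) {
    #pragma HLS INTERFACE m_axi port=a bundle=gmem
    #pragma HLS INTERFACE m_axi port=b bundle=gmem
    #pragma HLS INTERFACE m_axi port=out bundle=gmem
        for (int i = 0; i < n; ++i) {
    #pragma HLS PIPELINE II=1
            // II=1 asks for a pipeline that accepts a new iteration every clock.
            out[i] = a[i] + b[i];
        }
    }

Even a toy like this needs the right interface and pipelining annotations before the tools produce decent hardware, which is a big part of why HLS still isn't "just software".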


If the FPGA industry thinks it has been trying to do this for decades, then it has been going about it seriously wrong! Keeping your systems as black boxes, with unit prices and development costs that make them prohibitive for anything but high-margin devices, effectively guarantees they'll never become popular consumer commodities.

Given how open development works, the straightforward minimal investment is to publicly document some devices' bitstream formats and bootstrap the ecosystem by releasing some reliable libre place-and-route software. The software doesn't even have to contain all of the trade-secret heuristics; it just has to build and install the usual way (./configure && make && make install) and be functional enough that individual developers can scratch their own itches.


Why not ship an integrated FPGA in CPUs?

Being able to offload a repeated, complex MIMD computation to an FPGA treated like an instruction could be a huge win for scientific computing and for any large, steady workload that is expensive enough for companies to invest in optimizing for the FPGA. If this became commonplace and relatively inexpensive, then large corporations would likely fund compiler improvements that make the developer experience simpler and faster.
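
To make that concrete, here's a purely hypothetical sketch of what an "FPGA region as an instruction" model might look like from the software side. None of this corresponds to a real API - the FpgaKernel type, its methods, and the bitstream filename are all invented, with software stubs standing in so the sketch compiles and runs as plain C++:

    #include <cstdint>
    #include <vector>

    // Hypothetical handle to a reconfigurable region pre-loaded with a custom
    // pipeline. In a real system load() would program a partial bitstream and
    // exec() would dispatch to the fabric; here both are software stubs so the
    // sketch compiles and runs as ordinary C++.
    struct FpgaKernel {
        bool load(const char* /*bitstream*/) { return true; }      // stub
        std::uint64_t exec(std::uint64_t x) { return x * x + 1; }  // stub for some complex pipeline
    };

    int main() {
        FpgaKernel k;
        if (!k.load("hot_loop.bit"))   // invented filename; a real program would fall back to software here
            return 1;

        std::vector<std::uint64_t> data(1u << 20, 3);
        std::uint64_t checksum = 0;
        for (std::uint64_t x : data)
            checksum ^= k.exec(x);     // the repeated, expensive computation, issued like an instruction
        return static_cast<int>(checksum & 0xff);
    }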


There are such CPUs, and the uptake has been minimal because, as GPGPUs have proven, not every developer is capable of actually using them.

Your example could just as easily be done on a GPGPU.
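
For comparison, here's what the same sort of loop looks like offloaded to a GPU with OpenMP target offload - a minimal sketch that assumes a compiler built with offload support (without it, the pragma is simply ignored and the loop runs on the host):

    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n);
        float *pa = a.data(), *pb = b.data(), *pout = out.data();

        // Map the inputs to the device, run the loop there, copy the result back.
        #pragma omp target teams distribute parallel for map(to: pa[0:n], pb[0:n]) map(from: pout[0:n])
        for (int i = 0; i < n; ++i)
            pout[i] = pa[i] + pb[i];

        std::printf("out[0] = %f\n", out[0]);
        return 0;
    }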


I just wanted to note that Intel tried that and it didn't work. See pjmlp's reply.

I still think the idea is sound; the way to go about it needs a lot of rethinking.


You don't seem bullish on the prospects of using Vitis [0] to deploy a machine learning model to a Xilinx FPGA?

[0] https://www.xilinx.com/products/design-tools/vitis/vitis-pla...


Disclaimer: I work in this space (not at Xilinx); comments are strictly my own opinions and do not reflect the positions of my employer, etc.

Broadly speaking, FPGA-based ML model accelerators are in an interesting space right now, where they aren't particularly compelling from a performance (or perf/Watt, perf/$, etc.) perspective. If you just need performance, then a GPU or ASIC-based accelerator will serve you better - the GPU will be easier to program, and the ASIC-based accelerators from the various startups are performing pretty well. Where an FPGA accelerator makes a lot of sense is when you need an FPGA anyway, or need the other benefits of FPGAs (e.g. lots of easily controlled IO) - but then you're back to square one of "there are some cases where an FPGA makes sense and many where it doesn't". Beyond that, there are a few niche cases where a mid-range FPGA might beat a mid-range GPU on perf/Watt or whatever metric is important to you.

Again, opinions are my own and all that. As someone in the space, I am very much hoping that someone - whether an ASIC startup or Xilinx/Intel - comes up with a "better" (more performant, cheaper, easier to use, etc.) solution than GPUs for ML applications. If the winner ends up being FPGAs, that would be really, really cool! It's just not too compelling at the moment, and I'm trying to be realistic.

All that said, FPGAs and their supporting ecosystem (software, boards, etc.) are an $Xb / Y market - nothing to sneeze at - and there are many cases where an FPGA makes sense. It just doesn't currently make sense for every dev to buy an FPGA card to drop into their desktop to play with.


>come up with a "better" (performant, cheaper, easier to use, etc.) solution than GPUs for ML applications

You're probably aware, but Xilinx itself is attempting this with its Versal AIE boards, which are (in spirit) similar to GPUs in that they group together a programmable fabric of SIMD-type compute cores.

https://www.xilinx.com/support/documentation/architecture-ma...

I have not played with one, but I've been told (by a Xilinx person, so grain of salt) that the flow from a high-level representation to that architecture is more open:

https://github.com/Xilinx/mlir-aie


Fascinating, thank you! Admittedly I don't keep the closest tabs on what Xilinx is doing.


Yeah, a coprocessor with an FPGA on it, or even an expansion card I can buy at Micro Center for 100 bucks, would be great!


Or even better, a useful FPGA inside the Ryzen processor itself. It might be very small, but if it were standardized, had an easier way to be programmed, and let programs load acceleration routines into it, that would be so cool.


This is what I've been thinking about for ages. It could be a huge accomplishment and greatly improve efficiency for scientific computing, data analytics and OLAP, cloud gaming, general backend development, etc.


Intel has done the equivalent with some Xeon processors.

The chiplet design of current AMD CPUs should make this even easier to do.


PCs are only open due to IBM's failure to prevent Compaq's endeavours.

If anything, we have seen the whole industry moving back to those days as a means to get out of razor-thin margins, especially now that desktops are a very niche market for most consumers.


>If anything, we have seen the whole industry moving back to those days as a means to get out of razor-thin margins,

Yes. It is interesting that Apple is now sort of like the new IBM.


What's wrong with the current state, other than the chips having lead times of 52+ weeks?


The development side. Compiling and simulating your Verilog/VHDL can be done with open-source software, but to put a design on the FPGA itself you generally need closed-source (and sometimes paid) tools to generate the bitstreams. Contrast that with microcontrollers such as the ATmega, which can be programmed from start to finish using an entirely FOSS stack - even the bootloader and programmer. And for some reason, these companies consider the bitstream formats trade secrets and refuse to document them at all.


This is true in general, but:

1) Vivado WebPACK edition (i.e. the free one) lets you generate (and flash) a bitstream for some of the small chips. I know it works for the Artix-7 family at least, because I'm doing it every day lately.

2) For the Artix-7 (and some Lattice chips) you can supposedly use OSS tooling (https://github.com/SymbiFlow/prjxray). I haven't tried it yet, but one problem I can foresee is that the OSS tools won't infer things like BRAMs and DSPs. In fact, the SymbiFlow people (I think?) explicitly call this out as the part of the project that's still a work in progress.

Some useful links:

https://arxiv.org/abs/1903.10407

https://github.com/YosysHQ/nextpnr

https://www.rapidwright.io/


> and some Lattice chips

Lattice has been by far the favorite of the FOSS community, but there's been more news:

- https://github.com/YosysHQ/apicula has appeared for Gowin FPGAs found on e.g. Sipeed Tang Nano boards (very cheap on AliExpress)

- a vendor called QuickLogic made SoCs that only use the FOSS toolchain for the FPGA part, out of the box: https://www.quicklogic.com/products/soc/eos-s3-microcontroll...


>Lattice has been by far the favorite of the FOSS community

I'm interested in the OSS flows but haven't dug in yet, so some questions (if you have experience): isn't it only for their iCE40 chips? And how smooth is the flow from RTL to bitstream to deployment?

One hesitation I have about jumping in is that I'm working on accelerator-type stuff, so my designs typically need on the order of 30k-50k LUTs. Will yosys+nextpnr let me deploy such a design to some chip?


I don't have that much experience (don't really have many use cases for FPGAs personally tbh) but:

IceStorm is for iCE40; Trellis is for ECP5 (which comes in variants with up to 85k LUTs);

the flow is simple enough to do manually, but there are tools that make it one-click. This tutorial series https://youtube.com/playlist?list=PLEBQazB0HUyT1WmMONxRZn9Nm... uses one.

As for handling really big designs, I don't know.


I was productive using the previous generation Xilinx toolchain (ISE) within a few months of starting from scratch. I had hobbyist electronics experience but other than that was coming from a pure software background.



