
If I have one wish for AMD, it's that they would make FPGAs (and Xilinx) a more open and diverse platform, like the PC. Not that the PC is perfect (there's still closed-source firmware), but any improvement over the current state of FPGAs would be welcome.


You've hit on the fundamental problem the FPGA industry has been trying to solve for 30+ years - how to get an FPGA into the hands of every developer, the way GPUs have spread to become essential tools.

Nobody has come up with a good answer yet. Developing for an FPGA still requires domain-specific knowledge, and because place & route (the "compile" step for an FPGA) is a couple of intertwined NP-hard problems, development cycles are necessarily long. Small designs might take an hour to compile; the largest designs deployed these days take around 24 hours.

All this to say: while they are neat, nobody has found the magic-bullet use case that makes everyone want one badly enough to put up with the pain of developing for them (à la machine learning for GPUs). Simultaneously, nobody has found the magic bullet that makes developing for them any easier, whether by reducing the knowledge required or by improving the tooling.

Effort has been made in places like High-Level Synthesis (HLS, compiling C/C++ code down to an FPGA), open-source tooling, and (everyone's favorite) simulation, but they all still kinda suck compared to developing software, or even the ecosystem that exists around GPUs these days. You'll often hear FPGA people saying stuff like "just simulate your design during development, compiling to hardware is just a last step to check everything works" - but simulation still takes a long time (large designs can take hours) and tracking down a bug in waveforms is akin to Neo learning to see the Matrix.
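
For a flavor of what HLS involves, here's a minimal sketch of a vector-add kernel in the style of Vitis HLS - the pragma spellings follow Vitis HLS conventions, and the function and argument names are just made up for illustration:

    // Minimal HLS-style kernel sketch: the tool synthesizes this C++ into RTL,
    // and the pragmas guide the hardware that gets generated.
    extern "C" void vadd(const int *a, const int *b, int *out, int n) {
    #pragma HLS INTERFACE m_axi port=a bundle=gmem
    #pragma HLS INTERFACE m_axi port=b bundle=gmem
    #pragma HLS INTERFACE m_axi port=out bundle=gmem
        for (int i = 0; i < n; ++i) {
    #pragma HLS PIPELINE II=1
            // II=1 asks for a pipeline that accepts a new iteration every clock.
            out[i] = a[i] + b[i];
        }
    }

Even a toy like this needs the right interface and pipelining annotations before the tools produce decent hardware, which is a big part of why HLS still isn't "just software".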


If the FPGA industry thinks it has been trying to do this for decades, then it has been going about it seriously wrong! Keeping your systems as black boxes, with unit prices and development costs that make them prohibitive for anything but high-margin devices, effectively guarantees they'll never become popular consumer commodities.

Given how open development works, the straightforward minimal investment is to publicly document some devices' bitstream formats and bootstrap the ecosystem by releasing some reliable libre place-and-route software. The software doesn't even have to contain all of the trade-secret heuristics; it just has to build and install the usual way (./configure && make && make install) and be functional enough that individual developers can scratch their own itches.


Why not ship an integrated FPGA in CPUs?

Being able to offload a repeated, complex MIMD computation to an FPGA treated like an instruction could be a huge win for scientific computing and for any large, steady workload that is expensive enough for companies to invest in optimizing for the FPGA. If this became commonplace and relatively inexpensive, then large corporations would likely fund compiler improvements that make the developer experience simpler and faster.
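
To make that concrete, here's a purely hypothetical sketch of what an "FPGA region as an instruction" model might look like from the software side. None of this corresponds to a real API - the FpgaKernel type, its methods, and the bitstream filename are all invented, with software stubs standing in so the sketch compiles and runs as plain C++:

    #include <cstdint>
    #include <vector>

    // Hypothetical handle to a reconfigurable region pre-loaded with a custom
    // pipeline. In a real system load() would program a partial bitstream and
    // exec() would dispatch to the fabric; here both are software stubs so the
    // sketch compiles and runs as ordinary C++.
    struct FpgaKernel {
        bool load(const char* /*bitstream*/) { return true; }      // stub
        std::uint64_t exec(std::uint64_t x) { return x * x + 1; }  // stub for some complex pipeline
    };

    int main() {
        FpgaKernel k;
        if (!k.load("hot_loop.bit"))   // invented filename; a real program would fall back to software here
            return 1;

        std::vector<std::uint64_t> data(1u << 20, 3);
        std::uint64_t checksum = 0;
        for (std::uint64_t x : data)
            checksum ^= k.exec(x);     // the repeated, expensive computation, issued like an instruction
        return static_cast<int>(checksum & 0xff);
    }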


There are such CPUs, and the uptake has been minimal because, as GPGPUs have proven, not every developer is capable of actually using them.

Your example could just as easily be done on a GPGPU.
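
For comparison, here's what the same sort of loop looks like offloaded to a GPU with OpenMP target offload - a minimal sketch that assumes a compiler built with offload support (without it, the pragma is simply ignored and the loop runs on the host):

    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), out(n);
        float *pa = a.data(), *pb = b.data(), *pout = out.data();

        // Map the inputs to the device, run the loop there, copy the result back.
        #pragma omp target teams distribute parallel for map(to: pa[0:n], pb[0:n]) map(from: pout[0:n])
        for (int i = 0; i < n; ++i)
            pout[i] = pa[i] + pb[i];

        std::printf("out[0] = %f\n", out[0]);
        return 0;
    }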


I just wanted to note that Intel tried that and it didn't work. See pjmlp's reply.

I still think the idea is sound; the way to go about it needs a lot of rethinking.


You don't seem bullish on the prospects of using Vitis [0] to deploy a machine learning model to a Xilinx FPGA?

[0] https://www.xilinx.com/products/design-tools/vitis/vitis-pla...


Disclaimer: I work in this space (not at Xilinx); comments are strictly my own opinions and do not reflect the positions of my employer, etc.

Broadly speaking, FPGA-based ML model accelerators are in an interesting space right now, where they aren't particularly compelling from a performance (or perf/Watt, perf/$, etc.) perspective. If you just need performance, then a GPU or ASIC-based accelerator will serve you better - the GPU will be easier to program, and the ASIC-based accelerators from the various startups are performing pretty well. Where an FPGA accelerator makes a lot of sense is when you need an FPGA anyway, or need the other benefits of FPGAs (e.g. lots of easily controlled IO) - but then you're back to square one of "there are some cases where an FPGA makes sense and many where it doesn't". Beyond that, there are a few niche cases where a mid-range FPGA might beat a mid-range GPU on perf/Watt or whatever metric is important to you.

Again, opinions are my own and all that. As someone in the space, I am very much hoping that someone - whether an ASIC startup or Xilinx/Intel - comes up with a "better" (more performant, cheaper, easier to use, etc.) solution than GPUs for ML applications. If the winner ends up being FPGAs, that would be really, really cool! It's just not too compelling at the moment, and I'm trying to be realistic.

All that said, FPGAs and their supporting ecosystem (software, boards, etc.) are an $Xb / Y market - nothing to sneeze at - and there are many cases where an FPGA makes sense. It just doesn't currently make sense for every dev to buy an FPGA card to drop into their desktop to play with.


>come up with a "better" (performant, cheaper, easier to use, etc.) solution than GPUs for ML applications

You're probably aware, but Xilinx itself is attempting this with its Versal AIE boards, which are (in spirit) similar to GPUs in that they group together a programmable fabric of SIMD-type compute cores.

https://www.xilinx.com/support/documentation/architecture-ma...

I have not played with one, but I've been told (by a Xilinx person, so grain of salt) that the flow from a high-level representation to that architecture is more open:

https://github.com/Xilinx/mlir-aie


Fascinating, thank you! Admittedly I don't keep the closest tabs on what Xilinx is doing.


Yeah, a coprocessor with an FPGA on it, or even an expansion card I can buy at Micro Center for 100 bucks, would be great!


Or even better, a useful FPGA inside the Ryzen processor itself. It might be very small, but if it were standardized, had an easier way to be programmed, and let programs load acceleration routines into it, that would be so cool.


This is what I've been thinking about for ages. It could be a huge accomplishment and greatly improve efficiency for scientific computing, data analytics and OLAP, cloud gaming, general backend development, etc.


Intel has done the equivalent with some Xeon processors.

The chiplet design of current AMD CPUs should make this even easier to do.


PCs are only open due to IBM's failure to prevent Compaq's endeavours.

If anything, we have seen the whole industry moving back to those days as a means to get out of razor-thin margins, especially now that desktops are a very niche market for most consumers.


>If anything, we have seen the whole industry moving back to those days as a means to get out of razor-thin margins,

Yes. It is interesting that Apple is now sort of like the new IBM.


What's wrong with the current state, other than the chips having lead times of 52+ weeks?


The development side. Compiling and simulating your Verilog/VHDL can be done with open-source software, but to put a design on the FPGA itself you generally need closed-source (and sometimes paid) tools to generate the bitstreams. Contrast that with microcontrollers such as the ATmega, which can be programmed from start to finish using an entirely FOSS stack - even the bootloader and programmer. And for some reason, these companies consider the bitstream formats trade secrets and refuse to document them at all.


This is true in general, but:

1) Vivado WebPACK edition (i.e. the free one) lets you generate (and flash) a bitstream for some of the small chips. I know it works for the Artix-7 family at least, because I'm doing it every day lately.

2) For the Artix-7 (and some Lattice chips) you can supposedly use OSS tooling (https://github.com/SymbiFlow/prjxray). I haven't tried it yet, but one problem I can foresee is that the OSS tools won't infer things like BRAMs and DSPs. In fact, the SymbiFlow people (I think?) explicitly call this out as the part of the project that's still a work in progress.

Some useful links:

https://arxiv.org/abs/1903.10407

https://github.com/YosysHQ/nextpnr

https://www.rapidwright.io/


> and some Lattice chips

Lattice has been by far the favorite of the FOSS community, but there's been more news:

- https://github.com/YosysHQ/apicula has appeared for Gowin FPGAs found on e.g. Sipeed Tang Nano boards (very cheap on AliExpress)

- a vendor called QuickLogic made SoCs that only use the FOSS toolchain for the FPGA part, out of the box: https://www.quicklogic.com/products/soc/eos-s3-microcontroll...


>Lattice has been by far the favorite of the FOSS community

I'm interested in the OSS flows but haven't dug in yet, so some questions (if you have experience): isn't it only for their iCE40 chips? And how smooth is the flow from RTL to bitstream to deployment?

One hesitation I have about jumping in is that I'm working on accelerator-type stuff, so my designs typically need on the order of 30k-50k LUTs. Will yosys+nextpnr let me deploy such a design to some chip?


I don't have that much experience (don't really have many use cases for FPGAs personally tbh) but:

IceStorm is for iCE40; Trellis is for ECP5 (which comes in variants with up to 85k LUTs);

the flow is simple enough to do manually, but there are tools that make it one-click. This tutorial series https://youtube.com/playlist?list=PLEBQazB0HUyT1WmMONxRZn9Nm... uses one.

As for handling really big designs, I don't know.


I was productive using the previous generation Xilinx toolchain (ISE) within a few months of starting from scratch. I had hobbyist electronics experience but other than that was coming from a pure software background.



