Hacker News | GregarianChild's comments

Can you explain the benefit of renaming dataflow as 'neuromorphic'?

You do understand that dataflow architectures have been tried many, many times? See [1] for a brief history. MIT had a big dataflow lab for many years (led by the recently deceased Arvind). What is the benefit of re-inventing dataflow architectures by complete amateurs who are not at all aware of the half-century research tradition on dataflow architecture, and of the very clear and concrete reasons why this architecture has so far failed whenever it was tried for general-purpose processors?

We cannot even apply Santayana's "those who forget their history are condemned to repeat it", because the 'neuromorphic' milieu doesn't even bother to understand this history.

[1] https://csg.csail.mit.edu/Dataflow/talks/DennisTalk.pdf


> Can you explain the benefit of renaming dataflow as 'neuromorphic'?

Neuromorphic just means brain-like or brain inspired, and brains operate in asynchronous dataflow type fashion.

I'm not sure how you read into what I wrote that I was "renaming dataflow as neuromorphic", which was certainly not what I meant.

I wonder if you regard Steve Furber (who I knew from Acorn), designer of the ARM CPU, as a "complete amateur"? He also designed the AMULET async processors, as well as for that matter the SpiNNaker system for spiking neural network research.

In any case, async (dataflow) processor design, while complex, clearly isn't an impossible task, and at some point in the future when the need arises (mobile robotics?) and there is sufficient financial incentive, I expect we'll see it utilized in commercial systems.

I'm not sure why you focus on "general purpose processors" given that we're talking about ANNs and neuromorphic systems. A custom chip would seem a better bet if the goal is to minimize power usage.


I don't rate Furber as a "complete amateur", but he's the exception in this milieu.

> Neuromorphic just means brain-like or brain inspired,

I don't even see any evidence that 'neuromorphic' architecture is brain inspired in a non-trivial sense. Can you please provide evidence, for example a non-trivial mapping between 'neuromorphic' architectures (say, SpiNNaker) and the SOTA models of the brain that we have, e.g. the existing data-driven model simulating the C. elegans brain (the MetaWorm project)?

As Steve Furber also says (personal communication): we don't know enough about how the brain works to have computer architectures that can be meaningfully inspired by brains. The "neuro-" prefix is marketing. [1] documents this usage and dates it back to the 19th century. See also

• Neuroergonomics

• Neurotypical

• Neurodivergent

• Neurodiverse

• Neurosis

• Neuroethics

• Neuroeconomics

• Neuromarketing

• Neurolaw

• Neurosecurity

• Neuropsychology

• Neuropsychoanalysis

• Neurotheology

• Neuro-Linguistic Programming

• Neurogastronomy

I have seen all the above used without irony.

> brains operate in asynchronous dataflow type fashion.

That's a questionable statement. To the best of my knowledge, there is no consensus as of 2025 on how to model even a single neuron. (The function of the synapse is even less well understood.)

When I asked Steve Furber what 'neuromorphic' meant, he said: "There are many things today described as neuromorphic. Mead would not call SpiNNaker neuromorphic!"

Furber also said: "Neuromorphic status: attracts no money, but works (in the sense of accelerate in niche domains)". Upon my asking which niche domains, he said: "brain simulation but nothing else". (He was referring to SpiNNaker.) I asked him if SpiNNaker can accelerate back-propagation and he said: "no, because the brain does not do back-propagation".

> async (dataflow) processor design, while complex, clearly isn't an impossible task

I did not say it was impossible. It has been done many times; see my references to Arvind's lab at MIT (I spent some time there). The problem with async (dataflow) processor design is that it consistently fails to live up to its promises (PPA). There are specific technical reasons for that, which are quite well understood.

> why you focus on "general purpose processors" given that we're talking about ANNs and neuromorphic systems.

Because the 'neuromorphic' marketing often reads like they want to build more efficient 'brain-inspired' general-purpose computers. Certainly the dataflow architectures (à la Arvind/MIT) tried to. This is one of the many issues with the 'neuromorphic' milieu: they are really vague about their goals. If they restricted their claims to certain classes of accelerators, their claims would be less delusional.

> the goal is to minimize power usage.

If that is the goal, they are also not very successful. On CMOS silicon, switching from 0 to 1 or from 1 to 0 is what consumes most of the power; wouldn't this make constant spiking expensive?

[1] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6


> I don't even see any evidence that 'neuromorphic' architecture is brain inspired in a non-trivial sense

I'm just giving the definition of the word neuromorphic. Individual systems claiming to be neuromorphic can be judged on their own merits, but that does not change the definition of the word.

>> brains operate in asynchronous dataflow type fashion.

> That's a questionable statement. To the best of my knowledge, there is no consensus as of 2025 on how to model even a single neuron

You're confusing two things - do we know absolutely everything there is to know about every type of neuron and synapse, in order to build a 100% accurate model of them? No. Do we know that neurons operate asynchronously to each other, only activating when their inputs change? Yes.

> When I asked Steve Furber what 'neuromorphic' meant, he said: "There are many things today described as neuromorphic. Mead would not call SpiNNaker neuromorphic!"

Of course not - it was just an ARM-based message passing system, built to be able to efficiently simulate spiking neural networks (modelled as message passing), but with no particular specialization for that task.

> I asked him if SpiNNaker can accelerate back-propagation and he said: "no, because the brain does not do back-propagation".

That's a silly question, and you got the silly answer you deserved. SpiNNaker is basically just an ARM cluster with fast message passing. It can be used to accelerate anything relative to running on slower hardware. If you wanted to train an ANN on it using backprop, you certainly could, but why would you WANT to mix spiking neural nets and backprop?!

> If that is the goal, they are also not very successful. On CMOS silicon, switching from 0 to 1 or from 1 to 0 is what consumes most of the power; wouldn't this make constant spiking expensive?

Sure, but biological neurons don't spike constantly. They spike only when an input changes that causes them to activate - maybe on average only tens of times per second. This is the whole point of an async (no clock) / dataflow chip design - for chip elements to only consume power when their inputs change - not to route a GHz clock to them that continuously draws power (by flipping between 0 and 1 billions of times a second) even when the element's inputs are only changing tens of times a second (or whatever).
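
For intuition, the standard first-order model of dynamic power in CMOS logic (a textbook approximation, not a measurement of any particular chip) is

    P_{\mathrm{dyn}} \approx \alpha \, C \, V_{\mathrm{dd}}^2 \, f

where alpha is the activity factor (the fraction of the switched capacitance C that actually toggles per cycle), V_dd the supply voltage and f the clock frequency. The async argument above is essentially that, for an event-driven element, alpha (and effectively f) collapses to the rate at which its inputs change, instead of being pinned to the GHz clock.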


> definition of the word neuromorphic

The definition of the term is so vague as to be useless for scientific progress. What exactly is excluded by "brain inspired"? If I paint my computer grey because the brain is grey, is it brain inspired? You know that regular expressions were invented to model neurons [1]? If nothing is excluded, the term is useless. But in reality, the term is used explicitly to deceive, as was already done in the 19th century; see [2].

Heraclitus says somewhere something to the effect that "good thinking is dry". Established terminology like

• Asynchronous CPU

• Delay Insensitive Circuit

• Quasi-Delay-Insensitive Circuit

• Clockless CPU

• Asynchronous circuit

• Self-timed circuit

is dry and useful. 'Neuromorphic' suggests an understanding of how the brain works that is just not there in 2025.

> Do we know that neurons operate asynchronously to each other, only activating when their inputs change? Yes.

I strongly disagree.

We do not know how exactly the brain codes information. If you have timed coding, it's not asynchronous. And that doesn't even address the chemical signals that abound in the brain. The Hodgkin-Huxley equations give a good approximation to how a neuron produces a "spike" - the signal that travels down the axon. But it's only an approximation. Every type of ion channel in the cell adds another 10 or so unknown parameters to the equations, which can only be found approximately by experiment. So we have a fairly good picture of how a single neuron behaves in most circumstances, but not a complete one. Ion channels are crucial to how the brain works. They are the nonlinearity that makes computation possible, like the transistor is to the computer. There are several hundred types of ion channel known so far, and probably thousands more that are not yet discovered.
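
For reference, the classic Hodgkin-Huxley membrane equation (standard textbook form; parameter values and units omitted) is

    C_m \frac{dV}{dt} = -\bar{g}_{Na} m^3 h (V - E_{Na}) - \bar{g}_K n^4 (V - E_K) - \bar{g}_L (V - E_L) + I_{ext}

where each gating variable x \in \{m, h, n\} follows \frac{dx}{dt} = \alpha_x(V)(1 - x) - \beta_x(V)\,x. Every additional ion channel type contributes further gating variables and rate functions of this shape, which is where the extra poorly-constrained parameters come from.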

> That's a silly question

Not in the context I asked: I asked him after he had suggested that 'neuromorphic' would be great to reduce the energy consumption of LLMs. I wanted to know why/how, given that LLMs are trained today with BP. I give him the benefit of the doubt here, since we were in a rush and could not go deeper into the subject.

> mix spiking neural nets and backprop?!

It's an active area of research to do just this. Google SpikeProp, e.g. [3]. Lots of papers in this space at the moment. Why? I don't know, as I don't follow this space. Potential reasons: (i) BP is natural, (ii) BP works super well for deep learning, (iii) the 'neuromorphic' community has failed so spectacularly and wants to try something that works, (iv) to get funding, (v) we really do not know how the brain works, and in that case, why not try out crazy things?
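
To give a flavour of this line of work, here is a minimal sketch (a toy example in PyTorch; not SpikeProp itself, which differentiates spike times) of the surrogate-gradient trick many of these papers use to push backprop through the non-differentiable spike:

    import torch

    class SurrogateSpike(torch.autograd.Function):
        # Heaviside spike in the forward pass, smooth surrogate in the backward pass.
        @staticmethod
        def forward(ctx, membrane_potential):
            ctx.save_for_backward(membrane_potential)
            return (membrane_potential > 0).float()   # emit 0/1 spikes

        @staticmethod
        def backward(ctx, grad_output):
            (v,) = ctx.saved_tensors
            # 'fast sigmoid' surrogate derivative; 10.0 is an arbitrary steepness
            return grad_output / (1.0 + 10.0 * v.abs()) ** 2

    spike = SurrogateSpike.apply

    # toy usage: gradients flow despite the step nonlinearity
    v = torch.randn(4, requires_grad=True)
    spike(v).sum().backward()
    print(v.grad)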

[1] S. Kleene, Representation of events in nerve nets and finite automata. https://www.dlsi.ua.es/~mlf/nnafmc/papers/kleene56representa....

[2] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6

[3] https://homepages.cwi.nl/~sbohte/publication/esann.pdf


> Not in the context I asked: I asked him after he had suggested that 'neuromorphic' would be great to reduce the energy consumption of LLMs. I wanted to know why/how, given that LLMs are trained today with BP.

I assume this whole Q&A with Furber is a fantasy of yours that never happened, or alternatively you had done zero homework about SpiNNaker and spiking neural networks beforehand, and understood nothing about dataflow as a way to reduce energy consumption (as your last post proved).

You may think that leading off your responses with a bunch of bullet points you just got from Google makes you sound well-informed, but it doesn't - it comes across like a high schooler with nothing to say trying to pad their essay.


> fantasy of yours that never happened

Shall we bet money on this?

You are right, I don't know much about SpiNNaker. All I know about SpiNNaker is from a talk of Furber's where he said "Neuromorphic status: attracts no money, but works (in the sense of accelerate in niche domains)". When I asked which niche domains, he replied: "brain simulation but nothing else".

> understood nothing about dataflow as a way to reduce energy consumption (as your last post proved).

If you don't tell me concretely, where I am wrong about dataflow, I cannot learn and improve ...


The 'brain-inspired' community has always been doing this, since Carver Mead introduced the term 'neuromorphic' in the late 1980s: reselling banalities as a great new insight. My favourite is "Neuromorphic computing breakthrough could enable blockchain on Mars" [1]. What else can they do? After all, that community now has multiple decades of failure under its belt. Not a single success. Failure to make progress in AI, and failure to say anything of interest about the brain. To paraphrase Benjamin Franklin: in this world nothing can be said to be certain, except death, taxes and neuromorphicists exaggerating. (Aside: I was told by someone who applied to YC with a 'neuromorphic' startup that YC said they don't fund 'neuromorphic'. I am not sure about the details ...)

The whole 'brain talk' malarkey goes back much further. In particular psychology and related subjects have, since their origins as a specialty in the 19th century, heavily used brain-inspired metaphors that were intended to mislead. Already in the 19th century this was criticised. See [3] for an interesting discussion.

There is something interesting in this post, namely that it's based on non-Nvidia GPUs, in this case MetaX [2]. I don't know how competitive MetaX are today, but I would not bet against China in the longer term.

[1] https://cointelegraph.com/news/neuromorphic-computing-breakt...

[2] https://en.wikipedia.org/wiki/MetaX

[3] K. S. Kendler, A history of metaphorical brain talk in psychiatry. https://www.nature.com/articles/s41380-025-03053-6


> I was told by someone who applied to YC with a 'neuromorphic' startup that YC said they don't fund 'neuromorphic'.

There is something refreshingly consistent in a VC that is laser focused on enterprise CRM dashboard for dashboards workflow optimization ChatGPT wrappers that also filters out the neuromorphicists.

Reminds me of how the samurai were so used to ritual duelling and reciting their lineages before battle that, when the Mongols encountered them, they just shot the samurai mid-speech.


There is — some of these things make money and the others don’t :)


This is 100% true. It's just very effective and profitable PR. There IS a minority stream in the field that uses the branding to get funding and then builds real tech. But you'll never know it as neuromorphic, because the label comes off once it works. Look up the history of Synaptics (the touchpad company).


This paper

https://arxiv.org/abs/2407.02944

ventures some guesses as to how Nvidia does this, and runs experiments to confirm them.


How is this different from superoptimisation?

Also, how do you ensure that newly generated kernels are correct w.r.t. the original naive kernel that you use as specification?


very similar to superoptimisation, but most superoptimisers try to tackle Turing-complete code. by restricting to a very limited space of computation (linear algebra with 12 primitive ops) the search remains tractable.

the search space is designed to remain logically equivalent at all times, by virtue of how it's built (applying rewrite rules we know don't change logical equivalence).
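
a toy illustration of that idea (hypothetical rewrite rules over a made-up expression format, nothing to do with the actual system): every rule is an algebraic identity, so everything reachable from the starting expression stays equivalent by construction.

    # Expressions as nested tuples, e.g. ("add", "a", ("add", "b", "c")).
    def rules(e):
        # Each rewrite is an algebraic identity, so equivalence is preserved.
        if isinstance(e, tuple) and e[0] == "add":
            _, x, y = e
            yield ("add", y, x)                              # commutativity
            if isinstance(y, tuple) and y[0] == "add":
                yield ("add", ("add", x, y[1]), y[2])        # associativity
        if isinstance(e, tuple):
            # also rewrite inside sub-expressions
            for i, sub in enumerate(e[1:], start=1):
                for new_sub in rules(sub):
                    yield e[:i] + (new_sub,) + e[i + 1:]

    def explore(start, steps=3):
        seen, frontier = {start}, {start}
        for _ in range(steps):
            frontier = {r for e in frontier for r in rules(e)} - seen
            seen |= frontier
        return seen   # all variants found; pick the cheapest under some cost model

    for variant in sorted(map(str, explore(("add", "a", ("add", "b", "c"))))):
        print(variant)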


If the search space never leaves the programs that are equivalent to the original specification, that will probably limit the optimisations you can discover. (E.g. if you start out with standard matmul, you will not discover Strassen's algorithm.) This is not a criticism, I'm just trying to understand your algorithm.


could be... I'm not opposed to looking into this to see if there's no possible trajectory from naive to Strassen's without leaving logical equivalency.

all the optimizations for matmul so far have been straightforward trajectories from naive (tiling, smem caching, tensor core offload, etc.)


There is an old CACM post that explains how to use a bit of randomness to avoid only doing semantics-preserving program changes.

https://cacm.acm.org/research/stochastic-program-optimizatio...
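
The gist of that post (STOKE-style stochastic superoptimisation): propose random edits that may break semantics, and accept them with a probability driven by a cost that mixes a correctness penalty (from test cases) with a performance estimate, so the search can tunnel through incorrect intermediate programs. A self-contained toy sketch of just the acceptance loop, with a stand-in "program" and a made-up cost model:

    import math
    import random

    # Toy stand-in: the "program" is a list of +1/+2/+3 steps, its "meaning" is the
    # total it computes, and cost = correctness penalty + length (shorter = faster).
    TARGET = 10

    def meaning(prog):
        return sum(prog)

    def cost(prog):
        return 10 * abs(meaning(prog) - TARGET) + len(prog)

    def propose(prog):
        prog = list(prog)
        op = random.choice(["add", "remove", "change"])
        if op == "add" or not prog:
            prog.insert(random.randrange(len(prog) + 1), random.choice([1, 2, 3]))
        elif op == "remove":
            prog.pop(random.randrange(len(prog)))
        else:
            prog[random.randrange(len(prog))] = random.choice([1, 2, 3])
        return prog

    def mcmc_search(start, steps=20000, beta=0.5):
        current, best = start, start
        for _ in range(steps):
            cand = propose(current)
            delta = cost(cand) - cost(current)
            # Metropolis rule: always take improvements, sometimes take regressions,
            # which lets the walk pass through semantically incorrect programs.
            if delta <= 0 or random.random() < math.exp(-beta * delta):
                current = cand
                if meaning(current) == TARGET and cost(current) < cost(best):
                    best = current   # only remember correct programs as 'best'
        # (a real superoptimiser would verify 'best' formally, not just by tests)
        return best

    print(mcmc_search([1] * 10))   # typically finds something like [3, 3, 3, 1]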


Chisel has a compiler to Verilog. That is not the problem. Many semiconductor companies use a tool-chain to generate a lot of Verilog from higher-level sources.

The rumour I heard was this: The problem with Chisel was that (at least in the past) the Chisel compiler did not preserve port structure well. So if you had a Chisel file that translated to 80M LoC of Verilog, then verified those 80M lines of Verilog (which is very expensive), and then made a tiny change to the source Chisel, the resulting new Verilog would use different port names even for the parts that were not affected by the change. (To quip: the (old?) Chisel compiler was a bit of a hash function ...) So you have to re-verify the whole 80M lines of Verilog. That is prohibitively expensive compared to only re-verifying the parts that truly need to change. The high verification costs forced by this problem were rumoured to have nearly sunk a company.

This is a compiler problem, not a Chisel language problem. I was told that the compiler problem has been fixed since. But I did not check this.


The "intersection of all sets such that" is not vague at all. It's perfectly formally defined in ZF* set theories. But it's impredicative. One of the guiding ideas behind type theories is to minimise impredicative constructions as much as possible. After all, impredicative definitions are circular ... Of course there is no free lunch and the power of impredicative constructions needs to be supplied in other ways in type theories ...


Modern GPU instructions are often VLIW and the compiler has to do a lot of work to schedule them. For example, Nvidia's Volta (from 2017) uses a 128-bit word to encode each instruction. According to [1], the 128 bits in a word are used as follows:

• at least 91 bits are used to encode the instruction

• at least 23 bits are used to encode control information associated with multiple instructions

• the remaining 14 bits appeared to be unused

AMD GPUs are similar, I believe. VLIW is good for instruction density. VLIW was unsuccessful in CPUs like Itanium because the compiler was expected to handle (unpredictable) memory access latency. This is not possible, even today, for largely sequential workloads. But GPUs typically run highly parallel workloads (e.g. MatMul), and the dynamic scheduler can just 'swap out' threads that are waiting for memory loads. Your GPU will also perform terribly on highly sequential workloads.
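
The amount of parallelism needed to hide that latency follows from Little's law (illustrative numbers, not specific to any GPU):

    \text{operations in flight} \;\approx\; \text{memory latency} \times \text{issue rate}

e.g. hiding roughly 400 cycles of DRAM latency while issuing 4 operations per cycle needs on the order of 1600 independent operations in flight, which a GPU gets from thousands of resident threads and which a single sequential instruction stream simply cannot supply.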

[1] Z. Jia, M. Maggioni, B. Staiger, D. P. Scarpazza, Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. https://arxiv.org/abs/1804.06826


Personally, I have a soft spot for VLIW/EPIC architectures, and I really wish they had been more successful in mainstream computing.

I didn't consider GPUs precisely for the reason you mentioned – their unsuitability for running sequential workloads, which is what most end-user applications amount to, even though nearly every modern computing contraption in existence has a GPU today.

One assuredly radical departure from the von Neumann architecture that I completely forgot about is the dataflow CPU architecture, which is vastly different from what we have been using for the last 60+ years. Even though there have been no productionised general-purpose dataflow CPUs, the idea has been successfully implemented for niche applications, mostly in networking. So, circling back to the original point raised, dataflow CPU instructions would certainly qualify as a new design.


The reason that VLIW/EPIC architectures have not been successful for mainstream workloads is the combination of

• the "memory wall",

• the static unpredictability of memory access, and

• the lack of sufficient parallelism for masking latency.

Together, these make dynamic instruction scheduling just much more efficient.

Dataflow has been tried many, many times for general-purpose workloads. And every time it failed for general-purpose workloads. In the early 2020s I was part of an expensive team doing a blank-slate dataflow architecture for a large semi company: the project got cancelled because the performance figures were weak relative to the complexity of the micro-architecture, which was high (hence expensive verification and large area). As one of my colleagues on that team says: "Everybody wants to work on dataflow until he works on dataflow." Regarding the history of dataflow architectures, [1] is from 1975, so half a century old this year.

[1] J. Dennis, A Preliminary Architecture for a Basic Data-Flow Processor https://courses.cs.washington.edu/courses/cse548/11au/Dennis...


I'd be interested to learn who paid for this machine!

Did Sandia pay list price? Or did SpiNNcloud Systems give it to Sandia for free (or at least at a heavily subsidised price)? I conjecture the latter. Maybe someone from Sandia is reading this and can provide details?

SpiNNcloud Systems is known for making misleading claims, e.g. their home page https://spinncloud.com/ lists DeepMind, DeepSeek, Meta and Microsoft as "Examples of algorithms already leveraging dynamic sparsity", giving the false impression that those companies use SpiNNcloud Systems machines, or the specific computer architecture SpiNNcloud Systems sells. Their claims about energy efficiency (like "78x more energy efficient than current GPUs") seem sketchy. How do they measure energy consumption and trade it off against compute capacity? E.g. a Raspberry Pi uses less absolute energy than an NVIDIA Blackwell, but is that a meaningful comparison?

I'd also like to know how to program this machine. Neuromorphic computers have so far been terribly difficult to program. E.g. have JAX, TensorFlow and PyTorch been ported to SpiNNaker 2? I doubt it.


As an ex-employee (and I even did some HPC) I am not aware of any instances of Sandia receiving computing hardware for free.


No, but sometimes machines are provided for demonstration/evaluation, though that wouldn't usually make a press release.


Unless the manufacturer makes it.


I don't know, but just wanted to say that my son got a job there as a mechanical engineer, and I couldn't be more proud. He can't tell me much because of classified status, but I can tell he loves his job and who he works with. Just sending praise to Sandia


Deep Mind (Google's reinforcement learning lab), Deep Seek (High-Flyer's LLM lab), Deep Crack (EFF's DES cracker), Deep Blue (IBM's chess computer), and Deep Thought (Douglas Adams' universal brain) all set the stage...

So naturally, this thing should be called Deep Spike, Deep Spin, Deep Discount, or -- given its storage-free design -- Deep Void.

If it can accelerate nested 2D FORTRAN loops, you could even call it Deep DO DO, and the next, deeper version would naturally be called Deep #2.

JD Vance and Peter Thiel could gang up, think long and hard, go all in, and totally get behind vigorously pushing and fully funding a sexy supercomputer with more comfortably upholstered, luxuriously lubricated, passively penetrable cushioned seating than even a Cray-1, called Deep Couch. And the inevitable jealous break-up would be more fun to watch than the Musk-Trump Bromance!


> or -- given its storage-free design -- Deep Void.

Sounds like the big brother of Dev Null? :)


> both parallelism and pipelining

You are right, but this can be made more precise: pipelining is a specific form of parallelism. After all, the different stages of the pipeline execute in parallel.


You use "parallelism" with a different meaning than me.

I use "parallel" with its original meaning "one besides the other", i.e. for spatial parallelism.

You use "parallel" with the meaning "simultaneous in time", because only with this meaning you can call pipelining as a form of parallelism.

You are not the only one who uses "parallel" for "simultaneous in time", but in my opinion this is a usage that must be discouraged, because it is not useful.

If you use "parallel" with your meaning, you must find new words to distinguish parallelism in space from parallelism in time. There are no such words in widespread use, so the best that you can do is to say "parallel in space" and "parallel in time", which is too cumbersome.

It is much more convenient to use "parallel" only with its original meaning, for parallelism in space, which in the case of parallel execution requires multiple equivalent execution units (unlike pipelined execution, which in most cases uses multiple non-equivalent execution units).

When "parallel" is restricted to parallelism in space, pipelining is not a form of parallelism. Both for pipelining and for parallelism there are multiple execution units that work simultaneously in time, but the stream of data passes in parallel through the parallel execution units and in series through the pipelined execution units.

With this meaning of "parallel", one can speak about "parallel execution" and "pipelined execution" without any ambiguity. It is extremely frequent to have the need to discuss about both "parallel execution" and "pipelined execution" in the same context or even in the same sentence, because these 2 techniques are normally combined in various ways.

When "parallel" is used for simultaneity in time it becomes hard to distinguish parallel in space execution from pipelined execution.


The pipeline stages (say: fetch, decode, execute, memory access, register write-back) are organised "parallel in space" as transistors on the chip. The point of having a pipeline is so that the stages can execute "parallel in time".

More generally, parallel in space is interesting because it is a necessary precondition for parallel in time.
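
To make that concrete, a small script (an illustration using the textbook 5-stage pipeline) that prints which instruction occupies which stage in each cycle; once the pipeline fills (cycle 4 with these numbers), all five stages are busy in the same cycle, even though each instruction visits them strictly in series:

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def occupancy(n_instructions, n_cycles):
        # In cycle c, stage s holds instruction c - s (if such an instruction exists):
        # each instruction moves through the stages in series, while different
        # instructions occupy different stages during the same cycle.
        for cycle in range(n_cycles):
            row = []
            for s, name in enumerate(STAGES):
                i = cycle - s
                row.append(f"{name}:i{i}" if 0 <= i < n_instructions else f"{name}:--")
            print(f"cycle {cycle}: " + "  ".join(row))

    occupancy(n_instructions=6, n_cycles=9)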


In its original meaning, which is still the meaning used in mathematics and physics, "parallel" provides more information than just saying that the parallel things are located in different places in space. Such information can be provided by other words.

One should not say that the pipeline stages are parallel when the intended meaning is that they are separate or distinct, which is the correct precondition for their ability to work simultaneously in time.

"Parallel" says about two things that they are located side-by-side, with their front-to-back axes aligned, which is true for parallel execution units where the executions of multiple operations are initiated simultaneously in all subunits, but it is false for pipelined execution units, where the executions of multiple operations are initiated sequentially, both in the first stage and in all following stages, but the initiation of an execution is done before the completion of the previous execution, leading to executions that are overlapped in time.

The difference between parallel execution and pipelined execution is the same as between parallel connections and series connections in any kind of networks, e.g. electrical circuits or networks describing fluid flow.

Therefore it is better if the terms used in computing remain consistent with the terms used in mathematics, physics and engineering, which have already been used for centuries before the creation of the computing terminology.


I agree that the RISC-V ecosystem has some issues that I hope will be sorted out. But not all RISC-V cores are "baby cores". I imagine that XiangShan [1], albeit student work, will spur work towards bigger, more performant OOO cores.

[1] https://github.com/OpenXiangShan/XiangShan

