The SPUs were basically the next evolution of the PS2's two vector units, which were often running middleware too. It was really hard to write VU code that ran fast (Sony advocated writing assembly in an Excel spreadsheet), but generally the vector units were doing about the same thing for every game/developer (brute-force stuff: transforming vertices, calculating vertex lighting, generating multi-pass command streams for the rasterizer). So most developers just used Sony-provided examples, or improved versions from middleware developers. Very few developers wrote their own VU programs, or even needed to.
The Cell now had 7 vector units, with comparatively more memory, but there was no default job for them: all the vertex transformation now ran on the GPU's vertex shaders. And Sony initially stuck to their guns of "SPU programs should be written in assembly, in a spreadsheet".
Because the single PPU really sucked and was nowhere near fast enough for anything, Sony eventually relented and released a version of GCC which would compile C++ code for the SPUs. Fast to develop with, but nowhere near the performance of an SPU program designed in an Excel spreadsheet.
This resulted in a whole bunch of games running code on the SPUs that was really badly optimised. But at least it reduced load on the PPU.
I'm not actually sure how they had their spreadsheets set up. They probably had a bunch of conditional formatting to highlight pipeline hazards, along with formulas to show total and wasted cycle counts.
Both architectures had exposed pipelines, meaning the result of an operation would take a few cycles to show up in the destination register, and some operations would take longer than others. You might have to insert a bunch of NOPs to make sure the data would be ready for the next instruction that needed it. Both architectures were also dual issue, meaning two completely independent operations, operating on completely independent registers, would be manually packed into a single instruction by the programmer. There were also restrictions on which types of instructions could go in each half of the instruction; if you didn't have an instruction for one half, you had to put a NOP there.
I'm pretty sure Sony liked the spreadsheets because they forced the programmer to see where all the NOPs were. The programmer would be expected to refactor things and manually unroll loops until all the NOPs were filled with useful instructions and peak performance was reached.
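To make the NOP-filling idea a bit more concrete, here's a rough toy model in Python. Everything in it is made up for illustration: the opcode names, the latencies, the rule that arithmetic goes in one half and memory ops in the other, and the `schedule`/`kernel`/`report` helpers. It doesn't match real VU or SPU timings. It just places instructions in program order into two-slot bundles, pads with NOPs whenever an input hasn't landed yet, and compares a straight-line kernel against the same kernel hand-interleaved across four independent elements.

```python
# Toy model only: latencies and slot rules are invented, not real VU/SPU timings.
LATENCY = {"load": 4, "mul": 3, "add": 2, "store": 1}  # cycles until the result is readable
SLOT = {"mul": 0, "add": 0, "load": 1, "store": 1}     # which half of the pair an op may use


def schedule(ops):
    """Place ops in program order into dual-issue bundles, padding with NOPs (None)."""
    bundles = []   # bundles[cycle] == [slot0_op, slot1_op]
    ready = {}     # register name -> first cycle its value can be read
    for name, dst, srcs in ops:
        slot = SLOT[name]
        # Exposed pipeline: the op may not issue before all of its inputs have landed.
        operands_ready = max((ready.get(r, 0) for r in srcs), default=0)
        # In-order: never issue earlier than the most recent bundle.
        cycle = max(operands_ready, len(bundles) - 1, 0)
        while len(bundles) <= cycle:
            bundles.append([None, None])
        if bundles[cycle][slot] is not None:   # that half is taken, start a new bundle
            bundles.append([None, None])
            cycle += 1
        bundles[cycle][slot] = f"{name} {dst or srcs[0]}"
        if dst is not None:
            ready[dst] = cycle + LATENCY[name]
    return bundles


def kernel(unroll):
    """r[i] = a[i] * b[i] + c[i], hand-interleaved across `unroll` independent elements."""
    stages = [
        ("load", "a{i}", ()), ("load", "b{i}", ()), ("load", "c{i}", ()),
        ("mul", "t{i}", ("a{i}", "b{i}")),
        ("add", "r{i}", ("t{i}", "c{i}")),
        ("store", None, ("r{i}",)),
    ]
    ops = []
    for name, dst, srcs in stages:
        for i in range(unroll):
            ops.append((name,
                        dst.format(i=i) if dst else None,
                        tuple(s.format(i=i) for s in srcs)))
    return ops


def report(unroll):
    """Return (total cycles, wasted NOP slots) for the kernel at this unroll factor."""
    bundles = schedule(kernel(unroll))
    nops = sum(op is None for bundle in bundles for op in bundle)
    return len(bundles), nops


if __name__ == "__main__":
    for unroll in (1, 4):
        cycles, nops = report(unroll)
        print(f"unroll={unroll}: {cycles} cycles, {nops} NOP slots, "
              f"{cycles / unroll:.2f} cycles per element")
```

With these invented numbers, one element takes 11 cycles with 16 wasted slots, while four interleaved elements take 22 cycles with 20 wasted slots, so the cost per element halves. There are still NOPs left, which is the point where you'd keep reordering (e.g. overlapping the loads of the next batch with the arithmetic of the current one) until the gaps are gone.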
Why the downvotes? It's true. If you stray from AAA titles, you get plenty of PS3-only games that had terrible lag and struggled to even manage 30fps at a reduced resolution. GUST titles like the Atelier series come to mind.