Hacker News

A graphics card where the graphics core is actually an x86 application running on FreeBSD running on the "GPU"? That you could even log in to? That sounds out-of-this-world-amazing. What a shame it wasn't released like that.


Actually, Xeon Phi is almost the same thing, except it's not sold as a GPU. You can ssh into it or run a software rasterizer on its cores; the only thing missing would be a host graphics driver.


It's also missing the texture units.

From what I hear from a friend who works in the GPU industry, one of the main outcomes of the Larrabee research project was this: it is possible to do most graphical operations on a SIMD general-purpose CPU with good performance. Everything except texture decoding, which really needs dedicated texture decoding silicon. And with texture units taking up 10% of the silicon, Intel really needs to decide if they want to sell it as a GPU or as a general purpose compute unit with 10% more cores. Intel chose the latter, and you can't really blame them, as you can sell dedicated compute units for more money.

Sony ran into the same problem with the PS3 and the Cell. They originally designed it so game developers could implement whatever rendering method they wanted, in software, on the SPUs. But the performance wasn't high enough, partly because texture decoding took up too much time. By the time this was discovered to be a problem, the Cell was more or less finalised. Sony were considering adding a second Cell to the console to brute-force through the problem, but eventually they asked Nvidia to hack together a traditional GPU.


> Intel really needs to decide if they want to sell it as a GPU or as a general purpose compute unit with 10% more cores. Intel chose the latter, and you can't really blame them, as you can sell dedicated compute units for more money.

From the article:

> Remember - KNC is literally the same chip as LRB2. It has texture samplers and a video out port sitting on the die. They don't test them or turn them on or expose them to software, but they're still there - it's still a graphics-capable part.

So the die space is still used, right? They didn't choose 10% more cores; they just chose to turn it off and not test it, but it still uses the die space.


On Knights Corner, yes. The silicon design was finished by the time this decision was made.

The advantage came in the long term: they removed the texture samplers from the next version, Knights Landing, which allowed them to fit more cores on that chip.


What does "texture decoding" actually consist of here - mipmap+interpolation?


Nothing terribly complex.

For the typical case of 2D uncompressed textures with some kind of trilinear/anisotropic filtering enabled, you are basically just calculating 8 addresses from the texture coordinates, the clamping mode and the mipmap levels, then doing a gathered load (and CPUs hate doing gathered loads). Remember to optimise for the case where each group of 4 addresses is typically, but not always, right next to each other in memory.
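To make the address calculation concrete, here is a minimal Python sketch of where the 8 texels of one trilinear sample come from. It assumes a hypothetical layout (a square power-of-two texture with all mip levels packed one after another in a single row-major array), clamp-to-edge addressing, texel centres at half-integer coordinates, and an already-computed LOD; the function names are made up for illustration. In plain row-major storage only the horizontal pairs land next to each other, which is one reason GPUs commonly store textures in tiled or swizzled layouts instead.

```python
import math

def mip_offsets(base_size, levels):
    """Start offset (in texels) of each mip level in one flat array.
    Hypothetical layout: level 0 first, then each smaller level, row-major."""
    offsets, offset, size = [], 0, base_size
    for _ in range(levels):
        offsets.append(offset)
        offset += size * size
        size = max(1, size // 2)
    return offsets

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def trilinear_addresses(u, v, lod, base_size, levels):
    """Return the 8 flat texel indices a trilinear sample reads:
    a 2x2 footprint on mip level floor(lod) and another on the next level."""
    offsets = mip_offsets(base_size, levels)
    lod = clamp(lod, 0.0, levels - 1)
    level0 = int(math.floor(lod))
    level1 = min(level0 + 1, levels - 1)
    addrs = []
    for level in (level0, level1):
        size = max(1, base_size >> level)
        # Texel centres sit at half-integer coordinates, so shift by 0.5.
        x = u * size - 0.5
        y = v * size - 0.5
        x0, y0 = int(math.floor(x)), int(math.floor(y))
        for dy in (0, 1):
            for dx in (0, 1):
                # Clamp-to-edge addressing mode.
                tx = clamp(x0 + dx, 0, size - 1)
                ty = clamp(y0 + dy, 0, size - 1)
                addrs.append(offsets[level] + ty * size + tx)
    return addrs
```

For example, `trilinear_addresses(0.5, 0.5, 0.5, 8, 4)` returns `[27, 28, 35, 36, 69, 70, 73, 74]`: two 2x2 footprints, one per mip level, which is exactly the kind of scattered access that becomes a gathered load on a CPU.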

Then you use trilinear interpolation to filter your 8 raw texels into a single filtered texel. With the exception of the gathered load, none of these operations is actually that expensive on CPUs, but shaders do a lot of these texture reads, and it's much cheaper and faster to implement it all in dedicated hardware.
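The filtering step itself is just repeated linear interpolation. A rough sketch, assuming the 8 texels arrive as single-channel values ordered as a 2x2 footprint on the finer mip level followed by a 2x2 footprint on the coarser one, and that the fractional weights have already been computed (for RGBA you would run it per channel). That is 7 lerps per channel, which is cheap arithmetic; the expensive part on a CPU is getting the texels in.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b by weight t in [0, 1]."""
    return a + (b - a) * t

def bilinear(t00, t10, t01, t11, fx, fy):
    """Filter a 2x2 footprint; fx, fy are the fractional positions inside it."""
    top = lerp(t00, t10, fx)
    bottom = lerp(t01, t11, fx)
    return lerp(top, bottom, fy)

def trilinear(texels, fx0, fy0, fx1, fy1, flod):
    """texels: 8 values, the finer mip's 2x2 footprint then the coarser mip's,
    each ordered (x0,y0), (x1,y0), (x0,y1), (x1,y1).
    flod is the fractional part of the level of detail."""
    fine = bilinear(*texels[0:4], fx0, fy0)
    coarse = bilinear(*texels[4:8], fx1, fy1)
    # Blend the two bilinear results across mip levels.
    return lerp(fine, coarse, flod)
```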

You can also put these texture decoders closer to the memory, so the full 8 texels don't need to travel all the way to the shader core, just the final filtered texel. And since each texture decoder serves many threads of execution, there are chances to share resources between similar or identical texture sample operations.

And while the texture sampler is doing its thing, the shader core is free to do other work (typically running another thread of execution). It's not that the CPU can't decode textures; it's just that CPU cores without dedicated texture decoding hardware can't compete with hardware that has it.


Also border/mirror addressing modes, anisotropic filtering, and decoding compressed formats like DXT# and BC#. To complicate things, there are also 1D and 3D textures, cubemaps, and texture arrays.
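As an example of the compressed-format side, here is a sketch of decoding one BC1 (DXT1) block: 8 bytes holding two little-endian RGB565 endpoint colours followed by sixteen 2-bit palette indices for a 4x4 tile. When the first endpoint is numerically larger the block uses four opaque colours; otherwise it uses three colours plus transparent black. The bit-replication used to expand 565 to 8 bits and the integer rounding of the interpolated colours are choices made for this sketch; real decoders differ slightly in exactly how they round.

```python
def rgb565_to_rgb888(c):
    """Expand a packed RGB565 colour to 8 bits per channel."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits so 31 -> 255 and 63 -> 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block: bytes):
    """Decode one 8-byte BC1 (DXT1) block into 16 RGBA texels, row-major 4x4."""
    if len(block) != 8:
        raise ValueError("BC1 blocks are 8 bytes")
    c0 = int.from_bytes(block[0:2], "little")
    c1 = int.from_bytes(block[2:4], "little")
    indices = int.from_bytes(block[4:8], "little")
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-colour mode: two extra colours at 1/3 and 2/3 between endpoints.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), p3 + (255,)]
    else:
        # Three-colour mode: the midpoint plus transparent black.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        palette = [p0 + (255,), p1 + (255,), p2 + (255,), (0, 0, 0, 0)]
    # Texel i uses bits 2i..2i+1 of the little-endian index word.
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]
```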


And most games on PS3 ended up using the SPUs very little, or leaving them entirely to middleware, since they were too complicated for the average programmer to work with. There was a really nice DICE presentation on their deferred shading using the PS3 SPUs, though: http://www.dice.se/wp-content/uploads/2014/12/Christina_Coff...


The SPUs were basically the next evolution of the PS2's two vector units, which were often running middleware too. It was really hard to write code that ran fast (Sony were advocate of writing assembly in an excel spreadsheet), but generally the vector units were doing about the same thing for every game/developer (brute-force stuff: transforming vertices, calculating vertex lighting, generating multi-pass command streams for the rasterizer). So most developers just used Sony-provided examples, or improved versions from middleware developers. Very few developers wrote their own VU programs, or even needed to.
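For a feel of the brute-force per-vertex work that looked much the same in every game, here is a plain-Python sketch of transforming positions by a 4x4 matrix and computing simple diffuse (Lambert) vertex lighting. It only illustrates the workload, not VU microcode: it assumes row-major matrices, unit-length normals already in the same space as the light direction, and a single directional light.

```python
def transform(m, v):
    """Multiply a 4x4 row-major matrix by a column vector (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def lambert(normal, light_dir, light_colour):
    """Per-vertex diffuse lighting: colour * max(0, N . L), unit vectors assumed."""
    ndotl = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(c * ndotl for c in light_colour)

def process_vertices(mvp, positions, normals, light_dir, light_colour):
    """The same loop for every vertex: transform to clip space, then light."""
    out = []
    for p, n in zip(positions, normals):
        clip = transform(mvp, (p[0], p[1], p[2], 1.0))
        out.append((clip, lambert(n, light_dir, light_colour)))
    return out
```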

The Cell now had 7 vector units, with comparatively more memory, but there was no default job for them: all the vertex transformation now ran on the GPU's vertex shaders. And Sony initially stuck to their guns on "SPU programs should be written in assembly, in a spreadsheet".

Because the single PPU really sucked and was nowhere near fast enough for anything, Sony eventually relented and released a version of GCC that would compile C++ code for the SPUs. It was fast to develop with, but nowhere near the performance of an SPU program designed in an Excel spreadsheet.

This resulted in a whole bunch of games running code on the SPUs that was really badly optimised. But at least it reduced load on the PPU.


>Sony were advocate of writing assembly in an excel spreadsheet

What was supposed to be the advantage of this over a traditional plain text assembly language? Using Excel macros to generate code?


I'm not actually sure how they had their spreadsheets set up; they probably used a bunch of conditional formatting to highlight pipeline hazards, along with formulas to show total and wasted cycle counts.

Both architectures had exposed pipelines, meaning the result of an operation would take a few cycles to show up in the destination register, and some operations would take longer than others. You might have to insert a bunch of NOPs to make sure the data would be ready for the next instruction that needed it. Both architectures were also dual issue, meaning two completely independent operations, operating on completely independent registers, would be manually packed into a single instruction by the programmer. There were also restrictions on which types of instructions could go in each half of the instruction; if you didn't have a suitable instruction for a half, you had to put a NOP there.

I'm pretty sure Sony liked the spreadsheets because they forced the programmer to see where all the NOPs were. The programmer was expected to refactor things and manually unroll loops until all the NOPs were filled with useful instructions and peak performance was reached.
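A toy model may help show what those spreadsheet formulas were tracking. The Python sketch below packs an instruction stream, in order, into two-slot bundles, delays any instruction whose source registers are still in flight, and counts the NOP slots left over. Everything here is hypothetical: the `upper`/`lower` slot names, the latencies and the register names are made up, and the model ignores write-after-read hazards and the many other restrictions real hardware had.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    slot: str       # "upper" or "lower" half of a bundle (hypothetical split)
    dest: str
    srcs: tuple
    latency: int    # cycles until dest can be read by a later instruction

def schedule(program):
    """Greedy in-order packing into dual-issue bundles with an exposed pipeline.
    Returns (bundles, nop_slots); each bundle maps slot name -> instruction name."""
    ready = {}      # register -> first cycle its value can be read
    bundles = []
    cycle = 0       # bundle of the previous instruction; issue never goes earlier
    for ins in program:
        earliest = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        # Find the first bundle at or after `earliest` whose slot is still free.
        while True:
            while len(bundles) <= earliest:
                bundles.append({"upper": "nop", "lower": "nop"})
            if bundles[earliest][ins.slot] == "nop":
                break
            earliest += 1
        bundles[earliest][ins.slot] = ins.name
        ready[ins.dest] = earliest + ins.latency
        cycle = earliest
    nops = sum(list(b.values()).count("nop") for b in bundles)
    return bundles, nops

# Two independent chains: v2 = v0*v0 + v0 and v5 = v3*v3 + v3 (latencies invented).
MUL1 = Instr("mul1", "upper", "v1", ("v0", "v0"), 4)
ADD1 = Instr("add1", "upper", "v2", ("v1", "v0"), 4)
MUL2 = Instr("mul2", "upper", "v4", ("v3", "v3"), 4)
ADD2 = Instr("add2", "upper", "v5", ("v4", "v3"), 4)

if __name__ == "__main__":
    for label, prog in [("in order", [MUL1, ADD1, MUL2, ADD2]),
                        ("interleaved", [MUL1, MUL2, ADD1, ADD2])]:
        bundles, nops = schedule(prog)
        print(label, len(bundles), "cycles,", nops, "NOP slots")
```

With these made-up latencies, issuing the two chains in program order takes 10 cycles with 16 NOP slots, while interleaving them (the kind of manual reordering described above) takes 6 cycles with 8 NOP slots. Filling the remaining `lower` slots would need independent work of that type, such as loads.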


Maybe the spreadsheet was used to track the pipeline state which is necessary for good performance.


And it made it impossible to finish any game on schedule.


Most PS3 games I've played had terrible performance problems; I guess now I know why.


Why the downvotes? It's true. If you stray from AAA titles, you get plenty of PS3-only games that had terrible lag and struggled to even manage 30fps at reduced resolution. GUST titles like the Atelier series come to mind.


Apparently Naughty Dog used the SPUs heavily for calculating animations, if I remember correctly from watching this:

https://www.youtube.com/watch?v=4ZFtP8LbUYc (An Uncharted Tech Retrospective)


On the PS3, Sony got around the problem by working together with the likes of the Gran Turismo team and other top AAA studios to create a GPU profiler, and they also introduced their PhyreEngine.

I remember that by the time Gran Turismo launched, there was an event showing the new tooling to developers.


But the fun part is when there is a Windows/Mac/Linux graphics driver talking to it and supplying the latest DirectX and OpenGL APIs.


VirtualGL exists and works, so I don't think it would be hard to do. Sadly, hardware pricing and lack of interest leave very little chance that anyone will ever implement it.


Is it still running embedded FreeBSD?


I love how there are 3 different, mutually exclusive replies to your question so far.


It's a "normal" socket processor which happens to include graphics-like cores on-die.

You could probably install FreeBSD on it.


No. As far as I'm aware, it runs Linux.


Not sure about the downvotes. If I am following the thread correctly, this is about KNC/KNL?

From a KNC card:

Linux <hostname>-mic0 2.6.38.8+mpss3.4.3 #1 SMP Mon Feb 16 16:08:55 PST 2015 k1om GNU/Linux

From KNL:

Linux <hostname> 3.10.0-229.20.1.el6.x86_64.knl2 #2 SMP Tue Dec 8 22:27:38 MST 2015 x86_64 x86_64 x86_64 GNU/Linux


I was talking about KNC, since it's the only hardware I ever had access to, as Intel sold them at 90% off on Amazon.


KNC effectively runs BusyBox with tweaks.


Busybox is not an OS.


If Busybox is not an OS, then neither is GNU.


GNU is not an OS either. GNU/Hurd is. GNU/Linux is.


That is factually correct.


I do not disagree.

I think you should do some reading on what an operating system is.


I think you should do some reading on my comment.


All language runtimes are just like OSes when targeted to run on bare metal.

However I don't know if that is the case here.


Why the heck even use OpenCL?!

Leave an OpenCL layer there for the unwashed masses, but provide a decent API for anyone wanting exactly what the product is.


Well, the problem is that Intel started with something way worse than OpenCL when they brought their first Phi to market: OpenMP. The claim was that you could just wrap your loops in OpenMP directives and everything would be fine. Yeah, well...

If they had gone full-on OpenCL, including a unified programming model for vector and multicore parallelism (the way CUDA works), I think Phi might already have taken off much more.



