In 3D you normally represent a transform as a 4x4 matrix, which can easily be reduced to 6 floats (x/y/z translation and x/y/z rotation). The problem with offloading that little work is that you'll very quickly be bottlenecked by copying memory to/from the GPU. If you want to do any raycasts, for instance, or query for occlusion/bounds testing, you need to upload data to the GPU, run a (potentially very slow, serial) query, then copy the result back. It's normally not worth it.
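To put rough numbers on the copy cost, here's a back-of-envelope sketch in Python. The bandwidth and per-copy latency figures are illustrative assumptions (roughly PCIe 3.0 x16 class), not measurements; the point is only that fixed launch/transfer latency dominates for small payloads like a few thousand 6-float transforms.

```python
def transfer_time_us(n_bodies, floats_per_body=6,
                     bandwidth_gbs=12.0, latency_us=10.0):
    """Estimated one-way host<->GPU copy time in microseconds.

    Assumes 32-bit floats, a nominal sustained bandwidth (GB/s) and a
    fixed per-copy latency. All figures are illustrative assumptions.
    """
    bytes_total = n_bodies * floats_per_body * 4  # 4 bytes per float32
    return latency_us + bytes_total / (bandwidth_gbs * 1e9) * 1e6

for n in (100, 1_000, 100_000):
    print(n, round(transfer_time_us(n), 2))
```

For 1000 bodies the payload is only 24 KB, so the estimate is dominated by the fixed latency term rather than bandwidth, and that's before the return copy.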
The question then becomes how slow copying to/from the GPU is versus doing the computation on the CPU when you have hundreds or thousands of active bodies.
I agree that for small examples it normally wouldn't be worth it, but for large simulations (and, say, the video games of the future) I have yet to see anyone attempt this. Perhaps it truly isn't beneficial enough... but who knows.
> The question then becomes how slow copying to/from the GPU is versus doing the computation on the CPU when you have hundreds or thousands of active bodies.
The time spent inside a GPGPU kernel updating 1000 rigid bodies wouldn't even match the time it takes to copy the data back and forth. You've also got to consider the acceleration structure used for collision detection: if you have a hierarchical tree-like structure (BVH, BSP tree), how do you update it in parallel? You need to spin off thousands of tasks for it to be worth running on the GPU. And if you have that many dynamic bodies, you're probably going to be draw-call limited trying to render them, unless they're exceptionally simple objects (particles, for example, which are already GPU accelerated in modern game engines).
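To make the parallel-update problem concrete, here's a minimal sketch of a bottom-up BVH refit (1D intervals stand in for full AABBs, and all names are mine). Nodes within one depth level are independent and could go wide as a GPU kernel, but the levels must run in order, and near the root there are only a handful of nodes per level, far too few to keep a GPU busy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    left: Optional[int]        # child indices into the node array; None for leaves
    right: Optional[int]
    aabb: Tuple[float, float]  # 1D interval (min, max) stands in for a full AABB

def refit(nodes: List[Node], levels: List[List[int]]) -> None:
    # levels: node indices grouped by depth, deepest level first.
    # Within a level every refit is independent (one parallel kernel launch);
    # across levels the order is forced by the parent/child dependency.
    for level in levels:
        for i in level:  # this inner loop is the parallelisable part
            n = nodes[i]
            if n.left is not None:
                lo = min(nodes[n.left].aabb[0], nodes[n.right].aabb[0])
                hi = max(nodes[n.left].aabb[1], nodes[n.right].aabb[1])
                n.aabb = (lo, hi)

# Two leaves moved; refit the root from them.
nodes = [Node(1, 2, (0.0, 0.0)),
         Node(None, None, (0.0, 1.0)),
         Node(None, None, (2.0, 5.0))]
refit(nodes, [[1, 2], [0]])
print(nodes[0].aabb)  # (0.0, 5.0)
```

A refit also only works while the topology stays good; once bodies move far enough you need a rebuild or rotation pass, which is even harder to parallelise.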
> I agree that for small examples it normally wouldn't be worth it, but for large simulations (and, say, the video games of the future) I have yet to see anyone attempt this. Perhaps it truly isn't beneficial enough... but who knows.
In modern AAA video games, the active body count in a scene rarely exceeds a few hundred. Think of a scene from Assassin's Creed or Battlefield, and think about how many truly dynamic things there are in the level. Chances are there's you (the player), a handful of other players, a handful of explosive barrels, maybe 5-10 vehicles, and a few extras for bodies. For something like Assassin's Creed, the computation is most likely in the animation, where hundreds of physics raycasts are performed from the player's hands to the various "climbable" points, along with the updating of a very detailed skeletal mesh. Modern games already utilise almost 100% of GPU time on rendering: lighting, AA, shadows, ambient occlusion, transparency, reflections. Adding more stress to the GPU is really unnecessary, considering that the computations for a few hundred bodies are so short that the bottleneck will be copying the data over and back.
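Those climbing queries are also cheap individually: a single ray/AABB test is a few dozen floating-point operations, which is part of why batching a few hundred of them on the CPU beats a GPU round trip. A minimal sketch of the standard slab test (function and parameter names are mine):

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: intersect the ray's parameter range with each axis slab."""
    t_min, t_max = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            if o < lo or o > hi:  # ray parallel to this slab and outside it
                return False
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            if t1 > t2:
                t1, t2 = t2, t1
            t_min, t_max = max(t_min, t1), min(t_max, t2)
            if t_min > t_max:     # slab intervals no longer overlap: miss
                return False
    return True

print(ray_hits_aabb((0, 0, 0), (1, 0, 0), (2, -1, -1), (3, 1, 1)))  # True
print(ray_hits_aabb((0, 0, 0), (0, 1, 0), (2, -1, -1), (3, 1, 1)))  # False
```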
I wrote my master's thesis on GPGPU-accelerated bounding volume hierarchies (a common structure used in raytracing and in collision detection). The overhead of a small copy stalling the GPU is quite severe. It's worth it if you can do all your work on the GPU without having to copy back, but that's currently not feasible for interactive simulations.
Anything that uses NVIDIA PhysX will have GPU physics in some sense, but very few (if any) of those titles actually use GPU acceleration for their rigid body simulations.