GPU ray tracing tutorial – 10 articles (ompf2.com)
257 points by henkie_bk5 on June 15, 2022 | 49 comments


Back when I took the author's course on computer graphics in Utrecht (it took me a few tries to pass, through no fault of his), I thought it was very strange that the course started out with ray tracing rather than traditional GPU rendering. After all, when you think graphics, you think OpenGL/Vulkan/DirectX, right?

Only after having to implement both types of renderer do you really appreciate how elegant ray tracing is in comparison. The basic ray tracer from this tutorial clocks in at under 200 lines of C++, excluding the headers! Then there are optimisations like BVHs/BLAS/TLAS, which are all so simple to reason about compared to the inner workings of a GPU rendering pipeline.
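To illustrate how little there is to the idea, here's a rough sketch of a BVH in Python-style pseudocode (my own sketch, not code from the tutorial; box.hit, p.intersect, and hit.t are hypothetical helpers): a node is just a bounding box plus either two children or a few primitives, and traversal is a short recursion.

    class BVHNode:
        def __init__(self, box, left=None, right=None, prims=None):
            self.box = box          # axis-aligned box enclosing everything below
            self.left, self.right = left, right
            self.prims = prims      # primitives, present only in leaf nodes

    def intersect_bvh(node, ray):
        if not node.box.hit(ray):   # ray misses the whole subtree: prune it
            return None
        if node.prims is not None:  # leaf: test the handful of primitives
            hits = [p.intersect(ray) for p in node.prims]
        else:                       # inner node: recurse into both children
            hits = [intersect_bvh(node.left, ray), intersect_bvh(node.right, ray)]
        hits = [h for h in hits if h is not None]
        return min(hits, key=lambda h: h.t) if hits else None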

I should find the time to go through this guide again and find out how I can get more performance out of my old ray tracer now that I've grown a few years older and wiser.

This tutorial is more about optimizing a ray tracer than writing one from scratch. If you're looking to learn the basics, I recommend reading through the tutorial the same author wrote eighteen years ago [1]. It covers the more basic concepts of a ray tracer without telling you exactly what to copy-paste (unless you "cheat" and download the code archive), which in my opinion is a great way of teaching concepts to programmers, as it gives you the opportunity to think for yourself.

With modern C++ you'd probably want to write your code a bit differently (VC++ 6 wasn't the best C++ compiler even in its day), and the hardware of that era is dwarfed by even your average integrated GPU, but the core concepts haven't changed.

[1]: https://www.flipcode.com/archives/Raytracing_Topics_Techniqu... (I needed to fix the encoding for the page in Firefox to get the math to show up right)


Years ago I read the flipcode article you linked and wrote a raytracer as an introductory exercise in 3D graphics. I even ended up reaching out to Jacco afterwards via LinkedIn to thank him for making the tutorial; to my surprise, he replied.

It does seem like an excellent way to get introduced to 3D graphics. Fully-featured triangle rasterizers are a huge pile of complexity, especially if you go hardware-accelerated and also take on the burden of navigating platform-specific GPU APIs.

There's something to be said for just writing boring CPU-based code in your favorite language and getting to focus on the actual subject being explored.


Over decades I've written a scanline renderer, a ray tracer, a radiosity renderer, and a DirectX game engine.

Both ray tracers and the related ray-based radiosity renderers are so simple that I wrote one of the latter for fun, in a mandatory IT training class, using JavaScript in a browser, because the locked-down training PCs had no dev tools. I literally used Windows Notepad and got a nice-looking Cornell box within a few hours.

However, what bugs me on a visceral level is how "against the grain" the performance of random sampling is. It's mathematically beautiful, a simple application of Monte Carlo integration, but from a computer engineering perspective it is hideous. Random unbiased sampling produces "reference grade" images, but only through tremendous brute force to suppress noise from things like caustics or textured reflective materials. Biased sampling always introduces complex, unpredictable errors into the final render, which makes me feel "itchy", for want of a better word.
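For the uninitiated, the Monte Carlo estimate for a pixel really is this simple; the catch is that the standard error falls only as 1/sqrt(n), so halving the noise costs 4x the samples. A minimal sketch (camera_ray, trace_path, and the zero color black are hypothetical stand-ins for the renderer's internals):

    import random

    def estimate_pixel(x, y, n_samples=256):
        # Unbiased estimate: average many randomly jittered path samples.
        # Variance falls as 1/n, so visible noise falls only as 1/sqrt(n).
        total = black
        for _ in range(n_samples):
            # jitter within the pixel so we integrate over its whole area
            ray = camera_ray(x + random.random(), y + random.random())
            total += trace_path(ray)  # radiance carried by one random path
        return total / n_samples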

Then, to add insult to injury, random memory access patterns are murder on every modern CPU and GPU cache hierarchy. You get the worst-case performance of uncached RAM reads for every sample. Working around this takes heroic effort: complex acceleration data structures, "sorting & bundling" of rays, and more. To this day it is effectively an unsolved problem, although NVIDIA has made some really impressive progress lately.
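As a rough illustration of the "sorting & bundling" idea (a standard trick, not NVIDIA's specific approach): quantize each ray and sort by a Morton code, so rays likely to touch the same BVH nodes end up adjacent in memory and share cache lines.

    def part1by2(n):
        # spread the low 10 bits of n out to every third bit position
        n &= 0x3FF
        n = (n ^ (n << 16)) & 0xFF0000FF
        n = (n ^ (n << 8))  & 0x0300F00F
        n = (n ^ (n << 4))  & 0x030C30C3
        n = (n ^ (n << 2))  & 0x09249249
        return n

    def morton3(x, y, z):  # interleave three 10-bit coordinates
        return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)

    def bundle_rays(rays):
        # quantize each unit direction from [-1, 1] onto a 10-bit grid and
        # sort; coherent rays now traverse the acceleration structure together
        key = lambda r: morton3(*(int((c + 1) * 511.5) for c in r.direction))
        return sorted(rays, key=key)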

E.g.: https://www.youtube.com/watch?v=MUDveGZIRaM

Some examples of how there's still a lot of fundamental research being done in the 2020s for rendering caustics efficiently: https://www.youtube.com/watch?v=2qqDwaZlkE0


Lotta Dutch guys in the comments :)

I wrote something similar about using ray tracing for learning fairly recently, here: https://news.ycombinator.com/item?id=30715009


I've written several raytracers and rasterizers that are smaller than 200 lines of C++, though quite likely they're worse pedagogically than Jacco's (slashdotted, but available at https://web.archive.org/web/20220615174927/https://jacco.omp...) tutorial, and they also don't illustrate useful optimizations. Hopefully, what mine lack in cluefulness and performance they make up for in breadth, diversity, and brevity: they are written in C, C++, Python, JS, Lua, and Clojure, with output to JPEG files, PPM files, X11, the Linux framebuffer, ASCII art, Unicode Braille art, and the browser <canvas>.

· http://canonical.org/~kragen/sw/aspmisc/my-very-first-raytra... 184 lines of C, including vector arithmetic, input parsing, and PPM output. I'm not sure what you mean by "excluding the headers" — this one doesn't have any headers of its own (why would a 200-line program have headers of its own? Are you on a Commodore 64 such that the compilation time for 200 lines of code is so high that you need separate compilation?) but it #includes math.h, stdio.h, stdlib.h, and string.h, which total almost 1800 lines of code on my machine and presumably 15× that by the time you count their transitive includes.

· http://canonical.org/~kragen/sw/dev3/circle.clj 39 lines of Clojure, including the model, which is a single sphere; it uses java.awt.image for JPEG output. About half of the code is implementing basic vector math by hand. A minified version is under 1K: http://canonical.org/~kragen/sw/dev3/raytracer1k.clj

· https://gitlab.com/kragen/bubbleos/blob/master/yeso/sdf.lua 51 lines of Lua for an SDF raymarcher including animation, the model itself, and live graphical output. SDFs are cool because it's often easier to write an SDF for some shape than to write code to evaluate the intersection of an arbitrary ray with it. This one runs either in X-Windows, on the Linux framebuffer, or in an unfinished windowing system I wrote called Wercam.

I feel like basic raytracing is a little simpler than basic rasterizing, but I don't think the difference is hugely dramatic:

· http://canonical.org/~kragen/sw/torus is a basic rasterizer in 261 lines of JS, which is larger than the three raytracers I mentioned above, but about 60% of that is 3-D modeling rather than rendering, and another 5% or so is DOM manipulation. On the other hand, one of the great things about raytracing is that if you want to raytrace a sphere or torus or metaballs or whatever, you don't need to reduce them to a huge pile of triangles; you can just write code to evaluate their surface normals and intersect a ray with them, and you're done.

· http://canonical.org/~kragen/sw/netbook-misc-devel/rotcube.p... The smallest I've been able to get a basic rasterizer down to, 15 lines of Python, just rotating a point cloud, without polygons. You might argue that rotating a point cloud is stupid because it doesn't look very 3-D, but Andy Sloane's donut.c does okay by just having a lot of points and applying Lambertian shading to the points in the point cloud: https://www.a1k0n.net/2011/07/20/donut-math.html. If your point cloud is generated by intersecting a field of rays with some object, its density variation will approximate the Lambertian brightness of the object as illuminated by that ray field.

· http://canonical.org/~kragen/sw/dev3/rotcube.cpp in C++ rotating an ASCII-art pointcloud is 41 lines; and

· http://canonical.org/~kragen/sw/dev3/braillecube.py with wireframes in Braille Unicode art it's 24 lines of Python, but that's sort of cheating because it imports a Braille Unicode art library I wrote that's another 64 lines of Python. Recording at https://asciinema.org/a/390271.

So I think that the core of either a (polygon!) rasterizer or a raytracer, without optimizations, is only about 20 lines of code if your ecosystem provides you with the stuff around the edges: graphical display (or image file output), model input, linear algebra, color arithmetic. If you have to implement one or more of those four things yourself, it's likely to be as big as the core rasterizer or raytracer code.

For a polygon rasterizer, it's something like:

    tpoints = [camera_transform @ point for point in points]
    framebuffer.fill(background)
    painter = lambda poly: min(tpoints[i].z for i in poly.v)
    for poly in sorted(polys, key=painter, reverse=True):  # back to front
        normal = tpoints[poly.normal]
        if normal.z > 0:  # backface removal, technically an optimization
            continue
        p2d = [(p.x / p.z, p.y / p.z) for p in [tpoints[i] for i in poly.v]]
        lambert = normal.dot(light_direction)
        color = min(white, max(black, lambert * light_color + ambient))
        framebuffer.fill_poly(p2d, color)
While a Whitted-style raytracer is more like this:

    for yy in range(framebuffer.height):
        for xx in range(framebuffer.width):
            ray = vec3(xx, yy, 1).normalize()
            hits = [(o, o.intersect(ray)) for o in objects]
            hits = [(o, p) for o, p in hits if p is not None]
            if hits:
                o, p = min(hits, key=lambda t: t[1].z)  # nearest hit wins
                framebuffer[xx, yy] = o.shade(p)
            else:
                framebuffer[xx, yy] = background
But this presumes you've previously transformed the objects into camera space; it leaves .intersect and .shade to be defined (potentially separately for each object), and it doesn't do the neat recursive ray-tracing thing that gives you those awesome reflections. For a sphere, intersection is about 7 lines of code evaluating the quadratic formula (which you can cut to 3 if you have a quadratic-equation solver in your library), and basic Lambertian shading is about the same as in the rasterizer; your surface normal is (p - sphere.center).normalize().
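For concreteness, here's a sketch of that sphere case in the same pseudo-Python (assuming the eye at the origin and ray as a unit direction, with vec3, white, black, light_direction, light_color, and ambient as above):

    import math

    class Sphere:
        def intersect(self, ray):
            # |t*ray - center|^2 = r^2 with |ray| = 1 reduces to
            # t^2 - 2*b*t + (|center|^2 - r^2) = 0, where b = ray . center
            b = ray.dot(self.center)
            disc = b * b - self.center.dot(self.center) + self.radius ** 2
            if disc < 0:
                return None               # ray misses the sphere entirely
            t = b - math.sqrt(disc)       # nearer of the two roots
            return ray * t if t > 0 else None

        def shade(self, p):               # basic Lambertian shading
            normal = (p - self.center).normalize()
            return min(white, max(black, normal.dot(light_direction) * light_color + ambient))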

The core of my Lua SDF raymarcher I linked above is simpler than that. Here I'm using the iteration count as part of the shading function to fake ambient occlusion, which is pretty bogus because it depends on where the camera is in a totally non-physically-based way, but it looks pretty 3-D.

    local function torus(p, c, r1, r2)
       -- signed distance from p to a torus centered at c, with its axis
       -- along y: major radius r1 in the xz-plane, tube radius r2
       return length2(length2(p[1]-c[1], p[3]-c[3]) - r1, p[2]-c[2]) - r2
    end

    local function render_pixel(x, y, palette)
       local p, n = {x,y,1}         -- p starts on the near plane z=1; n counts steps
       local q = normalize(p)       -- ray direction

       for i = 0, 255 do
          n = i
          local r = scene_signed_distance_function(p)
          p = add(p, mul(r, q))

          if p[3] > 10 then return palette(0) end  -- far clipping plane
          if r < 0.02 then break end
       end

       return palette(max(0, min(255, 48 - n - math.floor(p[1]*-16+p[2]*32))))
    end
I know what BVHs are, even though I've never implemented them. I'm so clueless that I didn't know what BLAS and TLAS are, but Jacco explains them in part 6 of his series: https://web.archive.org/web/20220605013040/https://jacco.omp....


> I feel like basic raytracing is a little simpler than basic rasterizing, but I don't think the difference is hugely dramatic

It certainly gets hugely dramatic once you include shadows & reflections.


> It certainly gets hugely dramatic once you include shadows & reflections.

It's remarkable just how 'hacky' high-quality rasterised graphics is.

For shadows, render the scene from the PoV of every light source to create shadow maps, then transform your camera-space points into light space to test them against those maps.

For reflections, use stencil buffers; for global illumination, use radiosity maps; for ambient occlusion, either bake it in or take a big runtime penalty and render it in real time...
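Roughly, the shadow-map test looks like this (a pseudo-Python sketch; light.view_projection, to_texture_coords, and the bias constant are hypothetical):

    def lit_by(light, p_world):
        # project the point into the light's view, as if the light were a camera
        p_light = light.view_projection @ p_world
        u, v = to_texture_coords(p_light)
        occluder_depth = light.shadow_map[u, v]   # nearest surface the light sees
        # in shadow if something sits between us and the light; the small
        # bias works around "shadow acne" from depth quantization
        return p_light.z <= occluder_depth + bias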

Ray-tracing (and derivatives, like (bidirectional) path tracing, light transport, etc.) should really be called simulation; the physics and mathematics behind it are straightforward yet extremely accurate. Even the simplest Whitted ray-tracers can produce fairly photorealistic renderings with simplified geometry, and extending ray-tracers to include very complex effects (subsurface scattering, PBR, even general relativistic ray-tracing) is comparatively straightforward.

The only problem is the absolute battering that ray-tracing inflicts on traditional hardware.


Yes, that's true, there are a lot of things that are easier with ray-tracing; I'd add refractions, laser-sparkle interference patterns, and volumetric ray-marching to the list.

On the other hand, if you want to draw hidden lines (as for a mechanical drawing), draw lines at edges between facets (wireframishly) or to outline surface curvature, or add null halos around foreground elements, I think those are easier to do with a polygon or NURBS rasterizer.


"A Ray-Tracer in 7 Lines of K": https://www.nsl.com/k/ray/ray.k


Here's one in 9 lines of C: :)

https://pastebin.com/c7n66AVm


This is gorgeous! You packed in not just a texture but even a scene description! And reflections! However, I have a couple of quibbles:

1. It is not C.

2. It is not 9 lines.

To elaborate on the first point, it's C++, using the functional cast syntax int(C*N), C++ includes, and a non-constant static initializer, none of which are legal in C.

To elaborate on the second point, it's not "9 lines" of C++ in the sense that My Very First Raytracer is "184 lines of C"; it's only 9 lines in the sense that it has two lines of #includes, and then you've chosen to insert six newlines into it at essentially arbitrary locations! Conventionally formatted, it's 45 lines of C++, which seems in keeping with my estimate that the basic raytracing algorithm is about 20 lines of code if you have linear algebra, while having to implement linear algebra adds another bit of code that's slightly larger than 20 lines.

I'm not sure how to define logical lines of code for K, but it seems relevant that one of those 7 lines defines two nested functions.

Here's the reformatted version of your Tinytrace, which I look forward to studying in more detail:

    #include <cmath>  // sqrt
    #include <cstdio>  // fopen
    int i, p = 0, h[] = { 3 << 16, 8 << 24, 0, 41944064, 8 };

    FILE *f = fopen ("o", "wb");
    int main() {
      fwrite (h, 2, 9, f);  /* these 18 bytes are the TARGA header */

      for (; p < 9 << 17;) {
        float x = 0,
          T = .2,
          Y = p % 1024 / 430. - 1,
          R = p++ / 327680. - 1,
          E = 3,
          C = 1 / sqrt (Y * Y + R * R + 1),
          I, t, N, A;
        Y *= C;
        I = 1 | -(Y < 0);
        R *= C;

      a:
        t = x - I;
        A = T - I;
        N = Y * t + R * A - C * E;
        A = N * N - t * t - A * A - E * E + 1;
        N += sqrt (A);
        if (N < 0) {
          E += N * C;
          x -= N * Y;
          T -= N * R;
          t = x - I;
          A = T - I;
          N = 2 * (Y * t + R * A - C * E);
          I = -I;
          Y -= N * t;
          R -= N * A;
          C += N * E;
          goto a;
        }

        fputc (i, f);

        if (R < 0) {
          N = (3 - T) / R;
          i = Y * N;
          R *= .4 - (((i + int(C*N)) & 1) + .6);
        }

        i = R > 1 ? 255 : R * 255;
      }
    } // "TINY TRACE" edition - JB'22 (but reformatted)
For anyone else who wants to run it, you will probably want to rename the output file "o" to "o.tga", because it's a TARGA-style uncompressed image.


From the example http://canonical.org/~kragen/sw/netbook-misc-devel/rotcube.p... how does this list comprehension work?

    cube = [(x, y, z) for x in -1, 1 for y in -1, 1 for z in -1, 1]

My interpreter can't parse it, and I don't really understand how it's supposed to work :)

Thanks!


Sorry, I wrote that 10 years ago. I guess Python 3 requires you to say "for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)", while Python 2 parsed those as tuples even without the parens. The parens make the code clearer anyway.


It works on Python 3.8 with

    cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]

I had the same question. Maybe it's a Python 2 construct?


Thanks, this is working. Now I also get the math.


Looks like a great resource.

I was about to ask how you fixed the page encoding, but figured it out for Safari: just click View > Text Encoding > ISO Latin 1.


You mean it looks funny in Safari without that setting? I am learning a ton about WordPress caching plugins after last night's HN deluge... Perhaps I can do a tutorial on that next. ;)


Because the page doesn't specify a character encoding (i.e. with a <meta charset="windows-1252"> tag), the browser defaults to UTF-8, so any non-ASCII character shows up as a big fat �.


When I did the course in Utrecht they did actually switch to the shader approach.


That must have been very recently then? IIRC, until two years ago there was ray tracing, then rasterization, in that order. As it should be. :)

Anyway if you're in Utrecht there's a nice 'Advanced Graphics' course. Best time of my life, once a year.


I got rejected by the server, so here's an archive:

https://web.archive.org/web/20220615174927/https://jacco.omp...


Looks like the original is back. Thanks for posting a link to the backup!


This feels like an opportunity to link my own, much sillier, raytracer: https://github.com/chunky/sqlraytracer


Very nice. :) Every language deserves its own ray tracer!


This is super awesome.


Thank you!


Fantastic. I love examples of SQL used as a general-purpose language.


"Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy."

https://en.m.wikipedia.org/wiki/Turing_tarpit


Based on the name of the author, Jacco Bikker, I can blindly recommend this. He has decades of experience and knows how to combine modern computer graphics algorithms from the academic world with real-world low-level optimizations and a dash of magic to create state-of-the-art ray tracers.

Disclaimer: my master's thesis work was part of his PhD dissertation. As a teenager I read his articles on flipCode, and by sheer coincidence he ended up as my thesis supervisor.


Hi Roel. :) Was a pleasure to work with you!


The last article of the series, on BVH construction, ray tracing, and GPU ray tracing (using OpenCL), is now complete: the 10 articles cover the basics as well as advanced GPGPU topics.


I get an HTTP 403 Forbidden response from the linked URL, FYI. (USA)


Server is swamped... Hope to have things back online soon.



Same here. I am from India.


This is awesome! I’ve been wanting to see what it would look like in OpenCL.

Side note: I hope we graduate away from the terms BLAS & TLAS soon. They make the most sense for a strictly 2-level hierarchy, but the power of ray tracing comes from multi-level hierarchies, where "top" and "bottom" are ambiguous. This is why OptiX uses IAS (Instance AS) and GAS (Geometry AS) instead.


Well, if you have more than 2 levels you can always collapse, which is the preferred solution: just walk the scene graph and flatten the matrices. Now you're left with an array of BLASes, each with its final matrix.
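In pseudo-Python, the collapse is roughly this (a sketch; node.local_matrix, node.blas, and node.children are hypothetical field names):

    def flatten(node, parent_matrix, out):
        m = parent_matrix @ node.local_matrix   # accumulate transforms down the tree
        if node.blas is not None:               # geometry attached at this node
            out.append((node.blas, m))          # a BLAS with its final world matrix
        for child in node.children:
            flatten(child, m, out)
        return out                              # the flat instance list for the TLAS

    instances = flatten(scene_root, identity_matrix, [])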

RTX has hardware support for this flattened structure only; OptiX will become significantly slower if you force it to have more levels.


Yes, very true. Or at least you can usually collapse. Memory is the issue: a collapsed multi-level hierarchy doesn't always fit.

There is some cost in the current hardware to multi-level traversal, that's true. But still, top & bottom terminology isn't future-proof, doesn't translate to the CPU, and we might not be limited to 2-level GPU traversal forever. More to the point, perhaps, is that "top" and "bottom" aren't words that describe the function. It'd be better to have terms that say what a structure does rather than where to find it, right?

Full disclosure, I work on OptiX. (But to be clear I literally have no idea if/when we might see multilevel traversal in hardware. I just happen to be in favor of seeing it someday, and if/when it does, TLAS and BLAS will become more awkward or get replaced.)


I find their C++ style particularly appalling.


It really is hideous. I think they skipped newlines to keep line count as low as possible. /Eyeroll


It's just ANSI style without extra newlines; a matter of taste, I suppose.


The file has disappeared :(


Fantastic, thanks!


Link broken.


403 from Europe. Cool GDPR policy.


403 from everywhere, probably got slashdotted. Here's the archive link: https://web.archive.org/web/20220615204139/https://jacco.omp...


Yes, the site got slashdotted... I reached out to the ISP; it should be back online soon.


It's also 403 from NA, don't think it has anything to do with GDPR. This comment has an archived link: https://news.ycombinator.com/item?id=31759691


403 from Asia (Vietnam) too.



