Agreed. The fact that a pixel is an infinitely small point sample - and not a square with area - is something that Monty explained in his demo too: https://youtu.be/cIQ9IXSUzuM?t=484
A pixel is not the sample(s) its value came from. Given a pixel (of image data) you don't know what samples are behind it. It could have been point-sampled with some optical sensor far smaller than the pixel (but not infinitely small, obviously). Or it could have been sampled with a Gaussian bell-shaped filter a bit wider than the pixel.
A 100x100 thumbnail that was reduced from a 1000x1000 image might have pixels which are derived from 100 samples of the original image (e.g. a simple average of a 10x10 pixel block). Or other possibilities.
As an abstraction, a pixel definitely doesn't represent a point sample, let alone an infinitely small one. (There could be some special context in which it does but not as a generality.)
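To make that concrete, here's a minimal numpy sketch (the random scene array and helper names are just illustrative stand-ins) of three different sampling processes, each producing a single pixel value that is indistinguishable after the fact:

```python
import numpy as np

# A high-res image standing in for the continuous scene behind a pixel.
scene = np.random.rand(1000, 1000)

def point_sample(img, y, x):
    # One value read at a single location: a (near-)point sample.
    return img[y, x]

def gaussian_sample(img, y, x, sigma=6.0, radius=15):
    # Weighted average under a Gaussian bell a bit wider than the pixel.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w = np.exp(-(ys**2 + xs**2) / (2 * sigma**2))
    patch = img[y - radius:y + radius + 1, x - radius:x + radius + 1]
    return float(np.sum(w * patch) / np.sum(w))

def block_mean(img, y, x, size=10):
    # Plain average of a size x size block, as in a naive thumbnail.
    return float(img[y:y + size, x:x + size].mean())

# Three different processes, three single numbers. Given only the
# resulting pixel value, you can't tell which process produced it.
print(point_sample(scene, 500, 500),
      gaussian_sample(scene, 500, 500),
      block_mean(scene, 500, 500))
```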
> A 100x100 thumbnail that was reduced from a 1000x1000 image might have pixels which are derived from 100 samples of the original image (e.g. a simple average of a 10x10 pixel block). Or other possibilities.
And if a downsampling algorithm tries to approximate a point sample, it'll give you a massively increased chance of ugly moire patterns.
The audio equivalent is that if you drop 3/4 of your samples, the higher frequencies alias down into the lower ones and hurt the quality. You need to apply a low-pass filter first. And "point samples from a source where no frequencies exist above X, where you also need to change X before doing certain operations" is very different from, and significantly more complicated than, "point samples". Point samples are one leaky abstraction among many leaky abstractions, not the truth. Especially when an image has a hard edge, whose spectrum extends to arbitrarily high frequencies.
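A quick way to see the image-side version of this: downsample a fine grating by keeping every 10th pixel versus averaging 10x10 blocks first. This is just a toy numpy sketch; the grating frequency and downscale factor are arbitrary choices:

```python
import numpy as np

# A pattern with frequency content above the Nyquist rate of the
# downsampled grid: a fine diagonal sine grating.
n = 1000
yy, xx = np.mgrid[0:n, 0:n]
pattern = np.sin(2 * np.pi * (xx + yy) * 0.23)  # ~0.23 cycles/pixel

factor = 10

# "Point sample" downscale: keep every 10th pixel, no pre-filter.
# Frequencies above the new Nyquist limit fold down as a bogus
# low-frequency (moire) pattern.
aliased = pattern[::factor, ::factor]

# Crude low-pass then sample: average each 10x10 block first.
filtered = pattern.reshape(n // factor, factor,
                           n // factor, factor).mean(axis=(1, 3))

# The aliased version still swings over nearly the full +-1 range;
# the pre-filtered one is close to flat.
print(aliased.std(), filtered.std())
```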
But from the pixels alone, you don't know whether the moire is an artifact of sampling of something that was free of moire, or whether an existing image of moire was sampled and reproduced.
Eh, calling it infinitely small is at least as misleading as calling it a square. While they are both mostly correct, neither Monty’s explanation nor Alvy Ray’s is all that good. Pixels are samples taken at a specific point, but pixel values do represent area one way or another. Often they are not squares, but on the other hand LCD pixels are pretty square-ish. Camera pixels are integrals over the sensor area, which captures an integral over a solid angle. Pixels don’t have a standard shape; it depends on what capture or display device we’re talking about, but no physical capture or display device has infinitely small elements.
Camera pixels represent an area, but pixels coming out of a 3D game engine usually represent a point sample. Hand-drawn 2D pixel art explicitly treats pixels as squares. All of these are valid uses that must coexist on the same computer.
> Camera pixels represent area, but pixels coming out of a 3D game engine usually represent a point sample
Having worked on both game engines and CG films, I think that’s misleading. Point sample is kind of an overloaded term in practice, but I still don’t think your statement is accurate. Many games, especially modern ones, are modeling pixels with area, explicitly integrating over regions for both visibility and shading calculations. In fact I would say games are generally treating pixels as squares, not point samples. That’s what DirectX and Vulkan and OpenGL do. That’s what people typically do with ray tracing APIs in games as well. Even a point sample can still have an associated area, and games always display pixels that have area. The fact that you can’t display a point sample without using area should be reason enough to avoid describing pixels that way.
Conservative rasterization is definitely an area-based square pixel technique. I'm not sure how prevalent it is now that it's been available for a few years. Multisample antialiasing patterns are also constructed with square pixels in mind, but it's relatively rare in games these days due in part to the popularity of deferred rendering.
I'm not sure how raytracing as done by video games can be construed as more area-based than point-sample based. It seems to me like ray-hit testing is done in a point-sample manner, and games aren't doing multiple rays per pixel. (They're usually doing fewer than one ray per pixel, averaging over several frames that sampled different locations within a square pixel, then running it all through a blurring denoiser filter and upscaling; but the result is so bad for anything other than a static scene that I don't think it should be used to support any argument.)
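For concreteness, the accumulation scheme I'm describing is roughly this (a toy numpy sketch, not any engine's actual code; shade and all the constants are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def shade(y, x):
    # Made-up stand-in for tracing one ray through a sub-pixel position.
    return 0.5 + 0.5 * np.sin(40 * x) * np.cos(40 * y)

res = 64
accum = np.zeros((res, res))
alpha = 0.1  # exponential blend weight into the history buffer

for frame in range(32):
    # One sample per pixel per frame, jittered inside the 1x1 square
    # pixel footprint, blended into a running average.
    jy, jx = rng.random(2)
    ys, xs = np.mgrid[0:res, 0:res]
    sample = shade((ys + jy) / res, (xs + jx) / res)
    accum = (1 - alpha) * accum + alpha * sample

# For a perfectly static scene this drifts toward a box-filtered
# integral over each pixel's square footprint; any motion invalidates
# the accumulated history, which is where the artifacts come from.
```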
> I’m not sure how ray tracing done by video games can be construed as more area-based than point-sample based. It seems to me like ray-hit testing is done in a point-sample manner, and games aren’t doing multiple rays per pixel.
Ignoring the denoiser, games quite commonly use a box filter for pixels. That’s the square in square pixels. The point sampling is in service of an integral whose shape is a square, and that’s the problem Alvy Ray is talking about.
That point sampling is distinctly different from the “point sample” that represents the pixel value itself, so let’s not conflate them. The averaging over an area that is square shaped, whether it’s multiple samples or over time, is the reason that the pixel shape is summarized as square, not as a point.
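As a toy sketch of the distinction (numpy; shade is a made-up stand-in for evaluating the scene): the individual samples below are points, but the equal-weight average over the pixel's 1x1 footprint is a box filter, and that box is the square.

```python
import numpy as np

rng = np.random.default_rng(1)

def shade(y, x):
    # Made-up stand-in for evaluating the scene at one sub-pixel point.
    return (np.floor(6 * x) + np.floor(6 * y)) % 2  # a fine checkerboard

def render_pixel(py, px, spp=64):
    # Each sample is a point; the equal-weight average over the pixel's
    # 1x1 square footprint is a box filter. The samples are points, but
    # the integral they estimate has a square shape.
    total = 0.0
    for _ in range(spp):
        jy, jx = rng.random(2)
        total += shade(py + jy, px + jx)
    return total / spp

print(render_pixel(2, 3))  # ~0.5: the checkerboard averaged over a square
```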