The emacs-gif-screencast package does this very well. Each keystroke (technically, each command) is recorded as a frame, and timestamps set the duration of each frame. For example, if you stare at the screen for 30 seconds, no additional frames are captured, and that one frame lasts 30 seconds (you can edit its duration later in other software).
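To make the timestamp idea concrete, here is a minimal Python sketch of how per-frame durations can be derived from capture times. It is not the package's actual code (which is Emacs Lisp); the function name `frame_delays` and its parameters are made up for illustration. It assumes one timestamp per frame, in seconds, and returns delays in hundredths of a second, the unit GIF uses.

```python
def frame_delays(timestamps, end_time=None, min_delay_cs=1):
    """Return one delay per frame, in centiseconds (hypothetical helper).

    timestamps: capture times in seconds, ascending, one per frame.
    end_time:   when recording stopped; if None, the last frame
                simply gets min_delay_cs.
    """
    delays = []
    for i, start in enumerate(timestamps):
        if i + 1 < len(timestamps):
            stop = timestamps[i + 1]   # frame lasts until the next command
        elif end_time is not None:
            stop = end_time            # last frame lasts until recording ends
        else:
            stop = start               # nothing to measure against
        delays.append(max(min_delay_cs, round((stop - start) * 100)))
    return delays


# Three commands: at 0 s, 0.5 s, then one after a 30-second pause.
print(frame_delays([0.0, 0.5, 30.5], end_time=31.0))
```

The idle 30 seconds costs no extra frames; it only shows up as a larger delay on the second frame.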
Note that I don't know how well it works for long sessions. In theory it should be fine, but the way it uses ImageMagick to convert the frames into an animation can be heavy on memory, which could be a problem. However, the frames could be stitched together with other software that doesn't have that problem.
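As one possible sketch of the "other software" route, the snippet below writes a frame list for ffmpeg's concat demuxer, which accepts `file` and `duration` directives. ffmpeg is only my assumption for illustration, and I haven't measured whether it actually uses less memory on long recordings. The helper name `ffconcat_lines` and the frame file names are hypothetical.

```python
def ffconcat_lines(frames):
    """Build the text of an ffmpeg concat list (hypothetical helper).

    frames: list of (path, duration_seconds) pairs, in playback order.
    Paths containing single quotes are not escaped in this sketch.
    """
    lines = ["ffconcat version 1.0"]
    for path, seconds in frames:
        lines.append(f"file '{path}'")
        lines.append(f"duration {seconds:.2f}")
    return "\n".join(lines) + "\n"


# Example frame names are placeholders, not what the package produces.
text = ffconcat_lines([("frame-0.png", 0.5), ("frame-1.png", 30)])
with open("frames.txt", "w") as f:
    f.write(text)
```

The resulting list could then be tried with something like `ffmpeg -f concat -safe 0 -i frames.txt out.gif`. The duration of the final entry may not be honored, so check the end of the output.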