Would love to hear some metrics on training it on a personal computer rather than a "cloud GPU box". I don't care if it takes 3 months to train if I end up with something good, offline, and free(ish, just paying the electric bill).
Each H100 can do about 60 TFLOPS of FP32, while a single RTX 3080 can do roughly half that (just under 30). So the complete back-of-the-envelope answer is 16x as long: 2x slower per GPU, times the 8 GPUs nanochat is targeting (four hours on 8xH100).
64 hours isn’t too bad at all!
(An RTX 2080 can only do about 10 TFLOPS of FP32, so that would be roughly 3x as long again, around 190 hours.)
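The estimates above can be sketched as a tiny scaling helper. This is a minimal sketch assuming linear scaling with aggregate FP32 throughput; the TFLOPS figures are the rough spec-sheet numbers quoted in this thread, not measured training throughput, and `scaled_hours` is just an illustrative name.

```python
def scaled_hours(base_hours, base_gpus, base_tflops, my_gpus, my_tflops):
    """Scale a known training time by the ratio of total FP32 throughput.

    Assumes perfect linear scaling, which ignores memory limits,
    batch-size effects, and interconnect overhead.
    """
    base_total = base_gpus * base_tflops
    my_total = my_gpus * my_tflops
    return base_hours * base_total / my_total

# nanochat baseline: ~4 hours on 8x H100 (~60 FP32 TFLOPS each)
print(scaled_hours(4, 8, 60, 1, 30))  # single RTX 3080 -> 64.0 hours
print(scaled_hours(4, 8, 60, 1, 10))  # single RTX 2080 -> 192.0 hours
```

In practice the real limiter on a consumer card is often VRAM rather than FLOPS, so treat these numbers as a lower bound.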