Submissions from gilesthomas.com

		Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com)
		3 points by ibobev 13 days ago
		Writing an LLM from scratch, part 32B – Interventions: gradient clipping (gilesthomas.com)
		1 point by ibobev 13 days ago
		Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com)
		1 point by ibobev 13 days ago
		Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com)
		1 point by ibobev 13 days ago
		Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com)
		6 points by gpjt 15 days ago
		Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com)
		1 point by gpjt 16 days ago
		Writing an LLM from scratch, part 32B – Interventions: gradient clipping (gilesthomas.com)
		2 points by gpjt 17 days ago
		Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com)
		1 point by gpjt 18 days ago
		Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com)
		1 point by ibobev 24 days ago
		Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com)
		1 point by gpjt 24 days ago
		Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com)
		1 point by ibobev 34 days ago
		Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com)
		2 points by gpjt 35 days ago
		Digging into the LLM-as-a-Judge Results (gilesthomas.com)
		1 point by ibobev 43 days ago
		Digging into the LLM-as-a-Judge Results (gilesthomas.com)
		1 point by ibobev 44 days ago
		Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com)
		1 point by gpjt 44 days ago
		Using DistributedDataParallel to train a base model from scratch in the cloud (gilesthomas.com)
		10 points by ibobev 45 days ago
		LLM from scratch, part 29 – using DDP to train a base model in the cloud (gilesthomas.com)
		2 points by gpjt 45 days ago
		LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com)
		540 points by gpjt 81 days ago | 121 comments
		Why smart instruction-following makes prompt injection easier (gilesthomas.com)
		2 points by ibobev 3 months ago
		Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com)
		1 point by gpjt 3 months ago
		Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com)
		4 points by gpjt 3 months ago
		Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com)
		2 points by gpjt 3 months ago
		Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com)
		1 point by gpjt 3 months ago
		Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)
		1 point by ibobev 3 months ago
		Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)
		1 point by ibobev 3 months ago
		Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)
		3 points by gpjt 4 months ago
		Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)
		1 point by gpjt 4 months ago
		Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com)
		254 points by gpjt 4 months ago | 10 comments
		Revisiting Karpathy's 'The Unreasonable Effectiveness of RNNs' (gilesthomas.com)
		1 point by ibobev 4 months ago
		Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas.com)
		2 points by gpjt 4 months ago