The biggest difference is that when you feed a sequence into a decoder-only model, each token only attends to earlier tokens when its hidden state is computed. So the hidden state for the nth token is based only on tokens at positions ≤ n (the token itself and everything before it). This is where the talk about "causal masking" comes from: the attention matrix is masked to enforce this restriction. Encoder architectures, on the other hand, allow every position in the sequence to attend to every other position.
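To make the masking concrete, here's a minimal sketch (PyTorch assumed purely for illustration; the tensor names are made up) of how the attention score matrix gets masked before the softmax in a decoder-only model:

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw attention scores (query x key)

# Upper-triangular boolean mask: True marks "future" positions, i.e. key j > query i.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Set future positions to -inf so softmax gives them zero weight.
scores = scores.masked_fill(causal_mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)  # row i only distributes weight over positions <= i
```

An encoder simply skips the masked_fill step, so every position can attend to every other position.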
Encoder architectures have been used for semantic analysis and feature extraction of sequences, while decoder-only models are used for generation (i.e. next-token prediction).