Take, for example, an 8-bit/byte store. Without BWX the sequence would be something like:
    bic   a0, #3, t1
    and   a0, #3, t4
    ldl   t2, (t1)
    insbl a1, t4, t3
    mskbl t2, t4, t2
    bis   t2, t3, t2
    stl   t2, (t1)
    ret   zero, (ra)

With BWX the same store is a single instruction:

    stb   a1, 0(a0)
    ret   zero, (ra)
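
For readers less familiar with Alpha assembly, the non-BWX read-modify-write sequence can be sketched in C. This is only an illustration, not code from any real toolchain: the function name store_byte_rmw and the model of memory as an array of 32-bit words are assumptions made here, and little-endian byte numbering is assumed throughout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original text): store one byte using
 * only aligned 32-bit loads and stores, mirroring the non-BWX sequence.
 * Memory is modelled as an array of 32-bit words; byte_addr is a byte
 * offset into that array. Little-endian byte numbering is assumed. */
static void store_byte_rmw(uint32_t *words, size_t byte_addr, uint8_t value)
{
    size_t   word_index = byte_addr >> 2;                 /* bic a0, #3, t1: aligned word */
    unsigned shift      = (unsigned)(byte_addr & 3) * 8;  /* and a0, #3, t4: byte offset  */

    uint32_t old      = words[word_index];                /* ldl:   load containing word  */
    uint32_t inserted = (uint32_t)value << shift;         /* insbl: byte moved into lane  */
    uint32_t masked   = old & ~((uint32_t)0xFF << shift); /* mskbl: clear the target lane */

    words[word_index] = masked | inserted;                /* bis + stl: merge, store back */
}
```

Starting from a word holding 0x11223344, storing 0xAB at byte offset 1 leaves 0x1122AB44 in that word: only the addressed byte lane changes, which is exactly what the single stb achieves with BWX.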