You can't train only on the negative examples showing filtered content, since that could lead to poor generalization. You'd need to mix in samples from the original training set to prevent catastrophic forgetting.

Otherwise it's like taking slices out of someone's brain until they can't recite a poem. Yes, at the end they can't recite a poem, but who knows what else they can no longer do. The positive examples from training essentially tell you what slices you need to put back to keep it functional.
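
To make the "put the slices back" idea concrete, here is a rough sketch of one way to do it: build each batch partly from the new negative examples and partly from samples replayed from the original training set. The function name `mixed_batches`, the `replay_fraction` parameter, and the 50/50 default are my own assumptions for illustration, not something from a particular library or paper, and the right ratio would have to be tuned.

```python
import random


def mixed_batches(negative_examples, replay_examples,
                  batch_size=8, replay_fraction=0.5, seed=0):
    """Yield batches mixing new negative examples with replayed training samples.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    n_replay = int(batch_size * replay_fraction)
    n_new = batch_size - n_replay
    if n_new < 1:
        raise ValueError("replay_fraction leaves no room for negative examples")
    if n_replay > 0 and not replay_examples:
        raise ValueError("replay_examples must not be empty")

    rng = random.Random(seed)
    negatives = list(negative_examples)
    rng.shuffle(negatives)

    # Walk through the negatives once; every batch is topped up with replay data.
    for start in range(0, len(negatives), n_new):
        new_part = negatives[start:start + n_new]
        # Sample with replacement so a small replay pool still works.
        replay_part = [rng.choice(replay_examples) for _ in range(n_replay)]
        batch = new_part + replay_part
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    negatives = [("neg", i) for i in range(10)]
    originals = [("orig", i) for i in range(100)]
    for batch in mixed_batches(negatives, originals):
        print(batch)
```

Each batch would then go through the usual training step. The point is only that the model keeps seeing ordinary training data while it learns to avoid the filtered content, so the update isn't driven by the negatives alone.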