Well, any heuristic like that is possibly going to give either false positives o...

Well, any heuristic like that is possibly going to give either false positives or false negatives when applied to a set larger than the training data. A pure white-list approach is definitely an option for determining the accepted inputs, but generally some heuristic that attempts to accept inputs "like" the examples will probably be better.