Well, any heuristic like that is possibly going to give either false positives or false negatives when applied to a set larger than the training data. A pure white-list approach is definitely an option for determining the accepted inputs, but generally some heuristic that attempts to accept inputs "like" the examples will probably be better.