Hi everyone, I wrote the article. I do consider this overfitting because we are training on these frames far more than would normally be advised for the size of the training set, to the point that the error is essentially zero on these frames. The model performs well "out of sample" here, but only on out-of-sample data that is semantically close to the original training set. Besides, overfitting is defined procedurally, not by how well the model performs. You could have an overfit model that just happens to perform well on some data it was not trained on; that doesn't change the fact that the model was overfit.


> Besides, overfitting is defined procedurally, not by how well it performs.

Huh? That's the opposite of the truth.

Compare https://en.wikipedia.org/wiki/Overfitting :

> In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".

> The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e. the noise) as if that variation represented underlying model structure.

Procedural concerns are not part of the concept. Conceptually, overfitting means including information in your model that isn't relevant to the prediction you're making, but that is helpful, by coincidence, in the data you're fitting the model to.

But since that can't be measured directly, you instead measure overfitting through performance.


If we are taking Wikipedia as ground truth, the next line is:

> An overfitted model is a statistical model that contains more parameters than can be justified by the data.

Another definition from https://www.ibm.com/cloud/learn/overfitting#:~:text=Overfitt...:

> Overfitting is a concept in data science, which occurs when a statistical model fits exactly against its training data.

We can argue over the precise definition of overfitting, but when you fit a high-capacity model exactly to the training data, that is a procedural question, and I would argue it falls under the overfitting umbrella.


I'm going to agree this isn't overfitting. And it's dangerous to imply overfitting is ever a good thing.

If your objective function is to ID one type of Batman, you just have a very specific objective function, which could also suffer from overfitting (i.e., it's unable to perform well on out-of-sample Batmen of the same type).

The reason I push back is that many healthcare models are micro-models by your definition: they may only seek to perform well on images from one machine at a single provider location, on a single outcome, but they still have overfitting issues from time to time since the training data isn't diverse enough for the hyper-specific objective.


It is important to note that these micro-models are only supposed to be used in the annotation process. During annotation there is a separate QA process with some form of human supervision. Micro-models are NOT supposed to be used in production environments.

100% agree on the healthcare front, which actually perfectly underscores the point here. These models are overfit to one specific modality but often used for generic purposes. One reason it is important to define micro-models is to point them out when they are deployed in a live production environment, which I agree is very dangerous. Many healthcare models are truly not ready for live diagnostic settings. On the other hand, these same models often do perform well at assisting the annotation of new data when applied to the right domain and with appropriate human supervision. This is a perfect encapsulation of the distinction we are trying to make.


You're missing my point. You can make a good micro-model that is very, very targeted and does not overfit. Sure, it won't generalize outside your target, but you know you can use it at clinic X with machine Y to predict Z.

This is why I'm saying you aren't describing overfitting but instead having a very, very specific objective function.


I would think a clear sign of overfitting is that your training loss is decreasing while your validation loss is increasing. That tells me your model is learning to perform better on the training data at the cost of performing worse on more general data.
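
As a rough sketch of that check (my own illustration, nothing from the article), you can flag the epoch where the two loss curves start to diverge; train_losses and val_losses are assumed to be per-epoch histories from whatever training loop you use:

    # Flag the train/val divergence described above: training loss still
    # falling while validation loss has risen for `patience` epochs in a row.
    def overfitting_epoch(train_losses, val_losses, patience=3):
        for t in range(patience, len(val_losses)):
            train_falling = all(train_losses[i] < train_losses[i - 1]
                                for i in range(t - patience + 1, t + 1))
            val_rising = all(val_losses[i] > val_losses[i - 1]
                             for i in range(t - patience + 1, t + 1))
            if train_falling and val_rising:
                return t
        return None  # no divergence detected

    train = [1.0, 0.7, 0.5, 0.4, 0.3, 0.25, 0.2]
    val   = [1.1, 0.9, 0.8, 0.85, 0.9, 1.0, 1.1]
    print(overfitting_epoch(train, val))  # -> 5 for these toy histories

Early stopping is essentially this check wired into the training loop: stop, or roll back to the best checkpoint, around the epoch it returns.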

Is that happening when you train these micro-models? If not, I have a hard time seeing how it's overfitting, because the model is still performing well on the data you train it on and use it on. If that is happening, then I don't see the benefit of it. A model that wasn't overfit would just do better at the task of collecting additional training data.

I think the approach you're talking about makes sense - create a simple model rapidly and leverage it to get more training data which you can then use to refine the model to be better still. I just don't think the term "overfitting" describes that process well - unless I'm misunderstanding something.


As I understand it, overfitting is low bias but high variance: perfectly fitting 5 linear data points with a complex polynomial when the underlying function was a line, so that the polynomial doesn't generalize well to data points not in the training set. Your model seems to be fitting points in the training set and the evaluation set just fine. Of course, if different Batmen were in the evaluation set, it would suddenly do terribly, but you can do that to pretty much every machine learning model; it would violate a lot of the underlying assumptions of statistics and machine learning, e.g. i.i.d. data and the evaluation set and training set coming from the same distribution. Your definition of overfitting thus seems more like transfer learning, in some sense.
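
A toy illustration of that bias/variance point (mine, not from the thread): fit 5 noisy points drawn from a line with both a straight line and a degree-4 polynomial, and compare how the two fits behave away from the training points.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 4, 5)
    y_train = 2 * x_train + 1 + rng.normal(0, 0.3, size=5)  # underlying function is a line

    line = np.polyfit(x_train, y_train, 1)  # about the right capacity
    poly = np.polyfit(x_train, y_train, 4)  # interpolates all 5 points: ~zero training error

    x_test = np.linspace(-1, 6, 50)  # includes points outside the training range
    y_test = 2 * x_test + 1

    def test_mse(coeffs):
        return np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

    print("linear test MSE:  ", test_mse(line))  # stays small
    print("degree-4 test MSE:", test_mse(poly))  # blows up away from the training points

The degree-4 fit has essentially zero training error, but its error explodes on nearby unseen inputs, which is the high-variance failure mode being described; the micro-model case differs in that its evaluation data comes from the same narrow distribution it was trained on.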


I see what you are saying, but in that context you lose most people's intuitive definition of overfitting. If I train a model on one image as my training set, then change one random pixel and run that model on this eval set, your argument would be that this is not overfitting, because the model performs well on the eval set it was created for.

My argument is that compared to models as most people use them, micro-models are low bias and high variance, and thus overfit. That's why I draw a distinction between a Batman model and a Batman micro-model.


On your target data your model is not high variance.

The way you use overfitting is misleading. In fact, according to the article, the model is fit just right for its purpose; if it were fit any less, given the five pictures, it might not work at all. Your confusion arises because what you actually change is the objective and the data-generating process (DGP) in question.

It should be clear to anyone that overfitting and underfitting are conceptually tied to the DGP under consideration. It makes no sense to speak of a model being "generally overfit" (!)

An "intuitive definition" of over-fitting that does not take into account this crucial fact will always be problematic.

For instance, training a model to zero error does not imply it is overfit. If your training set is broad enough, and the production environment has the exact same underlying DGP, then the model is simply fit well. In practice, the training data is not the same as all the data coming from the latent DGP that the model eventually encounters, and for that reason such a model would be overfit.

However, in this case, the model does not seem to fail on any DGP that corresponds to the task: Identifying one type of Batman. It is therefore not overfit.

I am sorry, but OP is right.


Fair enough, but the target data in this sense IS a full distribution of Batmen. This approach is a step towards the goal of creating a broad dataset and fitting a full Batman model. We are training on a narrower subset of our actual target data and fitting to that narrow subset; whether you want to call that overfitting or not, I suppose, depends on your perspective.

I agree intuitive definitions are often murky, but given that we are already throwing in the murky notions of intention implicit in the word "target", I think an at least colloquial usage of overfitting is appropriate.

Sometimes we try micro-models on broader domains than we expect them to work on, and they work fine. Sometimes not. The point is that the target here is not well defined, because we are just using them as annotation tools with some human supervision and not in a "typical" production environment.


Is that overfitting? I thought overfitting was explicitly when your model fits itself so well to the training data that its ability to generalise to your target environment begins to decline. The size of the training set and the performance on it don't really matter; all that matters is that performance in the specific environment you intend to deploy the model in starts dropping. In your case, you're training the models for a very narrow deployment target, and they remain competent.

Couldn't you use this logic to say that AlphaGo is overfit because it can only play Go, not chess?



