One goal of ML is to automatically discover the hidden causal process that explains real-world outcomes, for example by mimicking the process of human thought so as to make similar predictions. However, we humans live in a dynamic world, and we keep adjusting our mental processes to accommodate it as it evolves. In ML, by contrast, the distribution seen during training may no longer be valid once the model is deployed. It doesn't make sense that, in 99% of cases, a model is trained, deployed, and then never changed. That is the major reason why people discuss generalization, and why a good test score is just a score and does not by itself mean good performance in the real world.
ML models should adjust their learned process to new distributions, and the ability to change the behavior of a deployed model is key to being a general model. We roughly term this the open-world learning problem.
Training a model only as a Data Manipulator
One observation about humans is that we do not overfit to the many details of the input. Human generalization comes from the fact that learning happens only at the level of abstract operations, and we keep a clean separation between data and the operations on them. For example, we may forget many numbers yet remember the math operations, or forget the details of many classes (say, more than 10), yet as long as the data are presented we can still classify well. Sadly, current ML, especially end-to-end deep learning, tries to learn everything from the data into the model.
Inspired by the above discussion, we try treating the model as a data manipulator as one approach to the open-world learning problem. We wish to see that when the data change during testing, the data manipulator (as an ML model) can still accommodate the change to a certain degree.
As a classic task, traditional classification focuses only on closed-world learning, where the classes are predefined by the training data. The extreme case is when a new class shows up during testing or prediction: how should the model handle it? As it stands, it will assign one of the old classes to an example that belongs to the new class, which is obviously a mistake.
The first step is to reject all new classes so that the results are correct. Open-set recognition is such a problem in CV, and we also wrote a text-classification paper on it. This problem can be decomposed into anomaly detection plus closed-world classification: examples from unknown new classes should be caught by the anomaly detector before anything is passed to the closed-world classifier. This may not be a very novel problem.
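To make the two-stage decomposition concrete, here is a minimal sketch. It is not the method from our paper: the nearest-centroid classifier, the distance threshold, and all function names are assumptions picked only for illustration.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def fit_centroids(examples_by_class):
    """Hypothetical 'training': one centroid per known class."""
    return {label: centroid(pts) for label, pts in examples_by_class.items()}

def is_anomaly(x, centroids, threshold):
    """Stage 1: flag x as unknown if it is far from every known class."""
    return min(math.dist(x, c) for c in centroids.values()) > threshold

def closed_world_classify(x, centroids):
    """Stage 2: always returns one of the known classes."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def open_set_predict(x, centroids, threshold, reject_label="unknown"):
    """Reject first, then classify only what survived the rejection."""
    if is_anomaly(x, centroids, threshold):
        return reject_label
    return closed_world_classify(x, centroids)

# Toy data (made up for this sketch).
train = {"cat": [[0.0, 0.0], [0.2, 0.1]], "dog": [[5.0, 5.0], [5.1, 4.9]]}
centroids = fit_centroids(train)

near_cat = open_set_predict([0.1, 0.0], centroids, threshold=1.0)
far_away = open_set_predict([10.0, -10.0], centroids, threshold=1.0)
forced = closed_world_classify([10.0, -10.0], centroids)
print(near_cat, far_away, forced)
```

The last call shows the closed-world failure mode: without the rejection stage, the far-away point is still forced into one of the old classes.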
Going further, we still want a classifier to support new classes (so as to adapt to a new distribution). You can always expand, for example, the dense layer before the softmax function with a new row of parameters. But this is still not perfect, as new classes can still be mixed in with existing ones. In other words, we need the anomaly detection part to be open-world too, so that rejection can also be dynamic.
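As a rough sketch of what expanding the last dense layer means, the toy head below stores one weight row per class and appends a row when a class is added. The class name `ExpandableSoftmaxHead` and its methods are hypothetical, and the new row is only initialized here; in practice it would still need to be trained.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class ExpandableSoftmaxHead:
    """Toy dense layer + softmax with one weight row per known class."""

    def __init__(self, weights, biases, labels):
        self.weights = [list(row) for row in weights]
        self.biases = list(biases)
        self.labels = list(labels)

    def logits(self, features):
        return [
            sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(self.weights, self.biases)
        ]

    def predict_proba(self, features):
        return dict(zip(self.labels, softmax(self.logits(features))))

    def add_class(self, label, init_row, init_bias=0.0):
        """Append a new row of parameters; existing rows are untouched."""
        self.weights.append(list(init_row))
        self.biases.append(init_bias)
        self.labels.append(label)

head = ExpandableSoftmaxHead([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], ["cat", "dog"])
before = head.logits([2.0, 1.0])
head.add_class("bird", init_row=[0.0, 0.0])
after = head.logits([2.0, 1.0])
probs = head.predict_proba([2.0, 1.0])
print(before, after, probs)
```

Note that the probabilities still sum to one over the known labels, so this head alone has no way to say "none of the above". That is why the rejection part has to be dynamic as well.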
Open-world Classification
As such, we really want the classifier to support an unlimited number of classes, with a set of known classes that can be maintained dynamically, including both adding a new class and deleting an old one. Any example from a class not in the known set should be rejected.
All of these problems arise when the model remembers (overfits to) a specific set of classes too strongly. We humans probably also cache existing classes in our brains too much when the set of classes is small. For a large set, since we cannot remember all the classes, we actually do many comparisons: given an example, we compare it with existing examples from existing classes. If none of them is similar, we say we don't know the class and treat it as a new one. When another example from that new class arrives and we find the two similar, we can now classify into that new class. In the end, we probably learn only the comparison part, not any specific class or specific set of classes. Our paper aims to train such a comparator to manipulate an arbitrary set of classes.
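The compare-then-decide loop described above can be sketched as follows. This is only an illustration under loud assumptions: the `similarity` function is a fixed stand-in based on Euclidean distance, where the paper's idea is to train the comparator, and `OpenWorldClassifier`, its methods, and the threshold value are hypothetical names and numbers.

```python
import math

def similarity(a, b):
    """Stand-in comparator mapping distance into (0, 1].
    A trained comparator would replace this function."""
    return 1.0 / (1.0 + math.dist(a, b))

class OpenWorldClassifier:
    """Keeps the class set as data; only the comparator is 'the model'."""

    def __init__(self, comparator, threshold):
        self.comparator = comparator
        self.threshold = threshold
        self.memory = {}  # label -> list of stored examples

    def add_class(self, label, examples):
        self.memory.setdefault(label, []).extend(examples)

    def delete_class(self, label):
        self.memory.pop(label, None)

    def predict(self, x):
        """Return the most similar known class, or None to reject."""
        best_label, best_score = None, 0.0
        for label, examples in self.memory.items():
            score = max((self.comparator(x, e) for e in examples), default=0.0)
            if score > best_score:
                best_label, best_score = label, score
        return best_label if best_score >= self.threshold else None

clf = OpenWorldClassifier(similarity, threshold=0.5)
clf.add_class("cat", [[0.0, 0.0]])
clf.add_class("dog", [[5.0, 5.0]])

first = clf.predict([9.0, 0.0])      # similar to nothing known: rejected
clf.add_class("bird", [[9.0, 0.0]])  # keep the rejected example as a new class
second = clf.predict([9.2, 0.1])     # now similar to the stored "bird" example
clf.delete_class("dog")
third = clf.predict([5.0, 5.0])      # "dog" is no longer known: rejected
print(first, second, third)
```

Adding or deleting a class only changes the stored examples, never the comparator, which is the separation between data and operations argued for earlier.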