This paper addresses the problem of classification in situations where the
data distribution is not homogeneous: Data instances might come from
different locations or times, and therefore are sampled from related but
different distributions. In particular, features may appear in some parts of
the data that are rarely or never seen in others. In most situations with
non-homogeneous data, the training data is not representative of the
distribution under which the classifier must operate. We propose a method,
based on probabilistic graphical models, for utilizing unseen features
during classification. Our method introduces, for each such unseen feature,
a continuous hidden variable describing its influence on the class whether
it tends to be associated with some label. We then use probabilistic
inference over the test data to infer a distribution over the value of this
hidden variable. Intuitively, we learn the role of this unseen feature from
the test set, generalizing from those instances whose label we are fairly
sure about. Our overall probabilistic model is learned from the training
data. In particular, we also learn models for characterizing the role of
unseen features; these models use meta-features of those features, such as
words in the neighborhood of an unseen feature, to infer its role. We
present results for this framework on the task of classifying news articles
and web pages, showing significant improvements over models that do not use
unseen features.