Like most statistical smoothing approaches, kernel-based methods suffer from the so-called ``curse of dimensionality''
when applied to multivariate data:
the proportion of the training data that lies in a fixed-radius neighborhood of a point
decreases to zero at an exponential rate
as the dimension of the input space grows.
As a consequence,
the bandwidth of a weighting kernel must be chosen very large to
capture a reasonable fraction of the sample,
and the resulting estimates are typically highly biased.
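The geometric effect can be checked numerically. The sketch below (illustrative, not from the paper) uses Monte Carlo sampling to estimate the fraction of uniformly drawn points that fall within a fixed radius of the center of the unit cube:

```python
import random


def neighborhood_fraction(dim, radius=0.5, n=20000, seed=0):
    """Monte-Carlo estimate of the fraction of points drawn uniformly
    from the unit cube [0, 1]^dim that lie within `radius` of its center."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Squared Euclidean distance from a random point to the cube center.
        d2 = sum((rng.random() - 0.5) ** 2 for _ in range(dim))
        if d2 <= radius ** 2:
            hits += 1
    return hits / n
```

With radius 0.5, the estimated fraction drops from roughly 0.79 in two dimensions to well below one percent in ten, which is the exponential decay the text describes.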
One possible way to reduce the bias of local linear estimates
is to estimate the ``shape'' of the
weighting kernel from the training data.
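The idea can be sketched in code. Assuming a Gaussian kernel shaped by a matrix L (a hypothetical parameterization for illustration; the paper's exact kernel is not reproduced here), a local linear estimate at a query point is a weighted least-squares fit whose intercept is the prediction:

```python
import numpy as np


def local_linear_predict(x0, X, y, L):
    """Local linear estimate at query point x0.

    Weights come from a Gaussian kernel whose shape is governed by the
    matrix L (distance metric L^T L). Here L is simply given; the paper's
    contribution is to learn it from the training data.
    """
    D = X - x0                       # (n, d) displacements from the query
    M = L.T @ L                      # kernel shape / distance metric
    # Gaussian weights: exp(-0.5 * d_i^T M d_i) for each training point.
    w = np.exp(-0.5 * np.einsum('ij,jk,ik->i', D, M, D))
    # Weighted least squares on [1, x - x0]; the intercept is the estimate.
    A = np.hstack([np.ones((len(X), 1)), D])
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return beta[0]
```

Because the fit is locally linear, the estimator reproduces an exactly linear target function regardless of the kernel shape; the shape of L matters precisely when the target is curved and the data are sparse.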
[Figures: singidx.eps, singctur.eps]
To express this idea formally, we consider
local linear estimates of the form
The task is to choose
the bandwidth and the matrix L so as to
guarantee high predictive performance.
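The display equation giving the form of the estimate is not shown here. A standard way to write a shaped-kernel local linear estimate (a sketch under the assumption of the usual weighted least-squares formulation, not necessarily the authors' exact notation) is:

```latex
\hat{y}(x_0) = \hat{\beta}_0, \qquad
(\hat{\beta}_0, \hat{\beta}) \;=\; \arg\min_{\beta_0,\,\beta}\;
\sum_{t} k(x_t, x_0)\,\bigl(y_t - \beta_0 - \beta^{\top}(x_t - x_0)\bigr)^2,
```

where a common choice of shaped Gaussian kernel is $k(x_t, x_0) = \exp\bigl(-\tfrac{1}{2}(x_t - x_0)^{\top} L^{\top} L\,(x_t - x_0)\bigr)$, so that the matrix $L$ determines the orientation and scale of the weighting kernel.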
To specify the bandwidth,
we define the ``entropy'' of the weighting
kernel
k(x_t, x_0) as a function of the bandwidth.
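A definition consistent with this description (a sketch, assuming the weights are normalized over the training sample) is the Shannon entropy of the kernel weights as a function of the bandwidth $h$:

```latex
H(h) \;=\; -\sum_{t} p_t \log p_t, \qquad
p_t \;=\; \frac{k(x_t, x_0)}{\sum_{s} k(x_s, x_0)}.
```

Fixing $H$ at a target value then pins down the bandwidth: a small $h$ concentrates the weight on a few neighbors (low entropy), while a large $h$ spreads it over the whole sample (entropy near $\log n$).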
A key issue for the practical realization of this approach
is the numerical minimization of (5). We suggest using a gradient-descent algorithm, where the difficulty lies in evaluating the gradient of the error with respect to the kernel parameters.
We use implicit differentiation
together with the chain rule to compute this
quantity.
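The implicit-differentiation step can be sketched as follows, assuming the bandwidth $h$ is defined implicitly by holding the kernel entropy at a constant $c$, i.e. $H(h, L) = c$:

```latex
\frac{\partial h}{\partial L} \;=\; -\,\frac{\partial H / \partial L}{\partial H / \partial h},
\qquad
\frac{dE}{dL} \;=\; \frac{\partial E}{\partial L}
  \;+\; \frac{\partial E}{\partial h}\,\frac{\partial h}{\partial L},
```

where $E$ denotes the error criterion being minimized. The first identity is implicit differentiation of the entropy constraint; the second is the chain rule combining the direct dependence of the error on $L$ with its indirect dependence through the constrained bandwidth.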
Note that the bandwidth parameter
must be updated after each gradient-descent
step, which can be computationally expensive. We developed a special algorithm
to carry out this computation efficiently.
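The authors' efficient update algorithm is not reproduced here; a minimal sketch of the computation it must perform, assuming a Gaussian kernel and a fixed entropy target, is a one-dimensional root search for the bandwidth:

```python
import math


def entropy_of_weights(dists2, h):
    """Shannon entropy of the normalized Gaussian kernel weights at one
    query point; dists2 are squared distances to the training points."""
    dmin = min(dists2)
    # Shift by the minimum distance for numerical stability; the
    # normalized weights are unchanged.
    k = [math.exp(-(d - dmin) / (2.0 * h * h)) for d in dists2]
    s = sum(k)
    return -sum((w / s) * math.log(w / s) for w in k if w > 0.0)


def solve_bandwidth(dists2, target_entropy, lo=1e-6, hi=1e6, iters=100):
    """Bisection for the bandwidth h whose weight entropy hits the target.
    Relies on the entropy being monotonically increasing in h."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)   # geometric midpoint: h spans decades
        if entropy_of_weights(dists2, mid) < target_entropy:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

The geometric midpoint is used because plausible bandwidths span several orders of magnitude; monotonicity of the entropy in the bandwidth guarantees that the bisection converges.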
We applied kernel shaping to a five-dimensional and a ten-dimensional toy example, as well as to a third, real-world example. On the two artificial data sets, kernel shaping clearly outperformed local linear regression with a spherical kernel; on the real-world example it yielded a small performance improvement.