Like most statistical smoothing approaches, kernel-based methods suffer from the so-called ``curse of dimensionality'' when applied to multivariate data: the proportion of the training data that lies in a fixed-radius neighborhood of a point decreases to zero at an exponential rate as the dimension of the input space increases. As a consequence, the bandwidth of a weighting kernel must be chosen very large so as to contain a reasonable fraction of the sample, and the resulting estimates are typically highly biased.
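The rate at which fixed-radius neighborhoods empty out is easy to verify numerically. The following sketch (function and variable names are ours) estimates, by Monte Carlo, the fraction of uniformly distributed points that fall inside a fixed ball as the dimension grows:

```python
import numpy as np

def neighborhood_fraction(dim, radius=0.5, n=20000, seed=0):
    """Monte Carlo estimate of the fraction of points drawn uniformly
    from [-1, 1]^dim that lie within a Euclidean ball of the given
    radius around the origin."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, dim))
    return float(np.mean(np.linalg.norm(x, axis=1) <= radius))

# Fraction of the sample within radius 0.5, for increasing dimension.
fractions = {d: neighborhood_fraction(d) for d in (1, 2, 5, 10)}
```

In one dimension roughly half the sample lies within radius 0.5 of the origin; in ten dimensions essentially none of it does, which is why a fixed-shape kernel must grow its bandwidth with the dimension.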
One possible way to reduce the bias of local linear estimates
is to estimate the ``shape'' of the
weighting kernel using training data.
[Figures omitted: singidx.eps, singctur.eps]

To express this idea formally, we consider local linear estimates of the form
\[
\hat{m}(x_0) = \hat{\beta}_0, \qquad
(\hat{\beta}_0, \hat{\beta}) = \mathop{\mathrm{argmin}}_{\beta_0,\,\beta}
\sum_{t=1}^{T} k(x_t, x_0)\,\bigl(y_t - \beta_0 - \beta^{\top}(x_t - x_0)\bigr)^2 .
\]
The task is to choose the bandwidth $\lambda$ and the shape matrix $L$ so as to guarantee a high predictive performance. To specify $\lambda$, we define the ``entropy'' of the weighting kernel $k(x_{t},x_{0})$ as a function of $\lambda$:
\[
H(\lambda) = -\sum_{t=1}^{T} w_t \log w_t, \qquad
w_t = \frac{k(x_t, x_0)}{\sum_{s=1}^{T} k(x_s, x_0)} .
\]
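One convenient way to realize such an entropy-based bandwidth rule is to fix the kernel entropy at $\log M$, for an effective neighborhood of $M$ points, and solve for the bandwidth by bisection. The sketch below assumes a Gaussian kernel with Mahalanobis-type distances $\lVert L(x_t - x_0)\rVert^2$ and the entropy $-\sum_t w_t \log w_t$ of the normalized weights; the parameterization and function names are ours, not the paper's.

```python
import numpy as np

def shaped_weights(X, x0, L, lam):
    """Normalized weights from the (assumed) Gaussian parameterization
    k(x_t, x0) = exp(-||L (x_t - x0)||^2 / lam)."""
    d = np.sum(((X - x0) @ L.T) ** 2, axis=1)
    k = np.exp(-(d - d.min()) / lam)  # shift by d.min() to avoid underflow;
    return k / k.sum()                # normalization is unaffected

def weight_entropy(w):
    """Entropy -sum w log w of the normalized kernel weights."""
    w = w[w > 0]
    return -np.sum(w * np.log(w))

def choose_bandwidth(X, x0, L, target, lo=1e-8, hi=1e8, iters=80):
    """Bisect (geometrically) on lam until the weight entropy matches
    the target, e.g. log(M) for an effective neighborhood of M points.
    Entropy is increasing in lam, so bisection applies."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if weight_entropy(shaped_weights(X, x0, L, mid)) < target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)
```

With this rule, reshaping the kernel via $L$ leaves the effective local sample size under control, since the bandwidth is always re-solved to keep the entropy at its target value.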
A key issue for the practical realization of this approach is the numerical minimization of (5). We suggest using a gradient-descent algorithm, where the main difficulty lies in evaluating the gradient of the error with respect to the kernel shape. We use implicit differentiation together with the chain rule to compute this quantity. Note that the bandwidth parameter must be updated after each gradient-descent step, which can be computationally expensive. We developed a special algorithm to carry out this computation efficiently.
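To illustrate the overall loop, the sketch below minimizes a leave-one-out error over the shape matrix $L$. For simplicity it uses a finite-difference gradient with an accept-if-improved step in place of the paper's implicit differentiation, and it holds the bandwidth fixed rather than re-solving the entropy condition after every step; all names and parameter choices are ours.

```python
import numpy as np

def local_linear_predict(X, y, x0, L, lam):
    """Weighted least-squares local linear fit evaluated at x0, with
    shaped Gaussian weights exp(-||L (x_t - x0)||^2 / lam)."""
    diffs = X - x0
    k = np.exp(-np.sum((diffs @ L.T) ** 2, axis=1) / lam)
    Z = np.hstack([np.ones((len(X), 1)), diffs])  # intercept + local coords
    sw = np.sqrt(k)
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return beta[0]                                # fitted value at x0

def loo_error(X, y, L, lam):
    """Leave-one-out mean squared error of the shaped estimator."""
    idx = np.arange(len(X))
    return np.mean([(y[t] - local_linear_predict(X[idx != t], y[idx != t],
                                                 X[t], L, lam)) ** 2
                    for t in idx])

def shape_kernel(X, y, lam=0.5, steps=15, lr=0.2, h=1e-4):
    """Descend on L via finite differences; a step is accepted only if
    it lowers the leave-one-out error, so the error never increases."""
    L = np.eye(X.shape[1])
    err = loo_error(X, y, L, lam)
    for _ in range(steps):
        g = np.zeros_like(L)
        for i in range(L.shape[0]):
            for j in range(L.shape[1]):
                Lp = L.copy()
                Lp[i, j] += h
                g[i, j] = (loo_error(X, y, Lp, lam) - err) / h
        cand = L - lr * g
        cand_err = loo_error(X, y, cand, lam)
        if cand_err < err:        # accept improving steps only
            L, err = cand, cand_err
        else:
            lr *= 0.5             # otherwise shrink the step size
    return L
```

In the paper's scheme the expensive part is that the entropy-matched bandwidth must be recomputed inside every such step, which is what the special-purpose algorithm mentioned above accelerates.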
We applied kernel shaping to a five- and a ten-dimensional toy example, as well as to a third, real-world example. On the two artificial data sets, kernel shaping clearly outperformed local linear regression with a spherical kernel; on the real-world example, it yielded a small performance improvement.