Summary
Some basics of statistical learning: least squares and k-nearest neighbors; statistical decision theory; local methods in high dimensions.
Skeleton notes
Least squares model: the prediction is linear in the inputs, $\hat{Y} = X^T\hat{\beta}$ (with the intercept absorbed into $X$ as a constant coordinate).
Residual sum of squares (RSS): $\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T\beta)^2 = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$.
The parameters we need are the set that minimizes RSS, which upon differentiation requires $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta) = 0$.
So, provided $\mathbf{X}^T\mathbf{X}$ is nonsingular, we can solve the parameters easily in closed form: $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$.
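A minimal sketch of this closed-form fit in Python; the toy data, noise level, and true coefficients below are invented for illustration::

    import numpy as np

    # Toy data: N samples, p features, plus an intercept column.
    rng = np.random.default_rng(0)
    N, p = 50, 3
    X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])
    beta_true = np.array([1.0, 2.0, -1.0, 0.5])
    y = X @ beta_true + 0.1 * rng.normal(size=N)

    # Normal equations: X^T X beta = X^T y.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat            # fitted values
    rss = np.sum((y - y_hat) ** 2)  # residual sum of squares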
k-nearest neighbors:
For an input point $x$, calculate the Euclidean distance $\lVert x - x_i\rVert$ between $x$ and each training input $x_i$.
Choose the $k$ nearest neighbors based on the distance.
The output prediction is the average of the corresponding outputs of the selected inputs: $\hat{Y}(x) = \frac{1}{k}\sum_{x_i \in N_k(x)} y_i$, where $N_k(x)$ denotes the $k$ training points nearest to $x$.
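A sketch of the same procedure in Python; the function name knn_predict and the choice $k = 5$ are mine, not the book's::

    import numpy as np

    def knn_predict(x, X_train, y_train, k=5):
        """Average the outputs of the k training points nearest to x."""
        dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
        nearest = np.argsort(dists)[:k]              # indices of k nearest
        return y_train[nearest].mean()

    # Usage on toy data:
    X_train = np.random.default_rng(0).normal(size=(100, 2))
    y_train = X_train.sum(axis=1)
    print(knn_predict(np.array([0.1, -0.2]), X_train, y_train, k=5))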
Metric of Distance
For the calculation of distance, a metric must be chosen. The book uses the Euclidean metric in its examples. Another metric that can be inspiring is that of hyperbolic space; I talked about this in Popularity vs Similarity in Growing Networks.
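As an illustration only (not something the book uses), a sketch of the distance in the Poincaré ball model of hyperbolic space::

    import numpy as np

    def poincare_distance(u, v):
        """Distance in the Poincare ball model of hyperbolic space.

        Both points must lie strictly inside the unit ball.
        """
        diff = np.sum((u - v) ** 2)
        denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
        return np.arccosh(1 + 2 * diff / denom)

    print(poincare_distance(np.array([0.1, 0.2]), np.array([-0.3, 0.4])))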
Mixture of Gaussians
A mixture of Gaussians can be described by a generative model. I am not really sure what that means; it seems to me that the data is generated in two stages: first one of several Gaussians with different parameters is picked at random, then the data point is drawn from that Gaussian.
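A sketch of that two-stage reading in Python; the component weights, means, and standard deviations are invented for illustration::

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative parameters: 3 components with weights, means, and stds.
    weights = np.array([0.5, 0.3, 0.2])
    means = np.array([-2.0, 0.0, 3.0])
    stds = np.array([0.5, 1.0, 0.8])

    # Generative sampling: first pick a component, then sample from it.
    component = rng.choice(len(weights), size=1000, p=weights)
    samples = rng.normal(means[component], stds[component])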
Statistical decision theory:
Given input $X \in \mathbb{R}^p$ and output $Y \in \mathbb{R}$;
following a joint distribution $\Pr(X, Y)$;
based on input and output, we look for a function $f(X)$ that predicts the behavior, i.e., $Y \approx f(X)$;
how well we predict is measured by the squared error loss $L(Y, f(X)) = (Y - f(X))^2$.
With the distribution, we define the expected prediction error (EPE) as $\mathrm{EPE}(f) = \mathrm{E}(Y - f(X))^2 = \int (y - f(x))^2 \Pr(dx, dy)$.
The book derives that the best prediction is the conditional mean, $f(x) = \mathrm{E}(Y \mid X = x)$.
Different loss functions lead to different EPE's; with the absolute loss $\mathrm{E}\lvert Y - f(X)\rvert$, for example, the best prediction becomes the conditional median instead.
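A Monte Carlo sketch checking that the conditional mean beats other predictors, on a made-up joint distribution (the sine relation and noise level are my own choices)::

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated joint distribution: X ~ Uniform(0, 1), Y = sin(2 pi X) + noise,
    # so the conditional mean is E(Y | X = x) = sin(2 pi x).
    X = rng.uniform(0, 1, size=100_000)
    Y = np.sin(2 * np.pi * X) + 0.3 * rng.normal(size=X.size)

    def epe(f):
        """Monte Carlo estimate of E(Y - f(X))^2."""
        return np.mean((Y - f(X)) ** 2)

    print(epe(lambda x: np.sin(2 * np.pi * x)))      # conditional mean: ~0.09
    print(epe(lambda x: np.zeros_like(x)))           # constant predictor: larger
    print(epe(lambda x: 2 * np.sin(2 * np.pi * x)))  # overshooting: larger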
About Probability Distribution
Question: Can we simply solve for the probability distribution and find out the prediction function from it? The conclusion says the best prediction of $Y$ is the conditional mean. Is it effectively solving $f(x) = \mathrm{E}(Y \mid X = x)$ from the probability distribution $\Pr(X, Y)$?
Some comments on this section
Curse of high dimensions: for inputs uniformly distributed in a $p$-dimensional unit cube, a subcube capturing a fraction $r$ of the volume has expected edge length $e_p(r) = r^{1/p}$. An extreme example: $e_{10}(0.01) = 0.63$ and $e_{10}(0.1) = 0.80$, so in 10 dimensions capturing even 1% of the data requires covering 63% of the range of each input; such a neighborhood is hardly local anymore.
Keeping the volume small instead means averaging over very few samples, which leads to high variance.
Homogeneous sampling doesn't work in high dimensions either, since most sample points fall near the boundary of the sample space, where prediction becomes extrapolation rather than interpolation.
[Figure: volume of a 10-dimensional sphere as a function of radius.]
Maintaining a dense sample requires a huge number of sample points in high dimensions: the sampling density is proportional to $N^{1/p}$, so matching the density of $N_1 = 100$ points in one dimension takes $N_{10} = 100^{10}$ points in ten dimensions.
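A quick numerical check of the edge-length formula $e_p(r) = r^{1/p}$::

    # Edge length of a subcube capturing fraction r of a unit cube's volume.
    def edge_length(r, p):
        return r ** (1.0 / p)

    for p in (1, 2, 3, 10):
        print(p, edge_length(0.01, p), edge_length(0.1, p))
    # In 10 dimensions, capturing 1% of the volume already needs
    # edge length 0.01 ** 0.1 ~= 0.63.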
Weird Subsection
I didn't get the point of this subsection. It seems that the authors are discussing whether it is proper to assume the relation between input and output is deterministic; the additive error model $Y = f(X) + \varepsilon$ relaxes the deterministic relation $Y = f(X)$.