参考:http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-16.html
早上写了半篇博客,确实是半篇,提到了normalize/standardize/rescale feature,那么到底该不该normalize/standardize/rescale呢???
简单总结一下这篇文章的观点(还是不翻译了吧,看原文更容易理解作者意思):
1、Should I standardize the input variables (column vectors)?
For lack of better prior information, it is common to standardize each input to the same range or the same standard deviation. If you know that some inputs are more important than others, it may help to scale the inputs such that the more important ones have larger variances and/or ranges.
If the input variables are combined via a distance function (such as Euclidean distance) in an RBF network, standardizing inputs can be crucial.
it is essential to rescale the inputs so that their variability reflects their importance, or at least is not in inverse relation to their importance.
If the input variables are combined linearly, as in an MLP, then it is rarely strictly necessary to standardize the inputs, at least in theory. The reason is...
2、what range is better?
In particular, scaling the inputs to [-1,1] will work better than [0,1], although any scaling that sets to zero the mean or median or other measure of central tendency is likely to
be as good, and robust estimators of location and scale (Iglewicz, 1983) will be even better for input variables with extreme outliers.
It is also bad to have the data confined to a very narraw range such as [-0.1,0.1], as shown at lines-0.1to0.1.gif, since most of the initial hyperplanes will miss such a small region.
Thus it is easy to see that you will get better initializations if the data are centered near zero and if most of the data are distributed over an interval of roughly [-1,1] or [-2,2].
3、Two of the most useful ways to standardize inputs are:
o Mean 0 and standard deviation 1
o Midrange 0 and range 2 (i.e., minimum -1 and maximum 1)
Note that statistics such as the mean and standard deviation are computed from the training data, not from the validation or test data. The validation and test data must be standardized using the statistics computed from the training data.(注意实现细节,如果自己实现,过程一定要正确)
3、Should I standardize the target variables (column vectors)?
Standardizing target variables is typically more a convenience for getting good initial weights than a necessity. However, if you have two or more target variables and your error function is scale-sensitive like the usual least (mean) squares error function, then the variability of each target relative to the others can effect how well the net learns that target.
So it is essential to rescale the targets so that their variability reflects their importance, or at least is not in inverse relation to their importance. If the targets are of equal
importance, they should typically be standardized to the same range or the same standard deviation.
The scaling of the targets does not affect their importance in training if you use maximum likelihood estimation and estimate a separate scale parameter (such as a standard deviation) for each target variable.
If you are standardizing targets to force the values into the range of the output activation function, it is important to use lower and upper bounds for the values, rather than the minimum and maximum values in the training set.(使用Y的最大值和最小值,而不是训练集中看到的Y的极大值和极小值)。If the target variable does not have known upper and lower bounds, it is not advisable to use an output activation function with a bounded range. You can use an identity output activation function or other unbounded output activation function instead; see Why use activation functions?
4、Should I standardize the variables (column vectors) for unsupervised learning?
The most commonly used methods of unsupervised learning, including various kinds of vector quantization, Kohonen networks, Hebbian learning, etc., depend on Euclidean distances or scalar-product similarity measures. The considerations are therefore the same as for standardizing inputs in RBF networks--see Should I standardize the input variables (column vectors)? above. In particular, if one input has a large variance and another a small variance, the latter will have little or no influence on the results.
If you are using unsupervised competitive learning to try to discover natural clusters in the data, rather than for data compression, simply standardizing the variables may be inadequate.
5、Should I standardize the input cases (row vectors)?
Whereas standardizing variables is usually beneficial, the effect of standardizing cases (row vectors) depends on the particular data. Cases are typically standardized only across the input variables, since including the target variable(s) in the standardization would make prediction impossible.
There are some kinds of networks, such as simple Kohonen nets, where it is necessary to standardize the input cases to a common Euclidean length; this is a side effect of the use of the inner product as a similarity measure. If the network is modified to operate on Euclidean distances instead of inner products, it is no longer necessary to standardize the input cases.
Standardization of cases should be approached with caution because it discards information. If that information is irrelevant, then standardizing cases can be quite helpful. If that information is important, then standardizing cases can be disastrous. Issues regarding the standardization of cases must be carefully evaluated in every application. There are no rules of thumb that apply to all applications.