PS: this article is not very strict, just some basic ML and mathematics knowledge. My English is poor too, so if this passage makes you confused or uncomfortable, please give me feedback in the comments :-D
Prior to this, we learned the single-feature version: a model that consists of only one feature, denoted as x. But in some situations, such as a real estate agency, we have to consider more elements: size, floor, age... and so on. After analyzing all these factors, a useful and comprehensive model will emerge.
First, we need to learn some notation:
The description of the model function also changes: f(x) = w1x1 + w2x2 + ... + wnxn + b. Since this linear model has two or more features, we call it 'multiple linear regression'.
When we have many features, our model becomes complex: f(x) = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + ... Writing this out term by term not only results in too much code, but also brings a heavy time and space cost. So we can use vectorization to simplify this model. Fortunately, we have a linear algebra library in Python.
#in the Python library 'NumPy', we can write it like this
import numpy as np
w = np.array([w1, w2, w3])  # the weights
x = np.array([x1, x2, x3])  # the features
b = 4
f = np.dot(w, x) + b
#the function 'dot' implements the dot product.
#less code, faster computation
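To make the speed-up concrete, here is a tiny side-by-side sketch of the loop version versus the vectorized version (all the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical weights and features, just to illustrate.
w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# Unvectorized: one multiplication per feature, accumulated in a loop.
f_loop = b
for j in range(len(w)):
    f_loop += w[j] * x[j]

# Vectorized: NumPy computes the whole dot product in optimized C code.
f_vec = np.dot(w, x) + b

print(f_loop, f_vec)  # both give the same prediction
```

With only three features the difference is invisible, but with thousands of features the vectorized line is far faster and shorter.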
So we get a useful skill. But how do we iterate? Yep, we have more features than before, and every one of them needs to be updated. In gradient descent, just remember to update all parameters simultaneously:

repeat until convergence {
    w_j = w_j - α·(∂J/∂w_j)    (for every feature j = 1, ..., n)
    b   = b  - α·(∂J/∂b)
}
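The simultaneous update can be sketched in code. Everything here (the function name, the toy dataset, the choice of alpha) is my own illustration, not from the original post:

```python
import numpy as np

def gradient_step(X, y, w, b, alpha):
    """One simultaneous update of all weights and the bias.

    X: (m, n) matrix of training examples, y: (m,) targets.
    """
    m = X.shape[0]
    err = X @ w + b - y        # prediction error for every example
    dw = X.T @ err / m         # ∂J/∂w_j for all j at once (vectorized)
    db = err.sum() / m         # ∂J/∂b
    # Simultaneous update: new w and b are computed from the old values only.
    return w - alpha * dw, b - alpha * db

# Tiny made-up dataset generated from y = 2*x1 + 3*x2 + 1.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])
y = X @ np.array([2.0, 3.0]) + 1.0

w, b = np.zeros(2), 0.0
for _ in range(5000):
    w, b = gradient_step(X, y, w, b, alpha=0.1)
print(w, b)  # should approach [2, 3] and 1
```

Note that both partial derivatives use the *old* w and b; that is what "simultaneously update" means.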
In other words, we can also 'rescale' the features. When one feature has a range that is much too big or too small compared with the others, like this
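One common way to rescale is z-score normalization; here is a minimal sketch (the helper name and the house data are made up):

```python
import numpy as np

def zscore_normalize(X):
    """Rescale each feature to mean 0 and standard deviation 1."""
    mu = X.mean(axis=0)      # per-feature mean
    sigma = X.std(axis=0)    # per-feature standard deviation
    return (X - mu) / sigma, mu, sigma

# Made-up data: house size in square feet vs. number of floors.
# The two features have wildly different ranges before rescaling.
X = np.array([[2104.0, 1.0],
              [1416.0, 2.0],
              [852.0,  3.0]])
X_norm, mu, sigma = zscore_normalize(X)
print(X_norm.std(axis=0))  # each column now has std 1
```

After rescaling, both features live on a comparable range, so gradient descent can use one learning rate for all of them and converge faster.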
In the previous blog, I used some notions defined in another subject to describe how to judge when we can stop iterating. Make sure your parameters are working correctly: J should decrease after every iteration. And if J decreases by less than 0.001 (a commonly used threshold) in one iteration, then declare convergence.
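That stopping rule can be sketched as follows (the cost values are invented for illustration):

```python
def has_converged(J_history, epsilon=1e-3):
    """Declare convergence when J decreased by less than epsilon in the last step."""
    if len(J_history) < 2:
        return False
    return (J_history[-2] - J_history[-1]) < epsilon

# Made-up cost values recorded during a run of gradient descent.
costs = [10.0, 4.0, 2.0, 1.5, 1.4995]
print(has_converged(costs))  # last decrease is 0.0005 < 0.001
```

If J ever *increases* between iterations, the learning rate is probably too big; check that before trusting this test.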
How to choose the learning rate? A rate that is too big may make gradient descent diverge. However, a rate that is too small is more reliable than a bigger one, though it makes convergence slower. So our method is: set a small enough value as the initial rate. If J can converge with this rate, we slightly increase it until we have a rate that suits this model, like 0.001, 0.005, 0.01, ... progressively increase and guess.
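The try-and-increase idea can be sketched like this (the one-feature dataset and the helper function are my own assumptions):

```python
import numpy as np

# Made-up one-feature data generated from y = 3*x.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])

def final_cost(alpha, iters=500):
    """Run gradient descent with the given rate and report the final cost J."""
    w, b = np.zeros(1), 0.0
    m = len(y)
    for _ in range(iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.sum() / m
    err = X @ w + b - y
    return (err ** 2).sum() / (2 * m)

# Try progressively larger rates, as in the text.
for alpha in [0.001, 0.005, 0.01, 0.05, 0.1]:
    print(alpha, final_cost(alpha))
```

On this toy data every rate converges and the bigger ones simply reach a lower J sooner; in practice you stop increasing the rate at the point where J starts to grow instead of shrink.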
Feature engineering: design new features by transforming or combining the original features.
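For example (a made-up illustration in the real-estate spirit of this post): from a lot's frontage and depth, we can engineer a new 'area' feature:

```python
import numpy as np

# Hypothetical lot measurements: frontage and depth, in feet.
frontage = np.array([50.0, 30.0, 40.0])
depth = np.array([100.0, 60.0, 80.0])

# Engineered feature: area = frontage * depth, combining the two originals.
area = frontage * depth
X = np.column_stack([frontage, depth, area])
print(X.shape)  # three features per example now
```

The model can now learn a weight for area directly, which often predicts price better than frontage and depth separately.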
Polynomial regression: a non-linear model, possibly with multiple features, built by adding powers of the original features.
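A quick sketch of how the polynomial features are built (the data is made up); the model stays linear in the weights, so everything above still applies:

```python
import numpy as np

# A single hypothetical feature x, expanded into polynomial features.
x = np.array([1.0, 2.0, 3.0])
X_poly = np.column_stack([x, x ** 2, x ** 3])  # columns: x, x^2, x^3

# Running plain linear regression on these columns fits a cubic curve,
# so the model is non-linear in x but still linear in w1, w2, w3.
print(X_poly)
```

Because x, x^2, and x^3 have very different ranges, feature scaling matters even more here.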