Cost function definition: the cost function is

$$j(\theta_{0},\theta_{1}) = \frac{1}{2m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$

That is, for every sample $x^{(i)}$ we compute the prediction $h_{\theta}(x^{(i)})$ and measure its squared deviation from the actual value $y^{(i)}$. The smaller this mean squared error, the better the model $h_{\theta}(x) = \theta_{0} + \theta_{1}x$ fits the samples.
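As a quick illustration, here is a minimal NumPy sketch of this cost function; the sample arrays `x`, `y` and the parameter values passed in are made-up assumptions for demonstration only.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """j(theta0, theta1) = 1/(2m) * sum((h(x^(i)) - y^(i))^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x  # h_theta(x) = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

# Hypothetical samples roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(cost(0.0, 2.0, x, y))  # small cost: theta1 = 2 fits these samples well
print(cost(0.0, 0.0, x, y))  # large cost: a flat line fits them poorly
```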
Suppose the cost function $j(\theta_{0},\theta_{1})$ relates to $\theta_{0},\theta_{1}$ as shown below.
Note: the cost function of linear regression is convex, so it has a unique global minimum; interested readers can look up the derivation themselves. When $j(\theta_{0},\theta_{1})$ reaches its global minimum, how do we find the corresponding $\theta_{0},\theta_{1}$?
Gradient descent definition: the core idea of gradient descent is to first pick a random starting point (i.e., assign random values to $\theta_{0},\theta_{1}$), then, from the current point, move by $-\alpha\frac{\partial}{\partial\theta_{0}}j(\theta_{0},\theta_{1})$ in the $\theta_{0}$ direction and by $-\alpha\frac{\partial}{\partial\theta_{1}}j(\theta_{0},\theta_{1})$ in the $\theta_{1}$ direction. Repeating these steps drives $\theta_{0},\theta_{1}$ ever closer to the minimizing point $\theta_{0min},\theta_{1min}$.
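Written out as an update rule, one iteration of the procedure just described is:

$$\text{repeat until convergence:} \quad
\begin{cases}
\theta_{0} := \theta_{0} - \alpha\frac{\partial}{\partial\theta_{0}}j(\theta_{0},\theta_{1})\\[4pt]
\theta_{1} := \theta_{1} - \alpha\frac{\partial}{\partial\theta_{1}}j(\theta_{0},\theta_{1})
\end{cases}$$

Both parameters should be updated simultaneously, i.e., both partial derivatives are evaluated at the old $(\theta_{0},\theta_{1})$ before either parameter is overwritten.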
Why move by $-\alpha\frac{\partial}{\partial\theta_{0}}j(\theta_{0},\theta_{1})$? $\frac{\partial}{\partial\theta_{0}}j(\theta_{0},\theta_{1})$ is the slope of the objective function $j(\theta_{0},\theta_{1})$ in the $\theta_{0}$ direction. When the slope is negative, $\theta_{0}$ is less than $\theta_{0min}$, so the update $\theta_{0} := \theta_{0} - \alpha\frac{\partial}{\partial\theta_{0}}j(\theta_{0},\theta_{1})$ increases $\theta_{0}$, moving it toward $\theta_{0min}$. By the same reasoning, when the slope is positive, $\theta_{0}$ decreases, again moving toward $\theta_{0min}$. As $\theta_{0}$ approaches $\theta_{0min}$, the slope shrinks toward zero (at $\theta_{0min}$ the slope is exactly 0), so $\theta_{0}$ approaches $\theta_{0min}$ more and more slowly; once $\theta_{0} \approx \theta_{0min}$, repeated updates leave $\theta_{0}$ almost unchanged. The same argument applies to $\theta_{1}$.
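To make the slope-sign argument concrete, here is a hypothetical one-dimensional example using $j(\theta) = (\theta - 3)^2$, whose minimum sits at $\theta_{min} = 3$; the starting points and learning rate are assumed values chosen for illustration.

```python
def dj(theta):
    # derivative of j(theta) = (theta - 3)^2; it is zero at theta_min = 3
    return 2 * (theta - 3)

alpha = 0.1
for start in (0.0, 5.0):  # one start left of the minimum, one right
    theta = start
    for _ in range(50):
        # negative slope -> theta grows; positive slope -> theta shrinks
        theta -= alpha * dj(theta)
    print(start, "->", round(theta, 4))  # both runs converge near 3
```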
To compute $\frac{\partial}{\partial\theta_{0}}j(\theta_{0},\theta_{1})$ and $\frac{\partial}{\partial\theta_{1}}j(\theta_{0},\theta_{1})$, substitute $j(\theta_{0},\theta_{1}) = \frac{1}{2m} \sum_{i=1}^{m}(h_{\theta}(x^{(i)}) - y^{(i)})^2$ and then $h_{\theta}(x) = \theta_{0} + \theta_{1}x$, which gives

$$\frac{\partial}{\partial\theta_{0}}j(\theta_{0},\theta_{1}) = \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)$$

$$\frac{\partial}{\partial\theta_{1}}j(\theta_{0},\theta_{1}) = \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x^{(i)}$$

Notice that on every iteration, computing the partial derivatives means summing over the entire training set.
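Putting the pieces together, a minimal batch gradient descent sketch might look like the following; the sample data, learning rate, and iteration count are assumptions for illustration, not tuned values.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=3000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y       # h_theta(x^(i)) - y^(i), all samples
        grad0 = np.sum(error) / m               # partial derivative w.r.t. theta0
        grad1 = np.sum(error * x) / m           # partial derivative w.r.t. theta1
        # simultaneous update: both gradients use the old (theta0, theta1)
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical samples roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(gradient_descent(x, y))  # expect theta1 close to 2
```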