Gradient Descent for Linear Regression and Its Relation to the Training Set


Cost function definition: the cost function $j(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ measures the squared error between the predictions $h_\theta(x^{(i)})$ that the model produces for every sample $x^{(i)}$ and the actual values $y^{(i)}$. The smaller this error, the better the model $h_\theta(x) = \theta_0 + \theta_1 x$ fits the samples.

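As a concrete illustration (not part of the original article), a minimal Python/NumPy sketch of this cost function might look like the following; the toy dataset `x`, `y` is made up purely for demonstration.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """j(theta0, theta1) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2"""
    m = len(x)
    predictions = theta0 + theta1 * x          # h_theta(x^(i)) for every sample
    return np.sum((predictions - y) ** 2) / (2 * m)

# toy training set (hypothetical values, only for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.2])
print(cost(0.0, 2.0, x, y))   # small value: theta0 = 0, theta1 = 2 fits these points well
```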
Suppose the cost function $j(\theta_0,\theta_1)$ depends on $\theta_0,\theta_1$ as shown below.

Note: the cost function of linear regression is a convex function with a unique global minimum; interested readers can look up the derivation themselves.

When $j(\theta_0,\theta_1)$ attains its global minimum, how do we find the corresponding $\theta_0,\theta_1$?

Gradient descent definition: the core idea of gradient descent is to start from a randomly chosen point (i.e. assign random initial values to $\theta_0,\theta_1$), and then, from the current point, move a distance of $-\alpha\frac{\partial}{\partial\theta_0}j(\theta_0,\theta_1)$ along the $\theta_0$ direction and $-\alpha\frac{\partial}{\partial\theta_1}j(\theta_0,\theta_1)$ along the $\theta_1$ direction. Repeating these steps moves $\theta_0,\theta_1$ ever closer to the minimizing point $\theta_{0min},\theta_{1min}$.

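To make the simultaneous update explicit, here is a minimal sketch of one gradient descent step; the helper functions `grad0` and `grad1`, standing for the two partial derivatives, are hypothetical names (the derivatives themselves are derived later in the article).

```python
def gradient_descent_step(theta0, theta1, alpha, grad0, grad1):
    """One step: move each parameter by -alpha times its partial derivative.
    Both gradients are evaluated at the *old* (theta0, theta1) before either
    parameter is overwritten, i.e. a simultaneous update."""
    d0 = grad0(theta0, theta1)   # d j / d theta0 at the current point
    d1 = grad1(theta0, theta1)   # d j / d theta1 at the current point
    return theta0 - alpha * d0, theta1 - alpha * d1
```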
Why move by $-\alpha\frac{\partial}{\partial\theta_0}j(\theta_0,\theta_1)$?

$\frac{\partial}{\partial\theta_0}j(\theta_0,\theta_1)$ is the slope of the objective function $j(\theta_0,\theta_1)$ along the $\theta_0$ direction. When the slope is negative, $\theta_0$ is smaller than $\theta_{0min}$, so the update $\theta_0 := \theta_0 - \alpha\frac{\partial}{\partial\theta_0}j(\theta_0,\theta_1)$ increases $\theta_0$ and moves it toward $\theta_{0min}$; by the same reasoning, when the slope is positive, $\theta_0$ decreases and again moves toward $\theta_{0min}$. As $\theta_0$ gets closer to $\theta_{0min}$, the slope shrinks (it is exactly 0 at $\theta_{0min}$), so $\theta_0$ approaches $\theta_{0min}$ more and more slowly; once $\theta_0 \approx \theta_{0min}$, repeated updates barely change $\theta_0$. The same reasoning applies to $\theta_1$.

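A one-dimensional toy example (my own, not from the article) makes the sign argument concrete: for $j(\theta)=\theta^2$ the slope is $2\theta$, so starting at $\theta=-2$ the negative slope pushes $\theta$ up toward the minimum at 0, and each step gets smaller as the slope flattens out.

```python
theta, alpha = -2.0, 0.1
for step in range(5):
    slope = 2 * theta              # derivative of j(theta) = theta^2
    theta = theta - alpha * slope  # negative slope => theta increases toward 0
    print(step, round(theta, 4))
# prints -1.6, -1.28, -1.024, -0.8192, -0.6554: steps shrink as the slope flattens
```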
Computing $\frac{\partial}{\partial\theta_0}j(\theta_0,\theta_1)$ and $\frac{\partial}{\partial\theta_1}j(\theta_0,\theta_1)$

Substituting $j(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$ and then $h_\theta(x) = \theta_0 + \theta_1 x$ into the partial derivatives gives

$\frac{\partial}{\partial\theta_0}j(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$

$\frac{\partial}{\partial\theta_1}j(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$

In other words, every iteration of gradient descent computes the partial derivatives as a sum over the entire training set.

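Putting everything together, a minimal batch gradient descent sketch for the one-variable model could look like the following; note that each iteration uses the whole training set, exactly as the derivation above implies. The learning rate, iteration count, and toy data are arbitrary choices for illustration.

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent.
    Every iteration uses all m training samples to compute the gradients."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y     # h_theta(x^(i)) - y^(i) for every sample
        grad0 = np.sum(error) / m             # partial derivative w.r.t. theta0
        grad1 = np.sum(error * x) / m         # partial derivative w.r.t. theta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1  # simultaneous update
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.2])
print(batch_gradient_descent(x, y))   # approaches the least-squares fit (about -0.05, 2.05)
```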