$$E = g(f_\theta(x), y)$$
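Here $f_\theta(x)$ is the current model's prediction of $y$, and $g(\cdot)$ is the function used to compute the error. The parameters are updated by stepping against the gradient of the error:

$$\theta' = \theta - \eta \, \nabla_\theta E$$

In this update, $\eta$ is the learning rate, $\nabla_\theta E$ is the gradient of the error with respect to $\theta$, and $E$ is the error (loss).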
