Large random initialization: scale the randomly drawn weights up by a factor of 10:
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
Result: accuracy 0.83 on the training set, 0.86 on the test set.
Plain random initialization of the weight matrices W:
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1])
Result: accuracy 0.9966666666666667 on the training set, 0.96 on the test set.
He initialization, the scheme recommended for ReLU activation layers, initializes each weight matrix W as:
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2./layers_dims[l-1])
Result: accuracy 0.9933333333333333 on the training set, 0.96 on the test set.
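The three schemes differ only in how the random draw is scaled. A minimal sketch that unifies them (the function name initialize_parameters and its scale argument are assumptions for illustration, not the exercise's exact API; only the three W formulas come from the text above):

    import numpy as np

    def initialize_parameters(layers_dims, scale=None):
        # scale=None -> plain random initialization
        # scale=10   -> large random initialization
        # scale="he" -> He initialization (recommended for ReLU layers)
        parameters = {}
        L = len(layers_dims)  # number of layers, including the input layer
        for l in range(1, L):
            W = np.random.randn(layers_dims[l], layers_dims[l-1])
            if scale == "he":
                W = W * np.sqrt(2. / layers_dims[l-1])
            elif scale is not None:
                W = W * scale
            parameters['W' + str(l)] = W
            parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        return parameters

For example, initialize_parameters([2, 10, 5, 1], scale="he") would reproduce the He variant for a hypothetical four-layer architecture.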
Plain random initialization starts from a higher cost than He initialization, but after the first iteration the two costs are essentially level, and both curves follow the same downward trend.
A regularization term must be added to the cost function, and in backward propagation only the dW formulas change: each dW gains the gradient of the corresponding regularization term, e.g.:
dW3 = 1./m * np.dot(dZ3, A2.T) + lambd/m * W3
Setting λ = 0.7:
parameters = model(train_X, train_Y, lambd = 0.7)
Result: accuracy 0.9383886255924171 on the training set, 0.93 on the test set.
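A sketch of the two changes, assuming the three-layer network of these experiments with output activation A3 (the function name compute_cost_with_regularization is an assumption): the regularized cost adds (lambd / 2m) · Σ‖W‖² to the cross-entropy cost, and each dW gains (lambd / m) · W.

    import numpy as np

    def compute_cost_with_regularization(A3, Y, parameters, lambd):
        # cost = cross-entropy cost + (lambd / (2m)) * sum over layers of ||W||^2
        m = Y.shape[1]
        W1, W2, W3 = parameters["W1"], parameters["W2"], parameters["W3"]
        cross_entropy_cost = -1. / m * np.sum(
            Y * np.log(A3) + (1 - Y) * np.log(1 - A3))
        L2_regularization_cost = lambd / (2 * m) * (
            np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
        return cross_entropy_cost + L2_regularization_cost

    # In backward propagation only the dW formulas change; each gains the
    # gradient of the L2 term, (lambd / m) * W:
    # dW3 = 1. / m * np.dot(dZ3, A2.T) + lambd / m * W3
    # dW2 = 1. / m * np.dot(dZ2, A1.T) + lambd / m * W2
    # dW1 = 1. / m * np.dot(dZ1, X.T)  + lambd / m * W1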
Verification: L2 regularization makes the decision boundary smoother. If λ is too large, it can also "over-smooth" the boundary, leaving the model with high bias.
Setting λ = 0.9: accuracy 0.9241706161137441 on the training set, 0.93 on the test set.
Setting λ = 0.5: accuracy 0.9478672985781991 on the training set, 0.94 on the test set.
Setting λ = 0.3: accuracy 0.919431279620853 on the training set, 0.945 on the test set.
Setting λ = 0.1: accuracy 0.9383886255924171 on the training set, 0.95 on the test set.
Setting λ = 0.05: accuracy 0.9383886255924171 on the training set, 0.935 on the test set.
Summary: among λ ∈ {0.9, 0.7, 0.5, 0.3, 0.1, 0.05}, λ = 0.1 gives the best test accuracy, while at λ = 0.05 overfitting appears.
Plotting the test-set accuracy curve for λ = 0, 0.1, 0.2, …, 1 (a sketch of this sweep follows below):
Summary: omitting the regularization term performs worst of all; over this sweep, λ = 0.6 gives the best test accuracy.
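A minimal sketch of how such a sweep could be run, reusing the model helper from above; predict is an assumed helper returning 0/1 predictions, and the accuracy computation is likewise an assumption, not the exercise's exact code:

    import numpy as np
    import matplotlib.pyplot as plt

    lambdas = np.arange(0, 1.05, 0.1)  # lambda = 0, 0.1, ..., 1.0
    test_accuracies = []
    for lambd in lambdas:
        parameters = model(train_X, train_Y, lambd=lambd)   # model as used above
        predictions = predict(test_X, test_Y, parameters)   # assumed helper
        test_accuracies.append(np.mean(predictions == test_Y))

    plt.plot(lambdas, test_accuracies, marker="o")
    plt.xlabel("lambda")
    plt.ylabel("accuracy on the test set")
    plt.title("Test accuracy vs. L2 regularization strength")
    plt.show()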
