Factors affecting the accuracy of a Neural Network
There are several parameters on which the accuracy of a Machine Learning model depends. By parameters I do not mean the weights; you will see the important parameters below. In this tutorial I will share the parameters that directly affect the accuracy of any major Machine Learning model.
I assume that you have tried the multilayer neural network recommended in the previous tutorial.
Following are the parameters that need to be tuned.
Number of hidden layers and units:
The number of hidden layers and the number of units per layer trade off against each other: in general, the more you have of one, the fewer you need of the other for the network to work well, as sketched below.
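For instance, in the previous tutorial's network the layer sizes are plain integers you can trade off against each other. The values below are illustrative assumptions, not recommendations:

n_input = 784      # e.g. 28x28 flattened input images
n_hidden_1 = 256   # units in the first hidden layer
n_hidden_2 = 256   # deeper networks can often get away with fewer units per layer
n_hidden_3 = 128
n_classes = 10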
Standard Deviation of weights:
You might have noticed that the loss was sometimes far too high at several steps and could even come out as NaN, or that the accuracy was very low. To mitigate this, specify a small standard deviation for the initial weights.
Replace the weights with the following in the multi-layer neural network.
weights = {
    # small standard deviations keep the initial activations, and hence the loss, in a sane range
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], stddev=0.03)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=0.01)),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3], stddev=0.01)),
    'out': tf.Variable(tf.random_normal([n_hidden_3, n_classes], stddev=0.03))
}
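Note that tf.random_normal defaults to stddev=1.0, which in a deep network is large enough to saturate activations and inflate the loss; the small explicit values above keep the initial pre-activations in a reasonable range.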
Number of steps/iterations:
This is also one of the important parameters. I have seen people train the same neural network for 200k iterations to achieve 97%, while others have done it in 14k steps. I achieved around 95% in 12k steps after several tries. Start around 5k and increase the count as long as you keep seeing improvement, as in the loop sketched below.
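As a rough sketch of the driving loop (the optimizer comes from the tutorial's graph; x, y and the next_batch helper are assumptions standing in for your placeholders and batching code):

num_steps = 5000   # starting point; raise it while accuracy still improves
batch_size = 128   # see the SGD discussion below

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        batch_x, batch_y = next_batch(batch_size)  # hypothetical batching helper
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})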
L2 and Dropout regularization:
These two techniques may work well together, but sometimes they do not. One reason is the standard deviation of the weights: sometimes L2 alone is enough to make the model work better, and sometimes Dropout alone is enough, while using both together will almost certainly require tuning the standard deviation by hand. Dropout also tends to make its positive impact late in training. Finally, L2 has a regularization constant and Dropout has a keep probability, and both need to be tuned well; a sketch of the two follows.
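As an illustrative sketch only (the weights dictionary is the one above; loss, layer_1, beta and keep_prob are assumptions you would adapt to your own graph):

beta = 0.01       # L2 regularization constant; needs tuning
keep_prob = 0.75  # Dropout keep probability; needs tuning

# L2: penalize large weights by adding their squared norms to the existing loss
l2_term = beta * (tf.nn.l2_loss(weights['h1']) + tf.nn.l2_loss(weights['h2']) +
                  tf.nn.l2_loss(weights['h3']) + tf.nn.l2_loss(weights['out']))
loss = loss + l2_term

# Dropout: randomly drop hidden activations during training (apply after the activation)
layer_1 = tf.nn.dropout(layer_1, keep_prob)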
Learning rate and its decay:
The learning rate is used in the gradient descent update. A higher learning rate makes learning faster at the start, but progress soon flattens out, while a lower learning rate makes learning slower yet keeps improving further into training. Learning rate decay is another technique to increase accuracy: it simply decreases the learning rate as the iterations progress. This works well when the number of steps is very high.
Change the optimizer assignment to this:
global_step = tf.Variable(0)  # counts the number of steps taken
# start at 0.3 and multiply the rate by 0.86 every 3500 steps
learning_rate = tf.train.exponential_decay(0.3, global_step, 3500, 0.86, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
See More about Exponential Decay. Here is the full code after adding the standard deviation and learning rate decay.
A common question is whether Stochastic Gradient Descent increases accuracy. SGD is essential in deep learning, mainly because of its speed, but an increase in accuracy is not guaranteed: it might decrease or stay the same. Plain Gradient Descent is usually more accurate because its updates converge more steadily toward the minimum, but it takes a lot of time since every step iterates over the whole dataset. Still, in some models, like ours, SGD has improved accuracy as well, and two of its parameters affect accuracy: the batch size and the number of iterations. A batch size or iteration count that is too small relative to the training set will decrease accuracy; see the sketch below.
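As a minimal sketch of the mini-batch sampling that distinguishes SGD from full-batch gradient descent (train_x and train_y are assumed NumPy arrays; x, y, optimizer, sess and num_steps are reused from the loop above):

import numpy as np

batch_size = 128  # too small relative to the training set can hurt accuracy

for step in range(num_steps):
    # each step trains on a random mini-batch instead of the whole dataset
    idx = np.random.choice(len(train_x), batch_size, replace=False)
    sess.run(optimizer, feed_dict={x: train_x[idx], y: train_y[idx]})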
Play with these parameters, see how they affect your model, and let me know. In the next tutorial, I will share the accuracy comparisons.