Hyperparameters

Hints when starting on a new problem

  • Start by getting better than chance results, as a baseline for improvements
  • Strip the problem space down to a simpler version, e.g. just learn to classify 0 vs 1, rather than all ten MNIST digits (see the sketch after this list)
  • Focus on getting decent values for the hyperparameters one by one (e.g. $\lambda$ or $\eta$), rather than jumping around hyperparameter space at random
  • Start with getting decent learning rates etc. before scaling up the number of neurons
  • Initially jump about by largish amounts, looking for a decent value, then fine-tune
  • Can be useful to intertwine various hyperparameter optimizations, as they influence each other
  • Pay very close attention to validation accuracy
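
A minimal sketch of the first two hints above, assuming MNIST-style arrays `X` and `y` are already loaded (the arrays here are random stand-ins): strip the problem down to a two-class version and compute the majority-class baseline that any model must beat.

```python
import numpy as np

# Stand-in for pre-loaded MNIST-style data: X of shape (n, 784), y digit labels
rng = np.random.default_rng(0)
X = rng.random((1000, 784))
y = rng.integers(0, 10, size=1000)

# Strip the problem down: keep only digits 0 and 1
mask = (y == 0) | (y == 1)
X01, y01 = X[mask], y[mask]

# Better-than-chance baseline: always predict the majority class
majority = np.bincount(y01).argmax()
print(f"majority-class baseline accuracy: {(y01 == majority).mean():.3f}")
```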

Speed up feedback loops

  • Use a small subset of data to start with
  • Experiment with a stripped down version by removing some hidden layers
  • Increase the frequency of monitoring, e.g. every n batches, rather than every n epochs
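
A sketch of these feedback-loop tips combined: train on a small subset and log validation accuracy every `monitor_every` batches rather than once per epoch. `train_batch` and `validate` are hypothetical stand-ins for a real training step and validation pass.

```python
import random

def train_batch(batch):
    return random.random()   # stand-in: run one training step, return loss

def validate():
    return random.random()   # stand-in: run a validation pass, return accuracy

data = list(range(10_000))
subset = data[:1_000]        # small subset of the data for fast iteration
batch_size, monitor_every = 32, 50

batches = [subset[i:i + batch_size] for i in range(0, len(subset), batch_size)]
for step, batch in enumerate(batches * 10, start=1):   # a few passes over it
    loss = train_batch(batch)
    if step % monitor_every == 0:    # monitor every n batches, not per epoch
        print(f"step {step}: loss {loss:.3f}, val acc {validate():.3f}")
```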

Number of training epochs

  • Use early stopping - if there has been no improvement over the last 10 epochs, stop and validate (or try different hyperparameters)
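
A minimal sketch of the 10-epoch rule, with hypothetical `train_epoch` and `val_accuracy` stand-ins for a real training loop:

```python
import random

def train_epoch():
    pass                      # stand-in: train for one epoch

def val_accuracy():
    return random.random()    # stand-in: accuracy on the validation set

patience = 10                 # epochs without improvement before stopping
best_acc, epochs_since_best = 0.0, 0

for epoch in range(1000):
    train_epoch()
    acc = val_accuracy()
    if acc > best_acc:
        best_acc, epochs_since_best = acc, 0
    else:
        epochs_since_best += 1
    if epochs_since_best >= patience:
        print(f"stopping at epoch {epoch}: no improvement in {patience} epochs")
        break
```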

Learning rate ($\eta$)

  • Start by finding the threshold value of $\eta$ at which the training cost immediately begins decreasing, instead of oscillating or increasing. A decent initial value is $\eta = 0.01$. If the cost starts decreasing right away, keep multiplying by 10 until it doesn't; if it didn't decrease right away, divide by 10 until it does. This gives an order-of-magnitude estimate of the threshold value of $\eta$
  • A quick working value for $\eta$ is half the threshold value
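
A sketch of that threshold search, assuming a hypothetical probe `cost_decreases_initially(eta)` that trains for a few mini-batches at the given rate and reports whether the training cost fell straight away:

```python
def cost_decreases_initially(eta):
    # Hypothetical probe: train a few mini-batches at this learning rate and
    # report whether the cost decreased rather than oscillated or increased.
    return eta < 0.5          # stand-in behaviour, for illustration only

eta = 0.01                    # decent initial value
if cost_decreases_initially(eta):
    while cost_decreases_initially(eta * 10):
        eta *= 10             # push up by 10x until the cost stops decreasing
else:
    while not cost_decreases_initially(eta):
        eta /= 10             # back off by 10x until the cost decreases

threshold = eta               # order-of-magnitude estimate of the threshold
working_eta = threshold / 2   # quick usable value: half the threshold
print(f"threshold ~ {threshold}, working eta = {working_eta}")
```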

Learning rate schedule

  • Just use Adam?
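
For reference, a minimal NumPy sketch of a single Adam update (with the standard defaults), showing why it reduces the need for a hand-tuned schedule: each parameter gets its own effective step size from running moment estimates.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad             # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2        # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)                # bias-correct both estimates
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])            # stand-in gradient
theta, m, v = adam_step(theta, grad, m, v, t=1)
```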

Regularization parameter ($\lambda$)

  • Start with $\lambda = 0$ until $\eta$ is set, then set $\lambda = 1.0$. Next, use the validation set to find a decent value, scaling up and down by factors of 10. Then fine-tune $\lambda$, after which return to fine-tuning $\eta$
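
A sketch of that coarse-to-fine search, with a hypothetical `val_accuracy(lam)` standing in for "train with regularization strength $\lambda$ and measure validation accuracy":

```python
def val_accuracy(lam):
    # Hypothetical: train with regularization strength lam, return val accuracy
    return 1.0 - abs(lam - 5.0) / 100.0      # stand-in with a peak near 5

# Coarse search: scale lambda up and down from 1.0 by factors of 10
coarse = [10.0 ** k for k in range(-3, 4)]
best_lam = max(coarse, key=val_accuracy)

# Fine-tune within the winning order of magnitude
fine = [best_lam * f for f in (0.2, 0.5, 1.0, 2.0, 5.0)]
best_lam = max(fine, key=val_accuracy)
print(f"best lambda: {best_lam}")
```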

Mini-batch size

  • Relatively independent of other hyperparameters
  • Mainly influences training time, rather than the actual training results
  • Find some decent values for the other hyperparameters, then try a few batch sizes, scaling $\eta$ up and down. Choose whichever size gives the best $\frac {\text{accuracy}}{\text{clock time}}$
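
A sketch of that comparison, with a hypothetical `train_and_evaluate(batch_size, eta)` standing in for a short training run. Scaling $\eta$ linearly with batch size is one common heuristic, used here as an assumption:

```python
import time

def train_and_evaluate(batch_size, eta):
    # Hypothetical: train briefly at this batch size and learning rate,
    # return validation accuracy
    time.sleep(0.01)                          # stand-in for real work
    return 0.9

base_eta, base_batch = 0.05, 32
ratios = {}
for batch_size in (16, 32, 64, 128, 256):
    eta = base_eta * batch_size / base_batch  # assumed linear eta scaling
    start = time.perf_counter()
    acc = train_and_evaluate(batch_size, eta)
    elapsed = time.perf_counter() - start
    ratios[batch_size] = acc / elapsed        # accuracy per unit of clock time

best = max(ratios, key=ratios.get)
print(f"best batch size by accuracy/clock-time: {best}")
```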

Automated hyperparameter search

  • Should read up on this? Or just use whatever fast.ai gives?
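
Whatever the answer, a simple automated baseline worth knowing is random search over log-uniform ranges; a sketch with a hypothetical `val_accuracy(eta, lam)`:

```python
import math
import random

def val_accuracy(eta, lam):
    # Hypothetical: train with these hyperparameters, return val accuracy
    return random.random()

def log_uniform(lo, hi):
    # Sample uniformly in log space, so each order of magnitude is equally likely
    return math.exp(random.uniform(math.log(lo), math.log(hi)))

best_acc, best_params = 0.0, None
for _ in range(50):
    eta, lam = log_uniform(1e-4, 1.0), log_uniform(1e-3, 1e2)
    acc = val_accuracy(eta, lam)
    if acc > best_acc:
        best_acc, best_params = acc, (eta, lam)

print(f"best: acc={best_acc:.3f}, (eta, lambda)={best_params}")
```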