Sigmoid ($\sigma(z) = \frac {1}{1 + e^{-z}}$)
- Can saturate when $|z|$ is large, so the gradient becomes very small
- Has a nice derivative: $\sigma'(z) = \sigma(z)(1 - \sigma(z))$
- Transforms $(-\infty, \infty) \to (0, 1)$
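A minimal NumPy sketch of the sigmoid and its derivative, following the formulas above (the function names are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps (-inf, inf) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Saturation: for large |z| the derivative is nearly 0, so gradients vanish.
print(sigmoid_prime(0.0))   # 0.25 (the maximum)
print(sigmoid_prime(10.0))  # ~4.5e-05
```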
Tanh ($\tanh(z) = \frac {e^z - e^{-z}}{e^z + e^{-z}}$)
- Is a rescaled version of the sigmoid: $\sigma(z)=\frac {1 + \tanh(\frac z 2)}{2}$
- Transforms $(-\infty, \infty) \to (-1, 1)$, so it is zero-centered
- Outputs (or even inputs) may need to be normalized to form a probability distribution, since tanh values lie in $(-1, 1)$ rather than $(0, 1)$
- $\tanh'(z) = 1 - \tanh^2(z)$
- Empirically provides little or no performance improvement over sigmoid neurons
- Can also saturate
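A short sketch, assuming NumPy, that computes the tanh derivative and numerically checks the rescaling identity $\sigma(z)=\frac{1 + \tanh(z/2)}{2}$ from above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh_prime(z):
    """Derivative: tanh'(z) = 1 - tanh(z)**2."""
    return 1.0 - np.tanh(z) ** 2

# Numerical check of the rescaling identity sigma(z) = (1 + tanh(z/2)) / 2
z = np.linspace(-5.0, 5.0, 11)
assert np.allclose(sigmoid(z), (1.0 + np.tanh(z / 2.0)) / 2.0)

# tanh also saturates: its derivative approaches 0 for large |z|.
print(tanh_prime(0.0))  # 1.0 (the maximum)
print(tanh_prime(5.0))  # ~1.8e-04
```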
Rectified Linear neuron ($\text{ReLU}(z) = \max(0, z)$)
- Doesn’t saturate for positive inputs
- Doesn’t learn if the weighted input is negative, as the gradient is then 0
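A minimal NumPy sketch of ReLU and its (sub)gradient, illustrating why no learning happens for negative weighted inputs (function names are illustrative):

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def relu_prime(z):
    """Subgradient: 1 where z > 0, 0 where z <= 0."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))        # [0.  0.  0.  0.5 2. ]
print(relu_prime(z))  # [0. 0. 0. 1. 1.] -> no gradient flows for negative inputs
```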