Artificial neurons

Sigmoid ($\sigma(z) = \frac{1}{1 + e^{-z}}$)

  • Can saturate when $|z|$ is large (see the sketch after this list)
  • Has a nice derivative: $\sigma'(z) = \sigma(z)(1 - \sigma(z))$
  • Transforms $(-\infty, \infty) \to (0, 1)$
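A minimal NumPy sketch of these properties; the helper names `sigmoid` and `sigmoid_prime` are mine, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Uses the identity sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))        # values squashed into (0, 1)
print(sigmoid_prime(z))  # near 0 at z = ±10: the neuron saturates
```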

Tanh ($\tanh(z) = \frac {e^z - e^{-z}}{e^z + e^{-z}}$)

  • Is a rescaled version of the sigmoid: $\sigma(z)=\frac {1 + \tanh(\frac z 2)}{2}$
  • Transforms $(-\infty, \infty) \to (-1, 1)$, so it is zero-centered
  • May require normalizing the outputs (or even the inputs) into a probability distribution, since they can be negative
  • $\tanh'(z) = 1 - \tanh^2(z)$
  • Empirically provides only a small improvement, or none at all, over sigmoid neurons
  • Can also saturate when $|z|$ is large (see the sketch after this list)
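The rescaling identity and the derivative are easy to verify numerically; a small sketch, with illustrative variable names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)

# sigma(z) = (1 + tanh(z/2)) / 2: tanh is a rescaled, zero-centered sigmoid
assert np.allclose(sigmoid(z), (1.0 + np.tanh(z / 2.0)) / 2.0)

# tanh'(z) = 1 - tanh^2(z), checked against a central finite difference
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2.0 * eps)
assert np.allclose(1.0 - np.tanh(z) ** 2, numeric)

print(np.tanh(z))  # outputs lie in (-1, 1), centered on 0
```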

Rectified Linear neuron ($\text{ReLU}(z) = \max(0, z)$)

  • Doesn’t saturate for positive inputs
  • Doesn’t learn if the weighted input is negative, as the gradient is then 0 (see the sketch below)
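A matching sketch for ReLU, showing the zero gradient on the negative side; treating the gradient at exactly $z = 0$ as 0 is a convention I chose here, since the derivative is undefined at that point:

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z)
    return np.maximum(0.0, z)

def relu_prime(z):
    # Gradient is 1 for z > 0 and 0 for z < 0; at z = 0 it is
    # undefined, and 0 is used here by convention
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))        # no saturation for positive z
print(relu_prime(z))  # 0 for z <= 0: no learning signal there
```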