Posted on 2020-08-24 at 11:40:00 UTC-0600

generalization

They’re pretty sure that it acts as a regularizer by starting the supervised training from a good initial point, rather than by somehow improving the optimization path.