Posted on

deep-learning neural-scaling

This paper tries a bunch of different changes to the training setup to see which ones affect the power-law exponent with respect to dataset size. Here are some of the answers:

Here are some other things to test that I thought of while reading it: