The authors discuss four trends in AI research that have negative consequences for the community.
It’s important to allow researchers to include speculation, because speculation is what allows ideas to form. But the paper has to carefully couch speculation inside a “Motivations” section or similar framing to ensure the reader understands its place.
It’s extremely important to define concepts before using them. Terms like internal covariate shift or coverage sound precise without ever actually being defined.
If a paper presents more than one tweak or change, it needs ablation studies to show which change is actually responsible for the gains.
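To make that concrete, here’s a minimal sketch of an ablation loop (the component names and the fake scoring function are hypothetical placeholders, not anything from the paper): disable each proposed change one at a time and compare against the full model.

```python
# Hypothetical ablation sketch: each "component" stands in for one proposed change.
def train_and_evaluate(use_new_loss=True, use_augmentation=True, use_warmup=True):
    """Placeholder for a real training pipeline; returns a fake accuracy."""
    score = 0.80
    score += 0.05 if use_new_loss else 0.0
    score += 0.03 if use_augmentation else 0.0
    score += 0.01 if use_warmup else 0.0
    return score

components = ["use_new_loss", "use_augmentation", "use_warmup"]

# One run with everything enabled, then one run per disabled component.
results = {"full model": train_and_evaluate()}
for name in components:
    config = {c: (c != name) for c in components}
    results[f"without {name}"] = train_and_evaluate(**config)

for setting, accuracy in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{setting:<25} {accuracy:.3f}")
```

If removing a component barely moves the number, the paper shouldn’t credit that component for the gains.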
Don’t include equations just to sound more technical. Do it to communicate.
Giving something an intuitive name makes it easier for that concept to stick without having to argue for it. Examples include reading comprehension, curiosity, thought vector, and consciousness prior.
Deconvolution is a great example of a technical term being overloaded: it formally means reversing a convolution, but in deep learning it now usually refers to transpose convolutions. Even worse, names for tough problems have been reused for easier ones, so that language understanding and reading comprehension now just mean making accurate predictions on specific datasets. That’s bad.
Suitcase words carry multiple meanings and cause miscommunication. For example, model interpretability has no universally agreed-upon meaning, and generalization means three things (generalizing from the training set to the test set, from one population to another, and from an experimental setting to the real world), which causes confusion. Bias is a better-known example.
Here the authors make the distinction that suitcase words are fine for naming academic departments or setting aspirational goals, but using those terms in technical settings causes confusion.
complacency in the face of progress: There’s this assumption in ML that “strong results excuse weak arguments”, which is completely wrong.
growing pains: Rapid growth makes it more likely that junior researchers misuse terminology or techniques. Growth also means a high ratio of submitted papers to experienced reviewers, so papers are more likely to be reviewed by less experienced reviewers.
misaligned incentives: Researchers are incentivized to write titles and claims that sound good to lay readers, and investors and reporters reward them for overstating their success.
Strong empirical papers have error analysis, ablation studies, and robustness checks. A question to ask yourself when including an explanation: Would I rely on this explanation for making predictions or for getting a system to work? Also, be clear on which problems are solved and which are not.
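As a small, hedged illustration of the robustness-check part (not code from the paper): rerun the same experiment over several random seeds and report the spread instead of a single number. `run_experiment` is a hypothetical stand-in for a real training run.

```python
import random
import statistics

def run_experiment(seed: int) -> float:
    """Placeholder: train and evaluate with this seed; here we just simulate seed noise."""
    rng = random.Random(seed)
    return 0.85 + rng.gauss(0, 0.01)

scores = [run_experiment(seed) for seed in range(5)]
print(f"accuracy: {statistics.mean(scores):.3f} "
      f"± {statistics.stdev(scores):.3f} over {len(scores)} runs")
```

Reporting the variance across seeds is a cheap way to show a result isn’t a single lucky run.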
Reviewers can set better incentives for authors by asking themselves whether they would have accepted the same paper had the authors been less thorough. Retrospective papers that solidify foundations, as well as papers that challenge conventional thinking, should be emphasized during review.
Here are some of the counterpoints.
The authors feel that meeting their standards shouldn’t take more than a few extra days of work, but in the rare case where an idea can’t be shared without breaking a standard, it’s better to break the standard than to leave the idea unsaid.