With the growing output of machine learning papers for surgical applications, it is becoming more important for surgeons and surgical researchers to understand the key principles of machine learning research. Thus, the previous chapters of this book have provided a foundation for understanding the key techniques and existing applications of artificial intelligence (AI), especially deep learning. Although it is difficult to cover all possible methodologies and applications of AI in surgery, we hope that the previous chapters have equipped you to begin learning more about the field.
Although additional mathematical and computational knowledge and skills are necessary to begin engaging in AI research, reading the literature is a good way to continue one’s education and broaden one’s appreciation of the field. While this chapter does not review individual methods of machine learning (some methods are covered in Chapters 3-6), it is intended to provide a basic strategy for reading and assessing predominantly clinical papers that utilize machine learning (as opposed to technical papers in computer science or engineering on the development of machine learning methods). This strategy is certainly not the only way (nor is it likely the best way) to evaluate a clinical machine learning paper; however, it provides a systematic framework to support the curious clinician or surgical researcher in evaluating such work.
JAMA's user guide on evaluating machine learning papers provides an excellent framework for reviewing machine learning literature.10
As introduced in Chapters 1 and 3, AI draws from a wide range of fields. Although one does not have to be familiar with all of psychology, neuroscience, linguistics, and so on, knowledge of core mathematical and statistical principles is vital to understanding and assessing papers that use AI techniques. Many studies in the medical literature today are based on frequentist statistics, which are often taught in high school, college, and medical school. Yet multiple studies have demonstrated that statistical literacy among clinicians (both those in training and those in practice) is poor. A survey assessing physicians’ knowledge of the risks and benefits of common screening tools and therapeutics found that 79% of the surveyed physicians overestimated the benefits of these tools, whereas 66% overestimated the harms. Furthermore, 67% of these physicians reported low confidence in their own awareness of the probability of risk or benefit.1 In a large study of more than 4700 obstetrics and gynecology residents, only 42% could correctly define a P value on a multiple-choice examination.2