An understanding of how information enters and leaves a large data set is critical to its appropriate use. As the adage goes, garbage in, garbage out.
Proper study design and knowledge of programming and statistics are imperative to the safe use of big data in health care.
Many data sets and registries exist. The selection of the proper data source for a specific question or problem is necessary to achieve success.
Collaborative practice including multidisciplinary teams of clinicians, statisticians, medical anthropologists, and programmers, among others, is encouraged when working with big data.
Artificial intelligence has the capacity to change the way humans see and use data. Ethics panels and review boards must remain in place to safeguard the public and investigators.
The 21st century has seen the rise of “big data” in many different industries, including biology and medicine. The use of large data sets in academia has enhanced researchers’ ability to tackle large-scale issues and identify broader patterns. Furthermore, enhanced computing power and sophisticated programming have unlocked the world of artificial intelligence. When combined, big data and artificial intelligence will improve our ability to study health care and to deliver personalized medicine. The applications for big data in surgery are similar to those in health care and health at large, and they are limitless (Figure 2-1). Because the development and deployment of artificial intelligence are highly dependent on the proper utilization of data, it is important for surgical researchers and clinicians to first understand what types of data currently exist in database form as well as the strengths and weaknesses of specific databases.
Figure 2-1. Applications of big data in surgery.
To date, there is no formal definition of big data, yet a clear and unambiguous understanding of the term is essential for shared discourse. The features most widely used to describe big data are the 3 V's: volume, variety, and velocity.1 Succinctly, volume refers to the size of the data, velocity to the frequency with which the data are updated, and variety to the diversity of the data. The ambiguity of these terms reflects the challenges posed by the use of big data in general (Figure 2-2).
Figure 2-2. The challenges of harnessing the power of big data.
Volume, variety, and velocity are by no means completely descriptive of large data sets, but they do illustrate some important features. No consistent threshold defines the volume that constitutes big data: some studies cite terabytes, petabytes, or exabytes; others simply state that size is relative and intentionally leave it undefined.2 Velocity refers ...