We know that there is a strong desire to improve human wellbeing, and medical care is one way to help people stay healthy or combat disease. Yet the diversity of medical conditions and their associated treatments is immense. Moreover, numerous types of systems have evolved to help with these goals. As a result, it is not surprising that healthcare has numerous types of data: medical care is complex and involves a wide variety of concepts that are measured and communicated. In this lesson, we will evaluate how medical concepts get transformed into actual data. We'll then explore the diversity of data types produced by these varied processes. After completing this lesson, you will be able to identify different types of medical processes and explain why specific data types emerged from these varied human endeavors.

Allow me to discuss how we move from diverse processes to diverse concepts and finally to varied data types. First, health data often comes from multiple sources. The patient's continuum of care flows over time through a variety of visit types and provider types. Thus, the data is going to be complex. It will come from multiple sources because many different processes are involved in diagnosing and treating diseases and medical conditions. Second, data are often formatted or structured differently across sources, which poses a significant challenge for anyone who wants a complete understanding of a patient's care. It is not surprising that data formats vary between health systems. Next, different workflows or human objectives lead to a variety of data types. For example, much of medical communication takes place in written natural language, or text. Thus, clinical notes make up a substantial portion of the electronic medical record today. Kaiser Permanente of Northern California has well over 44 petabytes of clinical records.
This includes images, but a significant fraction of this data is in the form of clinical notes. We also have numerous results from laboratory tests, which we call observations. We have data from devices as well: an ECG, or EKG, can produce 1,000 data points per second. We also have images. A 2D mammogram could be 120 megabytes per image, and a 3D MRI may be 150 megabytes per image. These are all present-day sources of data in clinical care. As new technologies such as wearable devices are adopted or created, they give us even more data points to track.

The next step in managing this complexity is to put clinical processes into categories that can be communicated by both humans and computers. Ontologies and terminology systems are important here. Once we agree on specific codes for specific diseases or drugs, it is much easier to create data associated with those codes.

Let's review an example of what it takes to move from concepts to data. Imagine that a group of doctors who diagnose diseases is required to report their patients' diseases to a state public health department. They first need to think about the categories of conditions: for example, cancer, and then the various types of cancer that occur in different parts of the body. With these categories in hand, they might investigate their clinical systems and find that these conditions are documented only in clinical notes. Rather than send notes that cannot easily be processed, the doctors may decide to add an ICD-10 code for each patient's disease. The result could be a comma-separated text file containing patients' medical record numbers and demographics along with specific ICD-10 codes. Sometimes we obtain the data file after this process has occurred, and we forget that at some point in time, groups of people moved from categories and concepts to distinct data fields.
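The reporting scenario above can be sketched in a few lines of Python. This is a minimal, hypothetical example: the field names (`mrn`, `birth_year`, `sex`, `icd10`) and the patient records are invented for illustration, and a real public health report would follow whatever file specification the receiving department defines. It shows the point at which a diagnosis becomes a distinct data field, and how the three-character ICD-10 category (for example, C18 for the colon) can then group related diagnoses into the broader categories the doctors started from.

```python
import csv
import io
from collections import Counter

# Hypothetical records: the medical record numbers (MRNs) and demographics
# are invented for illustration; the ICD-10 codes are example cancer codes.
records = [
    {"mrn": "000123", "birth_year": 1958, "sex": "F", "icd10": "C18.9"},  # colon, unspecified
    {"mrn": "000456", "birth_year": 1947, "sex": "M", "icd10": "C61"},    # prostate
    {"mrn": "000789", "birth_year": 1962, "sex": "M", "icd10": "C34.9"},  # bronchus or lung, unspecified
    {"mrn": "000912", "birth_year": 1971, "sex": "F", "icd10": "C18.7"},  # sigmoid colon
]

# Column order for the hypothetical report file.
FIELDS = ["mrn", "birth_year", "sex", "icd10"]


def to_csv(rows):
    """Serialize the records as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def count_by_category(csv_text):
    """Count rows per three-character ICD-10 category (e.g. C18 = colon)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["icd10"][:3] for row in reader)


csv_text = to_csv(records)
print(csv_text)
print(count_by_category(csv_text))
```

The medical record numbers are kept as strings so their leading zeros survive the round trip; a tool that infers column types might silently turn `000123` into `123`, which is exactly the kind of undocumented transformation worth asking about when data changes hands.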
Once concepts are translated into data, it might be challenging to reduce the data down to actionable information for the leaders who are responsible for making decisions. When we talk about data, we really mean the raw numbers, text strings, or images resulting from the workflow processes of hospitals, clinics, or other health organizations. In contrast, information is the output produced after raw data have been processed and analyzed in a formal way, so that the results are directly useful to those involved in making decisions. Data become information through the process of collection and the application of analytics.

As discussed in the first two modules, we must ask ourselves several questions about how the data were created and for what purpose. Understanding the context of data involves asking questions about data origins. What does the data entry form indicate the data means? Consider the clinician, nurse, or practitioner: who is entering information into the electronic health record, and at what point of care? What did that person have in mind when they stored this information? Is it consistent across data entry personnel? Is it consistent across units, departments, and organizations? We need to consider these contextual questions to effectively aggregate data and transform it from raw data into information.

What was the context of any transformations that may have occurred to this data? Did it arrive in the electronic health record complete, well understood, and documented, with a data dictionary available to transform it? Or did someone need to spend time transforming this information? Could a system have transformed it into some other format? What is the context of any interpretation that might have occurred to this data? Assuming the data was created in an electronic format, did it stay in that format throughout acquisition, storage, aggregation, and analysis?
Or did some inferences need to be imposed on this information to preserve our understanding of the data? In the next lessons, we will offer concrete examples to help answer these critical questions. To do this, we will focus on specific types of healthcare data.