Welcome back. We've been talking about a variety of ways that data are collected and stored, and how this can lead to fragmentation. Given the common need to integrate data for analytics, a common task is to standardize or map data between different sources. These are sometimes referred to as the source and the target. After this lesson, you will know what it means to perform data mapping and understand why this is sometimes a time-consuming process.

Now, I will define the task of data mapping. Existing data can be a rich resource for information. Thus, organizations often choose to send data from one system to another, or transfer data from an unused legacy system to a newer system. In addition, they might send data outside of the organization for compliance with outside agencies or to participate in a registry. Before data can be moved, however, it must be mapped from the existing system to the new system. Many people refer to this process as moving source data to the target database.

For example, in the source system, a birthday might be stored by showing month, day, and year, whereas in the target system, the birthdate might be stored numerically. Some systems may save first and last name in one field, whereas another system might separate them into two distinct fields. A system might record weight in pounds, but perhaps the target database records weight in kilograms. We'll look at a small code sketch of these transformations in a moment. Given these conflicts, our first step is discovering the details and context of the source data.

The first data source that we identified on our sample workflow from the ABA project was the EHR source data, because this is where the clerk identified when the patient arrived for her visit. It also identified whether payment was collected. But there is another source of documentation in this workflow, and that source is paper, because the form asks someone to enter information, and that information is also data. It might not be easy to retrieve this information because it's manually written on a piece of paper, but it is still information that we can at some point go back and retrieve.

Data location is where the data is entered or recorded. The location is found by interviewing those who input the data and asking them, "Where do you enter this?" It is also a really good idea to sit with individuals and observe them as they go through their workflow. There is a tremendous amount of knowledge that you will gain just by observing and shadowing someone. When possible, ask for screenshots from their computer workflows and copies of all paper forms used in the process that you are observing. This will be helpful when it's time to document the type of data as well as the data content.

Data type refers to the kind of data item. Type defines the values a target field can take. Examples include structured or unstructured data, numeric values, characters, or even strings. As discussed in past lessons, structured data refers to information that is typically well organized, readily searchable, and has finite values. Unstructured data, however, is essentially the opposite. The lack of structure makes reporting a very time-intensive endeavor.

Now, let's talk about target data requirements. Understanding your target source is like checking the weather before you leave on vacation. If you know where you're going and the weather conditions that you can expect, you can pack accordingly and have a better chance of a comfortable trip. The same concept applies to target data requirements.
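Here is the promised sketch of those format conflicts, written in Python. This is a minimal illustration, not the method of any particular system: the field names, the source formats, and the target's numeric date and one-decimal kilogram conventions are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical source record; every field name and format here is an
# assumption for illustration, not taken from any real EHR.
source_record = {
    "birth_date": "March 7, 1985",    # month, day, and year spelled out
    "patient_name": "Garcia, Maria",  # first and last name in one field
    "weight_lb": "164",               # weight recorded in pounds
}

def map_to_target(source):
    """Map one source record to a hypothetical target schema."""
    # Birthday: text date in the source, numeric YYYYMMDD in the target.
    birth = datetime.strptime(source["birth_date"], "%B %d, %Y")

    # Name: one combined field in the source, two fields in the target.
    last_name, first_name = [p.strip() for p in source["patient_name"].split(",")]

    # Weight: pounds in the source, kilograms in the target.
    weight_kg = round(float(source["weight_lb"]) * 0.453592, 1)

    return {
        "birth_date": int(birth.strftime("%Y%m%d")),
        "first_name": first_name,
        "last_name": last_name,
        "weight_kg": weight_kg,
    }

print(map_to_target(source_record))
# {'birth_date': 19850307, 'first_name': 'Maria', 'last_name': 'Garcia', 'weight_kg': 74.4}
```

Notice that each target field has a rule for how it is derived from the source; writing those rules down explicitly is, in miniature, what a data mapping document does.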
If you know what is expected, it will be easier to execute the transfer of data. Understanding your target source and requirements before gathering data is essential. Data mapping is a process of showing the relationship between a source and a target. Just like a map, there is a starting place and a destination for each data item linked in the process. So, before data can be mapped, it is important to understand the target structure. Target data requirements include content and technical specifications, but we're going to focus only on content in this lesson, because you'll learn more about technical specifications later in the course. Understanding your target includes knowing what data fields are available, what type of data the target is going to receive, in what format the data must be sent, and how the data fields are defined.

I'd like to spend a little time now talking about the importance of defining data fields. Defining data elements is incredibly important. All of you have had a conversation with someone where a term was used, and you thought that term meant one thing while the other person thought it meant another. By the end of the conversation, you find out that you are both completely confused. The same thing applies to data. If you pull data from a system without a consistent definition, you will often have the same confusion.

Let's consider a few real examples. First, consider the admission date from a hospital dataset. This seems like a straightforward data field, but it's not. There are several questions that must be answered: is it an admission to the emergency room, or do you mean an inpatient admission date? Do you mean the date the admission was ordered, or do you mean the date the patient was transferred to the inpatient unit? The last question can be tricky, because if you have a patient who arrives at the ED at 10:00 PM on Tuesday and doesn't get moved to the inpatient unit until 7:00 AM on Wednesday, you now have two different calendar dates. You must know which one to choose for your data to be consistent.

Consider another example from a burn registry: a data field titled "Location of injury." My colleague had the opportunity to go back and see in past documentation how people responded to "Location of injury." She saw answers with body parts, such as arms, legs, face, and scalp. But she also saw states, such as Arizona, California, and New Mexico. She also saw cities and actual addresses, like 2200 Oak Avenue, and she even saw landmarks: "Oh, I was at Baylor Park." None of those answers is incorrect; it's just that none of them is consistent, because the data element was never defined in a way that let end users give a consistent response. A small code sketch below shows one way a defined value set can enforce consistent responses.

Okay. That's the end of that lesson. We look forward to seeing you for the next lesson on entity resolution.
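As a supplement to that last example, here is a minimal sketch of enforcing a defined value set at data entry. The allowed body regions and the function name are hypothetical, chosen only to illustrate the idea; a real registry would publish its own data element definitions.

```python
# Hypothetical defined value set for "Location of injury" in a burn
# registry; the allowed values are assumptions made for this example.
ALLOWED_BODY_REGIONS = {"arm", "leg", "face", "scalp", "hand", "torso"}

def validate_location_of_injury(value: str) -> str:
    """Accept only responses drawn from the defined value set.

    Free-text answers like 'Arizona' or '2200 Oak Avenue' are rejected,
    which is exactly the inconsistency an undefined field invites.
    """
    normalized = value.strip().lower()
    if normalized not in ALLOWED_BODY_REGIONS:
        raise ValueError(
            f"'{value}' is not in the defined value set: "
            f"{sorted(ALLOWED_BODY_REGIONS)}"
        )
    return normalized

print(validate_location_of_injury("Scalp"))  # 'scalp'

try:
    validate_location_of_injury("Arizona")   # a state, not a body region
except ValueError as err:
    print(err)
```

With a definition like this in place, every end user is steered toward the same set of answers, and the downstream data stay consistent.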