Welcome to Health Care Data Analytics, Secondary Use of Clinical Data. This is Lecture b. The component, Health Care Data Analytics, covers the use of data, statistical and quantitative analysis, and explanatory and predictive models to drive decisions and actions in health care. The objectives for this unit, Secondary Use of Clinical Data, are to describe the secondary uses or re-uses of clinical data including, but not limited to, the electronic health record, or EHR; to discuss the limitations and challenges for re-using clinical data; and to conduct a data re-use analysis for health care quality measurement utilizing a sample data set. In this lecture, we'll discuss the limitations and challenges for re-using clinical data. We'll talk about some caveats for the re-use of clinical data, how we may overcome those caveats, and the need for interoperability. What are the caveats for re-use of clinical data? In the following slides, we'll describe how operational clinical data may be inaccurate, incomplete, of unknown provenance, and of insufficient granularity. There are also a number of idiosyncrasies of such data. To start, the data in the electronic health record may be inaccurate. Documentation is not always a top priority for busy clinicians, and for a variety of reasons they may enter data into the record that's not accurate. One analysis of the EHR systems used by four known national leaders in health care assessed the use of data for studies on the treatment of hypertension. They found five categories of reasons why the data were problematic. Some data was missing. Other data was erroneous. Some data was uninterpretable. Other data was inconsistent. And a great deal of data was inaccessible in text notes. Data may also be incomplete. For example, not every diagnosis is recorded at every visit. In other words, the absence of evidence is not always evidence of absence.
This is an example of a concern known by statisticians as censoring. Incomplete data makes tasks that are seemingly simple, such as the identification of diabetic patients, quite challenging. It also undermines the ability to automate quality measurements. For example, one study found that quality measures were underreported because of under-capture of data due to variation in clinical workflow and documentation practices. Another study found that quality measures are usually correct when they're present, but they are often missing in primary care electronic health records. One challenge for clinical data is that we may not know the provenance of the data, in other words, where the data came from. For example, the figure in this slide looks at data used to answer the question of whether a medication was administered to a patient. There may be a number of places in the record where that data exists. For example, when the clinician does order entry, there is an intent to administer. There is also the medication administration record, which can be another indicator of the events. In addition, there's data from the pharmacy on whether the drug is available to be given. And finally, there's data used for medication reconciliation. None of these data sources is perfect, and they sometimes conflict. When we're using such data, we should know where it comes from. In addition to provenance, there's also the problem of granularity. For example, diagnostic codes that are assigned for billing purposes may be generalized to a broader diagnostic class. An example would be when a patient with a set of complex cytogenetic and morphologic indicators of a pre-leukemic state is described as having myelodysplastic syndromes, or MDS, for billing purposes. But this data would be insufficient for other purposes, such as research, where a more specific diagnosis would be required. What are the idiosyncrasies of clinical data? Let's start by revisiting censoring, where some of the data is missing.
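As a rough illustration of how censoring might be screened for in practice, the sketch below flags records where a diagnosis first appears very soon after a patient enters the data source (the disease may predate the record) or where little follow-up exists after the diagnosis. The record fields, the function name, and the 365-day thresholds are assumptions made for this example only; they do not come from the lecture or from any standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PatientRecord:
    # All field names are hypothetical, chosen for this sketch.
    first_encounter: date   # earliest date the data source covers for the patient
    last_encounter: date    # latest date the data source covers for the patient
    first_diagnosis: date   # first date the diagnosis appears in the record


def censoring_flags(rec, washout_days=365, followup_days=365):
    """Return (possibly_left_censored, possibly_right_censored).

    The thresholds are illustrative assumptions, not recommended values.
    """
    # Diagnosis recorded shortly after the patient first appears: the
    # disease may have started before the record did.
    left = (rec.first_diagnosis - rec.first_encounter) < timedelta(days=washout_days)
    # Little observation time after the diagnosis: the later course of
    # the disease may fall outside the data source.
    right = (rec.last_encounter - rec.first_diagnosis) < timedelta(days=followup_days)
    return left, right


short_record = PatientRecord(date(2015, 1, 10), date(2015, 6, 1), date(2015, 2, 1))
long_record = PatientRecord(date(2010, 1, 1), date(2020, 1, 1), date(2015, 1, 1))

short_flags = censoring_flags(short_record)
long_flags = censoring_flags(long_record)
print(short_flags, long_flags)
```

A flag here only marks a record for closer review; it cannot establish when a disease actually began or how it progressed after the record ends.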
There may be left censoring, in that the early part of the data may be missing. For example, the first instance of a disease in a record may not be when the disease was first manifested. This may occur because perhaps it was not yet diagnosed. There is also right censoring, where the data source may not cover a long enough time interval to capture the entire time course of the patient and their disease. Another idiosyncrasy is that the data might not be captured from other clinical or non-clinical settings. For example, the data may reside at other hospitals or health systems. Or it may not be available at all, such as may happen with over-the-counter drugs. There may be biases in how clinicians or others in health systems test or treat a patient. There may also be institutional or personal variation in practice or documentation styles. Finally, there may also be inconsistent use of coding or other standards. How might we overcome these caveats with clinical data? What's recommended for optimal re-use of EHR data? We should certainly assess the data and use it appropriately. There should also be adaptation of best evidence approaches for use of operational data. There is also a need for standards and interoperability and, of course, appropriate use of informatics expertise to guide its use. One activity that these challenges have spurred is a focus on interoperability. The Office of the National Coordinator for Health IT has developed an interoperability road map for the next ten years. The emerging approaches include a standard application programming interface, or API, for query and retrieval of data. This is needed for both documents, which are very prevalent in health records, and discrete data elements. One emerging standard for this is the Fast Healthcare Interoperability Resources, or FHIR, standard. The link on this slide goes to a webpage that gives an overview of the FHIR standard from a clinical user perspective.
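To make the idea of a standard query-and-retrieval API a little more concrete, here is a minimal sketch of what a FHIR-style search for a patient's medication administrations might look like. It only builds the search URL and reads a search-result Bundle; it makes no network call. The server address, the patient identifier, the helper function names, and the sample Bundle contents are all made up for illustration, and a real server may support a different set of search parameters.

```python
from urllib.parse import urlencode

# Hypothetical FHIR server base URL, used only for illustration.
FHIR_BASE = "https://fhir.example.org/baseR4"


def build_search_url(base, resource_type, params):
    """Build a FHIR search URL of the form [base]/[type]?name=value&..."""
    return f"{base.rstrip('/')}/{resource_type}?{urlencode(params)}"


def resources_from_bundle(bundle):
    """Pull the resources out of a FHIR search-result Bundle."""
    return [entry["resource"] for entry in bundle.get("entry", []) if "resource" in entry]


# Ask for one patient's medication administrations, at most 10 per page.
url = build_search_url(
    FHIR_BASE,
    "MedicationAdministration",
    {"patient": "example-123", "_count": 10},
)

# Made-up response for illustration; a real server would return this as JSON.
sample_bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {
            "resource": {
                "resourceType": "MedicationAdministration",
                "status": "completed",
                "effectiveDateTime": "2015-03-02T08:30:00Z",
            }
        }
    ],
}

administrations = resources_from_bundle(sample_bundle)
print(url)
for resource in administrations:
    print(resource["status"], resource.get("effectiveDateTime"))
```

Because orders, administrations, and pharmacy dispensing are distinct resource types in FHIR, a query like this also makes explicit which source a given piece of data came from.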
Another approach is to leverage a growing number of clinical data research networks. These include the HMO Research Network, which facilitates research among the larger managed care organizations. Another is the FDA Mini-Sentinel Network, which focused initially on safety surveillance of medications but has expanded to other areas. A more recent development is PCORnet from the Patient-Centered Outcomes Research Institute, which is developing clinical data research networks, or CDRNs, that aggregate data on more than 1 million patients each. PCORnet has developed a common data model for a subset of this data that will allow more interoperability and facilitate more usage of it. This concludes Lecture b of the unit on Secondary Uses of Clinical Data. Summarizing this lecture, we've seen that there are a number of caveats for re-use of clinical data. We need to understand and use the best practices that may allow us to overcome these caveats. These efforts include a focus on interoperability and leveraging clinical data research networks. This also concludes the unit titled Secondary Uses of Clinical Data. In this unit, we've seen that there are many opportunities for the secondary use or re-use of clinical data. However, we must be cognizant of the caveats of using this type of data, and we must implement best practices for its use. This includes achieving consensus on approaches to standards and interoperability and leveraging established and emerging clinical data research networks.