Welcome to Healthcare Data Analytics, Risk Adjustment and Predictive Modeling.
This is Lecture C.
The component Healthcare Data Analytics covers the topic of healthcare data analytics
which applies data, statistical and quantitative analysis,
and explanatory and predictive models to drive decisions and actions in healthcare.
The objectives for this unit Risk Adjustment and Predictive Modeling are to:
Define risk adjustment, predictive modeling, and validations of models in healthcare.
Identify the healthcare and other data needed to perform risk adjustment
and predictive modeling.
Relate risk adjustment and population segmentation to allocation of healthcare resources
and healthcare redesign.
Discuss uses of risk adjustment and modeling and value-based models of care.
Delineate the use of health information technology in the creation, delivery,
and evaluation of prediction models and describe ethical considerations
and risk adjustment in population management.
In this lecture, we'll describe the systems used in the creation, delivery,
and evaluation of prediction models including health information technology systems.
We'll also describe the significant ethical considerations required
when implementing these models.
What do we need to perform risk adjustment and predictive modeling better?
We must improve the systems we use to deliver health information technology.
More importantly, however, our data must improve in quality.
We need more sources of data, and we need our cultural systems, how people think, feel,
and react to risk scores and models, to change.
We must also reflect on the ethical considerations in more detail.
How and when do we share data, what is acceptable to society and individuals,
and how do we manage benefits and risks?
How do we use health information technology or HIT now?
How can we use it better in the future?
The entire trajectory of HIT use is changing so fast that it's difficult to keep up.
Many innovations exist now that could change healthcare significantly
but are not widely distributed.
For instance, using genetic data, we can predict who's likely to get muscle aches
or even muscle breakdown from a class of cholesterol-lowering drugs known as statins.
We do not do these tests now, however, because genetic testing is difficult to obtain,
most HIT systems cannot store and retrieve this information
in a structured or encoded fashion,
and most providers don't know what to prescribe instead of statins.
HIT is used for prediction in three ways: creation, delivery, and evaluation.
Researchers and people in industry often create algorithms
that risk stratify or predict outcomes.
Researchers study a condition or group of patients,
analyze their data and propose a risk score.
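To make the idea of "proposing a risk score" concrete, here is a minimal sketch of a points-based score of the kind researchers publish, loosely in the spirit of readmission indices such as LACE. The variables, weights, and cut points below are invented for illustration, not taken from any published index.

```python
# Hypothetical points-based readmission risk score (illustrative only).
# All weights and thresholds here are made up for demonstration.

def risk_points(length_of_stay_days, acute_admission, comorbidity_count, ed_visits_6mo):
    points = 0
    # Longer hospital stays earn more points (hypothetical cut points)
    if length_of_stay_days >= 7:
        points += 5
    elif length_of_stay_days >= 3:
        points += 3
    elif length_of_stay_days >= 1:
        points += 1
    if acute_admission:                    # admitted urgently, e.g. via the ED
        points += 3
    points += min(comorbidity_count, 5)    # cap the comorbidity contribution
    points += min(ed_visits_6mo, 4)        # cap the ED-visit contribution
    return points

def risk_band(points):
    # Translate raw points into a band a care team can act on
    return "high" if points >= 10 else "moderate" if points >= 5 else "low"
```

A long acute stay with several comorbidities lands in the high band, while a short elective stay stays low; validating such cut points against real outcomes is the researcher's main work.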
Industry also analyzes data but they use it for the benefit of program planning.
The word industry in this situation is loosely defined as large health plans,
health product companies and data oriented companies that work with healthcare,
such as Siemens or Optum.
Some examples of questions industry would try to answer using HIT and prediction would be,
how much of our health care resources will a specific group of patients use?
Which patients will benefit most from certain procedures?
How can data collected from personal tracking devices
like Fitbits be used to provide feedback and increase their use?
In terms of delivery, HIT is used in reports produced by health systems.
Additionally, rules might be put in electronic health record systems,
or EHRs, to deliver recommendations at the point of care.
Patients and families use HIT when they receive information via e-mail or the web.
The use of HIT for evaluation in terms of implementation
and benefit-harm analysis in the real world is often limited.
Industry sees the algorithms as important intellectual capital,
and researchers have only rarely taken the time
to see what happens after their rules are implemented.
This is changing, however,
as more informatics research looks at dissemination and implementation.
Here's an example of the steps required to produce these risk scores.
On the far left, a group of data sources is shown.
Starting from the top, EHR data is collected about the health care delivered to patients
and the results of their studies.
Historically, this information is highly context dependent and largely narrative,
with narrative meaning notes about patient conditions and treatments.
For instance, researchers have shown that the diagnosis of diabetes is often recorded
for billing purposes when a test to rule out diabetes is completed.
This billing code would go to the next data source known as Claims,
but not the result of the tests that ruled out diabetes.
Claims, therefore, has the bills for procedures, visits, and diagnoses.
Pharmacy Benefits, a kind of claim,
has information about medications covered by insurance.
Finally, personal device data may track a person's steps, workouts, or diet.
In the top diagram, EHR data alone is extracted, transformed,
and loaded into population management systems or enterprise data warehouses.
This workflow usually occurs at a health system.
Next, the business intelligence
or analytics team at the health system might produce reports and risk scores.
Sometimes, this information is put back into the EHR.
The other sources, claims, pharmacy data, and device data, might be purchased
and sent to an aggregation program at an industry data warehouse
where risk adjustment, actuarial analysis, or program planning might occur.
Industry might also use this information to anticipate new markets for products.
In either case, the data is taken away from where and how it was collected,
leading to significant confusion at times about the meaning of the scores.
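The extract-transform-load workflow described above can be sketched in a few lines. This is a toy illustration: the record layout, field names, and diagnosis codes are hypothetical stand-ins for what a real health system's ETL job would handle.

```python
# Minimal sketch of extract-transform-load (ETL) into a population
# management "warehouse" table. All field names are hypothetical.

# Extract: raw EHR rows, mixing structured codes with narrative notes
raw_ehr_rows = [
    {"pat_id": "A1", "dx_code": "E11.9", "note": "type 2 diabetes, stable"},
    {"pat_id": "A1", "dx_code": "I10",   "note": "hypertension"},
    {"pat_id": "B2", "dx_code": "I10",   "note": "hypertension, new"},
]

def transform(row):
    # Transform: keep only structured fields, normalize the code format,
    # and drop the context-dependent narrative note
    return {"patient": row["pat_id"], "diagnosis": row["dx_code"].upper()}

# Load: one warehouse entry per patient, holding a set of diagnosis codes
warehouse = {}
for row in raw_ehr_rows:
    rec = transform(row)
    warehouse.setdefault(rec["patient"], set()).add(rec["diagnosis"])

# The analytics team can now compute a crude score, e.g. a count of
# coded conditions per patient (illustrative only, not a real risk model)
scores = {patient: len(codes) for patient, codes in warehouse.items()}
```

Notice what the transform step throws away: the narrative note. That loss of context is exactly why a billing code recorded to rule out diabetes can later be misread downstream as a diabetes diagnosis.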
In the future, we may have new ways to create, deliver,
and evaluate risk predictions using HIT.
For Creation, new self-evolving algorithms may be
programmed to periodically evaluate themselves
and report potential improvements to their ability to predict.
In addition, new types of data will completely revolutionize prediction.
For example, genetic and protein data will change how we predict response to treatments
and future risks.
Personal tracking devices such as heart rate monitors
will be used to detect risks of future problems from heart dysrhythmia
or sedentary lifestyles and finally, social determinants of health
will be used to better allocate societal resources and programs.
Included here are things such as the environment and local social capital
like parks or the availability of healthy food.
Delivery will increasingly go to multiple parties: people and families,
and providers more directly through application programming interfaces, or APIs.
In fact, universal APIs are now being built on standards
like Fast Healthcare Interoperability Resources, or FHIR, and others.
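As a small, concrete taste of what FHIR delivery looks like, here is a sketch of consuming a FHIR Patient resource. The JSON below follows the general shape of a FHIR R4 Patient; in practice it would be returned by a server call such as GET [base]/Patient/{id}, and the specific values here are illustrative.

```python
import json

# Illustrative FHIR-style Patient resource (shape per FHIR R4; values invented)
patient_json = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Chalmers", "given": ["Peter", "James"]}],
  "birthDate": "1974-12-25"
}
"""

# An app receiving this over an API can pull out structured fields directly,
# rather than parsing narrative text
patient = json.loads(patient_json)
name = patient["name"][0]
display_name = " ".join(name["given"]) + " " + name["family"]
```

Because every conformant system exposes the same resource shapes, a risk score or recommendation can be delivered to a patient app and a provider EHR through the same interface.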
As David Bates and colleagues point out,
patients will be more quickly triaged in health care settings
to determine urgency and treatment as well as to avoid harm from adverse events
from drugs and readmissions from complications.
Patients who are high cost and high need,
the 5 percent who use 50 percent of health care resources,
will have more integrated, tailored programs.
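That concentration of spending in a small group is easy to check with a short calculation. The sketch below uses synthetic costs to ask what share of total spending the top 5 percent of patients account for; the numbers are made up purely to illustrate the computation.

```python
# How much of total cost is attributable to the top fraction of patients?
# Costs below are synthetic, chosen only to illustrate the calculation.

def share_of_cost_in_top_fraction(costs, fraction=0.05):
    ordered = sorted(costs, reverse=True)        # most expensive first
    k = max(1, int(len(ordered) * fraction))     # size of the top group
    return sum(ordered[:k]) / sum(ordered)

# 100 synthetic patients: 5 very high-cost, 95 low-cost
costs = [100_000] * 5 + [1_000] * 95
top_share = share_of_cost_in_top_fraction(costs, 0.05)  # well over half
```

Running the same calculation on real claims data is one of the first steps an analytics team takes when deciding where a tailored care-management program will have the most impact.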
Finally, Evaluation will be built into creation
but will also allow more careful considerations and tailoring.
You can't deliver risk scores or models without data.
Data has historically come from a few sources,
but the Data Ecosystem shown here in an image by Eric Schadt is growing.
Global positioning systems, weather, traffic and other sources
will be combined with increasing amounts of biomedical and personal health data
to produce better predictive models of disease for diagnosis,
therapy selection and treatment.
Our understanding of genetics is evolving rapidly.
Soon, our idea of disease will be much more nuanced.
Rather than treating leukemia as a single condition,
we'll have clear markers of the dominant clones in a patient's leukemia
and their potential treatments, and we'll monitor the evolution of the cancer
and react more quickly to treat it more effectively.
The approach of using very precise information about a patient and their disease
is called Precision Medicine.
Precision Medicine is a prediction technique.
As shown in the image from the National Cancer Institute,
cancers can be genotyped and specific treatments found, or even created, for each cancer.
Although this approach is new,
it highlights how prediction models might be used in every treatment for every patient.
If you're interested in this topic,
read about immunotherapy or CRISPR-Cas9 approaches in the news or on health care blogs.
The evolution of Precision Medicine is shown here.
Historically,
our knowledge of non-small cell lung cancer was limited to three traditional cell types:
squamous, large cell, and adenocarcinoma.
And in 1987, one genetic mutation KRAS was discovered
that represented about 27 percent of cases.
In 2004, we found one more genetic mutation,
EGFR, which represented about another 10 percent of cases.
By 2009, we had added seven more genetic mutations for a total of nine,
and we can now identify the genetic mutation in more than 50 percent of cases.
This massive change is meaningful because we can attack the mutation in different ways:
we won't use a drug where the genetic mutation confers resistance to it,
opting instead for other therapies.
What are some of the problems with our view of the future?
The first is the four V's of data; volume, velocity, variety and veracity.
As data increases in volume from terabytes to petabytes,
the size of our HIT systems must increase to accommodate such large amounts of data.
Processing power and algorithm efficiency must also increase.
Efficiency is very important because the velocity of data is increasing.
Velocity means both the rapidity with which the data is generated
and the decreasing time we have to act on that data.
Imagine a thousand patients with heart monitors.
For each patient, the heart is beating 80 times per minute
with continuous electrical output for each beat.
If we could act quickly enough, we might detect a string of 10 to 15 beats,
a few seconds that are the precursor to a heart attack,
and act to save the life of the individual.
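The heart-monitor scenario above amounts to scanning a stream for a dangerous run of beats. Here is a hypothetical sketch of that idea: it looks through a sequence of beat-to-beat (RR) intervals in milliseconds and flags the first run of ten or more abnormally short intervals. The threshold and run length are invented for illustration, not clinical values.

```python
# Hypothetical streaming-style detector (illustrative thresholds only):
# flag a run of consecutive abnormally short beat-to-beat intervals.

def find_abnormal_run(rr_intervals_ms, threshold_ms=400, run_length=10):
    """Return the start index of the first run of `run_length` consecutive
    RR intervals shorter than `threshold_ms`, or None if no run exists."""
    run_start, run_len = None, 0
    for i, rr in enumerate(rr_intervals_ms):
        if rr < threshold_ms:
            if run_len == 0:
                run_start = i   # a new candidate run begins here
            run_len += 1
            if run_len >= run_length:
                return run_start
        else:
            run_len = 0         # a normal beat resets the run
    return None
```

The point of the velocity problem is that this scan must keep up with a thousand simultaneous streams and trigger a response within seconds, not that the per-beat logic is complicated.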
Variety indicates the diversity of data sources.
How do you combine them?
How do you make sense of their many storage forms?
And finally, Veracity.
How appropriate is the data for the use to which you want to put it?
Or any use.
Is it inaccurate?
Too old?
Incomplete?
Although data issues are important, the way in which we take up data and use it in society
and in our organizations depends largely on cultural issues.
We have a saying in informatics that only 20 percent of the issues are technical,
the HIT or specific tasks, and 80 percent are social:
the norms, the structures, the people, and their interactions.
Our ability to process and use information is limited
and the prediction rules are never perfect.
Dealing with this uncertainty is difficult.
Predictions for something like readmission are valid and reliable
only a fraction of the time,
and the response to them depends in large part on how much people trust the data,
their current attention, their sense of what to do and how to do it,
and what other people think.
Currently, alerting from the EHR carries significant fatigue,
meaning people ignore the alerts they receive from EHRs
upwards of 80 percent of the time.
This is the real dilemma.
In addition, we have strong beliefs about privacy, confidentiality, and security.
The Harvard Business Review highlights how the creation of truly innovative algorithms
based on genetics requires very personal information, like DNA to be shared widely.
How can we do this safely
and give people confidence that the benefits are greater than the risks?
This is the nature of the ethical principles.
In ethics around health care, we think about three major things:
one, the moral principles;
two, the regulations; and three, the practices.
Each of these is important.
For this discussion, we'll consider the Belmont Report,
created in the 1970s, which highlighted three important moral principles:
respect for persons, meaning we must respect individuals and their rights and autonomy,
protecting those with limitations to their autonomy;
beneficence, meaning that we must actively do good;
and justice, meaning we must consider equity and fairness.
For data and risk prediction, we must obtain the consent of persons whose data is shared.
We must use the information for good purposes
rather than to discriminate or act against people.
And we must be fair, providing the outcome of the data equitably.
These are difficult issues.
Some regulations like the Health Insurance Portability and Accountability Act, or HIPAA,
try to encode these principles.
Enacted in 1996 and updated over the last few decades,
the act establishes guidelines for protecting privacy,
allowing patients to get their own data and decide when it can be shared,
and punishing those who violate these laws.
For practice, we must improve our ability to do good
by making it easy to do the right thing.
When the data can be used appropriately,
do we provide it in an easy-to-consume fashion to give benefits,
or is it unstructured or kept in a proprietary format?
In a more focused analysis,
we see that ethical problems in Healthcare and Prediction Modeling are large.
For healthcare, it isn't equitably distributed or available.
Will our models improve this or make it worse?
How we handle data through these transformations is also important.
Data used out of context may be misused or misappropriated,
violating respect for persons.
Finally, how we produce and share our algorithms through software is important.
Poorly programmed software may fail to produce outcomes we want and may harm persons.
When should we provide public access to data?
For example, to be effective, the Internet of DNA would require making genomes public.
This would likely be covered under the ethical area of personal autonomy.
If you accept the risks and potential benefits of releasing all your genetic information,
you should be able to do so.
However, what about the wishes of those who share parts of your genome,
your children or relatives, could the release of your genome adversely affect them?
Public health has often trumped individual desire to keep data private.
For example, a restaurant owner whose business fails a health inspection
may not want that data made public.
But if making that information public would limit the outbreak
of a potentially deadly disease such as E. coli,
the wishes of the business owner will be overridden.
Similarly, the mortality rate of a hospital with a surprising number of deaths
might be released to the public,
due to concerns about quality assurance and potential abuse.
Finally, research requires a significant amount of data to create,
test, and validate predictions.
But should the data be released to accommodate that?
At this stage, nearly all of the efforts focus on the first ethical area
from the Belmont report, allowing individuals the right to choose
by fully informing them of the risks and benefits.
A huge amount of effort is going into learning how to share data
while maintaining these ethical principles.
The Global Alliance for Genomics and Health,
or GA4GH, is a large international effort intended to help explore
and learn how to share this data ethically and appropriately.
The image on your screen is from their web page
which highlights their framework for responsible sharing of such data.
The link given can provide a great deal of information.
Ultimately, the solution to these ethical dilemmas is to explore different frameworks
and approaches that satisfy ethical concerns
and can help capitalize on the potential for new ways to analyze data,
to stratify risk and to predict diagnoses, outcomes, and responses to treatment.
This concludes Lecture C of the unit on Risk Adjustment and Predictive Modeling.
In this lecture, we described the systems used in the creation, delivery,
and evaluation of prediction models,
and described some of the ethical considerations involved when implementing risk adjustment
and prediction models.
This also concludes the unit titled Risk Adjustment and Predictive Modeling.
This unit covered the basics of Risk Adjustment and Predictive Modeling,
highlighting the current state of the field, health care reform and redesign,
value-based care, and the use of HIT.
We discussed the data and ethical principles,
identifying that the field is still in its infancy but is evolving quickly.