In this course, you’ll learn the foundational economic theories behind health care innovation and how to optimize your own health care practice or organization. Designed to give you a practical understanding of the theoretical frameworks of behavioral economics and operations management in the health care setting, this course will help you apply these frameworks to assess health care practices and implement innovation while managing risk. You’ll also explore best practices for evaluating innovative practices, using real-life examples of success to see the concepts in action. By the end of this course, you’ll have honed your skills in optimizing health care operations and be able to develop the right set of evaluations and questions to achieve best innovative practices within your organization.
From the lesson
Module 3
In this module, you’ll examine the practice of evaluation and how it is applied to health policy and programs. You’ll gain a better understanding of the need for evaluations in an ever-changing health care environment, and of the importance of control groups in combating the selection bias that may skew an evaluation’s findings. You’ll explore different methods of conducting an evaluation, the types of questions an evaluation aims to answer, and the difference between effectiveness and efficacy. By the end of this module, you’ll understand the theoretical framework behind an evaluation and be able to employ one to better analyze the effectiveness of your health care organization.
Christian Terwiesch, PhD
Andrew M. Heller Professor at the Wharton School; Senior Fellow, Leonard Davis Institute for Health Economics; Co-Director, Mack Institute for Innovation Management, The Wharton School
Amol S. Navathe, MD, PhD
Assistant Professor of Medical Ethics and Health Policy, Department of Medical Ethics and Health Policy
David A. Asch, MD, MBA
Professor of Medicine and Professor of Medical Ethics and Health Policy, Department of Medicine
Roy Rosin, MBA
Chief Innovation Officer, Penn Medicine
Kevin Volpp, MD, PhD
Professor of Medicine, Division of Health Policy, Perelman School of Medicine / Professor of Health Care Management, The Wharton School
Today, we're going to continue our discussion of the different types of evaluation.
Last time, we differentiated between evaluation and monitoring.
Remember, evaluations are periodic,
objective assessments of a planned, ongoing,
or completed project, program, or policy.
Monitoring, on the other hand, is a continuous process that tracks
the program mid-stream and uses data to inform
program implementation and day-to-day management.
We also discussed the three types of evaluation questions:
descriptive, normative, and cause-and-effect.
You will remember that we're primarily after the cause-and-effect
type of question in program and policy evaluations.
As Paul Gertler and co-authors described in their textbook,
Impact Evaluation in Practice,
"the basic impact evaluation question can be formulated as:
What is the impact or causal effect of a program on an outcome of interest?"
In other words, it answers the question,
"Is a given program effective compared to the absence of the program?"
As we will discuss later,
as we dive into methods,
this type of evaluation often relies on comparing
a treatment group that received a project, program,
or policy to a comparison group that did not,
in order to estimate the effectiveness of the program.
More on that later.
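To make that concrete, here is a minimal sketch in Python of the treatment-versus-comparison estimate, using synthetic data; the quit rates, effect size, and sample size below are invented for illustration, not taken from any study:

```python
import random

random.seed(42)

# Synthetic randomized evaluation of a quit-smoking incentive.
# All numbers below are made up for illustration.
BASELINE_QUIT_RATE = 0.05   # assumed quit rate without the program
TRUE_EFFECT = 0.10          # assumed lift in quit rate from the incentive
N = 10_000                  # number of participants

treated, comparison = [], []
for _ in range(N):
    in_program = random.random() < 0.5      # randomized assignment
    p_quit = BASELINE_QUIT_RATE + (TRUE_EFFECT if in_program else 0.0)
    quit = random.random() < p_quit
    (treated if in_program else comparison).append(quit)

# The impact estimate is the difference in mean outcomes between the
# treatment group and the comparison group.
effect = sum(treated) / len(treated) - sum(comparison) / len(comparison)
print(f"Estimated program effect: {effect:.3f} (true effect: {TRUE_EFFECT})")
```

Because assignment is randomized, the two groups are comparable on average, so the simple difference in means recovers the true effect, up to sampling noise.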
But I do want to highlight one important term and distinction here.
You will note that we use the word effective, or effectiveness, here, as in,
"Is a given program effective compared to the absence of the program?"
as opposed to "What is the efficacy of a program in achieving its goals?"
What's the difference? Well, effectiveness is real world,
providing evidence from interventions that take place
in normal circumstances or real life.
Efficacy, on the other hand,
provides evidence carried out under specific circumstances.
Oftentimes, we think of this as under ideal conditions.
The canonical methodology for efficacy is the randomized controlled,
and I emphasize controlled, trial.
Can you guys think of some tradeoffs between efficacy and effectiveness?
Let's give some examples.
We will use two policy evaluations to illustrate.
First, let's start with an efficacy example.
Let's imagine we're studying how
a financial incentive program can improve smoking cessation rates.
In other words, we'll pay people to quit smoking and stay abstinent,
and see if that improves long-term quit rates.
This was a real study done by my colleague,
Kevin Volpp, in the form of a randomized controlled trial.
There were inclusion and exclusion criteria,
like a patient had to smoke at least five cigarettes a
day and not use any other tobacco products.
This homogenized the patient sample,
and patients were then randomized to financial incentive or not.
You can see that this was a highly controlled situation.
Now, let's consider an effectiveness example.
An illustrative evaluation would be assessing the impact of using
bundled payments to pay for joint replacement surgery on cost and quality.
In this Medicare program,
hospitals volunteered to be paid a fixed price for the acute hospital care,
plus any services a patient used over the 90 days that followed.
This was implemented at hundreds of hospitals across the country.
We'll call this the bundled payment example.
Okay. Back to efficacy versus effectiveness with
the smoking cessation trial and bundled
payment program examples in the back of our minds.
In general, we think of efficacy studies
as providing a truer effect of the intervention itself.
You may think of this as a proof-of-concept,
or testing whether a program or policy has the potential to be impactful,
or how big the size of that impact could be.
These studies have high internal validity.
Internal validity is the extent to which a causal conclusion
based on the study is warranted,
which is determined by the degree to which a study minimizes
systematic error, bias, or confounding.
Another way of saying this is,
the key question in internal validity is
whether observed changes can be attributed to your program or intervention,
that is, the cause,
and not to other possible causes,
sometimes described as alternative explanations for the outcome.
So, efficacy studies have high internal validity,
but we aren't sure when we replicate them in practice,
whether we'll get the same result.
This means they may have lower generalizability or external validity.
In general, we are experiencing a crisis of replicability in science,
in that we find it difficult to replicate some major, influential study findings,
much to the chagrin of researchers and the scientific community.
This is not at all restricted to policy evaluation itself.
In part, it's related to methodology and factors like efficacy versus effectiveness.
But there are many other factors at play too like publication bias,
mining data for p-values, et cetera.
That being said, let's set that aside for now.
Let's use one of our examples.
In the smoking cessation study,
the investigators went to great lengths to
make sure that the financial incentives were actually
the cause of the 10 percentage point higher quit rates.
Randomization, sample selection and the like
were all used to increase the internal validity.
However, we may worry that
the next workplace where an employer tries to implement this program
may not get the same results.
That is, the effect may not be the same if the employees are younger,
or have been smoking for a shorter or
longer time than the individuals in the trial,
or if other environmental forces are present, like more chewing tobacco
or more bars around; alcohol and smoking seem to go together.
So, this randomized controlled trial had high internal validity,
but we may worry it won't apply to other situations.
Effectiveness studies, on the other hand,
provide estimates of the program impact in real life.
That is, estimates that might be highly
relevant to replicating this program in other places.
However, we might have other factors at play that could explain the effect.
That is, the effect could be confounded.
So, effectiveness studies have high external validity.
External validity is the validity of
generalized causal inferences from our policy evaluation.
It is the extent to which the results of a study can be
generalized to other situations and to other people.
Saying this another way, external validity
is the degree to which conclusions in your study would hold for other persons,
in other places, at other times.
Our bundled payment program was implemented in the real world.
The participants had to contend with
what post-acute care providers were available to them,
what the capacity of those post-acute care providers was,
whether there were other bundled payment participants in the market, et cetera.
Seems pretty real world, right?
That being said, we may worry that something totally
separate is driving the effect we are
seeing of bundled payments improving cost and quality.
Like, what if all those bundled payment providers
were also accountable care organizations,
and that was really why they were doing so well?
Also, remember that those bundled payment participants volunteered to be in the program.
So, we have some potentially very important selection bias
that could affect our causal estimate of the impact.
Here, we might say that the bundled payment example has
good external validity and question the internal validity.
Note that a caveat to the
external validity is that hospitals volunteered to participate in bundles,
but we'll get to that in just a moment.
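To see why that volunteering matters, here is a minimal sketch in Python with synthetic data (every number below is invented); hospitals with lower baseline costs are assumed more likely to volunteer for bundles, so a naive bundled-versus-other comparison overstates the true savings:

```python
import random

random.seed(7)

# All numbers below are made up for illustration.
TRUE_SAVINGS = 2_000       # assumed true per-episode savings from bundles
N_HOSPITALS = 1_000

bundled_costs, other_costs = [], []
for _ in range(N_HOSPITALS):
    # Hypothetical confounder: already-efficient hospitals (lower
    # baseline cost) are more likely to volunteer for the program.
    baseline_cost = random.gauss(25_000, 3_000)
    p_volunteer = 0.8 if baseline_cost < 25_000 else 0.2
    volunteered = random.random() < p_volunteer
    cost = baseline_cost - (TRUE_SAVINGS if volunteered else 0)
    (bundled_costs if volunteered else other_costs).append(cost)

# Naive comparison: average cost of non-participants minus participants.
naive_estimate = (sum(other_costs) / len(other_costs)
                  - sum(bundled_costs) / len(bundled_costs))
print(f"Naive savings estimate: ${naive_estimate:,.0f} "
      f"(true savings: ${TRUE_SAVINGS:,})")
# The naive estimate mixes the program effect with the selection effect,
# so it comes out well above the true $2,000.
```

This is the selection bias at work: the comparison confounds the program's effect with pre-existing differences between volunteers and non-volunteers.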
How would we have to change the design of each study to
shift the relative strengths of those examples?
In the smoking cessation study,
what if we studied implementation of the program across 15 different work sites and
three different employers, considered all smokers, and watched what happened over many years?
This is starting to sound more like an effectiveness study, no?
Or in the bundled example,
what if we mandated that hospitals in random metropolitan areas be in
bundled payment programs and put strict criteria on which patients count and which don't?
This is real by the way. Medicare announced intent
around a future program of mandatory bundles in December of 2016.
This is making the bundled payment study sound more like an efficacy study, right?
Yet, it still retains a core real world aspect to it.
So, there are ways to try to bridge this gap methodologically,
but it will almost always be impossible to do so completely.
We covered a lot of ground today.
We discussed efficacy versus effectiveness and dove into examples.
See you next time for even more on evaluation types.