
Talks@12: Data Science & Medicine

December 23, 2019


Thrilled that they made
the time to join us. We’re going to talk
about the power of data. We’re inundated
with it, aren’t we? Both personally
and professionally. It’s overwhelming. But information technology
and the unprecedented amounts of data being spewed out at
every second of every day are transforming our lives. And the massive amounts
of data being generated in research and medicine
are transforming our very understanding of
the basic biomedical process, clinical decision-making,
diagnostic and treatment decisions, and how
we approach population health. By crunching numbers,
data scientists are finding ways to
predict drug behavior and better understand
human disease. We’re set. And I want to welcome all
of our live stream viewers who are watching from
around the world. I apologize for a slight
glitch in the delay. And we’re just now introducing
our program and our speakers from Harvard Medical School. Data science is the driver for
how we approach and practice medicine. After we hear from Dr.
Hatfield and Dr. Rose, we’re going to invite
your questions. For those of you
who are watching through live streaming,
please use the hashtag at Talks at 12– T-A-L-K-S A-T 12, Talks at 12. So now I’m delighted to
introduce our two speakers. Laura Hatfield joined HMS
as an associate professor of Health Care Policy in 2012. Her research focuses on
tradeoffs and decision-making in health care and on
finding ways to help patients and their doctors select the
best course of treatment. Her interest resulted from
earlier work with cancer clinical trials that link
patient data on outcomes with survival rates. Dr. Hatfield finds working
in an academic environment an optimal setting from
which to examine health policy ideas and evaluate how to best
apply them to the real world. Sherry Rose is an
associate professor of Health Care Policy at HMS. Her work focuses on
developing and integrating statistical approaches
to improve public health. Her interests include
risk adjustment, comparative effectiveness
research and its impact on health programs. She has coauthored the
book Targeted Learning. Dr. Rose was recently honored
with an NIH Director’s New Innovator Award in
recognition of her research. Together, Dr. Rose
and Dr. Hatfield serve as co-leaders of the
Health Policy Data Science Lab. Please join me in thanking both
of them for being here today. [APPLAUSE] Thank you, everyone. Thank you for the wonderful
introduction and the invitation to speak today. Laura and I are so happy
to see everyone here. And hello to everyone
on the live stream. We’re excited that
this not only is within Harvard and
within universities but outside
universities as well. Today, Laura and I
are going to talk about data science in medicine. Maybe. All right that’s not working. [INAUDIBLE] I’ll keep talking. So Laura and I
are statisticians, and we approach
statistical problems from very different backgrounds. We have different training
and research agendas. But there are a couple
of commonalities in both of our
research when we think about data science in medicine. And those include the
fact that we are inundated with these large data sets. We have these large data sets. And we input them into
our big data machine. But at the end
what we really want is something that’s
policy relevant, that’s interpretable, that
people can understand. And so we care about
theoretical grounding, sensible principles, and
being computationally efficient. But at the end we don’t want
to generate some new tool that we can’t
interpret and that can’t really have an impact on policy. And so one of the ways that we
felt that we would introduce you to the scope of
the types of work that we do in data
science in medicine is through a couple of– a few
types of statistical questions and give examples of
substantive questions linked to those statistical questions. So this is not exhaustive but
just a brief introduction. And I’ll pose as the
questioner, and then Laura will give a little
introduction about how we can answer questions of this type. So the first class of questions
are classification questions. And an example of a
classification question might be can we
classify patients who are at high risk
of heart disease using blood pressure
and cholesterol? That’s a great question. Sherry, thank you
for throwing to me. So I’m pleased to be here
speaking with Sherry, because we work on
similar problems but from completely
different approaches. So in the answer
to this question I’m going to describe
an approach that will be similar to an approach
that Sherry will talk about later in the talk, where she
actually implements a very cool classification algorithm. And in this cartoon example I
have a very simplified version of what it might look
like to classify. So to answer a question like,
how do we classify patients at high risk of
heart disease, we might begin by taking
either from novel data or from existing
data, such as might be found in billing claims
or electronic health records, a series of risk predictors,
things that we think are important in determining
whether a patient is at high risk for
developing disease. In this simple cartoon, I’m
displaying two risk factors, blood pressure and cholesterol. So each of these
little heart symbols represents the combination of
blood pressure and cholesterol for a single patient. And we could array
them like this and set our fancy machine
learning data science tools to the task of trying to
separate the diseased hearts from the healthy hearts. And in this case, because
we know the answer for each person– whether or not they
eventually developed heart disease through a diagnosis,
which is the gold standard here– this is a form of
supervised learning. That is we know
the right answers. We’re merely trying to find a
space in the predictors, that is blood pressure and
cholesterol, that effectively separates the people
who are at high risk from those who are at low risk. So in the cartoon example
a reasonable algorithm might be to simply
set two thresholds. And of course this
is very unrealistic, but a threshold
on blood pressure and a threshold on
cholesterol, and say anyone who falls above both
of those thresholds is in this high-risk quadrant. And as you can see it picks
off three of the people who developed heart disease, and
it includes one who didn’t. And so that is a
classification error, but we’re willing to
tolerate it because if we set the bounds higher
in either direction, we would do even worse. So the usefulness of a
classification algorithm like this is that when a new
patient comes, this grey heart, and we put them into our machine
on the basis of their blood pressure and their cholesterol,
because they fall outside the high-risk quadrant, we would
say that they are probably not going to develop heart disease. Of course, we could be
wrong, but at least we’ve trained our classifier
on good data. And so we have some
reasonable assurance that this is a way of sorting
people into high and low risk.
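For readers who want to see the cartoon in code, here is a rough sketch in Python. The data are synthetic and the two-threshold rule is approximated with a shallow decision tree; it is only an illustration of supervised classification, not the speakers' actual model or data.

```python
# A minimal sketch of the two-threshold cartoon: synthetic data only,
# not the speakers' actual model or data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200

# Hypothetical risk factors for n patients.
blood_pressure = rng.normal(130, 15, n)   # systolic, mmHg
cholesterol = rng.normal(200, 30, n)      # total, mg/dL

# Hypothetical "gold standard" labels: disease is more likely
# when both risk factors are high (plus noise).
risk = 0.02 * (blood_pressure - 130) + 0.01 * (cholesterol - 200)
heart_disease = (risk + rng.normal(0, 0.5, n) > 0.5).astype(int)

X = np.column_stack([blood_pressure, cholesterol])

# A depth-2 tree can only learn simple threshold rules,
# which mirrors the "one cutoff per risk factor" cartoon.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, heart_disease)

# Classify a new (grey-heart) patient from their two measurements.
new_patient = np.array([[125.0, 190.0]])
print("Predicted high risk?", bool(clf.predict(new_patient)[0]))
print("Training accuracy:", clf.score(X, heart_disease))
```

In practice the labels would come from diagnoses recorded in claims or electronic health records, and performance would be judged on held-out patients rather than on the training data.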
So another common type of statistical question is clustering. And clustering sometimes sounds
very similar to classification, but Laura will explain in
this example how it differs. And one example that comes
up a lot in health policy is clustering hospitals. So a question might be can
we cluster hospitals that are similar in price and quality? Right. So in the clustering
question here, we’ve imagined that there
are things about hospitals that we care about. And in this cartoon
example those two things that we care about, in terms
of hospital performance, are price and quality. And we’re interested in letting
the data discover for us what the major clusters
of hospitals are. So here, unlike
the previous case, where it was a form of
supervised learning, we knew which people
developed and did not develop heart disease. This is unsupervised learning. We don’t know the right answer. There are no labels
that we can put on the hospitals in advance. We’re going to just let the
data sort out how many clusters there are. And we may or may not determine
in advance how many clusters we’d like to find. But in this case, in
the cartoon example, we can naturally
see that there are three clusters of hospitals. And a use of a clustering
algorithm like this might be to try
and discover what is happening with those
very high-quality, low-price hospitals. That would be an exploratory
use of clustering. There are other
confirmatory uses of clustering that would set out
with a hypothesis or a question in advance and try
to sort clusters that vary in ways that could
help us answer that question.
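As a rough illustration of the unsupervised version, the sketch below clusters invented hospital price and quality numbers with k-means. The speakers do not say which clustering algorithm they would use, so k-means is just a convenient stand-in.

```python
# A minimal unsupervised-learning sketch: cluster hypothetical hospitals
# on price and quality with k-means. All numbers are invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Three made-up groups of hospitals: low-price/high-quality,
# high-price/high-quality, and low-quality.
price = np.concatenate([rng.normal(8, 1, 30), rng.normal(15, 1.5, 30), rng.normal(12, 1, 30)])
quality = np.concatenate([rng.normal(85, 3, 30), rng.normal(80, 3, 30), rng.normal(60, 4, 30)])
X = np.column_stack([price, quality])

# Put price and quality on comparable scales before clustering.
X_scaled = StandardScaler().fit_transform(X)

# Here we ask for three clusters; in practice the number might be chosen
# from fit statistics rather than fixed in advance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

for label in range(3):
    members = X[kmeans.labels_ == label]
    print(f"cluster {label}: n={len(members)}, "
          f"mean price={members[:, 0].mean():.1f}, "
          f"mean quality={members[:, 1].mean():.1f}")
```

Choosing the number of clusters is itself a modeling decision; fit statistics or substantive knowledge usually guide it.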
So another question that’s similar to classification is prediction. So when we generalize
to prediction, we might be interested
in different types of continuous outcomes
versus binary or categorical. So an example of
prediction question is can we predict
medical spending using patients’
chronic illness burden? And this is a question
that you’ve worked on a lot because Sherry’s research
in risk adjustment is all about the need
for a health insurer or a payer to be able to
understand, in advance, how costly the people who
enroll in the plan are going to be when they
receive medical care. And as you can imagine this
is a very difficult prediction problem because a lot of medical
spending is very unexpected. So the proportion of variance
explained in risk adjustment algorithms in the typical CMS
risk adjustment algorithms is like 12%. That’s as much of
medical spending as we’re currently
able to predict. I think your algorithms
do better though, right? Depends on the population
we’re studying, but yes. So in this cartoon
example we simply have one predictor
for spending, which is chronic illness burden. That’s unrealistic. In reality, you have hundreds
or thousands of predictors that you put into a
model for spending. But here we’re
imagining that all we have is a single index of how
many chronic diseases does a person have. This could be a count of
diabetes and congestive heart failure. And we array them along here. And again this is a
supervised-learning task because we know
for this population that we’re training
the algorithm on what their actual medical
spending was. So we can fit some sort of
curve or line through the data, and there are many sophisticated
parametric, non-parametric, machine learning ensembles. The tools are
extraordinarily diverse. In this case, it’s
a squiggly line. [INTERPOSING VOICES] Exactly. And the utility of a model like
this for prediction is that when a new patient enrolls
in the health plan, and we know something about
their chronic illness burden– perhaps from looking
at past claims, and we want to make a
prediction for a future year– we simply find their value
in the prediction algorithm for spending, and we impute
or predict that value for them for the coming year
based on our algorithm.
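Here is a hedged sketch of this kind of prediction problem, again with made-up data: a single chronic-condition count predicts spending through a flexible regression model, one possible version of the "squiggly line." Real CMS-style risk adjusters use many more variables and are evaluated out of sample.

```python
# A minimal supervised prediction sketch: predict annual spending from a
# single chronic-condition count. Synthetic data, illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 1000

# Hypothetical predictor: number of chronic conditions (0-8).
conditions = rng.integers(0, 9, n)

# Hypothetical spending: rises nonlinearly with illness burden,
# with a lot of unexplained noise (as in real medical spending).
spending = (2000 + 1500 * conditions + 400 * conditions**2
            + rng.gamma(shape=2.0, scale=2000.0, size=n))

X = conditions.reshape(-1, 1)

# A flexible learner plays the role of the "squiggly line" through the data.
model = GradientBoostingRegressor(random_state=0).fit(X, spending)

# Predict next year's spending for a new enrollee with 3 chronic conditions.
new_enrollee = np.array([[3]])
print(f"Predicted spending: ${model.predict(new_enrollee)[0]:,.0f}")
```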
So there’s another broad set of questions in the causal inference or
comparative effectiveness space. And Laura and I have both
worked with the medical devices in our research. But an example
question here would be can we compare the safety
of medical devices using registry data? The typical
collection of evidence that people have available,
both patients and clinicians, when they’re trying
to make a decision about the use of
a medical device is a collection
of clinical trials that have been used oftentimes
in the pre-market approval process, as the FDA clears
a device for marketing in the real world. And those trials typically
run a device head-to-head. So here I’ve got some
cartoon knee joints. And in each of these
trials one knee joint would have been compared
to another knee joint or possibly to a
nonsurgical alternative. And in each trial there
would be declared a winner. So P less than 0.05,
success achieved. Our joint is better than
the existing technology. But the question that
patients and clinicians really want to know is which joint
is the best for me today for my medical problem. And looking at a series
of winners and losers doesn’t really
answer that question. So we would like to know
which is the best knee joint. So Sherry and I have both worked
on medical device registry data, where if we can overcome
the difficulties of not having randomized controlled trial
data, what we’re able to do is hopefully form a
set of estimates that allow us to rank all four of the
knee joints on a common outcome and have a predicted, say,
adverse-event rate for each of them. So in this case
the teal knee joint looks like it has a very
high adverse-event rate, and we would probably
not choose that one. But the purple one looks good. This process uses a
huge variety of methods, including methods to
adjust for confounding, as well as methods for
arraying a whole series of different data
sources together, the registries, the claims. Maybe you can fold in
the randomized controlled trial data as well. But it gets at a question
that is of greater interest than perhaps the original
trials were designed to answer.
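One common way to get at this with non-randomized registry data is regression adjustment with standardization: model the adverse event as a function of device and confounders, then average each device's predicted risk over the whole cohort. The toy sketch below shows only that generic idea with invented variables; it is not the speakers' actual approach.

```python
# A toy confounding-adjustment sketch for comparing devices in registry-like
# data: fit an outcome model, then standardize over the whole cohort.
# Everything here (variables, effect sizes) is invented for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 4000

age = rng.normal(68, 8, n)
frailty = rng.uniform(0, 1, n)

# Sicker patients are steered toward certain devices (confounding by indication).
device_probs = np.column_stack([
    0.25 + 0.2 * frailty, np.full(n, 0.25), np.full(n, 0.25), 0.25 - 0.2 * frailty
])
device_probs /= device_probs.sum(axis=1, keepdims=True)
device = np.array([rng.choice(4, p=p) for p in device_probs])

# Adverse events depend on age, frailty, and the device itself.
device_effect = np.array([0.0, 0.3, 0.6, 1.2])[device]
logit = -6 + 0.05 * age + 2.0 * frailty + device_effect
adverse_event = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame({"age": age, "frailty": frailty, "device": device})
X = pd.get_dummies(df, columns=["device"], drop_first=False).astype(float)

outcome_model = LogisticRegression(max_iter=1000).fit(X, adverse_event)

# Standardization: for each device, pretend everyone received it and
# average the predicted risk over the whole cohort.
for d in range(4):
    X_d = X.copy()
    for col in [c for c in X.columns if c.startswith("device_")]:
        X_d[col] = 1.0 if col == f"device_{d}" else 0.0
    rate = outcome_model.predict_proba(X_d)[:, 1].mean()
    print(f"device {d}: adjusted adverse-event rate = {rate:.3f}")
```

Inverse-probability weighting and doubly robust estimators are alternatives, and in real analyses the confounder set and the outcome model would need far more care.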
So hopefully this gives a little bit of insight into the scope of the
types of research questions and statistical
questions we might be interested in answering as
quantitative health scientists, as statisticians,
working in teams with economists, clinicians,
health policy researchers. And one of the really wonderful
things about our department is we have faculty
in all those areas. So Laura and I not only
function as team leaders but also as part of
collaborative teams. And these projects may
fall into one or more of these substantive areas. But we bring not only our
statistical expertise, but we also become substantive
experts in these areas as well. And again the goal here– hearkening back to that
image from the beginning of the talk– is that most of these
studies may have all of these as policy goals. We make– we want to
improve quality, health outcomes, and cost analyses,
and ultimately real world health care. And the example that Laura
is going to talk about has tremendous potential
in all those regards. Thanks. So Sherry and I are each going
to share one, slightly more, in-depth example of some of
the work that we’ve done. And each exemplifies one
of the types of questions that we previewed
at the beginning. So I’m going to talk about
a clustering question, and Sherry is going to
talk about a classification question. And throughout
we will emphasize our deep interest in the
substantive questions, as well as the nerdy,
statistician take on the whole thing,
which is the methods. And I think it really– we’re hoping to demonstrate how
as a quantitative scientist, you can really bring a lot to
the table for these problems, and that working in teams
makes this science better. So the clustering question
that I got to work on was motivated by
an analysis that I began with one of my
clinician colleagues and one of my colleagues in
economics and health services research, where we began with
a cohort of people who were all diagnosed with a particularly
serious form of advanced lung cancer, so extensive stage,
small cell, lung cancer. And because they were all over
65 and enrolled in Medicare, we collected up all of their
health care utilization data from the time they were
diagnosed until the time that they died. So this was 14,000 people
over several years. And for each person we
constructed a trajectory from diagnosis through
death, that on a daily basis said are they receiving any
services at all from Medicare in the hospital on an ordinary
inpatient unit, in the ICU? Have they been discharged
from the hospital to a post-acute skilled
nursing facility? Are they receiving hospice
services, either in a facility or at their home or
none of the above? And that’s what I’m
calling home here, although home could
be a long-stay nursing facility for some people. And as you can see in these
toy trajectories here– these are just example
cartoons again– the length of the
trajectory for each person varies widely, because the
survival time determines how long, how many days
we have to observe you in each of these
settings, so that you can see that some people have
very long trajectories where they spend a lot
of time at home. And some people
have a density of these green and pink
and blue bars, which indicates that they’re in a
facility receiving health care services. So that’s like a more
intensive time for that person. And the question
was, if we wanted to communicate to
a patient, what’s going to happen to me
after this diagnosis? We would want to
communicate something about the time course
of their health care following diagnosis,
that reflected both the major patterns
of experiences of patients and the diversity
of those patterns. So a quick look
at the data might have been to plot
something like this, which is the average
proportion of people for each period of time, from
diagnosis to death, expressed as a percentage. So now we’ve solved the
problem of the varying lengths. We’ve put them all on a
unit, which is percent time. And this is the proportion of
people across the whole sample who are at home, in the
hospital, in hospice, or receiving post-acute services
in a skilled nursing facility. And one, sort of, standard way
of representing uncertainty around these
trajectories would be to put error bars
around each one and say well, it’s 75% of
people, plus or minus, right? That wouldn’t really
convey to a patient what they can expect to experience. So instead we turned to
a clustering algorithm, which was meant to
cluster trajectories that were similar to one another
into meaningful buckets. And then we could say a
certain percent of people experience this, another percent
of people experience that. And you can lay out these,
sort of, template experiences and attach percentages to them. And that may be a
better way to convey both the major patterns and
the uncertainty associated with them. So that’s what we did. The technology that
we used was to turn each of the trajectories into
a series of summary measures, that is 10 numbers that
we thought encompassed all of the important features of
the health care trajectory for each person. And then those 10 numbers can be
arrayed into a nice data frame like we like to work with. And we applied
latent class analysis to discover, in those
10 measures, the classes that they fell into. And in this case we did
not decide in advance how many classes we
thought there would be, or how many clusters we
thought there would be. We let the data determine
what number of clusters did a good job of representing
the diversity without getting too fine grained. So in this case we ended up
with four clusters of people. And those four clusters
look like this. This is what you would
say are the major patterns of health care
experiences for people with extensive
stage, small cell, lung cancer in our cohort. 66% were mostly at home. And by home again
I mean not getting Medicare-covered
services in a facility. And they had much
longer survival, 10 months of survival,
compared to the other classes. This is longer than
the average survival, because again these are
the healthiest patients who are getting the least services. Another substantial
fraction of people were in this group who were
mainly in the hospital. They spent a good fraction of
their life between diagnosis and death in the hospital
getting acute care or in the post-acute
setting, and had only a month of survival on average. There was another
group of people, 11%, who had four months of
survival and spent it almost entirely in hospice care. And then finally the really,
most-seriously, ill folks who were 6% of our cohort,
who had a month of survival and spent nearly all of it
in the ICU or other hospital settings. So as a way of
communicating what you can expect to experience
as a patient following this diagnosis, this might
be a more informative way to communicate that. And we can see by re-plotting
the proportion of people in each of the classes who
are in these settings, what we missed in the first picture. So the picture of the people
who spent most of their time at home– this is the bulk
of patients in our sample– it looks very similar
to the first picture. They’re mostly at home. And at the very end
of life, there’s an uptake in
enrollment in hospice. But this is what we missed
in that first picture. By not clustering, we missed
that for a substantial fraction of people, their time
at home is very short, and they very quickly enroll
in hospital-based care. And there is a small uptake
in hospice at the end. So the reason, I think,
that this method is useful, not only for
communicating to patients but also for setting policy,
is that currently the way hospice services are
typically covered for Medicare enrollees is
through the Medicare Hospice Benefit that provides the
majority of palliative care for cancer patients. And the policies around
enrollment in hospice require that a person have
no more than six months of prognosis, so no more
than six months to live, and also that they forego
further disease-directed treatment such as chemotherapy. And the timeliness of the
decision to enroll in hospice is dramatically
different for people who have 10 months
to live and people who have a month to live. So effectively
end-of-life care begins at diagnosis for the people who
are in those two classes that had very short survival. And instead of spending
that time in hospice, we found that they spent it
in the ICU or in the hospital. So that’s a place where there
could be policy intervention to make more rapid end-of-life
and hospice enrollment decisions for patients
who fall into that group. So Sherry is going to talk
now about classification, after we did clustering. Yes, and now we’ll switch gears. And I’ll talk about an example
in cancer staging, where we’ve published our first
paper but are actively pursuing additional work,
given some of the results that we had. But I wanted to start off
with just a broad overview of machine learning because
this is a term that we hear now all the time, even in
pop-up ads on the computer. It’s just machine
learning is everywhere or artificial intelligence. And here when I talk about
machine learning, what I mean here, and I’m using
the term very generally, is that it’s another
way that we’re aiming to smooth over that data. And we might be making
fewer assumptions than traditional methods. For those of you familiar
with regression-type methods for something like
classification, that might be parametric
logistic regression. So those are some
considerations we have because we do have these
large, observational data sets. We might have hundreds or
thousands of variables. And traditional methods may
or may not perform well. And so we want to be able
to consider different ways to search the data. We may also have issues
of data sparsity, so variables that
are very important– so medical conditions that are
very rare but very important to adjust for or to include
in our prediction function. Traditional methods may have
issues with data sparsity. With super high-dimensional
data, like imaging for example, we might have millions
of data points in a single image
for a single person. So sometimes we might be also
smoothing over more parameters than we have observations. So these are some settings
where machine learning might be advantageous. Machine learning can also
be advantageous sometimes in smaller data sets, so it’s
not just a big data question. But the idea is that we might–
with traditional methods, we might have issues
approximating the truth. And so we don’t understand
how our data was generated. We don’t know the truth. We’re not in a
setting where we have a controlled laboratory with
a plant and only a couple of variables. We’re dealing with
these messy databases. And even including something– a simple extension of
a traditional tool, like local smoothing might
be better for our data sets. Or we might want to consider
other types of tools that have been developed and
become very popular, like classification trees or
deep-learning neural networks. So there a lot of
different tools that will search the co-variate
space in different ways, partition it in different ways. And the concern then is– when I get to this point
in a presentation is– I don’t know which tool to use. And that’s a completely
legitimate problem. You might have completely
reasonable results. You may be able to approximate
the truth with standard tools, but you don’t know
that beforehand. And so one of the things
that I do in my research is develop ensembles,
where we can consider multiple tools at once. And we do this in a very
rigorous way with internal hold-out samples, so that we’re
not doing things like impacting the generalizability
of our data– we sometimes call
this overfitting– or training too close to the data,
such that– earlier, when Laura was talking about a
new patient comes in, or a new subject
comes into our study, and we want to be
able to either predict their outcome, or for some
other research question, maybe classify them into groups. If we train too
close to our data, then our algorithm will do
a poor job with new people or in a different population. So with these weighted
averages of algorithms, called ensembles, we can
use multiple tools upfront, a priori.
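As a hedged illustration of the ensemble idea, here is a generic stacking example in Python: a handful of candidate algorithms are named up front and combined using internal cross-validation. The speakers' own work uses the super learner framework; this sketch only captures the general pattern with simulated data.

```python
# A minimal sketch of an ensemble specified a priori: several candidate
# learners are combined with a meta-learner, using internal cross-validation
# so the combination is not fit to over-optimistic in-sample predictions.
# This is a generic stacking example, not the speakers' actual super learner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a claims-based classification problem.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=0)

# Candidate algorithms named upfront, from simple to flexible.
candidates = [
    ("logit", LogisticRegression(max_iter=2000)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# The meta-learner combines the candidates' out-of-fold predictions (cv=5),
# which is the "internal hold-out" idea mentioned in the talk.
ensemble = StackingClassifier(estimators=candidates,
                              final_estimator=LogisticRegression(max_iter=2000),
                              cv=5)

# Compare honest (cross-validated) performance of each strategy.
for name, model in candidates + [("ensemble", ensemble)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```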
So those of you that are familiar with a lot
popular press, about things like p-hacking or exhaustively
searching for data looking for significant results. By specifying
upfront that we want to use something
like an ensemble and consider lots
of different tools, including standard tools and
machine learning techniques, we can avoid some
of those pitfalls. And so the data example that
I’ll talk about very briefly is a project in cancer staging. And health services research
and health outcomes research has been limited in
a number of areas by not having information
and administrative claims data on disease severity. And cancer staging is
one of those areas. We have information
on maybe location of tumor or metastasis,
but we don’t have an accurate ICD-9 code or a
billing code for whether you’re early stage or late stage or
stage one, stage two, stage three. And so previous
work in this area has tried to use administrative
claims data for cancer staging, for recurrence, for progression. And claims data did
not perform well. And we had– with
some collaborators in the Department of
Health Care Policy, we had the opportunity
to explore cancer staging with machine learning
for classification. And we had claims data
integrated with registry data. And the registry
data is really what provided us with the opportunity
to build this function, because we had diagnosis state,
we had tumor characteristics such that we could classify
people as a gold standard to develop the algorithm
of whether they were early or late stage. And the idea is that we
won’t need that information in the future when we’re doing
research in another data set, that we can now classify people
using only the claims data available. And so we did an ensemble
like I described. We used multiple
different algorithms here in this black–
literal black box. And we also considered
different variable subsets. That’s another thing
you can incorporate into ensemble machine
learning– is maybe I only need the clinical data, maybe I only
need some– the claims data, or maybe I need both,
or some previously thought to be unreliable codes. Let me throw those in as well. So we did this. We did the internal
hold-out samples. We did cross-validation
for all of our algorithms, and we built this ensemble. And we had a tremendous result. We compared our ensemble using,
again, these internal hold-out samples, to have
a fair comparison, to this clinical tree built
by the team beforehand. This was the previous
best job that could be done using
clinical guidelines. And this clinical tree, as
far as the true positive rate, was 53%, which is not very good. And so we had a– with our
ensemble and machine learning, we classified correctly
about 3,000 more people. And our true positive
rate was 93%, so we had a 40 percentage
point improvement, which was– — is dramatic. — is dramatic. And I will say it’s not
always that dramatic. In a lot of the
studies I work on that, there is an improvement
with machine learning, but it might be a couple
of percentage points. And sometimes a couple
of percentage points matters, depending
on your application. And spending a couple
of percentage points– It could matter. — could matter a lot. And so because we had–
we saw the potential for this type of
classification algorithm, we’re continuing
on with this work. This was– our first
study was in lung cancer. And now we’re looking
at other cancer types. And we’re building a
generalizable tool also for rare cancer. So when we have rare
cancers, we don’t have thousands and
thousands of subjects to train an algorithm on. And so we want to
take our knowledge from these other
cancer types, where we do have the data, in order
to generalize to rare cancers. So we will– we’re developing
a shiny app, a web application, where researchers will
be able to use this tool. And also consider– so this– the other main thing
that we’re trying to do is to potentially reduce
the number of variables. So it’s much easier to use
20 variables versus over 100, like our original ensemble. And our initial results are
very encouraging that you can use a much smaller
set of variables from the different subsets,
so not just clinical. You need some from
each of the buckets. And so lastly, I’ll
close, before we wrap up the top portion of
our presentation, with causal inference,
because the other thing that– aside from I don’t
know which tool to use. But I actually really want
to do causal inference. And people say you can’t do machine
learning with causal inference, and I say yes, you can. You can do machine learning
with causal inference. There are a number of–
there’s a number of researchers working in this area. But we can really take the
tools from machine learning and translate them for
causal inference questions. So when we’re thinking about
causal inference questions, we always have this ideal
experiment in our head, where in the real
world we would be able to observe each
subject under the setting that they were
exposed or unexposed and observe their outcomes
under both those settings. But in the real world
we don’t have that. In the real world, we
only observe your outcome under the condition that
you actually experienced exposed or unexposed. So the idea is that we can now
take these machine learning tools, we can take
these weighted averages of algorithms, these
ensembles, and map them to causal parameters. So we can translate these
for causal questions. And we’ve done this in our work.
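Here is a toy example of what "mapping an algorithm to a causal parameter" can look like: fit a flexible outcome regression, then plug it into the g-computation formula for an average treatment effect. The targeted learning framework in the book mentioned here adds a targeting step and careful inference on top of this; the sketch below is only the simplest plug-in version, with simulated data.

```python
# A toy illustration of mapping a machine-learning fit to a causal parameter:
# plug-in g-computation for an average treatment effect with a flexible
# outcome model. Simulated data; the true effect is 2.0 by construction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 5000

# Baseline covariates W and a treatment A whose assignment depends on W
# (so the naive treated-vs-untreated difference is confounded).
W = rng.normal(size=(n, 3))
propensity = 1 / (1 + np.exp(-(0.8 * W[:, 0] - 0.5 * W[:, 1])))
A = rng.binomial(1, propensity)

# Outcome with a true treatment effect of 2.0 plus covariate effects.
Y = 2.0 * A + 1.5 * W[:, 0] + W[:, 1] ** 2 + rng.normal(0, 1, n)

# Fit a flexible regression of Y on (A, W).
XA = np.column_stack([A, W])
outcome_model = GradientBoostingRegressor(random_state=0).fit(XA, Y)

# Map the fit to a causal parameter: predict everyone's outcome under
# treatment and under control, then average the difference.
X1 = np.column_stack([np.ones(n), W])
X0 = np.column_stack([np.zeros(n), W])
ate = (outcome_model.predict(X1) - outcome_model.predict(X0)).mean()

print(f"Naive difference in means: {Y[A == 1].mean() - Y[A == 0].mean():.2f}")
print(f"G-computation estimate of the effect: {ate:.2f} (truth: 2.00)")
```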
And we’ll just close with– we wrote– I coauthored a book in 2011 on causal inference and machine learning. And our next book
is coming out soon. It’s in press. That one focuses more on data
science questions like network data and longitudinal data and
time-dependent confounding. So this is something we can do. But it is a very
exciting area of work, lots of work to do there. So we’ll end with a
commercial, a side commercial, for some of the kinds of work
that we really love to do and that we get
ample opportunity to do in the interdisciplinary
environment at Health Care Policy here at Harvard
Medical School. Scientists are frequently
on this journey, which is filled with
challenges, from a question of scientific interest to
knowledge and understanding. And as they map out a path
towards that knowledge and understanding, they
encounter difficulties with study design,
with data collection, with data cleaning
and interpretation, with analysis, with inference. And every scientist has a bag
of tools that she can apply– standard tools that
she knows how to use– that can get through
these challenges. And this is the way scientists
can work on their own. But if instead you partner
with a statistician or a data scientist, at the
beginning of your journey– Beginning of your journey. — the beginning
of your journey, we can together map out
possibly a different route. And the part that we really
love as methodologists is the opportunity to build
with our collaborators, possibly, an
extremely-fancy tool, custom built to
this journey, that will take us all there faster. So thank you very much. [APPLAUSE] Now we’ll take questions, and
we’ll pass microphones around. So just raise your hand,
and our staff will find you. Any questions? I see one. Hello, thank you for your talk. It was very informative. I saw that the case study used
here came from Medicare data. So I was wondering how
data from private sources looks, like accessibility
wise, and if you think, maybe, like a single-payer
health care in the future could give you better tools,
or what that looks like, maybe, in other countries
with their data. Thank you. Thanks for that question. At Health Care Policy we
have incredible access to data resources,
which includes both– large Medicare
claims collection, plus several large
commercially-insured databases, one of which is a collection
across multiple payers and several of which are
single-payer collections. And as you know if we
did have single payer, there is the possibility that
researchers would have access to a more comprehensive slice
of claims data on the US population. That’s not entirely
guaranteed of course. Medicare has a fraction
of its population that we never get to
study, because they’re enrolled in the private
version of Medicare, Medicare Advantage. And we don’t have
encountered data on them. So even in a
single-payer system, there’s the possibility that we
might not have access to those. But large institutions
like the National Bureau of Economic Research,
Harvard, other places are able to get access to big
commercial claims collections as well. And we’ve made
frequent use of those. And sometimes we don’t
have national data, but we might have, for
example, all the Medicaid claims from Massachusetts. And so then we can say– we can do some pretty strong
research in Massachusetts that may or may not
generalize to other states. And states have implemented
these all-payer claims databases– Massachusetts is
one of them– where they’ve attempted to collect
across Medicare and Medicaid and all the private
insurers in their state, the claims for everyone. Those are yet unrealized
promise, I would say. One last note on
there is that it is important to think
about this question because one of the large
commercial databases that we use for risk
adjustment methods– that’s the same
database that was used to develop risk adjustment
methods for the exchanges, even though it’s a
different population. So having access to
those resources in order to do that research
is really important. I had a technical question. You say you used registry
data and claims data. How were you able
to link the two? Was it the same ID
between both data sets? So this was SEER
Medicare data, so we were able to link
based on identifiers. The first paper is published. I would be happy
to send it to you. Hi, so this is [INAUDIBLE] This is a question specifically
for Laura in regards to the cluster method
that you were explaining. [INAUDIBLE] like 66% had
this trajectory, 17, 11, six, et cetera. How do you pair that with the
characteristics of those people in that category? And what methods do you use? So that if you do have a patient
come in, and they’re like, I wonder what group I’m in,
it’s not just like a 66% chance, but based on what their
health conditions are they have a different trajectory. So how do you manage that? Yeah, the question– maybe for
the live stream who can’t hear if it’s off-mic– is how do we relate the
characteristics of the people who are in the different
classes, who have these vastly different health
care experiences following the lung
cancer diagnosis? So part of the answer
to that question is found in our paper,
which is published. It’s in Health Affairs, which
is that we did something called latent class regression,
where we regress kind of a noisy, uncertain
version of class membership on a whole collection
of characteristics that we have about people. So we do know in
what classes are people most likely to get
chemotherapy, in what classes are people more
likely to be older, to have more significant
co-morbidities. That is subtly different
from a prediction algorithm. So we did not fit a
classification tool, which would put
in characteristics about the patient who was
standing in front of you, and get out their
expected trajectory. That would just be a
different scientific question. And of course some of the tools
that we talked about today could be used for a
question like that. It was not primarily our
goal, but you could certainly do that. So I’m going to
interject with a question from Twitter, from Brazil. So what is the future
of recommender systems in the context of big
data on human health? Recommender systems! So you imagine like
doctor choosers. I’m not sure. So we can explore this. So one of the most
prominent examples, the Netflix prize from
more than a decade ago now. I’m aging myself here. That was a recommender
system challenge about showing people the
types of movies or TV shows that they’d like to watch. They actually used an ensemble. And– but so the future
of recommender systems in human health is, I think,
an interesting question. Some of the data
that I work with have to do with
patients’ quality ratings of their providers. And this is a really
active area of research, because you can
imagine that it’s difficult to sort out the
different versions of quality that we have in mind when
we say health care quality. So one version of quality has to
do with hard patient outcomes, mortalities following
surgery, return to hospital. Another version of
quality has to do with patient’s own
experiences and their ratings of their providers. And sometimes those
things are at odds or are difficult to measure or
are very unstable over time. So a lot of work and
research is being put into the question of even
what do we mean by quality, and how can we measure it, and
how to sort these things out? So to me that seems like a
preliminary step, before we start recommending, is
to settle on definitions and methods of measurement. And I think the level of
how we’re rating quality– is that at the doctor level,
the hospital level, the practice level? These are a lot of
discussions and questions that are actively
going on at the moment. Witness the probably
dead-now MIPS, which was an attempt to let
providers choose the quality metrics on which they
would be judged and paid, because it was like an opt-in. I’ll choose the measures on
which I’m going to be paid. MedPAC decided that was
maybe not a great program. Can’t imagine why. Thank you, Laura and Sherry. I have two questions. So the first question
is I remember you talked about high dimensionality. So I want to ask you– I have a data set
of 56 patients, and there are
about 25 variables. Is that high dimensional? I don’t think we would typically
consider that high dimensional, but you could still use
the types of tools we’re talking about with that data. When you have smaller
data sets, you do have to consider–
if you’re looking at comparative effectiveness
or causal inference questions, there you might have more
challenges than a prediction question when we’re
talking about the error or the uncertainty around our
causal inference estimate, for example. You said– My second question is– this is
a common thing that comes up– would you be able
to clearly explain the difference between a
predictive model versus a– confusing term– prognosis–
prognostic– model? So would you clearly
differentiate between the two prediction versus prognosis? For example, if
you’re using scores– like sometimes people put
prognostics scores in a model, and many people have
criticized saying that I don’t use the word
prediction and prognosis in the same context. So if you’re able to let
me know the difference, that would be great. Thank you. The prognostic scores are
used in multiple contexts. So are you talking
about prognostic scores that are composites of
multiple other values? Yes. So for example, let’s say I’m
building a prediction model, but I have a prognostic
score inside that prediction model. So can I talk about
prognosis in that context? Like for example– I give
you a specific example. In the ICU, they use
a lot of ICU scores. Along with that, they
use individual variables. So they try to combine
and see if it increases predictive performance. So how do you talk
in that context? You only talk about
predict– is there a difference between saying
prediction versus prognosis, or you have to come
up with risk groups, and only then talk about
prognosis or prognostic scores? So this may or may not be where
you’re directing this question. But when we think
about prediction, what we care about is inputting
some set of variables– and those could be
composite variables, like a prognostic score– inputting some variables
and getting an accurate predictive value out. And so sometimes
there’s a desire to over interpret the
variables that we’re sticking into the prediction function. But if we care about
prediction, sometimes it may be OK, for example, to have
variables that are mismeasured, composite variables. So it’s about having
the improved prediction. Whereas, in prognosis
I think that’s a very clinical
question, that people may have strong feelings
about the variables that are fair game and how they
ought to be coded and combined and the interpretation
of the coefficients on those predictors may
be some of the distinction that you’re implying. Hi, thank you for the talk. As a medical student who is
learning about data science, you often hear about
unmeasured confounding and– What was that? Unmeasured confounding. Oh, unmeasured
confounding, got it. Yes, and you were talking
about machine learning. And so I wonder whether it
has any in role or advantage in helping you identify
variables even thought about, or whether, in other
ways, it helps, or whether it still remains
an inherent limitations of your study design,
[INAUDIBLE] the way you assess the data. So unmeasured confounding
is a huge area of research. So when we’re talking
about causal inference, we need to control
for factors that are associated with our
treatment and our outcome. And if certain crucial
variables are not measured, we may not have the sufficient
measured confounders to overcome that. So the answer is
both yes and no. So sometimes we can
use machine learning to try to overcome
some of those. We may have proxy
variables in our data sets that can help us overcome
unmeasured confounding. But this is where when we talk
about unmeasured confounding, starting upfront with
a causal word map and drawing out a causal
diagram and thinking about all the variables
that you would need to really identify
that causal effect is really crucial. So this is another example
of machine learning can help us in some scenarios,
but it won’t fix bad data. It won’t fix bad study design. And it won’t fix
unmeasured confounding because unmeasured
is unmeasured. It’s unmeasured. There is really no magic bullet. We work with a
lot of economists, and there are many
clever study designs that are used all in a
lot of contexts, that try to get around unmeasured
confounding, such as instrumental variables analysis. All of them just face the
fundamental challenge, which is that dealing with
unmeasured confounding is really, really
difficult. And I think that some of Sherry’s
work is particularly useful because it encourages us– even though we have
extraordinarily powerful machines to throw at problems– to instead start with
the causal questions and the inferential
questions, and then later throw fancy machines
at it, instead of throwing your most
powerful computing tools at the problem
as your first pass. Right. So we have a question
from Facebook, I believe from
Cold Spring Harbor. Do you believe DNA
sequencing will be able to predict diseases and
eventually lower health care costs by predicting when
patients will come in? There’s a lot in there. So can we have a prediction
algorithm that reads your DNA and spits out your
life course of disease? I think we’re both sceptical. I’m certainly sceptical. I think for some
conditions, there would be promise when
combined with other factors. We know that
socioeconomic status and other environmental
factors– a lot of things play into whether you will
get a particular disease. So I think for most conditions,
with some exceptions, DNA sequencing, from my
perspective, wouldn’t be enough. Neither of us works particularly
in the genetic space, but it is a very
high-dimensional problem. Back in the old days– so
I have a bachelor’s degree in genetics. And back in the old days when we
were studying single nucleotide polymorphisms and everything was
like a single gene thing, that was as hard as the
problem got, which was searching around
through a couple of places where you thought
people differed and finding those silver
bullet, low-hanging fruit. Now, we’re in the era
of GWAS and epigenetics, and the scale of the problem
has gotten much larger. And there is no
low-hanging fruit left to be plucked
off of the problem. So I think that
we’re on the flat of the curve in a lot of ways. But I imagine that
there’s stuff left to be discovered if we
harness the dimensionality, without maybe getting
ahead of ourselves. And I think there’s just a lot
of heterogeneity in diseases, even when I talk about
lung cancer staging. Even within lung cancer,
there are many, many types. And so I think that also
plays a tremendous role. Then the second part
of this, which was, so if we could predict diseases,
would this eventually lower health care
costs by predicting when patients will come in? I think that’s not in
the near future at all. I think lowering health
care costs in general is an incredibly
challenging problem. And I think that this
will probably not be the way that we
would be doing that. Yeah, I’m inclined to agree. And there’s also so much– not only when patients come in– there’s so much else tied
up in that as far as access to care and other things
unrelated to whether you will actually get a condition,
because there’s people that get conditions,
and then don’t get treatment or don’t see their
doctor regularly. So there’s other
components in that space. One of the things
that I’ve learned since joining Health Care Policy
is what a big driver of health care spending medical
technology is, that in fact in
the United States we just buy vastly more
health care than we used to. We have longer lifespans. We do way more treatments. And there’s an
incredible proliferation of medical technology that is a
huge part of the driving force. So the ability to
identify diseases earlier and make predictions about
who’s going to come in doesn’t address that
very large component of health care spending growth,
which is medical technology. And I think that’s why
I’m a little skeptical. [INAUDIBLE] Oh, OK. So we have a YouTube
question, but– My question is borderline
related to ethics. So you’re describing patients
as a group of variables and different numbers. How well can you
describe, for instance, a doctor-patient interaction
and all that information that a particular doctor has about
a patient and incorporate that, because that will– obviously, you can not– you can obviously not
code all the information that a doctor has regarding
a particular patient and get a machine
learning that generalizes what you expect for everybody. So how many variables
are you actually losing by not taking into account
personal interactions? I think that’s a
great question, and I think it really speaks
to the limitations of a lot of the data
sources that we use, which I try to be
very open with. Especially when we’re using
administrative claims data, we don’t have a lot of clinical
information in there, let alone information on the interactions
between doctors and patients and all the things that can
play a role in whether a patient comes back for their next visit,
or whether the patient gets that lab test, or how
that doctor interacts with different
types of patients. I think that’s an incredible
limitation of the data sources. I think that’s one
of the reasons why data integration can be
really helpful, where we combine, for example, survey
data with large data sets. So that reduces our
sample size, but then can help us interview patients on
some of these issues in a more expansive way. But I completely agree. There’s no– the machine
learning is not a magic bullet. Using large
observational data sets is not a magic bullet either. And I would also emphasize
that we’re designing algorithms largely to be used in settings
where the patient is not in front of you. We’re not trying to replace
doctors with algorithms. No. We’re talking about
the perspective largely of policymakers,
payers, large organizations that don’t have a patient
in front of them. What they have instead is
a file of billing claims or an electronic health
record or some data source that is not the patient. And when they need to make
some sort of policy action in that setting, that’s largely
what the tools we developed are designed to do, not
to replace the doctor and the doctor’s judgment. No, so for my cancer
staging algorithm, a doctor doesn’t need that,
because the doctor has other information
right in front of them to be able to say this patient
is early stage or late stage. But in order for us to study
treatments for health outcomes, I don’t have that information. If I want to improve health
outcomes for liver cancer, I need cirrhosis
staging, for example. OK, I’ll do this YouTube
question from Pakistan. This is so multi-platform. I love it. Data sets for mobile apps can
provide large data covering different regions of
the world and can give a global picture of disease. How can we use this data? That’s a great question. And there are a lot
of statisticians working on this question. Yeah, the international
picture is challenging to those of us who
train all of our algorithms on US data, but I know,
Sherry, that you’ve worked on more
international projects that attempt to bridge and understand
the different global context of health services. It’s definitely difficult,
depending on which country you’re studying. So some of the– I’ve done– started
doing some work in some resource-limited
settings, where we don’t have a lot
of basic information about the countries. And so we use higher-level
aggregated data. But even then sometimes
for a study of 200 different resource-limited
countries, I might only have
10 or 11 variables. We certainly don’t
have mobile app data. But there are a lot of– I think this is a
burgeoning area of work. I think that there are a lot of
people designing studies, both in the US and globally,
to study health in different populations,
in different contexts. And there’s a number
of very interesting statistical challenges
in using mobile data. When you have streaming data,
you have so many data points. The data is very noisy. It presents different
types of challenges that we haven’t seen
in other types of data. We can draw some parallels. For example, imaging data
that I mentioned earlier is also very high dimensional. But the streaming
data– do we really need a data point every half second? And can we do some
types of smoothing? But this is where you
can see that having a friendly statistician on
your team or leading this work can be very useful. Hi, guys. Thanks for the fantastic tool. I was just wondering– I’m a clinician. I was wondering
what sort of skills you’d find helpful for
clinicians to optimize their working with you? What kinds of
skills do clinicians have to optimize the
working relationship? I would nominate an
openness to try things. Sometimes the clinicians are
very sensitive to the style of the journals that
they like to publish in and the methods
that are typically used in those journals and have
some hesitation about trying anything too fancy, because
what will JAMA think? So I think the
clinicians that we work with have a little bit
more bravery and willingness to try to float ideas to
the clinical audience that would maybe push the level
of method sophistication, but might really bring
some additional edge. I think we’re also seeing this
start to shift, I mean slowly, in some journals, and then
also in NIH Grant Review. Some of my clinician colleagues
still get a review back. And it’s why are you using
methods from 30 years ago? Why aren’t you using
some statistical learning or some more flexible methods
that have become very– that are well-accepted
in the literature and have become
much more popular. So my response– my first
response was communication. So I think that– and this is with clinicians
and applied researchers in general– it’s very
interesting for me– being able to clearly articulate
your research question sounds very obvious but
often does not happen. It takes 10 minutes
to get there. So sometimes I meet with someone
who is very, very excited. They have a really
cool data set. They want to tell me
all about the data set, and I just keep asking what
is your research question? [INAUDIBLE] One last question we’re hearing. Could you talk about
the limitations of data science in
medicine at the moment and some of the
challenges it’s facing, and what could be improved
to lift those challenges? What was the first part of that? I didn’t hear the first
part of the question. The biggest challenges for data
science in medicine right now. Integration across data sources
is extraordinarily challenging. So a lot of the
data we have come into us in de-identified form,
with scrambled identifiers. And trying to link
across data sets, which as you saw with
Sherry’s project that used the registries
linked with the claims– it’s an incredible opportunity
to leverage the strengths of both data sources. And when we are stymied
by restrictive data use agreements, which are there
for a very good reason, I should be clear. Those are to protect
patient privacy. But some of those
barriers, I think, are standing in the way of
doing truly transformative stuff that draws data from really
diverse data sources. Yeah, in a similar slant– I think we’ve come back
to this a few times, but there are variables, and
there’s information in the data that we have available
that are just not there. So there are things that I
want to be able to measure, so that I can study really
important research questions. And we don’t measure or
capture those variables yet. So I think there’s
a lot of potential in natural language
processing to try and get some of that information
from clinician notes. But a lot of these areas
are in the early stages or just haven’t really
proved themselves to be widely applicable. So I think there’s a tremendous
amount of work to do. But for me that’s the
frustrating thing– is that I get really
excited about a topic– and again in our
interdisciplinary department, I can knock on my wall,
and I’ve got a clinician on the other side of the door. And he tells me you can’t
study that because we don’t collect that information. It can be disappointing. Thank you so much. Thank you. [APPLAUSE]
