Diagnosis: Teaching, Measuring, Innovating

The critical thinking required to diagnose a patient’s condition is fundamental to our work as internists. Yet if done poorly, diagnostic errors and inequities can deeply harm patients – in fact, studies have shown that several hundred thousand patients die or suffer major harm from diagnostic mistakes each year. In this Grand Rounds, we’ll have three expert faculty explore fundamental issues in diagnosis, each seen through their specialized lenses of research, informatics, and education. We will explore the measurement of diagnostic errors, the ability to harness AI to enhance diagnostic reasoning, and the future of diagnostic excellence. This year, our department became the home of a national Coordinating Center for Diagnostic Excellence (CoDEx), funded by a $15 million grant from the Gordon and Betty Moore Foundation.

Summary

Grand rounds discussed diagnostic excellence, focusing on teaching, measurement, and innovation. Speakers emphasized the evolving understanding of diagnosis, highlighting cognitive processes, biases, and system influences. Advances in AI and health IT were discussed as opportunities to improve diagnostic accuracy, timeliness, and equity, though challenges remain in measurement, policy incentives, and implementation. The integration of AI as a teaching and clinical tool was highlighted.

Raw Transcript

[00:00] Okay, good afternoon. Welcome to the Department of Medicine grand rounds. I'm Bob Wachter, chair of the Department of Medicine. See if there's anything to say. Today's topic is diagnosis. I don't think there is any more interesting topic in medicine in general. If you think about what we as internists do, this is a huge chunk of what we're trained to do.

[00:20] do and what we do for a living. It's incredibly important. The data says that we get it wrong a lot. And now we're entering a new era where we will have increasingly interesting and complex technological tools that may help us and may screw us up. And so the topic has become, if anything, richer and more important.

[00:40] the years have gone on. I'm really proud that our department I think really is a leading light internationally in the world of diagnosis, approaching the issue from everything from how do we train people to be great diagnosticians to how do we be great diagnosticians ourselves to what are the implications in terms of safety and quality.

[01:00] of diagnosis and excellence in diagnosis to the role of AI and other technological tools in improving diagnosis, policy issues related to diagnosis. We have leaders in all of that. In today's grand rounds, we have three of our leading lights in the field of diagnosis to present to us at

[01:20] talk that's entitled diagnosis, teaching, measuring, and innovating. So we'll hear kind of about the breadth of the issue and how things are changing. The three speakers, I think not in the order, will do in the order of how they're going to show up. The first will be Goprit Daliwal. Goprit, I think everybody knows is a master clinician.

[01:40] educator and professor of medicine. Here he's the site director of our medicine clerkship at the VA where he teaches medical students and residents in all sorts of settings, the ER, the urgent care, inpatient wards, outpatient clinic, morning report, and is probably preeminent expert in the country on the

[02:00] cognitive process of diagnosis, how do people get really good at diagnosis, and he demonstrates that constantly in his day-to-day life and in lots of conferences. And I would begin running through his teaching awards, but we would have no time for anything else. So I will not do that. But he's won every teaching award we have, and sometimes we invent new ones.

[02:20] just so he can win those. Next is Andy Auerbach. Andy is also a professor of medicine here based in our division of hospital medicine. Andy is a really world leader in clinical research, particularly in the organization of hospital care, having done a lot of the pioneering work in hospitalism, and also

[02:40] and outcomes, costs, quality, and value. He leads a large research consortium, national research consortium, looking at the impact of different healthcare models, particularly related to acute care. He's an active and highly revered mentor, very, very well-published. He was the editor of the Journal of Hospital Medicine for MedStar.

[03:00] many years and he's won numerous awards, largely outstanding investigator awards. I also won't list them. And finally, last but absolutely not least is Julia Adron-Milstein, who's also professor of medicine here at UCSF and chief of the division of clinical informatics and digital transformation known as DocIT, a newly

[03:20] division last year. So Julia is the inaugural chief of that division. She also directs the Center for Clinical Informatics and Improvement Research and directs New National Center on Diagnostic Excellence that now lives at UCSF. Julia has a had an incredibly prolific career and she is probably the nation's lead

[03:40] leading researcher in the area of health IT as it relates to policy issues. But over the last several years it has increasingly turned her attention to how to use these IT systems and now artificial intelligence to improve the quality and safety and efficiency of care and she's really done ground.

[04:00] breaking research in that area. She's published over 200 papers. She's a member of the National Academy of Medicine, one of the youngest faculty to ever be inducted into the National Academy. Also, if I go through her awards, we will not have time. So let me stop there. I hope the message is coming through. This is an exceptional panel of highly prized faculty and

[04:20] department and a really important topic. So really looking forward to it and Goprit, you're up first. Thanks Bob for that very kind introduction. Thanks everyone for joining us today. So you will see that the title of our talk is we're going to cover teaching, measuring, and innovating and I hope you're not

[04:40] turned it off by that acronym. Our goal is not to give two PMI is actually the opposite, which is our goal is to give a very brief coverage of each of these areas, spanning everything from challenges in these areas to really promising developments. I'm going to start us off by teaching and take us through a bit of a history lesson outlining how we've had a change in our philosophy of what diagnostic excellence

[05:00] processes, a change in our philosophy of what the diagnostic process is, and fold that into the implications for the teaching of diagnosis. One of the big evolutions is that we evolved from this conceptualization of diagnosis. That diagnosis, particularly in its excellence form, was someone who had an encyclopedic form of knowledge. They rolled a marshal it with flair at the bedside and they rolled

[05:20] bring to the conversation of rare diagnosis that no one had thought of before. That was in our consciousness for a long period of time. But even back at the turn of the century, we recognize this, that genius diagnosticians, they make for great stories, but they don't make for great healthcare. And we really owed everyone a very different goal, which was to become accurate and reliable.

[05:40] diagnosis. And in terms of making that effort of trying to get it right, we actually had to first turn our gaze into some of the ways in which we get it wrong. And in this and other efforts, we leaned on cognitive psychologists. Here's a representation of one of the most preeminent ones, Professor Daniel Kahneman, who just passed away recently. But one of his big contributions and people in his field was to

[06:00] to teach us that when we were missing the mark, one of the ways that we were missing the mark and our goal to be accurate is that our brain is subject to biases and they come in lots of forms. We learned that we're subject to cognitive biases like the brain's really influenced by recent things that it seems if I missed a subarachnoid hemorrhage last week, I'm really susceptible to thinking everyone's migraine this week.

[06:20] could be a head bleed and also implicit biases, which is the tendency to misjudge a population of patients. Like I might oversubscribe chest pain in women to anxiety and underestimate the probability of coronary artery disease. And the key thing about bias is that it's consistently directional in one way. And while that has got to be a big deal,

[06:40] gotten a lot of attention, Professor Kahneman in his second book, Noise, outlines a probably much bigger problem than diagnosis, which that's noisy. And noise in this sense means that when you present doctors with the exact same set of facts and present two doctors with them, you're apt to get two judgments. So I may be seeing a patient who has a fever and a cough.

[07:00] cough and I hear crackles, but the X-ray is negative and I say that's not pneumonia. And another doctor may see that same combination of things and say that sounds like pneumonia to me and give the patient antibiotics. And sometimes we celebrate that as sort of stylistic differences. But from the patient's perspective, one of us is right and one of us is wrong.

[07:20] Professor Kahneman outlines it. He says, well, if you're a patient, it's just a lottery, which doctor you get on which day. And if you're a teacher watching that process, it's our student watching that process. It's also a lottery, which teacher you get on that day. And as we recognize that we had work to do, we also recognize that we have to reconceptualize what we were going for, what it meant to hit the target. That hitting the target

[07:40] and diagnosis is more than just getting it right, sort of that intellectual exercise, that we really had much more than just satisfying that goal and that we had obligations to patients and systems and society that created diagnostic excellence as a really multidimensional construct. And if you see this list here, you'd see that there's really nothing that we can argue about.

[08:00] They're all very noble and they're all great aspirations. But the big question that came up then is, well, what are we going to do about it? And if you rewind the clock back to the turn of the century, again, we once again leaned on cognitive psychologists to say, how can we kind of improve our diagnostic efforts? And one of the big things that we learned from them is that if you break down diagno-

[08:20] diagnosis, it's essentially a classification task. Like the job of a doctor is to take the few hundreds of ways that a patient can walk into a clinic or a hospital and map that onto the thousands of diseases that we know. And once we modeled it that way and we thought of it that way, it actually gave us insight into the multistep process that the brain takes when it renders

[08:40] diagnosis. And for teachers, this was resplendent. This gave us a number of different points that we could leverage in the diagnostic pathway in our teaching. So for instance, we could teach using an illness script. An illness script is an idea that was taken directly from cognitive psychology. Basically told us that if I can teach you CHF or gout in the form of a story,

[09:00] like who gets it, why they get it, and what it looks like. It's far more effective than teaching you a bunch of facts about that disease and hoping that your brain will weave it together in a narrative, or that we could use schemas to outline the approach to a problem. These are sort of the frameworks for problems like thrombocytopenia or shortness of breath. And it

[09:20] They sound like a differential diagnosis, but it's very different. Differential diagnosis is a list that we hope the brain will make it by its own order of. And a schema instead is me uploading a piece of software or any teacher doing that and saying, I hope you run this software the next time your brain sees this problem. And as we started taking these approaches in education, it was a really strong play on that goal of accuracy.

[09:40] to see, to hyperpower the brain to solve these problems. And it inspired changes in conferences, the way we wrote case reports, and even great educational initiatives like the CP Solvers that has a very enduring and deep roots in our department of medicine. But a fair critique of that is that it's very cognitive. And by that, I mean that everything

[10:00] we're talking about there goes on inside the skull of a doctor. And I don't mean it in an apologetic form because that's super important. Like knowledge is king and that is the fuel for how we make a great diagnosis. The problem is just that diagnosis unfolds in a really complex context. And so knowledge is just not the whole story.

[10:20] And what we've come to learn with almost all mental processes, including diagnosis, is that it unfolds in a very complicated context. And so the output of any doctor's brain when they render a diagnosis is this interaction between what they know, the factors that the patient may bring, like what's their health literacy and who they brought with them to the

[10:40] appointment and system factors about our health system like what's the length of appointment times in this clinic how clunky is the EMR and how easy is it to talk to the radiologist you know I may be a great diagnostician if I have a census of eight patients all the patients I have are language concordant with me and the EMR that I know and love is easy

[11:00] access. But if you move me across town and none of those things are true, and I have a census of 18 patients, I'm working with translators and I can't find my way around the EMR, my diagnostic skill may drop substantially. And insights like this kind of reminded us that diagnostic excellence or diagnostic skill may be more of a state than it is a fixed rate.

[11:20] traits. And this also, this vantage point allowed teachers to teach different aspects of diagnosis than we would. You know, if you're a teacher who's trying to say this is how you work with the physical therapist in order to render a diagnosis of Parkinson's disease, or in our EMR, the way to keep track of a lung biopsy, so the potential for a lung

[11:40] cancer diagnosis is not missed or delayed is to do this, then all of those are teachers of diagnosis, even if they may not think of themselves in that way. And as we started to step out of the skull and say, well, this is a big system that we're teaching on and we need a larger group of teachers, some who are, this is sort of mapping onto other trends like high value care and.

[12:00] patient safety, we recognize that perhaps we need to really integrate the biggest system of all. That's the society and culture that we all live in. Because every one of us is a social actor, both the patients and the doctors. And when we're in that room, this manifests in ways like implicit biases, which I mentioned earlier as examples, but also the different tools and

[12:20] calculators that we use in the course of our diagnostic work and some of which are inaccurate or inequitable across different patient populations. We recognize these as important components of diagnosis and our multifaceted look at diagnostic excellence. We needed and welcome to a new group of teachers who say one of the ways to combat those issues is to

[12:40] to have skills in individualizing patients, having cultural humility, being able to regain trust in a way that's been lost before, or maybe counteract or disregard those calculators and tools if they're inaccurate. And so as we evolve, we really recognize in these teachers how important they are because they teach us a key part of this multifaceted construct of diagnostics.

[13:00] which is there's no diagnostic excellence if we can't also achieve diagnostic equity. And so as we built on this model, that diagnosis is a cognitive process that it's situated with the patient and the health system that affects the output of our brain and that we're influenced both by the good and bad of society, we just have to recognize.

[13:20] that by expanding the tent of diagnosis, we're also saying that they were expanding the teachers of diagnosis and that many teachers have many insights on many different aspects of the diagnostic process and all of them are teachers of diagnosis even if they don't traditionally think of themselves in that way.

[13:40] Now one of the residues I'd say of having a classification model of the diagnosis is it comes with the allure that maybe you can be accurate. That is to say I can sort of take a patient and slot them into a diagnostic criteria. And that sounds great until you spend about five minutes in a clinic or a hospital and you recognize that diagnosis

[14:00] is rarely a precision enterprise. It's much more often a probabilistic enterprise. And this comes for all sorts of reasons. Part of it is that patients don't read textbooks, and part of it is because doctors don't either. Or when we do, if we read them, we sort of interpret them as we like. That is to say, we try to fit the textbook to the patient before.

[14:20] You know, I, in the last week, I rendered diagnoses of UTI, tinea pedis, cervical radiculopathy, and I would say probably I have 50 to 70% certainty that that's what the patient has. That's the probabilistic space I'm in. But the key point for education is that when we're in a probabilistic space, you don't render a diagnosis

[14:40] diagnosis by matching the patient with diagnostic criteria. A diagnosis emerges only when it can outcompete other possible diagnoses. That means that you have to have a different approach to the dialogue that happens around diagnosis. You have to have, of course, a massive amount of medical knowledge to form an argument, but you also have a different set of

[15:00] skills. You have to be able to think probabilistically. You have to be able to weigh pros and cons. You have to almost invite counterarguments and counterfactuals so that one diagnosis sort of emerges as the best. And when we think of our role as teachers of diagnosis, if this is what the probabilistic and messy aspect of it calls for, we're actually thinking a little less like science

[15:20] and a lot more like attorneys in the court of law, where really to put forward a diagnosis, you have to get skilled at making your own arguments and sort of this balance of science with rhetoric and logic. As we think of like how do we get people to get better at that? Like how do we get students in residence more skilled than doing this? One of the things I like about it is that it's really important to have a clear understanding of what's going on.

[15:40] that stands as a great advantage to humans is that we know how to do this. We engage people in something called a cognitive apprenticeship. Cognitive apprenticeship is essentially thinking out loud, but it's paying special attention to the part of the process that might be tricky, that might be hard to keep, that I might pay extra attention to, or that I myself might fall prey to.

[16:00] to so that when I form an argument, I have everything at my, I have the strongest arguments I can make to advance this diagnosis. And for a long time, it seemed like this would be the upside for the brain because although AI systems that we talked about today are great at generating answers, they're not so great at explaining them. So a long time, this seemed like this sort of ability to engage in a

[16:20] cognitive apprenticeship with the learner would be the thing that would be our job security. Then a few months ago, I started coming across videos like this. And you'll see here, there's no audio, just video, that this is lingo one. So this is an AI, video language AI model that basically is narrating as it sees video what this automatically does.

[16:40] autonomous car is doing. So it's able to just take in video input and generate language output as an explanation for what's happening. And you can think of all sorts of applications. This might be comforting if you're in a driverless car so that you know exactly what it's doing. But you can't help but make the jump and say, well, you know, if this thing is talking out loud, maybe it could take a 9-1 minute break.

[17:00] novice driver with it and talk out loud. And that may seem fanciful, but if any of you have engaged in the task of teaching a teenager how to drive, I promise you that probably both parties, parent and child, like, prefer and said a more dispassionate instructor like you see here. And then you, of course, think to yourself. You say, well, if an AI system can do driver's ed, is that right?

[17:20] is there any reason it couldn't do Med-Ed? And that seemed like a jump until you start to see papers like this. I'm Tom Savage in his group, and I'll close with this last example, gave Chachipiti a case to solve. And it did it marvelously as it's done in countless of other examples at this point, but it did something else. They asked it not only to solve the case, but essentially to show it.

[17:40] its work and it asked to show its work in many different cognitive angles. So I'm just going to show you the case not for you to read it but to understand the breadth of the text that went in there and they said here's a patient who's post-op they got a hemiaarthroplasty and then afterwards they have fever, tachypnea, tachycardia, hypoxia, and petechia.

[18:00] What is the diagnosis? The chat GPT system dispatches with it quickly and says, this is fat emboli. The petechiae are really the clue that chain, I give you the clue among all potential diagnosis. But you say, show your work. And if you put in prompts, and I'm going to paraphrase the prompts that they use, said to solve the case with intuitive reasoning. It says essentially,

[18:20] this is acting like the lawyer, it says, here are the things that make this thing fat embolus. You might say, that's great, but I also want to make sure that you conduct a thorough differential diagnosis and you say, solve the case using DDX reasoning. And it does that marvelously as well. It says, of course, here's the pulmonary embolus that you expected to see, but FYI, in case you're not familiar with it, it's not.

[18:40] when thinking about anaphylaxis and drug reactions are good considerations also. And you might say, okay, but fat embolism is really rare. Like, work out the math for me. Can you show me this being really plausible? And it says, yes, I can. We started at 0.05, but you can do it. Here's all the Bayesian revisions with each piece of information that gets it to 60%. It's not perfect.

[19:00] but that's why I think this is a fat embolism. And then someone may say that's great, but I'm not a math person. I went to med school. Just make it make sense. Like tell me what's going on here and it says I'm happy to do that as well. Through pathophysiologic reasoning it says here is all the inflammation and occlusion you could ever want if you love pathophysiology. And it's pretty

[19:20] remarkable the different angles that it can bring. All of these are cognitivists, as I mentioned before, but it's possible in the future with more data assets that even the system and society can be brought in as a form of teaching as well. And I think the punchline from this section is just this, that in 2024 at least, we probably have a preference for an empathic and skilled teacher walk-in.

[19:40] someone through this in a cognitive apprenticeship. That's the person that you want as your teacher. But there's probably no reason that Chat GPT can't be a great and terrific teaching assistant. Now close by is reminding ourselves that our goal isn't to make this process heroic, it's to make it accurate and reliable. And one big part of doing that is measuring it.

[20:00] And to help us understand both the challenges and the promises of measuring this process, I'm going to welcome Dr. Andrew Auerbach. Great.

[20:20] talk about a case, let's do a case. This is actually one we reviewed at part of our research study. A seven-year-old woman who had a hysteroscopy had a uterine perforation during the case and was admitted to medicine afterwards for fluid overload. At the time she comes over to medicine, she has abdominal pain as her primary symptom, not shortness of breath.

[20:40] 10am the next morning she develops these signs and symptoms. She gets hypotension, lack of elevation, medicine, toxic onychology. They say, we think this is still really unusual, but we think it's probably aspiration and volume overload and not anything more serious. So they start IV fluids, antibiotics, and watch her carefully. But at 2 o'clock she gets transferred to the IC.

[21:00] at 4 p.m., the team reassesses, says we should call surgery. 6 p.m., they get a CT scan. 8 p.m., she goes to the OR where they find that she has a bowel perforation and ischemia due to the pressors. And unfortunately at 3 a.m., the morning she's a PA rest and dies.

[21:20] Which brings me to the measurement question, right? Is this a good diagnostic process? How would you measure the diagnostic process? And how would you know? How would you figure that out? What data would you use? What methods do you use? So I'm going to use a very simplified diagnostic process. You have a symptom in sign, you talk the patient through a physical exam, you talk with

[21:40] to your team, you gather some data, you look at the EHR, you use all the smart things that Epic does. You sit down and think carefully and thoughtfully with your team or all on your own. And then you come with a diagnosis that's timely and accurate. Other fields, other domains that go, Pete mentioned are very important. But for the purposes of what measurement we've been focusing on so far, this is

[22:00] usually what the field has been focusing on, timely and accurate, which means your diagnostic opportunity was very small. You didn't miss anything. There was no opportunity to improve, which meant the distance between what you thought the diagnosis was and what it ended up being was essentially the same. It made the diagnosis at a timely fashion.

[22:20] quite go quite this way. You're doing your usual four steps. You've seen the patient, you talk to your team, but some imaginary clock goes off where you should have made the diagnosis, in which case you get a delayed diagnosis. To that point, you've had a misdiagnosis, becomes a correct diagnosis just to know the lexicon that's out there. So here's an example that comes up reasonably frequently. A patient you think

[22:40] because heart failure actually is tight AS. And in this case, the correct diagnosis is maybe considered, but deprioritize. You just see this in notes. You think about this in your own kind of rounding. And the timing was clearly off. This is something you should have probably thought of earlier and maybe the diagnosis where we were the right tests for teamwork. Other questions?

[23:00] cases are more like this. You start with diagnosis A and it turns out the real diagnosis is diagnosis B, something completely different. This is what we often think of as a diagnostic or this is the classic missed diagnosis. It's like, oh my gosh, why do we not think of this? And here's a classic one from the ED literature person that comes with dizziness, comes back three days later with a stroke. In this case, the correct diagnosis

[23:20] diagnosis was largely unsuspected, or at least undocumented, at least in our work. But the working at correct diagnosis are so different that the timing is almost unimportant or they should have been made at the same time. So just again, thinking about the measurement challenges here. Now, life is never quite that linear. You start off with your forced test, but like,

[23:40] The journey from there is a little more circuitous. You're doing multiple things at the same time. You're seeing the patient, you're revisiting things, you're rethinking your team, you're talking to the patient again. And at the end, you've got this kind of central probability, but some uncertainty around it, kind of return to groupreits, a kind of cloud of electrons. This is the Schrodinger's cat.

[24:00] at some level of uncertainty around the diagnosis, you largely feel comfortable with, but you're not quite 100% there yet. So I'll use the presenter's prerogative and say, there are roughly three big ways we've measured diagnostic errors, diagnostic problems, and they fall into the autopsies largely.

[24:20] This idea of symptom diagnosis, discordance, which is kind of like the vertigo stroke thing I mentioned a minute ago, and then chart-based methods. Autopsy is really good, but you have to get the autopsy so we do not do a lip most frequently as we used to. The validity is really high. You can certainly see an unsuspecting.

[24:40] cancer or infection on the autopsy, but they don't usually lead to something a doctor could have done differently. They point out the gaps in our diagnostic process broadly, but may not what a doctor or a health system could do. The symptom diagnosis discordance approach is one that's really easy to do. It often relies on administrative data.

[25:00] or admitting diagnosis as opposed to discharging diagnosis or clinic diagnosis at what appears later on. And smart people say, well, that will create an algorithm that compares these two to find populations of patients where something might have been missed. So it's really easy to do. We've got lots of data out there so you can turn the crank really quickly. The face value is very high because you can see

[25:20] say again, dizziness becomes a stroke a week later is not probably a problem we should be paying attention to, but it's often results in a system-focused change. It doesn't mostly point to physicians becoming better diagnosticians. It points to the system more often than not. The last one is chart-based approach. As we actually look at a chart and review the process

[25:40] and kind of understand what was going on on the team's mind or the doctor's mind or the patient's mind when they came to the hospital or the clinic. This is not at all easy, but it has a lot of validity because when the approaches that have been used use one or more doctors to review it. So you get a kind of a peer review almost of the case. And then you often can get insights you can't get from administrators.

[26:00] And I tell where the wrong fork in the road was taken in a patient's care. So this last approach is essentially what we just published on earlier this year. We looked at 2,400 patients who died. I went to the ICU at 29 hospitals across the US. We had two and sometimes three doctors review each record.

[26:20] And we had asked three questions. Did an error happen? We used a standard tool for that. Did the error cause harm? And it's important to know whether it caused harm to our patients. And of 50 possible diagnostic processes ranging from what did you do with the right test, did you follow up on the vital signs appropriately to do an appropriate physical exam, could we have improved anyway?

[26:40] were. So on the return to the case, and I gave you just the most condensed version of this case as I could, clearly it seemed like a delayed recognition of perforated bowel essential to this poor person's care. It we thought it led to her demise, I think if we had recognized the perforation

[27:00] earlier in the day, she might have had a different outcome. And then there were at least three process faults that we thought were evident from the medical record. So the abdominal pain we thought was very much out of what we expect for just translocation of fluid at the abdomen. We thought the CT could have been ordered earlier, particularly the next morning. And then overweighing a low

[27:20] likelihood diagnosis both on the medicine team as well as the gynecology surface team who felt like they were still thinking food overload and aspiration being more likely than something more worrisome. Okay, so now do this 2,427 more times. That's essentially what our team did, like I said. 23% of patients in our cohort had diagnostic

[27:40] They caused temporary permanent harm or death in 7, in 18%. And across all deaths, the error was the cause of death in 7%. So in kind of the TMI part of the discussion is coming up now. So this is the table of our prevalence.

[28:00] measure you can start saying how often things happen. These are the diagnostic processes we screen through. This is 50 reduced into 9 big buckets. So you see some range of prevelences here. And you say, well, which are the ones most associated with the subsequent era? And these are the odds ratios. And I think you're trying to think about turning measures into

[28:20] improvement are things we could teach each other to do better. You can look at this combination of prevalence and odds of issues that create something called a population attributable risk or essentially the potential error reduction if you limited that factor. So think of this table now as the opportunities to improve diagnosis. And in our study, it was

[28:40] these three areas, picking the right test and following up on it appropriately, following up on monitoring of flight science generally. And then this larger thing, which Group P was mentioning quite a bit and did much more eloquently than I will, just like how do we think about patients? We can't tell in our study if the assessment was the reason you didn't do the test with the follow appropriately or the other way around as an example.

[29:00] And this is my second TMI slide. So that was the overall results from article 1. This is a heat map of those same nine buckets across the 29 hospitals. And there's two things to take away. Boy, who knew healthcare was complicated? And second of all, if you're going to start thinking about general asymptotic solutions as both a broader

[29:20] approach you need to take as well as something you need to think about for your local learning context, your local systems. Now you see very different findings. I have to shout out my colleagues at the Brigham, New York Law School, I'm released and who did this work with us. I'm really optimistic though I'm supposed to end with an optimistic uplifting. So I think

[29:40] AI is really here. This is a maybe not optimistic title. This is something New York Times maybe a week and a half ago about AI having a measurement problem. If we're paying attention to the things like equity, timeliness, accuracy of diagnosis, we can train the models more accurately. I'm a little worried still about the source data we're gonna use, but when I think about taking those 2,400 cases we did by hand and handing those over to the AI.

[30:00] model that can tee these up for us to look at if we're teaching each other or teaching our learners. This seems a huge opportunity and something that we're actively working on with Julia and her team. I also say that measurement it really started the field of patient safety broadly. You made this more even born this came out. This is probably the most important patient safety health service research paper of its own.

[30:20] last half century. This is the medical practice study in 1991. And I always point out here that diagnostic errors were number two on the non-operative causes of error at the time and as a paper, this whole paper prompted the whole field of patient safety to emerge. I don't think we ignored diagnostic safety. I think it was just viewed as a system.

[30:40] problem, I think the biggest change has happened in my lifetime is viewing diagnosis as being out of the system as much and now into our hands and our minds again. Now, Julie's going to talk a lot more about AI and the electronic health record, but I also pointed out the things that we did in response to patient safety in that paper were partially electronic.

[31:00] We all race towards electronic health records. We also changed how we work with the electronic system. So I think something else I think about how we do our diagnostic work in and relationship with the EHR. We changed teams. I think the most important safety intervention, the ICL looked at the election year perhaps is putting a pharmacist. ICU team is one of the most important safety interventions.

[31:20] So that was separate from either EHR or the human technology interface. And I think thinking about policies and standards, like how do we integrate this kind of learning back into how we teach each other and manage our health care are really key. We are taking our own advice to heart. We have a follow-up study at 14 sites doing a central

[31:40] Essentially the same thing I just mentioned, we're taking the case you've used being done concurrently here and other hospitals, feeding them back to our safety infrastructure locally, and then using that same data benchmark naturally to kind of deal with the complexity of the heat maps. And then using that information to really develop local countermeasures, UCSF is using our data to think about how patient safety could be improved here as well.

[32:00] working with Julia and some of our team to think about what our AI models might look like and then again broader insights that can be applicable and make generalizable AI clinical assistance programs. So I think I've set up the AI part of this talk that you're all here for but thank you again.

[32:20] Okay, well, I'm honored to round out our trio. I have to say I'm still recovering from an out-of-body experience when I think Gopri used the terms EMR that I know and love. So just to start there, that's good news. So yes, we've already heard some hints at the innovation space. And overall, I would say as we look

[32:40] at the broad concept of innovation within diagnosis, there's been a lot of unevenness. Some areas that have had a tremendous amount of innovation and other parts of diagnosis that really look the same as they have 10 or even 20 years ago. So what I really want to do is to present a framework for how to think about that unevenness and I hope present a vision for the future.

[33:00] where we can even that out and have innovation really across the spectrum of different components of diagnosis. So the area where I think we've had the most innovation is actually something that I'm quite confident everyone in this room has done at home to support their diagnosis. And that is a home COVID test, right?

[33:20] about how transformational this was in the pandemic, what it allowed us to do that we couldn't do before, perhaps the most important innovation and diagnosis that has happened in the past five or even 10 years. But this is a very specific thing. It is a diagnostic test. It very clearly defines the presence or absence of a disease condition. And that's only part of.

[33:40] how to think about the different areas of diagnosis that require innovation. So what comes next maybe is this middle category of a test that is used to diagnose. Subtle, but an important difference. It provides information that's used in the diagnostic process, but is not in and of itself a diagnosis, right? So that way,

[34:00] then be a piece of information that would need to be fed back to a clinician or even a patient to help inform arriving at the ultimate diagnosis. And then that's even different than the diagnostic process itself, which is largely what you've heard about today, which is the use of information from tests and other sources, right? The patient, what they may

[34:20] have told you about their past history to make a diagnosis. And this is very different because unlike sort of a lab test or an imaging test, this is really an iterative process of reducing uncertainty. So we're moving from the notion of a test or something that we often call the diagnostic to the diagnostic process.

[34:40] And then imaging is almost an interesting story within a story as we think about diagnosis because it's used very heavily in the process of diagnosis and in some cases is really more like a diagnostic test because it gives a definitive diagnosis. Or in other cases it can really head into the right stream.

[35:00] because all it's doing is providing more information to the team to then ultimately make that diagnosis. And important to also recognize that there's also a human involved in imaging, right? The radiologist who's reading the test, making an interpretation, writing it down. Right? So there are multiple layers in which the human cognitive process that you're pre-described.

[35:20] is coming together to form into this larger process of diagnosis. So again, as I characterize where innovation has happened, I think we've seen a lot of innovation on the left side here, diagnostic tests, and much less innovation around the diagnostic process itself. How do we support innovation?

[35:40] that process of different people coming together to work with information to reduce uncertainty to the point of being able to move forward with a treatment plan. So again just to make sure that we're really giving credit where credit is due in terms of the progress here. As we think not just about

[36:00] broadly, but specifically where AI artificial intelligence has had impact. We've seen a tremendous explosion of AI-enabled diagnostics, particularly using imaging. The key here, what has enabled this explosion is labeled training data. Basically, the ability to take a test or an image and say, this was present.

[36:20] present or this was not present. And once you have those labels, you can then train AI to start to understand, well, what is the features of that image or of that physiological state that allowed it to be true in one case and not true in another? So we've seen many different examples of where this model has worked, where labeled training

[36:40] data has led to the ability of AI to be a really powerful diagnostic, things like detecting polyps and colonoscopies, detecting lung nodules and CT scans. And here at UCSF, one of these that we pioneered was detecting a pneumothorax in a chest X-ray. And this has come so far now that if you

[37:00] go and buy the GE X-ray imaging suite, this AI algorithm to detect the pneumothorax is actually built into the X-ray machine itself. So when you do the X-ray, it tells you right then and there whether it's detected a pneumothorax or not. So again, tremendous innovation that's happened in these areas where we've had this clearly labeled training data.

[37:20] and can train an algorithm to say, yes or no, this diagnosis is or is not present. So let's get to this right now because that's where I think we really want to be able to focus on making progress moving forward and to change this from an area where we'd say there's maybe been less innovation to an area in which there's been more innovation.

[37:40] So again, we want to evolve from targeted AI solutions that can predict diagnosis X to AI that can act as the cognitive support to the diagnostic reasoning process by doing the things again that you've heard about today, right? Identifying the most relevant information, presenting a set of best next steps options, and then helping refine.

[38:00] and reduce uncertainty over time as more information comes in. And again, what's hard there is that we don't have that nice, easy diagnostic label, yes or no. In this process, there's going to be many just changes in uncertainty, probability. And so it's a much harder prediction task when you're dealing with the situation on the right. It's squiggly.

[38:20] lines, not exactly sure where we are, where to go next. So the first thing we need to do is to say, like, can we just clean up those squiggly lines a little? Can we even understand the diagnostic process in a somewhat cleaner way? And so I was fortunate to work with Grapreet and some other colleagues to put together a new framework for how to understand the diagnostics.

[38:40] process, which we labeled wayfinding, this concept that the team is really trying to do wayfinding to get to the diagnosis. And that it starts with a patient presenting with signs and symptoms, and then the first thing that has to happen is organizing information and prioritizing it, saying what's important, what's not here.

[39:00] moving on to integrating the information and making basically a first schematic understanding of where to go next, and then formulating those next steps and deciding what's going to be the most useful in reducing uncertainty, and then continuing to move through that process iteratively until uncertainty has been sufficiently reduced to move forward into a treatment decision.

[39:20] So with this concept of wayfinding, we can then at least have a model for around which to organize tools that could then help support it. So in order for this model to be the basis for innovation, there are two things. First, again,

[39:40] talking about this labeled data is really the key to the innovation we've seen so far. Do we even have the data available to build tools that might support diagnostic wayfinding? And then if we do, how do we ensure that they're usable, that in this complex process, they're actually tools that are going to be able to support clinicians and care teams in this process of dynamic refinement.

[40:00] So I want to talk about each of these very briefly. In terms of feasibility, I do think we have some data and as ChatGPT comes online, we may have a much richer dataset from which to draw from that can help us with this notion of understanding the thinking trajectory that the

[40:20] clinician is on and where there might be opportunities to supplement or better organize information. And some of this information is hiding in plain sight in our current electronic health records. And this is because we have what are called audit logs, which basically track everything that every clinician is doing in the EHR at every moment of the day. You may have

[40:40] not known that and it may seem a little bit big brother-ish, but it's there because of HIPAA, our favorite regulation. But what it does do is create this treasure trove of data and in fact several people have suggested to me that a great NIH grant would be to pull this data for Grapreet and say what is it about his footprint in the EHR?

[41:00] that looks different from others that may be the secret sauce that makes them a master diagnostician. So again, it's this really interesting behavioral data so you can see what information the clinician is looking at, what they maybe do next in terms of placing an order. So this is the first data that's not as clean as labeled training data. It's not a yes or a no, but there's probably a signal here that's

[41:20] telling us something about that clinician's journey on wayfinding through a patient's diagnostic process. And that, I think, in and of itself is promising, but not a blockbuster. But when you combine it with the notion that our clinicians may be increasingly using chat GBT logs to make their thinking explicit in the same way you saw the self-proxisting process.

[41:40] driving car, narrow rate, right? For the first time, we might then be able to capture data on exactly what the clinician is thinking. What are they considering? What aren't they sure about? And the ability to combine these two types of data, I think could be the blockbuster opportunity to again, really understand clinical reasoning in a measurable way, and then be able to build AI tools.

[42:00] that would sit around that and support it. So that's the feasibility piece. Again, I think it's, you know, it's still exploratory, but very promising. And then what about usability? Like, how do we design tools that clinicians actually want to use, which is, I think, not a given. We've seen a lot of tools that have been developed that really don't fit into

[42:20] clinical practice in a seamless way. So I think the first thing we have to do is make sure that we're collecting data on patient signs and symptoms. This is actually a type of data that's not routinely captured in a structured way in electronic health records and is problematic because it's the entry point to this whole process. And then secondly, we need to make sure that we don't just build up.

[42:40] AI models that get to that final prediction and basically ignore the entire wayfinding process. So to tell a care team, here's your patient, given their signs and symptoms, it's diagnosis X. It completely ignores the cognitive process that sits around diagnosis that we need to support. And so instead, what we need to think about is building AI tools that, again,

[43:00] are going through this process with the clinician. So moving from signs and symptoms to what is the most important information, making that be the most visible within the electronic health record or whatever environment they're working in, and then to start to organize those potentially around different diagnoses and then suggest the best set of next steps. So it's really thinking about AI that's more organic and more effective.

[43:20] organized around a process rather than just jumping to the end of the answer, the final solution. And then if we look, I think even beyond that, right, we want to move from AI that acts as a cognitive support to diagnostic reasoning to even a broader set of things that could be done, right? So like providing feedback to the clinician on their reason.

[43:40] If we can detect systematic deviations from what's predicted, that's an opportunity for feedback to the clinician to say that we see that you are sort of not on the trajectory we would have expected and why is that. So that's even sort of beyond just the cognitive support piece, but the feedback. And eventually I think what we'll be able to do

[44:00] do is to start to eliminate steps in the diagnostic process. If we can see that patients who are presenting with a set of signs and symptoms, 98% of the time will get diagnostic text X. Why not just start with that diagnostic test to begin with? And that way, when the care team encounters them, they will have an additional piece of information that would likely speed

[44:20] time to diagnosis. So we can start to understand this whole pathway in a much more data driven way that could create insights to lead to more timely diagnosis. So that's the very long, long term vision for innovation in this space. As I said, I think the first is really just moving to this model of AI that's designed to do cognitive support for

[44:40] the diagnostic process. And I just want to end by saying that there is many opportunities to get involved in different dimensions of diagnosis here at UCSF, but one that I hope you'll keep on your radar is a new national coordinating center that we've launched called CODEX. We will be, this is going to live within my division. We will be announcing the fact

[45:00] director soon and this will be opportunities for events, convenings, we plan to launch some action collaboratives and so if you're interested please do reach out and we'd love to get you more involved and with that I'll pass it back to Bob. Thank you that was terrific incredibly interesting topic and beautiful presentations.

[45:20] Let me start with on the, you did not give a dimension of red flag diagnoses or it was all sort of here are the probabilities, but I think the diagnostician is also thinking hard about can't miss diagnoses and weighing in the severity and the downsides of blowing it as they think about the list.

[45:40] How does that get layered into any of these, whether it's your own cognitive processes or an AI-based cognitive process and coming out with an output?

[46:00] not missing them, and I think for good reason, because they cost patients life for limb and other things. But I think in terms of, you're saying, I just want to clarify in terms of how- As you went through the different ways you can think about diagnosis, there was no way, the probabilistic was the most likely thing is, this is fat embolism. It wasn't that the can't miss diagnosis,

[46:20] which may have fairly low likelihood, but if I miss it, it's terrible. That sort of plays in your diagnostic reasoning in a major way. Yeah, and I simplify that. Sometimes when we talk about how we give diagnostic output, we can think of our brain or we can think of a computer. We sort of have this prioritized differential, most likely, somewhat likely, unlikely, and then we give extra attention.

[46:40] to the Cantonese diagnosis, which is way disproportionate to its probability for the reasons I mentioned. I didn't highlight it there, but I think that's a separate channel that is always running. And of course, we do that in our profession through training, and you could easily program the computer to do this thing. So yeah, you would have to say to the computer, not only give me the list by probabilities, but give me the list by madness.

[47:00] or the consequences of missing something. I suspect if they had put an engine just in the exercise I did, just what I read from the paper, I bet if they conditioned the model to do that and it had knowledge of the severity of those outputs, it probably would have done it through a handle, would just reorder this in severity. You could even do expected weights. Yeah, yeah. I think that'd be an important angle to take. Great.

[47:20] your study, which I assume costs someone a whole lot of money for all of those human chart reviews, can those chart reviews be done by AI now? Hoping. We're starting to work on that now as Julia's group taking GPT for a versa here to see if we can recreate our chart review tools based on the notes and labs we used at UCSF. So that's certainly

[47:40] Each of those took between 30 and 40 minutes to do so just teeing them up so that another reviewer can at least look at a pre-filled adjudication form and Call it yes, no or agree with it is a key for them. So it's two different things one is can the AI Summarize the chart for you in order then to pitch to a human in a much

[48:00] more simplified form that saves half an hour. The second is we don't need the human anymore. Can that be done? And that's the second phase. Most of us have proved that we can create these, essentially, these algorithms that detect a DE, diagnostic error. We can then turn it loose to find diagnostic errors that we didn't suspect over an existing repeat process like our M&Ms.

[48:20] code, blue, community, those kind of things, we'll be able to use this as a tool to screen separately. Right. One more question for you is part of Julia's new center, Cane Beef, caused the Gordon Betty Moore Foundation, which is incredibly generously funded work in diagnostic excellence to the tune of, I think, over $100 million over the last several years. Part of their goal was to come up

[48:40] with easily measurable metrics for diagnostic excellence, which then could be publicly reported, they could be part of payment systems, you have an incentive if you're better and you get dinged if you're worse. How much closer are we to having good measures? I mean, we can measure rates of central line infections and rates of fall.

[49:00] and urinary tract infections, and part of the reason that diagnostic excellence didn't compete well against those other things in the way we think about diagnosis was that we're in good measures. How far along are we now in having better measures for, let's say, public reporting? Is UCSF a good hospital? Great question. I think the more financial invested heavily in it

[49:20] that symptom diagnosis discordance methodology which scales really well but kind of comes into the historical purposes this idea of death and low rate the low risk DRGs or theodoresque which is a measure we tried many many years ago which didn't really compel change. I think the approach of using AI to find actual care processes that were fall-

[49:40] and errors that would like you to happen, maybe get more like the balloon time equivalents for diagnosis. They may be very different across settings or diseases, but certainly we can start getting there much easily because we can scroll through the record and we have much more like a doctor would. But today, is there anything, as I go online to try to figure out, should I go to UCSF Health System or any other health system?

[50:00] another system. Is there anything there that's going to tell me that we are better at diagnosis than another place? I know we are, but just stick with that, Bob. So I would say not yet. I think we operate largely off our clinical outcomes, our cost and value, which kind of imply diagnostic efficiency and effectiveness, nothing that says exactly.

[50:20] exactly how well we approach the clinical problem, how well we communicate back to the patient, and so forth. Yeah. Julia, do you want to tackle that as well? From your perspective, where do you think we are in the policy environment? Because a lot of what the Moore Foundation was trying to do was change the dynamics in the policy environment to provide an incentive for us to pay more attention to diagnosis.

[50:40] Yeah, I think the incentives are still pretty weak for prioritizing diagnosis over other quality and safety work. To some degree that's because of the lack of scalable measures, and to some degree I think we've also thought of the malpractice system as the thing that creates the incentives for diagnosis. It creates one set of incentives, but perhaps not

[51:00] a strong enough general set of incentives to really improve the entire diagnostic process. Yeah, which will become a big issue if all these AI tools come out and maybe do help you improve diagnosis, but cost, you know, $10 million to scale across the system. Do you think the system today has an incentive to buy that?

[51:20] Good question. I mean, on the one hand, for something that really feels like it would make a huge improvement in quality and safety — yes, I think there are incentives there. But against what? What is the competing investment? So it's not clear to me. I don't want to say it's a no, but we'd have to really think it through.

[51:40] I think the other thing we have to recognize with the technologies is that they haven't shown a great track record of rapid uptake. And so I think there is just some general skepticism too, that we really will come up with the killer app, so to speak, here for diagnosis, that will totally revolutionize it. Again, I think imaging, we're starting to see that. But whether that will

[52:00] even exist in a way that we believe is true. Yeah, it was interesting when you mentioned the killer technology was this COVID home test. And I think about when I'm on the wards, the piece of technology that most changes the probability they'll get the diagnosis right is the video interpreter. You know, that turns out to be massively useful. It's easy to use.

[52:20] It solves a really important problem, and it's easily accessible. So of everything in the EHR and decision support, that's the thing that to me is the game changer. I don't know. Well, just to circle back to the business model around diagnostic error reduction: we're starting to look at this in the data I showed you, and UCSF probably lost $20 million that year due to diagnostic errors. We projected the error costs

[52:40] across all of those deaths and ICU transfers — probably another $12 to 13 million at ZSFG that same year. So I think there's a business case there for the problem; whether we can find an effective solution is where the model will have to prove itself. And we didn't lose that money because somebody dinged us for diagnostic errors. We lost it because we were paid a fixed amount of money for a given patient and they spent two weeks in the ICU.

[53:00] So it's the costs due to each error, the excess length of stay due to each error, and also the length-of-stay opportunities for other admissions that were lost. You add those three things together and it gets into eye-catching numbers per year.
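
As a back-of-the-envelope illustration of adding those three buckets together, here is a minimal sketch. Every number below is a hypothetical placeholder, not the UCSF or ZSFG figures cited above.

    # Three cost buckets: direct excess costs, excess length of stay, and the
    # lost opportunity to admit other patients into those occupied beds.
    # All inputs are made-up placeholders for illustration.
    cases_with_error = 120                      # flagged diagnostic-error cases per year
    direct_excess_cost_per_case = 40_000        # extra tests, treatments, ICU days
    excess_los_days_per_case = 3.5              # additional bed-days per case
    cost_per_bed_day = 3_000
    average_los_days = 4.0                      # used to convert blocked bed-days into lost admissions
    margin_per_lost_admission = 8_000

    direct_costs = cases_with_error * direct_excess_cost_per_case
    excess_los_costs = cases_with_error * excess_los_days_per_case * cost_per_bed_day
    lost_admissions = cases_with_error * excess_los_days_per_case / average_los_days
    opportunity_costs = lost_admissions * margin_per_lost_admission

    total = direct_costs + excess_los_costs + opportunity_costs
    print(f"Estimated annual cost of diagnostic error: ${total:,.0f}")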

[53:20] Great. Julia, maybe one last question and then I'll open it up. On the wayfinding model, you made an interesting point that I want to push on: you said we should not go from this big collection of data in a straight line to the diagnosis. Or maybe — why do we need a thing called 'the diagnosis' at all? Why not just a treatment approach that is demonstrably connected with this set of data and leads to a better outcome? You said we shouldn't do that

[53:40] because you have all this human stuff along the way — we have to walk the clinician through ordering the right test and getting the right test. Why? Yeah, it's a great question. I think there may be cases where we don't need to do that, where the line is that direct. But I come back to Gurpreet's example, right? For many situations, the answer will

[54:00] be 60%, not 100%. And I think when we're still in the 60% range, we will need the human stamp of 'yes, we should still move forward with this, even though we're not at 100%,' because, again, we've made the argument that all the other options are not the right ones. So I think it really is a 'how certain are we' decision, and as we get

[54:20] toward 100% with certain diagnoses, then I think we'll start to get comfortable saying we don't need a human in the loop at all. But for diagnoses that sit in the range of substantial uncertainty, that's where I think, in the near future, both on the clinician side and on the patient side, we won't be quite ready for a human to be out of the loop.
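
A minimal sketch of that certainty-threshold idea — route low-confidence outputs to a clinician and allow auto-acceptance only above a cutoff. The threshold and the example diagnoses are illustrative, not a validated policy.

    # Route a model's diagnostic suggestion based on its stated confidence.
    # The 0.95 cutoff and the example diagnoses are hypothetical.
    AUTONOMY_THRESHOLD = 0.95

    def route(diagnosis: str, model_confidence: float) -> str:
        if model_confidence >= AUTONOMY_THRESHOLD:
            return f"{diagnosis}: auto-accept (confidence {model_confidence:.0%})"
        return f"{diagnosis}: needs clinician sign-off (confidence {model_confidence:.0%})"

    print(route("uncomplicated cystitis", 0.97))
    print(route("fever of unknown origin", 0.60))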

[54:40] Got it. Actually, before I open it up, I have one question I have to ask. Gurpreet, how are you using ChatGPT, or whatever, in your day-to-day clinical life? I use it all the time. We were using it in morning report today, and I use it quite a bit. I was telling someone that I click on ChatGPT much more often than I click on my UpToDate app now, and I hope that doesn't come across as a commercial plug. But it sort of gets to Julia's point that the questions we have and the issues we deal with are oftentimes very customized to the patient.

[55:00] Like today, I just needed to remind myself: what is the sella turcica? Is it a bone, or part of another bone, and what does it do? We were talking about the pituitary. I could have scrolled through websites — that simple question is certainly answered on millions of websites and in UpToDate — but it took me probably 10 seconds to

[55:20] get exactly what I need to know: it is a bony structure, but it is a depression in the sphenoid bone. And it answered that and dozens of other questions in morning report much better than my searching could have done. There's always the question of accuracy, but I think the same is true of all the other websites I would have used as well. I'm like you — I'm using it much more than Google and much more than UpToDate. And the other advantage is it's contextual.

[55:40] You can say, I have a patient with this disease, this disease, and this presentation, and there's nothing on UpToDate that puts it in that context; you'd have to search there and then say, I needed a different answer — whereas this structures it for you. All right, questions. Say who you are. Sam Brondfield, Oncology. Awesome talks. So you all have talked a lot

[56:00] about living in a probabilistic space and diagnostic uncertainty. I was curious if you could comment a little bit on communicating with patients about that diagnostic uncertainty and those probabilities. Yeah, I think it's a very important skill. There's a lot of literature now about uncertainty and diagnosis that ranges from the philosophical.

[56:20] to how to manage it within yourself as a doctor. I know Dr. Santhosh has done work on communicating uncertainty between physicians. But in the literature about communicating uncertainty with patients, I would just say it's a mixed bag. There are a lot of people who say you should do it all the time, but studies show that different patients respond to it differently, and there are different ways in which it can be framed. So some people

[56:40] will say, 'I'm leading with this diagnosis, but I'm considering others,' and other people will lead by giving much more of a probabilistic spectrum — and even those linguistic choices leave patients feeling differently. So all I can say is that it's definitely not the case that you should always share your uncertainty about everything all the time, and yet there are certain patients who really welcome it.

[57:00] Of course, maybe this gets a little bit to the heart of medicine, which is figuring out which patient is sitting in front of you. Yeah, yeah — in the back. Hi, I'm Alar from Pulmonary. I have a question about how we'll rely on guidelines once AI becomes more prevalent. The example that comes to mind is empiric treatments that are not without adverse consequences, like empiric antibiotics.

[57:20] In the diagnostic phase, we use them in hospitalized patients. These can have adverse consequences, and we rely on guidelines a lot. Do you think, in the coming years, as we have more probabilistic information from local data interpreted by AI, we'll move away from guidelines

[57:40] and rely more on the last year of UCSF Medical Center data to guide whether we should treat with vanc and Zosyn?

[58:00] It could be what the American College of Cardiology says, it could be what a review of all of PubMed says, or it could be that the last million cases that looked like this at UCSF or the UC hospitals did better with treatment A than treatment B. How is it going to choose?

[58:20] We're in a world of living with more of what we call real-world evidence and needing to blend the two. I mean, I think guidelines have been rigid in a way that we've acknowledged for a long time, and I think now we will be able to bring more evidence to the question of when it makes sense to deviate from them. So I think the best-case scenario is that you start from the guideline, but when you decide to deviate from it, it's because you have strong

[58:40] evidence to support why that's the right thing to do. Andy, anything to add? I think the other part of this is that when you see deviation, is that a new discovery? Is that a new disease, or a new treatment we should be exploring further? That's where the virtuous cycle of the learning health system really starts, and I think AI can sift through those deviations a lot more quickly than we could 10 or 15 years ago. Yeah. Other questions?

[59:00] Yeah, a question. From the 100 people online, there are two big questions. One is whether there is any way to predict prospectively who gets sick before they develop signs and symptoms. And the next one eloquently asks: by participating in sharpening and honing AI clinical reasoning, are we contributing to our eventual obsolescence?

[59:20] That's Dr. Antonio Gomez who asked that. Fabulous questions. So, the use of it for prediction — why don't we start with that?

[59:40] You have to start with the data, right? Because we can't observe the signs and symptoms, which do begin in the home. So then it's the question of: are you observing their ChatGPT or Google searches? To the extent that the data is there, I think you could move toward that, especially if you connect that sort of patient-generated data to what's happening when they

[01:00:00] come into contact with the health system. Yes, I think that model is conceptually feasible, but right now the main issue is that we just don't have access to the data to fuel those predictions. Yeah, with prediction it's sort of a matter of trying to disentangle: what is a prediction, what is an early warning sign that something is happening but is below our normal threshold to pick up,

[01:00:20] and what is a risk factor. They're all kind of different. But you can imagine a world where, if the AI is sifting through not just clinical data but sensor data and other stuff that's flowing in, it will — all of a sudden, or not so suddenly — begin giving us earlier warning that something is going on. Maybe the other question, about the

[01:00:40] dystopian Terminator endgame: we're not doing that, or perhaps that's decades away. But I think in the short term what we're really talking about is just having another tool. And if the tool performs really, really well — operates at the 90%, 95%, 99% level — we may turn certain questions over to the tool: Is this a pneumonia? Should I give

[01:01:00] chemotherapy in this case? But as we talked about, there is a massive amount of uncertainty in the practice of medicine, and the only thing technology ever does is shift the uncertainty frontier to the next question. As long as there's uncertainty, you'll always need a doctor to navigate it. So I'm not worried about it. If your kids were in college and thinking about going to

[01:01:20] medical school, would you have a different answer? No, no, I'd tell them to go for it. Oh, you would? Double down. Yeah — maybe take a data science major. Any other comments on that issue? I would say that I think risk prediction and the deterioration indexes are going to be around; they move the C statistic from 0.95 to 0.97.
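
For context, the C statistic for a risk or deterioration index is the area under the ROC curve: the probability that a randomly chosen patient who deteriorated was scored higher than one who did not. A minimal sketch with toy numbers, assuming scikit-learn is available:

    # Toy illustration of a C statistic (AUROC) for a deterioration index.
    from sklearn.metrics import roc_auc_score

    deteriorated = [0, 0, 1, 0, 1, 1, 0, 0]                          # 1 = patient deteriorated
    risk_score   = [0.05, 0.10, 0.80, 0.70, 0.65, 0.90, 0.15, 0.30]  # hypothetical index output

    print(f"C statistic (AUROC): {roc_auc_score(deteriorated, risk_score):.2f}")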

[01:01:40] Whether that improves lives is not clear to me, but I think AI is going to help us more in interpreting how we're thinking in the chart. We're looking at chief complaints from the UPSIDE study right now, and for the patients who had diagnostic errors, the chief complaints were vague and syndrome-based rather than disease- or symptom-based. So it's things like 'failure to thrive' — which I think you could call 'failure to think.'

[01:02:00] I'm not saying the doctor wasn't thinking about the reason for it, but they either were too busy, were swamped for other reasons, or the patients were very complicated. Both of those are opportunities to improve diagnostic processes, and I think that's just another way to add to it. That 'failure to think' reminds me of Merle Sande, who as chief of medicine at

[01:02:20] the county for a long time used to say: show me a doctor who needs a sed rate and I'll show you a patient who needs a doctor. Julia, did you have something to add? Maybe I'll just say that I think in the near term the opportunity for AI is both in the bias slide and in the noise slide, where we're asking humans to remember more than a human can

[01:02:40] possibly remember, and to perform in a way that's just beyond what we know humans can do. So the AI can help be that companion to support humans, where what we've been asking of them has always been unrealistic. That, to my mind, is really the first step here: the AI as that sort of companion. We're over time, but I have one last question, because I think this is going to become very real very quickly here.

[01:03:00] We are rolling out digital scribes now — I think 100 of our busy ambulatory docs have them — and I predict that within two years it will be standard for everyone to have a digital scribe. What it will mean is that you will no longer be actively creating the note; the note will flow out of a conversation that is picked up and becomes a note.

[01:03:20] The thing is, when I'm writing the note, that's often the time when I process the case in a different way and think, oh my God, I didn't think of X until I was writing the note. Are we worried about that? I look forward to it, and I am worried about it. There is no doubt that writing is tedious, and it's also a really critical part of the thinking process. All of us know that when we sit down and write, we generally reveal

[01:03:40] some ignorance. You know, you start writing and you say, I actually don't know what this drug does, or, I sent that patient out and now it doesn't make sense as I try to put pen to paper. So I do think there's a risk. The idea would be that we'll turn from being writers into editors, but when you edit someone else's work, you're only editing what they put on paper; if an idea wasn't generated, it isn't there to edit. We know this from the diagnostic literature:

[01:04:00] the most important predictor of getting the right diagnosis is having it as one of your early hypotheses. And if — whether it's me or the AI — it doesn't get there at the first step, there's very little hope of it showing up later. I just want to say that our experience with our first hundred users has been that they find it a tremendous relief not to worry about forgetting something. They feel like the AI is doing a

[01:04:20] much better job of remembering everything, especially if they're not writing their note immediately proximal to the encounter. So I do think we will get more comprehensiveness, but some of the trade-off might be the depth of thinking. Yeah. And there's also another trade-off: they actually have a little bit more bandwidth to talk to the patient and actually listen, rather than multitasking during

[01:04:40] that time. So it'll be an interesting trade-off. I assume you and your colleagues will study it and tell us how it's working. We can go on for hours but we need to stop. Thank you all for an absolutely terrific presentation.