Thursday, February 16, 2012

In medicine, beware of what seems too good to be true

Update 2/17/12:
A reader brought to my attention (thanks!) a very slight inaccuracy in the first table below, which I have corrected. I did the calculations in Excel, which, as you may know, likes to round numbers. 

File this under "misleading." Here is the story:
What's the Latest Development? 
A California start-up has developed a breath test that can diagnose lung cancer with an 83 percent accuracy and distinguish between different types of the disease. The procedures that currently exist to test for lung cancer, which is the leading cause of cancer deaths worldwide, result in too many false positives, meaning unnecessary biopsies and radiation imaging. The new device works by drawing breath "through a series of filters to dry it out and remove bacteria, then [carries it] over an array of sensors."  
What's the Big Idea? 
The company is now testing a version of the machine 1,000 times more accurate than its latest model, which could increase the accuracy of diagnoses to 90 percent, the level likely needed to take the device to market. Because the machine is not specific to a particular group of chemicals, the breath tester could, in principle, test for any disease that has a metabolic breath signature, for example, tuberculosis. "A breath signature could give a snapshot of overall health," says the company's founder, Paul Rhodes. 
Am I just being a Luddite by not getting, well, breathless about this? I'll just lay out my argument, and you can be the judge.

There is no doubt that lung cancer is a devastating disease, and we have not done a great job reducing its burden or the associated mortality. However, there are several issues with what is implied above, and some of the assumptions are unclear. First, what does "accuracy" mean? In the world of epidemiology it refers to how well the test identifies true positives and true negatives. If that is in fact what the story means, then 83% may not be bad; we will return to that point at the end of this post. This brings me to my second point: what is the gold standard that the test is being measured against? In other words, what is it that has 100% accuracy in lung cancer detection? Is it a chest X-ray, a CT scan, a biopsy, what?

The SEER database, the most rigorous source of cancer statistics in the US, classifies tissue diagnosis as the highest level of evidence for cancer. However, in some cases a clinical diagnosis is acceptable: cancer can be inferred without examining tissue by weighing patient risk factors and the behavior of the tumor. So, you see where I am going here? The gold standard is tissue or tumor behavior in a specific patient. Is that what this technology is being measured against? We need to know. And here is another consideration: what if the tissue provides a cancer diagnosis, but the cancer is not likely to become a clinical problem, as in the prostate cancer story, for example?

But all of these issues are merely a prelude to the real problem with a technology like the one described: the predictive value of a positive test. The story even alludes to this, pointing the finger at the false-positive rates of current technologies while deflecting attention from its own. Yet this is, in fact, the crux of the matter for all diagnostics. Let me show you what I mean.

The incidence of lung cancer in the US is on the order of 60 cases per 100,000 population. Now, let us give this test a huge break and say that it yields (consistently) 99% sensitivity (it correctly identifies 99% of patients who truly have cancer) and 99% specificity (it correctly identifies 99% of patients who truly do not). What will this look like numerically, given the incidence above, if we test 100,000 people?

             Cancer present   Cancer absent      Total
    Test +               59             999      1,058
    Test -                1          98,941     98,942
    Total                60          99,940    100,000

If we add up all the "wrong" test results, the false negative (n=1) and the false positives (n=999), we arrive at a 1% "inaccuracy" rate, or 99% accuracy. But what is hiding behind this 99% accuracy is the fact that, of all those people with a positive test, only a handful, a paltry 6%, actually have cancer. And what does this mean for the other 94%? Additional testing, a lot of it invasive. And what does this testing mean for the healthcare system? You connect the dots.
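For readers who want to check the arithmetic themselves, here is a minimal Python sketch (the function name and layout are my own, purely for illustration) that rebuilds the table from the incidence, sensitivity, and specificity:

    def two_by_two(incidence_per_100k, sensitivity, specificity, n=100_000):
        """Build the 2x2 screening table for a given incidence and test characteristics."""
        with_disease = n * incidence_per_100k / 100_000
        without_disease = n - with_disease
        tp = sensitivity * with_disease            # true positives
        fn = with_disease - tp                     # false negatives
        fp = (1 - specificity) * without_disease   # false positives
        tn = without_disease - fp                  # true negatives
        accuracy = (tp + tn) / n
        ppv = tp / (tp + fp)                       # predictive value of a positive test
        return tp, fn, fp, tn, accuracy, ppv

    tp, fn, fp, tn, acc, ppv = two_by_two(60, 0.99, 0.99)
    print(f"TP={tp:.0f}  FN={fn:.0f}  FP={fp:.0f}  TN={tn:.0f}")
    print(f"accuracy={acc:.1%}  PPV={ppv:.1%}")  # accuracy=99.0%  PPV=5.6%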

Let's explore a slightly different scenario: a population of patients whose risk of developing lung cancer is 10 times the population average, that is, an incidence of 600 cases per 100,000 population. Here is the same calculation, assigning the same bionic accuracy to the test:

             Cancer present   Cancer absent      Total
    Test +              594             994      1,588
    Test -                6          98,406     98,412
    Total               600          99,400    100,000

The accuracy remains at 99%, but the predictive value of a positive test rises to 37%. Still, 63% of all people testing positive for cancer will go on to unnecessary testing. And imagine the numbers when we try to screen millions of people, rather than just 100,000.
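The same predictive values fall straight out of Bayes' rule, without building any table; a quick sketch (again, the function is mine, for illustration only):

    def ppv(prevalence, sensitivity, specificity):
        # Bayes' rule: P(disease | positive test)
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    print(f"{ppv(60 / 100_000, 0.99, 0.99):.0%}")   # ~6%,  general population
    print(f"{ppv(600 / 100_000, 0.99, 0.99):.0%}")  # ~37%, high-risk population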

Let us do just one final calculation, bringing the numbers back to the test in question, where the article claims that the accuracy of the next version of the technology will be 90%. If we interpret that as 90% sensitivity and 90% specificity, and assume a high-risk population (600 cases per 100,000 population), what does a positive result mean?

             Cancer present   Cancer absent      Total
    Test +              540           9,940     10,480
    Test -               60          89,460     89,520
    Total               600          99,400    100,000

From this table, the accuracy is indeed 90%, concealing a positive predictive value of only 5%! This means that of the people testing positive for lung cancer with this technology, 95% will be false positives! What is most startling is that, to arrive at even the mediocre 37% predictive value we saw above, we would need a population where cancer incidence is a whopping 6,000 per 100,000, or 6%!
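That last figure comes from inverting the same Bayes formula to solve for the prevalence needed to reach a target predictive value; a sketch of my own algebra, not anything from the article:

    def prevalence_for_ppv(target_ppv, sensitivity, specificity):
        # Solve PPV = sens*p / (sens*p + (1-spec)*(1-p)) for p
        fp_rate = 1 - specificity
        return (target_ppv * fp_rate) / (
            sensitivity * (1 - target_ppv) + target_ppv * fp_rate)

    p = prevalence_for_ppv(0.37, 0.90, 0.90)
    print(f"{p:.1%}")  # ~6%, i.e., roughly 6,000 cases per 100,000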

I do not want to belabor this issue any further. Screening for disease that is not yet a clinical problem is fraught with difficulties, and manufacturers need to be aware of these pitfalls of logic. What I have shown you here is that even when the "accuracy" of a test is exquisitely (almost impossibly) high, it is the pre-test probability of disease, the patient's underlying risk, that is the overwhelming driver of false positives. Therefore, I give you this conclusion: beware of tests that sound too good to be true -- most of the time they are.

h/t to @gingerly_onward for the story link  

