Science is fun


Do not be fooled by statistics!

Recently, I came across a diagnostic ELISA kit that was claimed to provide a sensitivity of 100% and a specificity of 99%. These statistics were computed from a skewed sample of sera, with only 5 positive cases and more than 300 negative cases. A potential client should be aware that confidence in these statistics is very poor, given how few positive cases were considered.

Let me support my argument with a similar simulated example. Imagine that 300 negative readings of a diagnostic kit, representing non-infected sera, are simulated from a normal distribution with mean 0.5 and standard deviation 0.25. Also, 5 positive readings are simulated from a normal distribution with mean 1.5 and standard deviation 0.5. In our experience, this is a very realistic situation. With these simulated data, the optimum cut-off according to a given criterion (the Youden index, which maximizes sensitivity + specificity - 1) is 1.01, and the sensitivity and specificity of the (simulated) kit are 100% and 97%, respectively. These statistics are quite close to those reported for the real kit. But the question is: are they reliable values for the client? The answer is no, mainly because, as already mentioned, five positives are not enough.

Another relevant consideration when working with a small number of samples is that sensitivity and specificity should always be computed from data independent of those used to set the cut-off. Otherwise, the results are overly optimistic. In fact, the smaller the sample, the more optimistic the statistics. To illustrate this, imagine that a client buys the assay, with a cut-off of 1.01 and a sensitivity and specificity of 100% and 97%, respectively. I simulated new samples of positive and negative readings, representing new sera analyzed by the client with the kit, and obtained a sensitivity and specificity of 84% and 98%. Why was the sensitivity so low for these new readings?
Because only 5 positive sera were used to compute the cut-off, which is too few to obtain a reliable statistic. To avoid this problem, resampling techniques such as cross-validation provide more realistic estimates. Cross-validation estimates of the sensitivity and specificity were 80% and 97%, much closer to those observed by the client than the original ones. In conclusion, the statistical validation of diagnostic kits should always be supported by sound statistical procedures and enough data. Otherwise, the reported performance statistics are simply not reliable.
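To put a number on how poor the confidence in a 100% sensitivity from five sera is, one option (my choice here, not necessarily what the manufacturer or the analysis above used) is an exact Clopper-Pearson 95% confidence interval for a binomial proportion. The minimal sketch below uses only the Python standard library; `binom_cdf` and `clopper_pearson` are hypothetical helper names. For 5 detected positives out of 5, the lower bound has the closed form 0.025^(1/5) ≈ 0.478, so the observed 100% is compatible with a true sensitivity below 50%.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05, tol=1e-10):
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""
    def bisect(f, lo, hi):
        # Root of a function that increases on [lo, hi].
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha/2 (increasing in p).
    if x == 0:
        lower = 0.0
    else:
        lower = bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound solves P(X <= x | p) = alpha/2 (decreasing in p).
    if x == n:
        upper = 1.0
    else:
        upper = bisect(lambda p: alpha / 2 - binom_cdf(x, n, p), 0.0, 1.0)
    return lower, upper

# Sensitivity estimated from 5 positive sera, all detected.
lo, hi = clopper_pearson(5, 5)
print(f"95% CI for 5/5: ({lo:.3f}, {hi:.3f})")  # (0.478, 1.000)
```

The same function accepts any count of detected cases `x` out of `n`, so it can be applied to the specificity as well; with hundreds of negatives that interval is far narrower.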
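A short calculation shows where the client's 84% comes from. Under the distributions assumed in the simulation, and calling a reading positive when it lies above the cut-off (an assumption consistent with positives having the higher mean), the population sensitivity at a cut-off of 1.01 is 1 - Φ((1.01 - 1.5)/0.5) = 1 - Φ(-0.98) ≈ 0.836, and the specificity is Φ((1.01 - 0.5)/0.25) = Φ(2.04) ≈ 0.979. A minimal sketch with the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

negatives = NormalDist(mu=0.5, sigma=0.25)  # readings of non-infected sera
positives = NormalDist(mu=1.5, sigma=0.5)   # readings of infected sera
cutoff = 1.01                               # Youden cut-off fitted on 5 positives

# A reading strictly above the cut-off is called positive.
sensitivity = 1 - positives.cdf(cutoff)
specificity = negatives.cdf(cutoff)
print(f"population sensitivity: {sensitivity:.3f}")  # 0.836
print(f"population specificity: {specificity:.3f}")  # 0.979
```

These population values essentially match the 84% and 98% seen on the client's new sera, which suggests that the 100% sensitivity reflected the particular five sera used to set the cut-off rather than the kit itself.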
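The simulated experiment can be sketched in Python as follows. This is a minimal reconstruction under stated assumptions, not the code behind the figures above: the helper names (`simulate`, `sens_spec`, `youden_cutoff`, `cross_validate`) are hypothetical, readings strictly above the cut-off are called positive, the Youden cut-off is searched over midpoints between consecutive pooled readings, the client batch size (10,000 sera per class) is arbitrary, and the cross-validation is a stratified 5-fold scheme in which each fold holds out exactly one positive. The printed values depend on the random seed and will not reproduce the 100%/97%, 84%/98% and 80%/97% quoted above exactly.

```python
import random

rng = random.Random(2024)  # fixed seed so a run is reproducible; any seed works

def simulate(n_neg, n_pos):
    """Draw readings: negatives ~ N(0.5, 0.25), positives ~ N(1.5, 0.5)."""
    neg = [rng.gauss(0.5, 0.25) for _ in range(n_neg)]
    pos = [rng.gauss(1.5, 0.5) for _ in range(n_pos)]
    return neg, pos

def sens_spec(neg, pos, cut):
    """Sensitivity and specificity when readings above `cut` are called positive."""
    sens = sum(v > cut for v in pos) / len(pos)
    spec = sum(v <= cut for v in neg) / len(neg)
    return sens, spec

def youden_cutoff(neg, pos):
    """Cut-off maximizing J = sens + spec - 1 over midpoints of the pooled readings."""
    values = sorted(set(neg) | set(pos))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: sum(sens_spec(neg, pos, t)) - 1)

def cross_validate(neg, pos, k=5):
    """Stratified k-fold CV: each held-out reading is classified with a cut-off
    fitted without it. Folds are assigned by index, which is equivalent to a
    random assignment here because the simulated readings are i.i.d."""
    tp = tn = 0
    for fold in range(k):
        train_neg = [v for i, v in enumerate(neg) if i % k != fold]
        train_pos = [v for i, v in enumerate(pos) if i % k != fold]
        cut = youden_cutoff(train_neg, train_pos)
        tp += sum(v > cut for i, v in enumerate(pos) if i % k == fold)
        tn += sum(v <= cut for i, v in enumerate(neg) if i % k == fold)
    return tp / len(pos), tn / len(neg)

# Kit development: 300 negative and 5 positive sera.
neg, pos = simulate(300, 5)
cut = youden_cutoff(neg, pos)
print(f"cut-off: {cut:.2f}")
print("same data as cut-off:  sens=%.2f spec=%.2f" % sens_spec(neg, pos, cut))

# Client: a large, independent batch analyzed with the fixed cut-off.
new_neg, new_pos = simulate(10_000, 10_000)
print("independent client data: sens=%.2f spec=%.2f" % sens_spec(new_neg, new_pos, cut))

# Cross-validated estimate using only the original 305 readings.
print("5-fold cross-validation: sens=%.2f spec=%.2f" % cross_validate(neg, pos))
```

Evaluating on the same readings used to pick the cut-off tends to give the optimistic figures, while the independent batch and the cross-validated estimate give the more realistic ones. With only five positives, each fold's sensitivity rests on a single serum, which is why the cross-validated sensitivity can only take the values 0%, 20%, ..., 100%.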

Written by José Camacho.