Sometimes, I am a bit of a maniac. At the beginning of summer 2014, the constant stream of posts about the accuracy of the Dexcom G4 system and its calibration strategies started to irritate me. Some users were reporting that their Dexcom was always "spot-on", which was of course impossible, if only because of the delay between blood glucose and interstitial glucose. Other users were reporting huge errors and complaining loudly. What annoyed me was not that users seemed to have wildly different experiences, but rather the utter lack of rationality in the reporting and testing.
With the help of some of the nicest people in the "CGM in the Cloud" Facebook group who were kind enough to share their data with me, I decided to dig a bit deeper. My goal was to develop somewhat substantiated hypotheses and test them. During that test phase, Dexcom released a software upgrade that, to some extent, changed the landscape. For this reason and to avoid adding confusion in the mind of users, I decided not to release my findings. However, I have now learned that non-US users and pediatric US users will probably not be getting the 505 upgrade. This is why I am finally publishing my results.
I used anonymized, user-submitted Dexcom exports that provided me with CGM values and calibration points. Ideally, I would also have liked to use BG meter values that were not entered as calibrations (as I do on my own data), but that introduces complex time synchronization issues if the date and time of the meters do not match the Dexcom's time, are changed during the period, etc.
I received different amounts of data: some users sent a year, some users sent a month. In some cases, I used random selection to prevent one long set of results from having the same impact as twelve other sets of results. In other cases, when it was interesting to run some tests on larger amounts of data, I adjusted the impact of the larger data sets. For example, a user who sent 12 months of data would have their stats calculated over the whole period, then their impact on the total divided by 12 whenever they were compared, summed or averaged with users who sent only one month of data.
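One way to read the down-weighting described above is a weighted average in which every record from a user is weighted by one over the number of months that user sent. The sketch below is only an illustration of that idea, not the code used for this post: the function name, the `(user_id, value)` record layout and the `months_by_user` mapping are all assumptions.

```python
def month_normalized_mean(records, months_by_user):
    """Average values so that each user counts as roughly one month of data.

    records: iterable of (user_id, value) pairs (hypothetical layout).
    months_by_user: dict mapping user_id to the number of months submitted.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user_id, value in records:
        # Assumption: a user who sent 12 months has each record weighted 1/12.
        weight = 1.0 / months_by_user[user_id]
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up numbers: user "a" sent 12 months, user "b" sent 1 month.
    records = [("a", 10.0)] * 12 + [("b", 20.0)]
    print(month_normalized_mean(records, {"a": 12, "b": 1}))
```

Without the weights, the twelve-month user would dominate the average; with them, both users pull equally.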
I mostly used established libraries (SciPy, NumPy, pandas, statsmodels...) and custom code. In order to detect bugs in the data handling, I compared the basic characteristics I computed for each data set with the values calculated by Dexcom's own software.
I advanced step by step, zooming in on issues that appeared interesting. I could have missed some, and there could be a selection bias because of that. You have been warned...
Finally, I tested my hypotheses on new data that I acquired after the test.
What you'll see here is a "best attempt" at getting a better understanding of the Dexcom G4 calibration behavior in real life. While it has not been executed with extreme scientific rigor, it is certainly better than anecdotal evidence.
Let's begin with the BG Meter
In real life, users don't have the option of checking their results with a glucoscout or YSI. The BG meter is the only tool available for calibration and accuracy checks. Unfortunately, we do not have the data to estimate the absolute accuracy of the BG meters. But, with the initial double calibrations, we do have enough data to estimate their precision (how consistent they are) and see if it has an impact on the subsequent correlation with the CGM data stream.
Here is the result of that analysis on 566 double calibrations:
- On average (whole data set - y axis), the consistency of the double calibration points was good (6.25% MARD).
- On an individual basis, using a consistent meter leads to a better correlation between all the future calibrations of that CGM run. That is, of course, a bit of an obvious circular argument: if your meter isn't precise, it is unlikely to track the CGM consistently and the CGM is less likely to behave well.
- The blue arrow shows that a really inconsistent BG meter leads to poor CGM behavior. This was caught by my initial data consistency check when the data was sent. I contacted that particular user and he/she confirmed that, indeed, their BG meter had been awful (either because it is intrinsically bad or because the procedure isn't correct).
- The red arrow was much more interesting: the BG meter seemed almost perfect, yet the CGM correlation was awful. What could be the reason? It turns out that more than 90% of the double calibrations entered by that user were identical. Given the typical consistency of BG meters, this is too good to be true, and indeed it was: a single finger-prick value was entered twice for the initial calibration.
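The two checks discussed in the list above (how far apart the two values of a double calibration are, and how often they are exactly identical) can be sketched in a few lines. This is a minimal illustration, not the code behind the figures quoted here. In particular, the post does not spell out how the relative difference of a pair is defined, so dividing by the mean of the two readings is an assumption, as are the function names and the 0.9 threshold (which simply mirrors the "more than 90%" case).

```python
def pair_relative_difference(first, second):
    """Absolute difference of a calibration pair, in percent of the pair mean.

    Assumption: with no reference value available, the mean of the two
    readings is used as the denominator.
    """
    return abs(first - second) / ((first + second) / 2.0) * 100.0


def double_calibration_stats(pairs):
    """Return (mean relative difference in %, fraction of identical pairs)."""
    diffs = [pair_relative_difference(a, b) for a, b in pairs]
    mean_diff = sum(diffs) / len(diffs)
    identical = sum(1 for a, b in pairs if a == b) / len(pairs)
    return mean_diff, identical


def looks_duplicated(pairs, threshold=0.9):
    """Flag a user whose pairs are suspiciously often identical.

    The threshold is a hypothetical choice, not a validated cut-off.
    """
    _, identical = double_calibration_stats(pairs)
    return identical > threshold


if __name__ == "__main__":
    # Made-up pairs in mg/dL.
    honest = [(100, 110), (150, 143), (90, 90), (200, 188)]
    copied = [(120, 120)] * 19 + [(100, 104)]
    print(double_calibration_stats(honest), looks_duplicated(honest))
    print(double_calibration_stats(copied), looks_duplicated(copied))
```

A very low mean difference combined with a very high fraction of identical pairs is the "too good to be true" signature described above.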
The problem is that the initial double calibration serves a very specific and important purpose: it increases the precision of the inherently imprecise BG meter measurement by a factor of about 1.4 (the square root of 2, since two readings are averaged) by diminishing the measurement noise. This is, in general, called oversampling. The complete mathematical justification of this is beyond the scope of this blog post but, if you are interested in the issue, you can start by reading the "noise" section of the Wikipedia article on oversampling. Each BG meter measurement contains a part of signal and a part of noise. Oversampling is not a magic bullet: if the noise is uncorrelated (for example, if it comes from the intrinsic imprecision of the measuring system), you get the full benefit; in other cases, it depends. If you have glucose on one finger and not on the other, the impact of the error will be reduced.
Note 1: this is probably the reason why the new Medtronic sensors have 4 sensing wires. Medtronic is clearly behind Dexcom and others in terms of the quality of its single-wire CGM. Having 4 sensors improves the consistency of the signal by a factor of 2 (the square root of 4).
Note 2: by itself, oversampling does not improve the accuracy of a measurement: if you have an inaccurate measurement (for example, a bad sensor or glucose on both fingers), you will get a better (less noisy) bad measurement. Oversampling reduces the noise in the measurement, which improves consistency: if your sensing process is accurate, your measurement becomes more consistently accurate.
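The factors of 1.4 and 2, and the point made in Note 2, can be checked with a small simulation. This is a sketch under simple assumptions, not a model of any real meter or sensor: the noise is taken to be Gaussian and uncorrelated between readings, and the true BG of 120, the noise level of 8 and the bias of 10 are arbitrary illustrative numbers.

```python
import random
import statistics


def simulate_spread(n_readings, true_bg=120.0, noise_sd=8.0, bias=0.0,
                    trials=20000, seed=0):
    """Simulate averaging n_readings noisy readings, repeated many times.

    Returns (mean, standard deviation) of the averaged results.
    Assumption: each reading is true_bg + bias + independent Gaussian noise.
    """
    rng = random.Random(seed)
    averages = [
        statistics.fmean(
            true_bg + bias + rng.gauss(0.0, noise_sd)
            for _ in range(n_readings)
        )
        for _ in range(trials)
    ]
    return statistics.fmean(averages), statistics.stdev(averages)


if __name__ == "__main__":
    _, sd_single = simulate_spread(1)
    _, sd_double = simulate_spread(2)
    _, sd_quad = simulate_spread(4)
    print("single vs double:", sd_single / sd_double)
    print("single vs four:", sd_single / sd_quad)

    # A systematic error (bias) is not averaged away.
    mean_biased, sd_biased = simulate_spread(2, bias=10.0)
    print("biased double reading:", mean_biased, sd_biased)
```

With uncorrelated noise, the two ratios should come out close to 1.4 and 2. The biased run should show a tighter spread but a mean that is still off by the full bias, which is the "less noisy bad measurement" case.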
First practical conclusions
- use a consistent BG Meter with a decent procedure (wash hands, enough blood...)
- do not skip the initial double calibration.