Friday, December 18, 2015

MDs and stats: probably not worth the trouble.

The meme


The supposedly poor mathematical abilities of medical doctors are a classic meme, and it is not totally unfounded. The classic study "Statistical Literacy of Obstetrics-Gynecology Residents" is often summarized by the catchy phrase "significantly less than 50% of doctors can answer a true/false question about statistics". Indeed, if one reads the abstract, it appears that only 42% of the respondents in a large sample of MDs (n=4173) were able to recognize the definition of a P value. Some abstained, some - just a bit less - got it right, the study can't be generalized...

The question was ‘‘True or False: The P value is the probability that the null hypothesis is correct.’’ 

The correct answer is false. Assuming the null hypothesis is true, the P value is the probability of obtaining a test result at least as extreme as the one actually obtained. Say you are testing a placebo and your study produces a P value of 0.01: that basically means that data at least as extreme as yours had a 1% chance of occurring if the null hypothesis was true (see here for an explanation with charts). It means nothing more. Likewise, a P value of 0.95 means that a result at least as extreme as the one you obtained had a 95% chance of being observed if the null hypothesis was true.
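To make that definition a bit more concrete, here is a minimal Python sketch (illustrative numbers and a plain two-group t-test, nothing taken from the study above) showing that, when the null hypothesis really is true, P values at or below 0.01 show up in roughly 1% of experiments, and P values at or below 0.05 in roughly 5%.

# sketch: distribution of P values when the null hypothesis is TRUE
# two groups drawn from the same population, so any "difference" is pure chance
import numpy as np
from scipy import stats

np.random.seed(0)
p_values = []
for _ in range(10000):
    a = np.random.normal(100, 15, 30)   # hypothetical group A
    b = np.random.normal(100, 15, 30)   # hypothetical group B, same population
    t_stat, p = stats.ttest_ind(a, b)
    p_values.append(p)

p_values = np.array(p_values)
print('fraction of experiments with P <= 0.01 :', (p_values <= 0.01).mean())   # close to 0.01
print('fraction of experiments with P <= 0.05 :', (p_values <= 0.05).mean())   # close to 0.05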

Some important things to note:

  • by definition, the math starts from the assumption that the null hypothesis is true. Therefore, you are NOT calculating the probability that it is valid or invalid. The probability applies to the data, not to the null hypothesis. (Some will even argue that the data has no probability since it has already been observed, but that belongs to the larger B vs F debate I will mention a bit later.)
  • if your data comes up with a low P value, you can't really distinguish between "the null hypothesis is false" and "my sample is biased by something else"
  • the significance threshold to which the P value is compared is a bit arbitrary and depends on the field. Medicine is often quite happy with 0.05, fundamental physics wants "5 sigmas", or a P value of about 0.0000003.

Let's stop for a minute and think about the first and last points: why such a discrepancy? What is the chance of incorrectly rejecting a true null hypothesis? Well, it depends on several factors: selection bias (in the above case, only gynecologists for example), population size, prevalence of the observed effect, etc... Physicists can afford to characterize their testing environment and run as many tests as our taxes allow. Physicians are limited by who they can enroll in their studies, vague null hypotheses based on previous studies, small sample sizes, different prevalences of what they test, etc... One of the main issues is the initial plausibility of the null hypothesis. The P value calculation assumes the null is true outright but, particularly in medicine, how plausible it was to begin with is almost never addressed! Addressing that question leads to another set of calculations that provide, you guessed it, another probability: the chance that a "significant" result is in fact a false alarm. It turns out that a result at P = 0.05 carries at least a 23% chance of incorrectly rejecting a true null hypothesis, and in practice the probability is around 50% (start here for a recursive exploration of the issue)! It can also simply be 100% if the null hypothesis is poorly chosen.
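To see roughly where such numbers come from, here is a back-of-the-envelope sketch. The 23% and 50% figures above come from the linked articles; the numbers below are purely illustrative assumptions (prior plausibility of a real effect, statistical power) plugged into the usual false-discovery arithmetic.

# sketch: what fraction of "significant" results are false alarms?
# assumed, illustrative numbers - not the ones from the linked papers
prior_real_effect = 0.10   # 10% of tested hypotheses are actually true effects
power = 0.80               # chance of detecting a real effect when it exists
alpha = 0.05               # significance threshold

false_alarms = alpha * (1 - prior_real_effect)   # true nulls wrongly rejected
true_hits = power * prior_real_effect            # real effects correctly detected
false_discovery_rate = false_alarms / (false_alarms + true_hits)
print('fraction of significant results that are wrong :', round(false_discovery_rate, 3))
# ~0.36 with these assumptions; lower the prior or the power and it gets worse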

But there is more: even when an experiment has been set up and reported properly, it has a significant chance of being misunderstood or reported incorrectly, even in the specialized press. In this case, a now famous Higgs boson announcement was reported by both the Daily Telegraph and Nature. Nature got it wrong... Maybe both writers tossed a coin and one got lucky? The trap here is the difference between a one-sided and a two-sided question (medical analogy: after some procedure, you are either alive or dead, there is only one direction of interest. You can't be more alive or more dead. But you can be taller or smaller than average, or produce more or less TNF, etc...).

Does ignorance matter?


Going back to our gynecologists' study, generalizing it as "MDs don't know shit about maths and stats" could be a mistake. But even if it is not, does it matter? Is it actually a bad thing not to know about P values? Let's see what Nature (again) has to say on the issue, and let's hope that, this time, the coin they tossed fell, by chance, on the correct side: Scientific method: Statistical Errors. But is there a consensus on the issue? The comments are priceless and quickly turn into an argument.

Assume a study reports a P value of 0.05: does it really matter if 50% of the MDs who read it get it wrong when the study itself has a true error rate of around 50%? A question worth asking, I believe.

In the subset of MDs answering the question correctly, how many will go one level deeper and understand all the parameters that went into the generation of that P value? If they don't understand them, what is the chance that they falsely convince themselves that a result is likely true or that a null hypothesis is false? Now, that is getting a bit hairy... But we are not done yet.

Assume they are of the rare breed that gets it fully (I certainly would not claim that I do myself even if I spent more time than the average MD looking at the issue), is that beneficial? They could, of course, start teaching and improve the statistical literacy of other MDs by helping them understand the correct definition and interpretation of P values. If they are particularly gifted, they could even start disagreeing with other particularly gifted statisticians who happen to fundamentally disagree with them. Or they could decide to jump in the big arena and join the never ending Bayesian vs Frequentist debate...

A typical Bayesian vs Frequentist fundamental debate could be summarized as follows:

Bob to Fred: "The probability that you are wrong is extremely high"
Fred to Bob: "Don't bother, I observe that you are wrong"

or the other way round... who cares.

 

Meanwhile in the real world...


But that's not where the glory and the money are... The main career path of a statistically competent MD is in the industry, where his only goal will be to tweak studies to achieve a nice P value for whatever drug or device study his employer wants to push on his statistically illiterate colleagues.

That's something he will achieve by slightly tweaking the null hypothesis, excluding some data, cherry-picking the prevalence rate of the issue he addresses... "We need P = 0.01 to push this through, get it."
More statistically competent MDs means more tweaked studies if they are on the dark side. Unfortunately, the dark side usually pays better.

Clearly, increasing the number of statistically competent MDs is NOT a good thing. Quite the opposite.

Disclaimer: since I am expressing a single opinion about a binary choice, there's a 50% chance that I am wrong. A priori. If my hypotheses are true. And I understood the issue. Got my prior right...

Wednesday, December 16, 2015

Accessing the NightScout Mongo database in Python.

I've already mentioned the NightScout and xDrip projects many times on these pages. Unless you have been living under a rock for the last 18 months, you probably know that these free and open source projects have had a major impact on the quality of life of many Type 1 diabetics and caregivers.

Paradise Lost


In late 2015, MongoLab upgraded their database system, and subsequently their MongoDB-as-a-service web site, to a new version (version 3.x). The NightScout developers and support team, once again, delivered: thanks to an upgrade, their users were able to keep using the system with minimal inconvenience.

Unfortunately, for a non-standard user such as yours truly, the new system had several drawbacks:
  • data could not be posted directly to the database anymore, it had to be sent to the web site, then to the database. 
  • the web site itself had started to evolve into something much bigger than a simple remote viewer.
  • for most users, this meant that setting up the full new portal on Azure was the only practical solution.
My own requirements were
  • the database is all that matters.
  • simpler architecture that I can fire up in VMs, local computers, Raspberry PI, any hosting service.
  • no need for the current additional features, but the need to stay compatible with my analytical interests.
  • no desire to depend on the availability of chained elements (uploaders, azure, database, azure again, github for updates)
This is why I decided to leave the NightScout paradise for a while and revert to my private purgatory which includes, I shamefully admit, SQL and SQLITE databases... 

Born again? 


But then, right in time for the holiday season, good news showed up: xDrip will support direct database uploads. Thank you guys, you made my day! And that brings me to the point of this short post: if MongoDB was again a viable option, how hard would it be to access it outside the normal setup? I've always been a bit suspicious of MongoDB as a memory and resource hog... Plus, if I had had the final say on product naming, I would have called it LISCB (lots of insane stupid curly braces). I was a bit reluctant, and I was WRONG! It turns out that, in Python at least, the task was extremely easy and took a few minutes.

Assuming you are running Python 3.4+, here's how to do it.

Start by installing the pymongo package

python -m pip install pymongo

and here is some simple sample code, with no error checking included - I just want the smallest demo possible. The prints aren't needed, they are just there to show the output. If you want to test it on your own installation, replace the user, password, 00000s, db and collection names with your own identifiers of course.

# import the required package
from pymongo import MongoClient
# database connection setup
uri = "mongodb://testuser:testpassword@ds000000.mongolab.com:00000/testdb?authMechanism=SCRAM-SHA-1"
client = MongoClient(uri)
print(client)
# set the db we need, we could of course query what's available
db = client['testdb']
# get the list of available collections, call them tables for fun
tables = db.collection_names(include_system_collections=False)
print(tables)
# set the collection we need
testcoll = db['testcoll']
# see if we can get a record
result = testcoll.find_one()
print('Result\n______\n', result)
# find out how many records we have
count = testcoll.count()
print('Number of Documents in collection :', count)
# let's look for values above 270 mg/dl
cursor = db.testcoll.find({"sgv": {"$gt": 270}})
# and display some information about them
for value in cursor:
    print(value['sgv'], 'provided by', value['device'], 'on', value['dateString'])

And here is the output

/mongodb/minimal.py

getting there

MongoClient(host=['ds000000.mongolab.com:00000'], document_class=dict, tz_aware=False, connect=True, authmechanism='SCRAM-SHA-1')


collections

['devicestatus', 'objectlabs-system', 'objectlabs-system.admin.collections', 'profile', 'testcoll', 'treatments']

first record in that test db

Result
______
 {'dateString': 'Fri Oct 24 14:14:00 CEST 2014', 'direction': 'SingleUp', 'date': 1414152840000, 'device': 'dexcom', '_id': ObjectId('544c6816a885b50eecc2571e'), 'sgv': 101}

number of records in the testcoll collection

Number of Documents in collection : 37192

result of the conditional query

271 provided by dexcom
271 provided by dexcom
271 provided by dexcom
271 provided by dexcom
271 provided by xDrip-BluetoothWixel
271 provided by xDrip-BluetoothWixel
271 provided by xDrip-BluetoothWixel
271 provided by xDrip-BluetoothWixel
271 provided by xDrip-BluetoothWixel
273 provided by xDrip-BluetoothWixel
275 provided by dexcom
276 provided by xDrip-BluetoothWixel
277 provided by dexcom
277 provided by xDrip-BluetoothWixel
277 provided by xDrip-BluetoothWixel
278 provided by dexcom
278 provided by xDrip-BluetoothWixel
278 provided by xDrip-BluetoothWixel
278 provided by xDrip-BluetoothWixel
279 provided by dexcom
279 provided by xDrip-BluetoothWixel
279 provided by xDrip-BluetoothWixel
280 provided by xDrip-BluetoothWixel
281 provided by dexcom
282 provided by xDrip-BluetoothWixel
282 provided by xDrip-BluetoothWixel
283 provided by xDrip-BluetoothWixel
286 provided by xDrip-BluetoothWixel
287 provided by xDrip-BluetoothWixel
289 provided by xDrip-BluetoothWixel
290 provided by xDrip-BluetoothWixel
292 provided by xDrip-BluetoothWixel
292 provided by xDrip-BluetoothWixel
294 provided by xDrip-BluetoothWixel
296 provided by xDrip-BluetoothWixel
299 provided by xDrip-BluetoothWixel
303 provided by xDrip-BluetoothWixel
310 provided by xDrip-BluetoothWixel
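
For completeness, writing to the database directly - the whole point of the direct upload option - is just as short. A minimal sketch reusing the client and collection from the code above; the field values are purely illustrative:

# sketch: pushing a document straight into the collection (pymongo 3.x)
import time
entry = {
    'device': 'my-custom-uploader',                        # illustrative values only
    'sgv': 123,
    'date': int(time.time() * 1000),
    'dateString': time.strftime('%a %b %d %H:%M:%S %Z %Y'),
}
result = testcoll.insert_one(entry)
print('inserted with _id :', result.inserted_id)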

That's all folks, back to my obscure useless stuff.

Monday, November 30, 2015

Dexcom's temperature compensation issues?

I wrote previously about the issues we had with warm baths and the predictive part of the Libre algorithm. You can have a look at what I wrote about warm baths in Some Libre peculiar behavior you should know about: temperature and I Love the Libre when it misbehaves. Temperature compensation is a bit tricky for CGMs. To some extent, the Dexcom G4 non-AP algorithm's averaging behavior, and the fact that packets were not received while the transmitter was submerged, masked the effect with the G4. One had to dig into the details of the Dexcom patents to see it was also affected (as expected, since it is a physico-chemical issue).

For some reason (I speculate, but don't have rock solid data yet), the G4 505 AP algorithm is clearly more sensitive to temperature changes than the non-AP version. Here's a dump of some of the last two weeks' traces. In all cases, after the drop of signal that indicates submersion, we observe falsely elevated reported ISIGs (confirmed by BG tests). See for yourself.






At this point, I am simply reporting the fact and considering the hypothesis that it is indeed caused by the temperature increase. There could be another cause, for example water changing the conductivity between the two transmitter contacts (resistance should be infinite). We'll try to run an additional test to invalidate the temperature hypothesis, but it will not be a cold bath. I love experiments, but will not torture the kid.



Thursday, November 12, 2015

Diabetic Autonomic Neuropathy - RR Intervals analysis part 2

Basic ECG Info.


When you record an ECG, you record the electrical activity generated by the heart's contraction. Each beat should produce something like this


The P wave represents the atrial depolarization (contraction) triggered by the autonomic system. The PR segment measures the time it takes for the impulse to travel from the atria to the ventricles. The QRS complex represents the ventricular depolarization (contraction) and the T wave represents the ventricular repolarization (relaxation). A beating heart produces (hopefully) a stream of PQRSTs. The autonomic "balance" controls the heart rhythm from a bundle of nerve tissue called the sinoatrial node (SA). Ideally, it is at the SA node that the autonomic triggers occur and where they should be measured, but that would require intra-cardiac electrodes... That is why the very characteristic tip of the R wave is used as a proxy. RR interval variability analysis is simply a fine-grained analysis of your base heart rhythm variation.

Normal vs Diabetic vs poorly controlled diabetic.


Let's start with the chart showing the main point in Clarke's paper about RR interval analysis in diabetics.
Non-diabetics have an SDNN - which stands for standard deviation of normal (R) to normal (R) intervals - between 50 and 100 milliseconds. The diabetic population as a whole shows a very different distribution, from 0 to 100 milliseconds, heavily skewed towards the sub-50-millisecond variability zone (with a possible outlier). Diabetics with proliferative diabetic neuropathy show a drastic reduction in RR variability. That looks simple enough...

But why the SDNN and not simply the SD? Well, even a normal healthy heart can produce some spectacular, but harmless, non-normal beats such as ventricular extrasystoles. At some point, your ventricle decides it has waited long enough and contracts spontaneously. Here is one of those PVCs (premature ventricular contractions) on my own heart. You have probably had those and called them "palpitations". Such a spectacular beat, however, introduces a significant variability in your heart rhythm and must be excluded from the analysis: SDNN is not always SD. Add other potential rhythm troubles and you realize that whole recordings used in RR analysis must be reviewed by a cardiologist in order to exclude abnormal beats. At that point, the optimal method to "fill the blanks" also starts to matter. No extra beats allowed - that is the first pitfall to avoid when doing an RR analysis. (And if any MD happens to read this, yes, I know about my P wave.)


Here is the recording of Max's heart that I used for the rest of our tests. I could have used one of the many semi-arbitrary rejection filters found in the literature, or a manual beat-by-beat review. Depending on the threshold, 1 to 3 beats would have been rejected; I had visual doubts about one of them and went for one reject.


How did I get that ECG? With an ECG machine, obviously. A somewhat amateurish Prince 180D. Does it matter? Yes and no.

Yes, it does matter, because the consumer ECG machines on the market (very good site about consumer ECG machines) have one fundamental limit: their sampling frequency. The ECG signal is measured 150 times per second. To put things in perspective, a decent entry-level professional machine will take 960 samples per second... but will cost ten times as much. (There are also other differences: denoising and signal cleanup algorithms, number of simultaneous channels, etc...) Today, you wouldn't find a cardiologist who would consider 150 Hz sampling adequate.

But no, it does not matter, because a lot of the RR interval analysis done in the late 80s and 90s was done on what would be considered an unacceptable sampling rate today. 128 Hz Holters were frequent. ECGs were sampled at a low frequency: their signal only looked smooth because it was drawn by multiple moving pens...

Why worry about the sampling frequency? Because in order to find the exact timing of the R peaks, the signal must be processed. It is very hard, in a blog post, to detail that processing, but let's simply say that you have to slice and dice the signal until you manage to transform the peaks into zero crossings that you can accurately measure. One of the best known algorithms used for that purpose is the Pan-Tompkins algorithm (details here) and that is the one I used. Here is the result.


The red dots are the data points measured by the Prince 180D. The sampling frequency problem is obvious. Sampling 150 times per second gives enough resolution for most of the ECG, except the QRS. The trigger of the ventricular contraction happens in around 80 milliseconds. That means you'll only get a dozen data points during a QRS... not exactly an optimal resolution. That's the "yes, it does matter" part.

However, the green lines are the tips of the Rs detected by the Pan-Tompkins algorithm. And they are extremely accurate. That's the "no, it does not matter" part. A higher sampling rate may provide a slightly more accurate result, but we don't need that for our purpose.
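For the curious, here is a heavily simplified sketch of the general shape of the Pan-Tompkins pipeline (band-pass filter, derivative, squaring, moving-window integration, peak picking). It is not the exact code I used and it skips the adaptive thresholding and search-back logic of the real algorithm; ecg is assumed to be a numpy array of raw samples at fs Hz.

# simplified Pan-Tompkins style R peak detection - illustrative only
import numpy as np
from scipy.signal import butter, filtfilt

def detect_r_peaks(ecg, fs=150):
    # 1. band-pass around the QRS energy (roughly 5-15 Hz)
    b, a = butter(2, [5.0 / (fs / 2.0), 15.0 / (fs / 2.0)], btype='band')
    filtered = filtfilt(b, a, ecg)
    # 2. derivative emphasizes the steep QRS slopes
    derivative = np.diff(filtered)
    # 3. squaring makes everything positive and amplifies large slopes
    squared = derivative ** 2
    # 4. moving-window integration over ~150 ms
    window = int(0.150 * fs)
    integrated = np.convolve(squared, np.ones(window) / window, mode='same')
    # 5. naive peak picking: fixed threshold plus a 200 ms refractory period
    threshold = 0.3 * integrated.max()
    refractory = int(0.200 * fs)
    peaks, last = [], -refractory
    for i in range(1, len(integrated) - 1):
        if (integrated[i] > threshold and
                integrated[i] >= integrated[i - 1] and
                integrated[i] > integrated[i + 1] and
                i - last > refractory):
            peaks.append(i)
            last = i
    return np.array(peaks)

# np.diff(detect_r_peaks(ecg, 150)) / 150.0 * 1000.0 then gives the RR series in milliseconds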

Sanity check: when I started this project, I had decided that I would be satisfied with an OK result given the confidence levels I could estimate (a couple of milliseconds) and that I would, of course, contact a specialized cardiologist to investigate further any abnormal results. I believe it is great to understand as fully as you can a clinical test or procedure, but one must be very careful not to overreach when it may matter.

Sampling frequency will matter more for spectral analysis (which I will possibly cover later), as explained here. So remember, sampling frequency is the second pitfall to avoid (for "amateurs"). A decent RR interval analysis cannot be achieved with any ECG that samples below 100 Hz: no Arduino project, no cell phone without specialized hardware. Top of the line heart rate monitors might work (I haven't tried) and RR interval analysis can also be used to detect overtraining.

The third dark area is the lack of standard protocols, reference values or consistent results in the literature. Detailed method descriptions have only recently become mandatory for publications, and a lot of the medical literature is very fuzzy in that respect. Some attempts have been made to improve the situation over the years, but the RR interval test isn't the recommended first test for autonomic neuropathy exploration (it is a possible subset of the "breathing test" part of the Mayo recommended test array). A detailed look at why RR analysis is so interesting and why it hasn't been used more widely can be found in this great article "Tests for early diagnosis of cardiovascular autonomic neuropathy: critical analysis and relevance."


But there is more! As we have previously seen, the autonomic balance is in a constant state of flux. It defines the ability of our cardiovascular system to react to the thousands of events in our lives. The downside is that variability is affected by a lot of things: did you climb stairs to visit your doctor? Are you stressed by the examination protocol? Annoyed at having to undress partly? Cold? Did you have coffee before the test? Insulin? All these factors and dozens of others can significantly modify your RR variability test.

And, of course, age does matter! There are a lot of variables to take into account.

Some results

To conclude that RR analysis part 2, here are some of Max's results with a few comments.

File Size: 100240 bytes. It contains 10 pages
Hardware Version: 2.6
ECG Recorded on: 14/8/2015 at 20:38:0
Total ECG run time: 300.0 seconds
Number of samples: 45000 (150 Hz)

The Prince 180D ECG file format is totally non-standard, so some minor reverse engineering was required. If there is some interest, I will detail it in another post.

1st pass analysis
---------------------
Detected Heart Beats : 402
Average FC (run length) : 80.4
Average FC(RR) : 80.48

First pass with Pan-Tompkins. His resting heart rate is a bit higher than usual, probably because I had to run after him to organize the test and the novelty of it added some excitement. A lower heart rate would have increased the SDNN somewhat.

Cleaning artifacts
-----------------------
1 beats rejected. Reject List: [187] ...

One beat was rejected on the criterion that it introduces an RR interval outside 75% to 125% of the average RR of the surrounding beats. Visually, it looks like a normal beat, but it is a bit noisy from an electrical point of view. As I said, consumer ECG denoising isn't optimal.

Removing Extra beats
--------------------
Duration of clean run 297.272 secs

Preparing RR Data
-----------------
MRR (mean of RR intervals) : 745.18 msec
RMS Intervals : (RMSSD) 43.4575816574 msec
SDNN (standard deviation of normal to normal): 58.1472062957 msec
NN50, pNN50 (86, 21.55388471177945) n, %

SDNN is what we are after here. With a 58 +/- 2 msec SDNN, Max would be in the lowest bucket of the normal population and in the top half of the diabetic population. In other words, at this stage, no indication of autonomic diabetic neuropathy.
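For reference, once you have a clean series of normal-to-normal intervals (ectopic beats already excluded with the 75%-125% filter described above), the indices listed here are trivial to compute. A minimal sketch, assuming rr is a numpy array of NN intervals in milliseconds:

# sketch: classic time-domain HRV indices from a clean NN interval series (ms)
import numpy as np

def time_domain_hrv(rr):
    diffs = np.diff(rr)                            # successive differences
    return {
        'MRR': rr.mean(),                          # mean RR interval
        'SDNN': rr.std(ddof=1),                    # SD of normal-to-normal intervals
        'RMSSD': np.sqrt(np.mean(diffs ** 2)),     # root mean square of successive differences
        'NN50': int(np.sum(np.abs(diffs) > 50)),   # successive differences larger than 50 ms
        'pNN50': 100.0 * np.mean(np.abs(diffs) > 50),
    }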

Final words for this part

There are tons of other interesting things you can do in RR interval analysis. Here is again a striking example of what it can achieve (Yang, 2006).


and the same visualization for Max's recording, which shows a healthy spread (but indicates he isn't totally well rested - again, another story).

For the astute reader: look at Max's Poincaré plot below, centered on 750 ms / 750 ms, then look at the plot above. Can you guess Max's age? Correct, almost 15.


The next part (whenever it comes) will cover more advanced analysis methods, a great free professional software package that I used to double check my results, and possibly some info on reversing the binary file format of the Prince 180D ECG and its interpretation.




Sunday, November 8, 2015

Comparing the "nonAP" Dexcom G4 with the "505 AP" Dexcom G4

I am delaying, once again, the RR variability ECG post: while it was probably one of the most enjoyable things I did, both from a "minor hacking" and from a learning point of view, it probably doesn't interest many readers.

G4 "non AP" vs the G4 "505 AP"


As you probably know if you follow the CGM world, the Dexcom G4 is currently available with two different algorithms.

The first one which, for convenience I will call "non AP" can be found in all G4 receivers sold outside the US and in all "pediatric" receivers currently sold in the US. It is the original algorithm the G4 used when it was first released. 

The second one which I will call the "505" can be found in the firmware updated original receivers, the adult "share" receivers and, of course, the newly released G5.

It is widely assumed that the "non AP" algorithm relies on an average of the previous secondary raw values it has received, while the "505" algorithm is, at least in part, influenced by the collaboration Dexcom had with the University of Padua and tends to use secondary raw values more aggressively than the "non AP" algorithm, at least when they are marked clean...

Why only now?

 

Since I live in Belgium, the "505" algorithm has not yet officially been made available to us: it will, when the G5 hits our shores, which could be any time soon... or not. To be honest, I could and probably should have made this comparison earlier. By the way, I want to take this opportunity to thank reader "D." who offered to send me a "505" share receiver as soon as it was released (but I was busy with the Libre back then), reader "J." who offered to send G5 sensors for my "mad scientist experiments" and the many readers who offered tips on how to bypass the restrictions and re-flash the non US G4 firmware to the latest version. The T1D communities I have joined are full of wonderful people.

So why take the bait now? The first reason is that reader "K."'s offer to send me a G4 Share came at the right moment... exactly when Dexcom was releasing the G5. The second reason is that I don't plan to upgrade (or should I say downgrade) to the G5 any time soon, or until I have no choice. While the G5 runs the 505 algorithm, it would, at this point, be a step forward and three steps back from my point of view.

Test Setup and Limitations

 

  • We decided to run a "non AP" and a "505" version side by side. Max was extremely reliable during the test and both receivers were with him at all times. There's a very small difference in the number of packets received (data below).
  • Both receivers were started at the same time and have been calibrated, per manufacturer's instructions, at the exact same second using both hands to press OK simultaneously.
  • Limitation: we did not use our non standard - but for us optimal - calibration strategy. That strategy has served us well with the "non AP" but I wasn't sure it would help the "505" as much. We skipped it to keep the field as even as possible.
  • The sensor we used lasted the whole seven day period but will not be remembered as the best sensor we ever had. Post insertion, it showed quite a bit of that oscillation/secondary level noise on data marked as clean. 
  • Our ISIG profile during the period probably wasn't a typical Type 1 diabetic profile. That may have limited the benefit we derived from the "505".
  • While I try to do my best not to make unsubstantiated claims and present only data I have enough confidence to use for myself, keep in mind I have one subject and one glycemic profile. I exclude data I could not defend (see the accuracy comment below), I go through a ton of double checks, confidence factor calculations and other goodies on my data set but my goal is NOT to publish rock solid authoritative stuff. I just want to look at things less subjectively than what is seen in the average user report.

 

 Results 

 

!!! important note: when reading the charts, keep in mind that the data is interpolated every 30 seconds between actual data points !!!
The results given by both sensors were extremely close. The Pearson correlation coefficient for the whole period was 0.97559. Most of the discrepancy came from the first day post insertion, where the "505" would happily display "clean" but jumpy data whereas the "non AP" would average the jumps out. Here is a zoom on one such event.

The "505" may actually have spotted the rise sooner that the "non AP" but it spoiled its advantage by tracking the jumpy secondary raw too closely because it was marked "clean".

Here is the whole period display.
The first thing to note is that, when calibrated at exactly the same time, the two algorithms will produce results that are extremely close. Except for the startup issues shown above, we could not find a single situation where one algorithm became so confused that it differed markedly from the other.

The second aspect I wanted to look at was the speed of reaction - how much faster was the "505" compared to the "non AP"? In a similar comparison, the Libre beat the G4 non AP by no less than 9 minutes (this correlates well with the Dexcom reported "G4 vs YSI" data and the Abbott reported "Libre vs YSI" data). This type of comparison or time delay determination is usually done by shifting one signal relative to the other and finding the shift that gives the best correlation.
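In practice that just means correlating one series against time-shifted copies of the other and keeping the shift that maximizes the Pearson coefficient. A minimal sketch, assuming s505 and snap are two numpy arrays resampled on the same 30-second grid (as in the data dump at the end of this post):

# sketch: find the time shift that maximizes correlation between two CGM traces
import numpy as np

def best_shift(s505, snap, step_seconds=30, max_shift_steps=17):
    # shift the non-AP series earlier, one 30-second step at a time,
    # and keep the shift that maximizes the Pearson correlation
    n = min(len(s505), len(snap))
    best_steps, best_r = 0, -1.0
    for shift in range(max_shift_steps + 1):
        r = np.corrcoef(s505[:n - shift], snap[shift:n])[0, 1]
        if r > best_r:
            best_steps, best_r = shift, r
    return best_steps * step_seconds / 60.0, best_r   # delay in minutes, correlation

# delay_minutes, correlation = best_shift(s505, snap)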

In this test, on average, the "505" beat the "non AP" by only 3 minutes (6 periods of 30 seconds). Cherry-picking periods of rapid change, I could find runs where the optimal correlation time shift was 5.5 minutes. I could also find runs where it lagged (on average) by 30 seconds. I decided not to cherry-pick, as it is a slippery slope: selective cherry-picking could be used to demonstrate anything. Here is a zoom on a tennis afternoon where we kept falling and correcting as the exercise went on (keep in mind that in this zoom, there are again 10 points for each standard Dexcom data point). This particular exercise occurred 20 hours after insertion and the jumpiness of the signal was still somewhat present. In some circumstances, such as the last fall and rise, the 505 clearly reacted more quickly and accurately. But the first peak is clearly a draw.



Here is another 23-hour period where the "505" algorithm is mostly (on average), but not always, ahead of the "non AP" one (on average, one minute, or two 30-second readings).
 

Again, two points worth noting
  • as far as use in sports is concerned, the "505" is marginally better than the "non AP". However, that improvement is not nearly as spectacular as it is with the Libre and, in particular, the Libre spot checks (which I assume to be in part predictive). While we could play a whole tennis tournament using the Libre as the only BG measuring tool, we were forced to use BG tests to check that our re-carbing was sufficient during tennis training sessions.
  • the "data reality" differs markedly from our perception. I believe this is caused by the following subjective factor: if the "505" picks up a trend faster than the "non AP", it will remain ahead for the rest of the trend and the user who compares will be constantly reminded that the "505" is ahead. That creates a positive reinforcement and falsifies our perception a bit.
While I was a bit disappointed by the numbers - I expected something like a systematic 5-minute and an optimal 7.5-minute advantage for the "505" - let's keep in mind that every occasion where the 505 is ahead is a bonus for the user: knowing 5 minutes sooner that you are falling faster than you expected is significant, knowing 5 minutes earlier that your re-carbing out of a hypo worked is reassuring.

Accuracy

 

As I said above, our sensor wasn't a stellar performer (we averaged a 14% MARD for both algorithms, which is at the poor end of what we usually see). This is why I won't provide a detailed accuracy analysis, just impressions: in a "gut feeling but not statistically significant" way, I'd say that the "505" showed better accuracy in the low range (below 80 mg/dl) but worse accuracy in the high range (above 150 mg/dl). This is somewhat visible in the global view where you can see the "non AP" climb above the "505" on several occasions: in all the cases we tested, the "non AP" was closer to the BG meter. Both the "non AP" and the "505" underestimated the BG value.
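(For readers unfamiliar with the term: MARD is simply the mean absolute relative difference between paired CGM and meter values. A minimal sketch, assuming cgm and bg are paired numpy arrays in mg/dL:)

# sketch: MARD (mean absolute relative difference) against meter values, in %
import numpy as np
mard = 100.0 * np.mean(np.abs(cgm - bg) / bg)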

Closing Thoughts

 

The "non AP" approach is better for the possibly unstable conditions that characterize post-insertion. The "505" algorithm is generally better in all other cases, especially in the low range. It will, in most fast changing conditions, flag the rise or the fall more quickly than the "non AP".

This being said, the 5-minute sampling frequency severely limits the CGM's usefulness for sports or any other activity where ups and downs are to be expected. While Dexcom seems to have cornered the market at the moment (late 2015), I can't help thinking about how much better the Libre sensor is...
  • the fact that the Libre can be factory calibrated means that, in most cases, Abbott is able to produce and characterize sensors more accurately and consistently than Dexcom.
  • the fact that the Libre is able to deliver a decent raw value, from close to off the shelf TI components,  almost every minute while Dexcom needs 5 minutes to deliver a secondary raw value that it does not always characterize accurately (jumpy secondary raw data marked as clean)
  • the fact that the Libre, in our experience, shows almost no drift until around the 12th day of its wear period
seem to indicate that, at the core, they have a much better technology.

In the world of my dreams, Abbott would have a very good customer service, would tell customers that it steals their data upfront, would have no supply chain issues and would provide more user friendly remote data transmission.

About once a month, the conspiracy theorist in me wonders what exactly was agreed between Dexcom and Abbott when they dropped their mutual lawsuits just before the Libre hit the market...

Even if Abbott is prevented from going full CGM by itself by some agreement, I still hope that when the supply chain issues are resolved, some slight Libre hardware modification could increase the transmission range somewhat in a way that could be practically exploited by the community to develop what Abbott doesn't want or can't put on the market...

Additional Data (and minimal comments)

 

Desync is equal to:   0:02:55
(clocks were desynchronized at start, packets were realigned)
New start after resync
G4 non AP   :   2015-10-30 19:49:41
G4 505      :   2015-10-30 19:49:41
(lets force synchronization)
New 505 end after period selection :     2015-11-06 17:43:58
New nAP end after period selection :     2015-11-06 17:44:19
(sensor was stopped a bit early for a more convenient restart/reinsertion)
WTF, out of sync by 0:00:21
(receiver internal clocks drifted by 21 seconds over seven days - resynchronizing by 3 secs per day)
(a better approach would probably be to use identical time buckets, on the to-do list as impact is minimal)
Length of period 9954 mins - should have 1990 packets
Length of nAP 1939
Length of 505 1945
(nAP lost 51 packets, mostly water submersion during bath)
(505 lost 45 packets, same reason, no statistical difference)

Start Time: 2015-10-30 19:49:41
Correlation: (0.97559478434207947, 0.0)
505 Mean 106.337201125
G4 Mean 107.807866184
(mean values for the period almost identical, so are SD and other indices, not shown here)

(best correlation data shown below - the differences were so small that I switched to a 30 seconds resolution for the virtual sensor)
Shifting G4 non AP data by 0.5 minutes
Correlation: 0.97679314605
Shifting G4 non AP data by 1.0 minutes
Correlation: 0.9775384715
Shifting G4 non AP data by 1.5 minutes
Correlation: 0.978129017586
Shifting G4 non AP data by 2.0 minutes
Correlation: 0.978566125344
Shifting G4 non AP data by 2.5 minutes
Correlation: 0.978851137811
Shifting G4 non AP data by 3.0 minutes
Correlation: 0.978985400009
Shifting G4 non AP data by 3.5 minutes
Correlation: 0.978970258929
Shifting G4 non AP data by 4.0 minutes
Correlation: 0.978807063511
Shifting G4 non AP data by 4.5 minutes
Correlation: 0.978497164627
Shifting G4 non AP data by 5.0 minutes
Correlation: 0.978041884366
Shifting G4 non AP data by 5.5 minutes
Correlation: 0.977444434606
Shifting G4 non AP data by 6.0 minutes
Correlation: 0.97670490467
Shifting G4 non AP data by 6.5 minutes
Correlation: 0.975823208612
Shifting G4 non AP data by 7.0 minutes
Correlation: 0.974799259945
Shifting G4 non AP data by 7.5 minutes
Correlation: 0.97363297164
Shifting G4 non AP data by 8.0 minutes
Correlation: 0.972324256124
Shifting G4 non AP data by 8.5 minutes
Correlation: 0.970873025277
Maximum correlation  0.978985400009 with delay  3.0 mins


Friday, October 30, 2015

FreeStyle Libre - US blind clinical tests results are in!

Very quick post as the article is in pre-print and may disappear behind a paywall at some point.

I am glad to say that it correlates very well with what I have observed and reported on this blog (drumroll please ;-)). The time delay vs YSI is as it was speculated to be here: approximately 10 minutes ahead of what was reported for the non AP Dexcom G4 (4.5 +/- 4.8 minutes vs previous studies reporting 15 +/- 5 minutes for the non AP G4). Accuracy is eerily in line with what was reported.

Note 1: sometimes, confidence intervals are amusing. The above seems to indicate they can't exclude that your Libre travels through time, guessing your BG in advance. Who knows? It could be the result of the predictive algorithm...

Note 2: this study is clearly aimed at obtaining FDA approval. Based on what is reported, I don't see how the FDA could reject that system. This is also why, probably, I find no trace in that study of a couple of worrying issues that have been consistently reported by users and that I was able to investigate a bit here. 

Anyway, enjoy the paper!

Tuesday, October 27, 2015

Getting the most of your Dexcom (non AP algorithm)


Note: this is a relatively old post I had prepared in early July 2015. I planned a couple of them listing all our Dexcom G4 (non AP) issues and solutions but ultimately lacked the motivation to fully document the accuracy improvement process. While I did get good results, the whole process was a bit cumbersome, the occasional slight differences between algorithms added to the cognitive load (as running two CGMs in parallel did when we compared the Dex to the Libre) and I came to realize that while going below a 10% MARD was cool and satisfying, it served no practical purpose. I've decided to publish this first part as is.


The FreeStyle Libre sensors we have used have generally outperformed our Dexcom in terms of accuracy and tracking speed. In terms of convenience, the Libre system wins hands down. But the Libre is not perfect:
  1. it does not work out of the box as a full CGM and consequently does not fit in the Nightscout/xDrip ecosystem. The lack of range is a practical killer for those applications.
  2. it is not widely available.
  3. it did generate, in special circumstances, a few values that weren't in the A+B zone of the Clarke grid. Yes, a poorly calibrated G4 could also deliver amazingly bad and dangerous results (see below), but thanks to the wisdom we've collected over time, we don't see them any more (unless we experiment - but in this case we use two trackers).
  4. the Libre community is mostly a community of users, the Dexcom community is a bunch of builders. Well, there are builders in the Libre community, but a lot of them are interested in a quick buck and that does not fit my thought model...
But... when you have experienced the Libre, the non AP Dexcom is definitely disappointing. Is there hope? In this long multi-part post, I'll report on our experience trying to elevate the non AP Dexcom to a Libre-like level of performance and usefulness. We'll start with basic things and end with possibly somewhat esoteric developments.

Physical aspects

  • Insertion should be as smooth as possible, which isn't necessarily easy with a small kid and the Dexcom inserter. The less traumatic the insertion is, the smoother the immediate post-insertion values will be.
  • The sensor should be firmly attached and remain firmly attached throughout the wear period. A slightly mobile sensor will cause microscopic traumas that will negatively impact its accuracy. A sensor that lets liquids seep around the sensor wire's wound may lead to infection and temporary membrane characteristic changes that will also lower the quality of the readings. Triboelectricity caused by sensor movements can disturb the readings.
  • The location of the insertion site should be selected so that sensor compression issues are minimized. Observe how your kid sleeps and try to avoid areas that he typically sleeps on.
For a more complete description of the issue, have a look at the following paper

The first day: pre-insert or use xDrip.

The few hours after insertion can be difficult. This is well documented in the scientific literature and is the reason why some of the most meticulous comparison tests were done with sensors that had been pre-inserted 24 hours before the measurements started. Dr Damiano used pre-insertion in his CGM evaluation.


Insertion is an "analog" process, not something that is perfectly reproducible. How and where it occurs is a soft parameter that can't be precisely quantified. How your body reacts to the insertion and how quickly it heals the wound is as variable as any other parameter across a population of humans, from 10 kg toddlers to overweight retirees... While pre-insertion remains, in my opinion, the optimal strategy, it is not always convenient and, unfortunately, the Dexcom "flipper" must be secured if the transmitter is not immediately locked into place.

Sealant (probably silicone based) on the sensor base.
The bottom line is that, in the time frame following the insertion, the G4 can return erratic data for... a certain number of hours... How many? It depends. We have seen sensors settle after a couple of hours while others took almost a day.

We've observed, using either xDrip or a wired solution, very noisy secondary raw data as shown in the screenshot below (more about primary and secondary noise a bit later).


or noisy but slightly more coherent oscillations as shown below


Sometimes, the behavior may border on the insane if you focus on one single value, as in "Oh my God! The kid is crashing!"

except of course that, ten minutes later, the kid isn't crashing anymore.

A word about noise: as you can see, in the above case, the synthetic "raw" value that the Dexcom returned has been marked as "clean". What this means is that the low-level collection sampling loop that provided the synthetic (secondary) raw data worked with a decently strong signal. That unfortunately does not necessarily mean that the signal is stable or clean on a larger time scale.

The problem, in that bad-first-hours scenario, is that calibrations entered at a time when the Dexcom essentially sees semi-random data are likely to negatively impact the accuracy of the following day(s). Your Dexcom may or may not decide to display "???" but you will probably end up getting something as senseless as this

Careless first day calibrations leading to absurd results.
The initial low was wrong. The subsequent sharp rise was wrong. The second sharp rise was also wrong. In short, the whole night was a painful mess...

Since xDrip reports a direct translation of the secondary raw value it receives, it will more clearly show if a sensor is unreliable or misbehaving. If you have not pre-inserted your sensor, don't feel forced to enter the initial calibrations exactly after the 2-hour wait period: initial calibration should occur only when the sensor is fairly stable and consistent.

For example, this startup sequence does not suffer much from secondary noise (there is a compression event starting around 3h30) and is good enough for an acceptable calibration.



[Older post delayed - see NOTE on top of page]


Sunday, August 16, 2015

Diabetic Autonomic Neuropathy: where variability is a good thing.

We are all familiar with the most spectacular complications of Type 1 diabetes: cardiovascular damage, nephropathy, neuropathy, diabetic retinopathy, etc... That's part of the basic information package every newly diagnosed patient receives, the sword of Damocles that motivates us to control blood sugar. Most of these complications are, to a significant extent, linked to the glycation of proteins, whose end results are typically called "advanced glycation end products" (AGEs) in the literature. In many cases, AGEs have been directly linked to the damage observed. This is also the main reason why HbA1c, itself a glycated protein, is such a good control indicator: it is a proxy for what happens elsewhere in your body. Good blood glucose control is the tool of choice to prevent or minimize a lot of the complications we are facing.

However, things aren't that simple. Diabetic neuropathy is often understood as peripheral neuropathy as in
“Uncle Jim lost sensations in his foot, he had blisters, they became infected and did not heal because his arteries were bad too. They had to cut his leg.”

That is not the whole story... enter the autonomic nervous system.

Our nervous system does not consist only of a cortex, a sensory sub-system and a motor sub-system. There's a thing called the autonomic nervous system that controls, mostly unconsciously and without our intervention, the basic functionality of our bodies: the rhythm of our heart, our respiration, vasodilation, vasoconstriction, the behaviour of our stomach, intestines and bladder, our reaction to exercise and stress, even sexual arousal...

While it is mostly invisible, the autonomic nervous system is what keeps us comfortably alive.

A detailed explanation of how the system works is, of course, outside the scope of a mere blog post. But one simple way to visualize the autonomic nervous system is to think of two separate controllers called the sympathetic and parasympathetic systems.
  • The sympathetic system would be most active in a “fight or flight” situation (increases heart rate, sends blood to muscle, redirects blood flow away from secondary functions, etc...).
  • The parasympathetic system would be most active in a "rest and digest" situation (sends blood to the intestines, increases peristalsis, decreases heart rate).
Here is an illustration of the ramifications of the system (marked as free for non-commercial use by Google Image search - do not hesitate to contact me if any of the republished illustrations are in violation of anything).

Most of the time, in healthy people, the systems are said to be “balanced”. The concept is a bit fuzzy: it basically means that the systems do what they have to do in an appropriate way. When the balance is lost through diabetic autonomic neuropathy, life can be hell. This often cited paper gives a good overview of the ton of severe issues it is directly responsible for. Warning: do not read it if you are the type of person that worries endlessly.

Wiring

Roughly speaking, most of the wiring of the system goes through two big nerves: the vagus nerve and the splanchnic nerve. Some other smaller nerves such as spinal nerves serve other territories. The type of wire that goes into each nerve is a complex topic in itself, especially since the heart is a special case. No worries though, we won't need to go into details for our purposes.

If you have read the paper above, you have seen that, in diabetes, this system can be badly damaged. And to add insult to injury, while poor control has the obvious deleterious effects, the system can be damaged very early in the course of the disease and, apparently, somewhat independently of your blood glucose control. How does that happen? Well, we don't really know, just as we don't know why some nerves seem to be impacted more than others. Glycation as usual. Auto-immune reactions and inflammation do play a role, but beyond that, looking at the literature, it is again a depressing case of "probably affects", "deserves further attention", "seems to be implicated"...
It is the main actor behind gastroparesis, the delayed, inconsistent emptying of the stomach that can wreak havoc on the best control strategy. But it can also lead to orthostatic hypotension (low blood pressure when standing up from sitting), dizziness, erectile dysfunction, lack of exercise adaptation, etc...

Unfortunately, the heart is also a target, so much so that it deserves its own acronym: CAN, for cardiac autonomic neuropathy (http://circ.ahajournals.org/content/115/3/387). CAN is also suspected to play a role in sudden death (certain for Type 2 diabetics; it may play a role in Type 1 diabetics, although ionic and pH disturbances may be enough by themselves).

The heart of the matter

Our heart runs a natural pacemaker, called the sinoatrial node. It triggers roughly 60-70 times per minute: that is, if you want, our natural spontaneous rhythm. Its activity is modulated by the sympathetic – parasympathetic balance. The parasympathetic impulses reaching the sinoatrial node through the vagus nerve tend to lower the rate at which the natural pacemaker fires. The sympathetic impulses, traveling through the spinal nerves, increase the firing rate and the strength of the ventricular contraction, for example when we exercise. In a healthy subject, the systems are ideally balanced.

However, if the vagus nerve is severely damaged, an imbalance is introduced: the sympathetic system will work almost as it should but the parasympathetic activity will be lower. At the extreme, an old diabetic will have a resting heart rate higher than a healthy individual's and will not adapt as easily to exercise, or even to suddenly standing up from a sitting position.

And that slowly brings us to the fancy world of tachograms and RR interval analysis

A healthy S/PS balance is always ready to react almost instantly to any change of conditions. The sinoatrial node is normally in a very unstable “trigger happy” state. Mere emotions can accelerate our firing rate within seconds. At the peripheral level, sudden vasodilation can make us faint. Run five steps, your heart responds at once.

That instability is highly desirable (an unusual concept for diabetics) as it reflects our ability to adapt to changes in life. You positively want to have a constantly unstable heart. That variability can be quantified in a myriad of ways. It is loved by researchers as it gives them plenty of opportunities to publish papers on the correlation between dozens of indicators and dozens of outcomes under a dozen circumstances such as post-myocardial infarction, aging, exercise recovery, or even our propensity to socialize (where's free will anyway?), etc...

As far as the T1D patient is concerned, the story begins around 1975 when DJ Ewing  and others looked at RR variability (basically how unstable your cardiac rhythm is) and published this paper http://www.ncbi.nlm.nih.gov/pmc/articles/PMC482890/  which can be summarized by this figure
The hearts of diabetics did not seem, on average, to behave like the hearts of healthy controls.

Very quickly, lots of people jumped on the concept, confirmed the findings (here for example, here) and the concept was used as a mortality predictor (here for example) and as a tool to detect early asymptomatic autonomic neuropathy (see again the Vinik paper for explanation and links).

As a T1D parent, I am always a bit paranoid. Once I learned that diabetic autonomic neuropathy could potentially exist at the time of diagnosis, I absolutely, totally and utterly needed to know if my son was affected. On the basis of that paper, armed with an ECG device, I embarked on what I expected to be a simple check and was ultimately dragged into a tricky journey through the very muddy waters of RR interval analysis, tachograms, power bands and clinical protocols (or the lack thereof).

That will be for the next post.

Thursday, August 6, 2015

Meter vs Meter or a quick shot at some Internet and marketing diabetic memes.

What about the BG meter issues?

What do we actually measure?

Well, in principle, a BG meter test measures capillary blood glucose. It is constantly changing, sometimes very quickly. It differs from interstitial glucose, venous and arterial blood glucose, etc... On top of that, the differences aren't static. Think about it in terms of shifted waves going up and down. Complex? Yes. But even that is a simplification: think about them in terms of shifted waves going up and down where the shift is not constant. Going that deep isn't very useful. What is useful is a reasonably representative snapshot of some value you want to keep in some range. There is no need to hunt for the perfectly accurate glucose value, it doesn't exist.

Back to BG meters

What do I want from a BG meter? Within limits, I don't care that much about accuracy: if the meter tells me I am at 90 mg/dL when it should have measured 100 mg/dL, fine; 110 mg/dL is also fine. I am measuring a fleeting local reality that does not exist as an absolute truth. What I do care a lot about is precision, consistency. If my fleeting reality was at 100 mg/dL and just dropped to 80 mg/dL, my precise but inaccurate BG meter would tell me that I fell from 90 mg/dL to 72 mg/dL, while an accurate but imprecise meter could have given me a stable value.

Of course, ideally, you would want a BG meter that is both precise and accurate. But a device that is biased consistently 10 mg/dL lower at +/- 5% is obviously less dangerous than a perfectly calibrated device that works at +/- 15%.

Internet meme 1: "Your reader is only accurate +/- 20%"

Where does it come from? A misunderstanding, in the coverage by most diabetic sites, of the ISO BG meter criteria and tests, which basically state that 95% of the time the results should fall within 20% (or 15%) of the "correct" value.

How is it typically interpreted? As "The result you got is +/- 20% anyway..." 

I am sorry to say that it is total bull****. Anyone with a basic high school statistical education has been given a free hint with the 95%.

Let's look at a real example. Here is the data from the 38 double BG meter tests we did, within a 120-second interval, since January 1st, 2015. I am actually cheating a bit here, we did 40 double BG meter tests, but we'll get to that later. These tests are the ones we did for the initial calibration of the Dexcom sensor plus the random double checks we did when we just wanted to be sure.

[70, 90, 95, 65, 84, 242, 81, 69, 119, 88, 110, 109, 85, 66, 182, 162, 245, 53, 55, 111, 140, 119, 170, 56, 77, 80, 234, 78, 55, 129, 79, 93, 77, 88, 135, 77, 77, 124]

[64, 85, 104, 60, 91, 268, 82, 78, 108, 74, 102, 108, 86, 64, 189, 147, 240, 47, 58, 100, 134, 109, 160, 54, 81, 89, 206, 75, 55, 128, 93, 89, 75, 91, 146, 76, 69, 130]

Here are the differences

[6, 5, -9, 5, -7, -26, -1, -9, 11, 14, 8, 1, -1, 2, -7, 15, 5, 6, -3, 11, 6, 10, 10, 2, -4, -9, 28, 3, 0, 1, -14, 4, 2, -3, -11, 1, 8, -6]

Let's plot that data in terms of error percentage. Does that ring a Bell?



Even if you know nothing about statistics and don't recognize the curve, you can't fail to notice that most of the results fall in the +/- 10% range. Strictly speaking, we can't say on the basis of that sample alone that the distribution is purely normal, but it is certainly much closer to normal than a random +/- 20% error would be. My hunch is that it is essentially normal, plus a time drift (BG can change in 2 minutes), plus an "accidental" component.
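For the curious, here is roughly how such a plot can be produced from the two lists above; a minimal sketch, with the variable names and the choice of the first test as reference being mine:

# sketch: relative difference (%) between paired meter readings, plus a quick histogram
import numpy as np
import matplotlib.pyplot as plt

meter_a = np.array([70, 90, 95, 65, 84, 242, 81, 69, 119, 88, 110, 109, 85, 66,
                    182, 162, 245, 53, 55, 111, 140, 119, 170, 56, 77, 80, 234,
                    78, 55, 129, 79, 93, 77, 88, 135, 77, 77, 124])
meter_b = np.array([64, 85, 104, 60, 91, 268, 82, 78, 108, 74, 102, 108, 86, 64,
                    189, 147, 240, 47, 58, 100, 134, 109, 160, 54, 81, 89, 206,
                    75, 55, 128, 93, 89, 75, 91, 146, 76, 69, 130])

errors_pct = 100.0 * (meter_a - meter_b) / meter_a   # first test used as reference
print('fraction within +/-10% :', np.mean(np.abs(errors_pct) <= 10))
plt.hist(errors_pct, bins=10)
plt.xlabel('difference between paired tests (%)')
plt.show()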

I said above that I removed two data points. One of the tests we did was, in fact, a triple test. Why? Because it was very visible that there wasn't enough blood in the well. That test gave us 101 mg/dL while the two controls with enough blood gave us 124 and 130 mg/dL.

The other removed test is more interesting. What would have happened if we had included it?


The values returned by the two BG meter tests were 283 mg/dL and 24 mg/dL. Something was wrong. And that something was a failing battery in the BG meter.

That illustrates the fact that while BG meters will generally deliver results around the fleeting "correct" value, they aren't immune to extraordinary errors. Dextrose powder on the fingers, water, lack of blood are typical factors that will lead to inconsistent results. The list is long but, in practice, most of them are avoidable (the topic of another blog post maybe).

At this point, I hope I have put the "anything +/-20%" Internet meme to rest. It should be rephrased into something like "very often quite close, sometimes 20% off, potentially anywhere if not used properly"

The emerging marketing meme

Now, let's have a look at the currently emerging meme: "CGMs are now more accurate than BG Meters".

Before I start, I'd like to stress that I am totally convinced that CGMs are the best tool to manage your diabetes. I can't stress that enough. But the reason why they are the best tool is not that they are more accurate. The reason is that they allow patients to understand how their diabetes works, how their body reacts to meals and exercise.

But that is not necessarily how they are marketed. Dexcom said in one of their conference calls (I summarize) that CGMs were now more accurate than BGMs, pitting their best MARD (around 10-11%), which most people don't get in real life, against the above +/-20% Internet BG meter meme.
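For readers unfamiliar with the acronym, MARD is simply the Mean Absolute Relative Difference between the sensor readings and their paired reference values; a minimal sketch:

    def mard(cgm_values, reference_values):
        # Mean Absolute Relative Difference, in percent
        pairs = list(zip(cgm_values, reference_values))
        return 100.0 * sum(abs(c - r) / r for c, r in pairs) / len(pairs)

    # toy numbers, not real data
    print(round(mard([110, 150, 70], [100, 160, 75]), 1))   # ~7.6

Note that a single headline number says nothing about how the error varies with the rate of change, a point we will come back to below.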

The problem is that the current G4 needs BG meter calibrations. You can't logically claim that a measuring instrument B calibrated against a measuring instrument A will be more accurate than instrument A.

Any unavoidable systematic error in instrument A will be added to the error of instrument B in a complex way (error propagation analysis). Even if you are using "optimal" calibrations you will still introduce a bit of additional error, as Abbott has apparently shown in its Libre papers.

[Note: can be skipped if you don't want to nitpick... You could actually calibrate a device B with an inaccurate device A and get device B to perform much better than device A, provided you average a large number of calibration tests with device A. If you do 2, 4, 8, 16, 32, 64... inaccurate tests, you reduce the random error of the average by a factor of about 1.4 at each doubling. Unfortunately, unless you want to do a lot of simultaneous blood tests, you are unlikely to approach perfection.]
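That factor of 1.4 is just the usual 1/√n behavior of the error of an average. A quick simulation (with an arbitrary 5% meter noise, purely for illustration):

    import random, statistics

    random.seed(42)
    TRUE_BG = 120.0
    METER_SD = 0.05 * TRUE_BG   # assume the meter's random error has a ~5% stdev

    def calibration_error(n_tests):
        # error of a calibration based on the average of n independent meter tests
        readings = [TRUE_BG + random.gauss(0, METER_SD) for _ in range(n_tests)]
        return statistics.mean(readings) - TRUE_BG

    for n in (1, 2, 4, 8, 16, 32, 64):
        typical = statistics.mean(abs(calibration_error(n)) for _ in range(2000))
        print(n, "tests -> typical calibration error: %.2f mg/dL" % typical)

Each doubling of the number of averaged tests divides the typical error by about √2 ≈ 1.4, which is why perfection stays out of reach at any practical number of fingersticks.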

And lastly

How did the BG meter work vs itself in a more conventional medical view? In other words, how consistent was it?


The answer: excluding any bias, our BG meter works within the latest ISO spec in terms of result consistency, and the ISO spec does not mean that its results are randomly distributed in the +/- 20% (or 15%) range.

Wednesday, August 5, 2015

Predictive is the new buzzword: Libre, Dexcom and Roche...

In November 2014, when we received our first Libre sensor, I was immediately impressed by its uncanny ability to closely match its spot checks to our BG meter values. "Keep the delay in mind" had been our mantra with the (non-AP) Dexcom G4. Our first interesting test - a post-meal increase - came 14 hours after insertion and blew me away.


It almost immediately seemed a bit "fishy". Surely, the Libre wasn't immune to the average BG-ISIG delay. Could it be that it was actually wrong but lucky? But this kind of freak occurrence repeated itself throughout our Libre phase. The Libre was systematically ahead of the Dexcom G4. After the first few lowish start-up hours (a behavior we saw in all our sensors except the ones we pre-inserted), it caught up and started to be ahead.

However, spot checks remained peculiar: correct most of the time, but with a distinct overshoot in situations of fast-rising BG. I couldn't shake the feeling that something strange was going on. This "paranoia" was also fueled by the observation that the Libre historical "average" value wasn't written at the end of its period, but after a certain delay. The Libre became, at least in my mind, the first "revisionist" CGM. Some of the spot checks did not materialize in the period averages but did fit quite nicely with what a predictor would have calculated.


At that point, I became convinced the Libre spot checks were, in times of changing BG, predictive.
As an amateur observer of the tools we use to manage our diabetes, I was also a bit shocked by the use of prediction. I guess my expectation was that a CGM would try to match its current reality as closely as it could, and that was it. None of Abbott's material mentioned "predictive" as far as I could tell. Interestingly enough, the Libre RAW data that I had mostly been able to understand in December wasn't always showing these spikes either. Approximate averages of the RAW data closely matched the historical values, more than they matched the overshooting spot checks that never ultimately materialized in BG. Still, I thought I was wrong, that I had missed something, and kept looking.

I then stumbled upon a paper that described how Abbott compensated for slopes in its Navigator II calibration algorithm. (FreeStyle Navigator Continuous Glucose Monitoring System with TRUstart Algorithm, a 1-Hour Warm-Up Time)

If it was kosher to compensate for lag in a calibration algorithm, the next natural step was to display projected values to patients... As it turns out, using a predictive model allowed my raw data interpretation to closely match what the Libre displayed, and to understand odd, potentially dangerous behaviors such as the one described in the "meal and bath" incident. As an informed patient, that incident annoyed me profoundly: it is one thing to rely on actual measured data and another to rely on projected data that is equivalent 80% of the time, better 19% of the time and outrageously wrong 1% of the time. Your mileage may vary, and this probably doesn't matter much in the grand scheme of things for the general T1D population, but still.

Back to Dexcom

This may have been one of the reasons we went back to the non-AP G4 Dexcom, along with the obvious Abbott sensor availability issues. But then I missed the quick reaction time of the Libre. Using xdrip partially solved the issue, as it allowed me to get rid of the delay induced by the non-AP G4 algorithm. But it also prompted me to hunt for possible improvements in the G4 reaction time by using, you guessed it, predictive algorithms...

Now, before one gets too excited about the results, I'd like to insist that my "work" has been of the dirty, inconvenient, impractical, kitchen-sink type. The first constraint is, of course, having access to the Dexcom secondary raw data in real time through xdrip. The dirty part involves artificially tampering with the 5-minute data frequency of the Dexcom: I needed a value every minute and decided to interpolate minute-by-minute data between Dexcom readings. That is not very clean but, based on the resampling done when comparing the Libre and Dexcom 14-day runs, it doesn't seem to have any impact on the big picture. Then, based on that minute-by-minute data, I started issuing "predictions" of what 9 minutes later would look like and checking how they would match BG meter readings.
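For the sake of illustration, here is roughly what that pipeline looks like; this is a reconstruction with made-up numbers and a naive slope-based projection, not the exact code I run against xdrip:

    import numpy as np

    # made-up Dexcom-style raw readings, one every 5 minutes: (minute, mg/dL)
    times = np.array([0, 5, 10, 15, 20], dtype=float)
    glucose = np.array([110, 118, 129, 143, 158], dtype=float)

    # step 1: resample to a 1-minute grid by linear interpolation
    minute_grid = np.arange(times[0], times[-1] + 1)
    per_minute = np.interp(minute_grid, times, glucose)

    # step 2: project 9 minutes ahead from the latest per-minute slope
    HORIZON = 9
    slope = per_minute[-1] - per_minute[-2]        # mg/dL per minute
    predicted = per_minute[-1] + HORIZON * slope

    print("current: %.0f mg/dL, predicted in %d min: %.0f mg/dL"
          % (per_minute[-1], HORIZON, predicted))

It is that projected value, not the latest measured one, that then gets compared to the BG meter readings when computing the MARD below.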

Here are the results: in both cases, my kitchen sink approach was able to drive the non AP G4 MARD from above 10% to below 10%.


Two points worth noting:
  • the resampling adds or removes a couple of points. This happens because my BG meter clock doesn't have second resolution and drifts a bit. Since I use the closest previous CGM data point provided it is within two minutes of the BG test, and since the granularity of the prediction is 1 minute, a few points drift in and out of the window. This has no impact on the results in one case and actually worsens them a bit in the other.
  • the predictive algorithm usually improves accuracy but worsens it in some cases (not unlike the Libre in fact)
Anyway, since this is a kitchen sink experiment, who cares...

 

Roche


But things start to get interesting with the Roche sensor. The Roche sensor is a bit of a "Loch Ness" sensor: Roche published material about Artificial Pancreas development in 2003, supported by a micro-dialysis CGM sensor. Twelve years later, they are still talking about it but, outside of a few clinical tests, I don't think many people have seen the beast. Micro-dialysis has a few advantages over glucose oxidase based sensors, but also a few inconveniences. It relies on a flow of Ringer's solution through a double-lumen catheter and the solution, as I understand it, doesn't recycle itself and must be discarded. Even at a few micro-liters per minute, that puts some limits on what is achievable in the form factor diabetics are now expecting from their CGMs. Plus there is the issue of generating the flow.

The Roche sensor claimed extremely good results in that paper in 2013 (so much so that I was expecting it instead of the Libre in 2014) and was used in AP tests in 2014. Come 2015, new sightings of the monster have been reported. In this paper, for example, it is reported to track rapid changes much better than the Dexcom G4: Rate-of-Change Dependence of the Performance of Two CGM Systems During Induced Glucose Swings (Pleus S, Schoemaker M, Morgenstern K, Schmelzeisen-Redeker G, Haug C, Link M, Zschornack E, Freckmann G).

Hmmmmm, tracking rapid changes much better than the Dexcom? Where have we seen this before?

And where does the magic come from? Can you guess? A second paper by some of the same authors provides the answer: Time Delay of CGM Sensors (Schmelzeisen-Redeker G, Schoemaker M, Kirchsteiger H, Freckmann G, Heinemann L, del Re L).

And the answer is: predictive algorithms.

In an environment where the vast majority of endocrinologists are still somewhat uncomfortable with basic CGMs, devices that often do not actually display a measured value but show a predicted one could be a tough sell. Fortunately, the vast majority of practicing endocrinologists will have neither the time nor the desire to explore the darkest recesses of the technology behind their tools, and they won't know.

PS: and yes, I am aware that yet another player based on another technology (Senseonics) has also published significant results. But at this point, I have mixed feelings about the potential scarring.