Sunday, April 19, 2015

I love the Libre when it misbehaves!

UPDATE 20/04/2015: a lot of info about the methods used can be found in this Abbott patent. Very interesting and quite longish patent. One of the scenarios makes use of both the external thermistor and the internal TI thermistor. There's also a mention of a possible in-sensor thermistor. If you read the patent, you'll see why a straightforward temperature interpretation is hard to come by.

I just love when the Libre misbehaves! Really!

Let's come back to the temperature impact and the predictive nature of the Libre again.

Same very obvious scenario tonight: Libre checks at 109 then 111 mg/dL, slowly rising (expected). A few minutes later, BG meter check is at 125 mg/dL. Perfectly in line with the slowly rising trend and the RAW data interpretation I have been using since day 1 with this sensor. So far so good.

Then, Max has a very quick warm bath (Call of Duty time depends on the bath not being too long - always a guarantee for results). He dries and dresses himself and comes down.

Libre spot check at 194 mg/dL: perfectly in line with what extrapolation from past data would give. In this case, linear regression and derivative based methods give roughly the same result. Pick your points between -16 and -7, you will still end up in the same neighborhood regardless of the method.
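For the curious, here is a toy sketch (illustrative numbers only, not Abbott's algorithm) of why the two methods land in the same neighborhood on a smooth trend:

```python
def regression_predict(points, horizon):
    """Least-squares line through (t, glucose) points, extrapolated `horizon` min ahead."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_g = sum(g for _, g in points) / n
    slope = (sum((t - mean_t) * (g - mean_g) for t, g in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    return mean_g + slope * (points[-1][0] + horizon - mean_t)

def derivative_predict(points, horizon):
    """Extrapolate from the rate of change between the last two points."""
    (t0, g0), (t1, g1) = points[-2], points[-1]
    return g1 + (g1 - g0) / (t1 - t0) * horizon

# Minutes relative to "now", glucose in mg/dL, slowly rising:
history = [(-16, 100), (-13, 104), (-10, 108), (-7, 112)]
print(regression_predict(history, 10))  # ~125 mg/dL
print(derivative_predict(history, 10))  # ~125 mg/dL as well
```

On clean, near-linear data the two estimates are essentially identical; they only diverge when the recent points are noisy or curved.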

Blood check at 118 mg/dL: BG is actually falling. We have a 50+% difference between the spot check and BG. RAW data also seems to be falling and the trend of the fall fits with what you could expect from a standard IG-BG delay (although this could of course be due to the sensor cooling).

A couple of minutes later, as we want to track how fast the Libre spot check will correct itself, the sensor gives us its standard "No result available, check in ten minutes" message. Yeah, we kind of expected that, didn't we?

Post mortem, here's what the Libre reports it saw. There's a possible mild overshoot at 152 mg/dL, not severe enough to be conclusive, but not in line with the later BG meter check. The 192 mg/dL and then the hypo (real) that came a bit later. I have added what I believe to be the real trend line based on BG Meter testing.

I really need to compare the response curve of that thermistor with previous Libre thermistors. If Abbott uses a look-up table to compensate for the temperature, variability in thermistors could explain why this phenomenon has been so striking with this sensor.

Over enthusiastic predictive algorithm? Non-linear or bogus factory calibration curve? I have my opinion, but please make your own.

I will probably have a more in-depth look at what I have collected during this run in the next couple of weeks, but our next run starts tomorrow and it will be Dexcom + xDrip. It may be a while before I come up with new tests.

Friday, April 17, 2015

Freestyle Libre: questions, getting it to fail and possible answers.

Some CGM behavior characteristics are often a bit tricky for newcomers. The main one can be summarized as "CGM measures interstitial fluid glucose values and as such will trail blood glucose values by 15 minutes." New users are told not to trust CGM values when glucose levels are changing rapidly. They are told not to calibrate their devices when glucose levels aren't stable.

The scenario is always the same: a new user pops up somewhere and complains about their CGM's accuracy. Dozens of helpful users show up and explain the above point. A year ago, I would probably have been among the helpers.

There is indeed some very solid evidence to back those assertions. The scientific literature has consistently reported a delay. The field of BG <> IG exchanges and equilibrium has been studied extensively. A good summary can be found in A Tale of Two Compartments: Interstitial Versus Blood Glucose Monitoring. This article is a must-read if you are interested in what really goes on beneath your skin in terms of G exchanges. Please note that there is some mild, polite disagreement on the exact length of the delay, which has been reported to be as short as 6 minutes and as long as 15 minutes. Several mathematical models have been developed to describe the exchanges and the correlation between IG and BG under different circumstances. There is also, believe it or not, a delay of around 1.5 minutes between the concentration of interstitial glucose and its concentration in the sensor (because the sensor is protected by a selective membrane).

About a year ago, I visualized the process as two reservoirs: the blood reservoir, receiving a certain input of G (food, gluconeogenesis), connected by a pipe that would allow a certain flow to the interstitial reservoir, which in turn would leak G (for tissue consumption). I wasn't that far from the truth, except that my model was very basic. It is of course a bit more complicated in reality, as explained in [SORRY WRONG LINK - WILL FIX] Estimating Plasma Glucose from Interstitial Glucose: The Issue of Calibration Algorithms in Commercial Continuous Glucose Monitoring Devices (which is probably not the most complex paper on the issue...).
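The two-reservoir picture can be turned into a toy simulation (assumed parameters, purely illustrative): interstitial glucose (IG) follows blood glucose (BG) through a first-order lag with time constant `tau`.

```python
def simulate(bg_of_t, tau=10.0, dt=0.5, minutes=120):
    """Euler-integrate dIG/dt = (BG - IG) / tau; returns (t, ig) samples."""
    ig = bg_of_t(0.0)
    out = []
    t = 0.0
    while t <= minutes:
        out.append((t, ig))
        ig += (bg_of_t(t) - ig) / tau * dt
        t += dt
    return out

# BG ramps from 100 to 160 mg/dL between t=30 and t=60 min:
def bg(t):
    return 100 + 60 * min(max((t - 30) / 30, 0), 1)

trace = simulate(bg)
# During the ramp, IG trails BG by roughly tau * slope mg/dL
# (here about 10 * 2 = 20 mg/dL at the steepest point), then
# catches up once BG flattens out.
```

Even this crude model reproduces the familiar CGM behavior: the sensor reads low on the way up, high on the way down, and agrees with the meter when things are stable.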

Now, the interesting thing about having a model is that you can make predictions. We've all heard about the global warming models and their different predictions. Keep that in mind...

Questions, questions...

... and allow me to backtrack a minute. Over the last year or so, a few things kept bothering me.

First, while the small Dexcom calibration analysis I carried out on user-submitted files clearly showed a couple of things, such as the relative lack of accuracy of the Dexcom in low ranges and the consequences of calibrating in that range, I failed to identify an impact of the rate of change on subsequent accuracy. Why could I not find the obvious thing everyone was talking about? In a way, that was understandable, since users were generally following the rule not to calibrate when G is changing. But even when I intentionally cherry-picked the cases where they ignored the rule, I could not find an impact nearly as significant as low range calibration.

Then, when Dexcom released its AP 505 algorithm, it appeared that they suddenly started to trust the values they were receiving more often. That was a bit puzzling: how could simply upgrading a receiver suddenly transform data from an unchanged sensor/transmitter into "better" data? I speculated that the sensors had possibly been improved, that the transmitter hardware and firmware had been upgraded, and that Dexcom simply expected the changes to percolate as new sensors were sent to users.

Finally, the Libre behavior was puzzling at times. It would track wonderfully, then suddenly jump the gun and display a value that was much higher than expected and higher than what the BG meter would say. Then, after a while, it would go back to extremely good tracking. Leaving aside the temperature issue, I noticed - thanks to our tennis sessions - that the Libre tracked standard G increases (meal not followed by physical activity) better than carb loading followed by exercise. The mystery deepened as soon as I gained a fair understanding of the raw data format. In some cases, the Libre would report spot check values that seemed to ignore what its own raw data said. Then, after a while, if those spot checks did not actually materialize, the historical data would simply act as if they had never existed.

The Libre constantly rewrites history! (but for a good cause from a clinical point of view). At that point, after having collected several typical situations, I became convinced there was a predictive part in the Libre's behavior and tried to develop a decent (but most certainly highly simplified) model of its behavior that led to decent pseudo CGM runs.

One question lingered in my mind though: was applying a predictive model justified? Or had I simply missed a simpler explanation and been forced to "cheat"?

I decided to run some tests again, starting from a very well behaved Libre sensor. As you can see on the left, that sensor performed almost flawlessly compared to our BG meter when we used it for scheduled tests. Data point 7 shows a hint of trigger happiness. Data point 11 was a compression event.

This extremely well behaved sensor (I can assure you that I have not cherry picked the best results: these are the only results we had for the first five days starting 16 hours after a pre-insertion) seemed to be an ideal candidate for reliable experiments.

Here's the result of an experiment intended to mislead the Libre, based on our understanding of its behavior. The blue line is my simple/direct interpretation of the raw Libre data: it is possibly inaccurate, but it has worked reliably in stable conditions for the first five days of the life of the sensor. Max eats a bit and starts moving. We start from a stable and accurate situation where the Libre and the BG meter still match. However, 15 minutes later, the Libre spot check gives 163 mg/dL while BG is stuck at 120 mg/dL. Has our perfectly matching Libre suddenly lost its marbles? Not really: it seems to have predicted the BG meter value based on the data it had roughly 10 minutes earlier. Twenty-two minutes later, BG has started to rise again and the Libre seems to be back on track, based on the data it saw roughly ten minutes earlier. That could, of course, be a coincidence. However, keep in mind that this was a verification experiment designed explicitly to reproduce what we had noticed before... Also worth noting is the fact that simple linear extrapolation doesn't work too well in general, but we'll get back to this below.

Going back to the roots to get answers.

If you've been reading this blog for a while, you've probably understood that I am more interested in the process of investigation than in reaching some kind of goal. The fun is to ask questions and to try to answer them. But sometimes you hit stumbling blocks, and I started looking a bit at the literature. That's what I did when it turned out that my pseudo-CGM model wasn't working too well in all cases.

I had several questions in mind, seemingly somewhat unrelated.

How could I explain that the Dexcom signal quality suddenly increased without obvious sampling hardware changes?

How could I explain that the Libre seemed to predict BG uncannily at times but also overshoot badly in some usage scenarios?

How could I explain, when I looked at the Dexcom calibration, that the actual BG slopes at calibration did not matter as much as they should?

One very interesting paper to start with was FreeStyle Navigator Continuous Glucose Monitoring System with TRUstart Algorithm, a 1-Hour Warm-Up Time where we learn that in the Freestyle Navigator...  

TRUstart corrects for the effect of interstitial glucose lag, and the window for calibration has been opened to rates up to ±3.5 mg/dl/min. Also, the acceptable glucose range for calibration was increased from 60–300 to 60–400 mg/dl, because data collected after the initial product was introduced have demonstrated sufficiently accurate calibration in the range of 300–400 mg/dl. 
and that
To obtain accurate calibration during times of glucose change, a first-order linear ordinary differential equation is used to describe the difference between blood and interstitial glucose.3 Using this model, the sensor current for sensitivity calculation is corrected for an average time lag of 10 min. The model requires an estimate of the rate of interstitial glucose change, which is calculated from the 1 min measurements ±7 min from the time of the BG calibration test.
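The quoted correction can be sketched in a few lines (my own toy code, assuming the simple first-order model described in the quote; the Navigator's actual implementation is obviously not public):

```python
# BG is estimated from IG via the first-order model BG ~ IG + tau * dIG/dt,
# with the rate of change taken from the 1-minute samples around the
# calibration time, as the paper describes.

TAU = 10.0  # average IG lag in minutes, per the quote above

def lag_corrected_bg(samples, t_cal, window=7):
    """samples: dict of {minute: IG mg/dL}. Rate estimated +/-window min around t_cal."""
    g_before = samples[t_cal - window]
    g_after = samples[t_cal + window]
    rate = (g_after - g_before) / (2 * window)  # mg/dL per minute
    return samples[t_cal] + TAU * rate

# IG rising at a steady 2 mg/dL/min around the calibration time:
ig = {t: 100 + 2 * t for t in range(0, 31)}
print(lag_corrected_bg(ig, t_cal=15))  # 130 + 10 * 2 = 150 mg/dL
```

In other words, with a steady 2 mg/dL/min rise, the model assumes blood is already 20 mg/dL ahead of what the sensor reports, and calibrates against that projected value instead of the raw one.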

What I learned from this article was
  1. slopes of +/- 35 mg/dL in ten minutes are officially considered acceptable for time corrected calibrations in the Freestyle Navigator. That's a significant slope! Since the Dexcom users who shared their files generally tried to calibrate when the trend was stable, that also explained why I couldn't spot any impact of the trend on the calibration accuracy (OK, while this article doesn't describe the Dexcom, there's no reason to assume Dexcom hasn't thought about correcting for slope as well).
  2. calibrations below 60 mg/dL are rejected. That information also confirmed what my calibration analysis showed about the impact of range (there are many other concerns in the literature about that low range anyway).
  3. that the notion of predicting blood glucose based on a model of the IG-BG exchanges was perfectly acceptable. We can, I believe, legitimately speculate that, if the Freestyle Navigator uses a predicted BG value to put its calibration value in the correct time frame, the Freestyle Libre could very well be using that predicted BG for its spot check values when conditions are changing. The algorithm could be augmented by safety rules that would, for example, not predict values if the model projected impossibly low BGs.
And finally, the reference to a first-order linear differential equation leads us to the notion of derivatives. Derivatives are easy if you are working with schoolbook equations (symbolic differentiation), but things get much harder when you are dealing with a function specified only by the data you are measuring. Leaving the approximation of that function aside, we hit the problem of noise. Measurements typically contain a certain amount of noise and, unfortunately, that noise will be amplified by differentiation if the data isn't smoothed. The reference in the second quote links to this 1999 paper Subcutaneous glucose predicts plasma glucose independent of insulin: implications for continuous monitoring

which states that
Although the correction is essentially perfect in the absence of noise, the addition of even a small amount of noise (0.75% noise) dramatically degrades the calibrated sensor signal
(which will not be a surprise to anyone with a background in signal analysis)

and then proposes to correct the problem by using a three point moving average, not of the values, but of the derivative terms. Three values? Sensitivity to noise? That is also reminiscent of the Dexcom's behavior and could explain why a different algorithm, less sensitive to noise, could suddenly consider data to be usable and "clean" when it wasn't before. This could be why the Dexcom AP algorithm treats more data as clean than the previous version did: it could use either a totally different IG-BG model, or a more noise-robust differentiation algorithm.

In a way, that was both bad and good news. While I was reassured that the core idea of my own interpretative algorithm was sound, I became painfully aware that it was a bit simplistic. Exploring a labyrinth in darkness is fun, but I probably look like a hopelessly naive idiot in the eyes of the real guys who are developing this thing. But that paper offered plausible explanations or validation for all the above questions!

Ah, one last question. What do I know about the noise? Very little actually. And data seems extremely hard to get. As the Roche researchers note in this paper
However, the glucose sensor raw data are not usually included in manufacturer publications, and therefore no quantitative statement can be made about comparing noise levels between different CGM systems.
This being said, the design of the Libre sensor on the TI platform is straightforward enough that it could give some insight...


For standard users: as CGMs become more and more complex black boxes, common wisdom might be out of date. It may apply in some circumstances and be extremely wrong in others. The eventual model may decide not to kick in, may deliver results that look too good to be true, may overshoot, etc. There is no need to repeat old truths endlessly when the reality evolves rapidly.

As far as Artificial Pancreas teams are concerned, they would ideally need access to actual raw data and noise data, as close to the source and as unprocessed as possible. Running a secondary model on top of an eventual black-box predictive model, such as the one exposed in the spot checks of the Libre, is hopeless. An extrapolative model built upon another extrapolative model is asking for big trouble. Business-wise, that could mean that a deep collaboration with a CGM manufacturer is required and that, ultimately, those guys will get to control who releases what.

Wednesday, April 8, 2015

Some Libre peculiar behavior you should know about: temperature

Have a look at the following strange ISG pattern (sorry for the poor quality, that was a quick cell phone shot). The sequence of events is roughly as follows: a low around 17:00, probably over-corrected. A bit before 18:00, we decide to go for our evening dose and meal since we probably would have had to correct, then stack insulin for the meal. Best to combine the two and assess the situation. At 19:00, we seem to be on our way down and Max decides to take a warm bath. Max, much to my despair since I am the one who checks for adhesion, loves longish warm baths...

As soon as he goes into the water, his ISG seems to start rising sharply again! One possibility could be that his stomach suddenly offloads food to the duodenum and that he enters a "secondary food absorption phase" (since a gravitationally lensed mirror ISG image is unlikely ;-) ). The problem is that I have never seen such a bi-phasic digestion pattern.

A bit before 20:00, a spot check gives a 254 mg/dL reading and the question of an additional dose of insulin becomes legitimate. Did we under-estimate the meal? Are we seeing some fancy cheese induced digestive delay?

While we usually have no issue giving corrections based on the Libre once we have established it is tracking accurately, this isn't an obvious decision. The sensor has been perfect, but it is only in its first day of use, approximately 40 hours after insertion (we do pre-insert to avoid the first day low sensitivity phase).

But we remember that we have, in the past, already seen anomalous high readings after warm baths. A BG meter check indicates 137 mg/dL close to the top of the second bump where the Libre displays 254 mg/dL. A second BG check indicates 129 mg/dL a few minutes later and, around 21:00 the Libre and the BG checks are back in sync in the neighborhood of 100 mg/dL.

What happened?


See that protruding thingie in that opened Libre sensor? You could think it is an antenna. But it is not: the antenna is the concentric circle around the circumference of the board. What you see is a thermistor (tests indicate that it is either a TCS651 or a component with a similar response curve). Temperature plays an important role as far as the Libre system is concerned. You may have experienced the "too cold, scan later" message while jogging outside in cold weather. Or you may have noticed a loss of historical data (gaps in the retrospective BG curve) in the same circumstances. What we did notice is that our usually extremely accurate Libre would often be a bit enthusiastic, even trigger happy, after a bath.

Given the fact that we have three blood checks that are perfectly in line with the expected insulin induced downtrend, that the Libre progressively caught up and that we have raw data corroborating the scenario, we can safely affirm that what we have seen is a temperature induced artifact. It is also, for some reason, an artifact that the Libre's algorithm considered plausible and therefore did not flag as "not available".

Going Deeper

I am a sucker for that kind of thing and can't help looking for rational explanations. Unfortunately, since Max does not share my enthusiasm for investigations and since I can't decently expose him to successive cycles of heating and cooling, I have to limit myself to a somewhat theoretical approach: looking at and interpreting accessible raw data, understanding and measuring devices statically.

Here's a closer view of the system (capacitors unmarked, but they shouldn't be too far from TI reference designs)

A biological explanation would be that heat increases the blood flow through vasodilation and probably the speed of the GOx reaction which could lead to inflated results. But, in that case, shouldn't the role of the thermistor be to compensate? Could that mean the thermistor isn't filling its role? Or is it delivering correct data to an overly conservative compensation algorithm? Does the predictive algorithm whose existence I suspect become confused? No idea. If I knew everything, I would probably be building my own sensor. But exploring is fun!
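For illustration, here is how a thermistor reading could feed a compensation step. Everything in this sketch is assumed: a generic NTC beta equation with made-up R25/beta values, a hypothetical baseline skin temperature and a hypothetical linear correction coefficient. I have no idea what Abbott actually does.

```python
import math

R25 = 10_000.0   # assumed resistance at 25 C, ohms
BETA = 3950.0    # assumed beta coefficient, kelvin

def ntc_temperature_c(resistance_ohms):
    """Invert the NTC beta equation: 1/T = 1/T25 + ln(R/R25)/beta."""
    inv_t = 1 / 298.15 + math.log(resistance_ohms / R25) / BETA
    return 1 / inv_t - 273.15

def compensate(raw_signal, resistance_ohms, coeff=0.02):
    """Hypothetical linear correction: ~2%/C away from a skin-temp baseline."""
    delta = ntc_temperature_c(resistance_ohms) - 33.0  # assumed baseline temp
    return raw_signal / (1 + coeff * delta)

# A warm bath lowers the NTC's resistance (higher temperature), so the
# compensation divides the heat-inflated raw signal back down.
print(round(ntc_temperature_c(10_000.0), 1))  # 25.0 C by construction
```

If the real compensation is a look-up table rather than a formula, the principle stays the same; what matters is that a wrong temperature estimate, or an overly conservative table, would leave exactly the kind of inflated readings we observed.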

In practice?

Even without understanding everything, there are a few practical lessons here. When the temperature situation is unusual or changing (too cold, too warm), consider your Libre results with some caution. The system does a decent job of detecting when its data is potentially unreliable, but it is not perfect. The summer might be interesting. What impact will sunbathing have? The Libre is white and reflective. But what will happen if you stay for hours in the sun and are the proud owner of a Libre sensor that has been decorated with a black sticker? Just out of curiosity, I'll be watching.