Tuesday, May 24, 2016

Freestyle Libre: some data dump.

In previous posts, I explained how I regressed an approximate calibration slope for the Libre in early 2015. In this post, I will put it to use and provide some data, including full raw scans, for you to download. [download]. This is the data set used for the charts at the bottom of this post.

The source code below contains the data I used for this regression. There might be some overlap with the above files, and some missing. In fact, I could have used more than 100 values, which would not have helped much and would have added to the tedium.

I am sorry that I am unable to provide a ton of nicely arranged dumps, but remember that this data was acquired in 2014 and early 2015, with various tools, including pencil and paper. I actually shared that data previously on github, but removed it when it became clear I was getting zero data in return and tons of requests for the "formula".

 import matplotlib.pyplot as plt  
 import numpy as np  
 from scipy import stats  
 mono = {'family' : 'monospace'}  
 # scanned values  
 xlist = [101, 101,  72, 161, 75, 69, 112, 203, 163, 154, 168,  93, 99, 66, 80, 105, 137, 124, 156, 277, 141, 135, 67]  
 # observed counts  
 ylist = [906, 892, 689, 1291, 755, 664, 954, 1605, 1360, 1340, 1485, 805, 865, 632, 818, 971, 1206, 1064, 1368, 2150, 1300, 1167, 641]  
 # some outliers  
 x_outlier = [101, 153]  
 y_outlier = [1084, 1160]  
 # do the regression  
 gradient, intercept, r_value, p_value, std_err = stats.linregress(xlist, ylist)  
 print ("Gradient and intercept, r, p, std", gradient, intercept, r_value, p_value, std_err)  
 # plot it  
 fig = plt.figure()  
 ax = fig.add_subplot(111)  
 ax.set_title('Libre: correlation reported values / observed counts')  
 ax.text(350, 2500, '{:_<10}'.format('intercept: ') + str('{:06.2f}'.format(intercept)), fontdict=mono)  
 ax.text(350, 2390, '{:_<10}'.format('gradient : ') + str('{:06.2f}'.format(gradient)), fontdict=mono)  
 ax.text(350, 2280, '{:_<10}'.format('r    : ') + str('{:06.4f}'.format(r_value)), fontdict=mono)  
 ax.text(350, 2170, '{:_<10}'.format('p    : ') + str('{:.2e}'.format(p_value)), fontdict=mono)  
 ax.text(350, 2060, '{:_<10}'.format('std   : ') + str('{:06.4f}'.format(std_err)), fontdict=mono)  
 ax.set_ylim(0, 3000)  
 ax.set_xlim(0, 520)  
 ax.set_xlabel('Reported Value')  
 ax.set_ylabel('Observed Counts')  
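 # plot the points used for the fit, the outliers and the regression line
 ax.scatter(xlist, ylist, color='b', marker='o', label='data')
 ax.scatter(x_outlier, y_outlier, color='g', marker='x', label='outliers')
 ax.plot([0, 300], [intercept, intercept + 300 * gradient], color='r', linestyle='-', linewidth=1, label='04/2015')
 ax.legend(loc='lower right')
 plt.show()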

Which gave

Gradient and intercept, r, p, std 7.26235656465 181.083590507 0.99295809065 6.0805538057e-21 0.189073777001

The conversion applied is as follows, based on the previously derived parameters. (I kept the habit of masking on 14 bits because in theory that is what the TI chip should deliver...)

def LibreConvert(r):
    # keep the 14 low bits of the raw value, then invert the linear calibration
    bitmask = 0x3FFF
    return ((r & bitmask) - 181.08) / 7.26
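As a rough sanity check, here is a minimal sketch that applies the same conversion to the raw counts listed in the regression code and compares the result with the scanned values. The default arguments and the deviation summary are additions for illustration only. The data excludes the two outliers, so this says nothing about the cases where the Libre's own algorithm kicks in.

```python
def LibreConvert(r, intercept=181.08, slope=7.26):
    """Convert a raw count to an approximate glucose value (mg/dL)."""
    bitmask = 0x3FFF  # keep the 14 low bits, as in the post
    return ((r & bitmask) - intercept) / slope


# Same pairs as in the regression listing: scanned values and raw counts.
xlist = [101, 101, 72, 161, 75, 69, 112, 203, 163, 154, 168, 93,
         99, 66, 80, 105, 137, 124, 156, 277, 141, 135, 67]
ylist = [906, 892, 689, 1291, 755, 664, 954, 1605, 1360, 1340, 1485, 805,
         865, 632, 818, 971, 1206, 1064, 1368, 2150, 1300, 1167, 641]

# Deviation of the converted raw value from what the reader displayed.
deviations = [LibreConvert(raw) - scanned for scanned, raw in zip(xlist, ylist)]
mean_abs_dev = sum(abs(d) for d in deviations) / len(deviations)
max_abs_dev = max(abs(d) for d in deviations)

print("mean absolute deviation (mg/dL):", round(mean_abs_dev, 1))
print("largest absolute deviation (mg/dL):", round(max_abs_dev, 1))
```

The mask only matters if the upper bits of the 16-bit word are ever set; for the values above it changes nothing.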

Here are the results on the data set above.

Small deviation, please note scanned value is "in trend"

Small deviation, trend uncertain

Small deviation, in trend.

Small deviation, in trend, possible noise

Small deviation, in trend

Bigger deviation, but in trend from the 145 going down, the Libre doesn't know we corrected the fall

Flattish, very small error

Again small deviation, fully in trend.

Nice regular trend, spot on.

Outlier, but in trend, the Libre doesn't know there's some exercise.

Stablish, in trend, possible noise or trend change.

Stable conditions, matching the scan

Deviation, but again in trend.

Nice trend, nice match.

Very large mismatch, but explainable if trend based on previous minutes
Note: I think I used this data previously on the blog - it should be noted that "raw" was in line with the BG meter testing, much more than the Libre Scan.

In trend.

Now, to recap...

  • I don't use the Libre anymore, this is based on 2014/2015 data. Some of you sent data, I will definitely have a look at it, thanks.
  • Match is near perfect, especially when the trend isn't changing abruptly.
  • These mismatches (I could devote an entire post to mismatched graphs) are always well in line with the previous trend and never against it, which led me to examine the possibly predictive nature of the Libre's algorithm.
  • In many cases, direct interpretation of raw data led to better agreement with the BG meter!
  • Those results are quite close to the actual factory calibration slope of the Libre (edit: the Libre we had - again, haven't tried any recent ones). I am confident this will be confirmed now that the official apk is in wide circulation. 
  • I still consider those results to be insufficient for release. That will be my opinion until the full Libre algorithm is documented.
PS: I'd like to stress, again, that I don't suggest, ever, that anyone uses a formula found on the Internet through copy and paste. IMHO and as shown by the outliers, calibration isn't the whole story.

Monday, May 23, 2016

When data fluxes collide: random thoughts on privacy and security.

When data fluxes collide...

"The privacy of your data is very close to our heart" is something the medical device industry likes to repeat over and over. Historically, they haven't done a great job. Some hilarious XOR loops managed to get HIPAA approval in the past, introducing additional vulnerabilities to the system they supposedly protected (http://www.securiteam.com/windowsntfocus/5HP0L152BA.html). Recent efforts haven't fared much better.

But let's go back in time a bit. Back in 1994 - I had just completed my military service as an MD in the Belgian Army stationed in Germany - I was still hesitating between Medicine and IT Security. That mix of competences was a bit unusual at the time and this led me to write regularly about IT and IT security issues in medical magazines.

The times were very different in 1994. A tech savvy MD might have had some DBASE based medical software, advanced ones were beginning to use modems to connect to labs and get their results through BBS software... Security was non existent. The only defense was the general ignorance.

This was also the first time (I had no connection to T1D back then) I considered what could happen if an insurance company got its hands on an HbA1c database. Back then, all the HbA1c values of the patients treated at the large university hospital where I studied were one database query and a floppy disk away.

My Medipractice column in 1994
That scenario - someone stealing the HbA1c database and selling it to an insurance company - raised eyebrows and was generally seen as too horrible to even be considered.

Fast forward to 2016, and we have this:

Theoretically, we could walk into Cigna and say, "You have 22,312 patients on our system. Here's how they're doing, and here's your 500 problem patients, and boom, b-boom, b-boom." Kevin Sayer - Dexcom (Motley Fool's Interview) - March 2013

Yeah, "boom, b-boom, b-boom"... 22 years later, the hypothetical scenario I feared, even if it sounded too blatantly criminal at the time, has not only become acceptable but is also, supposedly, a sign of progress... so much so that medical device makers, who hold "the privacy of your data so close to their heart", are willing, well, to peddle it around. "Theoretically, they could walk". In practice, when it comes to patient access to their own data, they can also walk - backwards - and put all kinds of limitations on your access to your data streams (remember, it is not yours, it is licensed to you).

But where does that lead us? Back in 1994, it seemed I was quite decent at spotting trends and risks. Today, I am getting old and am not so sure. One thing we know is that patients should be happy to pay a lot to get good treatments (Lilly has this to say: "Yes, they (drugs) can be expensive, but disease is a lot more expensive," Lechleiter told analysts). We also know that, thanks to our smartphones, we are leaking tons of apparently unrelated data fluxes. Where could that lead us? I considered a few ideas, conjured awful total surveillance scenarios... Then, suddenly, I had a Eureka moment... UBER riders are apparently willing to accept 9.9x surge pricing when their phone battery is about to die.


I think there are a few lessons here.

  • your data will be locked up and secured as much as possible... from your eyes. While the concept will be sold to you as an obvious need for the security and privacy of your medical data, the real goal will be to limit access and monetize it. It's a variant of the "terrorists and pedophiles" argument in the total surveillance debate.
  • likewise, it is often stated that, if you are a good citizen, you have nothing to fear when it comes to total surveillance. I guess that you could as well say "if you are a good diabetic, you have nothing to fear about your data being sold".
  • and what if we transpose the Uber surge pricing model to the pricing of Insulin? Or the complexity of the hosted/concierged Artificial Pancreas algorithms? Or the number of adjustment scenarios? How much would you pay for an optional advanced algorithm if you have been hovering above 300 mg/dL for days, with no end in sight? Should Insulin cost more for people who don't exercise enough per their accelerometer data?
Do those scenarios sound as absurd or criminal in 2016 as the sharing of HbA1c with insurance companies sounded in 1994? Time will tell I guess.

"Yeah, we just increased the price of your insulin, but you know, being high carries a lot of risks"

"We're sorry Sir, your CGM company told us you are the problematic one".

Fortunately, all hope isn't lost. Just like in 1994, medical device security and algorithm obfuscation remain a field where the one-eyed lead the blind...

Saturday, May 21, 2016

a bit of tennis...

It has been a while since I last talked tennis. To be honest, when I started this blog, I expected to use it as a channel to report on our sport and diabetes challenges and/or eventual successes. The reality hasn't kept up with our hopes and diabetes isn't the only culprit.

A few months after diagnosis, Max won his first real adult tournament. We experienced lows in the first rounds, but Max was playing against lower-skilled opponents so they did not matter much. We took a more conservative insulin approach in the next rounds and discovered the drawbacks of not having enough insulin: Max became sluggish, extremely tired and lost his lucidity: it took an extremely honest opponent to tell Max he had won the game with his last forehand for him to realize he could stop playing. We fine-tuned our preparation and Max finally won that tournament: quite an achievement at 13 against players aged 20 to 30, when your C-Peptide is at zero...

Unfortunately, the school+tennis program Max was a member of ended up being cancelled at the end of that year (this was not diabetes related). We soon discovered that it was easier, in terms of diabetes management, to play 10 to 12 hours of tennis each week on a regular schedule than to play fewer hours on a bi-weekly basis.

In terms of competitive games, in his pre-diabetes year, Max had played 49 games and won 4 tournaments, with a 73% win rate. The "diabetes" year saw only 34 games, with a 64% win rate, most of the losses occurring pre-diagnosis. Post-diagnosis years saw 17 games (68%), 9 games (67%) and, this year, 4 games (100%) so far.

A very mixed bag, indeed: on one hand I can only admire a kid who gets out of a hypo while being led 3-5 in the third set to win 7-5... but on the other hand, the whole process is so frustrating.

I can't resist posting a small training video: the coach calls the serve type and location and Max is supposed to hit the correct cone. Easier said than done (and something I certainly couldn't do myself)

Our problems are, in decreasing impact
  • the Belgian weather and custom of playing outside on clay courts whenever it is possible. This sounds a bit ridiculous, but is true. The best plan falls apart when you encounter a 6-hour rain delay. This could probably be partly addressed by switching to a pump from our current MDI scheme. Maybe we'll get to that.
  • our desire to maintain a very tight control (latest HbA1c was 5.1% on a high carb diet) which definitely rules out hovering in the 180-220 mg/dL range for hours while waiting to play.
  • the fact that it is hard for a normal teen to start worrying 12 hours pre-game, take all the correct decisions during the game and then, the correct decisions for the next 24 hours. 
  • maintaining a daily training routine (running a bit at least every day) is non-obvious when you are coming back from school a bit late, could not adjust your meal time and doses, and happen to have a mild hypo that needs a correction. Running too late could also lead to a delayed hypo, which then restarts the unfortunate instability cycle we all know too well...
This being said, Max keeps playing tennis. And, while this would have been unremarkable if he wasn't a Type 1 diabetic, he won his first four "interclub" games of 2016, playing either as second or first player in his team. His team mates have been great, ready to replace him at a very short notice when the weather does not cooperate. They also - for the most part - won their games and the team will now play the regional finals a couple of weeks from now.

Thursday, May 12, 2016

Libre Data Interpretation (continued - and probably final for parameters)

In the previous post, I left you with this approximate calibration curve which served me well for a couple of months.

However, there were occasional hiccups, that I spent a couple of months investigating. If you have a sharp eye, you will have noticed that it contains two outliers (highlighted in the second to bottom graph).

Similar outliers were also identified vs blood tests. Excluding reader errors, that could only mean one thing - the outliers were the results of an algorithm. In April 2015, I had enough outlier samples to reach significance and summarized my findings here.

 In short (graphs from the above post).

The Libre predicted highs that did not materialize and rewrote them after the fact.

 The BG Meter often agreed with a direct interpretation of the raw data in those cases (implying the outlier spot checks were algorithmic)

I spent some time looking at problematic cases (above) and less problematic ones (below)

That allowed me to detect outliers, remove them, and fine tune the data I used for the calibration slope parameters. Here are the resulting parameters, when blatant outliers are removed. You can see that, suddenly the correlation and confidence level improve tremendously.
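To make the effect of removing outliers concrete, here is a minimal sketch that fits the same straight line twice, once with and once without the two outliers from the data dump post, using the pairs published there. The helper `fit_line` is a plain least-squares routine written for this illustration (it stands in for `stats.linregress`), and it does not try to detect outliers by itself: they are the two points already singled out by hand.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)


# Scanned values (mg/dL) and raw counts from the data dump post.
xlist = [101, 101, 72, 161, 75, 69, 112, 203, 163, 154, 168, 93,
         99, 66, 80, 105, 137, 124, 156, 277, 141, 135, 67]
ylist = [906, 892, 689, 1291, 755, 664, 954, 1605, 1360, 1340, 1485, 805,
         865, 632, 818, 971, 1206, 1064, 1368, 2150, 1300, 1167, 641]
x_outlier = [101, 153]
y_outlier = [1084, 1160]

clean = fit_line(xlist, ylist)
with_outliers = fit_line(xlist + x_outlier, ylist + y_outlier)

print("without outliers: slope %.3f intercept %.2f r %.4f" % clean)
print("with outliers   : slope %.3f intercept %.2f r %.4f" % with_outliers)
```

With only two flagged points the parameters themselves move little; what changes most is the correlation, which is the improvement described above.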

The math and the data sets told me that I was really close to the real thing. On individual runs, my "private experimental" algorithm started tracking the Abbott data very, very closely. (To be honest, I do not remember precisely whether I had already started looking at algorithmic issues when I generated this chart, as I was doing many things in parallel.)

At that point, in April 2015, I moved to 181 intercept and 7.26 slope. While the difference in numbers may seem large, it does not make a huge difference as you see below.

If my profound indignation at discovering the "divide by ten" surprised you, consider that I spent two months (not continuously of course ;)) waiting for special cases to examine and understand in order to go from the parameters of 01/2015 to those of 04/2015, which I consider (because the math tells me so, and the subsequent runs confirmed it) very significantly better.

Wednesday, May 11, 2016

A Libre summary, some disappointment on the third party side and possibly new information

As I have said before, we have been off the Libre and back to the Dexcom G4 and now G4AP. That decision was mostly motivated by availability issues, not product issues. I learned a lot during my Libre investigations and used the knowledge I had gained to improve the results we got from the G4 (non AP) running a custom algorithm on "raw" data provided by xDrip.
This being said, most of the traffic, the mail and contacts I get from this blog are still Libre related. That is why I have decided to post a summary of the Libre information I posted here, take a look at what the current landscape offers, rant a bit and... maybe provide a bit of new information for those who can read between the lines.

Quick Summary

I could not resist comparing the Libre and the Dexcom as soon as I got it (first days - full 14 days). We were lucky and the Libre lived up to the hype, especially in terms of speed. I also started looking at the Libre technical details as soon as I got it. Reading the NFC tag and interpreting its structure led me to post this in December 2014, which was soon picked up by others and led to the first Libre data interpretation attempts on GitHub. Meanwhile, I worked on my own interpretation.

As great as it is, the Libre had its dark side as well, uploading your data to a remote server while explicitly telling you it did not. That issue is now "solved", but solved in a way only big companies can get away with: the Libre still grabs your data, but its license has changed...

Anyway, the Libre still outperformed the Dexcom in terms of speed and accuracy, which was extremely useful for sports. Speed matters in other practical circumstances, alerting to a bad hypo before it could harm while the Dexcom was on strike. That hyper reactivity unfortunately has downsides as well, which brings us back to data interpretation.

By then, I had a fairly decent data interpretation going. But working from scratch, with no outside help and with an extremely limited supply of sensors, was becoming a bit tiring. It involved a lot of fiddling, reading and toying with the custom TI/Abbott chip the Libre uses (did you know the FRAM also contains part of the code? I taught myself a bit of MSP430 assembler). The Libre behaved oddly at times, simple data interpretation failed... The reasons behind this became clear to me: the interpretation algorithm was not direct, temperature also played a role (there are two thermistors used). So did algorithms. Abbott had used predictive algorithms in the past, Roche was using them in their yet to be released sensor and that is what I used to improve on our Dexcom G4 non AP result. Could it be a factor? I had, as I have shown above, a decent interpretation in January 2015 but certainly did not have all the answers. This is why I decided not to release anything in public: I explained my reasons here. Looking back at them, the custom micro-controller remains an issue, the interpretative algorithm remains an issue, Abbott has not been as aggressive as I thought it could be and I still don't care much about a smartphone application. Point 3 was essentially: "it would be in the hands of real people" and, with that, comes a certain amount of responsibility. Which brings me to...

The current state of third party applications.

When I started looking at the Libre, despite enjoying the process I had hoped to get some assistance. Nightscout was a team effort, thriving on the talents and skills of many different people. I can't remotely hope to emulate a whole team of dedicated competent and generous people. And I did make a lot of contacts and virtual friends in the Libre world. Some of you have been extremely generous, offering to send me sensors or the very impressive hardware they have developed. But an awful lot of contacts were simply "gimme, gimme your formula", some of them offered fair commercial deals (not what I was looking for), some were either a bit delusional or too optimistic as far as my abilities are concerned as in "I will pay you so you develop an artificial pancreas on my current insulin pump controlled by the Libre"... Yeah, just a small afternoon project...

But, what I expected (possible spoilers ahead) would be some level of technical assistance. Something like "I happen to be an engineer with embedded TI 430 experience and this is how you look at the SRAM" or "These are obviously CRCs, that's how data corruption is detected" or "have you looked at the difference between thermistors" or "Which sub algorithm do you use? Some algorithms are well suited to smoothing signals while keeping the actual peaks"... 

Looking back, maybe I was the delusional one. :)

Third party apps

Third party apps were released, one of which, I am told, is fully open source. They were welcomed with enthusiasm (people really love to see numbers on cell phones). That was great, and I was actually happy that maybe this blog had helped a tiny little bit in their genesis. But, to be honest, I have not tried any of them, simply because I had no sensors. Then, readers of this blog contacted me and asked questions or even expressed some level of disappointment. That piqued my curiosity and I had a look: compared to what I remembered, there have been some improvements in the way the sensors are read, that's a plus.

However, glucose level computations seem to be lost in the dark ages!

One of the applications I looked at simply divides some value by 10, another interprets the values with an approximate formula derived from what was posted on Github more than a year ago. A third one is a basic copy paste of that initial Github formula. The problem with that formula, which was recognized at once by its authors, is that it does not work very well. To be honest, I was floored. I haven't seen any attempt at addressing the thermistor and real algorithmic issues. Somewhat hilariously, some of those formulas are kept "secret" behind... well... very weak doors.

So, basically, those apps display approximate values (how approximate depends on the range in which they are applied) which are considered something like "good enough anyway", they are a direct (incorrect) translation of a single value in the Libre and disregard any of the things that make the Libre a great CGM/FGM. 
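To illustrate why "how approximate depends on the range", here is a small sketch comparing a divide-by-ten rule with a linear calibration. Two loud assumptions: the value being divided is taken to be the same raw count used elsewhere on this blog (the apps may well divide something else), and the linear parameters are the 181.08 / 7.26 pair from the April 2015 regression, valid at best for the sensors measured back then. The function names are made up for this example.

```python
INTERCEPT = 181.08  # assumption: April 2015 regression, old sensors only
SLOPE = 7.26


def linear_estimate(raw):
    """Glucose estimate (mg/dL) from a linear calibration."""
    return (raw - INTERCEPT) / SLOPE


def divide_by_ten(raw):
    """The 'magic' rule: raw count divided by ten."""
    return raw / 10.0


def gap(raw):
    """How far divide-by-ten sits below the linear estimate."""
    return linear_estimate(raw) - divide_by_ten(raw)


for raw in (661, 1000, 1500, 2150):
    print(raw, round(linear_estimate(raw), 1), round(divide_by_ten(raw), 1),
          round(gap(raw), 1))
```

Under these assumptions the two rules agree only in a narrow band in the 60s (mg/dL); the gap then widens steadily with the raw value, which is exactly the range where corrections get calculated.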

And, as usual, people start using those values. When the results were too bad, the idea of "calibrating" them was floated (and possibly implemented, I wouldn't know). Rich idea: calibrating a device whose main advantage is to not need calibrations. As a bonus, it will allow people to whine against their BG meters again...

Let's compare this to xDrip/NS for example

  1. xDrip works on pre-processed data (not pure raw).
  2. used/uses an interpretation of that data that has been extensively tested and does not break down in some ranges (and whose initial version has been, incidentally, clearly detailed in Dexcom patents)
  3. sticks closely to the Dexcom data.
  4. uses proven mathematical techniques to generate its calibration, techniques that have also appeared in the newer Dexcom versions
  5. adds to the Dexcom functionality, does not remove anything.
whereas Libre Applications

  1. work on non pre-processed/adjusted data (pure raw)
  2. use a mix of approximate formulas that are known to break down in certain ranges and situations
  3. are hit and miss as far as matching the Libre
  4. do not generally bother with any recognized techniques
  5. remove reactivity, potentially add calibrations, lose accuracy

I can see the moment when someone will run a home made artificial pancreas based on divide by ten.  I love the decimal system, I really do. But that scares the shit out of me.

Turning responsibilities around.

Let's go back to the responsibilities issues. Whenever you release something that can have an impact on people's health, you are assuming some responsibility whether you think so or not. Authorities, even if they are at times heavy handed, think so as well. Regardless of your intentions, honorable or not, driven by profit or by a genuine desire to help, responsibilities remain.

This is why I decided not to release my interpretations: even if they work for me most of the time, there is still a risk they won't in some cases and I will feel guilty. But I will not be delusional and believe that other people will agree with me.  That is why I have decided to release some more information on what I did, now and possibly more in the future. 


The first topic I will address is how to reach an approximate formula that works a bit better than the "magic" ones, the way I did it in January 2015.

Fact: the Libre is a CGM. It uses a technology called "wired enzyme" which basically means that it works at a lower voltage (less or no interactions with other substances such as paracetamol), a different electrode configuration from the Dexcom's and a different chemical process (mostly a different electron acceptor). However, it still remains an amperometric sensor. That means that it does not escape calibration, base signal, etc... Since Abbott's sensors are stabler and more consistent than Dexcom sensors (something Dexcom will have to address at some point, not sure they can in the sea of patents in the field), they can be factory calibrated. It is not visible to the user but they do have a standard linear calibration! That calibration is used to eventually convert the nanoamps provided by the system (if the system does not provide nanoamps directly, calibrations can be chained) into glucose values.
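A short worked equation shows why chained calibrations are not a problem. The symbols are introduced here for illustration only: $c$ is whatever raw count the sensor exposes, $i$ the current in nanoamps, $g$ the glucose value, and $a_1, b_1, a_2, b_2$ are unknown slopes and intercepts.

```latex
\begin{aligned}
i &= a_1 c + b_1 \\
g &= a_2 i + b_2 \\
  &= a_2 (a_1 c + b_1) + b_2 \\
  &= (a_1 a_2)\, c + (a_2 b_1 + b_2)
\end{aligned}
```

Two linear steps collapse into a single line with slope $a_1 a_2$ and intercept $a_2 b_1 + b_2$, so one slope and one intercept regressed between raw counts and reported values can stand in for the whole chain, as long as every step really is linear.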

Problem: how do you get access to that calibration? Well, there is the non politically correct method, which could eventually yield the exact intercept and slope of the factory calibration curve, but there is no need for it if you sweat a bit. You simply measure the system. In the case of the Libre, that means take a reading with the Abbott reader and dump the NFC data as closely as possible.

This is the result from a series of such tests I got in early 2015 on my 2014 data. That process is all you need to replace any eventual magic or copy pasted formula.
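The "as closely as possible" part can be sketched as a pairing step: each reader scan is matched with the NFC dump nearest in time, and pairs that are too far apart are dropped. Everything here is hypothetical: the function name, the `(timestamp, value)` tuples, the 60 second limit and the sample timestamps are assumptions made for the example, not a description of the tools I used.

```python
from bisect import bisect_left


def pair_readings(reader_scans, nfc_dumps, max_gap_s=60):
    """Match each reader scan with the nearest NFC dump in time.

    Both inputs are lists of (unix_time_s, value) sorted by time.
    Returns (reported_value, raw_count) pairs no more than max_gap_s apart.
    """
    dump_times = [t for t, _ in nfc_dumps]
    pairs = []
    for t, reported in reader_scans:
        i = bisect_left(dump_times, t)
        # Only the dump just before and the dump just after can be nearest.
        candidates = nfc_dumps[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        t_dump, raw = min(candidates, key=lambda d: abs(d[0] - t))
        if abs(t_dump - t) <= max_gap_s:
            pairs.append((reported, raw))
    return pairs


# Hypothetical timestamps (seconds); the second scan has no dump nearby.
reader_scans = [(0, 101), (600, 72), (1200, 161)]
nfc_dumps = [(20, 906), (900, 700), (1210, 1291)]

pairs = pair_readings(reader_scans, nfc_dumps)
print(pairs)
```

The resulting pairs are what goes into an ordinary linear regression, raw counts against reported values.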

Key points

  • that is a process, a method: by all means derive your own values on current sensors. My values worked very well for us (no catastrophic breakdowns at any range) from November 2014 to the beginning of May 2015. All it takes to invalidate those values is Abbott changing the tiniest thing (membrane permeability, chip, chip gain, whatever...). This is really what struck me when I looked at interpretations in recent weeks. It seemed that no one had moved on from basic, known to break down, formulas (apologies if you did, maybe I haven't seen it). Do not copy paste my parameters. Don't be lazy. Derive your own.
  • you will notice that there is some dispersion for some values. They are caused either by the algorithm, thermistors and relative thermistors, or errors (where your app would display a value when the Libre reader would not). I may or may not look at those issues in the future. There is a valid baseline (calibration slope and intercept) around which those values will gravitate.
  • that baseline (well, the one you have established for yourself) is a better platform to begin investigating the other issues.
  • don't mail me if you don't know what the above chart means.


When you release an app, you are putting glucose values into people's hands. While I tend to agree that, for a lot of applications, rough zones can be OK, drift in the high ranges can be a lot of trouble if they are used to calculate corrections or drive pumps. Think about it FFS!

Magic is not acceptable...

Tuesday, May 10, 2016

CGM: post insertion wound impact - an interesting paper/patent.

We all know (well almost) that the first 24 hours after sensor insertion can be a bit tricky in terms of accuracy. The Libre and Dexcom typically display very different behaviors: the Libre tends to read too low for a while. The Dexcom tends to oscillate and be a bit (or a lot) noisy...

During our first Dexcom year, I tried various mathematical techniques to analyze and possibly improve the first day results. Once you have collected a bit of data, it is relatively easy to retrofit a solution and find out how interstitial glucose evolved. This is unfortunately not that useful in practice: smoothed retro-active data is nice, but hardly helpful in the moment.
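A toy example of why retro-active smoothing does not help in the moment: a centered moving average needs samples from the future, while a trailing one, the only kind available in real time, lags behind a changing signal. This is a generic illustration with made-up numbers and helper names, not the techniques I actually tried.

```python
def trailing_mean(values, window):
    """Average of the current sample and the window-1 samples before it."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(values, half):
    """Average over `half` samples on each side; needs future samples."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical steadily rising glucose, one sample every 5 minutes.
ramp = [100 + 5 * i for i in range(10)]
trailing = trailing_mean(ramp, 5)
centered = centered_mean(ramp, 2)

print("actual  :", ramp[5])
print("centered:", centered[5], "(uses the next two samples)")
print("trailing:", trailing[5], "(available now, but lagging)")
```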

Tapping into the "Internet wisdom" isn't very helpful either.
  • a lot of Libre users have a few hours where the sensor reads too low, but some don't.
  • a lot of Dexcom users see a noisy post insertion period, some see oscillations, some don't.
  • for some people, bleeders are keepers. For others, bleeding means sensor failure.
  • etc...
Our strategy for dealing with them has been to feed more calibrations during the first 24 hours, with the goal of reaching a close approximation as quickly as possible, thanks to error averaging. That approach worked well, but isn't easily explainable to users who aren't comfortable with maths and can't analyze the situation they are in. If the sensor values oscillate, for example, you need to recognize that and feed the calibrations along the baseline. But what worked with the G4 non 505 will not necessarily work with the G4 505 or the G5, as the "blackbox" algorithm basically does as it pleases with the calibration data you provide...
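The idea behind error averaging can be written down in one line, under a strong simplifying assumption: each of the $n$ calibration points carries an independent, zero-mean error $e_k$ with the same standard deviation $\sigma$. These symbols are introduced here for illustration; this is not a description of what the Dexcom algorithm does with the calibrations.

```latex
\operatorname{SD}\!\left(\frac{1}{n}\sum_{k=1}^{n} e_k\right) = \frac{\sigma}{\sqrt{n}}
```

Four calibrations instead of one would halve the typical error of the average, provided the errors really are independent. An oscillating sensor breaks that assumption, which is why the calibrations then have to be fed along the baseline.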

The bottom line is that we'll simply have to wait for manufacturers to improve the way they deal with the first day.

As far as the root causes of those first day inaccuracies are concerned, the medical literature usually does not go into details. It is obvious that a less traumatic insertion will lead to better results (blood is not the sign of trauma, just the sign that you went through a small vessel). It is well known that the foreign body gets encapsulated over time. Individual responses in terms of trauma and encapsulation run the usual biological variability gamut...

If you are interested in the detailed mechanisms behind those phenomena, I suggest you read this Analyte Sensor patent. It's a patent, but it reads like a very nice accessible science paper from location [150]. And it is a fascinating read! I knew of some, but by no means all, of the sources of interference it describes.

Please keep in mind however that this is a relatively old patent which means that while all the potential issues it describes are valid, some of them have been addressed in new sensor designs.