Sunday, March 20, 2016

Genetics and Type 1 Diabetes - SNPs and some basic stuff

T1D patients, caregivers, we all ask ourselves the same questions: why me? why us? who is next? There is no simple answer. Except for special cases where mutations in specific genes have been clearly identified (often called Monogenic Diabetes), the true causes and origins of Type 1 Diabetes remain unknown. We do know that genetics plays a role, we do know that the environment plays a role, but we still don't have a good understanding of what is going on. Sadly, despite tremendous advances in investigative technologies, genetics and immunology, the 1987 description of Type 1 Diabetes pathogenesis taken from my old 11th Edition of Harrison's Principles of Internal Medicine would still be acceptable today.

That's, in a way, very depressing. The field has been investigated so much that the chance of a serendipitous discovery that would explain everything is extremely low.

The Human Genome Project disappointment.

A few years after that book was printed, the Human Genome Project started (1990-2003). Hopes were sky high: cancer and diabetes were the main targets. On those fronts, the project ended up being a huge disappointment. The somewhat simplistic view that, once we knew our genes, we would know all we needed to cure many diseases turned out to be false. Protein folding, gene expression, control mechanisms, interactions with the microbiome, etc.... "Minor" complications abound. The main benefit of the HGP is that it opened the doors leading to many new labyrinths that will keep us busy for the next century.

What do we know about the genetics of T1D?

We know that the major genetic Type 1 Diabetes risk factors are genes of the Major Histocompatibility Complex, coding in part for proteins called Human Leukocyte Antigens (HLAs). The terminology in science papers can be a bit confusing because our knowledge of HLAs predates our knowledge of the genes coding for them. Researchers typically use the same terms (such as HLA-A) to refer to either the antigen or the gene, depending on their field of investigation (transplant rejection or genetics, for example).

A small vocabulary refresher now. Broadly speaking...
  • amino-acids are the bricks out of which proteins are made.
  • a "base" or "nucleotide" is a molecule used to encode genetic information. There are four different bases in our DNA (we'll call them by their abbreviations A, T, G, C). In groups of 3, they form a redundant (fault tolerant) code for amino acids. The two DNA chains are complementary: G will always pair with C and A will always pair with T. RNA uses the same complementary code with its own set of bases (3 identical to DNA, one different) and transfers information from our DNA to ribosomes to create proteins.
  • a gene is a string of bases that code for a protein.
  • an antigen is a molecule that triggers an immune response. It can be a protein... or almost anything else.
  • an antibody is a Y shaped molecule that locks on to antigens during the immune reaction.
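The "redundant (fault tolerant) code" mentioned above can be made concrete with a few real entries from the standard genetic code; the small translate helper is just my own illustration:

```python
# A few real entries from the standard genetic code (DNA codons, coding strand).
# Note the redundancy: several codons map to the same amino acid, so some
# single-base changes are "silent" and leave the protein unchanged.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",   # third base differs, same amino acid
    "GAT": "Asp", "GAC": "Asp",
    "TGT": "Cys", "TGC": "Cys",
}

def translate(dna):
    """Translate a DNA string (coding strand), codon by codon."""
    return [CODON_TABLE[dna[i:i+3]] for i in range(0, len(dna), 3)]

print(translate("GAAGAT"))  # ['Glu', 'Asp']
print(translate("GAGGAC"))  # ['Glu', 'Asp'] - two base changes, same protein
```

Two different DNA strings, one protein: that redundancy is why some SNPs change nothing at all.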
We also know that several other genes and SNPs are associated with T1D.

Now that I have said this, I'll need to explain "SNP" and "associated".

SNP stands for single nucleotide polymorphism. Our DNA is made of two complementary chains of nucleotides, for example...

  5'-A T T G C C A-3'
  3'-T A A C G G T-5'

We have a single nucleotide polymorphism when, in a population, we have different "options" for the same location, for example...

  5'-A T T G C C A-3'
  3'-T A A C G G T-5'

in 80% of the population and

  5'-A T T A C C A-3'
  3'-T A A T G G T-5'

in 20% of the population.

It is called "single" because, even though two nucleotides change in our DNA (one on each strand), the change on the complementary strand is automatic. On top of that, since our DNA has a preferential direction in which it must be read, only one change matters.

SNPs can be anything: they can be in a coding gene, in a regulatory region, in a "junk DNA" region.

SNPs can lead to the generation of a different protein (when they change the amino acid they code for) or can change nothing (because of the redundant nature of the genetic code).

SNPs can reflect a functional change in a function we haven't discovered yet.

The initial reaction is "What a mess! What can SNPs be useful for?"

When they are in the coding region of a gene, the answer is pretty straightforward: the SNP can directly tag the gene variant you carry. In other cases, the SNP can be part of a piece of your DNA that moves around with a certain gene and be a good proxy for that gene. In yet other cases, an "apparently random" SNP we can't connect to anything can be correlated/associated with certain diseases in statistical studies called GWAS (Genome Wide Association Studies) - I'll probably come back to those in a later post.

But, as the saying goes, correlation does not mean causation. This is something everyone in the field is fully aware of. When one says that a certain SNP is associated with Type 1 Diabetes, one basically means that this SNP is slightly more frequent in the T1D population than in the non-T1D population. Again, in some cases and for some diseases, the SNP may happen to tag the exact coding location that modifies the protein and is directly linked to the disease. But this is not the rule.

If a SNP is more frequent in the T1D population, it can - in theory - be used to calculate a relative risk increase. In extreme cases, you could read that the presence of SNPxxxxxxx in your DNA means your odds of developing a specific disease are 200 times higher. In other cases, the association is so weak and the uncertainties so large that the SNP is useless by itself.
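"More frequent in the T1D population" boils down to a 2x2 case/control table. A minimal sketch of how the odds ratio is computed, with invented counts:

```python
# Hypothetical counts of a risk allele in cases (T1D) vs controls (non-T1D).
cases    = {"carrier": 300, "non_carrier": 700}   # T1D population
controls = {"carrier": 200, "non_carrier": 800}   # non-T1D population

def odds_ratio(cases, controls):
    """Odds of carrying the allele among cases divided by the odds among controls."""
    odds_cases = cases["carrier"] / cases["non_carrier"]
    odds_controls = controls["carrier"] / controls["non_carrier"]
    return odds_cases / odds_controls

print(round(odds_ratio(cases, controls), 2))  # 1.71: carriers are over-represented in cases
```

An OR of 1 would mean no association at all; real studies also attach a confidence interval and a p-value to that single number, which is where most of the trouble starts.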

So why is there so much research around SNPs? For several reasons:

  • getting SNP coverage is much cheaper than full DNA sequencing.
  • when SNPs match coding genes, the benefit is immediate.
  • some specific SNPs or combinations of SNPs can replace more expensive tests.
  • SNPs make it possible to statistically rebuild a "full" approximate genome (the process is called imputation).
  • combining multiple significant SNPs may make it possible to discriminate otherwise outwardly similar patients.
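The last bullet, combining SNPs, is typically done by summing the log odds ratios of the risk alleles a person carries, weighted by how many copies (0, 1 or 2) they have. A toy sketch, with invented SNP identifiers and invented ORs:

```python
import math

# Hypothetical SNPs and their odds ratios (all values invented for illustration;
# an OR below 1 is protective and contributes negatively).
SNP_OR = {"rs0000001": 2.0, "rs0000002": 1.3, "rs0000003": 0.8}

def genetic_risk_score(genotype):
    """Sum of log(OR), weighted by the number of risk-allele copies (0, 1 or 2)."""
    return sum(copies * math.log(SNP_OR[snp]) for snp, copies in genotype.items())

patient_a = {"rs0000001": 2, "rs0000002": 1, "rs0000003": 0}  # loaded with risk alleles
patient_b = {"rs0000001": 0, "rs0000002": 0, "rs0000003": 2}  # carries the protective variant

print(genetic_risk_score(patient_a) > genetic_risk_score(patient_b))  # True
```

This is the general shape of the "genetic risk score" idea that the Oram paper discussed below builds on, not their actual SNP panel or weights.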


An example

Let's now look at a sample abstract

In this example, the rs763361 SNP belongs to a coding gene, CD226, and has a direct effect on the coded protein. That protein happens to be a glycoprotein involved in immunity (mostly of interest here are the NK and cytotoxic lymphocytes, for which it could promote adhesion to the target). The coding here could be CC, CT or TT, each of them with a different level of association with T1D. The CC genotype would be the "normal" population. The TT genotype would be at some level of risk and the CT genotype at yet another level. Risk is provided as an odds ratio, with a wide confidence interval and a p-value.

This example also shows how things can become tricky extremely quickly. Firstly, CC, CT and TT are three different genotypes of the same single nucleotide polymorphism (the C/T variation itself). Secondly, the 95% confidence interval for the increased risk covers the 1.25 to 4.18 range: from a mild increase in risk to a very significant one. Thirdly, the study would probably not stand alone but is considered in the wider context of other studies.

Not that convincing or useful on its own, but it fits in the bigger picture.


Another Example

In this very recent paper A Type 1 Diabetes Genetic Risk Score Can Aid Discrimination Between Type 1 and Type 2 Diabetes in Young Adults published in Diabetes Care, Oram et al. exploit cheap SNPs to discriminate between Type 1 and Type 2 diabetes in young adults.

The obesity epidemic has unsettled the old stereotypes: a young adult or late adolescent with diabetes is no longer almost automatically a Type 1 diabetic. The Type 2 Diabetes epidemic has reached such a level that confusion is possible, especially in antibody-negative T1Ds.

The paper exploits two "features" of SNPs:
  • the ability to act as proxies for actual genes allows the authors to obtain HLA typing without running expensive tests.
  • the combination of multiple "risk SNPs" results in a strong global risk assessment which, in the presence of diabetes, confirms the Type 1 diagnosis.
"In the presence of diabetes" is intentionally emphasized because, and this is extremely important to keep in mind, even if an extremely high risk of diabetes was computed by this method, it would not have the value of a diagnosis.

This is intuitively easy to understand: if your relative risk of contracting a rare disease, computed from SNP odds ratios, is very high, it does not mean that your absolute risk is also very high. This is a topic that begs for a concrete example - and possibly another blog post - but it was the main reason why the FDA hit personal genomics sites very hard a couple of years ago.
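Since this paragraph begs for a concrete example, here is a minimal one with invented numbers (and ignoring the small correction for carrier frequency): even "200 times the risk" of a disease affecting 1 in 10,000 people still leaves you overwhelmingly likely to never develop it.

```python
def absolute_risk(baseline_prevalence, relative_risk):
    """Approximate absolute risk for a carrier: population baseline times relative risk."""
    return baseline_prevalence * relative_risk

baseline = 1 / 10_000            # an invented rare disease: 0.01% of the population
carrier_risk = absolute_risk(baseline, 200)

print(f"{carrier_risk:.0%}")     # 2% - a scary relative risk, a small absolute risk
```

That gap between relative and absolute risk is exactly what a naive reading of a personal genomics report gets wrong.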

The bonus

If you have made it that far, the time has come for your bonuses (yeah, you get two!)

  • my very own 23andme raw SNP data
  • an Excel spreadsheet with a lot of the currently identified SNPs associated with diabetes and their ORs (all those included in the Oram paper cited above and those used by the Stanford Interpretome). Please note that ORs, p-values and, in some cases, even the risk allele are somewhat vague or uncertain.

Tuesday, March 1, 2016

Artificial Pancreas for Dummies, plus the mandatory whining.

The goal of this post is to go a bit beyond the twenty line press releases served to diabetic communities and try to explain in simple terms some of the concepts and some of the limitations of APs. Please do keep in mind the following limitations about this post itself:
  1. while I read an awful lot, I am not involved in AP research.
  2. now that the smell of dollars is in the air, a lot of the recent research has gone underground. There could very well be a breakthrough lurking somewhere that I am unaware of.
On the plus side, I am totally free to express my opinions, unencumbered by conflict of interest, friends in high places or political correctness. This being said, let’s start.

A simple problem

The problem an artificial pancreas tries to solve is simple: keep your blood sugar in an optimal range. It is so simple that it was solved, somewhat impractically, by Dr Kadish in 1964. This evolved into the biostator (full article) in the seventies and into closed loop research systems such as the ones used in clinical studies or insulin characterization studies. They are impractical for normal life use – think very-noisy-fridge impractical – but invaluable for research where constant levels of either glucose or insulin infusion have to be maintained. So, why has it been so hard to develop a portable AP system? After all, it seems that the machines described above have solved the issue. To understand a bit more, let's look at what controlling a loop implies.

The PID Controller

One way to control a system is through a so called "PID controller" where PID stands for "proportional – integral – derivative". I imagine that some of you are already about to run away at the mere mention of integrals and derivatives but bear with me for a minute: a PID controller is simply a formal way to ask three simple questions (and hope for a correct answer) in a never-ending loop.
  1. where do I stand now compared to where I want to be? Try to adjust.
  2. how did my previous stage 1 corrections work? Try to adjust.
  3. where will I stand in the future if I continue to correct based on 1 and 2? Try to adjust.
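The three questions above map directly onto the P, I and D terms. A minimal sketch, with arbitrary, untuned gains (think "target BG", not a real insulin dosing algorithm):

```python
class PIDController:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0          # memory of past errors
        self.prev_error = None

    def update(self, target, measured, dt):
        error = target - measured                 # question 1: where do I stand now?
        self.integral += error * dt               # question 2: how did my corrections work?
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error                   # question 3: where am I heading?
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy usage: drive a value towards 100, assuming our correction acts immediately.
pid = PIDController(kp=0.5, ki=0.1, kd=0.05)
value = 160.0
for _ in range(50):
    value += pid.update(target=100.0, measured=value, dt=1.0)
print(round(value))  # 100: the loop has settled on the target
```

Note the huge hidden assumption in the toy loop: the correction takes effect instantly and the measurement is exact. That assumption is precisely what diabetes breaks.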
An often used example is the stopping of a car at a red light. When the light turns red, you have the following parameters at your disposal: your speed, the distance from the red light and your past experience in terms of stopping a car (implicitly using the famous f=ma equation). You make an initial decision - how hard you will brake – and adjust it as you go: if you braked too hard or not enough at the start, you will release or increase the pressure on the pedal a bit. Finally, as you approach the red light, you fine tune the braking based on your current deceleration.

That’s not too hard, is it? In fact, you probably have been the PID controller of your diabetes for ages. It could also be said that, when you had a fully working pancreas, it was, in a way, a PID controller itself, constantly adjusting the level of hormones to achieve a stable level.

If we had a complete set of correct information and an immediate way of acting on the parameters, a PID controller could work very well. Robots and drones make heavy use of PID controllers (or more advanced variants). But, in diabetes management, we have neither! Our BG level information is partial and somewhat inaccurate. We can't dose insulin in a timely and precise enough way to be the perfect PID controller.

The clinical devices described above have the advantage of near immediate delivery, directly into the bloodstream, of precise doses of glucose, insulin and glucagon. So, PID would be nice, but we can’t really use it as such in real life. What else can we do?

MPC (and variants such as Robust MPC, Constrained MPCs, etc)

MPC stands for "Model Predictive Control". As you can see from the name, an MPC controller relies on observations, in the framework of models, to issue predictions and act upon them to put you in the desired range. Models exist for glucose homeostasis, insulin absorption and dynamics, and tons of other things.

Going back to our car example, we would use, as elements of our model, the weight of the car, the power of its engine, the grip of its tires, the position of the accelerator pedal and the air resistance to predict where the car would be after 10 seconds if we accelerated on a standard straight road. Or, if we were driving at 160 km/h, we could simply calculate a new accelerator pedal position that would let us slow down to 120 km/h and then again a new position to keep that speed.
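The 160 to 120 km/h scenario can be sketched with a toy car model (every constant here is invented): the controller tries candidate pedal positions, asks the model what each would do over a short horizon, and keeps the best one.

```python
# Toy car dynamics: acceleration from the pedal, minus drag growing with speed.
def simulate(speed, pedal, steps, dt=1.0, accel_gain=0.3, drag=0.05):
    """Model: predicted speed after `steps` seconds at a constant pedal position (0-100)."""
    for _ in range(steps):
        speed += dt * (accel_gain * pedal - drag * speed)
    return speed

def mpc_choose_pedal(current_speed, target_speed, horizon=10):
    """Model Predictive Control, brute force: pick the input whose *predicted*
    outcome lands closest to the target at the end of the horizon."""
    return min(range(0, 101),
               key=lambda p: abs(simulate(current_speed, p, horizon) - target_speed))

pedal = mpc_choose_pedal(current_speed=160.0, target_speed=120.0)
# A real MPC applies this pedal for one step, observes, and re-plans (receding horizon).
```

The whole scheme stands or falls with `simulate`: feed an MPC a wrong model and it will confidently optimize its way to the wrong answer, which is exactly the problem with biological systems discussed below.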

MPC models work extremely well in industry, where they deal with well defined mechanical systems, linear (directly proportional) responses and fairly simple (at least compared to the messy soup a biological system is) chemical reactions.

In the field of diabetes, many of the model parameters have been determined empirically. Their variability has been poorly characterized (in part because studies are small and expensive). They did get a lot of attention though, in part because people who have devoted their lives to the development of models are often emotionally attached to them… but the truth is that they don’t work that well in practice (the old “consider a spherical horse” story)… Why? Here are some of the reasons…
  • human variability: one could roughly say that any biological parameter varies +/- 50%. You can be 140 cm tall just as you can be 210 cm tall and be perfectly fine samples of the human race. Some people metabolize alcohol faster than others, some transport B1 into cells at a faster rate than others, as we have seen in a recent complication study. It boils down to genes, their expression, the regulation of their expression, etc… The same variability occurs at many levels in diabetes: the liver at rest and the liver during exercise are like Dr Jekyll and Mr Hyde.
  • site variability: that’s an easy one. Injections sites aren’t equal. Tubes get clogged.
  • unknowns: there is still a lot we don’t know about many of the physiological mechanisms involved and how they change in different circumstances.
  • physiological: a model can be perfectly valid in a certain range and suddenly become totally invalid in other ranges. 
  • being alive: stress, exercise, intercurrent infections can completely change the dynamics at any given time.
  • mathematical: in some cases, the math just fails.
Toying with parameters in the iHOMA2 

You have also been the MPC controller of your diabetes. Your mental model tells you that you should probably reduce your insulin dose before you exercise, that you are likely to go low after a long shopping afternoon, etc… If you have a wrong mental model, you will have poor control.  The same holds true for a MPC based AP.

Informed or uninformed.

Until now, we have talked about “ignorant” or “uninformed controllers”. If we had both exact information and immediate way to act on it, a perfect PID controller such as the pancreas would not care about being explicitly informed. The indirect information – stress hormones going up for example – is automatically perceived and acted upon.

On the other hand, even a "perfect" MPC controller, if such a thing existed, would care a lot about information. It is all very good to have a model of intestinal carbohydrate absorption but it does not help much if the algorithm does not know how many carbs you just ate. Except, of course, that the data is subject to the same uncertainties, inter-individual variable absorption, etc... The list is, of course, endless but it is obvious that even if we had a perfect MPC controller, we would have to assist it… assist so much that we would become slaves to our controller… An AP should make your life simpler, not harder. There is, we are told, some good news on that side though: heart rate and skin signals are good proxies for exercise and stress, and our sensors are getting smaller and better. Maybe five years from now? (cough, cough)

Mixed models and other approaches

Well aware of the above shortcomings, researchers have developed hybrid models that try to combine the advantages of PID and MPC controllers, while minimizing their drawbacks.

Other approaches such as neural network based pattern recognition have been studied and may, one day (cough, cough again, five years from now?), be actually helpful. This is probably what Medtronic is trying to achieve by partnering with IBM’s Watson team. Artificial Intelligence on Big Data has one big advantage: no one really fully understands how it works… (LeCun) It is a bit like magic and, as such, the perfect destination for all our irrational hopes.

Fractal Control, subspace-based linear multi-step predictors, stochastic differential equations based models, adapted constrained weighted recursive identification methods and GPC, etc…, etc… the list of options is nearly endless. While I do understand the rationale (and sometimes the math!) behind some of those, telling apart the research that can actually have an impact from the utilitarian "let's publish something catchy" kind is extremely hard. You've got to know where your utter incompetence begins and, at this stage, I will draw the line.

The deeper problem with models

Now that we have a rough overview of the basic ideas behind some models, let me give you some food for thought…

How have those intrinsically imperfect models been validated until very recently? Well, you guessed it: they have mostly been validated against a patient model (the UVa/Padova Patient Simulator). That approach is not totally without merit but is far from perfect.

Imagine yourself in a world, constantly covered by thick clouds, that has computers but no theory of gravity. As you work to discover Newton's Laws, another team implements a solar system model based both on the same starting set of assumptions and, as you progress, on your recent discoveries. Then, from time to time, you double-check your theory against the model, exchange ideas and restart another cycle. Chances are they will agree. But what does that really mean? What will change when you rise above the clouds and discover the real solar system?

In the past, a lot of the control algorithms research went a bit like this:
  1. X develops a model of glucose absorption, insulin action, insulin-glucagon interaction… That model is partially, or even mostly, validated for a single meal in stable conditions.
  2. Y develops a control model based on X's model.
  3. Z develops a patient model based on X's model.
  4. Y tests its control model on Z's patient model and promptly publishes an enthusiastic paper.
(fun tidbit: in the diabetes world, X, Y and Z could actually be the same person!)

Can you spot the catch? Of course, there's no intention to mislead and researchers are fully aware of the problem. That is why they are constantly fine-tuning and updating both models. Until recently, they had no choice. It was practically and ethically impossible to run tests on real patients. That model-on-top-of-a-model research did yield interesting results, which were widely echoed. But limitations rarely make it into enthusiastic press releases, whose cumulative impact has now set AP expectations too high.

Lastly, the idea that an ideal model can be approached and that all patients will somehow find themselves in some Gaussian cloud around that ideal line at all times is, in itself, probably deeply flawed (possible material for another blog post).

Hey, whiner, where do you stand? Complexity or simplicity?

At this point, you may think that I am just a random guy being grumpy and overly negative about AP control algorithms… Well, there is some truth to that. So what do the others say?

Well, the published results so far have been nicely summarized in the recent coverage of the annual JDRF report.

"At ATTD, Dr. Buckingham also shared the first-ever insulin-only data on the Bionic Pancreas. The headline? The insulin-only system showed roughly similar efficacy in pilot studies to other published systems: an average glucose of ~154-161 mg/dl (depending on the target glucose), with just ~1-3% of the time spent <70 mg/dl."

Depending on your mood, the glass is either half full or half empty… On one hand, it seems the AP is finally coming. On the other hand, it is a bit disappointing to see that, after so many years and so many nice looking model-on-model results, we are still seeing real life results that are above the recommended guidelines. Yes, things will improve. They always do. CGMs could be better than they are today if we did not have to suffer the consequences of patent wars. Patience. Let's wait another five years (cough, cough).

But beyond the disappointing results, the fact that the bionic pancreas system showed roughly similar efficacy to other single hormone AP systems is a bit worrying. It seems to indicate that, despite using different approaches and controller mixes, there is some kind of fundamental block in the current strategy. Possibly the block evoked in the previous paragraph. Possibly something less fundamental that some algorithm I do not understand will overcome.

Am I overly pessimistic? At least, I am not alone.

As H. Kirchsteiger summarized in the recent “Prediction Methods for Blood Glucose Concentration” (note: some of the book chapters are way too hard for me to understand fully)

“Unfortunately, in spite of 40 years of research, the results of this “artificial” or “virtual pancreas” are still not there where they should be and simple safety rules –e.g., avoiding insulin infusion during or near to hypoglycemia- seem to be able to offer the largest parts of the benefits of closed loop control in a much simpler way as well”

That is, in a way, a confession. The kind of confession you can only make behind semi-closed doors, to an audience that knows that you are correct. Maybe a low glucose threshold suspend is all we need after all. Maybe the comparatively unsophisticated DIYPS will match the fancy research results.

One other aspect I’d like to address is the issue and perception of risk. It can be summarized, in the average patient opinion, by this statement.

“My CGM was off by 80 points this morning! How could I ever trust an AP relying on crappy data”.

Rest assured that CGMs off by 80 points piss off AP researchers tremendously as well. However, those guys are not stupid. They don't treat CGM values as gospel. Fighting on their side are many mathematical tools that can give them a good idea of the trustworthiness of what the CGM reports. Is the signal of the sensor noisy? Does what the sensor reports diverge too much from what the model (!) or the recent past situation indicates? Are we in the presence of a known issue such as a severe compression? Is what we are seeing even remotely physiologically possible? Etc…
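A hedged sketch of that kind of sanity checking (the thresholds below are invented for illustration; real systems use far more sophisticated filters):

```python
def is_plausible(readings, new_value, dt_minutes=5.0,
                 max_rate=5.0, lo=20.0, hi=600.0):
    """Crude plausibility check for a new CGM reading.
    max_rate (mg/dl per minute) stands in for a physiological rate-of-change limit."""
    if not (lo <= new_value <= hi):
        return False                   # outside any credible reportable range
    if readings:
        rate = abs(new_value - readings[-1]) / dt_minutes
        if rate > max_rate:
            return False               # faster than physiologically credible
    return True

history = [110.0, 112.0, 115.0]
print(is_plausible(history, 118.0))    # True: a believable drift
print(is_plausible(history, 40.0))     # False: a 75 mg/dl drop in 5 min - compression low?
```

A controller would not blindly correct on the rejected reading; it would wait, widen its uncertainty, or fall back on what the model and recent history suggest.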
There’s also the notion of the potential “cost” of a decision. Not the financial cost – although it could be, if the company is sued over a wrongful death. Not, strictly speaking, “cost” as it is defined in control system theory (although it is related), but the potential clinical cost or risk of a wrong decision.

How costly is a decision? Failing to predict that your BG will reach 300 and “only” correcting for 200 carries a small health cost. Failing to predict that an 80 is heading to 56 potentially has a huge cost. The risk associated with a decision has a tremendous impact on whether that decision is taken or not. That extreme risk aversion explains why all current clinical tests seem to converge around the same values. It is also the demonstration that the current approaches are nowhere near where we hoped they would be in 2016.
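One published way to encode that asymmetry is Kovatchev's blood glucose risk transform (the basis of the LBGI/HBGI indices), which rises much more steeply per mg/dl on the hypo side than on the hyper side. A sketch using the published formula:

```python
import math

def bg_risk(bg_mgdl):
    """Kovatchev's symmetrizing risk transform: ~0 at 112.5 mg/dl,
    rising towards 100 at both extremes, much faster on the hypo side."""
    f = 1.509 * (math.log(bg_mgdl) ** 1.084 - 5.381)
    return 10.0 * f * f

# Per mg/dl, the hypo side is far more "expensive" than the hyper side:
hypo_slope = (bg_risk(56) - bg_risk(80)) / (80 - 56)
hyper_slope = (bg_risk(300) - bg_risk(200)) / (300 - 200)
print(hypo_slope > hyper_slope)   # True: missing an 80 heading to 56 costs more per point
```

A controller minimizing this kind of cost will naturally err on the high side, which is one reason the clinical results cluster where they do.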

In fact, I am willing to bet that, if we had held a referendum asking people what their “safest” value is, regardless of other concerns, it would have been in the 140 to 160 range.
In short, there are real risks: an AP can go down the wrong path. Just like a pump can clog. But rest assured that most, if not all, teams have the simple cases covered and a wide safety margin.

Final Words
I feel obliged to mention that I am aware that some of the artificial pancreas teams have made public claims of better results (but have not published results or launched clinical trials). While I do not doubt the integrity of the people who made those claims, their sample population is obviously extremely biased: people who have access to either the open source artificial pancreas or pre-clinical-trial models are, without any doubt, at the very top of the pyramid of diabetes management abilities. It remains to be seen whether those mouth-watering claims can be replicated in an average population.