Turing and Shkreli - at the risk of being somewhat unpopular: while I understand the national reaction to the Turing price raise, I find a lot of the reaction in biotech and biotech-investor land a little odd given the recent behavior of pretty much all of biotech and Big Pharma. There is a lot of rent seeking going on - from companies just a little more subtle than Turing (e.g. JAZZ), to routine large yearly price raises on blockbusters that came out 20 years ago, to patent-extension attempts that are, at best, cynical.
And, FWIW, in light of my views I somewhat cynically choose to make my own money off the biotech sector (because of the above), while hoping the rules change to allow more competition and thus deflate my own money maker. Shkreli was just doing it bigger and with more tone-deafness than most of us?
Setting the p value discussion on the right track (but note that there are other factors important to patients that affect N):
Recently it has become common to make the following logical argument:
a) the threshold at which a p-value is called stat sig is arbitrary (and was picked by Fisher);
b) therefore we can change what we call 'stat sig' if there is need.
Although that is what Fisher (the father of statistics, but also a renowned god-complex individual) would probably like us to believe, it is mathematically incorrect - because in reality the aggregate predictive value of a stat sig result is driven by the base rate of false hypotheses. In the case of biotech that is the fraction of INDs for which the original drug/protocol did not work (generally meaning Adverse Events worse than benefit).
See here for a good, first-order numerical example. But note that the author assumes only a 90% base rate of bad drug/protocol at IND. The reality is that probably closer to 99% either fail or need substantive protocol changes after IND in order to provide measurable overall benefit.
So let's run some numbers:
a) Assume, generously, 90% of drugs in their original IND protocol are actually meaningfully bad for QOL, and another 9% are close to 'neutral'. And 1% are good in their original protocol.
b) If approval is one trial p<0.05 - then when you take your next drug you should expect 2.5x greater chance of harm than of benefit.
c) If approval is one trial p<0.15 - then when you take your next drug you should expect 8x greater chance of harm than of benefit.
d) If approval is per FDA guidelines (2 trials p<0.05) - then when you take your next drug you should expect 14x greater chance of benefit than harm.
I.e. the knee in the curve is probably somewhere between 1 and 2 stat sig (0.05) trials.
So with this base rate, approving drugs on lesser evidence because some patient set is in more need is very likely to cause those "in need" patients additional harm.
The only way around this is if there are some classes of drugs for which the base rate of worthlessness/harm is a lot lower. Clearly not all diseases/drug-classes are the same (e.g. psychiatry is clearly worse, cancer better - Nature Biotech did a survey a while ago on this, although I no longer remember the details).
Let's assume that a particular disease/drug class has only a 50% rate of harm, 30% neutral, and 20% beneficial (other than me-toos I can't think of any drug/disease pairs that would do better than this).
1) If approval is 1 ITT trial p<0.05 on prespec primary endpoint - then when you take a drug of this class you should expect 13x greater chance of benefit than harm.
2) If approval is 1 ITT trial of p<0.15 on prespec primary endpoint - then when you take a drug of this class you should expect 4x better chance of benefit than harm.
3) If approval is 1 ITT trial of p<0.3 on prespec primary endpoint - then when you take a drug of this class you should expect 50:50 chance of benefit vs harm.
And note that there probably are diseases/drug-class pairs where the base rate approximates this. E.g. monogenic diseases where the disease protein is fairly static and the drug makes a meaningful change in protein levels.
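The arithmetic behind both scenarios above is a one-line Bayes calculation. A minimal sketch - with my own assumptions filled in, since the post doesn't spell them out: a harmful drug passes a trial with a one-sided false-positive probability of half the quoted p threshold, a genuinely good drug passes with ~90% power, and neutral drugs are ignored in the ratio:

```python
# Post-approval benefit:harm odds under a given approval bar.
# Assumptions (mine, not the post's): one-sided false-positive rate of
# p_threshold / 2 per trial for a harmful drug; ~90% power per trial for
# a good drug; neutral drugs dropped from the ratio.

def benefit_harm_odds(p_harm, p_good, p_threshold, n_trials, power=0.9):
    """Odds of benefit vs harm among drugs clearing n_trials positive trials."""
    pass_if_harmful = (p_threshold / 2) ** n_trials  # passes by chance alone
    pass_if_good = power ** n_trials                 # passes because it works
    return (p_good * pass_if_good) / (p_harm * pass_if_harmful)

# Scenario 1: 90% harmful / 9% neutral / 1% good at original IND protocol
print(1 / benefit_harm_odds(0.90, 0.01, 0.05, 1))  # harm:benefit = 2.5
print(benefit_harm_odds(0.90, 0.01, 0.05, 2))      # benefit:harm ~14.4

# Scenario 2: 50% harmful / 30% neutral / 20% good
print(benefit_harm_odds(0.50, 0.20, 0.05, 1))      # ~14 (post quotes ~13; depends on assumed power)
print(benefit_harm_odds(0.50, 0.20, 0.15, 1))      # ~5 (post quotes ~4)
```

Under these assumptions the knee in the curve falls out directly: in scenario 1, one trial at p<0.05 leaves 2.5:1 odds of harm, while two trials flip it to roughly 14:1 odds of benefit.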
But, finally, as I noted in the beginning - there are other factors besides p value that influence the necessary trial size. In particular small trials cannot detect uncommon but extremely severe adverse events, and because small n = huge uncertainty it is, essentially, impossible to figure out which drug is meaningfully best.
AKBA
AKBA -
FDA and trials (not strictly tied to the recent changes at the FDA) - too lenient or too stringent?
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2641547
This is yet another factor bearing on the topic - one that has had several recent mainstream articles written about it and some biotechie Twitter exchanges. What I find particularly interesting about the whole stream is that the first article I saw on it claimed this proved what the author had always been saying - that the FDA is too stringent - despite the fact that that is not, in fact, what the paper says. But it provided a good Rorschach test.
BTW - my opinion on the paper is that it is the right kind of approach, but it needs to be replicated, and it probably needs to consider a few other factors. But I suspect it deals fairly with a variety of issues not generally acknowledged - e.g. the fact that early (risky) approval of a few bad drugs could quickly outweigh the benefit of early (risky) approval of a few good drugs (partly because it takes so long to get bad drugs proven bad and then removed).
FDA – Easier or not, second perspective on efficacy criteria (2010 vs 2014)
Another way to look at the data is to look at the tails of the curves (in this case efficacy in 2010 vs 2014). Looking at the tails exaggerates the differences.
So:
Step 1: divide up every 2010 and 2014 NME approval into 3 different strengths of statistical proof of efficacy:
1) Good Statistical Proof of Efficacy per FDA Historical Standard - proof of efficacy that the FDA has always considered good: a) very strongly stat sig on the prespecified primary endpoint of one trial (p<0.001); or b) strongly stat sig (better than 0.01) on the prespecified primary endpoint of an RCT, with supportive RCTs; or c) strongly stat sig in a single-arm trial against a prespecified threshold; or d) stat sig vs a prespecified endpoint in 2 RCTs; or e) in an ultra orphan (often taken as fewer than ~10,000 patients), stat sig on the prespecified primary endpoint in one RCT.
3) The What The Heck Category - a complete statistical/protocol mess: no primary endpoint (believe it or not this happens), a primary endpoint of no known clinical benefit (and with, at best, weak clinical efficacy measurements in trials), single-arm trials only with no reliable threshold (or historical data), …
2) Moderate Statistical Evidence of Efficacy - Everything in between 1 and 3. Note that there is still some subjectivity in the above – but to repeat what I have said previously in this thread, my goal is not high precision. Instead my goal is to detect any large change in FDA behavior and at least partially characterize any change – and high precision shouldn’t be necessary to meet these goals.
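The three buckets above amount to a decision rule: bucket 1 if any of conditions (a)-(e) hold, bucket 3 if the statistics/protocol are a mess, bucket 2 otherwise. A sketch, with field names that are my own shorthand rather than anything from the post:

```python
# Sketch of the 3-bucket NME evidence classification described above.
# Field names are my own shorthand, not the author's.
from dataclasses import dataclass
from typing import Optional

@dataclass
class NMEEvidence:
    best_primary_p: Optional[float]    # best p on a prespecified primary endpoint; None = no primary endpoint
    n_stat_sig_rcts: int               # RCTs stat sig (p < 0.05) on the prespecified primary
    supportive_rcts: bool              # supportive RCTs behind a p < 0.01 primary win
    single_arm_strong_threshold: bool  # single arm strongly stat sig vs a prespecified threshold
    ultra_orphan: bool                 # roughly fewer than ~10,000 patients
    meaningful_endpoint: bool          # primary endpoint of known clinical benefit
    reliable_comparator: bool          # control arm, prespecified threshold, or solid historicals

def classify(e: NMEEvidence) -> int:
    # Bucket 1: FDA's historical standard of good statistical proof (any of a-e)
    if ((e.best_primary_p is not None and e.best_primary_p < 0.001)
            or (e.best_primary_p is not None and e.best_primary_p < 0.01 and e.supportive_rcts)
            or e.single_arm_strong_threshold
            or e.n_stat_sig_rcts >= 2
            or (e.ultra_orphan and e.n_stat_sig_rcts >= 1)):
        return 1
    # Bucket 3: statistical/protocol mess
    if e.best_primary_p is None or not e.meaningful_endpoint or not e.reliable_comparator:
        return 3
    return 2  # Moderate Evidence: everything in between
```

As the post says, the residual subjectivity lives in how the boolean fields get filled in for a given NDA; the rule itself is mechanical.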
Results:
2010 – of ~37 NME NDAs, the FDA approved 2 Moderate Evidence drugs and 2 What the Hecks (one of those in a very ultra orphan, based solely on 24 case studies)
2014 – of 47 NME NDAs, the FDA approved 7 Moderate Evidence drugs and 3 What the Hecks (one of those in a very ultra orphan, with very poor trial protocols)
Commentary: it is in the “Moderate Evidence” category that the extra approvals happened in 2014 vs 2010. There are, of course, two possible explanations: a) the FDA has moved the bar, or b) the companies are submitting fewer What the Hecks and more Moderate Evidence NDAs. So one more comment: of the 7 Moderate Evidence NMEs approved in 2014, three were oncology drugs (e.g. Keytruda) approved with very high ORRs or low HRs in PFS but without strong pre-specifications (in 2010 there was only one). Given the huge advances in oncology over the last 5 years, I think it is safe to say that the oncology piece is explanation (b) - the companies are submitting more Moderate Evidence NDAs because the drugs are so clinically effective.
Conclusion based upon this look – the FDA does seem to have become somewhat less stringent about strict statistical rules for efficacy, but for many of the 'extra' drugs there is no appreciable risk that they have little efficacy. I.e. at some point, with enough efficacy (e.g. HR<0.2 and p<0.00001 vs a generally accepted clinical endpoint like PFS for AA), it is a virtual certainty the drug is useful even without a good prespecified protocol. In fact a reasonable portion of the ‘extra’ approvals look like Breakthrough Designation going all the way to approval. But still, in aggregate, there is a somewhat higher percentage of moderately risky approvals (e.g. Neuraceq for imaging Alzheimer’s plaques – where two of the five readers couldn’t identify the required artifacts).
FDA – Becoming Rubber Stamps or not?
Matt Herper wrote an interesting article highlighting the difference in the percentage of NME NDA/BLAs approved in 2014 (and, so far, in 2015) vs earlier years. There has been a fair amount of subsequent debate as to how much, if any, of that is the FDA getting easier. For example, he wrote a follow-up piece here about other factors that could improve the approval percentage without the FDA becoming easier. Nonetheless I have yet to see a substantive, fact-based discussion of this topic – have they become easier, or are other factors at play? No doubt part of the reason for the lack of a fact-based debate is that many of the facts are hidden (e.g. not all CRLs are announced, and some don’t have AdComms either), and even with facts much of the interpretation is nuance. It should still be possible, though, to suss out major changes in the FDA’s approval strictness.
Over the next few weeks, as I have time, I’ll write up some comparisons of 2010 (a year with a very low rate of approvals per Herper’s article) vs 2014. Note that I will generally be dividing up the analysis into
a) Proof of clinical efficacy – e.g. post hoc subgrouping, # stat sig trials, strength of thresholds vs historical comparisons for single arm trials, strength of surrogates, … .
b) Proof of do-no-harm – e.g. the size of the placebo population (because the only way to detect many side effects is RCTs), or, when comparing to historicals for a known issue, the size of the treated population.
Ideally I would actually evaluate the ratio of the above two items for each drug since, in theory, that is what the FDA should be evaluating. But that is particularly difficult and thus I will generally keep the evaluations separate.
Core Data - Herper’s article (stats from BioMedTracker) says:
88% NME approval in 2014. FDA says 41 NMEs approved. So about 47 total NME NDAs and 6 CRL’d
57% NME approval in 2010. FDA says 21 NMEs approved. So about 37 total NME NDAs and 16 CRL’d
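The totals above follow directly from the two approval percentages; a quick sketch of that arithmetic:

```python
# Back out total NME NDA counts from Herper's approval rates and the FDA's
# approved counts (numbers as quoted above; the rounding is inherently fuzzy
# since BioMedTracker and the FDA count NMEs differently).

def implied_filings(approved: int, approval_rate: float) -> int:
    """Total filings implied by an approved count and an approval rate."""
    return round(approved / approval_rate)

total_2014 = implied_filings(41, 0.88)
total_2010 = implied_filings(21, 0.57)
print(total_2014, total_2014 - 41)  # 47 6   -> ~47 NDAs, ~6 CRL'd
print(total_2010, total_2010 - 21)  # 37 16  -> ~37 NDAs, ~16 CRL'd
```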
My first metric of interest in the 2010 vs 2014 comparison: although 2010 had a lot of CRLs, many of those rejections were easily fixed. I.e. of the 9 2010 NME AdComms that resulted in a 2010 CRL, 4 were subsequently approved in 2011 (i.e. with no new trials, only some additional analysis or additional data from existing trials) – a 44% easy-fix ratio. If in 2010 the FDA had just accepted those drugs, their rate of approval would have been 76%. Probably within the noise level of the 2014 rate of approval. The point here is that the FDA doesn’t have to utterly abandon their obligations to get the rate of approval up substantially.
Note: no doubt BioMedTracker isn’t using the same counts of NMEs that the FDA is using. And thus the math doesn’t work perfectly. But, again, I am not hoping to be precise here since even with precise numbers the reasons for acceptance or rejection are fuzzy anyway.
Interesting note: of the 9 2010 NMEs with AdComms that got 2010 CRLs, only 2 haven’t yet been approved.
TRVN - results commentary.
I said in #msg-108701979 that they were probably trying to thread the needle with the trial that just finished. I.e. by allowing the patients to control their own dosing, they hoped to get both better pain relief and a better side-effect profile than morphine (in the first randomized trial, with fixed doses, they definitely had much better pain relief, but worse average nausea).
But it looks like, even though they adjusted the dose at the trial's interim, they were unable to accomplish this. Instead they just traded their better pain relief for a better side-effect profile. Not obvious that that is a huge seller.
FGEN
To drag out a nit (consistent with my calling it a pet peeve -g-):
21st Century Cures Act and MOA
Part of the 21st Century Cures Act is supposed to be about using more MOA data as the basis for approval, and there have even been some scientists saying this would be a good idea. Recently I tweeted that clearly it must be easy, since there are a lot of biotech-investor billionaires who can predict trial success from MOA (sarcasm). But Derek Lowe has done a much more in-depth exposé pointing out that even "well understood" mechanisms really are NOT well understood (e.g. T2D).
http://blogs.sciencemag.org/pipeline/archives/2015/08/07/on-the-fda-drug-approvals-and-not-knowing-enough
For the record, I've said forever that predicting efficacy based upon a belief that you understand the pathways etc. is almost always fairly unreliable - particularly so in complex, interlinked control loops.
SRPT vs BMRN ph 1/2 comparisons:
Because I know you'll be interested, part of the comparison I talked about earlier between Etep and Dris:
Key protocol and baseline data:
Age range: Dris: 9.5 +/- 1.7; Etep ITT 8: 9.4 +/- 1.0. (Commentary: this is roughly a wash, since Dris had one 5-year-old but also two 11-year-olds.)
Starting 6MWD: Dris: 383m +/- 113m; Etep ITT 8: 371m, with the range somewhere within 261 to 456m. (Commentary: again this is probably roughly a wash, since Dris had 2 start over 500m but also multiple under 300m, including a 260m and a 230m.)
Caveat: Sarepta's numbers are not all self-consistent - some of the earlier presentations had somewhat different ages etc. This, like the same image being used with 2 different labels in one of their papers, indicates that, at best, Sarepta is sloppy. So take the Sarepta numbers with that in mind.
Compare the 'mITT' groups (removing non-ambs in the first year) - to make the Sarepta fans happy I've used their preferred analysis first:
Etep: -76m
Dris: +32m
Compare the ITT groups (the 8 continuously treated Etep vs the 10 Dris):
Etep: -132m
Dris: -25m (and this despite 2 kids becoming non-amb - i.e. the mean of the other 8 actually improved)
Commentary: that is a HUGE difference.
Compare Dris' ITT vs Etep's mITT - a comparison stacked hugely in Etep's favor, because I am comparing Etep's best to Dris' worst:
Etep: -76 meters
Dris: -25 meters
Commentary: I've stacked everything I know of against Dris in this case and it still comes out ahead. Proof? Absolutely not. But probably stronger than the 'proofness' of Sarepta's data against historicals. Is it enough proof by itself that Dris is a better drug that I'd gamble on it? Again, no - just as I think the Etep data isn't good enough by itself. But with other data, maybe.
MGNX - Notes (with comments) from macrogenics 8/5/15 end of quarter cc.
Margetuximab ph3 (SOPHIA) - first patient now enrolled. 3rd line; patients will have already progressed on HER2 therapy. Trying to start a ph1/2 study in gastric cancer later this year - would be a combination study.
MGA271 - expect to start enrolling the PD-1 combo in Q2. Plan to present the monotherapy cohorts in Q4 - although the later cohorts are still enrolling, so data will not be complete. One question (from Philomena Cavia) about MOA. The response was that they don't know right now which MOA provides the primary benefit in the clinic; his hope is that as they gather tissue samples they will be able to pin it down. (Comments: I like this answer much better than his previous assertions that it is a checkpoint inhibitor, in direct conflict with his own mouse data - which show absolutely unambiguously that without ADCC engagement their own molecule has no anti-tumor effect. It might be different in humans - but you need data to prove that, not just assertion. That said, at this stage I am not sure it really matters as long as it is synergistic with checkpoints - which is already high probability just due to potent ADCC enhancement. Note: my operating hypothesis (mostly based upon preclinical data) is that, effectively, the tissue-selective ADCC mab focuses the action while the checkpoint removes the brakes.)
MGD006 is enrolling patients in dose escalation. This is their only DART without an Fc component, and it thus has a short half-life. They had to expand beyond 1 site because of slow enrollment, and they expect to expand further to sites in Europe in 2016, with additional cancer types. They do have an Fc version of this molecule, but they wanted their first DART in the clinic to have a short half-life for safety reasons.
MGD007 (gpa33 vs cd3) is enrolling in dose escalation for colorectal.
MGD011 (CD19 vs CD3) - ph1 is enrolling various lymphomas etc.
MGD009 will be in the clinic by year end, but they still haven't disclosed its molecular targets.
New monies: they are using the new monies from the raise for a variety of things, but two are an enlarged manufacturing facility and their new initiative on checkpoints. (My comment: I sorely hope they can avoid being yet another "me-too"; I'd personally suggest it would be smarter to do more ADCC to synergize with everyone else's checkpoints.)
Random note: They said they are not at liberty to discuss the progress in many of the collaborations.