Sunday, April 24, 2016 11:08:27 PM
What to expect at midterm review
I recently had a chance to work with the original clinical design document of the Neuvax Phase 3 PRESENT trial (dated 2011). Some of this information may be outdated, but it can serve as a baseline for our expectations at this juncture in the clinical process. There are some indications that there have been amendments, so keep that in mind.
Futility Test
A hazard ratio of 0.9 (approximated here as vaccine arm recurrences divided by control arm recurrences) corresponds to an efficacy threshold of 10%. With 70 total recurrences, Neuvax clears this simple futility test with a 33:37 split between the vaccine and control arms (33/37 is about 0.89).
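As a quick check of that arithmetic, here is a minimal sketch. It assumes 1:1 randomization and similar follow-up in both arms, so that the ratio of recurrence counts is a fair stand-in for the hazard ratio. The function name is my own and does not come from the design document.

```python
def crude_hazard_ratio(vaccine_events: int, control_events: int) -> float:
    """Ratio of event counts. This approximates the hazard ratio only under
    1:1 randomization with similar follow-up in both arms (an assumption)."""
    return vaccine_events / control_events

FUTILITY_HR = 0.9  # futility threshold described in the 2011 design

# Possible splits of 70 total recurrences between vaccine and control
for vaccine, control in [(35, 35), (34, 36), (33, 37)]:
    hr = crude_hazard_ratio(vaccine, control)
    verdict = "clears" if hr < FUTILITY_HR else "does not clear"
    print(f"{vaccine}:{control} -> HR ~ {hr:.3f} ({verdict} the 0.9 bar)")
```

A real analysis would estimate the hazard ratio from time-to-event data (for example with a Cox model), so treat the count ratio as a rough guide only.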
This original part of the design was probably amended, if we are to go by CMO Bijan Nejadnik's statements in the most recent conference call. The bar now seems to be set at a conditional power of 0.15, meaning that the trial must have at least a 15% chance of reaching its stated endpoint goal (38% recurrence reduction at 141 total recurrences).
That is still a pretty low threshold; it might as well be a rubber stamp. There should be essentially no chance of a stoppage for futility at this stage if Neuvax works even a tiny bit.
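For those who want to see how a conditional power figure can be computed, here is a rough sketch using the standard B-value (Brownian motion) approximation. Everything in it is an assumption on my part: 1:1 randomization, the count ratio standing in for the interim hazard ratio, a final two-sided 0.05 test, and the design effect (hazard ratio 0.62, i.e. a 38% reduction) holding for the rest of the trial. The source does not say which assumption the monitoring board actually uses, so `assumed_hr` is left as a parameter.

```python
from math import log, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def conditional_power(vaccine_events, control_events, total_events=141,
                      assumed_hr=0.62, two_sided_alpha=0.05):
    """Chance of a significant final result given the interim split, if the
    true hazard ratio for the remainder of the trial is `assumed_hr`."""
    d = vaccine_events + control_events
    t = d / total_events                       # information fraction so far
    # Interim z-score (positive = vaccine doing better), log-rank approximation
    z_interim = -log(vaccine_events / control_events) * sqrt(d) / 2
    b_value = z_interim * sqrt(t)
    # Expected final z-score if the assumed hazard ratio is true
    drift = -log(assumed_hr) * sqrt(total_events) / 2
    z_crit = _N.inv_cdf(1 - two_sided_alpha / 2)
    return 1 - _N.cdf((z_crit - b_value - drift * (1 - t)) / sqrt(1 - t))

for split in [(33, 37), (35, 35), (37, 33)]:
    print(split, f"conditional power ~ {conditional_power(*split):.2f}")
```

Passing a less optimistic `assumed_hr` (for example the hazard ratio observed so far) lowers the result, which is why the choice of assumption matters when reading a 0.15 bar.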
Efficacy Stoppage Rules
Most of you may not be familiar with the term "error spending," but the concept should be fairly easy to grasp. Type I error, or alpha, is the chance of mistakenly rejecting the null hypothesis of a study. In other words, it is what is commonly seen as the target "p-value" of a study, the bar for statistical significance. Almost always, this is set at 0.05 (p < 0.05).
In a typical study designed for interim efficacy endpoints, a certain amount of this final error threshold is "spent" at each interim review. The total amount of error spent by the end should add up to the target value (almost always 0.05). To give you an example of how this works, here is a hypothetical study with 2 interim reviews at the 1/3 and 2/3 completion time points. The spending function is the standard O'Brien-Fleming, which is widely used in trials that plan for interim efficacy endpoints.
Completion (%)    p-value for halt    Error spent (total = 0.05)
  0               0.0000              ---
 33               0.0002              0.0002
 67               0.0121              0.0119
100               0.0500              0.0379
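The figures in the table above can be reproduced, as far as I can tell, with the Lan-DeMets O'Brien-Fleming-type spending function applied to a one-sided 0.025 and doubled for a two-sided 0.05. That is my assumption about how they were generated, not something the design document states. Strictly speaking, what the function returns is cumulative error spent. The exact nominal p-value boundaries need a further numerical step and come out slightly different at the later looks.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def obf_cumulative_alpha(t, two_sided_alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function: cumulative
    two-sided alpha spent by information fraction t (0 < t <= 1)."""
    one_sided = two_sided_alpha / 2
    z = _N.inv_cdf(1 - one_sided / 2)
    # One-sided spending is 2 - 2*Phi(z / sqrt(t)); double it for two-sided
    return 2 * (2 - 2 * _N.cdf(z / sqrt(t)))

fractions = [1 / 3, 2 / 3, 1.0]
cumulative = [obf_cumulative_alpha(t) for t in fractions]
# Error spent at each look = this look's cumulative minus the previous one
increments = [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]

for t, c, inc in zip(fractions, cumulative, increments):
    print(f"{t:5.0%}  cumulative = {c:.4f}  spent at this look = {inc:.4f}")
```

The shape is the point: almost nothing is spent early, and most of the 0.05 is saved for the final analysis.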
The issue with the Neuvax Phase 3 PRESENT design is that it does not seem to have been built with early efficacy stopping in mind. Instead of the full 0.05 of final Type I error being available, the total amount of error allocated to interim reviews is a mere 0.001 ("this alpha spending function will collectively spend 0.1% of alpha on interim analyses"). And because multiple interim looks are planned, not even that full 0.001 is available at the first look.
In fact, the trial design calls for spending "0.01% of the type I error at each safety and efficacy analysis." This makes the p-value target 0.0001 for the first review, 0.0002 for the second, 0.0003 for the third, and so forth until we get to 0.001 for the 10th review. The trial design calls for 10 years of data collection with "up to 20" interim assessments.
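The schedule described above is simple enough to tabulate. This sketch just accumulates 0.0001 per look until the 0.001 interim budget is used up. Treating the cumulative figure as the p-value target at each look follows the simplification used in this post. How the design handles any looks beyond the tenth is not something I can tell from the text, so the sketch stops there.

```python
PER_LOOK_ALPHA = 0.0001   # "0.01% of the type I error" per analysis
INTERIM_BUDGET = 0.001    # "0.1% of alpha" across all interim analyses

def cumulative_schedule(per_look=PER_LOOK_ALPHA, budget=INTERIM_BUDGET):
    """Cumulative alpha after each look, until the interim budget is used up."""
    n_looks = round(budget / per_look)
    # Round to avoid floating-point noise such as 0.00030000000000000003
    return [round(per_look * k, 6) for k in range(1, n_looks + 1)]

schedule = cumulative_schedule()
for look, alpha in enumerate(schedule, start=1):
    print(f"look {look:2d}: cumulative alpha = {alpha:.4f}")
```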
A p-value of under 0.0001 will be very, very difficult to attain with the 70 events at the first interim look, but what takes this from the unlikely to the nigh-impossible is the fact that, for some reason, interim efficacy stoppage is established by OS (overall survival), not DFS (disease-free survival). Overall survival is defined simply as the number of people left alive at a given time point, while disease-free survival requires people to be living and disease-free. This means that the people who have had recurrences but are still living get to be counted under "overall survival." In these kinds of adjuvant studies (studies to prevent cancer recurrence after successful primary treatment), separation of DFS curves can happen relatively quickly, but it can take many years before OS curves show significant divergence. Median overall survival time after breast cancer recurrence, according to data from the previous decade, is in excess of 4 years.
Comparison of DFS and OS plots from a trastuzumab trial.
There is zero divergence in the OS curves at the 2-year mark.
Designing interim endpoints around completely different criteria from the primary endpoint is a major pitfall of study design, especially when the interim evaluation is tied to data that takes so much longer to properly accrue. This is a study that demonstrates exactly that.
The primary metrics were response rate and progression-free survival (metastatic setting), but because interim efficacy stoppage was only concerned with overall survival, the study failed to receive an interim halt for efficacy. The eventual data ended up very, very strong (p < 0.0001), but because they used the criterion of overall survival instead of progression-free survival, the experimental and control groups did not show adequate divergence at the interim look.
To bring this back to Neuvax: there is, objectively, a clause for possible interim efficacy stopping. However, it seems to be a possibility that Galena never really expected or planned for. The reasons are as follows:
1) Instead of allocating the full final Type I error of 0.05, they chose to allocate only 0.001, split into 0.0001 increments. A p-value target of 0.0001 with an interim data pool of 70 events is a tall order.
2) In a trial whose primary endpoint measures DFS, interim efficacy stoppage is tied to OS, a metric that takes much longer to reach statistical confidence. In an adjuvant trial where every patient starts "disease free" so as to measure reductions in recurrence rates, it is very likely that there will be little to no divergence in overall survival outcomes in the 1-2 year time frame.
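To put the 0.0001 target from point 1 in perspective, here is a back-of-the-envelope sketch of how lopsided the data would have to be. It uses the usual log-rank approximation z = -ln(HR) * sqrt(events) / 2, and it assumes 1:1 randomization and a two-sided p-value. The document I saw does not spell out either, so take the output as a rough guide.

```python
from math import exp, sqrt
from statistics import NormalDist

def hr_needed(events: int, two_sided_p: float) -> float:
    """Largest hazard ratio that still reaches the given two-sided p-value,
    using z = -ln(HR) * sqrt(events) / 2 (assumes 1:1 randomization)."""
    z = NormalDist().inv_cdf(1 - two_sided_p / 2)
    return exp(-2 * z / sqrt(events))

for p in (0.05, 0.001, 0.0001):
    print(f"p < {p}: hazard ratio must be below about {hr_needed(70, p):.2f}")
```

If the interim test is run on deaths rather than recurrences, `events` will be smaller than 70 at that point, and the required hazard ratio becomes more extreme still.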
In conclusion, unless people in the control arm start literally dying at historically unprecedented rates, we should not expect even a reasonable possibility of an early efficacy stoppage. This conclusion is, however, contingent on the trial still following the criteria laid out in the original clinical design from 2011. There may have been amendments since then, though per Galena's company line, this interim review is for "safety and futility" only.
Expect a near-certain "go to completion."