Here’s something I wrote a year or so ago with a couple minor edits.
So simply by breaking the 232 treatment group off from the 331 total group, the censors to the left of the median no longer contributed to the benefit of the 232 group. Any benefit those censors provided has therefore already been subtracted from that group and is reflected in the JAMA numbers.
For the group of 99 we don’t know. But if the four or five censors (dropouts) in that group, which are among the non-crossovers, were replaced with real data because the dates of their likely early deaths were learned, you might get a statistically significant lower median for that group of 99. Not because the four or five were not part of the group, but because four or five censors were removed (subtracted) and replaced with four or five real data points (if those dates were found).
Otherwise, maintaining censors to the left of the median typically helps the control group, because those patients are treated mathematically as if they have the same chance to live as long as the other patients who made it that far. That is a far more generous outcome than the early censors on this trial probably had, as I’ve explained in other posts.
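To make the censoring point concrete, here is a minimal Kaplan-Meier sketch in Python. Everything in it is made up for illustration: the ten-patient group, the follow-up times, and the function name `km_median` are my own assumptions, not trial data and not the method used in the JAMA paper. It takes the median as the first time the estimated survival falls to 50% or below, which is one common convention.

```python
from fractions import Fraction

def km_median(times, events):
    """Kaplan-Meier median: first time estimated survival is <= 1/2.

    times  -- follow-up in months for each patient (hypothetical data)
    events -- True if death was observed, False if the patient was censored
    """
    # Sort by time; at tied times, count deaths before censors.
    data = sorted(zip(times, events), key=lambda p: (p[0], not p[1]))
    at_risk = len(data)
    surv = Fraction(1)  # exact arithmetic avoids float rounding at 0.5
    for t, died in data:
        if died:
            # Each death multiplies survival by (at_risk - 1) / at_risk.
            surv *= Fraction(at_risk - 1, at_risk)
            if surv <= Fraction(1, 2):
                return t
        # A censor only shrinks the risk set; survival is left unchanged.
        at_risk -= 1
    return None  # median not reached

# Hypothetical group of 10: two early dropouts censored at months 2 and 3.
times_censored = [2, 3, 6, 10, 14, 18, 22, 26, 30, 34]
events_censored = [False, False, True, True, True, True, True, True, True, True]

# Same group after follow-up finds the two dropouts died at months 4 and 5.
times_resolved = [4, 5, 6, 10, 14, 18, 22, 26, 30, 34]
events_resolved = [True] * 10

median_with_censors = km_median(times_censored, events_censored)
median_with_deaths = km_median(times_resolved, events_resolved)
print(median_with_censors, median_with_deaths)
```

In this toy group, nothing changes except that two early censors become two early deaths, and the estimated median moves to an earlier month. The same patients are in the group both times; only the censored data was replaced with real data.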
The result is likely that the (2022) 331 median OS decreased, the 232 median decreased* (*we already know this), but the 99 median likely decreased the most of all compared to the 331 median of 23.1 months from surgery presented at the (2018) SNO conference. (This assumes four or five early lost-to-follow-up (LTFU) cases were followed up on and real data replaced the censored data, and that those deaths were earlier than 22.4 months from surgery, which is likely, imo.)
The reason one would not go with that data as a primary endpoint, even if it turned out to be significant for the treatment group, is that it doesn’t account for crossover impact and would be unlikely to fairly value the therapy when looked at with barometers like QALY. Therefore, you still need the ECA data and the mature data that was developed.