If I were a reviewer at the FDA or another regulatory body, I would tell any company that comes to me with a raw p-value of 0.33 and a Cox-adjusted p-value of 0.02 to redo the trial and come back with clean data.
You should work for Pazdur. You are missing the point that the WHOLE purpose of Cox regression is to clean up the data, i.e. to adjust for imbalances in prognostic covariates between the arms. So in a sense, the more unclean the data, the more should be expected from a Cox regression. (For example, a linear curve-fitting routine doesn't do much when the raw data is already linear. Where it is really useful is when the data is very unclean.)
Put more analytically and less philosophically: if you were correct that really big Cox regression corrections are less reliable than comparatively small ones, then I would expect that in simulations (where the Truth is known) the histogram of results with really big corrections would sit further from the Truth on average than the histogram of results with small corrections. I'll bet good money that isn't true.
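
A rough sketch of the kind of simulation described above, for anyone who wants to try it. Everything in it is an assumption made for illustration: the sample size, the true effects (TRUE_LOG_HR, COVARIATE_LOG_HR), a single prognostic covariate, no censoring, and the choice to measure the "correction" as the change in the treatment log hazard ratio rather than the change in p-value. The Cox fit is a bare-bones Newton-Raphson written out by hand so the example has no dependencies; it is not a substitute for a real survival package. It only reports the two averages and does not presume which way they come out.

```python
import math
import random

TRUE_LOG_HR = -0.3        # assumed true treatment effect (conditional log hazard ratio)
COVARIATE_LOG_HR = 0.8    # assumed effect of one prognostic covariate

def simulate_trial(n, rng):
    """Randomized two-arm trial, exponential survival, every patient has an event."""
    z = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]   # treatment arm
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # prognostic covariate
    times = [rng.expovariate(math.exp(TRUE_LOG_HR * zi + COVARIATE_LOG_HR * xi))
             for zi, xi in zip(z, x)]
    return times, z, x

def solve(a, g):
    """Solve a * step = g for a 1x1 or 2x2 matrix a."""
    if len(g) == 1:
        return [g[0] / a[0][0]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * g[0] - a[0][1] * g[1]) / det,
            (a[0][0] * g[1] - a[1][0] * g[0]) / det]

def cox_fit(times, rows, iters=25):
    """Maximize the Cox partial likelihood by Newton-Raphson (no ties, no censoring)."""
    p = len(rows[0])
    order = sorted(range(len(times)), key=lambda i: -times[i])   # latest event first
    beta = [0.0] * p
    for _ in range(iters):
        s0, s1 = 0.0, [0.0] * p
        s2 = [[0.0] * p for _ in range(p)]
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for i in order:
            xi = rows[i]
            w = math.exp(sum(bk * xk for bk, xk in zip(beta, xi)))
            # Add subject i to the running risk-set sums, then score its event.
            s0 += w
            for a in range(p):
                s1[a] += w * xi[a]
                for c in range(p):
                    s2[a][c] += w * xi[a] * xi[c]
            for a in range(p):
                grad[a] += xi[a] - s1[a] / s0
                for c in range(p):
                    info[a][c] += s2[a][c] / s0 - s1[a] * s1[c] / s0 ** 2
        step = solve(info, grad)
        beta = [bk + sk for bk, sk in zip(beta, step)]
        if max(abs(sk) for sk in step) < 1e-9:
            break
    return beta

def run_simulation(n_trials=200, n=150, seed=0):
    """Compare the error of the adjusted estimate in small- vs big-correction trials."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        times, z, x = simulate_trial(n, rng)
        unadjusted = cox_fit(times, [[zi] for zi in z])[0]
        adjusted = cox_fit(times, [[zi, xi] for zi, xi in zip(z, x)])[0]
        correction = abs(adjusted - unadjusted)
        error = abs(adjusted - TRUE_LOG_HR)
        results.append((correction, error))
    results.sort()                      # order trials by size of the correction
    half = len(results) // 2
    small, big = results[:half], results[half:]
    return {
        "n_small": len(small),
        "n_big": len(big),
        "mean_err_small": sum(e for _, e in small) / len(small),
        "mean_err_big": sum(e for _, e in big) / len(big),
    }

if __name__ == "__main__":
    print(run_simulation())
```

If the big-correction half showed a clearly larger mean error than the small-correction half, that would support the reviewer's position; if the two are about the same, it would support mine. A fairer test would also add censoring, more covariates, and a split on the change in p-value rather than on the change in the estimate.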