Good morning ZZ, sorry for the late response, been very busy.
Subgroup analysis is "hypothesis generating" because of "multiplicity".
If an intervention had no effect whatsoever (RRR=0%) and you looked at 10 independent subgroups, the chance of finding at least one significant result (p<0.05) by chance alone is about 40%. And that chance approaches 100% as you allow yourself to look at more and more subgroups.
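The 40% figure can be checked with a few lines of Python. This is only a sketch: it assumes each subgroup test is independent and uses the usual 0.05 significance threshold, and the function name is mine, not anything standard.

```python
def chance_of_false_positive(n_subgroups, alpha=0.05):
    """Chance that at least one of n independent tests of a truly
    ineffective intervention comes out 'significant' (p < alpha)."""
    # Each test stays non-significant with probability (1 - alpha);
    # all n stay non-significant with probability (1 - alpha) ** n.
    return 1 - (1 - alpha) ** n_subgroups

for n in (1, 5, 10, 20, 100):
    print(n, round(chance_of_false_positive(n), 3))
```

For 10 subgroups this gives roughly 0.40, and the value keeps climbing toward 1 as the number of subgroups grows. Real subgroups usually overlap, so the tests are correlated and the exact number will differ, but the direction of the problem is the same.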
The "multiplicity" issue is obviously important if you want to discover the truth about an intervention. Not only is it the reason that post-hoc subgroups are "exploratory", it is also the reason why interim analyses are limited and pre-specified (the more interim analyses you perform, the greater the chance you will observe a random high and stop in error. Note this error was controlled in R-IT by allocating some of the final p-value to each of the two interims).
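The same effect at interim looks can be illustrated with a small simulation. This is a rough sketch under my own assumptions, not the actual R-IT design: three equally spaced looks (two interims plus the final), a trial with no true effect, and the 0.05 split evenly across the looks purely for illustration (real trials typically use other spending rules). The function name and parameters are hypothetical.

```python
import random
from statistics import NormalDist

def false_stop_rate(n_looks, z_crit, n_trials=20000, seed=1):
    """Fraction of simulated no-effect trials that cross |z| > z_crit
    at any look (interim or final)."""
    rng = random.Random(seed)
    stops = 0
    for _ in range(n_trials):
        total = 0.0
        for look in range(1, n_looks + 1):
            # Each stage adds one independent block of pure-noise data.
            total += rng.gauss(0.0, 1.0)
            z = total / look ** 0.5  # z-statistic on all data so far
            if abs(z) > z_crit:
                stops += 1
                break
    return stops / n_trials

# Testing at the full 0.05 on every look (two-sided, z about 1.96).
naive = NormalDist().inv_cdf(1 - 0.05 / 2)
# Assumed even split: 0.05 / 3 allocated to each of the three looks.
split = NormalDist().inv_cdf(1 - (0.05 / 3) / 2)

print("full 0.05 at every look:", false_stop_rate(3, naive))
print("0.05 split across looks:", false_stop_rate(3, split))
```

With the full 0.05 at every look, the overall false-positive rate comes out well above 5%. Splitting the 0.05 across the looks keeps it at or below 5%, which is the idea behind allocating part of the final p-value to each interim.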
There is nothing wrong with doing subgroup analysis, even un-pre-specified subgroup analysis. The issue is over-interpreting the results. Accepting a false positive as the "truth" is obviously a bad outcome.
Two quotes come to mind (the first I've seen used in several FDA stat reviews):
"The first principle is that you must not fool yourself – and you are the easiest person to fool." - Richard Feynman
"What gets us into trouble is not what we don't know. It's what we know for sure that just ain't so." - Mark Twain
The very best discussion I've heard about this is offered by Tom Fleming. The link below is to a day-long lecture on biostats and clinical trials. The part relevant to our discussion starts at minute 7, where he discusses the maternity ward example.
I'm a big "fan" of Dr. Fleming; not only is he a world-renowned biostatistician, he is also an educator. He explains this very complex stuff in an understandable way.