Let's examine the scenario where one keeps getting an observed in-game success rate Y (from rolling a d20 "n" times) which appears to deviate consistently from a presumed theoretical probability "p".
We will assume the number of successes in "n" rolls of the die is binomially distributed with a theoretical probability p. For n large enough (i.e. np(1-p) > 10), the normal approximation to the binomial distribution can be used.
The test statistic "z" used to test whether the underlying distribution indeed has a theoretical probability of p will then approximately follow a standard normal distribution, with:
z = (Y-p)/sqrt[p(1-p)/n]
To conclude at the 95% confidence level that the theoretical probability is something other than p, one requires |z| > 1.96 (the two-sided critical value from a table of areas under the normal distribution). Similarly, at the 99% level one requires |z| > 2.575.
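As a rough illustration of the test described above, here is a minimal Python sketch. The function names (z_statistic, rejects) and the example of 60 hits in 100 rolls are made up for illustration; only the formula and the two critical values come from the text.

```python
import math

def z_statistic(y, p, n):
    """z = (Y - p) / sqrt(p(1-p)/n), as defined above."""
    return (y - p) / math.sqrt(p * (1 - p) / n)

def rejects(y, p, n, z_crit=1.96):
    """True if |z| exceeds the critical value (1.96 for 95%, 2.575 for 99%)."""
    return abs(z_statistic(y, p, n)) > z_crit

# Hypothetical example: 60 hits out of 100 rolls, tested against p = 0.5
z = z_statistic(0.60, 0.5, 100)
print(round(z, 2))                       # 2.0
print(rejects(0.60, 0.5, 100))           # True  (|z| > 1.96)
print(rejects(0.60, 0.5, 100, 2.575))    # False (|z| < 2.575)
```

So in this made-up case, p = 0.5 would be rejected at the 95% level but not at the 99% level.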
For some concrete numbers, we'll look at the case where the presumed theoretical probability is 50% (p = 0.5). From this, we can calculate the minimum number "n" of d20 rolls it takes to determine with 95% certainty that the underlying theoretical probability is not 50% (p != 0.5).
Setting |z| > 1.96, squaring both sides to get n(Y-p)^2/[p(1-p)] > 1.96^2, and solving for "n", we get:
n > [1.96/(Y-p)]^2 *[p(1-p)]
for 95% certainty that the theoretical probability is not p. (Similarly for the case of 99% certainty that the theoretical probability is not p, we get n > [2.575/(Y-p)]^2 *[p(1-p)] ).
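The bound above is easy to evaluate in code. This is just a sketch; min_rolls is an illustrative name, and rounding up to the next whole roll is my own reading of the strict inequality.

```python
import math

def min_rolls(delta, p=0.5, z_crit=1.96):
    """Smallest whole n satisfying n > (z_crit/delta)^2 * p(1-p),
    where delta = |Y - p|."""
    bound = (z_crit / delta) ** 2 * p * (1 - p)
    return math.floor(bound) + 1

print(min_rolls(0.05))                  # 385, i.e. n > 384
print(min_rolls(0.10))                  # 97,  i.e. n > 96
print(min_rolls(0.05, z_crit=2.575))    # 664, i.e. n > 663
```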
For the case where |Y-p| = 0.05, which corresponds to a constant +1 bonus or -1 penalty to a d20 roll (each face of a d20 is worth 5%), one needs n > 384 rolls of a d20 in order to determine with 95% certainty that the underlying theoretical probability is not 50% (i.e. p != 0.5).
Similarly for different |Y-p| values:
|Y-p| = 0.05 --> n > 384
|Y-p| = 0.10 --> n > 96
|Y-p| = 0.15 --> n > 42
(For |Y-p| = 0.20, the formula gives n > 24, which fails the np(1-p) > 10 requirement, so the normal approximation to the binomial is no longer valid).
For the case where one wants 99% certainty that the underlying theoretical probability is not 50%, we get for different |Y-p| values:
|Y-p| = 0.05 --> n > 663
|Y-p| = 0.10 --> n > 165
|Y-p| = 0.15 --> n > 73
|Y-p| = 0.20 --> n > 41
(For |Y-p| = 0.25, the formula gives n > 26, which fails the np(1-p) > 10 requirement, so the normal approximation to the binomial is no longer valid).
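For anyone who wants to check both lists, here is a short Python sketch that recomputes the bounds and flags the rows where np(1-p) > 10 fails. The name required_rolls is illustrative, and the validity check is applied at the smallest whole n above the bound, which is an assumption on my part about how the cutoff is meant to be read.

```python
import math

def required_rolls(delta, z_crit, p=0.5):
    """Return (bound, smallest whole n above the bound, approximation valid?)."""
    bound = (z_crit / delta) ** 2 * p * (1 - p)
    n = math.floor(bound) + 1
    valid = n * p * (1 - p) > 10      # normal-approximation rule of thumb
    return bound, n, valid

for label, z_crit in (("95%", 1.96), ("99%", 2.575)):
    for bonus in range(1, 6):         # +1 .. +5 on a d20
        delta = bonus / 20            # each +1 shifts the probability by 0.05
        bound, n, valid = required_rolls(delta, z_crit)
        note = "" if valid else "  (normal approximation not valid)"
        print(f"{label}  |Y-p| = {delta:.2f}  n > {bound:.2f}{note}")
```

The unrounded bounds show where the whole-number cutoffs in the lists come from.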
These results suggest that in a generic four-to-five-hour DnD session, the few dozen d20 rolls that get made may not be enough, with bonuses/penalties of 1 or 2, to determine with 95% (or 99%) certainty that the underlying theoretical probability "p" is not 50% (p != 0.5). One needs roughly a hundred d20 rolls to make this determination for a bonus/penalty of 2, and nearly four hundred for a bonus/penalty of 1.
With bonuses/penalties of 4 (or greater), the few dozen d20 rolls in a generic four-to-five-hour DnD session may be enough to determine with 95% (or 99%) certainty that the underlying theoretical probability "p" is not 50% (p != 0.5).
Bonuses/penalties of 3 are the borderline cases, where there may be enough d20 rolls to determine with 95% or 99% certainty that the underlying theoretical probability "p" is not 50% (p != 0.5). (One requires at least 43 d20 rolls to determine this with 95% certainty).
A +5 or +6 magic weapon used at heroic tier in 4E DnD, or an "aura" which subjects the players to a -4 or -5 penalty to hit, would be noticeable: within a four-to-five-hour DnD session one could conclude with 95% (or better) certainty that the underlying theoretical probability "p" is not 50% (p != 0.5).
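The session-length claims above can be sanity-checked with a quick simulation. This is only a sketch under assumptions that are mine, not from the analysis: 40 d20 rolls per session (right at the edge of the np(1-p) > 10 rule, so treat the results as rough), a base hit chance of 50%, and 2000 simulated sessions per bonus. The name detection_rate is illustrative.

```python
import math
import random

def detection_rate(bonus, rolls=40, trials=2000, z_crit=1.96, seed=1):
    """Fraction of simulated sessions in which |z| > z_crit against p = 0.5."""
    rng = random.Random(seed)             # fixed seed so runs are repeatable
    p0 = 0.5                              # presumed theoretical probability
    true_p = p0 + bonus / 20              # each +1 on a d20 is worth 5%
    se = math.sqrt(p0 * (1 - p0) / rolls)
    detected = 0
    for _ in range(trials):
        hits = sum(rng.random() < true_p for _ in range(rolls))
        z = (hits / rolls - p0) / se
        if abs(z) > z_crit:
            detected += 1
    return detected / trials

for bonus in (1, 2, 3, 4, 5):
    print(f"+{bonus}: detected in {detection_rate(bonus):.0%} of sessions")
```

Under these assumptions a +1 bonus should be flagged in only a small fraction of sessions, while a +5 bonus should be flagged in most of them.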
Possibly this explains why so many bonuses/penalties in 4E DnD are +/-1 or +/-2 to the d20 roll, along with all kinds of kludges to prevent the stacking of too many bonuses/penalties on top of one another. Essentially they're attempting to maintain an illusion of "always fighting orcs", with the underlying theoretical probability "p" appearing to be 50% (p = 0.5) over a four-to-five-hour DnD session. (One would need roughly a hundred d20 rolls or more with bonuses/penalties of 2 in order to determine with 95% or better certainty that the underlying theoretical probability "p" is not 50%).