As one may (or may not) have noticed, rolling a die multiple times does not always follow the theoretical probabilities that closely. For example, it's possible to roll a lot of low numbers on a d20 in a streak. It's also possible to roll a lot of high numbers on a d20 in a row (ie. a hot hand).
So how does the relative frequency of successful die rolls, relate to the theoretical probability for rolling a success?
It turns out, the act of rolling a die multiple times follows a
binomial distribution.
To introduce some nomenclature, let's use:
X = number of successes
n = number of trials
p = theoretical probability of success
1-p = theoretical probability of failure
In the case of rolling a die "n" number of times, one will have "X" number of success. (For example, rolling a d20).
The theoretical probability "p" is what one would expect, from examining the DC's for a success. For example, rolling an 11 or over on a d20 is a 50% theoretical probability.
For a binomial distribution, the average value is m = np and the standard deviation is stdev = sqrt[np(1-p)]. For a large enough number of trials "n", one can
approximate the binomial distribution with a normal distribution with average value m = np and standard deviation stdev = sqrt[np(1-p)].
(From a probability textbook, such as Sheldon Ross' textbooks on probability, the criteria for the normal approximation being reliable is np(1-p) > 10).
So for rolling a die "n" times and getting "X" successes (where "n" is large), it will follow a
standard normal distribution (ie. with mean 0 and variance 1), with:
Z = (X - np)/sqrt[np(1-p)]
Doing some algebra, Z = [(X/n) - p]/sqrt[p(1-p)/n]. In this form, let Y = X/n represents the relative frequency of successes of rolling a die. This would be an estimate "Y" of the theoretical probability "p", from rolling a die "n" times and getting "X" number of successes. We would like to know how "Y" is related to the theoretical "p".
Since Z = (Y-p)/sqrt[p(1-p)/n] approximately follows a standard normal distribution, this means the confidence intervals can be determined for a range of Z's.
For a 95% confidence interval, Z will fall between -1.96 and 1.96 (from a table of areas of a normal distribution). (For a 99% confidence interval, Z will fall between -2.575 an 2.575).
In the case of a 95% confidence interval |Z| < 1.96, this means:
Probability ( |Y-p| < 1.96 * sqrt[p(1-p)/n] ) = 0.95
(Similarly for a 99% confidence interval, Probability ( |Y-p| < 2.575 * sqrt[p(1-p)/n] ) = 0.99 ).
Let the deviation d = 1.96 * sqrt[p(1-p)/n], which represents the deviation of the estimate "Y" from the theoretical probability "p".
From this we can determine how many number of trials "n" (ie. rolls of a die) it takes, such that the deviation between the estimate "Y" and theoretical probability "p" will stay within "d" with a 95% probability.
Doing some algebra, we get "n" in terms of "d":
n = (1.96)^2 * p(1-p)/(d^2)
For various theoretical probabilities "p" and deviations "d", we get:
(For a d20, a 5% deviation d = 0.05 starts to overlap between different values on the d20. If one wants more precision, one can use a 1% deviation d = 0.01).
d = 0.05p = 0.50 ---> n = 384
p = 0.40 ---> n = 369
p = 0.30 ---> n = 323
p = 0.20 ---> n = 246
p = 0.10 ---> n = 138
d = 0.01p = 0.50 ---> n = 9604
p = 0.40 ---> n = 9220
p = 0.30 ---> n = 8067
p = 0.20 ---> n = 6147
p = 0.10 ---> n = 3457
These results mean that for 95% of the time when rolling a die to hit a particular DC, the practical in-game probability (ie. number of successes divided by the total number of die rolls including misses/failures) deviating less than 5% from the "theoretical probability", would require at least a few hundred die rolls. (For a 1% deviation between the practical in-game and theoretical probabilities, it would require several thousand die rolls).
Not too surprising that in a generic four-five hour D&D game session where one only does a few dozen or so d20 rolls, the practical in-game probability "Y" does not always converge to the theoretical probability "p". One needs at least several hundred d20 rolls to see this convergence within a 5% tolerance band, 95% of the time.
One shouldn't be shocked at seeing somebody with a "hot hand" streak of rolling lots of high numbers on a d20, or somebody else on a losing streak of rolling lots of low numbers on a d20.