Unfortunately, the problem with statistics is not that they are wrong but that there is a lack of understanding of what the statistics can tell us. Whether your sample is 3 shots, 20 shots, or 100 shots, it carries statistical significance; what matters is understanding the level of that significance. Shooting a 3-shot group to zero a rifle will give you a pretty good zero, but it is not likely to be the best zero for a 1000 yd competition. In a sport where the term "precision" is constantly used, we often rely on data that has very little precision to it. This is especially true of chronograph data.
When a shooter takes a chronograph out and fires x rounds over it, he is obtaining data for those x rounds on the assumption that they represent what similarly prepared rounds will do in the future. The collection of the data and what it represents is the province of statistics; the leap from past or current knowledge to the future is the province of probability. The data obtained are a random sample, and that sample can and should be used to predict what we can expect from those future rounds. The data can also be used to compare the likelihood that two or more sets of data are different.
Consider a shooter who loads and fires 3 shots over a chronograph, looking for a 2900 fps mean velocity and a standard deviation of 10 fps or less to suit his application, and obtains a mean velocity of 2904 fps, a standard deviation of 8.5 fps, and an extreme spread of 17 fps. He says "Man, I'm there!" and proceeds to load ammunition on that basis. But what does his data really tell him? The truth lies in the probabilities and in the confidence that those 3 shots are representative of what future rounds will do. A 95% confidence interval is the usual metric for judging what the sample data tells us. In this case the mean of 2904 fps for three shots gives a confidence interval of 2883 to 2925 fps as the likely range for the true mean velocity. That range of probable velocity may or may not be an issue for the shooter. If we apply a 95% confidence interval to the measured standard deviation, the likely true range is 4.5 fps to 53.7 fps. If the shooter was looking for an SD of 10 or less, is something between 4.5 and 53.7 fps acceptable? Probably not. So he goes back and shoots 5 shots, and the results are a mean between 2889 and 2911 fps and an SD between 5.21 and 25 fps, with an ES of 22 fps. Acceptable? If he shoots 10 rounds and the standard deviation is 8.8 fps, the SD range is 6.06 to 16.1 fps (ES 28 fps), and if he shoots 20 rounds and the standard deviation is 8.7 fps, the SD range narrows to 6.61 to 12.7 fps (ES 37 fps). Notice that unlike the interval for the mean velocity, the interval for the standard deviation is not symmetrical; it is skewed because the sample standard deviation is not normally distributed and its confidence interval is built from a chi-squared distribution rather than a normal one. Had the shooter relied on the 3- or 5-shot data, he could easily have been severely disappointed in the performance of his reloads on target.
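For anyone who wants to reproduce these intervals, the sketch below shows one way to compute them in Python with SciPy. The three velocity values are hypothetical placeholders chosen only to roughly match the 2904 fps / 8.5 fps example above; they are not measured data.

```python
# Sketch: 95% confidence intervals for the mean and SD of a chronograph string.
# The velocities are hypothetical placeholders, not actual measurements.
import numpy as np
from scipy import stats

velocities = np.array([2895.5, 2904.0, 2912.5])  # hypothetical 3-shot string, fps
n = len(velocities)
mean = velocities.mean()
sd = velocities.std(ddof=1)          # sample SD (n-1 in the denominator)
conf = 0.95
df = n - 1

# Mean: t-interval, since the true SD is unknown and estimated from the sample.
t_crit = stats.t.ppf(0.5 + conf / 2, df)
mean_lo = mean - t_crit * sd / np.sqrt(n)
mean_hi = mean + t_crit * sd / np.sqrt(n)

# SD: chi-squared interval -- note how asymmetric it is for small n.
chi2_hi = stats.chi2.ppf(0.5 + conf / 2, df)
chi2_lo = stats.chi2.ppf(0.5 - conf / 2, df)
sd_lo = sd * np.sqrt(df / chi2_hi)
sd_hi = sd * np.sqrt(df / chi2_lo)

print(f"mean {mean:.1f} fps, 95% CI {mean_lo:.1f} to {mean_hi:.1f} fps")
print(f"SD   {sd:.1f} fps, 95% CI {sd_lo:.1f} to {sd_hi:.1f} fps")
```

Increasing the number of shots in the string is the only thing that shrinks both intervals, which is exactly the pattern seen in the 5-, 10-, and 20-shot figures above.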
Consider what happens when comparing the standard deviations of two samples where "something different" was done between them (think primers?). Sample 1 has 5 shots with an SD of 9 fps and Sample 2 has 5 shots with an SD of 15 fps. How do we interpret this data? Obviously 9 is less than 15, but is it better, and by how much? When analyzed with the usual F-test statistic, there is an 82% chance that the 9 is statistically less than the 15, but we do not know by how much. We can look at the confidence intervals and find that the 9 lies between 5.4 and 25.9 fps and the 15 between 9 and 43.1 fps. So while we have a fair amount of confidence that, for this test, the 9 is statistically less than the 15 for a large population, we have very little confidence in what SDs they actually represent or in how large the difference actually is. Because of the large overlap in the confidence intervals (9 to 25.9 fps), it is likely that if this test were run again with 5-shot samples the results could be different.
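The same comparison can be run numerically. The sketch below is one way to do it with SciPy, using the 5-shot SDs of 9 and 15 fps from the example; the cumulative F probability corresponds to the roughly 82% figure quoted above, and the small helper function reproduces the chi-squared intervals.

```python
# Sketch: comparing two 5-shot SDs with an F test plus chi-squared intervals.
import numpy as np
from scipy import stats

n1, sd1 = 5, 9.0    # sample 1 (e.g., primer A)
n2, sd2 = 5, 15.0   # sample 2 (e.g., primer B)
df1, df2 = n1 - 1, n2 - 1

# F statistic: ratio of the sample variances, larger over smaller.
F = (sd2 ** 2) / (sd1 ** 2)
# One-sided confidence that the first true SD is smaller than the second;
# under equal true SDs this ratio follows an F(df2, df1) distribution.
confidence = stats.f.cdf(F, df2, df1)
print(f"F = {F:.2f}, one-sided confidence SD1 < SD2 ~ {confidence:.0%}")

# 95% chi-squared confidence interval for a sample SD.
def sd_ci(sd, df, conf=0.95):
    hi = stats.chi2.ppf(0.5 + conf / 2, df)
    lo = stats.chi2.ppf(0.5 - conf / 2, df)
    return sd * np.sqrt(df / hi), sd * np.sqrt(df / lo)

print("SD  9 fps, 95% CI:", tuple(round(x, 1) for x in sd_ci(sd1, df1)))
print("SD 15 fps, 95% CI:", tuple(round(x, 1) for x in sd_ci(sd2, df2)))
```

The wide, overlapping intervals are the numerical picture behind the caution in the paragraph above: with only 5 shots per sample, a repeat of the test could easily reverse the apparent winner.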