We already have a way to account for sample size in statistics: confidence intervals. The smaller the sample size, the wider the confidence interval for a given confidence level. We will never get confidence intervals that do not overlap at a 95% level when deciding on a load recipe; there simply is no way to do it with how short a barrel life we have. As such, we need to settle for a lower confidence level. A sample size of 3 is more or less useless, 5 isn't great, 10 is much better, etc.
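To put rough numbers on that, here is a minimal Python sketch of how the half-width of a confidence interval for the mean shrinks as the sample grows. It assumes a normal approximation (a z-value rather than a t-value), which understates the width for samples this small. The function name `ci_half_width` and the SD of 1.0 are only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, level):
    """Half-width of a normal-approximation CI for the mean.

    Assumption: z-based interval; a t-based interval would be wider for small n.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return z * sd / sqrt(n)


# Half-width in units of one SD at a 95% confidence level.
for n in (3, 5, 10, 50):
    print(n, round(ci_half_width(1.0, n, 0.95), 2))
```

The half-width only falls with the square root of the sample size, so going from 5 shots to 10 narrows the interval by about 30%, not by half.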
One trick you can use is what people do for OCW without even knowing it. Simply by saying, "I am looking for consecutive powder charges or seating depths that have similar POI," you are increasing your sample size by putting 3 groups into one box. Example:
Box A = 40.0, 40.2, 40.4
Box B = 40.2, 40.4, 40.6
Box C = 40.4, 40.6, 40.8
etc
So you took three 3-shot groups and turned them into a sample of 9.
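The boxing idea can be sketched in a few lines of Python. This is only an illustration: the `make_boxes` helper and the data layout (charge weight mapped to a list of per-shot values) are hypothetical, and the shot values are made-up placeholders, not real data.

```python
def make_boxes(groups, width=3):
    """Pool each run of `width` consecutive charges into one box.

    `groups` maps charge weight -> list of per-shot values (hypothetical layout).
    Returns a list of (charges_in_box, pooled_shots) tuples.
    """
    charges = sorted(groups)
    boxes = []
    for i in range(len(charges) - width + 1):
        window = charges[i:i + width]
        pooled = [shot for c in window for shot in groups[c]]
        boxes.append((window, pooled))
    return boxes


# Placeholder values only: three shots per charge weight.
groups = {
    40.0: [1.0, 2.0, 3.0],
    40.2: [2.0, 3.0, 4.0],
    40.4: [3.0, 4.0, 5.0],
    40.6: [4.0, 5.0, 6.0],
    40.8: [5.0, 6.0, 7.0],
}

for charges_in_box, shots in make_boxes(groups):
    print(charges_in_box, "sample size:", len(shots))
```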
The Scott Satterlee test has a sample size of 1, so even putting consecutive "groups" into a "box" only leaves you with 3, which is still trash. My velocity graphs are basically a straight line with 5- and 10-shot groups, with no flat spots (at best the slope may vary, but it has never flattened out).
I think most people don't shoot enough shots, but I also don't think 50 rounds per interval is necessary.
I shot a ladder test at 600 with 5 shots per charge weight. Grouping 3 consecutive charges into a box gives me a sample of 15 per box.
I used Sub MOA on iOS to get the center of each Box and then used that to get the mean radius, standard deviation and confidence interval.
The numbers are basically grid values in paint, but it is all relative, so the actual MOA value doesn't matter.
Box A: 1, 2, 3
SD: 17.3
Avg: 26.4
90% Confidence Interval of Avg: 19.1 to 33.7
70% Confidence Interval of Avg: 21.8 to 31
Box B: 2, 3, 4
SD: 16
Avg: 22
90% Confidence Interval of Avg: 15.2 to 28.8
70% Confidence Interval of Avg: 17.7 to 26.3
60% Confidence Interval of Avg: 18.5 to 25.5
50% Confidence Interval of Avg: 19.2 to 24.8
Box C: 3, 4, 5
SD: 11.2
Avg: 16.9
90% Confidence Interval of Avg: 12.2 to 21.6
70% Confidence Interval of Avg: 13.9 to 19.9
60% Confidence Interval of Avg: 14.5 to 19.4
50% Confidence Interval of Avg: 15 to 18.9
Box D: 4, 5, 6
SD: 22.6
Avg: 34.9
90% Confidence Interval of Avg: 25.3 to 44.5
70% Confidence Interval of Avg: 28.8 to 40.9
There is a pretty large overlap of the confidence intervals at 90%. If we look at 70%, Box C has no overlap with A or D (the max of C is less than the min of A and the min of D), but there is still an overlap with B. There is still an overlap at 60%; we need to drop to 50% to get no overlap between C and B.
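For anyone who wants to check the overlap comparison, here is a minimal Python sketch. It assumes the intervals are the usual normal-approximation form, avg ± z·SD/√n with n = 15, which appears to reproduce the posted intervals to within rounding. That is an assumption about how they were calculated, and the helper names `interval` and `overlaps` are just for illustration.

```python
from math import sqrt
from statistics import NormalDist

N = 15  # 3 consecutive charges x 5 shots each

# (average, SD) per box, copied from the numbers above
BOXES = {"A": (26.4, 17.3), "B": (22.0, 16.0), "C": (16.9, 11.2), "D": (34.9, 22.6)}


def interval(avg, sd, level, n=N):
    """Normal-approximation confidence interval for the average (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sd / sqrt(n)
    return avg - half, avg + half


def overlaps(a, b):
    """True if the two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


for level in (0.90, 0.70, 0.60, 0.50):
    c = interval(*BOXES["C"], level)
    clear = [name for name in "ABD" if not overlaps(c, interval(*BOXES[name], level))]
    print(f"{level:.0%}: C does not overlap {clear}")
```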
Keep in mind, I did 5 shots per charge weight when most people do 1, maybe 2. So the people who only have 1 or 2 shots per charge weight are going to have large confidence intervals at any confidence level of 50% or higher.
You can mock the people who advocate for larger sample sizes all you want. Statistics and probability aren't new. You can ignore them all you want, but the fact is there is a hidden value (the confidence interval at a specific level) behind any set of numbers someone gives you for their group sizes, muzzle velocities, or POI, and leaving it out does a disservice to the practice of load development.
If you were to evaluate 10 pitchers to be on your baseball team, would you have them throw 1 pitch each? 3? 5? Just because reloading components are expensive and barrels have a limited life, all of a sudden we are OK with using low sample sizes? Like I said earlier, there is a compromise, but in my opinion 1 or 3 just isn't enough. People who say "I did the Scott Satterlee test and I got amazing results" either got lucky or are just using good components in a well-put-together rifle.
My stats for MV and Group Size at 100 yards were (5 Shots per):
1. SD 6.1 (deleted 1 MV {first of the day}), .6 MOA
2. SD 2.6, .525 MOA
3. SD 4.6, .32 MOA
4. SD 2.2, .5 MOA
5. SD 2.8, .32 MOA
6. SD 2.8, .33 MOA
The confidence level needed to get no overlap between #5 and the rest is very low. I am not even going to try to calculate it, since I know a sample of 5 is going to require an insanely low confidence level. If I did a Satterlee test, it wouldn't matter which of these groups I ended up with; I would assume it was a success because I would be happy with any of them. The problem is we want a load that is resilient, one that resists change: temperature, erosion of the lands, imperfect charge weight (scale variance), imperfect seating depth, etc. I believe an Audette ladder test with 5+ shots per interval will get me there.