warning: wall of text incoming
So I was musing on this. Earlier I was just yanking @physguy88's chain, because what he said was essentially spot on, terminology aside. I've been through process control hell at several companies, was even the dreaded "black belt" of Six Sigma, and have taught data science at a D1 school.
Back in the day I had a piece of equipment and there was a 'Spec' for its performance. This had VERY LITTLE to do with the equipment; it was what was required for the product to function.
So my specialty was metals. Let's say my job was to deposit Nickel. My spec says that my layer of Ni has to be 100 +/- 10 units. In a perfect world this is my "six sigma": the process has only about a 1 in a million chance of producing a wafer where the thickness is 110 or greater or 90 or less (roughly; long story short, the usual 3.4 per million figure is really only 4.5 sigma). That means 6 standard deviations have to fit in 10 units. In other words, I have to run my equipment at 100 with a STD DEV of 1.67 units or less to hit this specification.
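For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a centered, normally distributed process, and the function names are mine, made up for illustration. It shows where the 1.67 comes from and why the well-known 3.4 per million number lines up with 4.5 sigma on one side instead of a true 6.

```python
import math

def sigma_budget(half_width, n_sigma=6.0):
    """Largest process std dev that fits n_sigma deviations inside the spec half-width."""
    return half_width / n_sigma

def one_sided_tail_ppm(n_sigma):
    """Parts per million beyond n_sigma on ONE side of a normal distribution."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2)) * 1e6

def two_sided_tail_ppm(n_sigma):
    """Parts per million outside +/- n_sigma for a centered normal process."""
    return math.erfc(n_sigma / math.sqrt(2)) * 1e6

# Spec of 100 +/- 10 units with 6 standard deviations inside the 10.
print(round(sigma_budget(10.0), 2))   # 1.67

# The classic "Six Sigma" defect rate matches a 4.5 sigma one-sided tail.
print(one_sided_tail_ppm(4.5))        # about 3.4 per million

# A true, centered 6 sigma process is far tighter than 1 in a million.
print(two_sided_tail_ppm(6.0))        # a few parts per BILLION
```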
BUT WAIT there is more! When I put my nickel down, it's not 100 units everywhere. Some places it's 99 and some places it's 101. I'm working in semiconductors so I have a "within wafer" variation. So while the mean thickness may be 100, in some places it's more, some places it's less, so I have to take that into account as well! Typically I measured up to 49 points to get the mean and std dev within the run.
But behind Door #3 is a third problem:
Every time I run my machine it's "destructive". It changes the output of my machine. If I run today and get 100 (mean), the next run I might get 101. Sometimes this is random, sometimes it's systematic. In my case a large percentage of the change was systematic, which we could account for. So now I have Wafer to Wafer Variation, some of which I can account for. I won't be giving any secrets away here but I had to adjust the height between my wafer and the "metal" being deposited every run. It was tiny, but something we could account for. Here's the fun part: this wafer to wafer (run to run) variation also affected the WITHIN run variation. (It got bigger if out of adjustment).
When I brought in a new piece of equipment to "qualify" for process matching it took us a minimum of 4 "runs" (wafers) if we were good and up to 6 if we were not.
Sidebar: Wafer to Wafer and Within Wafer variances add (variance is std dev squared), so that 1.67 is my budget for BOTH within wafer and run to run variation combined.
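A quick sketch of what "variances add" means for the budget. This assumes the two sources are independent, which is only an approximation (in my case run to run drift also made the within wafer spread bigger). The function names and the 1.0 figure are made up for illustration.

```python
import math

def total_sigma(within_sigma, run_to_run_sigma):
    """Combined std dev of two independent sources: variances add, std devs do not."""
    return math.sqrt(within_sigma ** 2 + run_to_run_sigma ** 2)

def remaining_sigma(budget_sigma, used_sigma):
    """Std dev left over for the second source once the first has used its share."""
    if used_sigma > budget_sigma:
        raise ValueError("one source alone already exceeds the budget")
    return math.sqrt(budget_sigma ** 2 - used_sigma ** 2)

budget = 10.0 / 6.0   # the 1.67 from the 100 +/- 10 spec
within = 1.0          # hypothetical within-wafer std dev

# Whatever is left is all the run-to-run variation the process can tolerate.
left = remaining_sigma(budget, within)
print(round(left, 2))
print(round(total_sigma(within, left), 2))   # back to the full budget
```

Note that the leftover is not 1.67 minus 1.0. Because the squares add, a source that uses most of the std dev budget still leaves more room than simple subtraction suggests.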
So who gives two shits about this, and how does it affect accuracy?
Whether you fire 10 shot groups or 3 shot groups, there is a "shot to shot" variation. And then there is a "session to session" variation (or string to string). Along with this, we know from our OCD reloader friends that the "test" is destructive: every time you shoot, you put small wear and tear on the barrel that will affect both shot to shot and string to string. It's been a long time since I've done that version of statistics, and it's pretty standard stuff, but the long and short of it is:
You need 4 to 6 STRINGS of shots to "match" accuracy. (In other words, My rifle B matches the accuracy of Rifle A).
To ESTABLISH accuracy it would take around 40 Strings before we could call it "established" (40 points)
The long and short of it: 3 shot groups and accuracy guarantees are BULLSHIT. Even a 10 shot group is marginal. And if I want to "Guarantee" something I need a defect level. "Six Sigma" is based on 3.4 defects per million. Now for a lower volume business like firearms, you may be ok with say 1 in 1,000 defects or 1 in 10,000 defects. And that means the AVERAGE group is actually going to have to be well UNDER 1 MOA, because only the 1 in a million case is allowed to go over 1 MOA.
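To put rough numbers on that, here is a sketch of how far under the guarantee the average has to sit for a given defect level. Big assumptions, stated loudly: it treats group size as normally distributed (real group sizes are skewed, so this is only a ballpark), and the 0.1 MOA group to group std dev is a made-up figure, not a measurement. The function names are mine.

```python
from statistics import NormalDist

def z_for_defect_rate(defect_rate):
    """One-sided z-score that leaves defect_rate of a normal distribution above it."""
    return NormalDist().inv_cdf(1.0 - defect_rate)

def max_average(limit, sigma, defect_rate):
    """Largest average group size that keeps the fraction over `limit` at defect_rate."""
    return limit - z_for_defect_rate(defect_rate) * sigma

guarantee_moa = 1.0   # the advertised limit
group_sigma = 0.1     # hypothetical group-to-group std dev, in MOA

for rate in (1e-3, 1e-4, 3.4e-6):
    # The tighter the allowed defect rate, the lower the average has to be.
    print(rate, round(max_average(guarantee_moa, group_sigma, rate), 3))
```

With those made-up inputs the average has to land at roughly 0.69, 0.63 and 0.55 MOA respectively to honor a 1 MOA guarantee.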
But remember my whole "matching" comment earlier. You need at least 4 measurements (and usually more) to "match" a process and it takes upwards of 8 for a "disqualify/reject"
So yes, a 3 shot guarantee is just "luck". If someone came to me with a statistical process and an N=3 (number of trials) I'd laugh their ass outta my office. If I tried that during process control days, I'd be fired. With N=10, you might just get a stern talking to.
Here's a test: which one of these shoots better?
View attachment 8225151
View attachment 8225157
Hint: It's the same distribution and deviation (I just ran them back to back).
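If you want to try the same kind of thing yourself, here is a minimal simulation sketch. To be clear, this is not how I generated the attachments; it is just one way to do it. It assumes shots scatter as an independent normal in x and y with a made-up std dev of 0.3, and measures each group by extreme spread (the largest center to center distance). All names and numbers are illustrative.

```python
import math
import random
import statistics

def group_size(n_shots, sigma, rng):
    """Extreme spread of one simulated group (needs at least 2 shots)."""
    shots = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_shots)]
    # Largest distance between any pair of shots in the group.
    return max(
        math.dist(a, b)
        for i, a in enumerate(shots)
        for b in shots[i + 1:]
    )

def simulate(n_shots, n_groups=2000, sigma=0.3, seed=1):
    """Fire many groups from the SAME rifle and summarize how much group size varies."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    sizes = [group_size(n_shots, sigma, rng) for _ in range(n_groups)]
    mean = statistics.mean(sizes)
    return {
        "mean": mean,
        "smallest": min(sizes),
        "largest": max(sizes),
        "relative_spread": statistics.stdev(sizes) / mean,
    }

for n in (3, 10):
    print(n, simulate(n))
```

Same rifle, same distribution, and the smallest and largest 3 shot groups will look like they came from two completely different guns. The 10 shot groups are bigger on average but much more consistent relative to their size, which is the whole point.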