Lot Testing - What is statistically significant

When looking for an answer to a completely different topic, my search returned a few posts regarding lot testing. There were comments and questions such as how many shots should be taken and will just 5 or 10 shots give any meaningful information. So I pulled out one of my old tools to see for myself.

Before I retired, I spent many years where my job required me to use statistics to validate both performance of systems and to validate that improvements made to a system were meaningful (significant). The (Excel) tools I created to perform these tasks were validated by statisticians who were university professors. Although I am fairly versed in statistical analysis, I am not a statistician myself. So, if there are any statisticians around that will validate my conclusions related to their use in lot testing, that would be great. Anyway...

The image I've attached is a screenshot of my analysis workbook where I've made comparisons between lots. The first five are actual test shots provided by Anschutz North America; the other five I provided as additional examples. I purchased one of the lots (Lot 010). At the time, I purchased it because it had the smallest group of the fourteen lots they tested. But the question I wanted answered now is: are five-shot groups adequate for demonstrating that one group is conclusively (statistically) better than another? Yes, Lot 010 is conclusively better than the other lots, even though only five shots were taken for comparison. That said, the comparison is most certainly better if more shots are taken; I don't want to imply otherwise. The margin of error shrinks as the number of shots increases, so (for numerous reasons) the more shots the better. But groups of as few as five shots can be statistically meaningful.
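The Excel workbook itself isn't shown here, so as a rough illustration of the kind of lot-to-lot comparison being described, here is a minimal sketch of one common approach: a permutation test on the mean radius of two 5-shot groups. All coordinates are made up, and this is not necessarily the method the workbook uses.

```python
import math
import random

def mean_radius(shots):
    """Mean distance of shots from their own group centre."""
    cx = sum(x for x, _ in shots) / len(shots)
    cy = sum(y for _, y in shots) / len(shots)
    return sum(math.hypot(x - cx, y - cy) for x, y in shots) / len(shots)

def permutation_test(a, b, trials=10_000, seed=1):
    """P-value for the observed mean-radius difference: how often does
    randomly re-dealing the shots between the two 'lots' produce a
    difference at least as large?"""
    rng = random.Random(seed)
    observed = abs(mean_radius(a) - mean_radius(b))
    pooled = a + b
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if abs(mean_radius(pooled[:len(a)]) - mean_radius(pooled[len(a):])) >= observed:
            hits += 1
    return hits / trials

# Made-up 5-shot groups (inches): a tight lot vs a loose one
lot_a = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1), (0.05, 0.05)]
lot_b = [(0.5, 0.6), (-0.6, 0.4), (0.7, -0.5), (-0.4, -0.7), (0.1, 0.9)]
p = permutation_test(lot_a, lot_b)
print(round(mean_radius(lot_a), 3), round(mean_radius(lot_b), 3), p)
```

With groups this different the test rejects easily. Note, though, that with only ten shots there are just 252 ways to split them five-and-five, so the smallest p-value a 5-vs-5 permutation test can ever produce is about 0.008, which is itself a concrete limit on what five-shot comparisons can show.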

If someone would like a copy of the Excel workbook used, I will be happy to provide it.

So, please take a gander and let me know what you think.

Lot Testing Analysis.jpg
 
When I lot tested my CZ bench rifle I just looked at what shot best. Here is how I did my lot testing.
I got 9 lots of Eley Match. I chose the lots based on there being at least 200 more boxes available.
My plan was to shoot five 5-shot groups with each lot. I found that 5 lots were not good, and I could see this after just three 5-shot groups. The remaining 4 lots shot much better, with 2 lots really standing out. I then shot five more 5-shot groups with the two best lots, and one lot was the clear winner. I then ordered a case of the best lot from KSS Sports and had it in 3 days. I can tell you the case I ordered also shoots very well in two other CZs I have. Here is a pic of my bench CZ.

IMG_0636.jpg
 
I'm not a stats guy, but I have cursory knowledge of it. Differentiating samples from populations can be confusing, at least for me. Given the wide variability in 5-shot group sizes, I would not draw any conclusions from a single 5-shot group. I'd like to see 5-10 groups of 5, so we can get a 25- or 50-shot sample.

When I lot tested my CZ bench rifle I just looked at what shot best. Here is how I did my lot testing.
I got 9 lots of Eley Match. I chose the lots based on there being at least 200 more boxes available.
My plan was to shoot five 5-shot groups with each lot. I found that 5 lots were not good, and I could see this after just three 5-shot groups. The remaining 4 lots shot much better, with 2 lots really standing out. I then shot five more 5-shot groups with the two best lots, and one lot was the clear winner. I then ordered a case of the best lot from KSS Sports and had it in 3 days. I can tell you the case I ordered also shoots very well in two other CZs I have. Here is a pic of my bench CZ.

View attachment 8504744
That is basically how I do it. You can sometimes see inconsistency fairly quickly, but consistency only appears after several groups.
 
First off, the question needs more specification. "How many shots?" depends on the resolution at which you want to determine "better" with confidence.

That being said, in order to create a repeatable test, expect to shoot 35-50 rounds of each variable change to reliably quantify performance based on a single test. That is roughly the sample size needed to resolve performance to the level most practical precision shooters would want to see, probably in the .0X" range for mean radius.

The variability of repeat tests in 3-, 5-, 10-, 15-, and even 20-shot strings is enough that considerable overlap can exist between single tests of each variable. 20 shots is about the bare minimum I even look at anymore if I am attempting to quantify dispersion performance. The "confidence density," if you can think of it that way, is best with large-sample single tests, or composite small tests WITH A COMMON POA REFERENCE. If you lose the POA reference and only look at the mean radius of four 5-shot strings, for example, you have less data than a single 20-shot test. If you correlate/overlay the common POA, you have the same data whether it's a single 20-shot string or 4x 5-shot strings. Hopefully that makes sense.
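The common-POA point can be sketched numerically. Assuming four hypothetical 5-shot strings recorded against the same point of aim (all coordinates invented), per-string mean radius throws away each string's centre drift, while pooling all 20 shots about one centroid keeps it:

```python
import math

def centroid(shots):
    """Centre of a list of (x, y) impacts."""
    n = len(shots)
    return (sum(x for x, _ in shots) / n, sum(y for _, y in shots) / n)

def mean_radius(shots, centre=None):
    """Mean distance of shots from `centre` (default: their own centroid)."""
    cx, cy = centre if centre is not None else centroid(shots)
    return sum(math.hypot(x - cx, y - cy) for x, y in shots) / len(shots)

# Four hypothetical 5-shot strings fired at a common POA; each string's
# centre drifts a little, which per-string analysis discards.
base = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1), (0.05, 0.05)]
offsets = [(0.0, 0.0), (0.15, -0.1), (-0.1, 0.12), (0.08, 0.15)]
strings = [[(x + ox, y + oy) for x, y in base] for ox, oy in offsets]

# Averaging each string's own mean radius discards the centre drift...
per_string = sum(mean_radius(s) for s in strings) / len(strings)
# ...while pooling all 20 shots about one centroid keeps it,
# recovering the same information as a single 20-shot test.
pooled = mean_radius([shot for s in strings for shot in s])
print(round(per_string, 3), round(pooled, 3))
```

In this example the pooled figure comes out larger than the per-string average because the group-to-group wander is real dispersion that the per-string numbers never see.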

You will also see, when you dig into this subject, that the choice of group size vs. mean radius is basically a moot point up to 15-18 shots. The variability as a percent of the long-term average is in the same realm (huge for both). Once you get to 18-20+ shots, mean radius becomes a significantly better predictor of the population from which the sample comes, and the more rounds you pile in, the better mean radius gets as a predictor vs. group size.

Unfortunately, the reality is that getting truly definitive values down to the .001" requires samples in the realm of 200 shots. In other words, it's VERY common to get "lied to" if the results are remotely close to one another until/unless you plug what most people would consider an unreasonable amount of ammo into the tests.

Functionally, if you shoot a 20-shot string of 2 different lots/loads and one is a mean radius of .25 and one is a mean radius of .28, flip a coin. Long term they could go either way. However, if one is .25 and one is .40, odds are the .25 is going to be better.
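The "flip a coin" claim is easy to check by simulation. Assuming shots drawn from circular normal distributions whose long-run mean radii are ~.25" and ~.28" (an assumption, not measured data), this sketch counts how often a single 20-shot sample from the worse lot comes out smaller:

```python
import math
import random

def sample_mean_radius(sigma, n, rng):
    """Mean radius (about the group centroid) of an n-shot group drawn
    from a circular normal with per-axis standard deviation sigma."""
    shots = [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in shots) / n

rng = random.Random(7)
# Per-axis sigmas chosen so the long-run mean radii are ~0.25" and ~0.28"
# (for a circular normal, mean radius = sigma * sqrt(pi / 2))
sig_good = 0.25 / math.sqrt(math.pi / 2)
sig_bad = 0.28 / math.sqrt(math.pi / 2)

trials = 5000
# How often does a single 20-shot test rank the worse lot ahead?
wrong = sum(sample_mean_radius(sig_bad, 20, rng) < sample_mean_radius(sig_good, 20, rng)
            for _ in range(trials))
print(f"worse lot wins a 20-shot comparison in {100 * wrong / trials:.0f}% of trials")
```

Under these assumptions the worse lot "wins" somewhere around a quarter of the time, which is exactly the overlap being described: close enough that a single 20-shot comparison can't be trusted to rank them.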

Also, on the point of the coin flip situation, it really doesn't pencil out to much for hit probability and it's probably not worth worrying about. This is where I'm coming from when I've posted about this subject before, or even on the Hornady Podcast where I've said you won't likely see the difference between two similar loads even if one is technically better. At some point the best bang-for-the-buck option is to allow "good enough" to be good enough, and have an educated perspective on what that is.

Conclusion: Best bang for the buck if you really care about statistically significant data, 20-35 shots per variable change. Use mean radius. Understand there is still variability in those tests and set your expectations accordingly.
 
I should say, also, that groups will obviously never get smaller as you add shots. If you shoot 3 shots and have a 3" group, it's going to take a very unique circumstance for that load to be viable, and you can probably write it off. It's unlikely that the random 3 shots you fire happen to be the worst 3 shots of the population, but it can happen, so even a small sample carries usable information. Just bear in mind that if the first 3-shot group is .75 MOA, that might not be horrible, and the next 3-shot group could be .2 MOA. However, if the first 3-shot group is 1.5 MOA, odds are it's not a great fit for your barrel. YMMV.
 
When I lot tested my CZ bench rifle I just looked at what shot best. Here is how I did my lot testing.
I got 9 lots of Eley Match. I chose the lots based on there being at least 200 more boxes available.
My plan was to shoot five 5-shot groups with each lot. I found that 5 lots were not good, and I could see this after just three 5-shot groups. The remaining 4 lots shot much better, with 2 lots really standing out. I then shot five more 5-shot groups with the two best lots, and one lot was the clear winner. I then ordered a case of the best lot from KSS Sports and had it in 3 days. I can tell you the case I ordered also shoots very well in two other CZs I have. Here is a pic of my bench CZ.

View attachment 8504744
Gixxer, that Woox chassis sure is pretty. Question: is that wide, flat benchrest forend a Woox? I don't see it on their website.
 
I've been to Lapua in Ohio several times both testing new rifles and retesting older rifles that have run out of the tested ammo. They use 10-shot groups to find out what looks "interesting" and then shoot another 10-shot group to either verify the best one or further lower the pool of candidates. Seems to work.
 
I still wonder: what does the factory consider to be statistically significant? :unsure:

Supposedly every batch of match-quality cartridges is rated according to results produced from the samples fired in the test tunnels at the factories. What number of cartridges needs to be tested to provide a reliable indication of batch quality?

What is the level of confidence used? 75%? 80%? 90%? 95%?
What error is acceptable in the calculation? 10%? 5%? 1%?
What is the acceptable defect rate per 100,000 units?
Considering the latest deliveries from Eley have produced some unsatisfactory batches, is the factory batch testing less than reliable?

Here's a link to a sample size calculator:

Input some numbers and compare the sample sizes required with the different levels of confidence desired.
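For reference, the formula behind most of those online calculators is the normal-approximation sample size for estimating a proportion. This sketch runs the comparison the post suggests; the z-values are standard, and the resulting shot counts are illustrative, not any factory's actual protocol:

```python
import math

# z-values for common two-sided confidence levels
Z = {0.75: 1.150, 0.80: 1.282, 0.90: 1.645, 0.95: 1.960}

def sample_size(confidence, margin, p=0.5):
    """Normal-approximation sample size for estimating a proportion:
    n = z^2 * p * (1 - p) / e^2.  p = 0.5 is the worst case when the
    true defect rate is unknown."""
    z = Z[confidence]
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

for conf in (0.75, 0.80, 0.90, 0.95):
    for err in (0.10, 0.05, 0.01):
        print(f"{conf:.0%} confidence, ±{err:.0%} error: n = {sample_size(conf, err)}")
```

At 95% confidence and ±5% error the classic answer of 385 samples falls out, and tightening the error to ±1% pushes it close to 10,000, which puts factory batch-testing volumes in perspective.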
 
I’ve been thinking about why statistics seem to show that small numbers of groups are significant, but I don’t find them so.
I’m now wondering if it is related to shooting inside vs outside. I do my testing at 100 yards.
It takes me several range trips to select the best ammo even under seemingly great conditions.
Otherwise I get what @justin amateur calls “random acts of accuracy” that skew towards one lot over another.
 
When looking for an answer to a completely different topic, my search returned a few posts regarding lot testing. There were comments and questions such as how many shots should be taken and will just 5 or 10 shots give any meaningful information. So I pulled out one of my old tools to see for myself.

Before I retired, I spent many years where my job required me to use statistics to validate both performance of systems and to validate that improvements made to a system were meaningful (significant). The (Excel) tools I created to perform these tasks were validated by statisticians who were university professors. Although I am fairly versed in statistical analysis, I am not a statistician myself. So, if there are any statisticians around that will validate my conclusions related to their use in lot testing, that would be great. Anyway...

The image I've attached is a screenshot of my analysis workbook where I've made comparisons between lots. The first five are actual test shots provided by Anschutz North America; the other five I provided as additional examples. I purchased one of the lots (Lot 010). At the time, I purchased it because it had the smallest group of the fourteen lots they tested. But the question I wanted answered now is: are five-shot groups adequate for demonstrating that one group is conclusively (statistically) better than another? Yes, Lot 010 is conclusively better than the other lots, even though only five shots were taken for comparison. That said, the comparison is most certainly better if more shots are taken; I don't want to imply otherwise. The margin of error shrinks as the number of shots increases, so (for numerous reasons) the more shots the better. But groups of as few as five shots can be statistically meaningful.

If someone would like a copy of the Excel workbook used, I will be happy to provide it.

So, please take a gander and let me know what you think.

View attachment 8503917

Nice work (y)
 
IMG_3949.jpeg


Always wondered what the best way of expressing outcomes would be.

Group size would be easiest, but there's probably also something to be said about the radius from the target.

I long ago lost the math skills to express a unified measure as an expression of both.
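For what it's worth, both measures fall out of the same shot coordinates, and mean radius measured about the point of aim is one simple way to fold group location and spread into a single number. A sketch with made-up coordinates:

```python
import math
from itertools import combinations

# Hypothetical 5-shot group (inches), with the point of aim at the origin
shots = [(0.3, 0.2), (0.5, 0.1), (0.2, 0.4), (0.4, 0.5), (0.6, 0.3)]

# Group size (extreme spread): largest centre-to-centre distance between any two shots
group_size = max(math.hypot(x1 - x2, y1 - y2)
                 for (x1, y1), (x2, y2) in combinations(shots, 2))

# Mean radius about the group's own centre: pure dispersion, ignores where the group sits
cx = sum(x for x, _ in shots) / len(shots)
cy = sum(y for _, y in shots) / len(shots)
mr_centre = sum(math.hypot(x - cx, y - cy) for x, y in shots) / len(shots)

# Mean radius about the POA: penalises both spread and offset in one number
mr_poa = sum(math.hypot(x, y) for x, y in shots) / len(shots)

print(round(group_size, 3), round(mr_centre, 3), round(mr_poa, 3))
```

Here the group is tight but sits off-centre, so the POA-referenced mean radius is much larger than the centre-referenced one; that gap is exactly the "radius from the target" component that group size alone misses.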
 
I still wonder: what does the factory consider to be statistically significant? :unsure:

Supposedly every batch of match-quality cartridges is rated according to results produced from the samples fired in the test tunnels at the factories.
Perhaps it's worth considering that the factory doesn't test all of the ammo it produces. After all, no variety of match ammo comes with a performance guarantee of any kind. Some lots are not very good and shoot rather poorly. Why test at all?

It's worth noting that after many complaints from buyers, it seems Eley issued a rare recall of some of its Ultra Long Range ammo. A recall wouldn't be necessary if the lot in question had been factory tested. The message shown below was reproduced and posted in a thread here in post #32: https://www.rimfirecentral.com/threads/eley-ultra-long-range.1304633/