Retesting some reloads today. Very odd results!

sig2009

Worked up a few great rifle loads last week. SDs anywhere from 2-6, ES anywhere from 5-10. Reloaded the same this week: same powder and charge, brass, primers, seating depth and bullets, same A&D 120 scale and Garmin chrono. Well, I must say this shocked me. SDs were running in the 20s and ES was anywhere from 30-60. How can this be? I can't figure this one out! All the Alpha brass was prepared at the same time: FL sized with a neck bushing, then a mandrel run through the necks before reloading. Am I missing something here?
 
5 shots. Same as last week.
5 shots is not enough data points to trust the reported SD. Even if you combined the data from all 10 shots and calculated an SD from that, it still wouldn't be a reliable number. Essentially, what you found in your first outing was 5 data points from the middle of your normal distribution (bell curve), and in your last outing 5 data points from the edges of that distribution. You need 25-30+ data points if you want a true picture of what your SD and ES really are.
 
This ^^...and it's not just opinion, it's mathematics (yeah, and I HATED statistics). I too have generated AWESOME SDs that just go up as I add more shots to the session.

Keep your Garmin session open and keep adding shots, and when you get to about 30 of them you will be pretty stable on ES/SD.
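
A quick way to see this for yourself is to simulate it. The sketch below is only an illustration under assumptions that are not from this thread: velocities are drawn from a normal distribution with a made-up mean of 2800 fps and a true SD of 10 fps, and the helper names `simulate_sample_sds` and `middle_95` are hypothetical. It shows how widely the SD reported from a 5-shot string can swing compared with a 30-shot string from the very same load.

```python
import random
import statistics


def simulate_sample_sds(true_sd, shots, trials=5000, mean_fps=2800.0, seed=1):
    """Return the sorted sample SDs of many simulated strings of `shots` shots.

    Assumption: shot velocities are normally distributed around `mean_fps`.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(trials):
        string = [rng.gauss(mean_fps, true_sd) for _ in range(shots)]
        sds.append(statistics.stdev(string))  # sample SD (n - 1 divisor)
    return sorted(sds)


def middle_95(sorted_values):
    """Return the values that bound the middle 95% of a sorted list."""
    n = len(sorted_values)
    return sorted_values[int(0.025 * n)], sorted_values[int(0.975 * n)]


if __name__ == "__main__":
    for shots in (5, 30):
        low, high = middle_95(simulate_sample_sds(true_sd=10.0, shots=shots))
        print(f"{shots:2d}-shot strings: 95% of reported SDs fall between "
              f"{low:.1f} and {high:.1f} fps")
```

With a true SD of 10 in this model, the 5-shot strings should report anything from roughly 3.5 to 17 fps, while the 30-shot strings stay much closer to 10.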
 
that's why
I've always used 5 shots as a starting point with 6.5 and .308 and never had any issues. If they group well, then I move on to repeatability and OAL. Normally the ones that group the best have the worst SD/ES, so I really don't chase those anymore. Learned that from experience. Every now and then I'll play with the ones that have low SD/ES and try to get them to group better, but in my experience it has never worked. Hell, the Hornady factory ammo I used to break in the barrel all shot within half an MOA, but the Garmin read it as 16 SD/25 ES.
 

If you work up a load in a clean barrel it probably won’t shoot the same next week in a dirty barrel. Carbon fouling changes things.
 
You've been given the explanation that relates to sample size, but there is also an issue of not understanding the concepts of sampling and expectations. You are looking at test data that is subject to random selection and errors, and you are not considering what that data means in relation to the population you are trying to apply it to. I'll try to explain.

If you measured an SD of 6 fps from a single 5-shot group, the 95% confidence interval for the real SD (the one you would get from a very large sample) runs from about 3.6 to 17.2 fps. Realistically, a real SD of 6 is extremely hard to maintain and to measure with a consumer chronograph, which has an accuracy of 0.1%, or roughly +/-2.6 to 3.0 fps at 2,600 to 3,000 fps. Similarly, if a 5-shot group gives an SD of 10, the 95% interval for the real SD is about 6 to 28.7 fps.

The sample standard deviation does not follow a normal distribution the way the mean (average) does. Small sample sizes tend to bias the result low. This is a result of the probability bias of picking a sample from within the 1 SD range of the population, because 68% of a large population lies within that range.

To begin to understand the likely standard deviation of anything you test, you need sample sizes in the range of 20 to 30 data points. Even then, with 30 points the 95% confidence interval around a measured SD of 10 would be 8 to 13.4 fps.
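
For anyone who wants to check the figures quoted above (3.6 to 17.2, 6 to 28.7, and 8 to 13.4 fps), they match the textbook chi-square confidence interval for the true SD given a measured sample SD. The sketch below is one way to compute it with only the Python standard library. It assumes normally distributed velocities, and the helper names (`chi2_cdf`, `chi2_ppf`, `sd_confidence_interval`) are made up for this example, not taken from any library.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower
    incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_ppf(p, df):
    """Chi-square quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sd_confidence_interval(sample_sd, n, level=0.95):
    """Confidence interval for the true SD, given the SD of n shots."""
    df = n - 1
    alpha = 1.0 - level
    lower = sample_sd * math.sqrt(df / chi2_ppf(1.0 - alpha / 2.0, df))
    upper = sample_sd * math.sqrt(df / chi2_ppf(alpha / 2.0, df))
    return lower, upper


if __name__ == "__main__":
    for sd, shots in ((6.0, 5), (10.0, 5), (10.0, 30)):
        low, high = sd_confidence_interval(sd, shots)
        print(f"measured SD {sd:g} from {shots} shots: "
              f"true SD likely between {low:.1f} and {high:.1f} fps")
```

The interval shrinks slowly: going from 5 to 30 shots takes the upper bound for a measured SD of 10 from nearly 29 fps down to about 13 fps.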
 
@Doom has given you a very comprehensive answer that almost certainly explains your situation.

The only thing I would add is that if you were shooting in drastically different temperatures/conditions, this can affect SD/ES. I have loads that will hover around 9-11 SD for 30-plus rounds when the temp is 20-90 degrees. Much above 100 and I will see the SD creep up into the high teens.

But if I was betting, I would say your sample size is too small.
 
I completely agree that the small sample sizes don’t tell the whole story. I thought I read somewhere on here that your SDx5 should be close to your ES if your data set is approaching a meaningful size…you aren’t even close. So, philosophically, I agree with everything said here. However, it’s possible that something else changed. Are you saying you shot five shots and had “good stats” and then a week later did everything the same and had “bad data”? Further, the good data was 6/10 and the bad was 20/60?

What I find hard to believe is that you found the best data set and then the worst in two consecutive tests of the same sample size. It would be much more likely to be a case of “you got lucky” if your stats got worse by some smaller amount, say if you told us you shot three five-shot groups and your SD/ES data was all over a broad range, or if you came back on session two, shot 20 rounds, and your data went to shit. But to have terrific data followed by terrible (by a factor of 10) stretches the imagination for me. You MAY have encountered statistical outliers on the good and the bad end in consecutive tests, but it’s possible you did something to make the problem worse.

Do you leave powder in the hopper? Was it a new jug that has now been open for a week? Do you calibrate your scale each time? Do you lube your mandrel and then do anything to wipe the lube out or to re-lube the necks if time has passed since mandrelling?

Just as an aside, I never get my most consistent velocities with virgin brass.
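
On the SDx5 rule of thumb mentioned above, a small simulation can show how the ES-to-SD ratio depends on the number of shots. This is only a sketch under assumptions of my own: normally distributed velocities with a made-up mean of 2800 fps and SD of 10 fps, and a hypothetical helper name `mean_es_to_sd_ratio`.

```python
import random
import statistics


def mean_es_to_sd_ratio(shots, trials=1000, seed=7):
    """Average ES/SD ratio over many simulated strings of `shots` shots.

    Assumption: velocities are normally distributed (2800 fps mean, 10 fps SD).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        string = [rng.gauss(2800.0, 10.0) for _ in range(shots)]
        extreme_spread = max(string) - min(string)
        ratios.append(extreme_spread / statistics.stdev(string))
    return statistics.mean(ratios)


if __name__ == "__main__":
    for shots in (5, 30, 100):
        print(f"{shots:3d} shots: ES is about "
              f"{mean_es_to_sd_ratio(shots):.1f} x SD on average")
```

Under these assumptions the ratio comes out near 2.5 for 5 shots and about 4 for 30 shots, and it only reaches 5 at around 100 shots. So an ES of roughly twice the SD is about what a 5-shot string should show, and the x5 figure only applies to much longer strings.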
 
Your comments on good/bad have merit. If the rounds are loaded at different times, then those factors and others all come into play, as they tend to indicate some type of inconsistency in the load process. That may well be powder charge, seating depth, brass prep, orientation of the rounds prior to firing, and a host of other minor factors, including the blind luck factor. The bottom line is we don't know.

However, the fact remains that testing only five rounds biases the standard deviation on the low side, and expecting to repeat that number is going to create expectations that will likely not be met. Similarly, the 20/60 may not reflect the true population either. These are two tests based on a total of 10 rounds. All that we can say with any certainty is that the 6 SD and 20 SD are statistically different.

Now that there are two tests with significantly different results, you are faced with the option of selecting one as good or the other as bad. Neither decision is supported with any certainty by the data. If I encountered this situation in my work career, I would do exactly what you are discussing. I would examine my entire process and try to eliminate/minimize as many factors that affect the standard deviation as possible, then proceed to perform the test again. I might also change the test depending on a re-examination of how I determined the original load.
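
To put a rough number on how unusual a 6 SD followed by a 20 SD would be if nothing had changed, here is a simulation sketch. It assumes normally distributed velocities and exactly two 5-shot strings from the same load, and it does not account for picking the best or worst out of several groups. The function name `chance_of_sd_ratio` is hypothetical.

```python
import random
import statistics


def chance_of_sd_ratio(ratio, shots=5, trials=10000, seed=3):
    """Estimate how often two strings from the SAME load report SDs
    that differ by a factor of `ratio` or more."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Scale does not matter for a ratio, so a unit SD is used here.
        sd_a = statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(shots)])
        sd_b = statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(shots)])
        if max(sd_a, sd_b) / min(sd_a, sd_b) >= ratio:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = chance_of_sd_ratio(20.0 / 6.0)
    print(f"Chance of a {20 / 6:.1f}x SD difference by luck alone: {p:.1%}")
```

In this model the answer comes out at around 4%, which is consistent with calling the two results statistically different at the usual 5% level while still leaving plain luck as a possibility.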
 
Ya, I don't buy that whole "got lucky" thing. In my 20 years of reloading I have never seen a case where you do a ladder test one week, repeat it the next week, and get totally different results. This is brand new Alpha brass. In fact, the only thing the brass needed was to chamfer and deburr the case mouths. Everything was right on with the brass: the length, the sizing, the headspace. Same jug of powder, and I never leave powder in the powder measure. Neck tension was set with the Wilson mandrel. I'm using an A&D scale that is calibrated at every session. Granted, these were reloaded at different times, but with all things equal that should not change the results. NOTHING changed when reloading the same rounds. The only difference was that the first 30 rounds were shot in a fully cleaned barrel and the next 30 were shot without cleaning. It's a new PVA barrel with 150 rounds down the pipe. I don't see how 30 rounds fouled the barrel badly enough to get those results. I just spent an hour cleaning the barrel again and I will do the test all over again this weekend. I may also redo the test with some multi-fired Lapua brass. Who knows, maybe it is because of the virgin brass, but I've had virgin Lapua brass and never had results like this. The other thing is, could it be the Garmin chrono? Who knows. When I got home to look at the results, the Garmin did a software update.
 

I have found over the years that every barrel is different. Also, forgot to mention that in my experience solid copper bullets will screw up the ecosystem in the bore and will require cleaning if you want normal bullets to shoot good again. I dunno if you shot any that day.
 
@sig2009, if I interpret you correctly, you shot multiple groups of 5 shots and got SDs ranging from 2-6 fps. If this is correct and your ES was essentially 2xSD, then you were loading very consistent ammunition. That's true even if the loads were different. On the second loading, if the same thing is true, then something is inconsistent. On the surface, the first thing that comes to mind is neck tension, since the brass may have changed slightly due to springback. The other obvious difference is the fouling of the barrel, but I kinda doubt it is the source of the issue.

A word about the chronograph. Most radar chronographs are extremely repeatable. They are not subject to many of the issues that plagued optical chronographs so I would not worry about that data.

I will mention one other possibility that probably doesn't come into play: if your case fill is low (<90%) and the loads were handled differently, that can contribute to differences in SD and ES. For instance, if all the first rounds were loaded from an ammo case (head down) and the second set was laid out on the bench before loading, this might be a contributing factor.

Good luck.