I think a lot depends on just how accurate a scale like the DS-750 really is. That scale comparison was mainly to compare scales to the ChargeMaster. Though it was interesting that the numbers, when weighed on the 120i, were that close.
Well, I've loaded up some test cartridges to compare chrono data. I'll be shooting two different bullets with two different charges, different cases and primers. I've got 10 loaded using the 120i and ten loaded using the DS-750 for one set, in Lapua LRP cases that have been sorted for the most consistent case weight, using primers that have also been sorted for uniformity. One set with 168 SMK's and the other with 169 SMK's. Bullets and primers are all seated to within .0005". This effort is to try and remove as much variance in the cartridges as I am able.
We'll see what the chrono numbers say when I get these fired.
Speaking of rabbits and where they might lead, I'm going to comment on the planning and testing and what @staightshooter1 can expect from the results. This is where the issue of statistical significance comes into play: how a test engineer would look at the data and what conclusions might be drawn from it. There is nothing wrong with the plan per se. Where the rub can come in is what the test results show and how they are interpreted. When testing, the idea is to determine what the test results are (past history) and use that to predict the future (probability of a future outcome).

The ten-shot test is going to yield different mean and standard deviation results for the two groups. In design of experiments it is desirable that only one thing be changed in a test (the test variable), but in our shooting world everything is different on each shot: case, bullet, charge weight, neck tension, chamber and barrel condition, etc. Even the powder itself can vary slightly in energy density. Each of these and other variables combine in different ways on each shot to create differences in velocity that we are trying to assign solely to the different scales. Even though we try to minimize those differences, they still exist. Once you have the results, mean and standard deviation, you have to determine whether there is a statistical difference between the standard deviations. Just because one is lower than the other does not mean the difference is statistically meaningful.
Without getting into a long treatise on statistical probability, suffice it to say that small sample sizes severely underestimate the standard deviation, and that the SD does not follow a normal distribution the way the mean does. Let's say that one scale produces an SD of 15 fps and the other produces an SD of 10 fps. Does our test result mean the 10 fps scale is actually better than the 15 fps scale? To compare the two results we would use something called an F test for differences in standard deviation. In this case it would tell us that there is an ~87% probability (chance) that the 10 is statistically better than the 15, but it does not mean the difference is 5 fps. Normally we would not consider a result statistically significant unless the probability is 95% or greater. It is also important to look at the confidence interval (CI) for each standard deviation. Typically a 95% confidence interval is used, meaning that if we ran an infinite number of 10-round samples, 95% of the standard deviations would fall between the upper and lower limits. The CI for the 15 fps SD is 10.32 to 27.38 fps, and for the 10 fps SD it is 6.88 to 18.26 fps. You can see that there is significant overlap in the confidence intervals.
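For anyone who wants to check those numbers themselves, here's a quick sketch in Python using SciPy. This isn't from the test plan, it just reproduces the F test and chi-square CI math above with the example figures (10 shots per group, 15 fps vs 10 fps SDs).

```python
# Sketch of the F-test and SD confidence-interval math from the example above.
from scipy import stats

n1, n2 = 10, 10          # shots per group
sd1, sd2 = 15.0, 10.0    # observed velocity SDs, fps

# One-sided F test: how confident can we be that the larger SD is really larger?
F = (sd1 ** 2) / (sd2 ** 2)
p_one_sided = stats.f.sf(F, n1 - 1, n2 - 1)   # P(F >= observed) if the SDs were equal
print(f"F = {F:.2f}, confidence the 10 fps load is better: {1 - p_one_sided:.1%}")

# 95% confidence interval for each SD (chi-square method)
def sd_ci(sd, n, conf=0.95):
    df = n - 1
    lo = sd * (df / stats.chi2.ppf(1 - (1 - conf) / 2, df)) ** 0.5
    hi = sd * (df / stats.chi2.ppf((1 - conf) / 2, df)) ** 0.5
    return lo, hi

print("15 fps SD CI:", [round(x, 2) for x in sd_ci(sd1, n1)])
print("10 fps SD CI:", [round(x, 2) for x in sd_ci(sd2, n2)])
```

Running that gives the ~87% figure and the 10.32-27.38 / 6.88-18.26 fps intervals quoted above.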
So for this example we can infer that the 10 fps SD appears to be better than the 15 fps SD, but statistically there is too little data to be certain of that. Even if the SDs differ on this test, it doesn't mean they always would. We would need to run more tests before we could increase our confidence in the results.
It is also important in developing a test protocol to have a feel for what effect the intended variable, in this case powder weight only, can have on the results. This estimate is made prior to testing, and it is what is typically done to prioritize what needs to be controlled and the type of equipment needed to perform a test. Taking a 308 as an example, a 0.1 gn change in powder can produce about a 6 fps change in velocity if that is the only thing that changes. This is equivalent to a 6 fps ES, and we can estimate its standard deviation by dividing that number by 6 (true ES is ~6x true SD). So our SD for a 0.1 gn spread is 1 fps. If our test SD were 15 fps and the powder weight for the test varied by 0.1 gn (ES), we would expect the SD to drop to Sqrt(15^2-1^2), or 14.97 fps, if the weight variation were eliminated entirely. If the test SD were 10 fps, the resulting estimate would be Sqrt(10^2-1^2), or 9.95 fps, for a zero (0.0) powder weight change. The implication here is that if the SD isn't very low to begin with, it is going to be difficult to detect the weight difference in the velocity data, because the other variables are influencing the results much more than the variation in weight does. If the powder weight variation equated to a 2 fps SD, the reductions would be to 14.87 and 9.80 fps. Still a small change in velocity SD.
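The root-sum-of-squares arithmetic behind those numbers is easy to play with. Here's a small Python sketch (again just illustrating the example above, not part of the actual test) showing how little the total SD moves when you remove the charge-weight contribution:

```python
# Independent error sources add in quadrature, so removing a small one barely
# changes the total velocity SD.
from math import sqrt

def sd_without(total_sd, component_sd):
    """SD that would remain if the named component's variation were zero."""
    return sqrt(total_sd ** 2 - component_sd ** 2)

for total in (15.0, 10.0):            # observed velocity SDs, fps
    for charge_sd in (1.0, 2.0):      # velocity SD attributable to charge weight, fps
        print(f"total {total} fps, charge contribution {charge_sd} fps "
              f"-> {sd_without(total, charge_sd):.2f} fps without it")
```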
Anyway, this is how any test results should be analyzed. A similar method applies to the mean (average). Fortunately the mean requires much less data to reach significance because it fits a normal distribution. The PrecisionRifleBlog has a great series on statistics applied to shooting, and I would recommend it if anyone is interested.
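For completeness, the mean comparison would typically be a two-sample t test. A quick hedged sketch in Python; the 2800 and 2795 fps averages below are made-up placeholder numbers purely for illustration, only the SDs and shot counts reuse the example above:

```python
# Welch's two-sample t test from summary statistics (hypothetical means).
from scipy import stats

res = stats.ttest_ind_from_stats(mean1=2800, std1=15, nobs1=10,
                                 mean2=2795, std2=10, nobs2=10,
                                 equal_var=False)   # Welch's t test
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```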