Monday 5 December 2011

97.3% of all statistics are made up...


Today I wanted to get to the bottom of something I typically left to the pros: statistics.  I must admit that this was never one of my strong points.  In fact, during the open-book mid-term for our university stats course, I was only able to pull off a mediocre C.  And I thought statistics was the art of never having to say you're wrong.

Today's fun is comparing two approaches often brought to the table when it comes time to accept an automated visual inspection system as an approved replacement for human inspection: Attribute Gauge R&R and Probability of Detection (POD).

First, let's establish what each means at a high level:

An attribute gauge R&R is a tool often used when a measurement system relies on human judgement; the R&R stands for repeatability and reproducibility. Repeatability measures whether the same operator, measuring the same thing, using the same gauge, gets the same reading every time. Reproducibility measures whether different operators, measuring the same thing, using the same gauge, get the same reading every time.

So the idea in applying this to automation is to stack the automated visual inspection system up against an expert inspector, or a number of expert inspectors, using a defined sample set.
The results of an attribute gauge R&R are two percentages: percentage of repeatability and percentage of reproducibility. Ideally, both should be 100 percent, but anything above 90 percent is generally considered fine.
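To make the arithmetic concrete, here is a rough sketch of the attribute agreement calculation in Python. The parts, the two "appraisers" (a human inspector and the automated system), and the pass/fail calls are all made-up illustration data; a real study typically adds a kappa statistic on top of the raw percentages.

    # A minimal sketch of the attribute agreement arithmetic.
    # All data below is hypothetical illustration data.

    # Each appraiser inspects the same 5 parts twice (1 = defect flagged, 0 = pass).
    trials = {
        "inspector": [[1, 0, 1, 1, 0],
                      [1, 0, 1, 0, 0]],
        "system":    [[1, 0, 1, 1, 0],
                      [1, 0, 1, 1, 0]],
    }

    def pct_repeatability(runs):
        """Percent of parts where one appraiser agrees with themself across trials."""
        trial1, trial2 = runs
        matches = sum(a == b for a, b in zip(trial1, trial2))
        return 100.0 * matches / len(trial1)

    def pct_reproducibility(all_runs):
        """Percent of parts where every appraiser gives the same call on every trial."""
        n_parts = len(next(iter(all_runs.values()))[0])
        agree = sum(
            len({run[i] for runs in all_runs.values() for run in runs}) == 1
            for i in range(n_parts)
        )
        return 100.0 * agree / n_parts

    for name, runs in trials.items():
        print(f"{name}: {pct_repeatability(runs):.0f}% repeatable")
    print(f"overall: {pct_reproducibility(trials):.0f}% reproducible")

With this toy data the inspector self-agrees on 4 of 5 parts (80% repeatable), the system on all 5 (100%), and everyone agrees on 4 of 5 parts (80% reproducible).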

Using this test you make the assumption that you are happy with the same results as your expert inspectors, so if you can achieve a correlation between the inspector and the automated system, you are confident the system is at least as good.  See a small flaw?

Probability of detection gives similar results as far as the numbers go, but it lets the automated visual inspection system stand on its own.  A formal definition: probability of detection, as a function of discontinuity size, is the fraction of discontinuities of a nominal size that are expected to be detected.  The results are normally expressed as a probability of detecting a discontinuity paired with a confidence level (90/95): the first number is the probability, as a percentage, that the anomaly will be detected, and the second is the confidence level for that detection rate.  This is usually presented as a graph of defect size versus probability of detection.
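Here is a rough sketch of fitting that POD-versus-size curve to hit/miss data with a logistic model in Python. The flaw sizes and outcomes are made up for illustration, and a proper 90/95 result also needs a lower confidence bound on the fitted curve (see e.g. MIL-HDBK-1823A), which is omitted here.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical hit/miss data: flaw size in mm, 1 = detected, 0 = missed.
    size = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 1.2, 1.5, 2.0])
    hit  = np.array([0,   0,   0,   1,   0,   1,   1,   1,   1,   1])

    # Fit POD(a) = 1 / (1 + exp(-(b0 + b1*a))) by maximum likelihood.
    fit = sm.Logit(hit, sm.add_constant(size)).fit(disp=0)
    b0, b1 = fit.params

    # Invert the fitted curve for a90, the flaw size detected 90% of the time.
    a90 = (np.log(0.9 / 0.1) - b0) / b1
    print(f"a90 = {a90:.2f} mm (point estimate, no confidence bound)")

The 90/95 figure would then be the size at which the lower 95% confidence band on this curve crosses 90% POD, which is always somewhat larger than the a90 point estimate.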

So stacking them up, here are the numbers:

    Attribute GR&R        POD
    % repeatability       % probability the anomaly will be detected
    % reproducibility     Confidence level for detecting the anomaly

And here is the tough part of a POD study.  To make the POD valid, depending on which document you read, you usually need in the range of 20-100 or more indications of each defect type you want to find.  So if you have 10 defect types, this grows very fast.  The defects need to be smaller than the limit, around the limit, and much larger than the limit, and you will also need two to three times that number of samples representing regions without discontinuities.
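A quick back-of-the-envelope count shows how fast this grows. All the numbers here are just illustrative mid-range picks from the figures above:

    # Hypothetical mid-range figures: 60 flawed samples per defect type
    # (spanning sizes below, at, and well above the limit), 10 defect
    # types, and 2.5x as many clean regions.
    defect_types = 10
    flawed_per_type = 60
    clean_factor = 2.5

    flawed = defect_types * flawed_per_type    # 600 flawed samples
    clean = int(flawed * clean_factor)         # 1500 clean regions
    print(f"{flawed} flawed + {clean} clean = {flawed + clean} total samples")

That is over 2,000 characterized samples before the study even begins.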

So while the Attribute Gauge R&R hints you are fine with the status quo, the POD study really taxes your resources and may not even be possible depending on the types of parts you are manufacturing.

I actually prefer to go through a POD with a customer.  It is a much more extensive test and serves as a great platform for learning about automated visual inspection.  If budgets don't allow for a full test, the second best is the Attribute Gauge R&R.  Overall, both have implementation flaws if not approached properly.  The key is to understand what you are realistically able to achieve in your production environment, then set expectations and plan appropriately.

Has anyone come across other tests that do a great job of comparing a human process to an automated process?


Let me know what you think! I love a great discussion!