Today I wanted to get to the bottom of something I typically
left to the pros: statistics. I must
admit that this was never one of my strong points. In fact, in university, during the open-book
mid-term for our stats course, I was only able to pull off a mediocre C. And I thought statistics was the art of never having to say you're wrong.
My fun today is comparing two approaches often brought
to the table when it comes time to accept an automated visual inspection system
as an approved replacement for human inspection: Attribute Gauge R&R and
Probability of Detection (POD).
First, let's establish what each means at a high level:
An attribute gauge R&R is a tool often used when a
measurement system relies on human judgement, and the R&R stands for
repeatability and reproducibility. Repeatability measures whether the
same operator, measuring the same thing with the same gauge, gets the same reading
every time. Reproducibility measures whether different operators, measuring the
same thing with the same gauge, get the same reading every time.
So the idea, applied to automation, is to stack the
automated visual inspection system up against an expert inspector, or a number
of expert inspectors, using a defined sample set.
The results of an attribute gauge R&R are two percentages:
percentage of repeatability and percentage of reproducibility. Ideally, both
should be 100 percent, but anything above 90 percent is generally considered
acceptable.
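To make that concrete, here is a minimal sketch of how those two percentages can fall out of raw attribute data. The array layout, the pass/fail coding, and the simplified agreement rules are all my own assumptions for illustration; a formal attribute agreement analysis adds confidence intervals and agreement-with-standard statistics on top of this.

```python
import numpy as np

# Hypothetical attribute data: results[appraiser, part, trial] = 1 (reject) / 0 (accept).
# Two "appraisers" here: an expert inspector and the automated system,
# each judging the same 5 parts three times.
results = np.array([
    [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]],  # expert inspector
    [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 1]],  # automated system
])

# % repeatability: for each appraiser, fraction of parts where all of that
# appraiser's own trials agree with each other.
within = results.min(axis=2) == results.max(axis=2)   # shape (appraisers, parts)
repeatability = within.mean(axis=1) * 100

# % reproducibility (simplified convention): fraction of parts where every
# appraiser made the same call on every trial.
across = results.min(axis=(0, 2)) == results.max(axis=(0, 2))  # shape (parts,)
reproducibility = across.mean() * 100

print(f"repeatability:   {repeatability}")   # [80. 80.] percent
print(f"reproducibility: {reproducibility}") # 60.0 percent
```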
Using this test, you are assuming you would be happy with
the same results your expert inspectors produce: if you can show agreement
between the inspectors and the automated system, you are confident the system is
at least as good as they are. See a small flaw?
Probability of detection gives similar results as far as
numbers go, but it lets the automated visual inspection system stand on its
own. A formal definition: probability of
detection, as a function of discontinuity size, is the fraction of
discontinuities of a nominal size that are expected to be detected. The results
are normally expressed as a probability of detection paired with a confidence
level (e.g., 90/95). The first number is the probability
that the anomaly will be detected, given as a percentage. The second
number is the confidence level attached to that detection rate. This is usually
presented as a graph of defect size versus probability of detection.
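For a feel of where that curve comes from, here is a sketch of the standard hit/miss approach: logistic regression on log defect size, in the spirit of MIL-HDBK-1823A. The data values, the choice of scikit-learn, and the large-C trick to approximate an unregularized fit are my assumptions for illustration; the 95% lower confidence bound that turns a90 into a90/95 needs additional statistics (likelihood ratio or Wald bounds) not shown here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hit/miss data: defect size in mm, 1 = detected, 0 = missed.
sizes = np.array([0.20, 0.25, 0.30, 0.35, 0.40, 0.50,
                  0.60, 0.70, 0.80, 1.00, 1.20, 1.50])
hits = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1])

# POD models are conventionally fit against log(size).
X = np.log(sizes).reshape(-1, 1)
fit = LogisticRegression(C=1e6).fit(X, hits)  # large C ~= no regularization

# Invert the fitted curve: the defect size where POD crosses 90%.
b0, b1 = fit.intercept_[0], fit.coef_[0, 0]
a90 = np.exp((np.log(0.9 / 0.1) - b0) / b1)
print(f"a90 estimate: {a90:.2f} mm")
```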
So stacking them up, here are the numbers each study produces:

Attribute Gauge R&R: % repeatability; % reproducibility
POD: % probability the anomaly will be detected; confidence level for that detection
And here is the tough part of a POD study. To make the POD valid, depending on which
document you read, the number of indications needed is usually in the range
of 20-100 or more of each defect type you want to find. So if you have 10 defect types this grows
very fast: even at 30 flaws per type, you already need 300 flawed samples. The defects need to be
smaller than the limit, around the limit, and much larger than the limit, and you
will also need two to three times that number of samples representing regions without
discontinuities.
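Those sample counts are not arbitrary; they fall out of simple binomial statistics. Here is a sketch using the classic zero-miss demonstration, where 0.90^29 < 0.05 gives the well-known "29 of 29" rule for claiming 90% POD at 95% confidence; the 10-type multiplication just echoes the numbers above.

```python
import math

def zero_miss_sample_size(pod=0.90, confidence=0.95):
    """Flaws needed to claim `pod` at `confidence` when every one is detected."""
    # Require pod**n <= 1 - confidence, i.e. a perfect score is this unlikely
    # unless the true POD is at least `pod`.
    return math.ceil(math.log(1 - confidence) / math.log(pod))

n = zero_miss_sample_size()      # 29 flaws per defect type
print(n)                         # 29
print(10 * n)                    # 290 flawed samples for 10 defect types
print(2 * 10 * n, 3 * 10 * n)    # plus 580-870 defect-free regions
```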
So while the Attribute Gauge R&R merely shows you can match
the status quo, the POD study really taxes your resources and may not even be
possible depending on the types of parts you are manufacturing.
I actually prefer to go through a POD with a customer. It is a much more extensive test and
serves as a great platform for learning about automated visual inspection. If budgets don't allow for a full test, the
second best is the Attribute Gauge R&R.
Overall, both have implementation flaws if not approached properly. The key is to understand what you are realistically
able to achieve in your production environment, then set expectations and
plan appropriately.
Has anyone come across other tests that do a great job of
comparing a human process to an automated process?