1. Overall Accuracy Equality
When the overall procedure accuracy is the same for each group, overall accuracy equality is achieved. Overall procedure accuracy, as previously mentioned, is the proportion of predictions the algorithm gets right, and is defined as (TP + TN)/N.
This definition assumes that true negatives and true positives are equally desirable in our system. In real-world cases, however, true positives may be more desirable than true negatives. Likewise, the two kinds of error may not be equally costly: false negatives (actual positives wrongly predicted as negative) might be twice as undesirable as false positives (actual negatives wrongly predicted as positive).
Overall Accuracy Equality, according to the State of the Art paper, is not commonly used as it doesn't differentiate between success and failure accuracy.
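To make the limitation concrete, here is a minimal sketch that computes (TP + TN)/N per group. The group names and confusion-matrix counts are made up for illustration and do not come from any real dataset.

```python
# Hypothetical confusion-matrix counts per group (illustrative numbers only).
groups = {
    "less_than_5h_sleep": {"TP": 40, "FP": 10, "FN": 20, "TN": 30},
    "more_than_5h_sleep": {"TP": 25, "FP": 15, "FN": 15, "TN": 45},
}

def overall_accuracy(counts):
    """Proportion of correct predictions: (TP + TN) / N."""
    n = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / n

accuracy = {name: overall_accuracy(c) for name, c in groups.items()}
for name, value in accuracy.items():
    print(f"{name}: accuracy = {value:.2f}")
```

With these assumed counts both groups reach an accuracy of 0.70, even though the first group has twice as many false negatives as false positives and the second has equal numbers of each. Overall accuracy equality cannot see that difference.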
2. Statistical Parity
Statistical Parity is achieved when the prediction distribution is the same across all protected groups. In our previous example, this implies that if sleep pattern were a protected class, the proportion of people predicted to have coffee with (or without) sugar should be equal for people who get less than five hours of sleep and for people who get more than five hours of sleep.
Statistical parity, also known as demographic parity, has been criticized as it can lead to highly undesirable outcomes. The quantity compared across groups is the proportion predicted positive, (TP + FP)/N, or equivalently the proportion predicted negative, (FN + TN)/N.
By simply sampling more (or fewer) people from one of the groups, we could shift the prediction proportions and achieve statistical parity by changing our population, rather than by modifying our algorithm to equalize the proportion of predictions based on sleep patterns.
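As a rough sketch of how the comparison could be computed, the snippet below takes (TP + FP)/N for each group and reports the gap between them. The counts and the name `parity_gap` are assumptions made for this example.

```python
# Hypothetical confusion-matrix counts per group (illustrative numbers only).
groups = {
    "less_than_5h_sleep": {"TP": 40, "FP": 10, "FN": 20, "TN": 30},
    "more_than_5h_sleep": {"TP": 25, "FP": 15, "FN": 15, "TN": 45},
}

def predicted_positive_rate(counts):
    """Proportion of the group predicted positive: (TP + FP) / N."""
    n = sum(counts.values())
    return (counts["TP"] + counts["FP"]) / n

rates = {name: predicted_positive_rate(c) for name, c in groups.items()}

# A gap of zero would mean statistical parity holds for these counts.
parity_gap = max(rates.values()) - min(rates.values())
for name, rate in rates.items():
    print(f"{name}: predicted positive rate = {rate:.2f}")
print(f"parity gap = {parity_gap:.2f}")
```

With these assumed counts the rates are 0.50 and 0.40, so statistical parity is not met. Note that the calculation never looks at which predictions were correct, only at how many positives were predicted.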
3. Conditional Procedure Accuracy Equality
As the name suggests, this kind of fairness is achieved when the conditional procedure accuracy is the same across all protected groups. Simply put, the accuracy of predicting positives (and negatives) among all actual positives (and negatives) should be equal across groups. Thus, TP/(TP + FN) and TN/(TN + FP) should each be the same across groups.
This is equivalent to requiring that the false negative and false positive rates be the same across groups, since FN/(TP + FN) = 1 - TP/(TP + FN) and FP/(TN + FP) = 1 - TN/(TN + FP).
A closely related definition is referred to as "equalized odds"; "equality of opportunity" is the same requirement applied to the more desirable outcome only.
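The two ratios above can be sketched in a few lines. The counts are hypothetical, and the function name is ours rather than anything from the paper.

```python
# Hypothetical confusion-matrix counts per group (illustrative numbers only).
groups = {
    "less_than_5h_sleep": {"TP": 40, "FP": 10, "FN": 20, "TN": 30},
    "more_than_5h_sleep": {"TP": 25, "FP": 15, "FN": 15, "TN": 45},
}

def conditional_procedure_accuracy(counts):
    """Accuracy conditional on the actual outcome."""
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])  # among actual positives
    tnr = counts["TN"] / (counts["TN"] + counts["FP"])  # among actual negatives
    # The error rates are the complements of the two accuracies.
    return {"TPR": tpr, "TNR": tnr, "FNR": 1 - tpr, "FPR": 1 - tnr}

procedure = {name: conditional_procedure_accuracy(c) for name, c in groups.items()}
for name, metrics in procedure.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

For these assumed counts the true negative rate is 0.75 in both groups, but the true positive rates differ (about 0.667 versus 0.625), so conditional procedure accuracy equality holds for negatives only.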
4. Conditional Use Accuracy Equality
This differs from the previous fairness in that it conditions on the predicted outcome rather than on the actual outcome. Among all cases that received a given prediction, the proportion that turned out to be correct should be the same across groups. In other words, this fairness asks: conditional on the prediction of success (or failure), is the probability of success (or failure) the same across groups? Thus, the ratios TP/(TP + FP) and TN/(FN + TN) should each be equal across all groups.
Both conditional procedure and conditional use accuracy equality are concerns in criminal justice risk assessments (more on this later). Chouldechova discusses the positive-prediction side of conditional use accuracy as positive predictive value (PPV), which corresponds to TP/(TP + FP).
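A minimal sketch of the two conditional use ratios follows. The counts are hypothetical; "NPV" (negative predictive value) is the usual name for TN/(FN + TN) and is used here only as a label.

```python
# Hypothetical confusion-matrix counts per group (illustrative numbers only).
groups = {
    "less_than_5h_sleep": {"TP": 40, "FP": 10, "FN": 20, "TN": 30},
    "more_than_5h_sleep": {"TP": 25, "FP": 15, "FN": 15, "TN": 45},
}

def conditional_use_accuracy(counts):
    """Accuracy conditional on the predicted outcome."""
    ppv = counts["TP"] / (counts["TP"] + counts["FP"])  # among predicted positives
    npv = counts["TN"] / (counts["FN"] + counts["TN"])  # among predicted negatives
    return {"PPV": ppv, "NPV": npv}

use = {name: conditional_use_accuracy(c) for name, c in groups.items()}
for name, metrics in use.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With these assumed counts a positive prediction is right 80% of the time for the first group but only 62.5% of the time for the second, so the same prediction means different things depending on the group.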
5. Treatment Equality
Treatment equality is achieved when the ratio of false negatives to false positives (or vice versa) is the same across protected groups. This equality is considered a "lever with which to achieve other kinds of fairness." It is defined as FN/FP or FP/FN.
We could treat people who get more sleep differently from people who get less sleep to achieve conditional use accuracy equality, by weighing their false negatives more heavily. This would change the treatment ratio, and would be an indicator of unfair treatment of people who get more sleep.
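The treatment ratio is simple to compute from the error counts. The sketch below uses FN/FP with made-up counts; returning infinity when a group has no false positives is a choice made for this example, not something the definition prescribes.

```python
# Hypothetical confusion-matrix counts per group (illustrative numbers only).
groups = {
    "less_than_5h_sleep": {"TP": 40, "FP": 10, "FN": 20, "TN": 30},
    "more_than_5h_sleep": {"TP": 25, "FP": 15, "FN": 15, "TN": 45},
}

def treatment_ratio(counts):
    """Ratio of false negatives to false positives: FN / FP."""
    if counts["FP"] == 0:
        # Assumption for this sketch: report an undefined ratio as infinity.
        return float("inf")
    return counts["FN"] / counts["FP"]

ratios = {name: treatment_ratio(c) for name, c in groups.items()}
for name, ratio in ratios.items():
    print(f"{name}: FN/FP = {ratio:.2f}")
```

Here the ratios are 2.0 and 1.0, so the two groups are not treated equally in this sense, even though their overall accuracy is identical.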
6. Total Fairness
Total fairness is achieved when all of the above kinds of fairness are achieved together: overall accuracy equality, statistical parity, conditional procedure accuracy equality, conditional use accuracy equality and treatment equality.
Recent research shows that some of the above kinds of fairness are at odds with each other, thus making it impossible to achieve total fairness in general.
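One way to see the conflict is through an identity that follows directly from the confusion-matrix definitions and is discussed by Chouldechova: FPR = p/(1 - p) * (1 - PPV)/PPV * (1 - FNR), where p is the group's base rate of actual positives, (TP + FN)/N. The sketch below first checks the identity against hypothetical counts, then shows what it implies for two groups with different base rates. All numbers are assumptions chosen for illustration.

```python
import math

def implied_fpr(base_rate, ppv, fnr):
    """FPR implied by FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Check the identity against hypothetical counts for one group.
counts = {"TP": 40, "FP": 10, "FN": 20, "TN": 30}
n = sum(counts.values())
base_rate = (counts["TP"] + counts["FN"]) / n
ppv = counts["TP"] / (counts["TP"] + counts["FP"])
fnr = counts["FN"] / (counts["TP"] + counts["FN"])
fpr_direct = counts["FP"] / (counts["FP"] + counts["TN"])
identity_holds = math.isclose(implied_fpr(base_rate, ppv, fnr), fpr_direct)

# Two hypothetical groups with the same PPV and FNR but different base rates.
fpr_a = implied_fpr(base_rate=0.6, ppv=0.8, fnr=0.2)
fpr_b = implied_fpr(base_rate=0.4, ppv=0.8, fnr=0.2)
print(identity_holds, round(fpr_a, 3), round(fpr_b, 3))
```

With equal PPV and equal false negative rates, the implied false positive rates come out at roughly 0.30 and 0.13. So when base rates differ, conditional use accuracy equality (on the positive side) and conditional procedure accuracy equality cannot both hold, short of a perfect predictor.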
Again, it is important to remember that these definitions of fairness are comparisons between different groups (or classes) of the sample; they also carry over to cases where there are more than two outcome categories.