TY - JOUR
T1 - Comparison of methods for characterizing skin pigment diversity in research cohorts
AU - Lipnick, Michael S.
AU - Chen, Danni
AU - Law, Tyler
AU - Moore, Kelvin
AU - Lester, Jenna C.
AU - Monk, Ellis P.
AU - Hendrickson, Carolyn M.
AU - Chou, Yu
AU - Hughes, Caroline
AU - Behnke, Ella
AU - Elmankabadi, Seif
AU - Ortiz, Lily
AU - Negussie, Fekir
AU - Leeb, Gregory
AU - Ehie, Odinakachukwu
AU - Auchus, Isabella
AU - Igaga, Elizabeth N.
AU - Bisegerwa, Ronald
AU - Okunlola, Olubunmi
AU - Bickler, Philip
AU - Feiner, John
AU - Shmuylovich, Leonid
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2026/1/1
Y1 - 2026/1/1
N2 - Background: Some pulse oximeters perform worse in people with darker skin, and this may be due to inadequate diversity of skin pigment in device development study cohorts. Guidance is needed to accurately and equitably characterize skin pigment to ensure diversity in research cohorts. We tested multiple methods for characterizing skin pigment to assess comparability and impact on cohort diversity. Objectives: The objectives of this study were to assess reliability and comparability of common skin pigment measurement methods, compare findings from different anatomical sites and demonstrate that pigment cannot be assumed from US National Institutes of Health (NIH) race categories. Methods: We used three subjective methods [perceived Fitzpatrick (pFP) scale, Monk Skin Tone (MST) scale and Von Luschan (VL) scale] and two objective methods [Konica Minolta CM-700d spectrophotometer and Delfin Skin Color Catch (DSCC) colorimeter] for individual typology angle (ITA) across multiple measurement sites in adults. We calculated ΔE to estimate operator perceptibility thresholds for subjective methods and to determine reproducibility for objective methods. We used each method to categorize participants as ‘light, medium or dark’ and compared the impact of method selection on cohort diversity. Results: We studied 789 participants, with 33 856 assessments. The MST had the widest luminosity range, and the VL scale had the least discernible adjacent categories. With ‘dark’ defined as ITA<−30°, 14% of participants were categorized ‘dark’ as compared with 26% by pFP or 16% by MST. Approximately half of the ‘dark’ cohort had an ITA<−50°. With an ITA threshold<−50°, only 7% of the cohort was categorized as ‘dark’. When ‘Black or African American’ self-identification was used to define ‘dark’, 23% of the cohort was categorized as such. Each self-assigned NIH race category included a wide range of ITA and subjective scale categories. Both ITA and L* from the KM-700d and DSCC demonstrated strong correlation (ρ>0.7). Conclusions: Common methods for skin pigment characterization, especially the use of race or subjective scales, have significant limitations. When applied to the same cohort, different methods yield significantly different results and some may overestimate diversity. Previously published ITA thresholds for defining ‘dark’ skin are too light and lead to under-representation of people with darker skin.
UR - https://www.scopus.com/pages/publications/105026689505
U2 - 10.1093/bjd/ljaf397
DO - 10.1093/bjd/ljaf397
M3 - Article
C2 - 41073884
AN - SCOPUS:105026689505
SN - 0007-0963
VL - 194
SP - 135
EP - 145
JO - British Journal of Dermatology
JF - British Journal of Dermatology
IS - 1
ER -