Background & Aims: Histologically assessed hepatocyte ballooning is a key feature discriminating non-alcoholic steatohepatitis (NASH) from steatosis (NAFL). Reliable identification of ballooning underpins patient inclusion in clinical trials and serves as a key regulatory-approved surrogate endpoint for drug efficacy. High inter- and intra-observer variation in ballooning measured using the NASH CRN semi-quantitative score has been reported, yet no actionable solutions have been proposed.
Methods: A focused evaluation of hepatocyte ballooning recognition was conducted. Digitized slides were evaluated by 9 internationally recognized expert liver pathologists on 2 separate occasions: each pathologist independently marked every ballooned hepatocyte and later provided an overall non-NASH NAFL/NASH assessment. Interobserver variation was assessed and a ‘concordance atlas’ of ballooned hepatocytes was generated to train second harmonic generation/two-photon excitation fluorescence imaging-based artificial intelligence (AI).
Results: The Fleiss kappa statistic for overall interobserver agreement on the presence/absence of ballooning was 0.197 (95% CI 0.094–0.300), rising to 0.362 (0.258–0.465) with a ≥5-cell threshold. However, the intraclass correlation coefficient for consistency was higher (0.718 [0.511–0.900]), indicating ‘moderate’ agreement on ballooning burden. Using a ≥5/9 majority, 133 ballooned cells were identified and used to train AI ballooning detection (AI-pathologist pairwise concordance 19–42%, comparable to inter-pathologist pairwise concordance of 8–75%). AI quantified change in ballooned cell burden in response to therapy in a separate slide set.
Conclusions: The substantial divergence in hepatocyte ballooning identified amongst expert hepatopathologists suggests that ballooning is a spectrum, too subjective for its presence or complete absence to be unequivocally determined as a trial endpoint. A concordance atlas may be used to train AI assistive technologies to reproducibly quantify ballooned hepatocytes and thereby standardize assessment of therapeutic efficacy. This atlas serves as a reference standard for ongoing work to refine how ballooning is classified by both pathologists and AI.
Lay summary: For the first time, we show that, even amongst expert hepatopathologists, there is poor agreement regarding the number of ballooned hepatocytes seen on the same digitized histology images. This has important implications, as the presence of ballooning is needed to establish the diagnosis of non-alcoholic steatohepatitis (NASH), and its unequivocal absence is one of the key requirements to show ‘NASH resolution’ in support of drug efficacy in clinical trials. Artificial intelligence-based approaches may provide a more reliable way to assess the range of injury recorded as “hepatocyte ballooning”.
- Artificial intelligence
- Machine learning
- Nonalcoholic fatty liver disease
- Nonalcoholic steatohepatitis
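
For readers unfamiliar with the Fleiss kappa statistic reported in the Results, the following is a minimal sketch of the standard formula for agreement among a fixed number of raters. It is not the authors' analysis code. The function name `fleiss_kappa` and the input layout (one row per slide, one column per category, each cell holding the number of raters who chose that category) are assumptions made for illustration, and the ratings in the usage note are hypothetical rather than study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rater counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters.
    (Hypothetical helper for illustration; layout is an assumption.)
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("at least one subject is required")

    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")

    n_categories = len(counts[0])

    # Overall proportion of all ratings that fell in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Per-subject agreement: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    observed = sum(p_i) / n_subjects      # mean observed agreement
    expected = sum(p * p for p in p_j)    # agreement expected by chance

    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (observed - expected) / (1.0 - expected)
```

As a usage illustration with hypothetical data, two slides rated by 9 raters as `[[9, 0], [0, 9]]` (ballooning absent, ballooning present) give a kappa of 1.0, whereas near-even splits such as `[[5, 4], [4, 5]]` give a slightly negative value, meaning agreement no better than chance.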