Implementation of the 2014 National Institutes of Health (NIH) response algorithm for joint/fascia graft-versus-host disease (GVHD) has identified real-world limits to its application. To refine the 2014 NIH response algorithm, we analyzed multicenter prospective observational data from the Chronic GVHD Consortium. The training cohort included 209 patients and the replication cohort included 191 patients with joint/fascia involvement during their course of chronic GVHD. Linear mixed models with a random patient effect were used to evaluate correlations between response categories and clinician- or patient-perceived changes in joint status, which served as the anchor of response. Analysis of the training cohort showed that a 2-point change in total photographic range of motion (P-ROM) score was clinically meaningful. The results also suggested that a change from 0 to 1 on the NIH joint/fascia score should not be considered worsening, and that both the NIH joint/fascia score and the total P-ROM score, but not individual P-ROM scores, should be used for response assessment. On the basis of these results, we developed an evidence-based refined algorithm, the utility of which was examined in an independent replication cohort. With the refined algorithm, approximately 40% of responses were reclassified, largely mitigating the effect of divergent responses among individual joints and of changes from 0 to 1 on the NIH joint/fascia score. Compared with the 2014 NIH algorithm, the refined algorithm yielded robust point estimates and tighter 95% confidence intervals for the association with clinician- or patient-perceived changes. The refined algorithm provides a superior, evidence-based method for measuring therapeutic response in joint/fascia chronic GVHD.
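
The two rules stated above (a 2-point threshold for change in total P-ROM score, and no worsening for a change from 0 to 1 on the NIH joint/fascia score) can be illustrated with a minimal Python sketch. This is not the published refined algorithm: the abstract does not say how the two measures are combined into a final response category, so the sketch evaluates each one separately. The names `JointAssessment`, `prom_change`, and `nih_change` are hypothetical, and the score directions (higher total P-ROM meaning better range of motion, higher NIH score meaning more severe involvement) are assumptions.

```python
from dataclasses import dataclass

# Threshold stated in the abstract: a 2-point change in total P-ROM
# score was found to be clinically meaningful.
PROM_MEANINGFUL_CHANGE = 2


@dataclass
class JointAssessment:
    """One visit's scores (hypothetical container, not from the source)."""
    nih_score: int   # NIH joint/fascia score; assumed higher = more severe
    total_prom: int  # total P-ROM score; assumed higher = better motion


def prom_change(baseline: JointAssessment, followup: JointAssessment) -> str:
    """Classify the change in total P-ROM score using the 2-point threshold."""
    delta = followup.total_prom - baseline.total_prom
    if delta >= PROM_MEANINGFUL_CHANGE:
        return "improved"
    if delta <= -PROM_MEANINGFUL_CHANGE:
        return "worsened"
    return "stable"


def nih_change(baseline: JointAssessment, followup: JointAssessment) -> str:
    """Classify the change in NIH joint/fascia score.

    A change from 0 to 1 is not counted as worsening, as the abstract
    suggests; labeling it "stable" is this sketch's own assumption.
    """
    if followup.nih_score < baseline.nih_score:
        return "improved"
    if followup.nih_score > baseline.nih_score:
        if baseline.nih_score == 0 and followup.nih_score == 1:
            return "stable"
        return "worsened"
    return "stable"


if __name__ == "__main__":
    before = JointAssessment(nih_score=0, total_prom=22)
    after = JointAssessment(nih_score=1, total_prom=21)
    # Each measure is reported separately; combining them is left open.
    print(prom_change(before, after), nih_change(before, after))
```

Keeping the two classifications separate avoids guessing at the combination rule, which would need to come from the full algorithm specification.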