The conundrum of kappa and why some musculoskeletal tests appear unreliable despite high agreement: A comparison of Cohen kappa and Gwet AC to assess observer agreement when using nominal and ordinal data

  • Michael T. Cibulka
  • Michael J. Strube

Research output: Contribution to journal › Article › peer-review


Abstract

In clinical practice, physical therapists often use different kinds of tests and measures in the assessment of their patients. For therapists to have confidence in their tests and measures, an important attribute is intratester and intertester reliability. Studies that assess reliability are cases of observer agreement, and many such studies have been performed in the physical therapy literature. The most commonly used method for assessing observer agreement with nominal or ordinal data is the statistical approach suggested by Cohen and the corresponding reliability coefficient, Cohen kappa. Recently, Cohen kappa has come under scrutiny because of the so-called kappa paradox, which occurs when observer agreement is high but the resulting kappa value is low. A second paradox occurs when asymmetries exist between raters in their disagreements, resulting in a higher kappa value. The physical therapy literature contains numerous examples of these problems, which can often lead to misinterpretation of the data. This Perspective examines how and why these problems occur and suggests an alternative method for assessing observer agreement.
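As a minimal sketch of the kappa paradox the abstract describes (not taken from the article itself), consider a hypothetical 2x2 contingency table in which two raters classify 100 patients as test-positive or test-negative and agree on 90 of them, but nearly all cases fall in one category. Cohen kappa can then be near zero or even negative despite 90% observed agreement, whereas Gwet's AC1 remains high because it models chance agreement from the mean marginal proportions rather than the product of each rater's marginals. The table values below are illustrative, and the functions are a plain-NumPy sketch of the standard published formulas, not code from the article.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa from a square K x K contingency table of rater counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_o = np.trace(table) / n                    # observed agreement
    row = table.sum(axis=1) / n                  # rater A marginal proportions
    col = table.sum(axis=0) / n                  # rater B marginal proportions
    p_e = (row * col).sum()                      # chance agreement (product of marginals)
    return (p_o - p_e) / (1 - p_e)

def gwet_ac1(table):
    """Gwet's AC1 from a square K x K contingency table of rater counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    k = table.shape[0]
    p_o = np.trace(table) / n                    # observed agreement
    pi = (table.sum(axis=1) + table.sum(axis=0)) / (2 * n)  # mean marginal per category
    p_e = (pi * (1 - pi)).sum() / (k - 1)        # AC1 chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical paradox case: 100 patients, 90% agreement, skewed marginals
table = [[90, 5],
         [5,  0]]
print(f"Observed agreement: {np.trace(np.asarray(table)) / np.sum(table):.2f}")  # 0.90
print(f"Cohen kappa: {cohen_kappa(table): .3f}")  # about -0.053 despite 90% agreement
print(f"Gwet AC1:    {gwet_ac1(table): .3f}")     # about  0.889
```

Running this shows kappa of roughly -0.053 against an AC1 of roughly 0.889 for identical data, which is the disparity the article examines.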

Original language: English
Article number: pzab150
Journal: Physical Therapy
Volume: 101
Issue number: 9
DOIs
State: Published - Sep 1 2021

Keywords

  • Kappa
  • Observer Agreement
  • Reliability

