Research on metacognitive judgment accuracy during retrieval practice has increased in recent years. However, prior work has not systematically evaluated item-level judgment accuracy and its underlying bases in a criterion-learning paradigm (in which items are practiced until they are correctly recalled during encoding). Understanding these relationships during criterion learning has important theoretical implications for frameworks of self-regulated learning, as well as applied implications for student learning: If the factors that influence metacognitive judgments are not predictive of subsequent test performance, students may make poor decisions during self-regulated learning. In the present experiments, participants engaged in test–restudy practice until items were recalled correctly. Once a given item reached criterion, participants made either an immediate or a delayed judgment of learning (JOL) for that item. A final cued-recall test occurred 30 min later. We examined judgment accuracy (the relationship between JOLs and test performance) and its underlying bases by evaluating cue utilization (the relationship between cues and JOLs) and cue diagnosticity (the relationship between cues and test performance). Immediate JOLs were only modestly related to subsequent test performance, and further analyses revealed that the cues related to JOLs were only weakly predictive of test accuracy. However, delaying JOLs improved both the accuracy of the JOLs and the diagnosticity of the cues that influenced judgments.
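The abstract does not state which statistic indexes the item-level relationship between JOLs and test performance. As an illustration only, the sketch below assumes the Goodman–Kruskal gamma correlation, a common index of relative accuracy in the metacognition literature; it is not presented as the analysis used in these experiments. The function name `goodman_kruskal_gamma` and the example data are hypothetical. Under the same assumption, the function could also be applied to (cue, JOL) pairs for cue utilization or to (cue, recall) pairs for cue diagnosticity, provided the cue is coded on an ordinal scale.

```python
from itertools import combinations
from typing import Optional, Sequence


def goodman_kruskal_gamma(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Hypothetical helper: Goodman-Kruskal gamma between two paired sequences.

    Counts concordant and discordant item pairs; pairs tied on either
    variable are ignored. Returns None when every pair is tied, because
    gamma is undefined in that case.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Same-sign differences -> concordant; opposite signs -> discordant; zero -> tie.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data for one participant: JOLs (0-100) and final cued recall (1 = correct).
jols = [20, 40, 60, 80]
recall = [0, 0, 1, 1]
print(goodman_kruskal_gamma(jols, recall))  # 1.0
```

In a design like this one, such a coefficient would typically be computed separately for each participant (for example, within the immediate and delayed JOL conditions) and then summarized across participants; that aggregation step is likewise an assumption here.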