OBJECTIVES: Respiratory binning of free-breathing magnetic resonance imaging data reduces motion blurring; however, it exacerbates noise and introduces severe artifacts due to undersampling. Deep neural networks can remove artifacts and noise but usually require high-quality ground truth images for training. This study aimed to develop a network that can be trained without this requirement. MATERIALS AND METHODS: This retrospective study was conducted on 33 participants enrolled between November 2016 and June 2019. Free-breathing magnetic resonance imaging was performed using a radial acquisition. Self-navigation was used to bin the k-space data into 10 respiratory phases. To simulate short acquisitions, subsets of radial spokes were used in reconstructing images with multicoil nonuniform fast Fourier transform (MCNUFFT), compressed sensing (CS), and 2 deep learning methods: UNet3DPhase and Phase2Phase (P2P). UNet3DPhase was trained using a high-quality ground truth, whereas P2P was trained using noisy images with streaking artifacts. Two radiologists blinded to the reconstruction methods independently reviewed the sharpness, contrast, and artifact-freeness of the end-expiration images reconstructed from data collected at 16% of the Nyquist sampling rate. The generalized estimating equation method was used for statistical comparison. Motion vector fields were derived to examine the respiratory motion range of 4-dimensional images reconstructed using different methods. RESULTS: A total of 15 healthy participants and 18 patients with hepatic malignancy (50 ± 15 years, 6 women) were enrolled. Both reviewers found that the UNet3DPhase and P2P images had higher contrast (P < 0.01) and fewer artifacts (P < 0.01) than the CS images. The UNet3DPhase and P2P images were reported to be sharper than the CS images by 1 reviewer (P < 0.01) but not by the other reviewer (P = 0.22, P = 0.18). 
UNet3DPhase and P2P were similar in sharpness and contrast, whereas UNet3DPhase had fewer artifacts (P < 0.01). The motion vector lengths for the MCNUFFT800 and P2P800 images were comparable (10.5 ± 4.2 mm and 9.9 ± 4.0 mm, respectively), and both were significantly larger than those for the CS2000 (7.0 ± 3.9 mm; P < 0.0001) and UNet3DPhase800 (6.9 ± 3.2 mm; P < 0.0001) images. CONCLUSIONS: Without a ground truth, P2P can reconstruct sharp, artifact-free, and high-contrast respiratory motion-resolved images from highly undersampled data. Unlike the CS and UNet3DPhase methods, P2P did not artificially reduce the respiratory motion range.
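
ILLUSTRATIVE SKETCH: The respiratory binning step described in the methods can be illustrated with a minimal example. The abstract does not specify the binning rule, so the sketch below assumes amplitude-based binning with equal spoke counts per phase. The function name, the synthetic navigator signal, the spoke count of 2000, and the per-spoke timing are all hypothetical choices made for illustration and are not taken from the study.

```python
import numpy as np

def bin_spokes_by_respiratory_amplitude(nav_signal, n_phases=10):
    """Assign each radial spoke to a respiratory phase bin.

    Assumption: amplitude-based binning with near-equal spoke counts,
    which is one common choice and not necessarily the study's method.
    """
    nav_signal = np.asarray(nav_signal, dtype=float)
    # Indices of spokes sorted from lowest to highest navigator amplitude.
    order = np.argsort(nav_signal, kind="stable")
    phase_of_spoke = np.empty(nav_signal.size, dtype=int)
    # array_split gives n_phases chunks whose sizes differ by at most one.
    for phase, spoke_idx in enumerate(np.array_split(order, n_phases)):
        phase_of_spoke[spoke_idx] = phase
    return phase_of_spoke

# Synthetic self-navigation signal (hypothetical): 2000 spokes, 5 ms apart,
# with a 0.25 Hz breathing cycle plus a small amount of noise.
rng = np.random.default_rng(0)
t = np.arange(2000) * 0.005
nav = np.sin(2 * np.pi * 0.25 * t) + 0.05 * rng.standard_normal(t.size)

phases = bin_spokes_by_respiratory_amplitude(nav, n_phases=10)
counts = np.bincount(phases, minlength=10)
print(counts)  # number of spokes assigned to each of the 10 phases
```

Each phase then holds only one tenth of the acquired spokes, which is why binning reduces motion blurring at the cost of heavier undersampling and noise in every individual phase image.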