We propose an image feature extraction method based on deep disentangled representation learning of PET images for predicting lymphoma treatment response. The method encodes PET images into spatial representations and modality representations by jointly performing supervised tumor segmentation and image reconstruction. In this way, both whole-image features (global features) and tumor-region features (local features) can be extracted without the labor-intensive tumor segmentation and feature calculation procedures. The learned global and local image features are then concatenated with several prognostic factors that physicians evaluate from clinical information, and the combined vector is used as input to an SVM classifier that predicts the treatment outcome of lymphoma patients. Data from 186 lymphoma patients were used to train and test the proposed model. The proposed method was compared with a traditional, straightforward feature extraction method. It achieved better prediction results, which indicates its effectiveness in extracting prognosis-related features from PET images.
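The final classification step described above (joining global features, local features, and clinical prognostic factors, then feeding them to an SVM) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`fuse_features`, `standardize`, `train_linear_svm`, `predict`) are hypothetical, the feature values are made-up placeholders rather than study data, and a plain linear SVM trained by subgradient descent on the hinge loss stands in for whichever SVM variant and kernel the authors used, which the text does not specify.

```python
# Sketch only: toy feature fusion + linear SVM, standard library only.
# All names and numbers below are illustrative assumptions, not study data.

def fuse_features(global_feats, local_feats, clinical):
    """Concatenate the three per-patient feature vectors into one vector."""
    return [g + l + c for g, l, c in zip(global_feats, local_feats, clinical)]

def standardize(rows):
    """Z-score every column; constant columns are left unscaled."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(var ** 0.5 or 1.0)  # fall back to 1.0 when std is 0
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 / -1. Returns the weight vector and the bias.
    """
    n, dims = len(X), len(X[0])
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(dims):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Return +1 or -1 for each row of X."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
            for xi in X]

# Placeholder inputs for six hypothetical patients (+1 / -1 are the two
# outcome classes). In the described pipeline, the global and local vectors
# would come from the trained encoder instead.
global_feats = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
local_feats = [[1.2], [1.0], [1.1], [0.2], [0.1], [0.3]]
clinical = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0],
            [0.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
labels = [1, 1, 1, -1, -1, -1]

fused = standardize(fuse_features(global_feats, local_feats, clinical))
weights, bias = train_linear_svm(fused, labels)
predictions = predict(weights, bias, fused)
print(predictions)
```

In practice one would more likely rely on an established SVM implementation and evaluate on held-out patients; the hand-written trainer here only keeps the sketch free of external dependencies.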