Medical ontologies are widely used to represent and organize medical terminologies. Examples include ICD-9, ICD-10, UMLS etc. The ontologies are often constructed in hierarchical structures, encoding the multi-level subclass relationships among different medical concepts, allowing very fine distinctions between concepts. Medical ontologies provide a great source for incorporating domain knowledge into a healthcare prediction system, which might alleviate the data insufficiency problem and improve predictive performance with rare categories. To incorporate such domain knowledge, Gram, a recent graph attention model, represents a medical concept as a weighted sum of its ancestors' embeddings in the ontology using an attention mechanism. Although showing improved performance, Gram only considers the unordered ancestors of a concept, which does not fully leverage the hierarchy thus having limited expressibility. In this paper, we propose Hierarchical Attention Propagation (HAP), a novel medical ontology embedding model that hierarchically propagate attention across the entire ontology structure, where a medical concept adaptively learns its embedding from all other concepts in the hierarchy instead of only its ancestors. We prove that HAP learns more expressive medical concept embeddings - from any medical concept embedding we are able to fully recover the entire ontology structure. Experimental results on two sequential procedure/diagnosis prediction tasks demonstrate HAP's better embedding quality than Gram and other baselines. Furthermore, we find that it is not always best to use the full ontology. Sometimes using only lower levels of the hierarchy outperforms using all levels.