TY - JOUR
T1 - PhraseMap
T2 - Attention-Based Keyphrases Recommendation for Information Seeking
AU - Tu, Yamei
AU - Qiu, Rui
AU - Wang, Yu Shuen
AU - Yen, Po Yin
AU - Shen, Han Wei
N1 - Publisher Copyright:
IEEE
PY - 2022
Y1 - 2022
N2 - Many Information Retrieval (IR) approaches have been proposed to extract relevant information from a large corpus. Among these methods, phrase-based retrieval methods have been proven to capture more concrete and concise information than word-based and paragraph-based methods. However, due to the complex relationship among phrases and a lack of proper visual guidance, achieving user-driven interactive information-seeking and retrieval remains challenging. In this study, we present a visual analytic approach for users to seek information from an extensive collection of documents efficiently. The main component of our approach is a PhraseMap, where nodes and edges represent the extracted keyphrases and their relationships, respectively, from a large corpus. To build the PhraseMap, we extract keyphrases from each document and link the phrases according to word attention determined using modern language models, i.e., BERT. As can be imagined, the graph is complex due to the extensive volume of information and the massive amount of relationships. Therefore, we develop a navigation algorithm to facilitate information seeking. It includes (1) a question-answering (QA) model to identify phrases related to users' queries and (2) updating relevant phrases based on users' feedback. To better present the PhraseMap, we introduce a resource-controlled self-organizing map (RC-SOM) to evenly and regularly display phrases on grid cells while expecting phrases with similar semantics to stay close in the visualization. To evaluate our approach, we conducted case studies with three domain experts in diverse literature. The results and feedback demonstrate its effectiveness, usability, and intelligence.
AB - Many Information Retrieval (IR) approaches have been proposed to extract relevant information from a large corpus. Among these methods, phrase-based retrieval methods have been proven to capture more concrete and concise information than word-based and paragraph-based methods. However, due to the complex relationship among phrases and a lack of proper visual guidance, achieving user-driven interactive information-seeking and retrieval remains challenging. In this study, we present a visual analytic approach for users to seek information from an extensive collection of documents efficiently. The main component of our approach is a PhraseMap, where nodes and edges represent the extracted keyphrases and their relationships, respectively, from a large corpus. To build the PhraseMap, we extract keyphrases from each document and link the phrases according to word attention determined using modern language models, i.e., BERT. As can be imagined, the graph is complex due to the extensive volume of information and the massive amount of relationships. Therefore, we develop a navigation algorithm to facilitate information seeking. It includes (1) a question-answering (QA) model to identify phrases related to users' queries and (2) updating relevant phrases based on users' feedback. To better present the PhraseMap, we introduce a resource-controlled self-organizing map (RC-SOM) to evenly and regularly display phrases on grid cells while expecting phrases with similar semantics to stay close in the visualization. To evaluate our approach, we conducted case studies with three domain experts in diverse literature. The results and feedback demonstrate its effectiveness, usability, and intelligence.
KW - Machine learning
KW - natural language processing
KW - textual data
KW - user-in-the-loop
KW - visual analytics
UR - http://www.scopus.com/inward/record.url?scp=85144010215&partnerID=8YFLogxK
U2 - 10.1109/TVCG.2022.3225114
DO - 10.1109/TVCG.2022.3225114
M3 - Article
C2 - 36441879
AN - SCOPUS:85144010215
SN - 1077-2626
SP - 1
EP - 15
JO - IEEE Transactions on Visualization and Computer Graphics
JF - IEEE Transactions on Visualization and Computer Graphics
ER -