WDS Projects

ACESO Biomedicine NLP Deep Learning

Evidence-based medicine (EBM) draws on all available evidence, such as reports from randomized controlled trials (RCTs), to inform clinical practice. In practice, this evidence is obtained by manually searching and analyzing a large medical literature database such as PubMed. Medical literature usually appears as unstructured text in scientific publications, so clinicians must read the publications line by line. At the same time, the deluge of medical data forces clinicians to read a large volume of literature to keep up with the latest research results, which is a considerable burden. EBM would become far more effective if existing evidence could be extracted automatically from unstructured medical literature.

ACESO is a system that automatically generates a summary of the evidence in medical literature under the PICO (Population, Intervention, Comparison, Outcome) framework. We adopt an active learning paradigm, which helps minimize the cost of producing labeled data from systematic reviews and optimize the quality of summarization with limited labeled data. In ACESO, deep learning is used to learn a distributed representation of each concept in UMLS (Unified Medical Language System), and the embeddings of these medical concepts serve as a knowledge base for summarization.

Functions of ACESO: 1. Collaborative, multi-user labeling of the training set. 2. Automatic selection of useful samples using the active learning paradigm. 3. Generation of a literature summary and visual presentation of the summarization.
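
The sample selection step can be illustrated with a minimal sketch. The text does not say which active learning strategy ACESO uses, so this example assumes entropy-based uncertainty sampling, one common choice. The function names, the sample ids, and the probability values are all hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_samples(predictions, k):
    """Return the ids of the k unlabeled samples with the highest entropy.

    predictions: dict mapping sample id -> list of class probabilities.
    Assumption: high entropy marks the samples the model is least sure about.
    """
    ranked = sorted(predictions,
                    key=lambda sid: entropy(predictions[sid]),
                    reverse=True)
    return ranked[:k]

# Hypothetical model outputs over three classes for four unlabeled sentences.
predictions = {
    "s1": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "s2": [0.34, 0.33, 0.33],  # almost uniform, highest entropy
    "s3": [0.70, 0.20, 0.10],
    "s4": [0.50, 0.49, 0.01],
}

# The two most uncertain samples are the ones sent to annotators first.
to_label = pick_samples(predictions, k=2)
print(to_label)
```

Under this assumption, annotators would see the sentences the current model finds hardest to classify, which is how active learning reduces the amount of labeled data needed.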

Current developers: Ping Geng, Teng Zhang, Peng Gao, Qian Lu

EnProfiler Knowledge Graph Deep Learning

Knowledge Graphs (KGs) are graph-structured knowledge bases that store factual information about real-world entities. Comprehending the uniqueness of each entity is crucial to analyzing, sharing, and reusing KGs. Traditional profiling technologies encompass a vast array of methods for finding distinctive features in various applications, such as characterizing commercial users, comparing gene expression, or summarizing datasets. Profiling can also help humans differentiate entities when trying to understand a KG. In this work, we present a novel profiling approach to identify distinctive entity features in a KG. Features are selected, and their distinctiveness is measured, by the HAS model, a scalable representation learning model that produces multi-pattern entity embeddings from graph structures. Using this model, we generate the entity profiling results.

This platform, "EnProfiler", shows entity profiling for different datasets, such as DBpedia and DrugBank.

"EnProfiler" focuses on the following areas: 1. Entity Profiling: search for entities and generate their profiles; 2. Label Sets: generate label sets for the different types in a dataset; 3. Evaluation: provide labels to judges and evaluate the quality of the labels.

Current Members: Qingqing Yang, Jinru Ding, Yudong Yang

N-ary Relation Miner Knowledge Graph NLP

Knowledge graphs are typically represented as a set of binary relations between two entities, or between one entity and a value. However, many facts about the world involve more than two entities. A convenient way to represent such facts is to use special relations that link multiple entities. These relations are called n-ary relations. Awareness and understanding of n-ary relations helps in analyzing and using human knowledge at a higher order. Much research has been carried out on finding n-ary language patterns in unstructured text. However, we observed that n-ary relations can also be identified in graph-structured data, such as knowledge graphs. In this project, we analyze the structure of n-ary relations and present a framework to discover them, based on a multi-label frequent tree mining algorithm on knowledge graphs. Finally, we evaluate the framework on real-world knowledge graphs and discuss n-ary relations versus correlated binary relations.
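
To make the idea of an n-ary relation concrete, here is a minimal sketch of how one fact involving several entities can be stored as binary triples through an intermediate node. It only illustrates the representation and is not the project's mining algorithm. The identifiers (`award_1`, `receivedAward`, `expand_nary`) are hypothetical.

```python
from collections import defaultdict

# Illustrative triples: the award event is reified as the intermediate
# node "award_1", which links the person, the prize, and the year.
triples = [
    ("Marie_Curie", "receivedAward", "award_1"),
    ("award_1", "prize", "Nobel_Prize_in_Physics"),
    ("award_1", "year", "1903"),
    ("Marie_Curie", "bornIn", "Warsaw"),  # an ordinary binary relation
]

def group_by_subject(triples):
    """Index triples as subject -> {predicate: object}."""
    index = defaultdict(dict)
    for subject, predicate, obj in triples:
        index[subject][predicate] = obj
    return index

def expand_nary(triples, subject, relation):
    """Follow `relation` from `subject` to its intermediate node and
    collect that node's outgoing edges as the arguments of one fact."""
    index = group_by_subject(triples)
    node = index[subject][relation]
    return {"subject": subject, **index[node]}

fact = expand_nary(triples, "Marie_Curie", "receivedAward")
print(fact)
```

The three triples around `award_1` only make sense together, which is what distinguishes an n-ary relation from a set of unrelated binary relations on the same entity.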

Current Members: Binchu Liu, ChenHui Lv, Miao Yang

Lawterm NLP Deep Learning

This study proposes a method to identify hypernym-hyponym relations in the traffic law domain using tag embeddings. First, we use a neural network model to learn tag embeddings, which depend not only on the hypernym and hyponym tags but also on context information and occurrence frequency. We then use these embeddings as features to identify hypernym-hyponym relations with an SVM.
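
As a rough illustration of the second step, the sketch below builds a feature vector for one candidate (hypernym, hyponym) pair from two tag embeddings. The text does not specify how the embeddings are combined, so concatenating both vectors with their element-wise difference is an assumption, and the tags and toy values are hypothetical. The resulting vector is the kind of input an SVM classifier would receive.

```python
def pair_features(hyper_vec, hypo_vec):
    """Build one feature vector for a candidate (hypernym, hyponym) pair.

    Assumption: features are the two embeddings followed by their
    element-wise difference (hypernym minus hyponym).
    """
    if len(hyper_vec) != len(hypo_vec):
        raise ValueError("embeddings must have the same dimension")
    diff = [a - b for a, b in zip(hyper_vec, hypo_vec)]
    return list(hyper_vec) + list(hypo_vec) + diff

# Toy 2-dimensional tag embeddings; real ones would come from the
# neural network model described above.
embeddings = {
    "vehicle": [0.5, 0.25],
    "truck": [0.25, 0.75],
}

features = pair_features(embeddings["vehicle"], embeddings["truck"])
print(features)
```

Including the difference makes the feature vector order-sensitive, which matters because the hypernym-hyponym relation is asymmetric.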

Current Members: Ziyue Wang, Peng Gao

Gene Language NLP Bioinformatics

link: deepgene
Current Members: Mingjie Yang, Yi Zhuang