Knowledge and user-generated content are proliferating on the web in scientific publications, information portals and online social media. This knowledge explosion has continued to outpace innovation in information access technologies. In this paper, we describe methods and technologies for “Conversational Search” as an innovative solution to facilitate easier information access and reduce information overload for users.
Conversational Search is an interactive and collaborative information-finding activity. The participants engage in social conversations aided by an intelligent information agent (Cobot) that provides contextually relevant search recommendations. The collaborative and conversational search activity helps users search and discover faster and in a more informed way. It also helps the agent learn about conversations from interactions and social feedback to make better recommendations. Conversational search leverages the social discovery process by integrating web information retrieval with social interactions.
Read the paper:
Collaborative Information Access: A Conversational Search Approach
by Saurav Sahay, Anu Venkatesh, Ashwin Ram
ICCBR-09 Workshop on Reasoning from Experiences on the Web (WebCBR-09), Seattle, July 2009
Effective encoding of information is one of the keys to qualitative problem solving. Our aim is to explore Knowledge Representation techniques that capture meaningful word associations occurring in documents. We have developed iReMedI, a TCBR-based problem solving system, as a prototype to demonstrate our idea. For representation we have used a combination of NLP and graph-based techniques, which we call Shallow Syntactic Triples, Dependency Parses and Semantic Word Chains. To test their effectiveness we have developed retrieval techniques based on PageRank, Shortest Distance and Spreading Activation methods. The various algorithms discussed in the paper and the comparative analysis of their results provide us with useful insight for creating an effective problem solving and reasoning system.
Read the paper:
iReMedI – Intelligent Retrieval from Medical Information
by Saurav Sahay, Bharat Ravisekar, Anu Venkatesh, Sundaresan Venkatasubramanian, Priyanka Prabhu, Ashwin Ram
9th European Conference on Case-Based Reasoning (ECCBR-08), Trier, Germany
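As an illustration of the spreading activation retrieval mentioned in the iReMedI abstract above, the following is a minimal sketch only. The graph format, the `decay` and `steps` parameters, and the toy medical terms are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def spreading_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes through a weighted word graph.

    graph: dict mapping node -> {neighbor: edge weight} (hypothetical format)
    seeds: dict mapping node -> initial activation (e.g. the query terms)
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        # Each active node passes a decayed share of its energy to neighbors.
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                next_frontier[neighbor] += energy * weight * decay
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


# Toy word-association graph (illustrative terms only).
graph = {
    "fever": {"infection": 1.0, "aspirin": 0.5},
    "infection": {"antibiotic": 1.0},
    "aspirin": {},
    "antibiotic": {},
}
scores = spreading_activation(graph, {"fever": 1.0})
# Rank candidate nodes by accumulated activation.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes reached through stronger or shorter association paths accumulate more activation, so the ranking favors concepts closely tied to the query terms.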
To realize the vision of a Semantic Web for Life Sciences, discovering relations between resources is essential. It is very difficult to automatically extract relations from Web pages expressed in natural language formats. On the other hand, because of the explosive growth of information, it is difficult to manually extract the relations. In this paper we present techniques to automatically discover relations between biomedical resources from the Web. For this purpose we retrieve relevant information from Web search engines and the PubMed database using various lexico-syntactic patterns as queries over SOAP web services. The patterns are initially handcrafted but can be progressively learnt. The extracted relations can be used to construct and augment ontologies and knowledge bases. Experiments are presented for general biomedical relation discovery and domain-specific search to show the usefulness of our technique.
Read the paper:
Discovering Semantic Biomedical Relations utilizing the Web
by Saurav Sahay, Sougata Mukherjea, Eugene Agichtein, Ernie Garcia, Sham Navathe, Ashwin Ram
ACM Transactions on Knowledge Discovery from Data, 2(1):3, 2008
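To make the idea of handcrafted lexico-syntactic patterns in the abstract above more concrete, here is a small hedged sketch. The pattern templates, relation labels and example snippets are hypothetical; the abstract does not list the actual patterns, and no web service is called here.

```python
import re

# Hypothetical handcrafted patterns; each maps a relation label to a regex
# with named groups for the two related terms.
PATTERNS = {
    "treats": r"(?P<a>[\w-]+) (?:treats|is used to treat) (?P<b>[\w-]+)",
    "causes": r"(?P<a>[\w-]+) (?:causes|leads to) (?P<b>[\w-]+)",
}


def extract_relations(snippets):
    """Return (term, relation, term) triples found in text snippets."""
    relations = []
    for text in snippets:
        for label, pattern in PATTERNS.items():
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                relations.append(
                    (match.group("a").lower(), label, match.group("b").lower())
                )
    return relations


# Stand-ins for snippets that a search engine query might return.
snippets = [
    "Aspirin is used to treat fever.",
    "Smoking causes cancer in many patients.",
]
relations = extract_relations(snippets)
```

In a fuller system the same templates could also be instantiated as search queries, and new patterns could be learnt from sentences that contain already-known relation pairs.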
We describe our vision for a new-generation medical knowledge annotation and acquisition system called SENTIENT-MD (Semantic Annotation and Inference for Medical Knowledge Discovery). Key aspects of our vision include deep Natural Language Processing techniques to abstract the text into a more semantically meaningful representation guided by a domain ontology. In particular, we introduce a notion of semantic fitness to model an optimal level of abstract representation for a text fragment given a domain ontology. We apply this notion to appropriately condense and merge nodes in semantically annotated syntactic parse trees. These transformed semantically annotated trees are more amenable to analysis and inference for abstract knowledge discovery, such as automatically inferring general medical rules to enhance an expert system for nuclear cardiology. This work is part of a long-term research effort on continuously mining medical literature for automatic clinical decision support.
Read the paper:
Semantic Annotation and Inference for Medical Knowledge Discovery
by Saurav Sahay, Eugene Agichtein, Baoli Li, Ernie Garcia, Ashwin Ram
NSF Symposium on Next Generation of Data Mining (NGDM-07), Baltimore, MD, October 2007
NLM’s Unified Medical Language System (UMLS) is a very large ontology of biomedical and health data. In order to be used effectively for knowledge processing, it needs to be customized to a specific domain. In this paper, we present techniques to automatically discover domain-specific concepts, discover relationships between these concepts, build a context map from these relationships, link these domain concepts with the best-matching concept identifiers in UMLS using our context map and UMLS concept trees, and finally assign categories to the discovered relationships. This domain-specific ontology of terms and relationships, built using evidential information, can serve as a basis for applications in analysis, reasoning and discovery of new relationships. We have automatically built an ontology for the Nuclear Cardiology domain as a testbed for our techniques.
Read the paper:
Domain Ontology Construction from Biomedical Text
by Saurav Sahay, Baoli Li, Ernie Garcia, Eugene Agichtein, Ashwin Ram
International Conference on Artificial Intelligence (ICAI-07), Las Vegas, NV, June 2007
We propose a semi-supervised method to extract rule sentences from medical abstracts. Medical rules are sentences that give interesting and non-trivial relationships between medical entities. Mining such medical rules is important since the extracted rules can be used as inputs to an expert system or in many other ways. The technique we suggest is based on paraphrasing a set of seed sentences and populating a pattern dictionary of paraphrases of rules. We match the patterns against a new abstract and rank the sentences.
Read the paper:
Detecting Medical Rule Sentences with Semi-Automatically Derived Patterns: A Pilot Study
by Shreekanth Karvaje, Bharat Ravisekar, Baoli Li, Ernie Garcia, Ashwin Ram
International Symposium on Bioinformatics Research and Applications (ISBRA-07), Atlanta, GA, May 2007
Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations.
The results showed that BEA-PARTITION and hierarchical clustering algorithm outperformed k-means clustering and self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and self-organizing map. Whereas BEA-PARTITION and the hierarchical clustering produced similar quality of clusters, BEA-PARTITION provides clear cluster boundaries compared to the hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships
by Ying Liu, Sham Navathe, Jorge Civera, Venu Dasigi, Ashwin Ram, Brian Ciliax, Ray Dingledine
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(4):380-384, Oct-Dec 2005
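The abstract above compares clusterings by purity and entropy. As a rough sketch of how those two measures are commonly computed, assuming each gene has one known group label (the gene identifiers and labels below are invented for illustration and are not the paper's test sets):

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    total = sum(len(cluster) for cluster in clusters)
    majority = sum(
        max(Counter(labels[item] for item in cluster).values())
        for cluster in clusters
        if cluster
    )
    return majority / total


def entropy(clusters, labels):
    """Size-weighted average of the class entropy (in bits) of each cluster."""
    total = sum(len(cluster) for cluster in clusters)
    result = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        counts = Counter(labels[item] for item in cluster)
        h = -sum(
            (n / len(cluster)) * math.log2(n / len(cluster))
            for n in counts.values()
        )
        result += (len(cluster) / total) * h
    return result


# Hypothetical genes with known functional groups A and B.
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
clean = [["g1", "g2"], ["g3", "g4"]]   # clusters match the known groups
mixed = [["g1", "g3"], ["g2", "g4"]]   # each cluster mixes both groups
```

Higher purity and lower entropy both indicate clusters that align more closely with the known groups, which is the sense in which the abstract reports its comparison.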
To facilitate the interpretation of large data sets generated by DNA microarray studies, we are 1) developing a text mining system to extract keywords from MEDLINE abstracts associated with individual gene names and 2) investigating several clustering algorithms to determine relationships between genes based on shared keywords. The basic mechanisms of our keyword extraction algorithm were described previously (Soc Neurosci Abstr 2001, 557.4). Recent progress in evaluating the performance of this algorithm through Precision-Recall calculations and in using extracted keywords to accurately cluster predefined groups of genes is reported here.
Evaluating Text-Mining Strategies for Interpreting DNA Microarray Expression Profiles
by Brian Ciliax, Ying Liu, Jorge Civera, Ashwin Ram, Sham Navathe, Ray Dingledine
Annual Meeting of the Society for Neuroscience (Soc Neurosci Abstr), Orlando, FL, September 2002
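For readers unfamiliar with the Precision-Recall calculations mentioned in the abstract above, a minimal sketch follows. The keyword lists are invented for illustration; the abstract does not describe how the reference keywords were chosen.

```python
def precision_recall(extracted, relevant):
    """Compute precision and recall of extracted keywords against a reference set."""
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical keywords extracted for one gene versus a curated reference list.
extracted = ["kinase", "receptor", "cell", "binding"]
relevant = ["kinase", "receptor", "synapse"]
p, r = precision_recall(extracted, relevant)
```

Precision measures how many of the extracted keywords are correct, while recall measures how many of the reference keywords were found.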