Order, please: taming the Web with semantics

As your official Molecular Biology and Genomics Special Interest Group blogger, I will report on the first session sponsored by this group. I attended presentations given by Kristi Holmes and Joan Bartlett on Sunday afternoon.

Connecting the dots: structured databases become interconnected in the semantic Web

So what exactly is the semantic Web? Kristi Holmes introduced the concept with a brilliant Star Wars analogy in her first presentation, “An Introduction to the Semantic Web and Linked Open Data”. The semantic Web is about leveraging the rich data already available on the Web by linking those data so that they become more meaningful and accessible.

Technically speaking, computers need to understand relationships between pieces of data, such as lineage. A subject (Anakin Skywalker) may have a property (is the father of) that points to an object (Luke Skywalker), and the same relationship can be read in the opposite direction (Luke Skywalker is the son of Anakin Skywalker). Languages such as XML and RDF provide the structure in which the computer looks for this information (a.k.a. metadata). Another way to give meaning to data is to use specialized vocabularies, called ontologies, which describe objects and the logical relationships between them. A good example is the Gene Ontology, used in NCBI databases.
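To make the subject, property and object pattern concrete, here is a minimal sketch in Python of a tiny in-memory triple store. It is not real RDF tooling: the property names `isFatherOf` and `isChildOf` and the `INVERSES` table are hypothetical, standing in for what an ontology would declare (OWL, for example, provides `owl:inverseOf` for this purpose). Storing a single fact is enough for the store to answer the question in either direction.

```python
# A minimal, in-memory triple store: each fact is a (subject, property, object) tuple.
# The property names "isFatherOf" and "isChildOf" are hypothetical labels chosen
# for this example; they are not taken from any published vocabulary.

Triple = tuple[str, str, str]

# Declares which properties are inverses of each other (hypothetical vocabulary).
INVERSES = {"isFatherOf": "isChildOf", "isChildOf": "isFatherOf"}


def with_inverses(triples: set[Triple]) -> set[Triple]:
    """Return the triples plus every inverse triple implied by INVERSES."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        inverse = INVERSES.get(prop)
        if inverse is not None:
            # Record the same relationship read in the opposite direction.
            inferred.add((obj, inverse, subject))
    return inferred


def objects_of(triples: set[Triple], subject: str, prop: str) -> list[str]:
    """Answer the query 'subject prop ?x' by listing every matching object."""
    return sorted(o for s, p, o in triples if s == subject and p == prop)


facts = {("Anakin Skywalker", "isFatherOf", "Luke Skywalker")}
graph = with_inverses(facts)
print(objects_of(graph, "Luke Skywalker", "isChildOf"))  # ['Anakin Skywalker']
```

Only one fact was stated, yet the query about Luke succeeds because the inverse was inferred from the vocabulary rather than typed in by hand.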

Organizing this massive amount of data requires classifying it, and a five-star scheme rates datasets according to how open and interlinked they are. The data naturally come in many types and from countless sources, from government reports to repositories to scientific workflows.
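The post does not name the five-star system. Assuming it is Tim Berners-Lee's five-star scheme for Linked Open Data, the stars are cumulative: (1) published on the Web under an open license, (2) as structured, machine-readable data, (3) in a non-proprietary format, (4) using URIs to identify things, and (5) linked to other data. The sketch below encodes that reading; the `Dataset` fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset; field names are illustrative."""
    on_web_with_open_license: bool
    machine_readable: bool      # structured data, e.g. a spreadsheet rather than a scanned table
    open_format: bool           # non-proprietary format, e.g. CSV rather than Excel
    uses_uris: bool             # things are identified with URIs (e.g. in RDF)
    links_to_other_data: bool   # links out to other datasets for context


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: each star requires all of the previous ones."""
    criteria = [
        d.on_web_with_open_license,
        d.machine_readable,
        d.open_format,
        d.uses_uris,
        d.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing criterion caps the rating here
        stars += 1
    return stars


csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))  # 3
```

Treating the criteria as a ladder rather than a checklist reflects the idea that linking (five stars) only pays off once the data are already open, structured and identifiable.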

In her second presentation, “Linked Open Data and Biomedical Research: A Survey of Current International Efforts”, Kristi explained why research networking systems are needed: they increase the visibility and usefulness of research by helping us find people, projects and resources. Collaborative tools can visualize complex networks in a meaningful way, so that we can understand the big picture. Examples include VIVO, CTSAConnect and Open PHACTS. Some of you may also be familiar with GoPubMed, which uses the Gene Ontology and MeSH to filter millions of MEDLINE abstracts and organize the search results. Kristi invited the audience to find out what is happening on their own campuses and how they can connect with research networks, since libraries are key players in outreach, education and training, technical support and relationships with vendors and providers.
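The way an ontology helps a tool such as GoPubMed organize results can be illustrated with a small sketch, assuming a toy `is_a` hierarchy and hypothetical abstract identifiers. This is not GoPubMed's actual algorithm; it only shows that, because ontology terms are linked by `is_a` relationships, an abstract annotated with a specific term (apoptotic process) can also be listed under its broader parents (programmed cell death, cell death).

```python
from collections import defaultdict

# Toy "is_a" hierarchy (child -> parents), loosely modeled on Gene Ontology terms.
# This is an illustrative excerpt, not an extract from a real ontology release.
IS_A = {
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": [],
}

# Hypothetical abstracts, each annotated with the most specific matching terms.
ANNOTATIONS = {
    "abstract-1": {"apoptotic process"},
    "abstract-2": {"cell death"},
    "abstract-3": {"programmed cell death"},
}


def ancestors(term: str) -> set[str]:
    """Return the term itself plus every term reachable through is_a links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def group_by_term(annotations: dict[str, set[str]]) -> dict[str, list[str]]:
    """List abstracts under every term they match, directly or via a more specific term."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, terms in sorted(annotations.items()):
        expanded = set().union(*(ancestors(t) for t in terms))
        for term in expanded:
            groups[term].append(doc_id)
    return dict(groups)


for term, docs in sorted(group_by_term(ANNOTATIONS).items()):
    print(f"{term}: {docs}")
```

Running it prints each term with the abstracts grouped under it; `cell death` collects all three abstracts even though only one was annotated with that term directly.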

Finally, Joan Bartlett has an ambitious project: the “Assessment of a User-Centered Ontology to Support the Selection of and Linking among Bioinformatics Resources”. Researchers must choose among an overwhelming number of bioinformatics tools, many of which are redundant, and there are no defined criteria for assessing them. Directories are useful, but they offer little information to help users decide which tool is the most appropriate.

To make bioinformatics resources more accessible, Joan is developing a user-centered ontology that takes into account the personal, environmental and resource factors that determine tool selection. The first part of her research focused on defining these factors; the second consisted of surveying scientists about the characteristics they consider important when selecting bioinformatics tools. Once the ontology is better defined, Joan intends to collaborate with existing bioinformatics directories to integrate her findings, so that users can ultimately find resources more effectively.
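As a purely hypothetical illustration of how survey-derived importance ratings might eventually help a directory rank tools, the sketch below scores each tool as a weighted sum of how well it satisfies each factor. The factor names, weights, tool names and scores are all invented for this example; the post does not describe how Joan's ontology will be applied, and a real ontology-driven system would likely reason over richer relationships than a flat weighted sum.

```python
# Hypothetical importance weights, as a survey might report them (1 = low, 5 = high).
# Neither the factor names nor the numbers come from the study described above.
WEIGHTS = {"ease_of_use": 5, "documentation": 4, "free_access": 3, "web_based": 2}

# Hypothetical tools, each scored from 0 to 1 on how well it satisfies every factor.
TOOLS = {
    "tool-A": {"ease_of_use": 0.9, "documentation": 0.4, "free_access": 1.0, "web_based": 1.0},
    "tool-B": {"ease_of_use": 0.5, "documentation": 0.9, "free_access": 1.0, "web_based": 0.0},
    "tool-C": {"ease_of_use": 0.7, "documentation": 0.7, "free_access": 0.0, "web_based": 1.0},
}


def score(features: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted sum of factor scores; a factor missing for a tool counts as 0."""
    return sum(w * features.get(factor, 0.0) for factor, w in weights.items())


def rank(
    tools: dict[str, dict[str, float]], weights: dict[str, int]
) -> list[tuple[str, float]]:
    """Return (tool, score) pairs sorted from best to worst match."""
    scored = [(name, round(score(f, weights), 2)) for name, f in tools.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, value in rank(TOOLS, WEIGHTS):
    print(f"{name}: {value}")
```

With these made-up numbers the ranking is tool-A, tool-B, tool-C; changing `WEIGHTS` to reflect a different user's priorities can change the order, which is the point of a user-centered approach.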