
Ontotext

Linked Life Data sequel is here to dish out more value to life science research

June 30, 2011

Continuing the ongoing quality improvement of the public Linked Life Data (LLD) service, release 0.7 was made available to the public today.

“We have spent a lot of time and resources defining a more flexible and manageable update process methodology,” says Todor Primov of the LLD team. “With the current release of the Linked Life Data public service, we have introduced the new data update process and have proved its readiness to deliver quality in our Data as a Service (DaaS) based solution. The new LLD update routines will allow us to speed up the delivery of regular LLD updates and give us more flexibility in integrating new data sources. Another important achievement of this Linked Life Data release is the significant improvement in the quality of the instance mappings and the inferred causal relations: we have significantly reduced the number of false positive mappings and relations!”

LLD is a public RDF warehouse service that semantically integrates 27 diverse data sources. It currently aggregates 1,028,154,006 entities, 4,650,877,794 explicit statements and 5,120,886,447 statements in total.


New functionality and improvements in the 0.7 release:

  • Updated all primary data sets that have new releases.
  • Described a new data set, MetaCyc (as part of BioPax).
  • New instance mapping of MeSH codes in LHGDN data set to UMLS concepts.
  • New instance mapping of Freebase concepts to UMLS concepts.
  • Improved instance mappings between BioPax participants (protein/gene) and UniProt/EntrezGene entities. Now we distinguish unification relations (as exactMatch) from reference relations (as closeMatch) between the instances from the two data sets.
  • The entire set of PubMed articles was annotated with the new Semantic Biomedical Tagger v.1.1 (SBM 1.1). The annotation process generated more than half a billion high quality semantic annotations with concepts from more than 130 semantic types.
  • Improved generation (false positives removed) of causality relations between 14 different biomedical entity types (Genes, Proteins, Diseases, Symptoms, Drugs, Side Effects, Biological Processes, Molecular Functions, Cellular Localizations, Organisms, Organs, Cell Types, Cell Lines). Causality relations include relations of the following types:

 

Entity type 1    Causality relation type           Entity type 2      Number of causality relations
Disease          hasSymptom                        UMLS concept           35505
Drug             hasSideEffect                     UMLS concept          450082
Drug             hasTarget                         Protein                15309
Drug             treat                             Disease                 8201
Gene             encode                            Protein              6033749
Protein          expressedInOrganism               UMLS concept        22652496
Protein          expressedInAnatomicalSystem       UniProt keyword       275824
Protein          expressedInCellLine               UniProt keyword        16912
Protein          expressedInCellType               UniProt keyword        12728
Protein          binding                           Protein             68150720
Protein          hasLocalization                   GO term              5878896
Protein          hasMolecularFunction              GO term             13981106
Protein          participateInBiologicalProcess    GO term              7885200
ClinicalTrial    hasInclusionCriteria              UMLS concept          420797
ClinicalTrial    hasExclusionCriteria              UMLS concept          844390

TOTAL                                                                 126661915
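
As a quick sanity check on the figures above, the per-relation counts can be summed and compared with the reported total. The following is only an illustrative sketch: the counts are copied from the table, while the variable and function names are made up for this example and are not part of LLD.

```python
# Counts copied from the causality relations table in this announcement.
# Names such as CAUSALITY_RELATIONS are illustrative, not part of LLD.
CAUSALITY_RELATIONS = [
    ("Disease", "hasSymptom", "UMLS concept", 35505),
    ("Drug", "hasSideEffect", "UMLS concept", 450082),
    ("Drug", "hasTarget", "Protein", 15309),
    ("Drug", "treat", "Disease", 8201),
    ("Gene", "encode", "Protein", 6033749),
    ("Protein", "expressedInOrganism", "UMLS concept", 22652496),
    ("Protein", "expressedInAnatomicalSystem", "UniProt keyword", 275824),
    ("Protein", "expressedInCellLine", "UniProt keyword", 16912),
    ("Protein", "expressedInCellType", "UniProt keyword", 12728),
    ("Protein", "binding", "Protein", 68150720),
    ("Protein", "hasLocalization", "GO term", 5878896),
    ("Protein", "hasMolecularFunction", "GO term", 13981106),
    ("Protein", "participateInBiologicalProcess", "GO term", 7885200),
    ("ClinicalTrial", "hasInclusionCriteria", "UMLS concept", 420797),
    ("ClinicalTrial", "hasExclusionCriteria", "UMLS concept", 844390),
]

REPORTED_TOTAL = 126661915  # the TOTAL row of the table


def total_relations(rows):
    """Sum the relation counts over all rows."""
    return sum(count for _, _, _, count in rows)


def counts_by_source_type(rows):
    """Aggregate relation counts by the first entity type of each row."""
    totals = {}
    for source, _, _, count in rows:
        totals[source] = totals.get(source, 0) + count
    return totals


if __name__ == "__main__":
    print("sum matches reported total:",
          total_relations(CAUSALITY_RELATIONS) == REPORTED_TOTAL)
    for source, count in sorted(counts_by_source_type(CAUSALITY_RELATIONS).items()):
        print(f"{source:15s}{count:>12d}")
```

Grouping by the first entity type shows that the bulk of the relations originate from Protein rows, with protein binding alone accounting for more than half of the total.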

 

  • Auto-complete index limited to UMLS concepts, LOD drugs and human proteins/genes.
  • A new configuration of RelFinder tuned for exploring the causality relations in LLD
  • A set of 3 new examples to explore causal relations with RelFinder
  • Added 2 new example SPARQL queries combining causal relations and co-occurrence of concepts in unstructured texts.
  • Minor fixes to LLD data and the end-user interface.
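
The distinction between unification relations (exactMatch) and reference relations (closeMatch) mentioned in the list above can be illustrated with a small query builder. This is a hedged sketch, not the LLD team's own tooling: it assumes the mappings are published with the standard SKOS properties skos:exactMatch and skos:closeMatch, and it assumes a SPARQL endpoint at http://linkedlifedata.com/sparql (the announcement only gives the site address). The function names are hypothetical.

```python
from urllib.parse import urlencode

# Standard SKOS namespace (W3C).
SKOS = "http://www.w3.org/2004/02/skos/core#"

# ASSUMPTION: endpoint path; the announcement only names http://linkedlifedata.com
ENDPOINT = "http://linkedlifedata.com/sparql"


def mapping_query(predicate, limit=10):
    """Build a SPARQL query listing instance mappings of one kind.

    ASSUMPTION: mappings are expressed as skos:exactMatch (unification)
    or skos:closeMatch (reference) triples.
    """
    if predicate not in ("exactMatch", "closeMatch"):
        raise ValueError("predicate must be 'exactMatch' or 'closeMatch'")
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?source ?target WHERE {\n"
        f"  ?source skos:{predicate} ?target .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(query, endpoint=ENDPOINT):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the query and the request URL; no network call is made here.
    query = mapping_query("exactMatch", limit=5)
    print(query)
    print(request_url(query))
```

Swapping "exactMatch" for "closeMatch" would list the looser reference relations instead, provided the assumptions above hold for the deployed service.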

 

The service is accessible at http://linkedlifedata.com
 
