Linked Data Semantic Repository: Statistics

This page provides statistics about the loading and inference of the datasets of the Linked Data Semantic Repository (LDSR).

Loading multiple datasets into a single repository and materializing all facts that can be logically inferred from them is the core of LDSR. The table below shows the statistics from loading the datasets and materializing the implicit facts in LDSR. The first column lists the datasets in the order in which they are loaded into the repository.
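Materialization means computing, at load time, every triple that the entailment rules can derive from the loaded data (forward-chaining), so that queries need no further reasoning. The following is a toy sketch of that fixpoint computation, assuming triples are plain Python tuples and using only two RDFS rules (rdfs9 and rdfs11). The `ex:` identifiers are hypothetical, and the rule set and data structures of BigOWLIM are not reproduced here.

```python
# Toy forward-chaining materializer (illustrative sketch, not BigOWLIM).
# Triples are (subject, predicate, object) string tuples. Only two RDFS rules
# are applied: rdfs11 (subClassOf is transitive) and rdfs9 (rdf:type is
# inherited along subClassOf).
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(explicit):
    """Apply the rules until nothing new is derived; return only inferred triples."""
    closure = set(explicit)
    while True:
        new = set()
        subclass = [(s, o) for (s, p, o) in closure if p == SUBCLASS]
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS, d))  # rdfs11
        for x, p, c in closure:
            if p == TYPE:
                for a, b in subclass:
                    if a == c:
                        new.add((x, TYPE, b))  # rdfs9
        new -= closure
        if not new:  # fixpoint reached
            return closure - set(explicit)
        closure |= new


# Hypothetical example data.
explicit = {
    ("ex:Sofia", TYPE, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Settlement"),
    ("ex:Settlement", SUBCLASS, "ex:Place"),
}
inferred = materialize(explicit)
# explicit count, inferred count, implicit/explicit ratio
print(len(explicit), len(inferred), round(len(inferred) / len(explicit), 1))  # 3 3 1.0
```

The last value is the same kind of implicit/explicit ratio that the table reports per dataset.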

All counts are in thousands ('000). Explicit, Inferred and All are indexed triples (All = Explicit + Inferred, up to rounding); Entities are nodes in the graph; Ratio is the implicit/explicit ratio (Inferred / Explicit).

Dataset                      Explicit  Inferred       All  Entities  Ratio  Named graph
Schemata and ontologies            17        15        32         7    0.9  -
DBpedia (SKOS categories)       2,436    31,416    33,852     1,031   12.9  http://dbpedia.org/
DBpedia (owl:sameAs)            4,490         0     4,490     7,389    0.0  http://dbpedia.org
UMBEL                           3,324    40,699    44,023     1,240   12.2  http://umbel.org/umbel#
lingvoj                            20       855       874        18   43.4  http://lingvoj.org
CIA Factbook                       76         4        80        25    0.1  http://www4.wiwiss.fu-berlin.de/factbook
Wordnet                         1,943     9,296    11,239       842    4.8  http://wordnet.princeton.edu/
Geonames                       77,382   130,504   207,886    33,383    1.7  http://www.geonames.org/
DBpedia core                  438,127   127,555   565,682    89,067    0.3  http://dbpedia.org/
Freebase                      414,654    35,836   450,490   109,689    0.1  http://freebase.com
MusicBrainz                    45,492   263,044   308,536    15,562    5.8  http://musicbrainz.org


After loading and materialization, the statistics of the LDSR indices look as follows:

Value (millions)  Total after loading
             988  Indexed explicit statements
             639  Indexed inferred statements
           1,627  Indexed statements (explicit + inferred)
             258  Entities (nodes in the RDF graph)
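The totals above are the column sums of the per-dataset table, and the per-dataset ratio is inferred divided by explicit triples. The sketch below recomputes both from the published figures (in thousands). Because it starts from rounded values, ratios for very small datasets can differ from the table, which was presumably computed from exact counts (for lingvoj, 855 / 20 gives about 42.8 rather than 43.4). The names `DATASETS` and `implicit_explicit_ratio` are illustrative.

```python
# Per-dataset figures from the first table, in thousands:
# name -> (explicit triples, inferred triples, entities)
DATASETS = {
    "Schemata and ontologies": (17, 15, 7),
    "DBpedia (SKOS categories)": (2_436, 31_416, 1_031),
    "DBpedia (owl:sameAs)": (4_490, 0, 7_389),
    "UMBEL": (3_324, 40_699, 1_240),
    "lingvoj": (20, 855, 18),
    "CIA Factbook": (76, 4, 25),
    "Wordnet": (1_943, 9_296, 842),
    "Geonames": (77_382, 130_504, 33_383),
    "DBpedia core": (438_127, 127_555, 89_067),
    "Freebase": (414_654, 35_836, 109_689),
    "MusicBrainz": (45_492, 263_044, 15_562),
}


def implicit_explicit_ratio(explicit, inferred):
    """Ratio as reported in the table: inferred / explicit triples."""
    return inferred / explicit


for name, (explicit, inferred, _entities) in DATASETS.items():
    print(f"{name:27} {implicit_explicit_ratio(explicit, inferred):5.1f}")

# Column totals, printed in millions to compare with the summary table.
explicit_total, inferred_total, entity_total = (
    sum(column) for column in zip(*DATASETS.values())
)
for label, value in [
    ("Indexed explicit statements", explicit_total),
    ("Indexed inferred statements", inferred_total),
    ("Indexed statements (explicit + inferred)", explicit_total + inferred_total),
    ("Entities (nodes in the RDF graph)", entity_total),
]:
    print(f"{label:42} {value / 1000:>7,.0f}")
```

The printed totals come out as 988, 639, 1,627 and 258 million, matching the summary table.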

Although BigOWLIM performs complete forward-chaining, not all inferable triples are stored in its indices, for the sake of better performance and space economy. One example is the sameAs optimization, which allows BigOWLIM to avoid deriving multiple "replicas" of a statement when one or more of its elements have owl:sameAs equivalents. In addition, further statements can be retrieved from the repository; these result from post-processing and serve to improve the presentation of the data in LDSR. A summary follows:
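One way to picture the sameAs optimization is to group owl:sameAs-equivalent nodes into classes, store each statement once using a canonical representative, and expand it to all equivalent nodes only when statements are retrieved. The sketch below illustrates that general idea with a union-find structure; it is an assumption-based illustration, not BigOWLIM's actual data structures. The class `SameAsIndex`, its methods, and the `dbp:`/`gn:`/`fb:` identifiers are hypothetical. For brevity, predicates are not canonicalized and the owl:sameAs statements themselves are not listed by `retrievable()`.

```python
from itertools import product


class SameAsIndex:
    """Toy store keeping one copy of each statement per owl:sameAs class."""

    def __init__(self):
        self.parent = {}         # union-find forest: node -> parent node
        self.statements = set()  # statements over canonical representatives

    def find(self, node):
        """Return the canonical representative of node's sameAs class."""
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        while node != root:  # path compression
            nxt = self.parent[node]
            self.parent[node] = root
            node = nxt
        return root

    def same_as(self, a, b):
        """Record that a owl:sameAs b by merging their classes."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add(self, s, p, o):
        """Store a statement once, rewritten to canonical subject and object."""
        self.statements.add((self.find(s), p, self.find(o)))

    def retrievable(self):
        """Expand stored statements to every sameAs variant of subject and object."""
        classes = {}
        for node in list(self.parent):
            classes.setdefault(self.find(node), set()).add(node)
        result = set()
        for s, p, o in self.statements:
            for s2, o2 in product(classes[self.find(s)], classes[self.find(o)]):
                result.add((s2, p, o2))
        return result


idx = SameAsIndex()
idx.same_as("dbp:Sofia", "gn:Sofia")
idx.same_as("dbp:Sofia", "fb:Sofia")
idx.add("dbp:Sofia", "rdf:type", "ex:City")
idx.add("gn:Sofia", "rdfs:label", '"Sofia"')
print(len(idx.statements), len(idx.retrievable()))  # 2 6
```

Two statements are stored, but six are retrievable, because each one is visible under all three equivalent subject URIs; this is the kind of gap that the "Compressed" row below quantifies.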

        Value  Number of statements after post-processing
  155,356,362  Added after post-processing (preferred labels and ranks of the nodes)
1,782,541,506  Indexed
2,243,716,193  "Compressed" through sameAs-optimization
4,026,257,699  Different retrievable statements (by pattern <?s, ?p, ?o, ?ng>)
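The figures in this summary are related arithmetically: the retrievable statements are exactly the indexed ones plus those "compressed" by the sameAs optimization, and subtracting the post-processing additions from the indexed count gives back roughly the 1,627 million statements indexed after loading. The sketch below checks this; reading "Indexed" as "loaded and inferred statements plus post-processing additions" is an assumption inferred from the numbers, and the constant names are illustrative.

```python
# Figures from the post-processing summary.
ADDED_BY_POSTPROCESSING = 155_356_362
INDEXED = 1_782_541_506
COMPRESSED_BY_SAMEAS = 2_243_716_193
RETRIEVABLE = 4_026_257_699

# Assumption: Indexed = statements indexed after loading/materialization
#                       + statements added by post-processing.
loaded = INDEXED - ADDED_BY_POSTPROCESSING
print(f"{loaded:,}")  # 1,627,185,144 (about 1,627 million)

# Retrievable statements = indexed ones + those "compressed" by sameAs.
assert INDEXED + COMPRESSED_BY_SAMEAS == RETRIEVABLE

# How many retrievable statements there are per indexed statement.
print(f"{RETRIEVABLE / INDEXED:.2f}")  # 2.26
```

In other words, about 2.26 statements can be retrieved for every statement actually kept in the indices.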

The total number of entities in the RDF graph after post-processing is 354,635,159.


Copyright © 2008-2010 Ontotext AD