This page provides statistics about the loading and inference of the datasets of the Linked Data Semantic Repository (LDSR).
The core function of LDSR is to load multiple datasets into a single repository and materialize all facts that can be logically inferred from them. The table below shows the statistics from loading the datasets and materializing the implicit facts in LDSR. The first column lists the datasets in the order in which they are loaded into the repository.
| Dataset | Named Graph | Indexed Explicit Triples ('000) | Indexed Inferred Triples ('000) | All Indexed Triples ('000) | Entities ('000 of nodes in the graph) | Implicit/explicit ratio |
| Schemata and ontologies | | 11 | 7 | 18 | 6 | 0.6 |
| DBpedia (SKOS categories) | http://dbpedia.org/ | 2,877 | 42,587 | 45,464 | 1,144 | 14.8 |
| DBpedia (owl:sameAs) | http://dbpedia.org | 5,544 | 566 | 6,110 | 8,464 | 0.1 |
| UMBEL | http://umbel.org/umbel# | 5,162 | 42,212 | 47,374 | 500 | 8.2 |
| lingvoj | http://lingvoj.org | 20 | 863 | 883 | 18 | 43.8 |
| CIA Factbook | http://www4.wiwiss.fu-berlin.de/factbook | 76 | 4 | 80 | 25 | 0.1 |
| Wordnet | http://wordnet.princeton.edu/ | 2,281 | 9,296 | 11,577 | 830 | 4.1 |
| Geonames | http://www.geonames.org/ | 91,908 | 125,025 | 216,933 | 33,382 | 1.4 |
| DBpedia core | http://dbpedia.org/ | 560,096 | 198,043 | 758,139 | 127,931 | 0.4 |
| Freebase | http://freebase.com | 463,689 | 40,840 | 504,529 | 94,810 | 0.1 |
| MusicBrainz | http://musicbrainz.org | 45,536 | 421,093 | 466,630 | 15,595 | 9.2 |
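The last column can be reproduced from the two triple counts. The sketch below assumes the ratio is simply inferred triples divided by explicit triples, rounded to one decimal; the function name `implicit_explicit_ratio` is introduced here for illustration only. Because the table gives counts in thousands, ratios for very small datasets (for example lingvoj) may differ slightly from the published values.

```python
# Counts in thousands of triples, copied from the table above:
# dataset -> (indexed explicit, indexed inferred)
rows = {
    "DBpedia (SKOS categories)": (2_877, 42_587),
    "UMBEL": (5_162, 42_212),
    "Wordnet": (2_281, 9_296),
    "Geonames": (91_908, 125_025),
    "MusicBrainz": (45_536, 421_093),
}


def implicit_explicit_ratio(explicit, inferred):
    """Inferred triples per explicit triple, rounded to one decimal (assumed definition)."""
    return round(inferred / explicit, 1)


ratios = {name: implicit_explicit_ratio(e, i) for name, (e, i) in rows.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```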
The statistics of the LDSR indices after loading and materialization are as follows:
| Total number after loading | Value (millions) |
| Indexed explicit statements | 1,177 |
| Indexed inferred statements | 881 |
| Indexed statements (explicit + inferred) | 2,058 |
| Entities (nodes in the RDF graph) | 283 |
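As a consistency check, the totals above can be compared with the column sums of the per-dataset table. This is a minimal sketch that assumes the totals are plain sums of the per-dataset rows, converted from thousands to millions and rounded; the variable names are illustrative.

```python
# One tuple per dataset row, in thousands:
# (indexed explicit, indexed inferred, all indexed, entities)
rows = [
    (11, 7, 18, 6),                        # Schemata and ontologies
    (2_877, 42_587, 45_464, 1_144),        # DBpedia (SKOS categories)
    (5_544, 566, 6_110, 8_464),            # DBpedia (owl:sameAs)
    (5_162, 42_212, 47_374, 500),          # UMBEL
    (20, 863, 883, 18),                    # lingvoj
    (76, 4, 80, 25),                       # CIA Factbook
    (2_281, 9_296, 11_577, 830),           # Wordnet
    (91_908, 125_025, 216_933, 33_382),    # Geonames
    (560_096, 198_043, 758_139, 127_931),  # DBpedia core
    (463_689, 40_840, 504_529, 94_810),    # Freebase
    (45_536, 421_093, 466_630, 15_595),    # MusicBrainz
]

# Sum each column, then convert thousands to millions and round.
totals_millions = [round(sum(column) / 1000) for column in zip(*rows)]

labels = ["explicit", "inferred", "explicit + inferred", "entities"]
for label, total in zip(labels, totals_millions):
    print(f"{label}: {total} million")
```

Under that assumption, the four sums agree with the values reported in the totals table.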
Although BigOWLIM performs complete forward-chaining, not all inferable triples are stored in its indices,
which improves performance and saves space. One example is the sameAs-optimization, which
allows BigOWLIM to avoid deriving multiple "replicas" of a statement when one or more of its
elements have owl:sameAs equivalents. In addition, further statements
can be retrieved from the repository; these result from post-processing and serve
to improve the presentation of the data in LDSR. A summary follows:
| Number of statements after post-processing | Value |
| Added after post-processing (preferred labels and ranks of the nodes) | 179,812,809 |
| Indexed | 2,237,550,383 |
| "Compressed" through sameAs-optimization | 7,760,929,834 |
| Different retrievable statements (by pattern <?s, ?p, ?o, ?g>) | 9,818,667,408 |
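The figures in this summary can be cross-checked arithmetically. The relation used below is one reading that happens to fit the numbers exactly; it is not stated on this page, so treat it as an assumption: the retrievable count equals the statements indexed before post-processing plus those "compressed" through the sameAs-optimization.

```python
# Values copied from the summary table above.
added_by_postprocessing = 179_812_809
indexed = 2_237_550_383
compressed_by_sameas = 7_760_929_834
retrievable = 9_818_667_408

# Assumed relation: the indexed count includes the post-processing additions,
# so subtracting them gives the count indexed after loading and materialization.
indexed_before_postprocessing = indexed - added_by_postprocessing

# Assumed relation: retrievable = indexed before post-processing + compressed.
reconstructed_retrievable = indexed_before_postprocessing + compressed_by_sameas

print(indexed_before_postprocessing)  # compare with the ~2,058 million reported earlier
print(reconstructed_retrievable == retrievable)
```

If this reading is correct, the statements added by post-processing are not counted among the "different retrievable statements".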
The total number of entities in the RDF graph after post-processing is 404,796,665.
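To illustrate why the sameAs-optimization saves so much space, here is a toy model of the expansion it avoids. It is not BigOWLIM's implementation: the identifiers, the `same_as` mapping, and the `expand` function are all hypothetical. It only shows that a single stored statement stands in for one "replica" per combination of equivalent subjects, predicates, and objects.

```python
from itertools import product

# Hypothetical owl:sameAs equivalence classes, keyed by a chosen representative.
same_as = {
    "a:Sofia": ["a:Sofia", "b:Sofia", "c:Sofia"],
    "a:Bulgaria": ["a:Bulgaria", "b:Bulgaria"],
}


def equivalents(node):
    """Return every node equivalent to `node`, including itself."""
    return same_as.get(node, [node])


def expand(statement):
    """List all replicas of a (subject, predicate, object) statement."""
    s, p, o = statement
    return list(product(equivalents(s), equivalents(p), equivalents(o)))


stored = ("a:Sofia", "ex:capitalOf", "a:Bulgaria")
replicas = expand(stored)

# 3 equivalent subjects x 1 predicate x 2 equivalent objects
print(len(replicas), "retrievable statements from 1 stored statement")
```

In this toy case, one stored statement answers for six retrievable ones, so five replicas never need to be written to the indices.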