This page provides statistics about the loading of, and inference over, the datasets in FactForge.
Loading multiple datasets into a single repository and materializing all facts that can be logically inferred from them is the core of FactForge. The table below shows statistics from loading the datasets and materializing the implicit facts. The first column lists the datasets in the order in which they are loaded into the repository.
| Dataset | Named Graph | Indexed Explicit Triples ('000) | Indexed Inferred Triples ('000) | All Indexed Triples ('000) | Entities ('000 graph nodes) | Implicit/explicit ratio |
| Schemata and ontologies | | 11 | 7 | 18 | 6 | 0.6 |
| DBpedia (SKOS categories) | http://dbpedia.org/ | 2,877 | 42,587 | 45,464 | 1,144 | 14.8 |
| DBpedia (owl:sameAs) | http://dbpedia.org | 5,544 | 566 | 6,110 | 8,464 | 0.1 |
| UMBEL | http://umbel.org/umbel# | 5,162 | 42,212 | 47,374 | 500 | 8.2 |
| lingvoj | http://lingvoj.org | 20 | 863 | 883 | 18 | 43.8 |
| CIA Factbook | http://www4.wiwiss.fu-berlin.de/factbook | 76 | 4 | 80 | 25 | 0.1 |
| WordNet | http://wordnet.princeton.edu/ | 2,281 | 9,296 | 11,577 | 830 | 4.1 |
| Geonames | http://www.geonames.org/ | 91,908 | 125,025 | 216,933 | 33,382 | 1.4 |
| DBpedia core | http://dbpedia.org/ | 560,096 | 198,043 | 758,139 | 127,931 | 0.4 |
| Freebase | http://freebase.com | 463,689 | 40,840 | 504,529 | 94,810 | 0.1 |
| MusicBrainz | http://musicbrainz.org | 45,536 | 421,093 | 466,630 | 15,595 | 9.2 |
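As a rough cross-check, the ratio column and the column totals can be recomputed from the per-dataset figures. The sketch below copies the numbers from the table; the names `ROWS`, `ratio`, and `totals_in_millions` are illustrative and not part of FactForge. Because the inputs are already rounded to thousands, a recomputed ratio may differ slightly from the published one.

```python
# (dataset, explicit, inferred, entities), all in thousands, copied from the table above
ROWS = [
    ("Schemata and ontologies", 11, 7, 6),
    ("DBpedia (SKOS categories)", 2_877, 42_587, 1_144),
    ("DBpedia (owl:sameAs)", 5_544, 566, 8_464),
    ("UMBEL", 5_162, 42_212, 500),
    ("lingvoj", 20, 863, 18),
    ("CIA Factbook", 76, 4, 25),
    ("WordNet", 2_281, 9_296, 830),
    ("Geonames", 91_908, 125_025, 33_382),
    ("DBpedia core", 560_096, 198_043, 127_931),
    ("Freebase", 463_689, 40_840, 94_810),
    ("MusicBrainz", 45_536, 421_093, 15_595),
]


def ratio(explicit, inferred):
    """Implicit/explicit ratio, rounded to one decimal as in the table."""
    return round(inferred / explicit, 1)


def totals_in_millions(rows):
    """Column sums (explicit, inferred, all, entities), converted from thousands to millions."""
    explicit = sum(r[1] for r in rows)
    inferred = sum(r[2] for r in rows)
    entities = sum(r[3] for r in rows)
    sums = (explicit, inferred, explicit + inferred, entities)
    return tuple(round(v / 1000) for v in sums)


for name, explicit, inferred, _ in ROWS:
    print(f"{name}: {ratio(explicit, inferred)}")
print(totals_in_millions(ROWS))
```

The column sums agree with the totals reported in millions for the repository as a whole.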
After loading and materialization, the statistics of the FactForge indices are as follows:
| Total number after loading | Value (millions) |
| Indexed explicit statements | 1,177 |
| Indexed inferred statements | 881 |
| Indexed statements (explicit + inferred) | 2,058 |
| Entities (nodes in the RDF graph) | 283 |
Although BigOWLIM performs complete forward-chaining, not all inferable triples are stored in its indices, for the sake of better performance and space economy. One example is the sameAs-optimization, which allows BigOWLIM to avoid deriving multiple "replicas" of a statement when one or more of its elements have owl:sameAs equivalents. In addition, further statements can be retrieved from the repository; these are the result of post-processing and serve to improve the presentation of the data in FactForge. A summary follows:
| Number of statements after post-processing | Value |
| Added after post-processing (preferred labels and ranks of the nodes) | 179,812,809 |
| Indexed | 2,237,550,383 |
| "Compressed" through sameAs-optimization | 7,760,929,834 |
| Different retrievable statements (by pattern <?s, ?p, ?o, ?g>) | 9,818,667,408 |
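The figures in this table can be related to one another with simple arithmetic. The check below is an observation about the published numbers, not a definition taken from the source: subtracting the post-processing statements from the indexed count gives the roughly 2,058 million statements indexed before post-processing, and adding the sameAs-"compressed" statements to that yields the retrievable count. The variable names are illustrative.

```python
# Values copied from the post-processing table above
added_by_postprocessing = 179_812_809
indexed = 2_237_550_383
compressed_by_sameas = 7_760_929_834
retrievable = 9_818_667_408

# Statements indexed before post-processing (about 2,058 million)
indexed_before_postprocessing = indexed - added_by_postprocessing

# The published retrievable count equals this sum exactly
assert indexed_before_postprocessing + compressed_by_sameas == retrievable

# Share of retrievable statements that are never materialized in the indices
compressed_share = compressed_by_sameas / retrievable
print(f"{compressed_share:.0%} of retrievable statements are not stored explicitly")
```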
The total number of entities in the RDF graph after post-processing is 404,796,665.
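To illustrate why the sameAs-optimization saves so much space, here is a toy model in Python. It is a sketch of the general idea only and does not reflect BigOWLIM's actual implementation: each owl:sameAs equivalence class is represented by one canonical node, statements are stored once in canonical form, and the "replicas" are enumerated only at retrieval time. All identifiers (`a:Sofia`, `ex:capitalOf`, and so on) and function names are hypothetical.

```python
from itertools import product

# Hypothetical owl:sameAs equivalence classes
SAME_AS_CLASSES = [
    {"a:Sofia", "b:Sofia", "c:Sofia"},
    {"a:Bulgaria", "b:Bulgaria"},
]


def canonical_map(classes):
    """Map every node to one representative of its equivalence class."""
    mapping = {}
    for cls in classes:
        representative = min(cls)  # any deterministic choice works
        for node in cls:
            mapping[node] = representative
    return mapping


def members(node, classes):
    """All nodes equivalent to `node`, or just the node itself if it has no equivalents."""
    for cls in classes:
        if node in cls:
            return sorted(cls)
    return [node]


def store(statements, classes):
    """Store each statement once, rewritten to canonical nodes."""
    mapping = canonical_map(classes)
    return {tuple(mapping.get(term, term) for term in stmt) for stmt in statements}


def expand(stored, classes):
    """Enumerate every retrievable variant of the stored statements."""
    for s, p, o in stored:
        yield from product(members(s, classes), members(p, classes), members(o, classes))


explicit = [
    ("a:Sofia", "ex:capitalOf", "a:Bulgaria"),
    ("b:Sofia", "ex:capitalOf", "b:Bulgaria"),
]
stored = store(explicit, SAME_AS_CLASSES)
retrievable = list(expand(stored, SAME_AS_CLASSES))
print(len(stored), "stored;", len(retrievable), "retrievable")
```

In this toy example one stored statement stands in for six retrievable ones (three subject variants times two object variants), which is the same effect, at a much smaller scale, as the gap between indexed and retrievable statements reported above.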