This page provides an overview of the datasets included in the
Linked Data Semantic Repository (LDSR).
Several ontologies and schemata used in the datasets are also
included in LDSR to ensure proper interpretation of the semantics of the data. Statistics
about the size of the datasets and the number of inferred triples are presented
on the size page.
LDSR contains the following heterogeneous datasets:
- DBPedia is an RDF dataset derived from Wikipedia. It is designed to cover, as fully and as precisely as possible, the factual knowledge that can be extracted from Wikipedia. It serves as a hub for the LOD project.
- Geonames is a geographic database that covers 6 million of the most significant geographical features on Earth (e.g. countries, populated places, mountains, rivers, and bridges), characterised by coordinates and relations to other features (e.g. the parent feature in which the feature is nested).
- UMBEL is a lightweight ontology structure. It is essentially a hierarchy of about 20,000 classes, derived from OpenCyc (the world's largest and most complete general knowledge base and commonsense reasoning engine) and mapped to DBPedia. The classes range from general philosophical notions like TangibleThing to very specific classes like AbaCloth.
- Wordnet is a lexical knowledge base that covers about 150,000 English words. Wordnet defines the meanings of English words by grouping them into sets of synonyms, called synsets. Each synset expresses a distinct concept. The words linked to a given synset are synonyms with respect to the meaning of the lexical concept represented by this synset. A word can have multiple meanings, i.e. it can be associated with multiple synsets. The more general terms are associated with less general terms through hyponym-hypernym relations. The W3C's Wordnet RDF/OWL representation is used.
- CIA World Factbook is a collection of structured data, including statistical, geographic, political, and other information about all countries.
- Lingvoj provides descriptions of the most popular human languages. Currently it contains information about more than 500 languages.
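The Wordnet structure described in the list above (words grouped into synsets, a word belonging to several synsets, and synsets linked through hyponym-hypernym relations) can be sketched in plain Python. This is only a minimal illustration with toy data: the synset identifiers, the words, and the helper names `synsets_for`, `synonyms`, and `hypernym_chain` are hypothetical and do not reflect the W3C RDF/OWL vocabulary or the actual contents of LDSR.

```python
# Toy data (hypothetical): synset identifier -> words that express that concept.
SYNSETS = {
    "bank-river": {"bank", "riverbank"},
    "bank-finance": {"bank", "depository"},
    "slope": {"slope", "incline"},
    "institution": {"institution", "establishment"},
}

# Toy data (hypothetical): hyponym synset -> its more general hypernym synset.
HYPERNYM = {
    "bank-river": "slope",
    "bank-finance": "institution",
}


def synsets_for(word):
    """Return the identifiers of all synsets that contain the word, sorted."""
    return sorted(sid for sid, words in SYNSETS.items() if word in words)


def synonyms(word):
    """Map each synset of the word to the other words in that synset."""
    return {sid: sorted(SYNSETS[sid] - {word}) for sid in synsets_for(word)}


def hypernym_chain(synset_id):
    """Follow hypernym links upward and return the more general synsets."""
    chain = []
    while synset_id in HYPERNYM:
        synset_id = HYPERNYM[synset_id]
        chain.append(synset_id)
    return chain
```

In this toy data the word "bank" belongs to two synsets, so `synonyms("bank")` reports a separate synonym list for each meaning, while `hypernym_chain("bank-river")` walks from the specific concept to the more general one.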
The connectivity in LDSR is ensured mostly by DBPedia, which provides links to Geonames, Lingvoj, and Wordnet, and by UMBEL, which is linked to DBPedia.
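The link structure named in the paragraph above can be written down as a small graph to see which datasets become reachable from one another. The sketch below is an assumption-laden illustration: it encodes only the links mentioned in the text, treats every link as traversable in both directions, and uses a hypothetical `LINKS` table and `reachable` helper. It says nothing about datasets whose links are not named here, such as the CIA World Factbook.

```python
from collections import deque

# Only the links named in the text: source dataset -> datasets it links to.
LINKS = {
    "DBPedia": ["Geonames", "Lingvoj", "Wordnet"],
    "UMBEL": ["DBPedia"],
}


def reachable(start, links):
    """Return all datasets reachable from start, ignoring link direction."""
    # Build an undirected adjacency map from the directed link table.
    neighbours = {}
    for source, targets in links.items():
        for target in targets:
            neighbours.setdefault(source, set()).add(target)
            neighbours.setdefault(target, set()).add(source)

    # Breadth-first search from the starting dataset.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `reachable("UMBEL", LINKS)` shows that UMBEL reaches the other linked datasets only through DBPedia, which is what makes DBPedia the hub.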