Geo-spatial indexing in OWLIM

OWLIM-SE allows applications to efficiently make queries involving constraints such as 'nearby point' and 'within region'. Special-purpose indices allow such constraints to be evaluated very efficiently on top of large volumes of location-related data. For example, finding airports within 50 miles of London in the GeoNames dataset (92 million statements, describing more than 6 million geographic features all over the world) becomes 500 times faster when compared to the same query evaluated without the geo-spatial indices.

Geo-spatial geometry

Geo-spatial information concerns the geometry of points, shapes and distances relative to the surface of the Earth (or any spherical object). The most common way to locate a position is to specify two angles (usually in degrees) measured at the center of the Earth. The first angle (latitude) is the angle between the position and the closest point on the equator. The second angle (longitude) is the rotation about the Earth's axis necessary to bring the position to some reference meridian, usually the Prime Meridian. In OWLIM-SE, all angles are in decimal degrees, with latitude ranging from -90 to +90 degrees and longitude ranging from -180 to +180 degrees.
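The two-angle description can be made concrete with a short Python sketch that checks the ranges given above and converts a position to a point on the unit sphere. The function name and the choice of axes are assumptions made for illustration only; they are not part of OWLIM-SE.

```python
import math
from typing import Tuple


def to_unit_vector(lat_deg: float, long_deg: float) -> Tuple[float, float, float]:
    """Convert a decimal-degree latitude/longitude to a 3D unit vector.

    Assumed axes (illustrative only): x passes through latitude 0,
    longitude 0; z passes through the North Pole.
    """
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError(f"latitude out of range: {lat_deg}")
    if not -180.0 <= long_deg <= 180.0:
        raise ValueError(f"longitude out of range: {long_deg}")
    lat = math.radians(lat_deg)
    lon = math.radians(long_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )
```

With these axes, the latitude alone determines the height above the equatorial plane, and the longitude only rotates the point about the Earth's axis.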

Latitude/Longitude example

A great deal of location information is specified this way. For example, airports have a reference point given by latitude, longitude and altitude, and political boundaries can be specified by polygons in which each vertex is a two-dimensional latitude/longitude pair.

Geo-spatial RDF data

One of the datasets included in FactForge is GeoNames, a well-known, publicly available dataset that contains over 10 million geographical names from all over the world. It is an important reference point in the Linked Open Data Cloud due to its global coverage and detailed classification of features. The location information uses the World Geodetic System 1984 (WGS84) ontology, which gives schema definitions for latitude, longitude and altitude.

The position information in GeoNames is provided using standard RDF and can be queried in the normal way using SPARQL. However, without special extensions, using this data with geo-spatial constraints (such as searching for points between certain coordinates or within a certain distance from a reference point) can be computationally expensive and therefore extremely slow. For this reason, OWLIM-SE includes special support for two-dimensional geo-spatial data.

OWLIM geo-spatial extensions

OWLIM-SE provides special query constructions and extension functions for efficiently processing certain geo-spatial constraints. The special query forms make use of the RDF list syntax in order to express constraints reminiscent of a function call, e.g. omgeo:nearby(lat long distance).

The increase in performance, however, comes from a special index over the latitude/longitude position data that allows fast lookups and fast tests for being inside or outside circles and polygons. Consider some geo-spatial data that is structured according to the WGS84 ontology:

example:c geo:lat 52.2345
example:c geo:long 0.1582
example:l geo:lat 51.8201
example:l geo:long -0.4551
example:r geo:lat 51.4969
example:r geo:long -0.9934
... ... ...

If a query needs to select some points that lie within certain upper and lower limits of latitude and longitude, e.g. a box enclosing the three points shown in the above table, then one (very inefficient) approach would be as follows:

PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?link
WHERE {
  ?link geo-pos:lat ?lat .
  ?link geo-pos:long ?long .
  FILTER( ?lat > 51 && ?lat < 52.5 && ?long > -1 && ?long < 0.5 )
}

which forces OWLIM-SE to examine every entity in the repository that has latitude and longitude properties and evaluate the conjunction of comparisons in the filter expression. Needless to say, for large datasets this will take a long time.

By using the special geo-spatial constraints, OWLIM-SE can re-write queries to make use of an index specifically for geo-spatial data. A far more efficient version of the above query is as follows:

PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
SELECT ?link
WHERE {
  ?link omgeo:within( 51 -1 52.5 0.5 ) .
}

With this query, OWLIM-SE uses the geo-spatial index to find two-dimensional points within the bounding box (lat1, long1, lat2, long2). This version of the query is typically 500 times faster than the naive example given above. This query can be tried on FactForge.
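To illustrate why an index helps, the following Python sketch contrasts a linear scan (what the FILTER-based query amounts to) with a toy index that keeps points sorted by latitude and narrows the candidates by binary search. This is not OWLIM-SE's actual index, whose structure is not described here. The class name, the sample point example:far and the strict comparisons (copied from the FILTER expression) are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Sample data as (name, latitude, longitude). The first three rows come from
# the example table; "example:far" is a hypothetical point outside the box.
POINTS = [
    ("example:c", 52.2345, 0.1582),
    ("example:l", 51.8201, -0.4551),
    ("example:r", 51.4969, -0.9934),
    ("example:far", 40.0, 20.0),
]


def naive_within(points, lat1, long1, lat2, long2):
    """Linear scan: every point is tested, as in the FILTER-based query."""
    return [name for name, lat, lon in points
            if lat1 < lat < lat2 and long1 < lon < long2]


class LatitudeIndex:
    """Toy index (an assumption, not OWLIM's): points sorted by latitude."""

    def __init__(self, points):
        self._sorted = sorted(points, key=lambda p: p[1])
        self._lats = [p[1] for p in self._sorted]

    def within(self, lat1, long1, lat2, long2):
        # Binary search finds the latitude band without scanning all points.
        lo = bisect_right(self._lats, lat1)  # first point with lat > lat1
        hi = bisect_left(self._lats, lat2)   # first point with lat >= lat2
        # Only the candidates in the band are tested against the longitudes.
        return [name for name, _, lon in self._sorted[lo:hi]
                if long1 < lon < long2]


index = LatitudeIndex(POINTS)
print(sorted(index.within(51, -1, 52.5, 0.5)))
```

Both functions return the same points; the index simply avoids touching points whose latitude is outside the band. A production index would normally narrow both dimensions at once.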

Finding points within a circle

The statement pattern ?point omgeo:nearby(?lat ?long ?distance) is used to find points that lie within the circle specified by latitude, longitude and radial distance. The distance can have units of either miles (mi suffix) or kilometers (km suffix). If neither unit is specified then kilometers are assumed. This statement pattern will evaluate to true if the following constraints hold:

  • ?point geo:lat ?plat .
  • ?point geo:long ?plong .
  • Shortest great-circle distance from ?plat, ?plong to ?lat, ?long <= ?distance
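The constraints above can be sketched in Python as follows. The haversine formula and the mean Earth radius of 6371 km are assumptions chosen for illustration; the formula and radius that OWLIM-SE uses internally are not stated here. The function names are hypothetical, and the unit handling follows the mi/km suffix rule described above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from OWLIM-SE
KM_PER_MILE = 1.609344


def parse_distance_km(text: str) -> float:
    """Parse '50mi', '80km' or a bare number (kilometers are assumed)."""
    text = text.strip().lower()
    if text.endswith("mi"):
        return float(text[:-2]) * KM_PER_MILE
    if text.endswith("km"):
        return float(text[:-2])
    return float(text)


def great_circle_km(lat1: float, long1: float, lat2: float, long2: float) -> float:
    """Haversine great-circle distance in kilometers (inputs in decimal degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(long2 - long1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearby(plat: float, plong: float, lat: float, long: float, distance: str) -> bool:
    """True if (plat, plong) lies within the circle around (lat, long)."""
    return great_circle_km(plat, plong, lat, long) <= parse_distance_km(distance)
```

For example, nearby(plat, plong, lat, long, "50mi") compares the computed distance with 50 miles converted to kilometers.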

When evaluating the query, ?lat ?long ?distance must all be either constants or bound from previous statement patterns. The following query finds all airports within 50 miles of London, UK:

PREFIX geo-pos:<http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX omgeo:<http://www.ontotext.com/owlim/geo#>
PREFIX dbpedia:<http://dbpedia.org/resource/>
PREFIX dbp-ont:<http://dbpedia.org/ontology/>
PREFIX ff:<http://factforge.net/>
PREFIX om:<http://www.ontotext.com/owlim/> 

SELECT distinct ?airport ?label ?RR
WHERE {
        dbpedia:London geo-pos:lat ?latBase ;
                       geo-pos:long ?longBase .
        ?airport omgeo:nearby(?latBase ?longBase "50mi");
                 a dbp-ont:Airport ;
                 ff:preferredLabel ?label ;
                 om:hasRDFRank ?RR .
} ORDER BY DESC(?RR)

This query can be run against FactForge.

Finding points within a 'rectangle'

The statement pattern ?point omgeo:within(?lat1 ?long1 ?lat2 ?long2) is used to find points bounded by the 'rectangle' specified by the lower-left corner (?lat1, ?long1) and the upper-right corner (?lat2, ?long2). The edges of the rectangle are lines of latitude and longitude, which appear as straight lines on a Mercator projection. Proper consideration is given for rectangles that span the +/-180 degree meridian. For example, the following query can be used to find tunnels in a rectangle roughly corresponding to Tirol, Austria (note the use of the GeoNames feature classification ontology):

PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
SELECT ?feature ?lat ?long
WHERE {
  ?link omgeo:within(45.85 9.15 48.61 13.18) .
  ?link geo-ont:featureCode geo-ont:R.TNL .
  ?link geo-ont:name ?feature .
  ?link geo-pos:lat ?lat .
  ?link geo-pos:long ?long .
}

This query can be run against FactForge.
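The handling of rectangles that span the +/-180 degree meridian can be illustrated with a small Python sketch. It assumes that a rectangle whose lower-left longitude is greater than its upper-right longitude wraps around the meridian, and that the edges are inclusive. Both points are assumptions about the semantics rather than documented OWLIM-SE behaviour, and the function name is hypothetical.

```python
def within_rectangle(plat: float, plong: float,
                     lat1: float, long1: float,
                     lat2: float, long2: float) -> bool:
    """Test a point against a lat/long 'rectangle'.

    (lat1, long1) is the lower-left corner and (lat2, long2) the upper-right
    corner. Edges are treated as inclusive (an assumption).
    """
    if not lat1 <= plat <= lat2:
        return False
    if long1 <= long2:
        # Ordinary case: the rectangle does not cross the +/-180 meridian.
        return long1 <= plong <= long2
    # long1 > long2: the rectangle is assumed to wrap across +/-180 degrees,
    # so the point may lie east of long1 or west of long2.
    return plong >= long1 or plong <= long2
```

For instance, a rectangle from (-10, 170) to (10, -170) contains both (0, 179.5) and (0, -175), but not (0, 0).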

Finding points within a polygon

The statement pattern ?point omgeo:within(?lat1 ?long1 ... ?latn ?longn) is used to find points bounded by a polygon specified by a sequence of three or more latitude, longitude pairs. The polygon is closed automatically if the first and last vertices do not coincide. The vertices must be constants or bound values. The following query can be used to find the caves in the sides of cliffs lying within a polygon approximating the shape of England:

PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
SELECT ?feature ?lat ?long
WHERE {
  ?link omgeo:within( "51.45" "-2.59" 
                      "54.99" "-3.06"
                      "55.81" "-2.03"
                      "52.74"  "1.68"
                      "51.17"  "1.41" ) .
  ?link geo-ont:featureCode geo-ont:S.CAVE .
  ?link geo-ont:name ?feature .
  ?link geo-pos:lat ?lat .
  ?link geo-pos:long ?long .
}

This query can be run against FactForge.
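A point-in-polygon test of this kind can be sketched in Python with the even-odd (ray casting) rule. The sketch treats latitude and longitude as planar coordinates and ignores polygons that cross the +/-180 degree meridian. These are simplifying assumptions, and the algorithm OWLIM-SE actually uses is not described here. The function name is hypothetical. The polygon is closed automatically by connecting the last vertex back to the first.

```python
from typing import List, Tuple


def within_polygon(plat: float, plong: float,
                   vertices: List[Tuple[float, float]]) -> bool:
    """Even-odd ray casting on a planar latitude/longitude grid.

    vertices is a list of three or more (lat, long) pairs. The modulo index
    below connects the last vertex back to the first.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    inside = False
    n = len(vertices)
    for i in range(n):
        lat_a, long_a = vertices[i]
        lat_b, long_b = vertices[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat_a > plat) != (lat_b > plat):
            # Longitude at which the edge crosses the point's latitude.
            cross = long_a + (plat - lat_a) * (long_b - long_a) / (lat_b - lat_a)
            if plong < cross:
                inside = not inside
    return inside


# Vertices taken from the query above.
ENGLAND = [(51.45, -2.59), (54.99, -3.06), (55.81, -2.03),
           (52.74, 1.68), (51.17, 1.41)]
print(within_polygon(53.0, -1.0, ENGLAND))
```

An odd number of edge crossings along a ray from the point means the point is inside the polygon; an even number means it is outside.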

Computing distance

OWLIM-SE includes a SPARQL extension function for computing the distance between two points in kilometers that can be used in FILTER and ORDER BY clauses:

double omgeo:distance(?lat1, ?long1, ?lat2, ?long2)

For example, the following query finds all the airports within 80 miles of Bournemouth and filters out those that are more than 80 kilometers from Brize Norton. The results are ordered with the closest to Brize Norton first:

PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>

SELECT distinct ?airport_name
WHERE {
  ?a1 geo-ont:name "Bournemouth" .
  ?a1 geo-pos:lat ?lat1 .
  ?a1 geo-pos:long ?long1 .
  ?airport omgeo:nearby(?lat1 ?long1 "80mi" ) .
  ?airport geo-ont:name ?airport_name .
  ?airport geo-ont:featureCode geo-ont:S.AIRP .
  ?airport geo-pos:lat ?lat2 .
  ?airport geo-pos:long ?long2 .
  ?a2 geo-ont:name "Brize Norton" .
  ?a2 geo-pos:lat ?lat3 .
  ?a2 geo-pos:long ?long3 .
  FILTER( omgeo:distance(?lat2, ?long2, ?lat3, ?long3) < 80)
}
ORDER BY ASC(omgeo:distance(?lat2, ?long2, ?lat3, ?long3))

This query can be run against FactForge.