There are a few ways to federate different data sources with GraphDB and Ontotext Platform, and two of them achieve the same goal: SPARQL repositories can be federated either with standard SPARQL federation or with FedX. This raises the logical question: what is the tradeoff between the two?
Standard SPARQL federation is a manual tool, controlled with the SERVICE keyword. With it, you explicitly choose to execute part of the query against a remote repository. You have to know the address of the remote repository and, if it is secured, its username and password. You also have to write the query by hand and know the data model of the remote repository. The SERVICE clause is treated as a subquery.
There are three ways to achieve standard SPARQL federation with GraphDB:
It is important to order your query properly. Remember that in SPARQL, subqueries are executed first. Imagine a case where your query has two connected parts: one part returns 6 results and the other returns 600. If you fetch the 600 results first and then narrow them down with the 6, you perform much more work than in the inverse case. Therefore, you want the part that returns 6 results to execute first. Since subqueries are executed before the main query, this means that, if possible, you should put the 6-result part in the SERVICE clause and run the overall query on the repository that holds the 600 results.
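The effect of ordering can be illustrated with a toy model. The sketch below is not how GraphDB evaluates queries internally; it only assumes that the part executed first produces bindings and that each binding is then checked once against the other part. The identifiers and the helper `join_with_probe_count` are made up for the illustration.

```python
def join_with_probe_count(first, second):
    """Evaluate `first`, then check each of its bindings against `second`.

    Returns the joined bindings and the number of checks performed.
    """
    second_index = set(second)
    probes = 0
    joined = []
    for binding in first:
        probes += 1
        if binding in second_index:
            joined.append(binding)
    return joined, probes


# Toy data: the small part matches 6 resources, the large part 600.
small_part = [f"ex:item{i}" for i in range(6)]
large_part = [f"ex:item{i}" for i in range(600)]

small_first, probes_small_first = join_with_probe_count(small_part, large_part)
large_first, probes_large_first = join_with_probe_count(large_part, small_part)

print(len(small_first), probes_small_first)
print(len(large_first), probes_large_first)
```

Both orderings produce the same joined results, but in this model starting from the small part needs far fewer checks than starting from the large one.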
FedX federation takes all the manual work away. You define a federated repository and specify which remote repositories it should connect to, optionally providing credentials for them. A remote repository can be any SPARQL-enabled repository, including another federated repository. You then write your query as you usually would, without worrying about how the data is partitioned across the repositories or about query structure.
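For comparison, here is a hedged sketch of querying a federated repository from Python. It assumes a GraphDB instance at `http://localhost:7200` and a FedX repository with the hypothetical ID `fedx-demo`; the `ex:` vocabulary is also made up. The request is only built, not sent, so the snippet runs without a server.

```python
import urllib.parse
import urllib.request

# Hypothetical values: adjust the host and repository ID to your installation.
GRAPHDB_URL = "http://localhost:7200"
FEDERATED_REPO_ID = "fedx-demo"

# A plain query: it does not name any remote repository.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?person ?name ?city WHERE {
    ?person ex:name ?name .
    ?person ex:livesIn ?city .
}
"""


def build_request(base_url: str, repo_id: str, query: str) -> urllib.request.Request:
    """Build a SPARQL-over-HTTP POST request for the given repository."""
    endpoint = f"{base_url}/repositories/{repo_id}"
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_request(GRAPHDB_URL, FEDERATED_REPO_ID, QUERY)

# To run it against a live server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The query is the same one you would send to a single repository that held all the data; deciding which member repository answers which pattern is left to the federation.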
The drawback of this ease of access is that FedX has to perform all the work that you would otherwise do manually. It has to decide which part of the query should be sent to which repository, structure the query to avoid the ordering issues described above, and then transfer the data and join it. That can be a lot of work, and FedX might get it wrong, just as you sometimes write an unoptimized query. That is why you can expect it to perform worse than a manually federated version of the same query.
To sum it up, if you have low-level access to the remote repositories and want to optimize your queries, you would usually prefer standard SPARQL federation with the SERVICE keyword. If you prefer ease of access and have a lot of users who are not familiar with the way in which the data is modelled, FedX is the better choice.