GraphDB Users Ask: Where Can We Deploy GraphDB And What Are Some Best Practices?
TESTED ON: GraphDB 10.2
June 22, 2023 | 4 mins. read | GraphDB Q&As
ONTOTEXT ANSWER:
Oracle famously claims that 3 billion devices run Java. This means that there are 3 billion devices that can run GraphDB. This even includes every single Blu-ray player. You probably wouldn’t want to deploy your database on a Blu-ray player, but you can.
This great amount of flexibility means that you can deploy on any cloud or VM and any OS. Most of our users start out with testing on their own workstations. Working from your own machine is fine even for large datasets and complex computations – you can run a repository with hundreds of millions of statements on a workstation. Once you are ready to move to a more substantial deployment, the key question is what capabilities you need.
Let’s assume you want to run the database in a high-availability setup. To achieve this, we offer GraphDB Enterprise with a cluster. The cluster has key features to keep in mind:
Data replication – all your data is present on all GraphDB instances in the cluster. If one instance fails, all the others are perfectly capable of maintaining operation alone.
Leader elections – in the cluster, the leader is responsible for internal load balancing of the requests and for “testing” all updates before approving them. Elections are automated and internal.
From these factors, you can establish the following best practices:
You want all your workers to have the same hardware parameters since they will all do the same work on the same data.
For optimal reliability, you want all workers on separate instances, in separate availability zones.
The cluster operates best on an odd number of instances, to prevent election “deadlocks”.
We ship GraphDB with an optional external proxy, which tracks the status of the cluster and always directs requests to the leader to reduce HTTP overhead. For extra resilience, you can deploy the external proxy in a cluster of 2 or more instances and load balance requests toward it.
The GraphDB cluster operates on two ports – the HTTP port and gRPC port. Internal traffic should be allowed on both. External traffic could be limited to the HTTP port only.
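The advice about odd cluster sizes can be illustrated with a little arithmetic. The sketch below assumes that elections are decided by a strict majority of nodes, which is a common consensus design; the answer above does not spell out the exact voting rule, so treat this as an illustration rather than a description of GraphDB internals. The function names are hypothetical.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority (assumed rule)."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can be lost while a majority can still be formed."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    for size in range(1, 8):
        print(
            f"{size} nodes: quorum={quorum(size)}, "
            f"tolerated failures={tolerated_failures(size)}"
        )
```

Under this assumption, going from 3 to 4 nodes does not let the cluster survive any additional failures, and an even split of votes becomes possible, which is one way to read the "deadlock" remark above.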
Regardless of your deployment model, some general advice is:
Have easy access to your logs. This can be done with tools such as Datadog, Prometheus, etc.
Do not panic! If a node is down, don’t start destroying and restarting the cluster. Only take drastic measures if the cluster is inoperable and not healing automatically.
When deploying with SSO, a JWT decoder and the browser’s inspector tool go a long way.
When deploying on cloud infrastructure, there are some additional considerations to make:
You can use your cloud’s load-balancing capabilities instead of the external proxy. This can make your setup easier while sacrificing a bit of performance.
With the external proxy, the requests would be routed like this: Internet → Proxy → Leader → Overall cluster.
With the load balancer, the requests would be routed like this: Internet → Load balancer → Cluster → Leader → Overall cluster. That’s one extra hop if the load balancer does not redirect toward the current leader by chance.
You can also use the cloud provider’s load balancer for SSL termination. GraphDB can serve requests over HTTPS, but it may be easier to skip the configuration.
You could deploy GraphDB instances in different regions. In such a case, keep in mind that latency and bandwidth can become a big issue. There’s a lot of internal communication.
Disk IOPS are often a bottleneck with cloud providers and can be hard to analyze.
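The routing comparison above (external proxy versus cloud load balancer) can be put into rough numbers. The sketch below assumes the load balancer picks a node uniformly at random and that a non-leader node forwards the request to the leader, costing one extra hop. Both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_extra_hops(cluster_size: int) -> Fraction:
    """Average extra hops per request when a load balancer picks a node
    uniformly at random and only the leader avoids the extra hop."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # The request misses the leader with probability (n - 1) / n.
    return Fraction(cluster_size - 1, cluster_size)


if __name__ == "__main__":
    for size in (1, 3, 5):
        print(f"{size} nodes: {expected_extra_hops(size)} extra hops on average")
```

With the external proxy, the extra hop is avoided because requests go straight to the leader. This is the small performance trade-off mentioned above.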
These steps are answers to the most common issues that arise when operating your cluster. However, there are many variables that make each deployment unique. Get in touch about your use case and we can ensure you are making the most of your GraphDB!
Ontotext answers questions from our GraphDB users. You can also check out the frequently asked questions on general topics about GraphDB, or get quick answers to technical questions from the community and Ontotext experts by using the graphdb tag on Stack Overflow.