In connection with Linked Open Data, the term Knowledge Graph comes up frequently, yet it is often not entirely clear what is meant by it. In short and generalized terms, a Knowledge Graph is a graph database that fulfills certain criteria (see infobox).
There is no single, agreed definition of a Knowledge Graph. Nevertheless, characteristics can be described that focus on the word “knowledge”. They make it possible to specify more precisely how knowledge can be extracted from a graph database, which justifies combining the words knowledge and graph.
The technology on which a Knowledge Graph is built is a graph database, which is why the data is laid out in a network structure. This makes the respective data model and thus the relationship of the data to each other intuitively comprehensible for both machines and humans. Real entities – these are uniquely determined objects that can be tangible (a restaurant) or intangible (an assessment) – and their relationships are described. A Knowledge Graph thus reflects the complexity of the world in the digital space.
A Knowledge Graph is described semantically. This means that meaning is ascribed to data via an ontology (like schema.org). This allows machines to understand what specific information is meant in each case, making the meaning of the data immediately clear.
A Knowledge Graph is smart because the labeling of the data according to an ontology in combination with the data created in the graph enables the derivation of new (implicit) information. Path queries can be used to draw connections to other data and their relationship can also be traced automatically.
Finally, a Knowledge Graph is alive in the sense that the ontology and also the relationship of the data to each other can be flexibly adapted and extended. Data can be dynamically updated and/or corrected.
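The “smart” characteristic described above, deriving implicit information by following paths, can be illustrated with a minimal sketch. The entity names and the `located_in` relation are hypothetical and chosen only for illustration; a real Knowledge Graph would use an ontology and a reasoning or query engine rather than a Python dictionary.

```python
# Hypothetical explicit facts: each key is "located in" its value.
located_in = {
    "Restaurant X": "Town Y",
    "Town Y": "Region Z",
    "Region Z": "Germany",
}

def all_locations(entity):
    """Follow the located_in path and collect every containing place."""
    seen = []
    current = entity
    # Stop at the end of the path, or if a place repeats (guards against cycles).
    while current in located_in and located_in[current] not in seen:
        current = located_in[current]
        seen.append(current)
    return seen

# Only "Restaurant X is located in Town Y" was stated explicitly;
# the region and the country are derived by walking the path.
print(all_locations("Restaurant X"))
```

Only the first location is stored for the restaurant itself; the remaining ones are implicit and become visible through the path.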
A good starting point for explaining what a graph database is, is a comparison with a relational database. Relational databases store information about trails, hotels, events, POIs or other tourism-related data in tables with rows and columns. The way data is stored in relational databases is thus comparable to a subway timetable that lists the departure times at the respective stops.
Orientation is limited to the tabular timetable, and the subway network is difficult to make out even if the timetables for all stops were available. The route network and the interchanges are much easier to grasp when they are visualized as a network.
The distinction between relational databases and graph databases is similar: a graph database specializes in connecting data. A relational database can also do this in principle, but queries across multiple tables require much more effort and are sometimes only possible via complicated detours. In a complex world, however, the relationships between data are increasingly important, which is why relational databases can reach their limits.
Relational databases vs. graph databases
The properties defined for the rows in the tables of relational databases (for instance altitude, degree of difficulty, etc. for hiking trails) can only be changed or extended with difficulty. Graph databases work differently here: there is no predefined data model. Instead, each data record is represented as a node, and the relationships between the data are represented by connections (the edges). When new connections are added, the data model is extended along with them (see figure).
Data management in graph databases
Another advantage is that graph databases can process complex queries in a short time due to this form of data storage.
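The node-and-edge layout described above can be sketched in a few lines. This is a minimal illustration, not how a production graph database is implemented; the names `graph`, `add_edge`, `neighbours` and all example data are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical in-memory graph: each node maps to a list of (relation, target) edges.
graph = defaultdict(list)

def add_edge(source, relation, target):
    """Connect two nodes; nodes and relation types are created on first use."""
    graph[source].append((relation, target))

def neighbours(node, relation):
    """Return all targets reachable from `node` via `relation`."""
    return [target for rel, target in graph[node] if rel == relation]

# Initial data: a hiking trail with two properties.
add_edge("Trail A", "difficulty", "moderate")
add_edge("Trail A", "maxAltitude", "1200 m")

# Extending the model later needs no schema change, only new edges.
add_edge("Trail A", "passesBy", "Hotel B")
add_edge("Hotel B", "locatedIn", "Town C")

print(neighbours("Trail A", "passesBy"))
```

The relation `passesBy` did not exist when the first edges were created. Adding it required no change to a table definition, which is the flexibility the text attributes to graph databases.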
An important concept for graph databases that are meant to satisfy the requirements of a Knowledge Graph is the Resource Description Framework (RDF). As the name suggests, it is a framework for describing resources; the “resources” here are the data. In RDF, a statement always consists of three elements, together called a triple. Similar to a grammatically complete sentence, every RDF statement must contain all three components: subject, predicate and object.
If “Berlin”, for example, is the subject, then “is the capital of” would be the predicate and “Germany” would be the object. Subject and object are the nodes of the network as described above and are also called resources or entities. The edges are the relationships that connect the nodes together, creating a data network. Now that it is defined that Berlin is the capital of Germany, the question “What is the capital of Germany?” could be answered by searching the data network with the help of an algorithm and returning the answer “Berlin”. With large amounts of data, AI systems can also be used to establish significantly more complex relationships of meaning.
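The Berlin example can be written down as a small sketch. It uses plain Python tuples instead of a real RDF store; the predicate name `isCapitalOf`, the helper `match` and the second triple are assumptions added for illustration.

```python
# Each fact is one (subject, predicate, object) triple.
triples = [
    ("Berlin", "isCapitalOf", "Germany"),
    ("Vienna", "isCapitalOf", "Austria"),  # extra fact so the query has something to filter out
]

def match(subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# "What is the capital of Germany?" means: leave the subject open.
answers = [s for s, _, _ in match(predicate="isCapitalOf", obj="Germany")]
print(answers)  # ['Berlin']
```

Leaving a different position open answers a different question: `match(subject="Berlin")` returns everything that is known about Berlin.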
In order for the data in RDF to be uniquely identifiable, and thus to have meaning attributed to it, it must be provided with a unique reference. These references are called Uniform Resource Identifiers (URIs). For instance, the term “Prince” can be differentiated within Wikidata (www.wikidata.org) into the singer (Q7542), the family name (Q16881414) or the title of nobility (Q2747456).
Finally, the data must be semantically marked up using an ontology (such as schema.org) so that it can be understood by machines. In RDF, the resource “Restaurant” could be described with properties such as average rating, geodata or opening hours. Marked up with schema.org, the rating would then be represented as a triple as follows: Restaurant (subject) – Rating (predicate) – ratingValue: 4 (object).
Structure of a triple in RDF
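As an illustration of such markup, the following sketch describes a restaurant with schema.org terms in a JSON-LD-style Python dictionary and flattens it into triples. The URI under example.org, the name, the opening hours and the coordinates are invented for this example. The sketch uses the schema.org property `aggregateRating`, which is one way to express the rating mentioned above; the helper `to_triples` and the way it names nested objects are assumptions, not part of any standard.

```python
import json

# Hypothetical restaurant description using schema.org terms.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "@id": "https://example.org/restaurant/1",  # invented URI
    "name": "Example Restaurant",               # invented name
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4},
    "openingHours": "Mo-Su 11:00-22:00",        # invented value
    "geo": {"@type": "GeoCoordinates", "latitude": 54.19, "longitude": 9.10},
}

def to_triples(node, subject):
    """Flatten a nested description into (subject, predicate, object) triples."""
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if isinstance(value, dict):
            # Nested objects get an illustrative identifier derived from the parent.
            child = f"{subject}#{key}"
            triples.append((subject, key, child))
            triples.extend(to_triples(value, child))
        else:
            triples.append((subject, key, value))
    return triples

triples = to_triples(restaurant, restaurant["@id"])
print(json.dumps(restaurant, indent=2))
for triple in triples:
    print(triple)
```

Because the restaurant is identified by a URI rather than by its name, a second restaurant with the same name would still be a distinct node in the graph.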
Preparing data management for AI applications
Storing data as RDF places it in a network structure in which the data is related to each other. Since the data is uniformly identified in the network, interfaces become obsolete. If the Knowledge Graph is also open, the data can be used by everyone, and applications are no longer locked behind the paywalls of large players who only allow digital services to be extended with new functions against payment.
Going from data to knowledge with graph databases
If data is available individually in digital form, meaning can be attributed to it via an ontology (markup language). This turns data into information, since individual data about a hotel, restaurant, etc. can be displayed in aggregated form. Information becomes knowledge when pieces of information are related to each other. For example, if the geodata of a hotel is related to a hiking trail, travellers know where they can plan an overnight stay. Guests can make sense of the data through applications, because queries can evaluate the relationships between the data in context and present the result in an interface. Guests thus gain knowledge about different holiday situations and can assess them accordingly, which can lead to a change in behaviour (impact).
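The hotel and hiking trail example can be sketched as follows. All names, coordinates and the 2 km threshold are invented for illustration; the distance is computed with the haversine formula, which is one common choice and not necessarily what a production system would use.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical information: hotel positions and the waypoints of one trail.
hotels = {"Hotel A": (47.27, 11.40), "Hotel B": (47.80, 13.04)}
trail = [(47.26, 11.39), (47.28, 11.42), (47.30, 11.45)]

def hotels_near_trail(max_km=2.0):
    """Relate hotels to the trail: keep those within max_km of any waypoint."""
    return [
        name
        for name, (hlat, hlon) in hotels.items()
        if any(haversine_km(hlat, hlon, tlat, tlon) <= max_km for tlat, tlon in trail)
    ]

# Each result can be stored as a new relationship in the graph.
new_triples = [(name, "isNear", "Trail 1") for name in hotels_near_trail()]
print(new_triples)
```

The coordinates alone are information. The derived `isNear` relationship is what lets a traveller plan an overnight stay along the trail.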
For data management, this means that data stored in relational databases can be semantically tagged using an ontology such as schema.org and then stored (in parallel) in a graph database. Graph databases then represent the individual pieces of information in a network using RDF. Applications can then be used by guests to access these data networks (see figure).
Complementary database systems
Relational databases and a parallel Knowledge Graph, which describes the relationship between the data, are not mutually exclusive. Rather, these systems can be viewed as complementary. If such a Knowledge Graph can be established in tourism, as is currently being developed by the German National Tourist Board (GNTB), this will be an important step towards linking data at state and regional level.
At the level of destination management organizations (DMOs), this means first and foremost that there must be agreement on the markup language (ontology) and that the data must be open, complete and up-to-date. The GNTB also prefers schema.org because this ontology represents a de facto standard and is therefore compatible with other (non-tourist) data. RDF query languages such as SPARQL can be used to search and extract administrative data as well, for instance to display all public toilets in a location and correlate them with data on walking routes. The conceivable scenarios are manifold and can lead to special tours, such as one on which all public apple trees can be visited so that a snack is always at hand along the way.
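The public toilet scenario boils down to joining several triple patterns that share variables, which is the core idea behind a SPARQL query. The following sketch imitates that idea in plain Python; it is not SPARQL and not a real RDF library. The data, the predicate names and the `solve` helper are assumptions made for illustration.

```python
# Hypothetical open data as (subject, predicate, object) triples.
triples = [
    ("toilet1", "type", "PublicToilet"),
    ("toilet1", "locatedIn", "OldTown"),
    ("toilet2", "type", "PublicToilet"),
    ("toilet2", "locatedIn", "Harbour"),
    ("routeA", "type", "WalkingRoute"),
    ("routeA", "passesThrough", "OldTown"),
]

def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")

def solve(patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern (a simple join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(rest, candidate)

# "Which public toilets lie in a place that a walking route passes through?"
query = [
    ("?toilet", "type", "PublicToilet"),
    ("?toilet", "locatedIn", "?place"),
    ("?route", "type", "WalkingRoute"),
    ("?route", "passesThrough", "?place"),
]
results = [(b["?toilet"], b["?route"]) for b in solve(query)]
print(results)  # [('toilet1', 'routeA')]
```

The shared variable `?place` is what correlates the two datasets. The apple tree tour would only need a different type in the first pattern.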
In summary, graph databases are a modern form of data management that offers a wide range of possibilities for the development of digital services.
Eric Horster is a professor at Fachhochschule Westküste, where he teaches in the bachelor's and master's degree programs in International Tourism Management (ITM) with a focus on digitalization in tourism and hospitality management. He is a member of the university's Institute for Management and Tourism (IMT).
Elias Kärle is a researcher at the University of Innsbruck. His research deals with Knowledge Graphs, Linked Data and ontologies. As a speaker, he mostly talks about the application and dissemination of semantic technologies in tourism.