Structuring tourism data for the semantic web.

Data for humans…but not (yet) for machines.

About 30 years ago, Tim Berners-Lee invented the World Wide Web (Web for short). In order to sort and link content of all kinds (text, images, audio, video files, etc.), he developed a system that identifies them by URLs. These URLs are still used today to link from one document to any number of others. This creates a network of documents or websites – the World Wide Web.

Data can be stored and published in different forms. A distinction is made between unstructured, semi-structured and structured data.

Unstructured data exists in a form that humans can read, but its structure only becomes apparent through human experience and background knowledge. Examples include simple descriptive texts that gather all kinds of information on a particular topic.

Semi-structured data divides information into individual fields, but these fields do not follow any de facto standard. Figuratively speaking, they lack a commonly known “language”, so outsiders cannot immediately understand what the individual fields mean. A single field may also bundle information in continuous text that other markup languages would keep separate.

Structured data that machines are meant to understand in the Semantic Web follows an ontology. This means that the individual pieces of information are structured according to a de facto standard. A widely used ontology for describing content on the web is schema.org.

Content on the web is generally easy for people to read. Machines, however, still reach their limits when interpreting it, mainly because it is offered in an unstructured or semi-structured way (see the definitions above). For example, the description of a cycling tour can be “broken down” into its component parts, with aspects such as distance, duration, altitude and difficulty given in a list. Equally, all of this information could be provided in a single coherent text.
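
The difference between the three forms can be made concrete with a small sketch. It shows the same cycling tour as free text, as fields with private names, and as a schema.org-style record. The tour itself, the German field names, and the choice of the schema.org type `Place` with `additionalProperty` entries are assumptions made purely for illustration; they are not a recommended modelling for cycling tours.

```python
import json

# Unstructured: one coherent text; the structure exists only in the reader's head.
unstructured = (
    "A pleasant 42 km cycling tour along the coast that takes about "
    "3 hours and is of medium difficulty."
)

# Semi-structured: separate fields, but the field names are a private
# convention (hypothetical), and one field still bundles several facts as text.
semi_structured = {
    "titel": "Coastal tour",
    "laenge": "42",
    "info": "approx. 3 hours, medium difficulty",
}

# Structured: every fact is a separate value labelled with schema.org terms.
# The type and properties chosen here are illustrative assumptions.
structured = {
    "@context": "https://schema.org",
    "@type": "Place",
    "name": "Coastal tour",
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "distance", "value": 42, "unitText": "km"},
        {"@type": "PropertyValue", "name": "duration", "value": 3, "unitText": "h"},
        {"@type": "PropertyValue", "name": "difficulty", "value": "medium"},
    ],
}


def property_value(record, name):
    """Return the value of the named PropertyValue entry, or None if absent."""
    for prop in record.get("additionalProperty", []):
        if prop.get("name") == name:
            return prop.get("value")
    return None


print(json.dumps(structured, indent=2))
print(property_value(structured, "distance"))
```

Only the third form lets a program look up the distance without guessing what a field means or parsing free text.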

The structure of the data for the Semantic Web

Overcoming data silos

Data can be structured very heterogeneously, and in German tourism it usually is. Machines cannot easily make sense of these differences. If data is also to be prepared for machines, it needs uniform labelling: every cycle path would have to be described in the same way.

The labelling logic would then be immediately understandable, and data on different cycle paths could be combined from different data sources (data silos).
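
A minimal sketch of what uniform labelling makes possible, assuming two hypothetical regional sources that describe cycle paths with different field names. The source names, field names and mapping table are invented for illustration.

```python
# Hypothetical mappings from each source's private field names to shared ones.
FIELD_MAPS = {
    "region_a": {"titel": "name", "laenge_km": "distance_km"},
    "region_b": {"route_name": "name", "dist": "distance_km"},
}


def normalise(record, source):
    """Relabel one record using the shared field names; unknown fields are dropped."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Two data silos describing the same kind of thing in different ways.
region_a = [{"titel": "Coastal tour", "laenge_km": 42}]
region_b = [{"route_name": "Dyke loop", "dist": 18}]

# Once the labels are uniform, records from both silos can be combined
# and processed together.
combined = [normalise(r, "region_a") for r in region_a]
combined += [normalise(r, "region_b") for r in region_b]

total_km = sum(path["distance_km"] for path in combined)
print(combined)
print(total_km)
```

If every provider used the same labels from the start, the mapping table would not be needed at all.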

The idea of Linked Data

This idea of describing data structures in a uniform way and then connecting the data is called Linked Data. Tim Berners-Lee explained this further development of the web very clearly in a TED talk that is still regarded as groundbreaking today.

Linked Data can therefore be used to combine information from different contexts. Descriptions of cycling tours would no longer have to end at administrative boundaries, but could continue seamlessly for the guest thanks to a uniform data structure.

Linked Data is the key

Currently, data processing in tourism is still strongly oriented towards human readers. When information about a bike path is recorded, the goal is usually to publish it in a particular app or on a website for one’s guests. In principle, there is nothing wrong with that.

However, it will become increasingly important in the future to provide data in such a way that it can be used universally, outside of a specific use case. For this purpose, the data must be described uniformly with the help of an ontology. In tourism, this means schema.org together with an extended vocabulary, which a consortium (DACH-KG) is currently developing specifically for tourism.

Structuring data for the Semantic Web requires consensus on the markup language. An established specification for describing data (also called an ontology) is “schema.org”. Schema.org is an initiative of the major search engines Bing, Google, Yahoo! and Yandex. It provides a description system for publishing data in a particular structure. Applying it is also referred to as annotating or marking up the data.

Within schema.org, there are schemas that can be used to describe different types of data (e.g. a hotel, an event, a POI, etc.). If these schemas are used by all data providers, then data can be related to each other and understood regardless of the use case – because there is a common structure. Schema.org can therefore be understood as a language for data.
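
As a hedged illustration of how shared schemas let records relate to one another, the sketch below describes a hotel and an event with schema.org types and links them through an identifier. The names, dates and `example.org` identifiers are invented, and the helper `resolve_location` is hypothetical.

```python
import json

# A hotel described with the schema.org type "Hotel" (all values invented).
hotel = {
    "@context": "https://schema.org",
    "@type": "Hotel",
    "@id": "https://example.org/hotel/1",
    "name": "Example Hotel",
    "address": {"@type": "PostalAddress", "addressLocality": "Example Town"},
}

# An event from another provider that points to the hotel by its identifier.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "@id": "https://example.org/event/7",
    "name": "Example Concert",
    "startDate": "2030-06-01",
    "location": {"@id": "https://example.org/hotel/1"},
}


def resolve_location(event_record, records):
    """Look up the record that the event's location identifier refers to."""
    index = {record["@id"]: record for record in records}
    return index.get(event_record["location"]["@id"])


venue = resolve_location(event, [hotel, event])
print(json.dumps(venue, indent=2))
```

Because both providers use the same vocabulary and identifiers, the event can be connected to its venue without either record having been written with this use in mind.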

This is referred to as “interoperability”: data can be processed further by humans and machines regardless of the output channel and of the context. There are two main reasons why this matters:

  • As artificial intelligence develops, machines will increasingly work with data independently and surface new correlations. It is conceivable, for example, that cycling tours should indicate whether they are also suitable in winter. If a dataset at the administrative level provides information on winter clearance services, the two datasets could be correlated. Consequently, it is hardly possible to anticipate the contexts in which data will be used, so it should be provided in a context-independent manner.
  • Depending on the application context, the requirements for the output channel also change. It is foreseeable that data will no longer be displayed on just one output channel but wherever the user needs it at that moment: on a smartphone, on a touchscreen in the tourist information office, as spoken output from a voice assistant, etc.
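
The winter example from the first point can be sketched as a simple join between two independently maintained datasets. Both datasets, the municipality codes and the function name are hypothetical, and the sketch assumes that both sources identify municipalities in the same way.

```python
# Tourism dataset: cycling tours with the municipality they run through.
tours = [
    {"name": "Coastal tour", "municipality": "A"},
    {"name": "Forest loop", "municipality": "B"},
    {"name": "River trail", "municipality": "C"},
]

# Administrative dataset: whether cycle paths are cleared in winter.
# Municipality "C" is deliberately missing to show how gaps are handled.
winter_clearance = {"A": True, "B": False}


def winter_suitable(tour_records, clearance):
    """Return the names of tours whose municipality reports winter clearance.

    Tours in municipalities without clearance data are treated as not suitable.
    """
    return [
        tour["name"]
        for tour in tour_records
        if clearance.get(tour["municipality"], False)
    ]


print(winter_suitable(tours, winter_clearance))
```

Neither dataset was created for this question; the correlation only works because both use a shared way of naming the municipality.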

Changing data requirements due to diversified application areas

Data management as a task for the future

As a result of these changing requirements, the Web is increasingly evolving from a web of linked documents into a web of linked data records.

The transformation of the web

This adaptation of data management is highly relevant in light of the development of the Internet of Things: through sensor technology, a great deal of context data on weather, time and states (empty or full, light or dark, etc.) will be available in the future. Combined with structured data on tourist POIs, events, etc., this opens up a wide range of applications. The vision often points towards automated services that, depending on the holiday context (rain or shine, morning or evening, high or low season, etc.), make recommendations that suit both the situation and the guest in question.

By now at the latest, it is clear that modern data management can be a central future task of the destination management organisation (DMO). Specifically, the focus of data management should be on making data readable, interpretable and usable for machines (and humans).
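
A minimal sketch of such a context-dependent recommendation, assuming hypothetical POI records with an indoor flag and opening hours, and a context reading (rain, hour of day) of the kind sensors might supply. All names and values are invented.

```python
# Structured POI data (hypothetical): opening hours are given as full hours.
pois = [
    {"name": "Museum", "indoor": True, "opens": 10, "closes": 18},
    {"name": "Beach", "indoor": False, "opens": 0, "closes": 24},
    {"name": "Lighthouse", "indoor": False, "opens": 14, "closes": 17},
]


def recommend(poi_records, context):
    """Return the names of POIs that fit the current context.

    Outdoor POIs are skipped when it rains, and POIs that are closed
    at the given hour are skipped regardless of the weather.
    """
    results = []
    for poi in poi_records:
        if context["raining"] and not poi["indoor"]:
            continue
        if not (poi["opens"] <= context["hour"] < poi["closes"]):
            continue
        results.append(poi["name"])
    return results


# Context data as it might arrive from sensors (assumed values).
print(recommend(pois, {"raining": True, "hour": 11}))
print(recommend(pois, {"raining": False, "hour": 15}))
```

The rules here are deliberately simple; the point is that they can only be written at all once both the POI data and the context data are machine-readable.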

Eric Horster, West Coast University of Applied Sciences

Eric Horster is a professor at the West Coast University of Applied Sciences (Fachhochschule Westküste), where he teaches in the bachelor’s and master’s programmes in International Tourism Management (ITM), specialising in digitalisation in tourism and hospitality management. He is a member of the university’s Institute for Management and Tourism (IMT).

More about him at: http://eric-horster.de/

Elias Kärle, University of Innsbruck

Elias Kärle is a researcher at the University of Innsbruck. His research focuses on knowledge graphs, Linked Data and ontologies. As a speaker, he mostly talks about the application and dissemination of semantic technologies in tourism.

More about him at: https://elias.kaerle.com/