ThWIC Sonar

AI-based navigation support for the document and data lake on the topic of water
A beautiful lake
Image: Susanne M Hoffmann (CC-BY 4.0)
This page has been machine translated.
Information

This project is part of the Thuringian Water Innovation Cluster (ThWIC).

Brief description of the project

Publicly accessible documents and publications are valuable sources of information on findings, announcements and developments in the fields of groundwater, surface water, drinking water and wastewater. However, given the abundance and heterogeneity of these sources, there is a risk that important information is overlooked. In addition, which sources are relevant, and how their content should be processed and presented, depends strongly on the target group. The Thuringian Water Innovation Cluster therefore needs a semi-automatic monitoring and recommendation system that regularly monitors, collects and processes these sources. The aim of the "ThWIC Sonar" project is to develop a complete system for water-related documents with a modular architecture that collects environmental information, processes it taxonomically and by relevance, and proactively recommends it to different user groups.
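The modular architecture described above can be pictured as a chain of interchangeable stages: collection, taxonomic classification, relevance scoring and group-specific recommendation. The following Python sketch is purely illustrative and assumes that each stage is a plain function; `Document`, `run_pipeline`, the keyword rules and the length-based score are hypothetical placeholders, not components of ThWIC Sonar.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Document:
    """A collected document (hypothetical structure, not the project's data model)."""
    doc_id: str
    text: str
    topics: set[str] = field(default_factory=set)
    relevance: float = 0.0


# Each stage is an interchangeable function, which is what makes the pipeline modular.
Collector = Callable[[], Iterable[Document]]
Classifier = Callable[[Document], set[str]]
Scorer = Callable[[Document], float]


def run_pipeline(collect: Collector, classify: Classifier, score: Scorer,
                 user_groups: dict[str, set[str]], top_k: int = 3) -> dict[str, list[str]]:
    """Collect, classify and score documents, then recommend the best ones per user group."""
    docs = []
    for doc in collect():
        doc.topics = classify(doc)   # taxonomic processing
        doc.relevance = score(doc)   # relevance processing
        docs.append(doc)
    recommendations = {}
    for group, interests in user_groups.items():
        matching = [d for d in docs if d.topics & interests]
        matching.sort(key=lambda d: d.relevance, reverse=True)
        recommendations[group] = [d.doc_id for d in matching[:top_k]]
    return recommendations


# Toy stand-ins for the real modules (hypothetical keyword rules and scoring).
KEYWORDS = {"groundwater": "groundwater", "sewage": "wastewater", "tap": "drinking water"}


def keyword_collector() -> list[Document]:
    return [Document("d1", "Groundwater levels fall in summer"),
            Document("d2", "New sewage treatment plant opens"),
            Document("d3", "Tap water quality report")]


def keyword_classifier(doc: Document) -> set[str]:
    text = doc.text.lower()
    return {topic for keyword, topic in KEYWORDS.items() if keyword in text}


def length_scorer(doc: Document) -> float:
    return float(len(doc.text))  # placeholder: longer texts count as more relevant


groups = {"utilities": {"drinking water", "wastewater"}, "hydrogeology": {"groundwater"}}
print(run_pipeline(keyword_collector, keyword_classifier, length_scorer, groups))
# {'utilities': ['d2', 'd3'], 'hydrogeology': ['d1']}
```

Because every stage is passed in as an argument, a real language model could replace `keyword_classifier` without touching collection or recommendation.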

ThWIC Sonar comprises two sub-projects. In the sub-project "Language model and ontology integration", an AI system (a so-called language model) is being developed that automatically classifies and indexes new text documents on the topic of water. The resulting index will contain links to existing machine-readable formal vocabularies (ontologies) and knowledge graphs, which enable further information to be extracted automatically. Development places particular emphasis on a well-documented and transparent process in line with the FAIR principles, which require data and processes to be findable, accessible, interoperable and reusable for third parties. In addition, a resource-saving process is being developed that requires as little training data as possible and thus minimises the energy consumed in creating the model.
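To illustrate classifying with little training data and attaching ontology links to the result, the following sketch builds a nearest-centroid classifier from a handful of example texts per topic. It is a minimal sketch under stated assumptions, not the project's model: a bag-of-words vector stands in for a language-model embedding, and the `ONTOLOGY` mapping uses placeholder `example.org` URIs rather than a real water vocabulary.

```python
import math
from collections import Counter

# Placeholder concept URIs (hypothetical, not a real water ontology).
ONTOLOGY = {
    "groundwater": "https://example.org/water/Groundwater",
    "wastewater": "https://example.org/water/Wastewater",
}


def embed(text: str) -> Counter:
    """Stand-in for a language-model embedding: lower-cased bag of words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors: list[Counter]) -> Counter:
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return Counter({term: count / len(vectors) for term, count in total.items()})


def build_classifier(examples: dict[str, list[str]]):
    """Few-shot training: one centroid per topic from a few labelled examples."""
    centroids = {label: centroid([embed(t) for t in texts]) for label, texts in examples.items()}

    def classify(text: str) -> dict:
        vector = embed(text)
        scores = {label: cosine(vector, c) for label, c in centroids.items()}
        best = max(scores, key=scores.get)
        # Index entry: predicted topic plus its link into the (placeholder) ontology.
        return {"topic": best, "concept": ONTOLOGY[best], "score": scores[best]}

    return classify


classify = build_classifier({
    "groundwater": ["aquifer recharge and groundwater levels", "groundwater wells monitoring"],
    "wastewater": ["sewage treatment plant effluent", "wastewater treatment sludge"],
})
print(classify("falling groundwater levels in the aquifer")["concept"])
# https://example.org/water/Groundwater
```

Since only one centroid per topic is stored, adding a topic needs a few example texts rather than a full training run; a real system would replace `embed` with an actual language-model encoder.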

In the sub-project "Validation of a framework for the integration of environmental information into an information hub and relevance-based information distribution", AI-based technologies are used to develop intelligent, highly personalised information management and make it available to cluster stakeholders. To this end, water-related data and information from various sources are monitored, merged into an information hub (the "Sonar-Hub") and made accessible. Contextual factors derived from user behaviour are systematically linked to the output of the language models or ontologies and taken into account in a recommendation algorithm. In addition to this semantic link, the water topics are correlated with one another dynamically. The self-learning algorithm adapts to the information needs of different target groups and to their behaviour on the hub, and thus delivers information dynamically, according to its relevance for each user.
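The combination of semantic relevance, user behaviour and dynamically correlated topics can be sketched as a small hybrid recommender. This is an illustrative assumption, not the project's algorithm: the blend weight `alpha`, the moving-average update in `record_click` and the Jaccard co-occurrence measure in `correlation` are hypothetical choices, and `semantic_score` stands for whatever relevance value the language model or ontology matching supplies.

```python
from collections import defaultdict
from itertools import combinations


class SonarRecommender:
    """Hypothetical hybrid recommender: semantic relevance plus learned user affinity."""

    def __init__(self, alpha: float = 0.5, learning_rate: float = 0.3):
        self.alpha = alpha                    # weight of the semantic (language-model) score
        self.learning_rate = learning_rate    # how fast affinities follow new behaviour
        self.affinity = defaultdict(dict)     # user -> {topic: weight in [0, 1]}
        self.topic_counts = defaultdict(int)  # topic -> number of documents containing it
        self.cooccurrence = defaultdict(int)  # sorted (topic_a, topic_b) -> shared documents

    def observe_document(self, topics: set[str]) -> None:
        """Dynamic topic correlation: count topics that appear in the same document."""
        for topic in topics:
            self.topic_counts[topic] += 1
        for a, b in combinations(sorted(topics), 2):
            self.cooccurrence[(a, b)] += 1

    def correlation(self, a: str, b: str) -> float:
        """Jaccard overlap of the document sets in which topics a and b occur."""
        both = self.cooccurrence.get(tuple(sorted((a, b))), 0)
        either = self.topic_counts.get(a, 0) + self.topic_counts.get(b, 0) - both
        return both / either if either else 0.0

    def record_click(self, user: str, topics: set[str]) -> None:
        """Self-learning step: move the user's affinities toward the topics they opened."""
        prefs = self.affinity[user]
        for topic in set(prefs) | topics:
            target = 1.0 if topic in topics else 0.0
            current = prefs.get(topic, 0.0)
            prefs[topic] = current + self.learning_rate * (target - current)

    def score(self, user: str, semantic_score: float, topics: set[str]) -> float:
        """Blend the semantic score with direct and correlation-based user interest."""
        prefs = self.affinity.get(user, {})
        direct = max((prefs.get(t, 0.0) for t in topics), default=0.0)
        related = max(
            (w * self.correlation(t, u) for t in topics for u, w in prefs.items() if u != t),
            default=0.0,
        )
        return self.alpha * semantic_score + (1 - self.alpha) * max(direct, related)


recommender = SonarRecommender(alpha=0.5, learning_rate=0.5)
for doc_topics in [{"groundwater", "drought"}, {"groundwater", "drought"}, {"wastewater"}]:
    recommender.observe_document(doc_topics)
recommender.record_click("alice", {"groundwater"})
print(recommender.score("alice", 0.4, {"drought"}))     # 0.45
print(recommender.score("alice", 0.4, {"wastewater"}))  # 0.2
```

In this example a single click on a groundwater document raises the score of a drought document, because the two topics co-occur, while a wastewater document with the same semantic score is unaffected.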