How can documentation be used to better understand and semantically align data? In an increasingly data-driven world, the ability to connect structured data with semantic concepts is essential for science, industry, and public administration. Existing semantic mapping methods often rely on structural analysis or machine learning, but struggle to interpret domain-specific terms accurately.

This dissertation presents a new approach that systematically leverages an often-overlooked resource: data documentation. By combining Natural Language Processing (NLP) and Large Language Models (LLMs), it extracts precise mappings from natural-language descriptions, automatically linking ontologies with data structures. Core contributions include the VC-SLAM benchmark corpus; the DocSemMap pipeline, which achieves 60% accuracy in direct concept mapping; and the integration of LLMs to enhance documentation and perform mapping tasks.

The research demonstrates that documentation-based approaches can play a decisive role in improving data integration and interpretation, opening new possibilities for systems that rely on deep semantic contextualization.