FOR SHARE - JULY SNAPSHOT - Data Models and Formats - Functional description

Created by

Geraud Guilloud

Last updated: 3 July 2023, 09:56

Early draft

Please be aware that the Data Spaces Blueprint content shared in these pages is a very early draft published on 2023-07-01. The current draft is incomplete and the content might still change.

SAVE-THE-DATE 01-10/09/2023: We welcome your feedback to further improve the Data Spaces Blueprint during the public consultation, which will be open from 1 September 2023 until 10 September 2023. Please mark these dates in your calendar and get ready!

Title: [Title of the document]

Publisher: [Name of the publisher]

Copyright: [Copyright information]

Consortium: [Name of the consortium]

Contact: [Contact information]

Author(s): [Name(s) of the author(s)]

Expert Group: [Name of the expert group]

Reviewers: [Name(s) of the reviewers]

Created On: [Date the document was created]

Last Updated: [Date the document was last updated]

1. Introduction of the building block

This specification outlines the technical requirements and functionalities of the data models building block that encompasses vocabularies and vocabulary hubs for data spaces. The building block focuses on facilitating standardized data representation, integration, and collaboration between organizations.

Semantic interoperability is of great importance for a data space. Participants in a data space must be able to understand each other for the shared data to have value. This calls for a common language at the semantic level, so that digital resources can be understood and exchanged automatically. To semantically annotate the data being shared, a data space initiative requires domain-specific vocabularies that express the semantics, e.g., an ontology with a shared conceptualisation of a particular domain of knowledge.

This building block allows:

the interchange of data by providing the information about the shared data structure.
the systematic and automated publication of data assets by providing accurate descriptions of their structure.
systematic and automated search by using standardized data structure descriptions.
data integration and mapping of data elements from different sources by using a common vocabulary.
future-proofing and extending unified information models to accommodate new data elements, relationships, and attributes as the domain or industry evolves.
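
As a minimal sketch of the data integration point above, the following Python snippet renames the fields of records from two sources to the terms of a common vocabulary. The source names, field names, and vocabulary terms are all hypothetical and chosen only for illustration; a real data space would take them from its published vocabularies and mappings.

```python
# Hypothetical common vocabulary terms, for illustration only.
COMMON_TERMS = {"meterId", "energyKWh", "timestamp"}

# One mapping per (hypothetical) data source: source field name -> common term.
MAPPINGS = {
    "supplier_a": {"meter": "meterId", "kwh": "energyKWh", "ts": "timestamp"},
    "supplier_b": {"device_id": "meterId", "consumption": "energyKWh", "time": "timestamp"},
}


def to_common(source: str, record: dict) -> dict:
    """Rename the fields of one source record to the common vocabulary terms."""
    mapping = MAPPINGS[source]
    unmapped = set(record) - set(mapping)
    if unmapped:
        # Fail loudly rather than silently dropping fields without a mapping.
        raise ValueError(f"no mapping for fields: {sorted(unmapped)}")
    return {mapping[field]: value for field, value in record.items()}


a = to_common("supplier_a", {"meter": "M-1", "kwh": 12.5, "ts": "2023-07-01T00:00:00Z"})
b = to_common("supplier_b", {"device_id": "M-2", "consumption": 7.0, "time": "2023-07-01T00:00:00Z"})

# Both records now use the same keys and can be integrated or searched uniformly.
assert set(a) == set(b) == COMMON_TERMS
```

Keeping the mappings as data rather than code means that a new source can be added by publishing one more mapping, without changing the integration logic.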

To enable the features above, this building block distinguishes three key concepts and provides the functionalities they require:

Vocabularies (artifacts): common language to facilitate semantic interoperability in a data space, incl. ontologies, data models, schema specifications, mappings and API specifications that can be used to annotate and describe data sets and data services.
Vocabulary provider (role): an entity that is responsible for providing (creating, publishing, maintaining) the vocabularies. In the context of a data space this role is often fulfilled collectively by business communities and delegated to some sort of standards development organisation (SDO).
Vocabulary hub (component): component providing facilities for publishing, editing, browsing and maintaining vocabularies and related documentation.

Figure X: Key components of the data models and formats building block

Interrelationships: The following diagram depicts an overview of how the building block relates to other building blocks in the DSSC framework. This includes any dependencies, shared functionalities, and potential conflicts with other building blocks:

Figure X: Interrelationships between building blocks of the DSSC framework

2. Purpose of the Building Block

In today’s data-driven landscape, organizations across various industries are dealing with vast amounts of data generated from diverse sources, including sensors, applications, and databases. To harness the full potential of this data, organizations need robust and efficient solutions to achieve seamless interoperability across data models, schemas, and terminology. The absence of standardized approaches to represent, integrate, and interpret data hinders effective collaboration, data exchange, and decision making.

Purpose and problem: Organizations often face challenges when attempting to share and integrate data due to differences in data models, schemas, and terminologies. This lack of standardization hampers interoperability and collaboration, leading to inefficiencies, data inconsistencies, and missed opportunities. Manual efforts to align and map data between organizations are time-consuming, error-prone, and hinder scalability.

To address these challenges, the proposed building block presents a specification of a solution that encourages the adoption of shared vocabularies and provides a vocabulary hub for streamlined vocabulary management and discovery. By establishing a common language and a centralized repository of standardized terms, organizations can overcome semantic barriers and achieve better data interoperability, enabling seamless data integration, exchange, and collaboration.

Use cases: Describe the potential use cases of the building block. This involves identifying and explaining the various ways in which the building block can be used to solve specific problems or achieve certain goals.

The building block addresses the needs of organizations engaged in data sharing and collaboration across different domains. It caters to various use cases where standardized data representation and semantic interoperability are critical, including but not limited to:

Industry Consortia: Consortia or industry groups that collaborate on developing industry-specific standards and data models.
Data Integration: Facilitating the integration of disparate data sources and formats into a unified data space, enabling data harmonization and consolidation.
Research Collaborations: Cross-organizational research projects involving data integration and analysis.
Regulatory Compliance: Ensuring compliance with data standards and regulations by harmonizing vocabularies across multiple organizations.

Suggested solution: Provide a suggested solution or approach to address the problem. It is important to be clear and concise, and to provide enough details to allow key stakeholders to understand how the approach of the solution will address the problem.

To address the challenges of data sharing and integration caused by varying data models and terminologies, the proposed solution consists of a data models building block that incorporates vocabularies and vocabulary hubs for data spaces. This solution promotes standardized data representation, semantic interoperability, and collaboration between organizations.

The key components of the suggested solution are as follows:

Vocabulary Management: The building block provides robust functionality for creating, managing, and updating vocabularies within data spaces. Organizations can define terms, concepts, relationships, and metadata, ensuring a clear and consistent understanding of data semantics.
Vocabulary Hub: The solution includes a centralized vocabulary hub, serving as a repository for shared vocabularies. Organizations can register their vocabularies within the hub, enabling easy discovery, reuse, and integration. The vocabulary hub provides search capabilities based on domain, keywords, or metadata, facilitating efficient retrieval of relevant vocabularies.
Versioning and Traceability: The building block supports version control for vocabularies, ensuring backward compatibility and traceability. Organizations can track changes, updates, and evolution of vocabularies over time, enabling proper management and governance of vocabulary lifecycles.
Interoperability and Integration: The solution follows established semantic web standards, such as RDF, OWL, and SKOS, to ensure interoperability with existing systems and frameworks. It integrates seamlessly with other data sharing components, such as data repositories, data exchange protocols, and data governance frameworks, through well-defined interfaces or APIs.
Scalability and Performance: The building block is designed to handle large volumes of vocabularies and vocabulary hub registrations efficiently. Caching and indexing mechanisms are implemented to optimize search and retrieval operations, ensuring high performance even with growing vocabularies.
Security and Access Control: The solution incorporates robust security measures to safeguard vocabularies and vocabulary hubs against unauthorized access, tampering, or data breaches. Access control mechanisms allow organizations to define fine-grained permissions for managing and updating vocabularies, ensuring data privacy and integrity.
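
The versioning and traceability component can be illustrated with a small Python sketch. It assumes, purely for illustration, that vocabulary versions follow a "MAJOR.MINOR.PATCH" numbering and that a new version counts as backward compatible when it keeps every term of the previous version. Neither assumption is prescribed by this specification, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyVersion:
    version: str      # assumed format: "MAJOR.MINOR.PATCH"
    terms: frozenset  # names of the terms defined in this version


def parse(version: str) -> tuple:
    """Turn '1.2.0' into (1, 2, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def removed_terms(old: VocabularyVersion, new: VocabularyVersion) -> frozenset:
    """Terms that existed in the old version but are missing from the new one."""
    return old.terms - new.terms


def is_backward_compatible(old: VocabularyVersion, new: VocabularyVersion) -> bool:
    """Assumed rule: the new version is not older and drops no existing term."""
    return parse(new.version) >= parse(old.version) and not removed_terms(old, new)


v1 = VocabularyVersion("1.0.0", frozenset({"Order", "Customer"}))
v1_1 = VocabularyVersion("1.1.0", frozenset({"Order", "Customer", "Invoice"}))
v2 = VocabularyVersion("2.0.0", frozenset({"Order", "Invoice"}))

print(is_backward_compatible(v1, v1_1))  # True: only a term was added
print(is_backward_compatible(v1, v2))    # False: 'Customer' was removed
```

The set of removed terms doubles as a simple change record, which is one possible starting point for the traceability described above.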

Key stakeholders: Identify all stakeholders involved in the development, implementation, and usage of the building block to ensure their needs and expectations are taken into account.

The key stakeholders involved in the development and utilization of this building block include:

Data Architects and Modelers: Responsible for designing and implementing data models and schemas within their organizations.
Data Engineers and Developers: Involved in the technical implementation of data sharing solutions, including integration with the vocabulary building block.
Data Stewards and Governance Teams: Responsible for ensuring data quality, compliance, and managing the overall data governance framework.
Domain Experts: Individuals with deep domain knowledge who contribute to the development and refinement of vocabularies within their respective fields.
Data Consumers: Individuals or systems that utilize shared data and rely on standardized vocabularies for accurate interpretation and analysis.

The collaboration of these stakeholders is essential to ensure the successful adoption and realization of the building block's objectives.

When to use the building block? When not?

	Scenario 1	Scenario 2	Scenario 3
Some attributes			Not recommended
More attributes	Mandatory	Use of this building block is recommended
Even more attributes			Not recommended

3. Definitions and Vocabulary

This section defines and explains all terms and vocabulary used throughout the technical specification. These terms are a subset of the DSSC Glossary and are thus fully aligned with its content.

For terms that are included in the DSSC Glossary, please refer to that glossary for their definitions. If new terms are introduced, each should be provided with a criterion as its description. The idea is that different people (in the context of DSSC and data space initiatives) will make the same judgments when they use criteria in some (relevant) context, i.e. they would agree on whether or not something is an instance (example) of the term. By following this process, it is also possible to propose generic terms to the DSSC Glossary that were not initially included. The DSSC Glossary team will analyse and approve or reject the proposed terms.

Term	Description
Lorem ipsum	Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean lacus leo, mattis vel mollis in, congue in urna. Mauris congue at neque non lacinia.
…	…

4. Conceptual Model

This section should provide a high-level conceptual model of the data space building block, including all the concepts and relations between them. A conceptual model can be used to clarify and communicate these technical specifications. It establishes a shared understanding among key stakeholders. The conceptual model can be seen as a highlighted subset of the DSSC conceptual model, optionally enriched with BB-specific concepts. It may function as input for the conceptual model expert group.

A conceptual model represents (a part of) a single mental model, and typically consists of specifications for:

concepts, i.e. (named) classes of entities that have similar properties, e.g. 'data space', 'party', 'person', etc.
properties, i.e. (named) characteristics for which every instance of a specific concept can be given a value. For example, the concept 'person' can have a property named 'gender', with values such as 'M' (for male) or 'F' (for female).
relations between such concepts, i.e. meaningful (characteristic) links that can exist between (instances of) these concepts, e.g. '[party] is the governance authority of [data space]' or '[person] is the mother of [person]'.
constraints, i.e. logical expressions composed of concepts, their properties and/or relations, that are expected to be (made) true. For example:

- every person has precisely one other person as its mother, and that person has gender 'F' (this is always true);

- every data space must have precisely one party that is its governance authority (if this is not true, it must be made true).
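
To make the notion of a constraint more concrete, the sketch below checks the second example constraint (every data space has precisely one governance authority) against a small set of relation instances. The identifiers for parties and data spaces are invented for illustration and are not part of the DSSC conceptual model.

```python
from collections import Counter

# Instances of the concept 'data space' (hypothetical identifiers).
data_spaces = {"ds-energy", "ds-mobility", "ds-health"}

# Instances of the relation '[party] is the governance authority of [data space]'.
governance_authority_of = [
    ("party-a", "ds-energy"),
    ("party-b", "ds-mobility"),
    ("party-c", "ds-mobility"),
]


def violations(spaces: set, relation: list) -> dict:
    """Return each data space that does not have exactly one governance authority,
    together with the number of authorities found for it."""
    counts = Counter(space for _party, space in relation)
    return {space: counts[space] for space in sorted(spaces) if counts[space] != 1}


print(violations(data_spaces, governance_authority_of))
# {'ds-health': 0, 'ds-mobility': 2}
```

A constraint of the "must be made true" kind yields a list of violations to resolve, whereas a constraint that is "always true" would instead signal a modelling or data error when violated.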

Figure: The concepts and relations within a conceptual model

To define a concept, its properties, or a relation between concepts, a criterion should be provided as a description in the glossary table.

**[Illustrative image or figure describing the Conceptual Model]**

REMARK: The source file of the CM can be found under attachments!

Term	Criterion
Vocabulary Provider	…
Data	…
written in	…

5. Functionality

This section provides a detailed description of the functions of the key components that this data space building block requires, including its intended features and capabilities. It clearly describes what the building block is expected to do and its anticipated input/output interactions with other building blocks.

To ensure that different components can work together seamlessly, it is important to state that the interactions with other building blocks are based on assumptions. At a later stage, these assumptions will be tested with the respective expert groups.

The key components of the suggested solution and their functionalities:

Vocabularies

Vocabulary functionalities pertain to a single vocabulary that serves either a domain-specific or a cross-domain purpose. Here are some key functionalities to consider for vocabularies in data spaces:

Domain Understanding: The vocabulary should provide a deep understanding of a particular domain and its concepts, relationships, and terminologies.
Scope and Purpose Definition: It should clarify the boundaries, scope, and purpose of the vocabulary, which involves the determination of specific aspects, entities, and relationships that need to be captured within a vocabulary. For example:
- A vocabulary can be designed for accurate and consistent data exchange in a particular domain (e.g., Energy Domain)
- A vocabulary can be designed for mapping data elements from different sources to a common vocabulary, or for catalogue purposes, as with DCAT-AP
Consistency and Coherence: A vocabulary should be internally consistent and coherent. The aspects, relationships, and properties should align with each other and adhere to a logical structure.
Reusability: The vocabulary should be designed to be reusable and extensible. This enables future expansion or integration with other domains, individuals or systems. It also involves the use of clear and unambiguous language in the vocabulary: terms should be defined with precise semantics so that users can interpret and use them accurately.
Versioning and Traceability: The vocabulary should provide relevant version control information, supporting backward compatibility and traceability.
Formalism: The vocabulary should adhere to open metamodel standards, i.e. a formal language for expressing the vocabulary. For example:
- RDF (Resource Description Framework)
- OWL (Web Ontology Language)
- SKOS (Simple Knowledge Organization System)
- JSON Schema for JSON-oriented data models
- XML Schema and Schematron for XML-oriented data models
- CSVW for CSV-oriented tabular data
- XSLT, R2RML, RML, YARRRML, CSVW for data transformation specifications

The choice depends on the requirements of the domain and its intended applications. Using open semantic standards enables seamless integration with other data sharing components, such as data repositories, data exchange protocols, and data governance frameworks, through well-defined interfaces or APIs.

Documentation: The vocabulary should be documented thoroughly, including the motivation, design decisions, version control information, and guidelines for its usage. This documentation helps users understand and reuse the vocabulary effectively.
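
As an illustration of the formalism, versioning and documentation points above, the sketch below builds a tiny SKOS concept scheme and serialises it as JSON-LD using only the Python standard library. The SKOS, Dublin Core Terms and OWL namespaces are the published ones; the `ex:` namespace, the concepts and their definitions are hypothetical, and the choice of `owl:versionInfo` and `dct:title` to carry version and title is one option among several.

```python
import json

vocabulary = {
    "@context": {
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "dct": "http://purl.org/dc/terms/",
        "owl": "http://www.w3.org/2002/07/owl#",
        "ex": "https://example.org/energy-vocab/",  # hypothetical namespace
    },
    "@graph": [
        {
            "@id": "ex:scheme",
            "@type": "skos:ConceptScheme",
            "dct:title": "Example energy vocabulary",
            "owl:versionInfo": "1.0.0",
        },
        {
            "@id": "ex:Meter",
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": "Meter", "@language": "en"},
            "skos:definition": "A device that measures energy consumption.",
            "skos:inScheme": {"@id": "ex:scheme"},
        },
        {
            "@id": "ex:SmartMeter",
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": "Smart meter", "@language": "en"},
            "skos:broader": {"@id": "ex:Meter"},  # a smart meter is a kind of meter
            "skos:inScheme": {"@id": "ex:scheme"},
        },
    ],
}

# Serialise to a JSON-LD document that could be published by a vocabulary provider.
document = json.dumps(vocabulary, indent=2)

# Any JSON-aware consumer can read it back and list the defined concepts.
concept_ids = [
    node["@id"]
    for node in json.loads(document)["@graph"]
    if node["@type"] == "skos:Concept"
]
print(concept_ids)  # ['ex:Meter', 'ex:SmartMeter']
```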

Vocabulary Provider

Providing vocabularies: The main function of a vocabulary provider is to provide a vocabulary that aligns with the interests of specific stakeholders. For the functional requirements about vocabularies, please refer to the above section.
Domain Understanding: As a provider, it is of great importance to gain a deep understanding of the target domain and its concepts, relationships, and terminologies. Engage with domain experts to ensure accuracy and completeness.
Alignment with Standards: Align, as much as possible, the vocabulary with relevant standards and best practices in the field. This ensures compatibility, interoperability, and future-proofing of the vocabulary. This entails initially examining existing vocabularies to determine if they align with the intended purpose of the vocabulary and the relevant stakeholders' interests.
- For instance, in the flexible staffing industry in the Netherlands, the vocabulary is derived from international standards such as HR Open, as it already encompasses a significant part of their needs. This approach allows for the reuse of standards, facilitating smooth integration with other domains, individuals, or systems.
Open for Iterative Development: The development of vocabularies is ongoing, allowing for feedback, refinement, and validation from domain experts and end-users. Regularly update and improve the vocabulary based on real-world usage and emerging needs.

Vocabulary Hub

Publish Vocabularies Expressed in Any Open Format from Any Provider:
- Acceptance of Multiple Formats: The Vocabulary Hub should support a wide range of open standards for vocabularies, such as RDF, OWL, SKOS, JSON-LD, or any other relevant standards. For example:
  - Domain-specific vocabularies
    - SCSN
  - Cross-domain vocabularies
    - Using open semantic standards.
  - Meta-specifications
    - DCAT-AP
- Provider Independence: Vocabulary providers should be able to publish their vocabularies regardless of the tools or platforms they used to create them. The hub should accept vocabularies from different providers without imposing specific restrictions.
Browse Vocabularies:
- Search and Discovery: The Vocabulary Hub should provide a search functionality to browse and discover vocabularies based on keywords, categories, or other metadata. Users should be able to explore the available vocabularies and access their details.
- Metadata Display: The hub should present relevant metadata about each vocabulary, such as its title, description, creator, version, date, and any additional information provided by the vocabulary provider. This helps users evaluate and understand the vocabularies.
Maintain Vocabularies:
- Vocabulary Management: The Vocabulary Hub should allow vocabulary providers to maintain and update their published vocabularies. Providers should be able to make revisions, upload new versions, or retire outdated versions of their vocabularies.
- Versioning and Change Tracking: The hub should support version control for vocabularies, enabling providers to manage different versions and track changes made to each version over time.
Configure Vocabularies for Semantic Interoperability within a Data Space:
- Vocabulary Integration: The Vocabulary Hub should provide functionality to configure and integrate vocabularies within a data space. This involves mapping and aligning the concepts, terms, or entities from different vocabularies to ensure semantic interoperability. An example is the alignment between domain-specific vocabularies and vocabularies for publishing purposes, such as DCAT-AP.
- Vocabulary Linking: The hub should allow users to link or reference vocabularies to establish connections between related or complementary concepts. This promotes the reuse and combination of vocabularies for richer knowledge representation and data integration.

The Vocabulary Hub acts as a central hub for publishing, browsing, maintaining, and configuring vocabularies expressed in various open formats.
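
The publish, browse and maintain functions listed above can be sketched as a small in-memory hub. This is not a reference implementation: the class names, method names and metadata fields are hypothetical, storage is a plain dictionary, and configuration for semantic interoperability (integration and linking) is left out.

```python
from dataclasses import dataclass


@dataclass
class VocabularyRecord:
    title: str
    description: str
    creator: str
    version: str
    fmt: str            # e.g. "SKOS", "OWL", "JSON Schema"
    content: str        # the serialised vocabulary, stored as provided
    retired: bool = False


class VocabularyHub:
    def __init__(self):
        self._records = {}  # title -> list of records, oldest version first

    def publish(self, record: VocabularyRecord) -> None:
        """Accept a vocabulary in any format; reject duplicate version numbers."""
        versions = self._records.setdefault(record.title, [])
        if any(r.version == record.version for r in versions):
            raise ValueError(f"{record.title} {record.version} already published")
        versions.append(record)

    def search(self, keyword: str) -> list:
        """Browse: titles whose title or description contains the keyword."""
        kw = keyword.lower()
        return sorted(
            title
            for title, versions in self._records.items()
            if any(kw in f"{r.title} {r.description}".lower() for r in versions)
        )

    def history(self, title: str) -> list:
        """Change tracking: all published version numbers, oldest first."""
        return [r.version for r in self._records[title]]

    def retire(self, title: str, version: str) -> None:
        """Maintain: mark one version as retired without deleting it."""
        for r in self._records[title]:
            if r.version == version:
                r.retired = True

    def latest(self, title: str):
        """Most recently published version that has not been retired, if any."""
        active = [r for r in self._records[title] if not r.retired]
        return active[-1] if active else None


hub = VocabularyHub()
hub.publish(VocabularyRecord("Energy vocabulary", "Terms for smart meter data", "Example SDO", "1.0.0", "SKOS", "..."))
hub.publish(VocabularyRecord("Energy vocabulary", "Terms for smart meter data", "Example SDO", "1.1.0", "SKOS", "..."))
hub.publish(VocabularyRecord("Staffing vocabulary", "Terms for flexible staffing", "Example SDO", "1.0.0", "JSON Schema", "..."))

print(hub.search("meter"))               # ['Energy vocabulary']
print(hub.history("Energy vocabulary"))  # ['1.0.0', '1.1.0']
hub.retire("Energy vocabulary", "1.1.0")
print(hub.latest("Energy vocabulary").version)  # 1.0.0
```

Retired versions stay in the history, so earlier references to them remain traceable.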

In this section, a table is used to define the details of each function within a data space building block. Each row of the table represents a single function, and the columns represent various aspects of the function, such as its description, interactions with other building blocks, constraints or limitations, and dependencies.

UC-1	<Use case name>
Primary Actor(s)	<Primary actors that participate in this use case>
Trigger	<Condition/action that initiates/starts the use case>
Pre-conditions	<Condition assumed to be true before the first step>
Post-conditions	<Condition after the use case is successfully executed>
Main Success Scenario	<visit STARTING-POINT Step Step Make sure GOAL-ACHIEVED>
Extensions	If Condition, then Alternative Steps <List any extended steps/ scenarios that occur, other than the main success scenario.>
Dependencies	<relations to other building blocks>
Special Requirements	<Any system related special requirements needed to fulfill the use case>
Open Questions	<Notes and questions>

6. Open Challenges – Future Roadmap

This section highlights any open challenges or future directions for the data space building block, including potential improvements. Potential challenges for a building block may arise in its interactions with other building blocks, making compliance a crucial consideration. This section should provide an overview of potential challenges and opportunities for future development of the building block.

7. Conclusions

Short statements about the current state-of-the-art, how this matches with the defined requirements and functionality (gap analysis), and some final remarks.

COPIED FROM OTHER DOCUMENTS (TO BE REMOVED IF NOT USED)

Candidate reference implementations and examples that could be mentioned are:

Vocabularies

Smart Data Models: Smart Data Models is a collaborative initiative led by FIWARE together with IUDX, TMForum and OASC to gather a set of open-licensed, standardized, and extensible data models that enable the creation of interoperable applications and services across multiple sectors, including Smart Cities, Smart Energy, Smart Logistics, Smart Health, Smart Robotics, Smart Environment, Smart Water, Smart Agrifood, Smart Destination, Smart Aeronautics, Smart Manufacturing and other industrial domains.

Vocabulary providers

Vocabulary hubs

IDSA Vocabulary Hub: IDS Vocabulary Hubs give developers of domain-specific vocabularies the tools and functions to create, improve, and publish their terms.
Semantic Treehouse: TNO’s implementation of a vocabulary hub. It is an online community platform for data models and vocabularies and gives vocabulary providers the tools they need to facilitate semantic interoperability in their data space. https://www.semantic-treehouse.nl/
VoCol. https://github.com/vocol/vocol

Future outlook

Automatic integration of the semantics into the metadata store of the data space (ideally with just one reference to the data model structure)
Automatic integration for publication (ideally with just one reference to the data model structure)
A service providing examples that comply with the data structure but contain fake values
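
The last item could be approached along the following lines. The sketch walks a data structure description and returns an example instance filled with fixed placeholder values. It assumes the structure is given in a very small subset of JSON Schema (`type`, `properties`, `items`, `enum`); the function name, the placeholder values and the example schema are all hypothetical.

```python
# Fixed placeholder values per primitive type (deliberately fake).
PLACEHOLDERS = {"string": "example", "integer": 0, "number": 0.0, "boolean": False}


def fake_instance(schema: dict):
    """Build an example value that follows the given structure description.

    Only 'type', 'properties', 'items' and 'enum' are handled; anything else
    yields None, so this is far from a complete JSON Schema implementation.
    """
    if "enum" in schema:
        return schema["enum"][0]  # pick the first allowed value
    kind = schema.get("type")
    if kind == "object":
        return {
            name: fake_instance(sub)
            for name, sub in schema.get("properties", {}).items()
        }
    if kind == "array":
        return [fake_instance(schema.get("items", {}))]  # one example element
    return PLACEHOLDERS.get(kind)


# Hypothetical data structure description.
meter_reading_schema = {
    "type": "object",
    "properties": {
        "meterId": {"type": "string"},
        "energyKWh": {"type": "number"},
        "status": {"enum": ["active", "inactive"]},
        "readings": {"type": "array", "items": {"type": "integer"}},
    },
}

print(fake_instance(meter_reading_schema))
# {'meterId': 'example', 'energyKWh': 0.0, 'status': 'active', 'readings': [0]}
```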