EUCAIM Glossary

Version 2.0, last updated on 19/01/2024

The aim of this glossary is to establish agreed definitions of commonly used terms in order to harmonise understanding and work across the project. This living document will be updated regularly with new terms and, where needed, modified definitions.

This glossary was created by the EUCAIM consortium partners, based on the European project HealthyCloud glossary. When possible, terms are aligned with the Proposal for a Regulation on the European Health Data Space (EHDS).

We encourage feedback on our definitions! Please get in touch with us if you have any comments on glossary definitions, or if you want to suggest additional items. A contact form is included at the bottom of this page.

Glossary

Access Committee (AC)

EUCAIM Governing Body that controls access to the Atlas of Cancer Images. It reviews the evaluation reports to provide a final decision about the acceptance/rejection of data and/or tools (by data providers), and the R&D requests (by data users-researchers). The body will ensure responsible and secure access to the infrastructure data and services, promoting valuable research while upholding ethical and privacy standards.

Acceptance process

The Management Board (MB) makes a decision of acceptance or rejection within a period of 60 days, supported by the internal governance bodies on ethics and legal compliance and taking into consideration the indications from the Access Committee and Steering Committee.

Advisory Boards (AB)

The Advisory Boards (AB) are groups of external experts established during the course of the project to advise the Management Board on technical, ethical and related legal issues as well as on exploitation and regulatory matters. These boards will involve participants who are not consortium members, in order to provide a fresh, unbiased view on the decision making of the other boards. Even after the project concludes, the AB is envisioned to continue to provide external, unbiased advice on any decision-making regarding the day-to-day operations of the infrastructure, at both the technical and legal level.

Administrative Project Coordinator (AdmCo)

The administrative project coordinator is responsible for the mediation between the project consortium and the funding authority, the European Commission (EC). Acting as the main point of contact with the EC, the AdmCo is responsible for the overall administrative and financial management of the EUCAIM project. The administrative coordinator is also tasked with the technical review of deliverables and milestones and with financial reporting. Finally, it is currently envisioned that the AdmCo will oversee all managerial aspects of the Central Hub Office, with the overall purpose of supporting the implementation of the activities planned in the periodic strategic plan for the maintenance of the infrastructure.

Aggregated data

Statistical data about several individuals that have been combined to show general trends or values within the data [1].

Analysis Platform

Component within the federated processing infrastructure for executing tasks related to data analysis, including AI training and inference, while upholding data privacy and regulatory compliance. It provides a user interface, comprising both a dashboard and API, where users can initiate experiments, monitor processes, and retrieve results.

Anonymization

The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject. It involves removing personally identifiable information (i.e., all directly and indirectly identifying information) so that identification of the data subjects is definitively no longer possible [2]. The methods used to anonymize the data depend on the context and the technology used; this process must take into account the recommendations of the Data Protection Authorities of each EU Member State.

Annotation Hackathons

Workshops organised to collect necessary metadata from available tools following the recommendations and standards of the ELIXIR infrastructure. These events focus on enhancing the description and registration of software tools and modules to be utilised in the EUCAIM infrastructure.

Atlas of cancer images

Data and service environment of the federation for aiding cancer research (during the project as well as after its conclusion for future utilisation). The Atlas of cancer images includes de-identified images from both the Central Hub and the federated nodes, plus the data from the European research repositories.

Biometric data

Personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or fingerprint data [3] [4].

Budget for Open Call

Available funds allocated for new beneficiaries. These beneficiaries will receive funding under the same co-funding conditions as consortium partners (i.e. 50% of the budget; a total budget of €3,600,000 has been included with the COO EIBIR, and the maximum amount per grant is €200,000).

Calibration

In prediction models, calibration refers to the concordance between predicted and observed probabilities.
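
Illustrative example (not part of the EUCAIM definition): one simple way to inspect calibration is to group predictions into probability bins and compare, per bin, the mean predicted probability with the observed event rate. The function name `calibration_table` and the use of equal-width bins are assumptions made for this sketch.

```python
from typing import List, Tuple


def calibration_table(
    predicted: List[float], observed: List[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            event_rate = sum(y for _, y in members) / len(members)
            table.append((mean_predicted, event_rate, len(members)))
    return table
```

In a well-calibrated model the first two values of each row are close to each other; large gaps indicate over- or under-estimated risk in that probability range.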

Central Dashboard

Website intended for Data Users-Researchers who want to use EUCAIM data for analysis in the context of research and innovation. Starting from the dashboard, data users can see the metadata of datasets in a public catalogue, are able to register into the platform and search for metadata (e.g., disease, imaging modalities, age groups), can request access to data (if needed), can apply processing tools to the data, can obtain analysis results, and may inform the providers of interesting results obtained for their consideration. In addition, the dashboard will also guide data providers to the documentation page and a request form.

Central Hub

Infrastructure comprising the Central Repository, Central Dashboard, and the services and tools provided by the EUCAIM platform.

Central Hub Office

The Central Hub Office (CHO) is responsible for all functions necessary in accordance with the infrastructure’s statutes, the needs of its ordinary functioning and compliance with the legal requirements for an entity of its nature. The CHO will comprise experts in cloud infrastructure maintenance, technical support, legal matters, fundraising and project management, IPR, dissemination and promotional actions, as well as administrative and financial management.

Central Repository/Central Storage

The Central Repository or Central Storage is the central service to hold data, make data available for use, and organise data in a logical manner. In EUCAIM, imaging data, clinical data and genetic data could be deposited in the central repository following the data management procedures in place.

Chief Security Officer (CSO)

Person responsible for a company’s physical and digital security. The CSO provides executive leadership and oversees the identification, assessment and prioritisation of risks, directing all efforts concerned with the security of the organisation. The CSO works to stay ahead of security issues (e.g., security breaches), solve problems and ensure the organisation runs smoothly. Additionally, CSO expertise is required to implement safeguards and reporting risk management mechanisms for regulation compliance [5].

Clinical Data

Clinical data encompasses a diverse set of information integral to the EUCAIM infrastructure, extending beyond imaging data to include a range of critical details relevant to medical research and healthcare. This category involves comprehensive clinical information that accompanies the images, providing contextual insights into patients’ health conditions. It covers various aspects, such as mutation status, results from biological samples, quality of life assessments, quality of care metrics, and health-related costs.

Cloud

Network of computing facilities providing remote data storage and processing services through the internet [6].

Cloud computing

Paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with administration on-demand [7].

Collaboration Agreement (ColA)

Document that expresses the willingness of the Parties to collaborate by establishing an overarching framework to facilitate interaction and exchange of information between the parties.

Common Data Model (CDM)

A CDM is a standardised framework that defines both the structure and semantics of diverse datasets using ontologies, coding systems, and formal documentation. Clinical data standards provide a common structure and content of observational data, enabling interoperability and more efficient analyses that can produce reliable evidence. Within the EUCAIM project, two potential candidates for the CDM have been identified: HL7 FHIR and OHDSI-OMOP.

Consent

An individual’s agreement, e.g. to participate in research, to undergo a healthcare procedure, or to personal data processing. Within the context of personal data, the GDPR defines consent as: “Any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her” [8].

Data

Data can be defined as the recorded factual material that is commonly accepted in the scientific community as information required to support research findings [9] [10]. The term refers to any digital representation of acts, facts or information and any compilation of such acts, facts or information, including in the form of sound, visual or audiovisual recording [2]. Data can be grouped into four major types according to their origin: observational, experimental, simulated and derived [11]. Data is information available for processing. Specifically, the types of data that the EUCAIM infrastructure is interested in collecting are imaging data (radiological and nuclear medicine cancer images of any modality, segmentation masks with the annotations made and histopathological images) and other clinical data (clinical information accompanying the images, mutation status, biological sample results, quality of life, quality of care and health costs).

Data access

The processing of data by a data user, which was provided by a data provider, in accordance with specific technical, legal, or organisational requirements, without necessarily implying the transmission or downloading of such data (see Personal data) [2]. Three data access conditions are offered in EUCAIM: authorisation to download the datasets; authorisation to access, view and process them in-situ; or authorisation to remotely process the datasets from a federated node without the ability to access and visualise data, even remotely.

Data access right

The ability, right or permission to act on data in a defined location [12]. Data access in EUCAIM will be limited to authorised individuals or organisations based on specific permissions or roles, and always upon request.

Data altruism

Voluntary sharing of data on the basis of the consent by data subjects to process personal data pertaining to them, or permissions of other data holders to allow the use of their non-personal data without seeking a reward, for purposes of general interest, such as scientific research purposes or improving public services [2].

Data centric health research computational infrastructure

Infrastructure that provides data as a service. This infrastructure includes services, such as data visualisation, hosting and processing of data. In particular, it can process health-related sensitive data. Technological infrastructures for data analysis, exploitation and/or processing.

Data collection

This term will not be used in the EUCAIM framework. Please refer to Dataset.

Data concerning health

Personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status.

Data curator

A person who is responsible for the quality and FAIRness of the health-related data, and for making sure that the value of the data is discoverable and accessible. This role also considers the possibility of enriching data to increase its quality. Importantly, data curators might act as processors, e.g. by being responsible for the data at hand.

Data discoverability

The ability or a mechanism to browse and locate available data relevant to a specific user’s purpose (e.g., research project) in a non-targeted search. Data is more discoverable if the datasets have a metadata catalogue, and the metadata catalogue is publicly accessible. Discoverability is related to findability from the FAIR principles.

Data Federation Framework (DFF)

System that enables integration and querying of distributed data sources without physically centralising the data, maintaining the autonomy of the individual databases. It also provides a unified interface for querying and retrieving information from several sources, promoting interoperability and facilitating real-time decision-making.

Data governance

Assembly of policies and processes, coordination aspects, data usage and accessibility principles and data management procedures for a certain health data infrastructure to ensure legal compliance, consistency and good data quality throughout the different stages of the data life cycle.

Data harmonisation

The process of removing systematic differences between images acquired from different scanners (i.e., inter-scanner variability) via statistical methods. Such techniques make it possible to pool multi-centre datasets and to derive greater statistical power from the results than when centres work independently. Given the high economic costs of imaging, multi-centre collaboration is the most feasible way to acquire large imaging datasets [13].
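
Illustrative example (a deliberately simplified stand-in, not the method used in EUCAIM): the sketch below removes a constant per-scanner offset from a single numeric image feature by shifting every scanner's values to the overall mean. The function name `centre_by_scanner` and the input layout are assumptions for this example. Established harmonisation methods such as ComBat additionally model scale differences and preserve covariates of interest, which this sketch does not.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def centre_by_scanner(measurements: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Shift each scanner's values so that every scanner shares the overall mean."""
    if not measurements:
        return []
    by_scanner: Dict[str, List[float]] = defaultdict(list)
    for scanner, value in measurements:
        by_scanner[scanner].append(value)
    overall_mean = sum(v for _, v in measurements) / len(measurements)
    scanner_mean = {s: sum(vs) / len(vs) for s, vs in by_scanner.items()}
    # Subtract the scanner-specific mean, then restore the overall level.
    return [(s, v - scanner_mean[s] + overall_mean) for s, v in measurements]
```

Note that this kind of centring also removes any genuine difference between the populations scanned at each site, which is one reason covariate-aware methods are preferred in practice.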

Data mapping

The process of matching fields from multiple datasets into a centralised database. It is required to transfer, ingest, process, and manage data [14].
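
Illustrative example (field names are hypothetical and not taken from any EUCAIM schema): a mapping can be expressed as a table from each source's field names to the target field names, applied record by record.

```python
from typing import Dict

# Hypothetical field mappings: source field name -> target field name.
SITE_A_MAPPING = {"pat_sex": "sex", "age_years": "age", "modality": "imaging_modality"}
SITE_B_MAPPING = {"gender": "sex", "age": "age", "scan_type": "imaging_modality"}


def map_record(record: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Rename the fields of one source record to the target field names.

    Source fields without a mapping are dropped; values are not converted or validated.
    """
    return {target: record[source] for source, target in mapping.items() if source in record}
```

Records from both hypothetical sites end up with the same keys (`sex`, `age`, `imaging_modality`), so they can be ingested into one target structure. Real mappings usually also need value-level conversions (units, code systems), which are omitted here.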

Data holder

A Data Holder refers to any natural or legal person, including entities, bodies, and research organisations in the health or care sectors, as well as European Union institutions, bodies, offices, and agencies, who has the right, obligation, or capability to make certain data available for research purposes. This may include registering, providing, restricting access to, or exchanging the data. Examples of Data Providers include data repositories, regional biobanks, clinical centres, cancer screening programs, public entities, pharmaceutical companies, data altruism initiatives, and publication repositories. These infrastructures may host one or more datasets for discovery and retrieval, and the exposure and access to data in the Dashboard will be provided at the dataset level.

Data Protection Task Force

Body that plays the role of the Data Protection Officer (DPO) during both the project execution and beyond. It will monitor internal compliance, inform, and advise on data protection obligations, provide advice regarding Data Protection Impact Assessments and act as a contact point for all the partners and data subjects (the results of this task being documented in D3.6 – Data Management Plan). During the project execution phase, the main representatives of this task force will involve the DPOs of each consortium partner. Upon project end, the members of this board may need to be re-elected.

Data quality

The degree to which a set of inherent characteristics of data fulfils requirements [15].

Notes: The requirements are defined by the purpose of the processing and hence data quality can be viewed in other words also as a “fitness for purpose”. The purpose can be any use of the data, including primary use or secondary use.

For the purpose of data protection, data quality refers to a set of principles laid down in Article 5 of the GDPR and Article 4 of Regulation (EU) 2018/1725, namely [16]:

  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimization
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality

Data sharing

Provision of data by a data provider to a data user for the purpose of joint or individual use of the shared data, based on conditions of use, directly or through an intermediary [2].

Data sharing agreement

Agreement between two or more parties that outlines which data will be shared and how the data can be used.

Data sovereignty

Data stored outside of an organisation’s host country are still subject to the laws of the country where the data are stored [17].

Data steward

A person who has an administrative role and does not generally use the data. Data Stewards create guidelines to make data FAIR and advise on how to do it. They may or may not have direct responsibility for the data at hand (as processors).

Data subject

As defined in the GDPR, in the case of data processing, a data subject is a person who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person [18].

Data Transfer Agreement

Agreement established between organisations that governs the transfer of one or more data sets from the owner/provider to a third party.

Data User-Researcher

Any person or entity that wants to explore the public catalogue and eventually request access to data and process them using either the tools available in the platform or their own AI tools. This data access request by the Data User-Researcher should be made through a Research and Development (R&D) project that will be evaluated by the Access Committee.

Dataset

Dataset refers to a specific set of imaging and accompanying clinical information, published by a single Data Provider and created for a particular purpose or study. A dataset is described by a set of common metadata elements related to the imaging and clinical information, dataset creation, access rights and terms of use. Data Users-Researchers will be able to request access at the Dataset-level.

De-identification

General term for any process of removing the association between a set of identifying data and the data subject [19].

De facto anonymization

Sometimes also referred to as relative anonymization. De-identification operations by which so many identifiers are removed and further techniques to reduce personal reference (e.g. randomization or generalisation) are applied that re-identification with reasonable efforts in accordance with the current state of art is no longer possible and the personal reference is eliminated. In terms of legal or GDPR compliance this concept must include the definition of a closed environment for processing that will exclude any external attack. An “attacker” here is a third party (i.e. neither the data provider nor the data user) who accesses the original data sets accidentally or intentionally.

Demonstration Experiments

Computational experiments performed with selected platforms to showcase their capabilities. In the context of the EUCAIM project, these experiments demonstrate the functionality of federated learning platforms and distributed analysis tools in solving specific data analysis challenges. The outcomes contribute to understanding technical issues in a real distributed scenario.

ELIXIR Tools Platform

Centralised resource for accessing and discovering tools in the life sciences, promoting collaboration and efficiency in research. It includes recommendations for software registration in bio.tools ELIXIR registry, packaging with Biocontainers, and participation in services like OpenEBench for software quality monitoring.

Ethical and Legal Board (ELB)

Body in charge of ensuring that no EU rule is violated, while ensuring that the research conducted is up to the accepted EU standards. In this context, the term “Ethics” refers to questions of legal and regulatory compliance that constitute a part of the governance process. In EU-funded projects, ethics is deemed a transversal issue and the Ethics Advisory Board a key oversight mechanism to ensure understanding of the Ethics Appraisal Procedure, proper implementation of the Ethics Requirements, the handling of specific issues such as Privacy and Data Protection Impact Assessments or Artificial Intelligence, and ethics compliance in general. The ELB will act as a contact point for guidance on ethical issues that may arise during project execution and beyond project end, working in close connection with any party charged with ethics-related responsibilities. During the project execution, the ELB will be chaired by the WP3 leaders and composed of legal experts in the participating entities. Beyond project end, the members of this board may be reselected based on availability.

EUCAIM Federation / European Federation of Cancer Images

The entity as a whole, which encompasses both the central and federated components (the central repository functions as another node within the federation). The term “federation” encompasses the overall scope of EUCAIM, involving the governing bodies and orchestration of all nodes, whether central or federated. It constitutes the collective framework for coordination and governance.

EUCAIM Platform

The overarching framework that combines the distributed data throughout the federation, including both central and federated components, with the services facilitating their use. The platform serves as the integrated infrastructure, providing access to images and associated services within the EUCAIM Federation.

EUCAIM Infrastructure

The collective technical foundation supporting the EUCAIM Federation. It includes both central and federated components, forming the backbone that enables data distribution, access, and associated services.

European Digital Infrastructure Consortium (EDIC)

The Digital Decade policy programme 2030 establishes a new legal framework for multi-country projects, the European Digital Infrastructure Consortium. It is a new instrument to help Member States speed up and simplify the setup and implementation of multi-country projects. A minimum of three Member States who want to use a European Digital Infrastructure Consortium to set up a multi-country project will submit an application to the Commission. Following the examination of Member States’ application, the Commission will, if it concludes that all requirements provided for in the decision are satisfied, adopt a decision establishing the European Digital Infrastructure Consortium. Each consortium will have its own legal personality, governing body, statutes, and seat in a participating Member State [20].

External partners

In the context of an EU project, an external partner typically refers to an organisation or entity that is not part of the original consortium, but is engaged or involved in the project in some way (e.g. via Open Call). External partners may include organisations, institutions, companies, or individuals who collaborate with the consortium members to contribute to and broaden the project’s objectives, outcomes, or activities, thereby becoming part of the consortium and, therefore, internal partners.

External use case

Any use case conducted by external partners (see also definition for “use case”).

Federated Catalogue

Metadata catalogue that stores the clinical and imaging metadata within the different federated nodes of the Atlas of Cancer Images, as a federated search endpoint compliant with the EUCAIM federated query requirements.

Federated data analysis

Federated data analysis describes an analysis that is performed on multiple (often geographically) separated datasets. During this analysis, the data is not exchanged and can stay, for example, behind a given institution’s firewall. Only the interim results of a local analysis are exchanged between the data-hosting sites [21]. The aggregated non-identifiable results from each local analysis are pooled and returned to the data user.
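
Illustrative example (a minimal sketch; the function names and the choice of statistic are assumptions): to compute a mean across sites, each site returns only a count and a sum, and the central component pools these interim results. No individual record leaves a site.

```python
from typing import Dict, List


def local_summary(values: List[float]) -> Dict[str, float]:
    """Run at each site: return only aggregate statistics, never the records."""
    return {"count": float(len(values)), "total": float(sum(values))}


def pooled_mean(summaries: List[Dict[str, float]]) -> float:
    """Run centrally: combine the interim results into a global mean."""
    count = sum(s["count"] for s in summaries)
    if count == 0:
        raise ValueError("no records reported by any site")
    return sum(s["total"] for s in summaries) / count
```

For example, `pooled_mean([local_summary(site_a_ages), local_summary(site_b_ages)])` gives the same value as the mean over the combined records, without the records being exchanged (`site_a_ages` and `site_b_ages` stand for each site's local data).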

Federated learning

This is a specific case of federated data analysis, for machine learning purposes. It is a learning technique that allows users to collectively reap the benefits of shared models trained from rich datasets. The learning task is conducted across multiple separate sites coordinated centrally. Each site has a local training dataset which is never shared. Instead, each site computes an update to the current global model maintained centrally, and only this updated model is communicated [22].
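
Illustrative example (one common aggregation rule, shown as an assumption rather than as the EUCAIM implementation): in federated averaging, the central component combines the model weights returned by each site into a new global model, weighting each site by its number of local training samples. Local training itself is omitted; `federated_average` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def federated_average(
    site_updates: Sequence[Tuple[List[float], int]]
) -> List[float]:
    """Combine (weights, n_samples) pairs into one global weight vector."""
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("at least one site must report samples")
    size = len(site_updates[0][0])
    global_weights = [0.0] * size
    for weights, n_samples in site_updates:
        if len(weights) != size:
            raise ValueError("all sites must send weight vectors of equal length")
        for i, w in enumerate(weights):
            # Each site contributes in proportion to its share of the samples.
            global_weights[i] += w * n_samples / total
    return global_weights
```

Only the weight vectors and sample counts are communicated; the local training datasets stay at the sites, as described in the definition above.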

Federated Node/Local Node

Infrastructure deployed in a Data Provider that meets the hardware and software requirements of the EUCAIM project and that has been configured and connected to the federated network, being able to access Hyper-ontology compliant federation-exposed collections (under research project approval) and execute federated processing, including federated learning.

Federated Processing (FP) Infrastructure

Technical framework designed to facilitate federated analysis, which involves processing data without centralising it. In the context of the EUCAIM project, FP Infrastructure enables the execution of tasks, including AI training and inference, while keeping data decentralised at their original sites, adhering to specific regulatory frameworks.        

FAIR Principles

Principles to define the Findability, Accessibility, Interoperability, and Reuse of resources for humans and computers at the source. For example, the principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data [23].

  • Findable: Data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier.
  • Accessible: Metadata and data are understandable to humans and machines. Data is deposited in a trusted repository.
  • Interoperable: Metadata uses a formal, accessible, shared, and broadly applicable language for knowledge representation.
  • Reusable: Data and collections have a clear usage licence and provide accurate information on provenance [24].

Filing system

Any structured set of personal and non-personal data which are accessible according to specific criteria, whether centralised, decentralised or dispersed on a functional or geographical basis [25].

Governing Body

The party that encompasses the board of EUCAIM and can decide the approval, comment, or refusal of an application of data access, and data or tool provisioning, supported by legal, ethical, and technical boards.

Genetic Data

Personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question [26].

Health data

Personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status [27].

Health information

All organised and contextualised data on population health and health service activities and performance, individual or aggregated, that improves health promotion, prevention, care, cure and policy-making [28].

Health Information System (HIS)

A health information system is the total of resources, stakeholders, activities and outputs enabling evidence-informed health policy-making. The health information system manages all types of health data, from EHRs to imaging data and population health data. HIS activities include data collection, interpretation (analysis and synthesis), health reporting, and knowledge translation, i.e. stimulating and enhancing the uptake of health information into policy and practice. Health information system governance relates to the mechanisms and processes to coordinate and steer all elements of a health information system [29].

Hospital

Within the EUCAIM project framework, affiliated hospitals constitute a distinct subset of Data Providers. In this case, hospitals will not expose their data warehouses to the federation, which may already exist or may have been created for EUCAIM. Instead, they will be approached individually each time there is a new research project on specific clinical cases. If they choose to participate in the project, hospitals will prepare the necessary anonymised datasets within their data warehouses, and these datasets will be shared with the federation through a federated node or by uploading them to the Central Storage. Therefore, hospitals will only expose metadata for specific datasets from projects in which they have chosen to participate upon request.

Hyper-ontology

Hyper-ontology refers to the creation of a comprehensive and unified semantic representation of fundamental knowledge defined in the local AI4HI projects. The main purpose of the Hyper-ontology is to facilitate the development of validated clinical decision-making systems that support diagnosis, treatment, and predictive medicine, to benefit citizens. The Hyper-ontology serves as a standardised framework that enables interoperability between different projects using OMOP and FHIR standards. It allows for the expression of federated queries, enabling the analysis of distributed data sources. Additionally, a subset of the Hyper-ontology will be dedicated to describe datasets.

Internal use case

Any use case conducted by internal partners (see also definition for “use case”).

Imaging study

Defined as the utilisation of a variety of imaging techniques to acquire visual representations used as tools for the screening, detection and monitoring of cancer.

Legal entity

A company or organisation that has legal rights and responsibilities. From a legal point of view, it has its own personality and full capacity to fulfil its purposes. In legal relations, it is the holder of rights and obligations. It can be created directly by the law or in accordance with the provisions of the law.

Machine learning

A subset of AI techniques based on the use of statistical and mathematical modelling techniques to define and analyse data. Such learned patterns are then applied to perform or guide certain tasks and make predictions [30].

Management Board (MB)

The operational body responsible for the monitoring of the technical progress of the project, quality assurance, and the ad-hoc coordination of scientific and technological activities. It comprises the Administrative Project Coordinator, the Scientific Coordinator (SCo) (chair), and all Work Package leaders (WPLs).

Upon project end, the MB is also envisioned to be in charge of any decision making regarding any technical implementations and quality control of all operations regarding the day-to-day functioning of the infrastructure, including the coordination of scientific activities around it.

Marketplace

Centralised platform within the EUCAIM federation that facilitates the exchange and distribution of processing tools, services and applications developed by Tool Providers. It serves as a repository where Tool Providers can contribute their tools for federated processing or data preprocessing purposes to be used by Data Users-Researchers.

Mediator

A component responsible for connecting to the central infrastructure, translating the federated query into the site’s query language (Structured Query Language (SQL) for sites providing OHDSI OMOP-CDM compliant data, Clinical Quality Language (CQL) for sites providing FHIR compliant data), aggregating (and optionally obfuscating) the results, and finally returning the aggregated results to the central components. The Mediator acts as a sort of middleware and is deployed at the site of each Data Junction.
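
Illustrative example (a sketch of the aggregation and obfuscation steps only; query translation is omitted): the glossary does not specify how results are obfuscated, so small-cell suppression with a hypothetical `min_count` threshold is assumed here purely for illustration, as are the function names.

```python
from typing import Dict, List, Optional


def aggregate_counts(records: List[Dict[str, str]], field: str) -> Dict[str, int]:
    """Count local records per value of `field`; individual records stay on site."""
    counts: Dict[str, int] = {}
    for record in records:
        value = record[field]
        counts[value] = counts.get(value, 0) + 1
    return counts


def obfuscate(counts: Dict[str, int], min_count: int = 5) -> Dict[str, Optional[int]]:
    """Replace counts below `min_count` with None (assumed small-cell suppression)."""
    return {key: (n if n >= min_count else None) for key, n in counts.items()}
```

Only the output of `obfuscate(aggregate_counts(...))` would be returned to the central components in this sketch, so that very small groups cannot be singled out from the reported counts.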

Memorandum of Understanding (MoU)

A memorandum of understanding (MoU) is a type of agreement between parties. It expresses a convergence of will between the parties, indicating an intended common line of action. It is often used either in cases where parties do not imply a legal commitment or in situations where the parties cannot create a legally enforceable agreement.

Message Broker

A software intermediary that enables seamless communication between various components within the federated processing system. This broker functions as a central hub, overseeing the exchange, routing, and delivery of messages. Specifically, it plays a key role in coordinating tasks related to federated analysis, ensuring the secure and efficient flow of information between the Analysis Platform and distributed nodes.

Metadata

A set of data that defines and describes a resource (e.g., data, dataset, sample…) so that it can be understood, discovered and reused. There are different levels of metadata. Since metadata can describe different aspects of data, metadata properties can be grouped in terms of quality, availability, provenance and processing, among others. Metadata catalogues can be used to describe the available datasets. Metadata is important to make data understandable, and can help increase the findability, accessibility, interoperability and reusability of the data. Metadata can be collected or compiled in repositories to improve the level of compliance of the datasets with the FAIR principles.

Metadata harvesting

Automated collection of metadata descriptions from different sources to create useful aggregations of metadata and related services [31].
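A minimal Python sketch of the idea, assuming each source exposes a function that returns its metadata records as dictionaries with an "id" field. The source names, record fields and the harvest function are hypothetical and only serve the example.

```python
def harvest(sources):
    """Collect metadata records from every source into one aggregated catalogue."""
    catalogue = {}
    for source_name, fetch in sources.items():
        for record in fetch():
            # Keep track of where each description came from.
            entry = dict(record, harvested_from=source_name)
            catalogue[record["id"]] = entry
    return catalogue


# Made-up sources; in practice each fetch would call a remote endpoint.
sources = {
    "site-a": lambda: [{"id": "ds-1", "title": "Breast MR"}],
    "site-b": lambda: [
        {"id": "ds-2", "title": "Lung CT"},
        {"id": "ds-3", "title": "Prostate MR"},
    ],
}
catalogue = harvest(sources)
print(sorted(catalogue))
```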

Negotiator

The Negotiator is a specialised tool integrated into the EUCAIM Dashboard and designed to facilitate the exchange of documents and information between User-Researchers and the Access Committee. On the one hand, the Negotiator allows users to submit requests for data or software to one or several holders as selected in a previous discovery step in the EUCAIM catalogue. On the other hand, the Negotiator also allows users to build new research projects by facilitating the negotiation with a specific EUCAIM network of contacts according to their objectives and needs. In both cases, the negotiation mechanism allows the Access Committee and, ultimately, the Data or Software Holder itself, where appropriate, to (a) obtain more information from the requester to better understand the reason for the request and the requested data in this broadcast mode, (b) enter a negotiation process with the requester, or (c) step back from a request if they consider that they cannot fulfil it for some reason.

Non-personal data

All data other than personal data. Note that non-personal data could be inextricably linked with personal data or be used to draw inferences about persons’ qualities; in such cases, the GDPR and national data protection laws apply [2].

Open Call

An open call for extending the real-world use cases will be launched. The open call is only for new beneficiaries to join the consortium. This open call will follow the guidelines stated in the call with respect to publication and openness and will pursue: i) the onboarding of new data providers, increasing the geographic dimensions, data modalities or cancer targets; and ii) the uptake of new trustworthy AI algorithms trained on the data of the repository. The open call will be published by the beginning of the second project period, in compliance with the terms and conditions stated by the European Commission (see Budget for Open Call).

Open data

Data that is freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. An open licence is a licence agreement which contains provisions that allow other individuals to reuse another creator’s work, giving them four major freedoms. Without a special licence, these uses are normally prohibited by copyright law or commercial licence. Most free licences are worldwide, royalty-free, non-exclusive, and perpetual (see copyright durations). Free licences are often the basis of crowdsourcing and crowdfunding projects [32].

Open science

The movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of an inquiring society, amateur or professional. Open science is transparent and accessible knowledge that is shared and developed through collaborative networks. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practise open-notebook science, and generally making it easier to publish and communicate scientific knowledge [33].

Orchestrator

Daemon that operates at each data node within the federated processing infrastructure. It connects to the Message Broker to obtain assigned tasks and initiates the execution of software required for the federated processing. The Orchestrator interacts with the local execution infrastructure, ensuring the proper execution of tasks on each data node.
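The interplay between the Message Broker and the Orchestrator can be sketched in Python as follows. This is a simplified illustration, not the EUCAIM implementation: an in-memory queue per node stands in for the broker, the Orchestrator drains its queue once instead of running as a daemon, and all names (broker, submit, run_orchestrator, the "mean" tool) are assumptions.

```python
import queue

# Stand-in for the Message Broker: one in-memory task queue per data node.
broker = {"node-a": queue.Queue(), "node-b": queue.Queue()}


def submit(node, task):
    """Analysis Platform side: route a task to the queue of one data node."""
    broker[node].put(task)


def run_orchestrator(node, tools):
    """Orchestrator side: fetch the tasks assigned to `node` and run them locally."""
    results = []
    while True:
        try:
            task = broker[node].get_nowait()
        except queue.Empty:
            break  # no more tasks assigned to this node
        tool = tools[task["tool"]]            # software required by the task
        results.append(tool(*task["args"]))   # local execution on the node
    return results


# Hypothetical tool registry available on every node.
tools = {"mean": lambda values: sum(values) / len(values)}

submit("node-a", {"tool": "mean", "args": ([2, 4, 6],)})
submit("node-b", {"tool": "mean", "args": ([10, 20],)})

node_a_results = run_orchestrator("node-a", tools)
node_b_results = run_orchestrator("node-b", tools)
print(node_a_results, node_b_results)
```

Each node only ever sees the tasks routed to its own queue, which mirrors the broker's role of routing and delivering messages to the right data node.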

Personal data

Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person [34].

A name and a social security number are two examples of personal data which relate directly to a person. However, the definition extends further and also encompasses, for instance, the e-mail address and the office phone number of an employee. Other examples of personal data can be found in information on physical disabilities, in medical records and in an employee’s evaluation.

Personal data processed in relation to the work of the data subject remain personal/individual in the sense that they continue to be protected by the relevant data protection legislation, which strives to protect the privacy and integrity of natural persons. As a consequence, data protection legislation does not address the situation of legal persons (apart from the exceptional cases where information on a legal person also relates to a physical person).

Personal data breach

A breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed [35].

Platform Manager

A technical expert or team of experts that will operate the core services of the EUCAIM platform. The Platform Manager is responsible for managing and maintaining the underlying infrastructure of the central storage, including servers, databases, and other resources. The Platform Manager manages user accounts and access permissions, deploys applications and services, uploads new applications to the marketplace (provided by Tool Providers), and ensures their proper integration into the platform. They support the orchestration of federated processing, working with Data Providers/Holders/Controllers and Tool Providers to integrate metadata, tools, and services, and ensuring that Data User-Researchers’ queries are properly executed. Because the role can be filled by a team of experts, multiple Platform Managers may be assigned to different roles such as security and data privacy, administration, development, system management, etc. Additionally, the Platform Manager provides user support, responds to inquiries, provides documentation, and troubleshoots issues that arise with the platform.

Platform User Roles

A user profile identified by a name and by the user stories that the user can carry out, which determine the access permissions and any other authorised activity required to perform the actions in those user stories.

Primary use of data

The use of any data for the purpose for which it was originally collected.

Privacy-preserving learning techniques

AI-based merging techniques that help preserve patients’ privacy. The building blocks of privacy-preserving machine learning are federated learning, homomorphic encryption, and differential privacy. They draw on methods from cryptography and statistics [36].
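To make one of these building blocks concrete, the following Python sketch applies the Laplace mechanism, a standard differential privacy technique, to a count query. It is an illustrative sketch only: the epsilon value, the count and the function names are assumptions, and a production system would use a vetted library rather than this code.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus noise calibrated to sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Releasing the same (made-up) count many times, only to show the noise averages out.
noisy = [private_count(120, epsilon=0.5, rng=rng) for _ in range(10000)]
average = sum(noisy) / len(noisy)
print(round(average, 1))
```

A smaller epsilon means more noise and stronger privacy; each individual release differs from the true count, while the noise has zero mean.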

Processing (personal and non-personal)

Any operation or set of operations which is performed on data or on datasets, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction [37].

Profiling

Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements [38].

Pseudonymization

The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person [39].
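One common way to implement this is a keyed hash: the identifier is replaced by a pseudonym that cannot be linked back to the person without the key, and the key is the "additional information" kept separately. The Python sketch below assumes this approach; the key, the identifiers and the pseudonymise function are made up for the example, and whether a given scheme satisfies the legal definition depends on the accompanying technical and organisational measures.

```python
import hashlib
import hmac

# Assumption: this key is stored separately from the pseudonymised dataset,
# under its own technical and organisational access controls.
SECRET_KEY = b"example-key-kept-in-a-separate-system"


def pseudonymise(patient_id: str) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up records.
records = [
    {"patient_id": "ES-0001", "modality": "MR"},
    {"patient_id": "ES-0001", "modality": "CT"},
    {"patient_id": "ES-0002", "modality": "MR"},
]
pseudonymised = [
    {"pseudonym": pseudonymise(r["patient_id"]), "modality": r["modality"]}
    for r in records
]
print(pseudonymised[0]["pseudonym"] == pseudonymised[1]["pseudonym"])
```

The same patient always maps to the same pseudonym, so records can still be linked for research, while the original identifier no longer appears in the dataset.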

Public catalogue

Metadata catalogue available to anonymous users, offering a limited view of the catalogue and basic search options. This catalogue stores metadata, offering the Data User-Researchers basic descriptive information about the available datasets and data access conditions.

Repository

A store for digital information, typically organised in the form of a catalogue of datasets, that can be searched and can provide access to the data under given conditions.

Responsible AI

AI that is designed, developed, evaluated, and monitored by employing an appropriate code of conduct and appropriate methods to achieve technical, clinical, ethical, and legal requirements (e.g., efficacy, safety, fairness, robustness, transparency) [40].

Research Communities

Groups or entities with a common research goal, typically formed through the course of already finalised, currently ongoing or newly emerging projects, that would like to make use of EUCAIM’s research environment to continue the research that their original project facilitated in the first place. To do so, the community taking part in that project (e.g. a consortium) will need to agree to transfer the data collected (together with the tools developed during the project lifespan, where applicable) to EUCAIM’s Central Repository. The expectation is that the Research Community (RC) will remain connected via EUCAIM and will as a result be able to continue and further expand the work done in the scope of such a project. In return, EUCAIM will include the related datasets in its catalogue, providing the RCs with a secure and highly interoperable environment and enabling them to initiate new projects within the EUCAIM infrastructure, while establishing new collaborations with other partners connected to EUCAIM.

Research software

Research software includes individual pieces of software (e.g. tools), analytical workflows (compositions of two or more individual tools and possibly other workflows), platforms (e.g. for federated learning) and other auxiliary software needed to carry out the scientific activities expected in the project.

Restriction of processing (personal and non-personal data)

As defined by the GDPR, methods by which to restrict the processing of data could include, inter alia, temporarily moving the selected data to another processing system, making the selected personal data unavailable to users, or temporarily removing published data from a website. In automated filing systems, the restriction of processing should in principle be ensured by technical means in such a manner that the personal data are not subject to further processing operations and cannot be changed. The fact that the processing of data is restricted should be clearly indicated in the system [41].

Secondary use of data/data re-use

Secondary use refers to using data for a different purpose than the one it was originally collected for (i.e. than the primary use).

According to the European Data Governance Act 2020 ‘re-use’ means the use by natural or legal persons of data held by public sector bodies, for commercial or non-commercial purposes other than the initial purpose within the public task for which the data were produced, except for the exchange of data between public sector bodies purely in pursuit of their public tasks [2].

Clinical definition: Secondary use of health data applies personal health information (PHI) for uses outside of direct health care delivery [42].

Secure processing environment

The physical or virtual environment and organisational means to provide the opportunity to re-use data in a manner that allows the operator of the secure processing environment to determine and supervise all data processing actions, including the display, storage, download and export of the data and the calculation of derivative data through computational algorithms [2].

Sensitive data

Information that is regulated by law due to possible risk for plants, animals, individuals and/or communities and for public and private organisations. Sensitive personal data include information related to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership and data concerning the health or sex life of an individual. These data could be identifiable and potentially cause harm through their disclosure [43].

Service Level Agreement (SLA)

Document that establishes the terms and conditions for integrating a local node into the EUCAIM Federation. It will define the level of service, access to data and processing resources, technical interoperability requirements, and support supplied by the providers. The SLA will also outline service availability targets, constraints, and contact points for addressing any issues or inquiries related to the services within the federation.

Scientific Coordinator (SCo)

The Scientific Coordinator (SCo) of the project is the person who leads the Central Hub operations in all scientific and technical aspects and provides strategic scientific guidance. The Scientific Coordinator is a central figure in conflict resolution and decision-making in the project management bodies and plays a central role in the monitoring of the Project’s overall progress and strategic plans.

Stakeholder

A person or entity (such as an institution, hospital or research community) that is involved with an organisation, society, etc. and therefore has responsibilities towards it and an interest in its success.

Steering Committee (SC)

The Steering Committee is the highest-level decision-making body of the infrastructure and project consortium. It currently consists of one representative of each project partner entity and is chaired by the Scientific Coordinator. The members of the SC are required to be duly authorised to deliberate, negotiate and decide on all matters which fall under the responsibility of the Steering Committee as laid out in the Infrastructure Statutes.

During the project duration, the SC will discuss and decide on major modifications of the consortium membership (e.g., entry of new partners, withdrawal of partners), as well as on the work plan, project budget, intellectual property rights, etc. A more detailed description of these matters is given in Article 6.3.1 of the project’s Consortium Agreement.

Upon project end, the SC is envisioned to have the last word in the decision-making on any matter left unresolved at a lower level (e.g. Technical Board, Access Committee). In this context, the SC will be convened ad hoc by its Chair, the Scientific Coordinator. Each project partner is expected to be represented at the meeting by its designated representative or, if the former is not available, by their proxy.

Synthetic data

The concept of synthetic data generation is to take an original data source (dataset) and create from it new, artificial data with similar statistical properties.

Keeping the statistical properties means that anyone analysing the synthetic data, a data analyst for example, should be able to draw the same statistical conclusions from a given synthetic dataset as they would from the real (original) data.

The use of synthetic data is growing in many fields: from training of artificial intelligence models within the health sector to computer vision, image recognition and robotics fields [44].
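The idea of preserving statistical properties can be shown with a deliberately simple Python sketch: fit a mean and a standard deviation to a made-up original variable and sample new values from a normal distribution with those parameters. This is an assumption-laden toy example; real synthetic data generators model many variables and their relationships, and the values below are invented.

```python
import random
import statistics

# Made-up original measurements (e.g. patient ages).
original = [41.0, 55.0, 62.0, 48.0, 70.0, 66.0, 59.0, 52.0]

# Statistical properties to preserve (here only mean and standard deviation).
mu = statistics.mean(original)
sigma = statistics.stdev(original)

# Generate artificial values that follow the fitted distribution.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = [rng.gauss(mu, sigma) for _ in range(5000)]

print(round(mu, 2), round(statistics.mean(synthetic), 2))
print(round(sigma, 2), round(statistics.stdev(synthetic), 2))
```

None of the synthetic values is copied from the original data, yet an analyst computing the mean or spread would reach similar conclusions on both datasets.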

Sustainability

The ability of an entity, service or process to be maintained continuously over time [45].

Technical Board (TB)

A committee that provides technical guidance, supervision and control to the project, as part of the project governance structure.

The TB is first tasked with the review of the potential engagement of tools and service providers to EUCAIM. Technical partners have the responsibility to adopt a responsible research and innovation attitude when designing and developing their solutions, by following the guides and requirements of the ethical committees, with the lead and support of the Data Protection Task Force and Ethics Advisory Board.

Technical Showrooms

Workshops that provide insights into platforms and tools relevant to distributed and federated data analysis. These sessions aim to understand the capabilities and potential fit of different tools and platforms within the overall EUCAIM infrastructure.

Testing Data

Data used for providing an independent evaluation of the trained and validated AI system in order to confirm the expected performance of that system before its placing on the market or putting into service [4].

Tiers of technical data compliance

To accommodate different levels of data compliance with the DFF, three technical tiers have been established. These tiers are scalable and allow data to be upgraded as the datasets are used in new research projects. Each tier offers increased visibility and usability of the data within the EUCAIM community.

  • Low compliance: public metadata catalogue search
  • Medium level of compliance: federated query functionality
  • Full compliance: distributed and federated processing
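The three tiers can be modelled as an ordered enumeration, as in the Python sketch below. It assumes that tiers are cumulative, i.e. that a dataset at a higher tier also offers the functionality of the lower tiers; the definition above suggests this but does not state it explicitly. The names Tier, CAPABILITY_BY_TIER and capabilities are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Technical compliance tiers, ordered from lowest to highest."""
    LOW = 1
    MEDIUM = 2
    FULL = 3


CAPABILITY_BY_TIER = {
    Tier.LOW: "public metadata catalogue search",
    Tier.MEDIUM: "federated query functionality",
    Tier.FULL: "distributed and federated processing",
}


def capabilities(tier):
    """List what a dataset at `tier` offers, assuming tiers are cumulative."""
    return [CAPABILITY_BY_TIER[t] for t in Tier if t <= tier]


print(capabilities(Tier.MEDIUM))
```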

Tool provider

The Tool Provider refers to any entity (startups, enterprises, research institutions, government agencies, non-profit organisations) that would like to contribute processing tools, services, or applications it has developed to the EUCAIM marketplace for use in the federated processing of the platform.

Tools

A digital or computerised resource that assists, enhances or executes an action or process [46].

Training Data

Data used for training machine learning algorithms (e.g., an artificial intelligence (AI) system) through fitting its learnable parameters [4].

Trustworthy AI

AI with proven characteristics such as efficacy, safety, fairness, robustness, transparency, which enable relevant stakeholders such as citizens, clinicians, health organisations and authorities to rely on it and adopt it in real-world practice [40].

Use case

A use case is a description of a specific scenario or situation in which EUCAIM is intended to be used to address a particular problem related to cancer data for clinical purposes. It outlines the steps and interactions involved in the process, and the expected outcomes of applying EUCAIM in a real-world setting. Examples of use cases in this context could include the development of AI-based diagnosis tools or the use of federated data sources to improve patient outcomes. Use cases will be used to drive the design of the EUCAIM infrastructure and to help define requirements, test functionality, and establish the validation process for a clinical improvement.

User actions

Specific tasks or interactions that users can perform within the platform. These actions are related to the technical use of the platform and are specific to each user role. They are initiated by users to achieve specific objectives within the context of the User Stories, which describe the situation or scenario in which these actions take place.

User story

Descriptions of full interactions of a User Role with the EUCAIM platform, described in natural language. User Stories define in general terms the needs, restrictions, performance limitations, desired features, innovation capabilities and business models for the repository.

User’s Library

Area of the Dashboard where the authenticated Data Users-Researchers can add and remove the references of collections selected from the User’s Catalogue (either filtered using the federated query mechanism or not), request access to them, and view and manage their approved or rejected access requests.

User’s view of the Catalogue / User’s Catalogue

Metadata catalogue available only to authenticated users, offering a complete view of the catalogue and advanced search and filtering options across the multiple sources by executing federated queries.

Validation Data

Data used for providing an evaluation of the trained AI system and for tuning its non-learnable parameters and its learning process [4].
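The relationship between Training Data, Validation Data and Testing Data can be illustrated with a simple three-way split in Python. The 70/15/15 proportions, the case identifiers and the split_dataset function are assumptions for this sketch; in medical imaging the split is usually made per patient rather than per image, so that data from one patient does not end up in more than one subset.

```python
import random


def split_dataset(items, train_pct=70, validation_pct=15, seed=0):
    """Shuffle the items and split them into training, validation and testing sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    training = shuffled[:n_train]                           # fit learnable parameters
    validation = shuffled[n_train:n_train + n_validation]   # tune non-learnable parameters
    testing = shuffled[n_train + n_validation:]             # independent final evaluation
    return training, validation, testing


# Made-up case identifiers.
cases = [f"case-{i:03d}" for i in range(100)]
training, validation, testing = split_dataset(cases)
print(len(training), len(validation), len(testing))
```

Keeping the three subsets disjoint is what makes the testing data an independent evaluation of the trained and validated system.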


EUCAIM Glossary Team

  • Leonor Cerdá Alberich (HULAFE)
  • Pedro Mallol Roselló (HULAFE)
  • Ignacio Gómez-Rico Junquero (HULAFE)
  • Irene Marín Radoszynski (HULAFE)
  • David Vallmanya Poch (HULAFE)
  • Hanna Leisz (DKFZ)
  • Ricard Martinez (UV)
  • Janos Meszaros (KUL)
  • Sara Zullino (EATRIS)
  • Ignacio Blanquer (UPV)
  • Esther Bron (Erasmus MC / Health-RI)
  • Carles Hernández (BSC)
  • Laura Portell (BSC)
  • Katrine Riklund (UMU)
  • Valia Kalokyri (FORTH)
  • Luis Marti-Bonmati (HULAFE)

Do you have Feedback on our glossary?

Get in touch with the EUCAIM Glossary Team!

If you have any comments or suggestions on this glossary, we would very much like to hear from you!

You can use the form below to reach out to the EUCAIM Glossary Team and send us your feedback.
