Access Committee (AC)
EUCAIM Governing Body that controls the access to the Atlas of Cancer Images. It reviews the evaluation reports to provide a final decision about the acceptance/rejection of data and/or tools (by data providers), and the R&D requests (by data users-researchers). The body will ensure responsible and secure access to the infrastructure data and services, promoting valuable research while upholding ethical and privacy standards.
Acceptance process
The Management Board (MB) makes a decision of acceptance or rejection within a period of 60 days, supported by the internal governance bodies on ethics and legal compliance and taking into consideration the indications from the Access Committee and Steering Committee.
Advisory Boards (AB)
The Advisory Boards (AB) are groups of external experts established during the course of the project to advise the Management Board on technical, ethical and related legal issues as well as on exploitation and regulatory matters. These boards will involve participants from outside the consortium, in order to provide a fresh, unbiased view on the decision making of the other boards. Even after the project concludes, the AB is envisioned to continue to provide external, unbiased advice on any decision-making regarding the day-to-day operations of the infrastructure, both at the technical and legal level.
Administrative Project Coordinator (AdmCo)
The administrative project coordinator is responsible for the mediation between the project consortium and the funding authority, the European Commission (EC). Acting as the main point of contact with the EC, the AdmCo is responsible for the overall administrative and financial management of the EUCAIM project. The administrative coordinator is also tasked with the technical review of deliverables and milestones and with financial reporting. Finally, it is currently envisioned that the AdmCo will oversee all managerial aspects of the Central Hub Office, with the overall purpose of supporting the implementation of the activities planned in the periodic strategic plan for the maintenance of the infrastructure.
Aggregated data
Aggregated data is pooled data: statistical data about several individuals that has been combined to show general trends or values within the data [1].
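As an illustration of the definition above, the following minimal sketch pools individual-level rows into per-group counts and means. The records, field names and the `aggregate_by_group` helper are hypothetical and are not part of the EUCAIM platform.

```python
from statistics import mean

# Hypothetical individual-level records (illustrative values only).
records = [
    {"age_group": "40-49", "tumour_size_mm": 12.0},
    {"age_group": "40-49", "tumour_size_mm": 18.0},
    {"age_group": "50-59", "tumour_size_mm": 20.0},
]

def aggregate_by_group(rows, group_key, value_key):
    """Pool individual rows into per-group counts and means."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {g: {"n": len(v), "mean": mean(v)} for g, v in groups.items()}

# The summary describes groups, not individuals.
summary = aggregate_by_group(records, "age_group", "tumour_size_mm")
```

Note that very small groups (such as the single-row group here) can still point to an individual, so aggregation alone does not guarantee non-identifiability.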
Artificial Intelligence Act
Proposed legislation by the European Union (EU) aimed at regulating Artificial Intelligence (AI) technologies within the EU. The act seeks to ensure ethical, transparent, and accountable AI practices while fostering innovation and competitiveness [49].
Analysis Platform
Component within the federated processing infrastructure for executing tasks related to data analysis, including AI training and inference, while upholding data privacy and regulatory compliance. It provides a user interface, comprising both a dashboard and API, where users can initiate experiments, monitor processes, and retrieve results.
Anonymization
The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject. Irreversible removal of personally identifiable information (i.e., all directly and indirectly identifying information), so that identification of the data subjects is definitively no longer possible [2]. The methods used to anonymize the data depend on the context and the technology used (such as DICOM tags removal and facial erasing); this process must take into account the recommendations of the Data Protection Authorities of each EU Member State.
Annotation Hackathons
Workshops organised to collect necessary metadata from available tools following the recommendations and standards of the ELIXIR infrastructure. These events focus on enhancing the description and registration of software tools and modules to be utilised in the EUCAIM infrastructure.
Atlas of Cancer Images
Data and service environment of the federation for aiding cancer research (during the project as well as after its conclusion for future utilisation). The Atlas of Cancer Images includes de-identified images from both the Central Hub and the federated nodes, plus the data from the European research repositories.
Biometric data
Personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or fingerprint data [3] [4].
Budget for Open Call
Available funds allocated for new beneficiaries. These beneficiaries will receive funding under the same co-funding conditions as consortium partners (i.e. 50% of the budget; a total budget of €3,600,000 has been included with the COO EIBIR, and the maximum amount per grant is €200,000).
Calibration
In prediction models, calibration refers to the concordance between predicted and observed probabilities.
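One common way to inspect calibration is to bin the predictions and compare, per bin, the mean predicted probability with the observed event rate. The sketch below assumes equal-width bins and binary outcomes (0 or 1); the function name and the binning choice are illustrative assumptions, not a method prescribed by EUCAIM.

```python
def calibration_table(predicted, observed, n_bins=5):
    """Group predictions into equal-width bins and compare the mean
    predicted probability with the observed event rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        table.append((mean_pred, event_rate, len(members)))
    return table
```

In a well-calibrated model the first two values of each row are close to each other; large gaps indicate over- or under-estimation of risk in that probability range.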
Central Dashboard
Website intended for Data Users-Researchers who want to use EUCAIM data for analysis in the context of research and innovation. Starting from the dashboard, data users can see the metadata of datasets in a public catalogue, are able to register into the platform and search for metadata (e.g., disease, imaging modalities, age groups), can request access to data (if needed), can apply processing tools to the data, can obtain analysis results, and may inform the providers of interesting results obtained for their consideration. In addition, the dashboard will also guide data providers to the documentation page and a request form.
Central Hub
Infrastructure comprising the Central Repository, Central Dashboard, and the services and tools provided by the EUCAIM platform.
Central Hub Office (CHO)
The Central Hub Office is responsible for all functions necessary in accordance with the infrastructure’s statutes, the needs of its ordinary functioning and compliance with the legal requirements for an entity of its nature. The CHO will comprise experts in cloud infrastructure maintenance, technical support, legal matters, fundraising and project management, IPR, dissemination and promotional actions, as well as administrative and financial management.
Central Repository/Central Storage
The Central Repository or Central Storage is the central service to hold data, make data available for use, and organise data in a logical manner. In EUCAIM, imaging data, clinical data and genetic data could be deposited in the central repository following the data management procedures in place.
Chief Security Officer (CSO)
Person responsible for a company’s physical and digital security. The CSO provides executive leadership and oversees the identification, assessment and prioritisation of risks, directing all efforts concerned with the security of the organisation. The CSO works to stay ahead of security issues (e.g., security breaches), solve problems and ensure the organisation runs smoothly. Additionally, CSO expertise is required to implement safeguards and reporting risk management mechanisms for regulation compliance [5].
Clinical Data
Clinical data encompasses a diverse set of information integral to the EUCAIM infrastructure, extending beyond imaging data to include a range of critical details relevant to medical research and healthcare. This category involves comprehensive clinical information that accompanies the images, providing contextual insights into patients’ health conditions. It covers various aspects, such as mutation status, results from biological samples, quality of life assessments, quality of care metrics, and health-related costs.
Cloud
Network of computing facilities providing remote data storage and processing services through the internet [6].
Cloud computing
Paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with administration on-demand [7].
Collaboration Agreement (ColA)
Document that expresses the willingness of the Parties to collaborate by establishing an overarching framework to facilitate interaction and exchange of information between the parties.
Common Data Model (CDM)
A CDM is a standardised framework that defines both the structure and semantics of diverse datasets using ontologies, coding systems, and formal documentation. Clinical data standards provide a common structure and content of observational data, enabling interoperability and more efficient analyses that can produce reliable evidence. Within the EUCAIM project, two potential candidates for the CDM have been identified: HL7 FHIR and OHDSI-OMOP.
Consent
An individual’s agreement, e.g. to participate in research, to undergo a healthcare procedure, or to have personal data processed.
Within the context of personal data, the General Data Protection Regulation (GDPR) defines consent as: “Any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her” [8].
Consortium Agreement (ConA)
The Consortium Agreement specifies the rights and obligations of the project partners and establishes the relations between the partners themselves.
Data
Data can be defined as the recorded factual material that is commonly accepted in the scientific community as information that is required to support research findings [9] [10]. The term refers to any digital representation of acts, facts or information and any compilation of such acts, facts or information, including in the form of sound, visual or audiovisual recording [2]. By origin, data falls into four major categories: observational, experimental, simulated and derived [11]. Data is information available for processing. Specifically, the types of data that the EUCAIM infrastructure is interested in collecting are imaging data (radiological and nuclear medicine cancer images of any modality, segmentation masks with the annotations made and histopathological images) and other clinical data (clinical information accompanying the images, mutations status, biological sample results, quality of life, quality of care and health costs).
Data access
The processing of data by a data user, which was provided by a data provider, in accordance with specific technical, legal, or organisational requirements, without necessarily implying the transmission or downloading of such data (see Personal data) [2]. Three data access conditions are offered in EUCAIM: authorisation to download the datasets; authorisation to access, view and process them in-situ; or authorisation to remotely process the datasets from a federated node without the ability to access and visualise data, even remotely.
Data access right
The ability, right or permission to act on data in a defined location [12]. Data access in EUCAIM will be limited to authorised individuals or organisations based on specific permissions or roles, and always upon request.
Data Act
A forthcoming regulatory proposal within the European Union aiming to establish uniform guidelines for accessing product or associated service data by end-users of interconnected products or services. This regulation encompasses crucial provisions delineating the prerequisites for data space interoperability (Article 28) and mandates governing the implementation of data sharing agreements through smart contracts (Article 30), thus facilitating seamless data exchange and fostering digital innovation across the EU [51].
Data altruism
Voluntary sharing of data on the basis of the consent by data subjects to process personal data pertaining to them, or permissions of other data holders to allow the use of their non-personal data without seeking a reward, for purposes of general interest, such as scientific research purposes or improving public services [2].
Data Annotation
A process within the realm of data science and machine learning, involving the labelling or tagging of data points with informative metadata to enhance their interpretability and utility for computational algorithms. This method facilitates the training and optimization of machine learning models by providing context and structure to raw data, enabling more accurate analysis and predictive capabilities.
Data Centric Health Research Computational Infrastructure
Infrastructure that provides data as a service. This infrastructure includes services, such as data visualisation, hosting and processing of data. In particular, it can process health-related sensitive data. Technological infrastructures for data analysis, exploitation and/or processing.
Data collection
This term will not be used in the EUCAIM framework. Please refer to Dataset.
Data Concerning Health
Personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status.
Data Curator
A person who is responsible for the quality and FAIRness of the health-related data, and for making sure that the value of the data is discoverable and accessible. This role also covers enriching the data where doing so increases its quality. Importantly, data curators might act as processors, i.e. be responsible for the data at hand.
Data discoverability
The ability or a mechanism to browse and locate available data relevant to a specific user’s purpose (e.g., research project) in a non-targeted search. Data is more discoverable if the datasets have a metadata catalogue, and the metadata catalogue is publicly accessible. Discoverability is related to findability from the FAIR principles.
Data Federation Framework (DFF)
System that enables integration and querying of distributed data sources without physically centralising the data, maintaining the autonomy of the individual databases. It also provides a unified interface for querying and retrieving information from several sources, promoting interoperability and facilitating real-time decision-making.
Data governance
Assembly of policies and processes, coordination aspects, data usage and accessibility principles and data management procedures for a certain health data infrastructure to ensure legal compliance, consistency and good data quality throughout the different stages of the data life cycle.
Data Governance Act
European regulation (DGA) crafted to establish a structured framework that fosters European data spaces and trust among stakeholders within the data market. Enacted in June 2022, its provisions became applicable in September 2023 [50].
Data harmonisation
The process of removing systematic differences between images acquired from different scanners (i.e., inter-scanner variability) via statistical methods. Such techniques make it possible to pool multi-centre datasets and obtain greater statistical power than when centres work independently. Given the high economic costs of imaging, multi-centre collaboration is the most feasible way to acquire large imaging datasets [13].
Data intermediation service
Service aimed at fostering commercial engagements for the purpose of facilitating data sharing among an indeterminate cohort of data subjects, data holders, and data users. This facilitation is achieved through various technical, legal, or alternative means, with a particular emphasis on upholding the rights of data subjects concerning personal data. This definition excludes the following categories:
- Services engaging in the aggregation, enrichment, or transformation of data from data holders to augment its value significantly, subsequently licensing the resultant data for use by data users without forging direct commercial relationships between data holders and users.
- Services primarily focused on intermediating copyright-protected content.
- Services exclusively employed by a single data holder to facilitate data usage or utilised by multiple legal entities within a confined consortium, such as supplier-customer relationships or contractual collaborations, especially those aimed at sustaining the functionalities of interconnected Internet of Things (IoT) devices.
- Data sharing services extended by public sector entities lacking the intent to establish commercial ties (DGA – Article 2 (10)) [48].
Data mapping
The process of matching fields from multiple datasets into a centralised database. It is required to transfer, ingest, process, and manage data [14].
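A minimal sketch of field-level mapping, assuming two providers that name the same attributes differently. The provider names, source field names and target schema below are hypothetical.

```python
# Hypothetical source-to-target field maps for two data providers.
FIELD_MAPS = {
    "site_a": {"pat_sex": "sex", "modality_code": "modality"},
    "site_b": {"Gender": "sex", "ImagingType": "modality"},
}

def map_record(source, record):
    """Rename a source record's fields to the shared target schema,
    dropping any field that has no mapping."""
    field_map = FIELD_MAPS[source]
    return {target: record[src] for src, target in field_map.items() if src in record}

mapped = map_record("site_a", {"pat_sex": "F", "modality_code": "MR", "internal_id": 7})
```

Real mappings usually also translate values (e.g. local codes to a shared coding system), which this sketch leaves out.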
Data Holder (DH)
A Data Holder refers to any natural or legal person, including entities, bodies, and research organisations in the health or care sectors, as well as European Union institutions, bodies, offices, and agencies, who has the right, obligation, or capability to make certain data available for research purposes. This may include registering, providing, restricting access to, or exchanging the data. Examples of Data Providers include data repositories, regional biobanks, clinical centres, cancer screening programs, public entities, pharmaceutical companies, data altruism initiatives, and publication repositories. These infrastructures may host one or more datasets for discovery and retrieval, and the exposure and access to data in the Dashboard will be provided at the dataset level.
Data Protection Task Force
Body that plays the role of the Data Protection Officer (DPO) during both the project execution and beyond. It will monitor internal compliance, inform, and advise on data protection obligations, provide advice regarding Data Protection Impact Assessments and act as a contact point for all the partners and data subjects (the results of this task being documented in D3.6 – Data Management Plan). During the project execution phase, the main representatives of this task force will involve the DPOs of each consortium partner. Upon project end, the members of this board may need to be re-elected.
Data quality
The degree to which a set of inherent characteristics of data fulfils requirements [15].
Notes: The requirements are defined by the purpose of the processing and hence data quality can be viewed in other words also as a “fitness for purpose”. The purpose can be any use of the data, including primary use or secondary use.
For the purpose of data protection, data quality refers to a set of principles laid down in Article 5 of the GDPR and Article 4 of Regulation (EU) 2018/1725, namely [16]:
- Lawfulness, fairness and transparency
- Purpose limitation
- Data minimization
- Accuracy
- Storage limitation
- Integrity and confidentiality
Data recipient
An individual or entity, whether legal or natural, engaged in activities pertinent to their trade, business, craft, or profession, distinct from the user of a product or associated service, to whom the custodian of data furnishes information. This includes third parties to whom data is disclosed upon request by the user to the data holder, or in compliance with obligations delineated by Union law or national legislation implementing Union directives.
Data sharing
Provision of data by a data provider to a data user for the purpose of joint or individual use of the shared data, based on conditions of use, directly or through an intermediary [2].
Data Sharing Agreement (DSA)
Agreement between two or more parties that outlines which data will be shared and how the data can be used.
Data sovereignty
The principle that data stored outside of an organisation’s host country are still subject to the laws of the country where the data are stored [17].
Data space
A distributed system delineated by a governance framework, facilitating secure and reliable data transactions among participants, with a focus on upholding trust and data sovereignty. A data space typically consists of one or more infrastructures and supports various use cases.
Data steward
A person who has an administrative role and does not typically use the data. Data Stewards create guidelines to make data FAIR and advise on how to apply them. They may or may not have direct responsibility for the data at hand (as processors).
Data subject
As defined in the GDPR, in the case of data processing, a data subject is a person who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person [18].
Data transaction
The outcome of an interaction between two participants, aimed at sharing, accessing, exchanging, or processing data.
Explanatory text: A data transaction denotes the sharing of data among involved participants, encompassing technical, financial, legal, and organisational arrangements required to facilitate the availability of a dataset from Participant A to Participant B. The physical transfer of data may or may not occur concurrently with the data transaction.
Data Transfer Agreement
Agreement established between organisations that governs the transfer of one or more data sets from the owner/provider to a third party.
Data User-Researcher
Any person or entity that wants to explore the public catalogue and eventually request access to data and process them using either the tools available in the platform or their own AI tools. This data access request by the Data User-Researcher should be made through a Research and Development (R&D) project that will be evaluated by the Access Committee.
Dataset
Dataset refers to a specific set of imaging and accompanying clinical information, published by a single Data Provider and created for a particular purpose or study. A dataset is described by a set of common metadata elements related to the imaging and clinical information, dataset creation, access rights and terms of use. Data Users-Researchers will be able to request access at the Dataset-level.
De-identification
General term for any process of removing the association between a set of identifying data and the data subject (22). De-identification refers to the removal of identifiers (e.g. name, address, National Registration Identity Card number) that directly identify an individual. De-identification is sometimes mistakenly equated with anonymisation; however, it is only the first step of anonymisation. A de-identified dataset may easily be re-identified when combined with data that is publicly or easily accessible [54].
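The distinction drawn above can be sketched as follows: direct identifiers are removed, while indirect ones remain in the record. The identifier list and field names are hypothetical, and the helper is an illustration rather than a compliant de-identification procedure.

```python
# Hypothetical list of direct identifiers; a real project would define its own.
DIRECT_IDENTIFIERS = {"name", "address", "national_id"}

def de_identify(record):
    """Return a copy of the record without direct identifiers.
    Indirect identifiers (e.g. birth date, postcode) remain, so the
    result is de-identified but not anonymised."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {"name": "Jane Doe", "national_id": "X123", "birth_date": "1970-01-01", "modality": "CT"}
cleaned = de_identify(patient)
```

The remaining `birth_date` field shows why re-identification is still possible when such a record is combined with other accessible data.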
De facto anonymization
Operations by which personal identifiers are removed and further techniques to reduce personal reference (e.g. randomization or generalisation) are applied so that re-identification with reasonable efforts in accordance with the current state of the art is no longer possible. In terms of GDPR compliance this concept requires that data is kept in a closed secure environment that will exclude any external attack. An “attacker” here is a third party (i.e. neither the data provider nor the data user) who accesses the original data sets accidentally or intentionally.
Demonstration Experiments
Computational experiments performed with selected platforms to showcase their capabilities. In the context of the EUCAIM project, these experiments demonstrate the functionality of federated learning platforms and distributed analysis tools in solving specific data analysis challenges. The outcomes contribute to understanding technical issues in a real distributed scenario.
ELIXIR Tools Platform
Centralised resource for accessing and discovering tools in the life sciences, promoting collaboration and efficiency in research. It includes recommendations for software registration in bio.tools ELIXIR registry, packaging with Biocontainers, and participation in services like OpenEBench for software quality monitoring.
ETL process
The three-phase process where data is extracted, transformed (cleaned, sanitised, scrubbed) and loaded into an output data container.
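The three phases can be sketched with the standard library alone. The CSV layout, column names and cleaning rules below are assumptions chosen for illustration.

```python
import csv
import io

def extract(csv_text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: trim whitespace, normalise case, drop rows without an age."""
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        if not age:
            continue
        cleaned.append({"modality": row["modality"].strip().upper(), "age": int(age)})
    return cleaned

def load(rows, container):
    """Load: append the cleaned rows to the output container."""
    container.extend(rows)
    return container

# Hypothetical raw input with stray whitespace and one missing age.
raw = "modality,age\n mr ,54\nct,\nCT,61\n"
output = load(transform(extract(raw)), [])
```

Here the output container is a plain list; in practice it would be a database table or a file in the target format.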
Ethical and Legal Board (ELB)
Body in charge of ensuring that no EU rule is violated and that the research conducted meets accepted EU standards. In this context, the term “Ethics” refers to questions of legal and regulatory compliance that constitute a part of the governance process. In EU-funded projects, ethics is deemed a transversal issue and the Ethics Advisory Board a key oversight mechanism for ensuring understanding of the Ethics Appraisal Procedure, proper implementation of the Ethics Requirements, handling of specific issues such as Privacy and Data Protection Impact Assessments or Artificial Intelligence, and ethics compliance in general. The ELB will act as a contact point for guidance on ethical issues that may arise during project execution and beyond project end, working in close connection with any party charged with ethics-related responsibilities. During the project execution, the ELB will be chaired by the WP3 leaders and composed of legal experts from the participating entities. Beyond project end, the members of this board may be reselected based on availability.
EUCAIM Federation / European Federation of Cancer Images
The entity as a whole, which encompasses both the central and federated components (the central repository functions as another node within the federation). The term “federation” encompasses the overall scope of EUCAIM, involving the governing bodies and orchestration of all nodes, whether central or federated. It constitutes the collective framework for coordination and governance.
EUCAIM Platform
The overarching framework that combines the distributed data throughout the federation, including both central and federated components, with the services facilitating their use. The platform serves as the integrated infrastructure, providing access to images and associated services within the EUCAIM Federation.
EUCAIM Infrastructure
The collective technical foundation supporting the EUCAIM Federation. It includes both central and federated components, forming the backbone that enables data distribution, access, and associated services.
European Data Innovation Board
A distinguished expert group convened pursuant to the mandates outlined in the Data Governance Act (DGA), entrusted with advising the European Commission on the dissemination of exemplary methodologies. Its focal areas encompass data intermediation, data altruism, and the judicious utilisation of public data not amenable to open data practices. Additionally, the EDIB is tasked with orchestrating the harmonisation of cross-sectoral interoperability standards, thus presenting proposals for harmonised guidelines governing European data spaces, as stipulated in Article 30 of the DGA. Further, the EDIB is slated to acquire expanded competencies under the auspices of the Data Act [52].
European Digital Infrastructure Consortium (EDIC)
The Digital Decade policy programme 2030 establishes a new legal framework for multi-country projects, the European Digital Infrastructure Consortium. It is a new instrument to help Member States speed up and simplify the setup and implementation of multi-country projects. A minimum of three Member States who want to use a European Digital Infrastructure Consortium to set up a multi-country project will submit an application to the Commission. Following the examination of Member States’ application, the Commission will, if it concludes that all requirements provided for in the decision are satisfied, adopt a decision establishing the European Digital Infrastructure Consortium. Each consortium will have its own legal personality, governing body, statutes, and seat in a participating Member State [20].
External partners
In the context of an EU project, an external partner typically refers to an organisation or entity that is not a part of the original consortium, but is engaged or involved in the project in some way (e.g. via Open Call). External partners may include organisations, institutions, companies, or individuals who collaborate with the consortium members to contribute to and broaden the project’s objectives, outcomes, or activities, thereby becoming part of the consortium and, therefore, internal partners.
External use case
Any use case conducted by external partners (see also definition for “use case”).
Federated Catalogue
Metadata catalogue that stores the clinical and imaging metadata within the different federated nodes of the Atlas of Cancer Images, as a federated search endpoint compliant with the EUCAIM federated query requirements.
Federated data analysis
Federated data analysis describes an analysis that is performed on multiple (often geographically) separated datasets. During this analysis, the data is not exchanged and can stay, for example, behind a given institution’s firewall. Only the interim results of a local analysis are exchanged between the data-hosting sites [21]. The aggregated non-identifiable results from each local analysis are pooled and returned to the data user.
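A minimal sketch of this pattern, assuming the analysis is a simple mean: each site returns only a count and a sum (the interim results), and the central step pools them. The function names and values are hypothetical.

```python
def local_summary(values):
    """Run at each site: return only the count and sum, never the raw values."""
    return {"n": len(values), "total": sum(values)}

def pooled_mean(summaries):
    """Run centrally: combine the interim results into a global mean."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n

# Hypothetical local datasets that stay at their respective sites.
site_a = [10.0, 20.0]
site_b = [30.0, 40.0, 50.0]
result = pooled_mean([local_summary(site_a), local_summary(site_b)])
```

The pooled mean equals the mean that would be obtained on the combined data, although the individual values were never exchanged.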
Federated learning
This is a specific case of federated data analysis, for machine learning purposes. It is a learning technique that allows users to collectively reap the benefits of shared models trained from rich datasets. The learning task is conducted across multiple separate sites coordinated centrally. Each site has a local training dataset which is never shared. Instead, each site computes an update to the current global model maintained centrally, and only this updated model is communicated [22].
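The round structure described above can be sketched for a one-parameter model. This sketch assumes a linear model y = weight * x, a single gradient step per site per round, and sample-count-weighted averaging at the centre; these choices, the function names and the data are illustrative assumptions, not the EUCAIM implementation.

```python
def local_update(weight, data, lr=0.1):
    """Run at a site: one gradient-descent step on the local data for the
    model y = weight * x with squared-error loss. Only the updated weight
    and the local sample count leave the site."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad, len(data)

def aggregate(updates):
    """Run centrally: average the site weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

def train(sites, rounds=50):
    """Repeat: send the global weight out, collect updates, aggregate."""
    weight = 0.0
    for _ in range(rounds):
        updates = [local_update(weight, data) for data in sites]
        weight = aggregate(updates)
    return weight

# Hypothetical local datasets of (x, y) pairs generated by y = 2 * x.
sites = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
global_weight = train(sites)
```

With these toy datasets the global weight approaches 2, the slope that generated the data, without any site sharing its (x, y) pairs.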
Federated Node/Local Node
Infrastructure deployed in a Data Provider that meets the hardware and software requirements of the EUCAIM project and that has been configured and connected to the federated network, being able to access Hyper-ontology compliant federation-exposed collections (under research project approval) and execute federated processing, including federated learning.
Federated Processing (FP) Infrastructure
Technical framework designed to facilitate federated analysis, which involves processing data without centralising it. In the context of the EUCAIM project, FP Infrastructure enables the execution of tasks, including AI training and inference, while keeping data decentralised at their original sites, adhering to specific regulatory frameworks.
FAIR Principles
Principles to define the Findability, Accessibility, Interoperability, and Reuse of resources for humans and computers at the source. For example, the principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data [23].
- Findable: Data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier.
- Accessible: Metadata and data are understandable to humans and machines. Data is deposited in a trusted repository.
- Interoperable: Metadata uses a formal, accessible, shared, and broadly applicable language for knowledge representation.
- Reusable: Data and collections have a clear usage licence and provide accurate information on provenance [24].
Filing system
Any structured set of personal and non-personal data which are accessible according to specific criteria, whether centralised, decentralised or dispersed on a functional or geographical basis [25].
Governing Body
The party that encompasses the board of EUCAIM and can approve, comment on, or refuse an application for data access or for data or tool provisioning, supported by legal, ethical, and technical boards.
Genetic Data
Personal data relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question [26].
Health data
Personal data related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status [27].
Health information
All organised and contextualised data on population health and health service activities and performance, individual or aggregated, that improves health promotion, prevention, care, cure and policy-making [28].
Health Information System (HIS)
A health information system is the total of resources, stakeholders, activities and outputs enabling evidence-informed health policy-making. The health information system manages all types of health data, from EHRs to imaging data and population health data. HIS activities include data collection, interpretation (analysis and synthesis), health reporting, and knowledge translation, i.e. stimulating and enhancing the uptake of health information into policy and practice. Health information system governance relates to the mechanisms and processes to coordinate and steer all elements of a health information system [29].
Hospital
Within the EUCAIM project framework, affiliated hospitals constitute a distinct subset of Data Providers. In this case, hospitals will not expose their data warehouses, which may already exist or may have been created for EUCAIM, to the federation. Instead, they will be approached individually each time there is a new research project on specific clinical cases. If they choose to participate in the project, hospitals will prepare the necessary anonymised datasets within their data warehouses, and these datasets will be shared with the federation through a federated node or by uploading them to the Central Storage. Therefore, hospitals will only expose metadata for specific datasets from projects in which they have chosen to participate upon request.
Hyper-ontology
Hyper-ontology refers to the creation of a comprehensive and unified semantic representation of fundamental knowledge defined in the local AI4HI projects. The main purpose of the Hyper-ontology is to facilitate the development of validated clinical decision-making systems that support diagnosis, treatment, and predictive medicine, to benefit citizens. The Hyper-ontology serves as a standardised framework that enables interoperability between different projects using OMOP and FHIR standards. It allows for the expression of federated queries, enabling the analysis of distributed data sources. Additionally, a subset of the Hyper-ontology will be dedicated to describe datasets.
Internal use case
Any use case conducted by internal partners (see also the definition of “use case”).
Imaging study
The use of a variety of imaging techniques to acquire visual representations that serve as tools for the screening, detection and monitoring of cancer.
Legal entity
A company or organisation that has legal rights and responsibilities. From a legal point of view, it has its own personality and full capacity to fulfil its purposes. In legal relations, it is the holder of rights and obligations. It can be created directly by the law or in accordance with the provisions of the law.
Licensing Agreement
In Europe, the licence is generally considered as a contract between a Licensor (the author of the software) and a Licensee (the user of the software, who can then use it according to the licence terms). Note that if the Licensee does not agree to the licence terms, he/she normally does not have the right to use, copy, change or distribute the software. If the Licensee does this without agreeing to the licence terms, he/she is violating copyright law.
Machine learning
A subset of AI techniques based on the use of statistical and mathematical modelling techniques to define and analyse data and to learn patterns from them. The learned patterns are then applied to perform or guide certain tasks and make predictions [30].
Management Board (MB)
The operational body responsible for the monitoring of the technical progress of the project, quality assurance, and the ad-hoc coordination of scientific and technological activities. It comprises the Administrative Project Coordinator, the Scientific Coordinator (SCo) (chair), and all Work Package leaders (WPLs).
Upon project end, the MB is also envisioned to be in charge of any decision making regarding any technical implementations and quality control of all operations regarding the day-to-day functioning of the infrastructure, including the coordination of scientific activities around it.
Marketplace
Centralised platform within the EUCAIM federation that facilitates the exchange and distribution of processing tools, services and applications developed by Software Providers. It serves as a repository where Software Providers can contribute their tools for federated processing or data preprocessing purposes to be used by Data Users-Researchers.
Mediator
A component responsible for connecting to the central infrastructure, translating the federated query into the site’s local query language (Structured Query Language (SQL) for sites providing OHDSI OMOP-CDM compliant data, Clinical Quality Language (CQL) for sites providing FHIR compliant data), aggregating (and optionally obfuscating) the results, and finally returning the aggregated results to the central components. The Mediator acts as a sort of middleware and is deployed at the site of each Data Junction.
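The Mediator steps described above (select the local query language, aggregate, optionally obfuscate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `LOCAL_LANGUAGE`, `obfuscate` and `mediate`, the model keys `"omop-cdm"` and `"fhir"`, and the choice of rounding counts up to a multiple of ten as the obfuscation rule. The `matches` callable stands in for running the translated query against the local store; no real query is generated.

```python
from typing import Callable, Iterable

# Hypothetical mapping from a site's data model to its local query language.
LOCAL_LANGUAGE = {"omop-cdm": "SQL", "fhir": "CQL"}


def obfuscate(count: int, step: int = 10) -> int:
    """Round a positive count up to the next multiple of `step`.

    Assumed obfuscation rule, chosen only to show the idea of hiding
    small cell counts; the real rule is not specified in this glossary.
    """
    if count <= 0:
        return 0
    return -(-count // step) * step  # ceiling division, then scale back


def mediate(
    site_model: str,
    records: Iterable[dict],
    matches: Callable[[dict], bool],
    obfuscated: bool = True,
) -> dict:
    """Run one federated query locally and return only an aggregate."""
    language = LOCAL_LANGUAGE[site_model]  # KeyError for unsupported models
    # `matches` is a stand-in for executing the translated local query.
    count = sum(1 for record in records if matches(record))
    return {
        "language": language,
        "count": obfuscate(count) if obfuscated else count,
    }
```

Only the aggregate leaves the function, which mirrors the idea that record-level data stay at the site.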
Memorandum of Understanding (MoU)
A memorandum of understanding (MoU) is a type of agreement between parties. It expresses a convergence of will between the parties, indicating an intended common line of action. It is often used either in cases where parties do not imply a legal commitment or in situations where the parties cannot create a legally enforceable agreement.
Message Broker
A software intermediary that enables seamless communication between various components within the federated processing system. This broker functions as a central hub, overseeing the exchange, routing, and delivery of messages. Specifically, it plays a key role in coordinating tasks related to federated analysis, ensuring the secure and efficient flow of information between the Analysis Platform and distributed nodes.
Metadata
A set of data that defines and describes a resource (e.g., data, dataset, sample…) so that it can be understood, discovered and reused. There are different levels of metadata. Since metadata can be used to describe different aspects of data, we can group metadata properties in terms of quality, availability, provenance, processing, among others. Then there are metadata catalogues that can be used to describe the available datasets. Metadata is important to make data understandable, and can contribute to increase the findability, accessibility, interoperability and reusability of the data. Metadata can be collected or compiled in repositories to improve the level of compliance with FAIR principles of the datasets.
Metadata harvesting
Automated collection of metadata descriptions from different sources to create useful aggregations of metadata and related services [31].
Negotiator
The Negotiator is a specialised tool integrated into the EUCAIM Dashboard and designed to facilitate the exchange of documents and information between User-Researchers and the Access Committee. On the one hand, the Negotiator allows users to submit requests for data or software to one or several holders as selected in a previous discovery step in the EUCAIM catalogue. On the other hand, the Negotiator also allows users to build new research projects by facilitating the negotiation with a specific EUCAIM network of contacts according to their objectives and needs. In both cases, the negotiation mechanism allows the Access Committee and, ultimately, the Data or Software Holder itself, where appropriate, (a) to obtain more information from the requester to better understand the reason for the request and the requested data in this broadcast mode, (b) to enter a negotiation process with the requester, or (c) to step back from a request if they consider that they cannot fulfil it for some reason.
Non-personal data
All data other than personal data. Note that non-personal data could be inextricably linked with personal data or be used in order to obtain inferences of persons’ qualities; in such case, GDPR and national data protection laws must apply [2].
Open Call
An open call for extending the real-world use cases will be launched. The open call is only for new beneficiaries to join the consortium. This open call will follow the guidelines stated in the call with respect to publication and openness and will pursue: i) the onboarding of new data providers, increasing the geographic dimensions, data modalities or cancer targets; and ii) the uptake of new trustworthy AI algorithms trained on the data of the repository. The open call will be published by the beginning of the second project period, in compliance with the terms and conditions stated by the European Commission (see Budget for Open Call).
Ontology Requirements Specification (ORS)
ORS refers to the activity of collecting the requirements that the ontology should fulfil (e.g., reasons to build the ontology, target group, intended uses) and the ontology requirements (e.g., groups of competency questions), possibly reached through a consensus process.
Open data
Data that is freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. Open licence is a licence agreement which contains provisions that allow other individuals to reuse another creator’s work, giving them four major freedoms. Without a special licence, these uses are normally prohibited by copyright law or commercial licence. Most free licences are worldwide, royalty-free, non-exclusive, and perpetual (see copyright durations). Free licences are often the basis of crowdsourcing and crowdfunding projects [32].
Open science
The movement to make scientific research (including publications, data, physical samples, and software) and its dissemination accessible to all levels of an inquiring society, amateur or professional. Open science is transparent and accessible knowledge that is shared and developed through collaborative networks. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practise open-notebook science, and generally making it easier to publish and communicate scientific knowledge [33].
Orchestrator
Daemon that operates at each data node within the federated processing infrastructure. It connects to the Message Broker to obtain assigned tasks and initiates the execution of software required for the federated processing. The Orchestrator interacts with the local execution infrastructure, ensuring the proper execution of tasks on each data node.
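The interaction between the Orchestrator and the Message Broker can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the EUCAIM implementation: a standard-library `queue.Queue` stands in for the Message Broker, the task fields (`id`, `tool`, `params`) and the function name `run_orchestrator` are hypothetical, and the loop drains the queue once instead of running indefinitely as a real daemon would.

```python
import queue
from typing import Callable


def run_orchestrator(
    broker: "queue.Queue[dict]",
    tools: dict[str, Callable[[dict], object]],
) -> list[dict]:
    """Fetch assigned tasks from the broker and execute them locally."""
    results = []
    while True:
        try:
            task = broker.get_nowait()  # obtain the next assigned task
        except queue.Empty:
            break  # a real daemon would wait for new tasks instead
        tool = tools.get(task["tool"])
        if tool is None:
            # The requested software is not available on this data node.
            results.append({"task_id": task["id"], "status": "rejected"})
            continue
        output = tool(task.get("params", {}))  # local execution
        results.append({"task_id": task["id"], "status": "done", "output": output})
    return results
```

A node that registers a single tool would process one matching task and reject any task that names a tool it does not have.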
Permission
In the realm of data governance, “permission” denotes the authorisation granted to data users, empowering them with the right to process non-personal data (DGA Art. 2(6)).
Personal data
According to Article 3 (1) of Regulation (EU) 2018/1725: “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person” [34].
Name and the social security number are two examples of personal data which relate directly to a person. However, the definition extends further and also encompasses for instance e-mail addresses and the office phone number of an employee. Other examples of personal data can be found in information on physical disabilities, in medical records and in an employee’s evaluation.
Personal data which is processed in relation to the work of the data subject remain personal/individual in the sense that they continue to be protected by the relevant data protection legislation, which strives to protect the privacy and integrity of natural persons.
As a consequence, data protection legislation does not address the situation of legal persons (apart from the exceptional cases where information on a legal person also relates to a physical person).
Personal data breach
A breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed [35].
Platform Manager
A technical expert or team of experts that will operate the core services of the EUCAIM platform. The Platform Manager is responsible for managing and maintaining the underlying infrastructure of the central storage, including servers, databases, and other resources. The Platform Manager manages user accounts and access permissions, deploys applications, and services, uploads new applications to the marketplace (provided by Software Providers), and ensures their proper integration into the platform. They support the orchestration of federated processing, working with Data Providers/Holders/Controllers and Software Providers to integrate metadata, tools, and services, and ensuring that Data User-Researchers queries are properly executed. As a team of experts, it is possible to have multiple platform managers assigned to different roles such as security and data privacy, administration, development, system management, etc. Additionally, the Platform Manager provides user support, responds to inquiries, provides documentation, and troubleshoots issues that arise with the platform.
Platform User Roles
A user profile identified by a name and by the user stories that the user can perform, which determine the access permissions and any other authorised activity required to carry out the actions in those user stories.
Primary use of data
The use of any data for the purpose for which it was originally collected.
Privacy-preserving learning techniques
AI-based merging techniques that help preserve patients’ privacy. The building blocks of privacy-preserving machine learning are federated learning, homomorphic encryption, and differential privacy. They can use tricks from cryptography and statistics [36].
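As an illustration of the statistical side of these techniques, the following Python sketch applies the Laplace mechanism, a standard construction in differential privacy, to a count query. The function name `dp_count` and its parameters are assumptions made for this example, and nothing here is stated to be the mechanism used in EUCAIM. Laplace noise with scale sensitivity/epsilon is drawn as the difference of two exponential samples.

```python
import random


def dp_count(
    true_count: float,
    epsilon: float,
    rng: random.Random,
    sensitivity: float = 1.0,
) -> float:
    """Return `true_count` plus Laplace noise of scale sensitivity/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon / sensitivity  # rate of each exponential = 1 / scale
    # The difference of two i.i.d. exponentials follows a Laplace distribution.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise
```

A smaller epsilon yields larger noise and therefore stronger privacy at the cost of accuracy; individual outputs vary, but their average stays close to the true count.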
Processing (personal and non-personal)
Any operation or set of operations which is performed on data or on datasets, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction [37].
Profiling
Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements [38].
Pseudonymization
The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person [39].
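One common way to realise this definition is keyed hashing, sketched below in Python with the standard-library `hmac` and `hashlib` modules. This is an illustrative assumption, not a description of the method used in EUCAIM; the function name `pseudonymise` and the example identifiers are hypothetical. The secret key plays the role of the "additional information" that must be kept separately under technical and organisational safeguards.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier with HMAC-SHA256.

    Without `secret_key`, the pseudonym cannot be recomputed from a
    candidate identifier, so the key must be stored apart from the data.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

The same identifier and key always produce the same pseudonym, so records belonging to one data subject can still be linked within a dataset, while a different key produces unrelated pseudonyms.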
Public catalogue
Metadata catalogue available to anonymous and authenticated users, offering the visualisation of the datasets metadata, with basic centralised filtering/faceted search options. This catalogue stores metadata, offering the Data User-Researchers basic descriptive information about the available datasets and their data access conditions.
Repository
A storage for digital information, typically organised in the form of a catalogue of datasets, that can be searchable and can provide access to the data under given conditions.
Responsible AI
AI that is designed, developed, evaluated, and monitored by employing an appropriate code of conduct and appropriate methods to achieve technical, clinical, ethical, and legal requirements (e.g., efficacy, safety, fairness, robustness, transparency) [40].
Research Communities (RCs)
Groups or entities with a common research goal, typically formed through the course of already finalised, currently ongoing or newly emerging projects, that would like to make use of EUCAIM’s research environment to continue the research their original project facilitated in the first place. To this end, the community taking part in that project (e.g. consortium) will need to agree to transfer the data collected (together with the tools developed through the project lifespan where applicable) to EUCAIM’s Central Repository. The expectation is that the Research Community (RC) will remain connected via EUCAIM and will as a result be able to continue and further expand the work done in the scope of such a project via EUCAIM. In return, EUCAIM will include the related datasets in its catalogue, providing the RCs with a secure and highly interoperable environment and enabling them to initiate new projects within the EUCAIM infrastructure, while establishing new collaborations with other partners connected to EUCAIM.
Research software
Research software includes individual pieces of software (e.g. tools), analytical workflows (compositions of two or more individual tools and, possibly, other workflows), platforms (e.g. for federated learning) and other auxiliary software that is important for carrying out the scientific activities expected in the project.
Restriction of processing (personal and non-personal data)
As defined by the GDPR, methods by which to restrict the processing of data could include, inter alia, temporarily moving the selected data to another processing system, making the selected personal data unavailable to users, or temporarily removing published data from a website. In automated filing systems, the restriction of processing should in principle be ensured by technical means in such a manner that the personal data are not subject to further processing operations and cannot be changed. The fact that the processing of data is restricted should be clearly indicated in the system [41].
Secondary use of data/data re-use
Secondary use refers to using data for a different purpose than the one it was originally collected for (i.e. than the primary use).
According to the European Data Governance Act 2020 ‘re-use’ means the use by natural or legal persons of data held by public sector bodies, for commercial or non-commercial purposes other than the initial purpose within the public task for which the data were produced, except for the exchange of data between public sector bodies purely in pursuit of their public tasks [2].
Clinical definition: Secondary use of health data applies personal health information (PHI) for uses outside of direct health care delivery [42].
Secure processing environment
The physical or virtual environment and organisational means to provide the opportunity to re-use data in a manner that allows the operator of the secure processing environment to determine and supervise all data processing actions, including the display, storage, download and export of the data and the calculation of derivative data through computational algorithms [2].
Sensitive data
Information that is regulated by law due to possible risk for plants, animals, individuals and/or communities and for public and private organisations. Sensitive personal data include information related to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership and data concerning the health or sex life of an individual. These data could be identifiable and potentially cause harm through their disclosure [43].
Service Level Agreement (SLA)
Document that establishes the terms and conditions for integrating a local node into the EUCAIM Federation. It will define the level of service, access to data and processing resources, technical interoperability requirements, and support supplied by the providers. The SLA will also outline service availability targets, constraints, and contact points for addressing any issues or inquiries related to the services within the federation.
Scientific Coordinator (SCo)
The Scientific Coordinator (SCo) of the project is the person who leads the Central Hub operations in all scientific and technical aspects and provides strategic scientific guidance. The Scientific Coordinator is a central figure in conflict resolution and decision-making in the project management bodies and plays a central role in the monitoring of the Project’s overall progress and strategic plans.
Stakeholder
A person or entity (such as an institution, hospital or research community) that is involved with an organisation, society, etc. and therefore has responsibilities towards it and an interest in its success.
Steering Committee (SC)
The Steering Committee is the highest-level decision-making body of the infrastructure and project consortium. It currently consists of one representative of each project partner entity, being chaired by the Scientific Coordinator. The members of the SC are required to be duly authorised to deliberate, negotiate and decide on all matters which fall under the responsibility of the Steering Committee as laid out in the Infrastructure Statutes.
During the project duration, the SC will discuss and decide on major modifications of the consortium membership (e.g., entry of new partners, withdrawal of partners), as well as on the work plan, project budget, intellectual property rights, etc. A more detailed description of these matters is given in Article 6.3.1 of the project’s Consortium Agreement.
Upon project end, the SC is envisioned to have the last word in the decision-making of any unresolved matter at a lower level (e.g. Technical board, Access Committee). In this context, the SC will be convened ad-hoc by its Chair – the Scientific Coordinator. It is expected that each project partner should be represented at the meeting by its designated representative or by their proxy if the former is not available.
Synthetic data
The concept of synthetic data generation is to take an original data source (dataset) and create new, artificial data, with similar statistical properties from it.
Keeping the statistical properties means that anyone analysing the synthetic data, a data analyst for example, should be able to draw the same statistical conclusions from the analysis of a given dataset of synthetic data as he/she would if given the real (original) data.
The use of synthetic data is growing in many fields: from training of artificial intelligence models within the health sector to computer vision, image recognition and robotics fields [44].
Sustainability
The ability of an entity, service or process to be maintained continuously over time [45].
Technical Board (TB)
A committee that provides technical guidance, supervision and control to the project, as part of the project governance structure.
The TB is first tasked with the review of the potential engagement of tools and service providers to EUCAIM. Technical partners have the responsibility to adopt a responsible research and innovation attitude when designing and developing their solutions, by following the guides and requirements of the ethical committees, with the lead and support of the Data Protection Task Force and Ethics Advisory Board.
Technical Showrooms
Workshops that provide insights into platforms and tools relevant to distributed and federated data analysis. These sessions aim to understand the capabilities and potential fit of different tools and platforms within the overall EUCAIM infrastructure.
Testing Data
Data used for providing an independent evaluation of the trained and validated AI system in order to confirm the expected performance of that system before its placing on the market or putting into service [4].
Tiers of technical data compliance
To accommodate different levels of data compliance with the DFF, three technical tiers have been established. These tiers are scalable and allow data to be upgraded as the datasets are used in new research projects. Each tier offers increased visibility and usability of the data within the EUCAIM community.
- Low compliance: public metadata catalogue search
- Medium level of compliance: federated query functionality
- Full compliance: distributed and federated processing
Software Provider (SP)
The Software Provider refers to any entity (startups, enterprises, research institutions, government agencies, non-profit organisations) that would like to contribute with processing tools, services, or applications they have developed to the EUCAIM’s marketplace for use in the federated processing purposes of the platform.
Tools
A digital or computerised resource that assists, enhances or executes an action or process [46].
Training Data
Data used for training machine learning algorithms (e.g., an artificial intelligence (AI) system) through fitting its learnable parameters [4].
Trustworthy AI
AI with proven characteristics such as efficacy, safety, fairness, robustness, transparency, which enable relevant stakeholders such as citizens, clinicians, health organisations and authorities to rely on it and adopt it in real-world practice [40].
Use case
A use case refers to a description of a specific scenario or situation in which EUCAIM is intended to be used to address a particular problem related to cancer data for clinical purposes. It outlines the steps and interactions involved in the process, and the expected outcomes of applying EUCAIM in a real-world setting. Examples of use cases in this context could include the development of AI-based diagnosis tools or the use of federated data sources to improve patient outcomes. Use cases will be used to drive the design of the EUCAIM infrastructure and to help define requirements, test functionality, and validate the process for clinical improvement.
User actions
Specific tasks or interactions that users can perform within the platform. These actions are related to the technical use of the platform and are specific to each user role. These actions are initiated by users to achieve specific objectives within the context of the User Stories, that describe the situation or scenario in which these actions take place.
User story
Descriptions of full interactions of a User Role with the EUCAIM platform, described in natural language. User Stories define in general terms the needs, restrictions, performance limitations, desired features, innovation capabilities and business models for the repository.
User’s Library
Area of the Dashboard where the authenticated Data Users-Researchers can add and remove the references of collections selected from the User’s Catalogue (either filtered using the federated query mechanism or not), request access to them, and view and manage their approved or rejected access requests.
Validation Data
Data used for providing an evaluation of the trained AI system and for tuning its non-learnable parameters and its learning process [4].
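The roles of Training Data, Validation Data and Testing Data defined in this glossary can be illustrated with a minimal Python sketch that partitions one dataset into three disjoint subsets. The function name `split_dataset`, the 70/15/15 default proportions and the fixed seed are assumptions chosen for illustration; they are not prescribed by EUCAIM or by the cited definitions.

```python
import random
from typing import Sequence


def split_dataset(
    items: Sequence,
    train_pct: int = 70,
    val_pct: int = 15,
    seed: int = 0,
) -> tuple[list, list, list]:
    """Shuffle `items` and split them into training, validation and testing sets."""
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    training = shuffled[:n_train]                  # fits learnable parameters
    validation = shuffled[n_train:n_train + n_val]  # tunes non-learnable parameters
    testing = shuffled[n_train + n_val:]            # held out for independent evaluation
    return training, validation, testing
```

Keeping the testing subset untouched until the end is what makes its evaluation independent of both training and tuning. In clinical imaging, a split is often made per patient so that images from one person do not appear in more than one subset; that refinement is outside this sketch.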