Executive summary

A robust data framework requires collective stewardship across the data lifecycle to ensure equitable, transparent and impactful AI-driven diagnostics for all communities.

Across many countries, particularly in low- and middle-income settings, diagnostic systems remain fragmented, uneven and difficult to scale. Health data are often stored across multiple platforms, collected in inconsistent formats and governed by unclear rules on access, privacy and quality. Almost half of the world’s population continues to lack timely and accurate diagnosis, and frontline health workers have to rely on incomplete, siloed or poor-quality data, making it difficult to deliver equitable care. Artificial intelligence (AI)-based tools can help, but only if the data supporting them are trustworthy, representative and well governed.

Without a robust data framework, AI-based tools face predictable risks[1]: models trained on incomplete or unrepresentative data can become biased[2], results can vary across population subgroups, outputs cannot be reproduced, and confidence in AI tools erodes among clinicians, programme managers and patients. Weak storage, versioning and governance structures make it difficult to track how data evolve, what version of a dataset was used to train a model, or how decisions about data access are made. These gaps directly undermine reliability, safety and fairness, three elements that are essential if AI-based diagnostic tools are to be useful in real-world health systems.

Therefore, a diagnostic data framework is essential, as AI-based tools cannot function effectively without good data. By establishing clear rules, technical practices and governance mechanisms, such a framework ensures that AI-based diagnostics are not only technically robust but also equitable, trustworthy and sustainable across the health systems that need them most.

This Data Framework for AI-Based Diagnostics is designed to guide ministries of health, implementing partners and developers of digital health solutions by providing a comprehensive, structured blueprint to support the ethical, equitable and technically sound implementation of AI technologies in healthcare diagnostics. The framework addresses the complete data lifecycle, including data collection, annotation, validation, sharing, monitoring and reuse, while embedding the governing principles of privacy, interoperability and inclusivity.

While the framework is deliberately context-agnostic, it is designed to be localized and adapted to national regulations and ground realities. It can be adopted as a national reference architecture, or it can be used to stress-test discrete investments, such as a tuberculosis (TB) screening pilot study, against minimum requirements for lawful, ethical and operationally feasible data use. Thus, this Data Framework for AI-Based Diagnostics aligns with widely recognized global principles for trustworthy health data and responsible AI, including the World Health Organization’s guidance on the ethics and governance of AI for health, UNESCO’s recommendation on the ethics of AI[3], and the OECD Recommendation on Health Data Governance (World Health Organization, 2021[4]; OECD 2016[5]; OECD 2022[6]).

This framework is rooted in global standards and principles, such as FAIR (Findable, Accessible, Interoperable, and Reusable)[7], and affords the inclusion of ethical data governance norms, including data privacy, consent, equity and responsible reuse. While technically comprehensive, the framework is also grounded in a human-centred approach that prioritizes data equity, diversity and transparency.

The framework is built on six pillars that describe the lifecycle of health data, including data collection, management, annotation and monitoring. This applies to various types of health data, including electronic medical records, electronic health records, personal health records, laboratory results and genomics information. Each pillar operates under three guiding principles: ensuring equitable access, protecting privacy and maintaining a secure, ethical and scalable foundation. The framework provides guidance on:

Governing principles: covering secure data storage, infrastructure, governance and privacy, aligned with legal and ethical norms.

Data collection: promoting demographic and clinical diversity and inclusive data types, and establishing standardized processes for digitization, terminology and language localization.

Data cleaning and validation: implementing automated pipelines for data quality checks, error reporting and integration with health data systems[8].

Data annotation and structuring: enabling diverse machine-learning approaches through appropriate data structuring, such as the creation of interoperable formats, annotation tools and semantic mapping.

Data integration and harmonization: encouraging multimodal data merging using application programming interfaces (APIs) and the FAIR principles to improve data utility and comparability.

Data sharing and reuse: ensuring responsible access, licensing and version control to support open science while protecting sensitive information.

Continuous monitoring and feedback: tracking data drift, maintaining model performance and fostering feedback loops to support ongoing improvement.

This framework serves as a strategic reference for shaping national and institutional approaches to AI in diagnostics. Although developed for AI-based diagnostics, its principles are broadly applicable across sectors and geographies, particularly in advancing data justice and digital health equity.


Governing principles

The governing principles align actions with core values, protect stakeholders’ rights and build trust in technology.

The governing principles of the data framework for AI-based diagnostics are designed to ensure data quality, security, ethical use and long-term sustainability (Figure 1).

These principles include robust storage and versioning strategies that combine hybrid (cloud and local) storage, encryption, automated backups and role-based access control and monitoring. Version control must extend beyond files: health systems should maintain a reproducibility log that links each dataset release (identifier and version) to the preprocessing pipeline, label set and model artefacts used, and records the reason for each material change (e.g. new source facilities, re-annotation or recalibration). Such discipline is essential because diagnostic datasets evolve continuously as new facilities come online, devices change and annotation rules mature. Without systematic versioning, it becomes impossible to trace how these changes impact model behaviour. Appropriate storage and version control also support auditability, enable safer model updates and ensure that results can be reproduced or rolled back when needed. In practice, this creates a transparent data lineage that strengthens both regulatory confidence and operational trust.
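To make this concrete, the reproducibility log described above can be sketched as a minimal structured record that links each dataset release to its pipeline, label set and model artefacts. The field names and versioning convention below are illustrative assumptions, not a prescribed schema:

```python
import json
from dataclasses import dataclass, field, asdict

# Illustrative sketch of one reproducibility-log entry; field names and
# version tags are assumptions, not a mandated standard.
@dataclass
class DatasetRelease:
    dataset_id: str               # stable identifier for the dataset
    version: str                  # dataset release version
    preprocessing_pipeline: str   # pipeline identifier and version
    label_set: str                # annotation/label-set version
    model_artefacts: list = field(default_factory=list)  # models built from this release
    change_reason: str = ""       # why this release differs from the last

    def to_json(self) -> str:
        """Serialize the entry for an append-only audit log."""
        return json.dumps(asdict(self), sort_keys=True)

entry = DatasetRelease(
    dataset_id="cxr-national",
    version="2.1.0",
    preprocessing_pipeline="preprocess@1.4",
    label_set="tb-labels@3",
    model_artefacts=["cad-model@0.9"],
    change_reason="two new source facilities added",
)
log_line = entry.to_json()
```

Appending one such line per material change yields the transparent data lineage described above: any trained artefact can be traced back to the exact dataset version and preprocessing it was built from.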

Data governance and privacy are equally critical. In this data framework, governance refers to the explicit allocation of decision rights, accountabilities and documentation across the data lifecycle, i.e. who may access, transform, link or release data; under what criteria; and with what auditable record. Privacy protection should be treated as risk management rather than a binary state of “anonymous” versus “identifiable”: de-identification[9] reduces but does not eliminate risk, particularly when datasets are linked or when models are vulnerable to inference attacks. Importantly, de-identification is reversible in principle, because indirect identifiers may still allow re-identification when combined with other data, whereas anonymization requires the removal or alteration of information to the extent that re-identification is no longer reasonably possible. Consequently, programmes should select controls proportional to context, guided by applicable national law and ethical review and subject to periodic reassessment[10][11][12].

Privacy threat modelling and residual risk management
Because de-identification[9] is not a guarantee of privacy protection, particularly for free text and high-dimensional data, programmes should undertake a documented privacy threat model before data linkage, external sharing or model training. At a minimum, the threat model should specify plausible attackers, attack surfaces (e.g. linkage, membership inference, model inversion) and residual risks, and it should justify the technical and procedural safeguards selected (e.g. controlled access environments, query logging, differential privacy where appropriate, and contractual prohibitions on re-identification)[13][11].
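One simple, auditable input to such a threat model is a measure of how unique records are on their indirect (quasi-) identifiers: unique combinations are the easiest targets for linkage attacks. A minimal k-anonymity-style sketch over hypothetical records follows:

```python
from collections import Counter

def min_group_size(records, quasi_identifiers):
    """Smallest equivalence-class size over the chosen quasi-identifiers.

    A result of k means every record shares its quasi-identifier
    combination with at least k-1 others (k-anonymity); k == 1 flags
    records that are unique and easiest to re-identify by linkage.
    """
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(combos.values())

# Hypothetical de-identified records: direct identifiers removed, but
# indirect identifiers (age band, sex, district) remain.
records = [
    {"age_band": "30-39", "sex": "F", "district": "North"},
    {"age_band": "30-39", "sex": "F", "district": "North"},
    {"age_band": "60-69", "sex": "M", "district": "East"},  # unique, so risky
]
k = min_group_size(records, ["age_band", "sex", "district"])
```

A low k does not by itself mean data cannot be shared; it is one residual-risk signal that should inform the choice of safeguards (controlled access, aggregation, suppression) documented in the threat model.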

Operational governance architecture (minimum viable set-up)
High-level principles only become operational when decision rights and escalation pathways are explicit. At a minimum, countries and implementing partners should document a governance architecture that covers: (i) who approves data collection and linkage, (ii) who authorizes dataset release and external access, (iii) who owns model-change decisions, and (iv) how incidents are investigated and disclosed. A pragmatic “minimum viable” governance set-up includes:

  • Data Steward (or Data Stewardship Committee) to set fitness-for-use criteria and approve major data transformations.
  • Data Access Committee to review requests, enforce purpose limitation and ensure that approvals are time-bound and auditable.
  • Clinical Safety and Quality Lead to assess patient-safety risks, workflow impacts and escalation procedures for model failure.
  • Information Security Lead to oversee access controls, secure environments and incident response.
  • Ethics/Legal review function to ensure that consent, public-interest justification and data protection obligations are satisfied.
  • Monitoring and Evaluation function to own post-deployment performance, drift monitoring and periodic equity reviews.

Infrastructure readiness, including reliable connectivity, data centres, hardware, power supply, supply chain logistics and technical support, ensures the operability and scalability of AI systems in real-world healthcare settings.

Together, these governing principles, summarized in Figure 1, establish a secure, compliant and equitable environment for developing and deploying AI in diagnostics, particularly in resource-limited settings.

FIGURE 1 The governing principles of a data framework for AI-based diagnostics.




Data framework

"Without trustworthy data foundations, AI in health cannot be safe, inclusive, or impactful." WHO, Ethics and Governance of AI for Health (2021).

Having established the foundational governing principles, the following section presents the core structure of the data framework, outlining how these principles shape the processes and safeguards that govern the entire data lifecycle.

A data framework is a structured set of standards, processes, roles and technical specifications that determine how health data are generated, curated, protected, linked and used across the data lifecycle. For AI-based diagnostics, such frameworks are particularly consequential because model performance and equity depend on upstream data quality and representativeness, as well as downstream monitoring and governance[4][14].

Therefore, the core objective of a data framework for AI-based diagnostics is twofold: (i) to make datasets fit for their intended analytical use, i.e. training, validation and deployment monitoring, and (ii) to render decisions made about data access and model change transparent, reviewable and accountable[5][13].

This framework draws on the FAIR principles to increase the discoverability and reusability of data while maintaining appropriate safeguards for sensitive, health-related information[7].

It also incorporates the CARE principles for Indigenous Data Governance (Collective benefit, Authority to control, Responsibility and Ethics) to foreground legitimacy, community benefit and respect for rights, particularly where datasets comprise data from Indigenous or otherwise marginalized communities[15].

Finally, the framework treats post-deployment oversight as a core design requirement. Because clinical environments, case-mix and workflows change over time, model calibration and performance can drift in ways that are not reliably captured by periodic audits alone; therefore, monitoring strategies must combine performance surveillance with input-data drift detection and pre-specified updating triggers[16][17][18].

A well-structured data framework is not merely a technical solution; it is a governance instrument that embeds equity, safety and transparency into routine operations. The intent is to enable responsible innovation while reducing the risk that AI systems amplify existing inequities or create new, harmful modes of failure[4][3].

The framework is designed to be globally applicable while allowing principled local adaptation. Adaptation should not dilute governance; rather, it should translate core requirements into context-appropriate artefacts (e.g. national data standards, data-sharing agreements and standard operating procedures) consistent with domestic law, institutional capacity and public expectations[5][3][4].

Countries are encouraged to use the country adaptation worksheet (Annex B) to: (i) map existing assets and gaps across the six pillars, (ii) define minimum national metadata and coding standards, (iii) specify data access and retention rules, and (iv) agree governance responsibilities across ministries, implementers and vendors. Where national digital health strategies exist, the worksheet can be used as a harmonization tool to ensure that AI-based diagnostic initiatives strengthen rather than fragment health information systems.

FIGURE 2 Data framework for AI-based diagnostics



This framework is built around six interconnected pillars, each representing a critical stage in the AI data lifecycle. Each component is aligned with a comprehensive set of guiding principles and technical standards drawn from globally recognized data governance models, including the FAIR principles[7], CARE principles[15], WHO AI ethics guidelines[4] and national health data standards. The framework spans critical areas (such as infrastructure, storage and versioning, data governance, and privacy) and integrates key technical processes (including data curation and validation) with current best practices, while also empowering practitioners and users to seamlessly incorporate data equity into their operations.



Equity, subgroup validity, and representational sufficiency

Equity in AI-based diagnostics is not achieved by intent alone; it is an empirical property that must be deliberately designed, measured and governed. A minimum expectation is that programmes pre-specify which population subgroups must be adequately represented in the data (for training and for post-deployment monitoring), and that they report subgroup performance with sufficient granularity to detect clinically meaningful disparities[19][2].

Subgroup planning should occur at the data collection design stage, not retrospectively. If a subgroup cannot be adequately represented, this limitation should be treated as an implementation risk. Mitigation may require targeted data acquisition, reweighting strategies, constrained use or additional clinical safeguards. Minimum subgroups to plan for (to be adapted to the local epidemiology and context) include:

  • Sex and gender (where recorded), including pregnant/post-partum status if clinically relevant.
  • Age strata (e.g. paediatric, adolescent, adult, older adult) aligned with clinical decision thresholds.
  • Geography and facility type (urban/rural, primary/secondary/tertiary, public/private) as a proxy for workflow and equipment variation.
  • Key risk groups for the diagnostic pathway (e.g. HIV status for TB screening, prior TB history, immunosuppression, and comorbidities that change radiographic presentation).
  • Socioeconomic and access proxies where ethically and legally permissible (e.g. insurance status, deprivation indices, distance-to-facility).

Programmes should report subgroup-based sensitivity, specificity, positive predictive value (or yield) and calibration metrics where applicable. They should also describe the actions taken if disparities are observed (e.g. recalibration, stratified thresholds, workflow changes or constrained deployment).
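The subgroup reporting described in this section can be sketched as a small computation over held-out screening results; the data below are hypothetical:

```python
def subgroup_confusion(y_true, y_pred, groups):
    """Per-subgroup sensitivity and specificity from binary labels.

    Reporting at this granularity is what allows clinically meaningful
    disparities between subgroups to be detected.
    """
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        out[g] = {
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "specificity": tn / (tn + fp) if tn + fp else None,
            "n": len(idx),  # subgroup size, needed to judge precision
        }
    return out

# Hypothetical results stratified by facility type.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
groups = ["urban", "urban", "urban", "urban", "rural", "rural", "rural", "rural"]
metrics = subgroup_confusion(y_true, y_pred, groups)
```

In practice the same stratification should be applied to calibration and yield, and each subgroup's sample size reported alongside the point estimates so that apparent disparities can be distinguished from sampling noise.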

Use cases

These use cases illustrate the application of the data framework, providing a model that can be scaled and adapted across diverse diagnostic contexts.

These use cases illustrate how the data framework can be operationalized in practical settings, bridging the gap between abstract principles and real-world diagnostic needs. By demonstrating standardized processes, from data collection and annotation to integration and sharing, they provide clear guidance on how AI-based diagnostic tools, such as computer-aided detection (CAD) for chest X-rays[23], can be developed, validated and deployed responsibly. They also emphasize the importance of using reference standards, such as microbiological reference standards (MRS) and composite reference standards (CRS), to ensure reliability, while aligning with global coding systems, for example, ICD-11 and SNOMED CT, to support interoperability.

By empowering users, policymakers, programme managers, clinicians and AI developers, these use cases serve as actionable blueprints. They enable health systems to make informed decisions on data governance, quality assurance and ethical reuse, while also equipping developers and researchers with well-structured datasets for innovation. Ultimately, they foster equitable and trustworthy AI[11][12] applications that strengthen diagnostics, improve patient outcomes and create scalable solutions adaptable to diverse contexts.


Use Case 1: CAD chest X-ray

Background and rationale

Chest X-ray is widely used as a triage test in diagnostic pathways for respiratory diseases, including TB screening. CAD can support scale-up, but its effectiveness depends on local calibration, data quality and workflow integration[4]. This use case illustrates how the data framework can be operationalized for a CAD chest X-ray programme through three linked workflows: community and facility screening, threshold calibration, and model training and evaluation.

Training, evaluation and release management: this use case illustrates how CAD chest X-ray tools can be implemented end-to-end within real-world health systems. It shows how chest X-ray data move from acquisition and quality assurance through AI model analysis, structured reporting and clinical integration, while highlighting the critical roles played by data standards, governance and interoperability at each step. By mapping the workflow to the data framework, the use case demonstrates how strong data foundations enable safe, reliable and scalable deployment of CAD tools in routine diagnostic settings.

Step 1

Data input

Image acquisition → Digital X-ray (DICOM format from X-ray machines/PACS or via API (DICOMweb, FHIR))


Alternative digital chest X-ray formats (JPEG, PNG)


Metadata Capture → patient ID, demographics (with de-identification/controlled access for research use)

Step 2

Data processing

De-identification (remove personally identifiable information from DICOM tags and overlays)


Quality check (image orientation, noise, resolution)


Preprocessing (normalization, resizing, lung segmentation, artifact removal), annotation (e.g. COCO format)

Step 3

CAD model

Feature extraction (deep CNN layers)


Classification (e.g. TB vs Non-TB, pneumonia, nodules)


Localization (bounding boxes, heatmaps)


Scoring (abnormality likelihood score, 0–100)

Step 4

Output

Structured report (findings + score)


Visualization overlay (heatmaps/bounding boxes)


Standard formats (DICOM-SR or HL7 FHIR for EMR integration)
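To illustrate the structured output named in Step 4, a minimal sketch of a CAD finding packaged in the shape of an HL7 FHIR DiagnosticReport follows. The extension URL, patient reference and score are placeholders, not an approved national profile:

```python
import json

# Illustrative FHIR R4 DiagnosticReport carrying a CAD abnormality score.
# All identifiers and the extension URL below are placeholders.
cad_report = {
    "resourceType": "DiagnosticReport",
    "status": "final",
    "code": {"text": "CAD chest X-ray analysis"},
    "subject": {"reference": "Patient/example-id"},  # de-identified reference
    "conclusion": "Abnormality likelihood score: 72/100",
    "extension": [
        {
            # Hypothetical extension definition for the numeric CAD score.
            "url": "http://example.org/fhir/StructureDefinition/cad-score",
            "valueDecimal": 72,
        }
    ],
}
payload = json.dumps(cad_report)  # wire format for EMR integration
```

A national implementation guide would constrain this shape further (required codings, conformance profiles), which is exactly the kind of artefact listed in Annex A.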

Step 5

Data sharing

With clinicians → PACS/RIS, EMR/EHR


With researchers → anonymized data plus CAD outputs via secure APIs


Options: CSV, PDF, API-based (FHIR/DICOMweb), dashboards (cloud/offline)

Monitoring and evaluation

Accuracy tracking (sensitivity, specificity, area under the curve over time)


Error reporting (false positives/negatives logged)


Performance drift monitoring


Continuous learning (feedback and training)


Audit compliance and ethical monitoring, e.g. WHO and US Food and Drug Administration (FDA) guidelines
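As one concrete option for the performance-drift monitoring listed above, the population stability index (PSI) compares the distribution of CAD scores at validation time with the distribution seen in routine use. The thresholds in the comment are a common heuristic, not a regulatory requirement; a minimal sketch:

```python
import math

def population_stability_index(expected, observed):
    """PSI between two binned score distributions given as proportions.

    A common heuristic (an assumption, not a universal rule) treats
    PSI < 0.1 as stable, 0.1-0.25 as moderate shift and > 0.25 as
    major drift warranting investigation.
    """
    psi = 0.0
    for p, q in zip(expected, observed):
        p = max(p, 1e-6)  # guard against log(0) in empty bins
        q = max(q, 1e-6)
        psi += (q - p) * math.log(q / p)
    return psi

baseline = [0.25, 0.25, 0.25, 0.25]  # CAD score bins at validation time
current = [0.10, 0.20, 0.30, 0.40]   # bins observed in routine use
drift = population_stability_index(baseline, current)
```

Crossing a pre-specified PSI threshold should trigger the update-governance pathway (investigation, possible recalibration) rather than silent retraining, consistent with the release-management discipline described above.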


FIGURE 3 Overview of TB testing with CAD chest X-ray

Figure 3 shows an end-to-end training and evaluation pipeline for CAD chest X-ray, spanning dataset curation and annotation, model development, internal and external validation, and packaging for deployment. For auditability and safe implementation, each model release should be accompanied by (i) a model card describing intended use, contraindicated uses, performance (overall and by subgroup), thresholding and limitations; (ii) a reproducibility log linking dataset and preprocessing versions to trained artefacts; and (iii) a monitoring plan specifying drift metrics, alert thresholds and update governance.

Standards for disease categorization (TB vs. non-TB)

For interoperability and comparability, TB-control programmes should use standardized coding where feasible (e.g. ICD for diagnosis coding), alongside nationally approved TB case definitions. Coding schemes should be documented and versioned, with mappings maintained whenever codes change.

Minimum TB categorization should record (i) bacteriologically confirmed vs. clinically diagnosed TB, (ii) disease site and (iii) treatment status using national programme categories.
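The minimum categorization above can be sketched as a small, versioned record. The enumerated values and the scheme tag are illustrative placeholders to be replaced by the nationally approved case definitions and coding scheme:

```python
from dataclasses import dataclass

# Placeholder value sets; a real deployment would take these from the
# national TB programme's approved case definitions.
CONFIRMATION = {"bacteriologically_confirmed", "clinically_diagnosed"}
SITE = {"pulmonary", "extrapulmonary"}
TREATMENT_STATUS = {"new", "relapse", "treatment_after_failure", "other"}

@dataclass
class TBCaseCategory:
    confirmation: str
    site: str
    treatment_status: str
    coding_scheme: str = "national-tb-categories@2024"  # placeholder version tag

    def __post_init__(self):
        # Reject values outside the documented, versioned value sets.
        assert self.confirmation in CONFIRMATION
        assert self.site in SITE
        assert self.treatment_status in TREATMENT_STATUS

case = TBCaseCategory("bacteriologically_confirmed", "pulmonary", "new")
```

Carrying the coding-scheme version on every record is what makes later re-mapping possible when national categories change.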

Annotation approaches

Annotations define the ground truth for training and validating AI algorithms. There are various levels, including:

  • Image level: Entire image labelled as TB positive, TB negative, or other abnormality.
  • Region level: Bounding boxes around cavities, infiltrates, nodules or effusions.
  • Pixel level (segmentation): Precise masks of lung fields and lesions for detailed model training.
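Region-level annotations of the kind described above are often exchanged in a COCO-style JSON layout. The sketch below is illustrative; the category list, identifiers and annotator field are assumptions, not a prescribed TB annotation schema:

```python
# COCO-style region-level ground truth for one chest X-ray (illustrative).
annotation_set = {
    "images": [{"id": 1, "file_name": "cxr_0001.png", "width": 1024, "height": 1024}],
    "categories": [{"id": 1, "name": "cavity"}, {"id": 2, "name": "infiltrate"}],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 1,
            "bbox": [412, 188, 96, 110],  # [x, y, width, height] in pixels
            "annotator": "reader-A",      # provenance supports adjudication
        }
    ],
}

def boxes_for_image(coco, image_id):
    """Return (category name, bbox) pairs for one image."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return [(names[a["category_id"]], a["bbox"])
            for a in coco["annotations"] if a["image_id"] == image_id]

boxes = boxes_for_image(annotation_set, 1)
```

Recording the annotator alongside each region is what makes inter-reader agreement statistics and adjudication (Annex B, Annotation row) computable later.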

Reference standards for validation

AI outputs should be evaluated against a clearly defined reference standard. When microbiological confirmation is feasible, an MRS based on culture and/or a nucleic acid amplification test (NAAT) provides high specificity but may miss paucibacillary disease.

When microbiological testing is incomplete, a pre-specified CRS may incorporate clinical and radiographic criteria and treatment response; a CRS can improve sensitivity but may reduce specificity.

Protocols should document verification patterns and adjudication rules to reduce bias and ensure the interpretability of performance estimates.
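The adjudication logic of an MRS with a CRS extension can be expressed as a pre-specified rule. The example below is illustrative only; real adjudication rules must be fixed in the study protocol before data collection:

```python
def reference_label(culture, naat, clinical_criteria, treatment_response):
    """Illustrative pre-specified composite reference standard (CRS) rule.

    Returns 'TB' under the MRS arm (culture or NAAT positive), 'TB' under
    the CRS extension when clinical criteria AND treatment response agree,
    otherwise 'not TB'. Inputs are booleans; None means the test was not done.
    """
    if culture or naat:                          # microbiological confirmation (MRS)
        return "TB"
    if clinical_criteria and treatment_response:  # CRS extension
        return "TB"
    return "not TB"

# A paucibacillary case missed by microbiology but captured by the CRS arm.
label = reference_label(culture=False, naat=None,
                        clinical_criteria=True, treatment_response=True)
```

Writing the rule down as executable logic makes verification patterns auditable and removes case-by-case discretion from the labelling step.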

Storage and sharing of chest X-ray data

Chest X-rays and annotations are stored securely in Picture Archiving and Communication Systems (PACS) or cloud-based repositories.

  • Standard formats: DICOM for images; DICOM-SEG or JSON for annotations.
  • De-identification: Removal of patient identifiers before storage or sharing.
  • Security: Encryption, role-based access, and audit trails.
  • Sharing mechanisms: Secure transfer protocols, federated learning environments, and data access committees for governance.
  • Reuse: Ensure datasets follow FAIR principles.
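As an illustration of the tag-level de-identification listed above, the sketch below represents DICOM header fields as a plain dictionary. In practice a DICOM toolkit, a documented de-identification profile and separate handling of burned-in overlays would be required, and a real deployment would use a stable, keyed pseudonymization scheme rather than Python's per-process hash:

```python
# Tags to drop entirely versus tags to pseudonymize (illustrative lists,
# not a complete de-identification profile).
REMOVE = {"PatientName", "PatientAddress", "PatientBirthDate"}
REPLACE = {"PatientID": lambda v: "ANON-" + str(abs(hash(v)) % 10**6)}

def scrub(header: dict) -> dict:
    out = {}
    for tag, value in header.items():
        if tag in REMOVE:
            continue                        # drop direct identifiers
        if tag in REPLACE:
            out[tag] = REPLACE[tag](value)  # pseudonymize linkable ids
        else:
            out[tag] = value                # keep clinically useful metadata
    return out

clean = scrub({"PatientName": "DOE^JANE", "PatientID": "12345",
               "Modality": "CR", "StudyDate": "20240101"})
```

Even after scrubbing, the residual-risk reasoning from the privacy threat model still applies: retained metadata such as StudyDate can act as a quasi-identifier when datasets are linked.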

PACS (Picture Archiving and Communication System)

A PACS enables efficient storage, retrieval and management of medical images:

  • Integration: connects radiology equipment, hospital information systems and AI pipelines.
  • Functions: stores DICOM images; allows radiologists and AI systems to access, view and annotate images remotely; provides version control and access logs.
  • Role in the AI framework: acts as the backbone for image data management and supports integration with AI models by supplying preprocessed images.
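Programmatic access to a PACS over DICOMweb typically uses QIDO-RS for search, which exposes study search as a GET on `{base}/studies` with attribute matching passed as query parameters. A minimal sketch follows; the base endpoint is hypothetical:

```python
from urllib.parse import urlencode

def qido_study_query(base_url: str, **attributes) -> str:
    """Build a DICOMweb QIDO-RS study-search URL from matching attributes."""
    return f"{base_url}/studies?{urlencode(attributes)}"

# Hypothetical PACS endpoint; a real deployment adds authentication and
# audit logging as required by the governing principles above.
url = qido_study_query("https://pacs.example.org/dicomweb",
                       PatientID="ANON-001", ModalitiesInStudy="CR")
```

Keeping such queries behind authenticated, logged APIs is what allows the access-control and audit-trail requirements listed above to be enforced for AI pipelines as well as human readers.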

Use Case 2: Implementation in Indonesia

The application of the Data Framework for AI-based Diagnostics in Indonesia provides a real-world example of how its principles can be operationalized within a large-scale national screening ecosystem. Indonesia’s Digital Health Transformation Strategy (DHTS) 2.0 (draft, unpublished) and the SATUSEHAT interoperability platform have created fertile ground for structured, lifecycle-based data governance, yet health data remain fragmented across facilities, programmes and digital tools. Implementing the framework in this environment demonstrates how its pillars (collection, cleaning and validation, annotation and structuring, integration, storage and versioning, sharing and reuse, and continuous monitoring and feedback) can be translated into actionable processes that strengthen data quality and accountability at a national scale. The framework aligns with the Personal Data Protection Law (UU PDP 2022) (Republik Indonesia, 2022), the Ministry of Health’s Data Governance Policy, and international standards such as the WHO Ethics and Governance of AI for Health (2021)[4].

Indonesia’s Free Health Check programme (Cek Kesehatan Gratis, CKG) represents a particularly relevant use case. The programme spans newborns, children, adolescents, adults and older people, generating diverse datasets across growth assessment, developmental screening, infectious disease risk, mental health, cardiometabolic profiling, cancer screening and geriatric evaluation. The integration of large language models (LLMs) into this workflow further underscores the need for rigorous data governance: LLM-enabled diagnostic support is entirely dependent on high-quality, standardized and well-governed data streams. The accompanying Indonesian documents, specifically the Framework for Diagnostic Data Management in Indonesia, the Technical Guidance for LLM-based Diagnostic Applications, and the Specific Standards for LLM-enabled Free Health Check (Data Standard Framework for AI-based Diagnostics), translate the overarching framework into concrete national policies, data models, validation rules, workflow designs and governance mechanisms.

Together, these materials demonstrate how the general framework can be adapted to a country context that combines a highly diverse population, varying degrees of digital readiness and ambitious national interoperability goals. They offer detailed implementation guidance, from mapping screening forms to interoperable data models and constructing a Free Health Check programme, to orchestrating LLM-based tools, enforcing safety and privacy controls and calibrating risk thresholds for local populations. Readers interested in the operational, architectural and regulatory dimensions of the Indonesian implementation are encouraged to explore these documents for a full view of how the framework supports safe, equitable and context-appropriate AI-assisted diagnostics at a national scale.


Conclusion

Embed these principles into practice to ensure AI-based diagnostics are ethical, equitable and impactful across all health systems.

As AI becomes more embedded in health diagnostics, a comprehensive and equitable data framework is no longer optional. This framework must encompass robust principles for data governance, privacy, storage, infrastructure and representation, and ensure that these principles are operationalized throughout the data lifecycle. The success of AI-based tools in healthcare depends not just on their algorithmic precision but also on the diversity, quality and ethical handling of the data that power them.

Ensuring inclusivity in data collection, harmonizing data across systems, embedding ethical oversight and creating feedback mechanisms are critical to achieving equitable diagnostic outcomes. As highlighted in recent research, including a 2024 Lancet Digital Health article on responsible deployment of AI, the future of AI in health must balance technological advancement with social accountability, contextual relevance and transparency[17][20].

Furthermore, aligning with global frameworks, such as FAIR and CARE, and leveraging trusted standards including SNOMED CT, HL7 FHIR and WHO ICD-11, reinforces both interoperability and data equity. Embedding thoughtful data sharing and reuse practices guided by ethical licensing and local custodianship ensures that data continue to serve populations beyond their initial use.

Policymakers, technologists and implementers must collaboratively advance these principles to foster innovation while preventing harm and reducing systemic bias[2]. This framework is not a one-size-fits-all solution; it must be adapted to local health priorities, digital capacities and governance structures.

This framework for AI-based diagnostics thus serves not only as a technical guide but as a strategic policy tool to build health systems that are digitally empowered, inclusive and resilient. By embedding these practices, we can close the digital divide and realize the full potential of AI to transform global health.

As countries and implementers move from data readiness to actual deployment of AI, the next critical frontier is the development and adoption of rigorous AI evaluation and reporting standards. A strong data foundation enables AI to be built responsibly, but only structured evaluation frameworks can ensure that AI systems perform reliably, safely and equitably in real-world conditions. Emerging standards, such as CONSORT-AI, SPIRIT-AI, DECIDE-AI, TRIPOD+AI, CLAIM and the forthcoming STARD-AI, provide complementary guidance for different stages of the AI lifecycle, from model development to prospective clinical evaluation (see Table 1). Integrating these frameworks into national policies and procurement processes will be essential to validate performance, monitor drift, strengthen accountability and build trust among clinicians, regulators and the public. Ultimately, aligning robust data governance with internationally recognized AI evaluation standards will enable health systems to adopt AI with confidence, ensuring that innovation translates into measurable, equitable improvements in diagnosis and care.

Evidence aim | Recommended reporting standard(s) | Minimum AI-specific elements to report
Diagnostic accuracy studies of an AI “index test” | STARD-AI; STARD 2015 | Dataset curation, reference standard, model version, threshold selection, subgroup performance, failure analysis
Prediction model development and validation (including machine learning) | TRIPOD+AI | Data provenance, handling of missingness, model updating plans, calibration, external validation, transparency of feature definitions
Studies of AI for use in medical imaging (cross-cutting checklist) | CLAIM (2024 update) | Data splits, ground truth, annotation protocol, robustness checks, code and model availability or access conditions
Early-stage live clinical evaluation in real-world workflows | DECIDE-AI | Human factors, workflow integration, safety monitoring, real-world performance and drift, mitigation plan
Clinical trial protocols and trial reports with an AI component | SPIRIT-AI; CONSORT-AI | Description of the AI intervention, intended use, input requirements, handling of updates, oversight and adverse event processes

Table 1


Supplementary materials


Annex A. Interoperability implementation artefacts (examples)

The following artefacts are recommended to translate interoperability principles into implementable specifications. They can be developed nationally (where feasible) or adapted from existing digital health architectures:

  • A national health data dictionary and minimum dataset (MDS) for the diagnostic domain, including metadata requirements and permissible values.
  • HL7 FHIR implementation guides (profiles, extensions and examples) for diagnostic workflows, including imaging and laboratory results, referrals and follow-up.
  • Terminology governance: value sets and mappings for core clinical concepts (e.g. ICD-11 for diagnosis coding; SNOMED CT, where licensed; RadLex for radiology descriptors, where applicable).
  • API specifications for core integrations (EHR/LIS/RIS/PACS, CAD engines, reporting dashboards), including authentication and audit logging.
  • Person and facility identity specifications (e.g. master patient index rules, facility registries) and linkage quality monitoring procedures.
  • Data exchange test suites and conformance checks to validate interoperability before scale-up.
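The conformance checks in the final bullet can be sketched minimally. The example below (Python) checks a DiagnosticReport payload against a list of mandatory element paths; REQUIRED_PATHS and the sample payload are illustrative assumptions, not an actual national profile, and a real deployment would validate against published FHIR StructureDefinitions using a full FHIR validator:

```python
# Minimal sketch of a pre-scale-up conformance check: verify that a
# DiagnosticReport payload carries the elements a (hypothetical) national
# FHIR profile marks as mandatory.
REQUIRED_PATHS = [  # assumed profile requirements, for illustration only
    "resourceType", "status", "code.coding", "subject.reference",
    "effectiveDateTime", "result",
]

def get_path(resource: dict, dotted: str):
    """Walk a dotted path (e.g. 'code.coding') through nested dicts."""
    node = resource
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def conformance_errors(resource: dict) -> list[str]:
    """Return the list of required elements missing or empty in the resource."""
    errors = []
    if resource.get("resourceType") != "DiagnosticReport":
        errors.append("resourceType must be 'DiagnosticReport'")
    for path in REQUIRED_PATHS:
        if get_path(resource, path) in (None, [], ""):
            errors.append(f"missing required element: {path}")
    return errors

report = {
    "resourceType": "DiagnosticReport",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "58410-2"}]},
    "subject": {"reference": "Patient/example"},
    "result": [{"reference": "Observation/example"}],
}
print(conformance_errors(report))  # effectiveDateTime is missing
```

A test suite of this kind, run against every integrating system before scale-up, turns the interoperability principles above into a pass/fail gate rather than a aspiration.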

Annex B. Country adaptation worksheet

This worksheet is intended to support ministries of health and implementing partners in translating the data framework into locally governed requirements. It can be completed iteratively as programmes mature.

Domain | Key questions | Evidence/artefacts | Decision (local adaptation) | Owner/cadence
Governance and privacy | Who authorizes access, linkage and release? What legal basis and consent model applies? | Data access policy, ethics approvals, DPIA/AIA, data sharing agreement | |
Data collection | Which subgroups must be represented for training and for monitoring? | Sampling plan, facility list, minimum metadata, subgroup coverage report | |
Data quality | What minimum data quality thresholds are required before model training or sharing? | Data quality report, exception log, corrective action plan | |
Annotation | What is the reference standard and adjudication process? | Annotation protocol kit, annotator training records, agreement statistics | |
Integration | Which standards and mappings are mandatory? | FHIR profiles, terminology mapping, API specifications, conformance test results | |
Monitoring | What are the drift triggers and update governance? | Monitoring plan, drift dashboards, model-change log, incident reports | |

Completed worksheets should be archived with programme documentation and referenced in procurement, evaluation and governance decisions.
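One way to operationalize the drift triggers anticipated in the monitoring domain of the worksheet is a population stability index (PSI) computed over binned model scores. The sketch below uses synthetic scores and an assumed alert threshold of 0.2 (a common rule of thumb, not a value mandated by this framework); locally agreed thresholds and triggers belong in the completed worksheet:

```python
# Minimal sketch of a drift trigger: population stability index (PSI)
# comparing the score distribution seen at validation against live scores.
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """PSI between a reference score distribution and a live one (scores in [0, 1))."""
    def proportions(scores):
        counts = [0] * bins
        for s in scores:
            counts[min(int(s * bins), bins - 1)] += 1
        total = len(scores)
        # small floor avoids log(0) for empty bins
        return [max(c / total, 1e-6) for c in counts]
    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 200 for i in range(200)]        # scores seen at validation
live = [min(0.999, s + 0.3) for s in reference]  # synthetic shifted live scores
score = psi(reference, live)
print(f"PSI={score:.3f}, drift alert: {score > 0.2}")
```

In a deployed system, a PSI breach would be one of several triggers (alongside calibration and subgroup performance checks) that escalate to the owner named in the worksheet under the agreed review cadence.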


References

  1. National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0) (NIST AI 100-1). https://doi.org/10.6028/NIST.AI.100-1
  2. Norori, N., Hu, Q., Aellen, F. M., Faraci, F. D., & Tzovara, A. (2021). Addressing bias in big data and AI for health care: A call for open science. Patterns, 2(10), 100347. https://doi.org/10.1016/j.patter.2021.100347
  3. United Nations Educational, Scientific and Cultural Organization. (2021). Recommendation on the ethics of artificial intelligence. UNESCO. https://unesdoc.unesco.org/ark:/48223/pf0000380455
  4. World Health Organization. (2021). Ethics and governance of artificial intelligence for health: WHO guidance. World Health Organization. https://www.who.int/publications/i/item/9789240029200
  5. Organisation for Economic Co-operation and Development. (2016). Recommendation of the Council on Health Data Governance (OECD/LEGAL/0433). https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0433
  6. Organisation for Economic Co-operation and Development. (2022). Health data governance for the digital age: Implementing the OECD Recommendation on Health Data Governance. OECD Publishing. https://www.oecd.org/en/publications/2022/05/health-data-governance-for-the-digital-age_5c42de41.html
  7. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., ... Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
  8. IBM. (n.d.). Why AI data quality is key to AI success.
  9. Sarkar, A. R., Chuang, Y.-S., Mohammed, N., & Jiang, X. (2024). De-identification is not enough: A comparison between de-identified and synthetic clinical notes. Scientific Reports, 14, 29669. https://doi.org/10.1038/s41598-024-81170-y
  10. Char, D. S., Shah, N. H., & Magnus, D. (2018). Implementing machine learning in health care: Addressing ethical challenges. New England Journal of Medicine, 378(11), 981–983. https://doi.org/10.1056/NEJMp1714229
  12. Schwabe, D., Becker, K., Seyferth, M., et al. (2024). The METRIC-framework for assessing data quality for trustworthy AI in medicine: A systematic review. npj Digital Medicine, 7, 203. https://doi.org/10.1038/s41746-024-01196-4
  13. Das, A., Jha, D., Sanjotra, J., Susladkar, O., Sarkar, S., Rauniyar, A., Tomar, N., Sharma, V., & Bagci, U. (2024). Ethical Framework for Responsible Foundational Models in Medical Imaging. ArXiv. https://arxiv.org/abs/2406.11868
  14. National Institute of Standards and Technology. AI Risk Management Framework. https://www.nist.gov/itl/ai-risk-management-framework
  15. Matheny, M. E., Whicher, D., & Thadaney Israni, S. (2020). Artificial intelligence in health care: A report from the National Academy of Medicine. JAMA, 323(6), 509–510. https://doi.org/10.1001/jama.2019.21579
  16. Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., ... Hudson, M. (2020). The CARE principles for Indigenous data governance. Data Science Journal, 19, 43. https://doi.org/10.5334/dsj-2020-043
  17. Davis, S. E., Greevy, R. A., Lasko, T. A., Walsh, C. G., & Matheny, M. E. (2020). Detection of calibration drift in clinical prediction models to inform model updating. Journal of Biomedical Informatics, 112, 103611. https://doi.org/10.1016/j.jbi.2020.103611
  18. Jenkins, D. A., Martin, G. P., Sperrin, M., Riley, R. D., Debray, T. P. A., Collins, G. S., & Peek, N. (2021). Continual updating and monitoring of clinical prediction models: Time for dynamic prediction systems? Diagnostic and Prognostic Research, 5(1), 1. https://doi.org/10.1186/s41512-020-00090-3
  19. Kore, A., Abbasi Bavil, E., Subasri, V., Abdalla, M., Fine, B., Dolatabadi, E., & Abdalla, M. (2024). Empirical data drift detection experiments on real-world medical imaging data. Nature Communications, 15(1), 1887. https://doi.org/10.1038/s41467-024-46142-w
  20. Lekadir, K., et al., on behalf of the FUTURE-AI Consortium. (2025). FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ, 388, e081554. https://doi.org/10.1136/bmj-2024-081554
  21. Kahn, M. G., Callahan, T. J., Barnard, J., Bauck, A. E., Brown, J., Davidson, B. N., ... Zozus, M. N. (2016). A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. eGEMs, 4(1), 1244. https://doi.org/10.13063/2327-9214.1244
  22. Tejani, A. S., Klontzas, M. E., Gatti, A. A., Mongan, J. T., Moy, L., Park, S. H., Kahn, C. E., Jr., & CLAIM 2024 Update Panel. (2024). Checklist for Artificial Intelligence in Medical Imaging (CLAIM): 2024 update. Radiology: Artificial Intelligence, 6(4), e240300. https://doi.org/10.1148/ryai.240300
  23. Avin, S., et al. (2021). Filling gaps in trustworthy development of AI: Incident sharing, auditing, and other concrete mechanisms could help verify the trustworthiness of actors. arXiv. https://arxiv.org/pdf/2112.07773
  24. World Health Organization. (2021). Determining the local calibration of computer-assisted detection (CAD) thresholds and other parameters: A toolkit to support the effective use of CAD for TB screening and detection. World Health Organization. https://wkc.who.int/resources/publications/i/item/determining-the-local-calibration-of-computer-assisted-detection-(cad)-thresholds-and-other-parameters
  25. Rivera, S. C., Liu, X., Chan, A.-W., Denniston, A. K., Calvert, M. J., & SPIRIT-AI and CONSORT-AI Working Group. (2020). Guidelines for clinical trial protocols for interventions involving artificial intelligence: The SPIRIT-AI extension. The Lancet Digital Health, 2(10), e549–e560. https://doi.org/10.1016/S2589-7500(20)30219-3