    Guide · 11 min read

    How to Monitor AI Systems Under the EU AI Act

    Once an AI system is live, the law stops asking "was it built correctly?" and starts asking "is it still behaving correctly?" Under the EU AI Act, monitoring a deployed high-risk AI system is a continuing legal obligation, not an optional extra. This guide covers what to monitor, who is responsible, how to scale the effort to the system's risk level, and a distinction that trips up many teams: monitoring an AI system is not the same as monitoring the people who use it.

    Monitoring Is a Legal Obligation, Not an Add-On

    Several articles of the AI Act, read together, make ongoing monitoring mandatory for high-risk AI systems:

    • Post-market monitoring (Article 72) — Providers must actively and systematically collect, document, and analyse data on how the system performs throughout its lifetime, to confirm it stays compliant.
    • Automatic logging (Articles 12 & 19) — High-risk systems must record events automatically over their lifetime, and the logs must be kept for at least six months.
    • Deployer monitoring (Article 26) — Organisations that use a high-risk system must monitor its operation against the instructions for use and flag risks or serious incidents.
    • Accuracy and robustness over the lifecycle (Article 15) — Declared performance levels must hold consistently in production, not just on launch day.
    • Human oversight (Article 14) — Oversight must remain effective while the system is in use, with the ability to interpret, override, and stop it.
    • Serious-incident reporting (Article 73) — Incidents that cause serious harm must be reported to authorities immediately, and no later than 15 days after becoming aware.
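
    The 15-day outer limit in the serious-incident bullet above lends itself to a simple deadline tracker. The following is a minimal sketch, not a legal calculator: the function names are hypothetical, it assumes calendar days counted from the date of awareness, and it does not model any shorter limits that may apply to specific incident types.

```python
from datetime import date, timedelta

# Outer limit taken from the text above: 15 days after becoming aware.
# Assumption: day 0 is the awareness date and days are calendar days.
REPORTING_WINDOW_DAYS = 15


def reporting_deadline(aware_on: date) -> date:
    """Return the last calendar day that is still inside the reporting window."""
    return aware_on + timedelta(days=REPORTING_WINDOW_DAYS)


def days_remaining(aware_on: date, today: date) -> int:
    """Days left before the deadline; a negative value means it has passed."""
    return (reporting_deadline(aware_on) - today).days


if __name__ == "__main__":
    aware = date(2026, 3, 2)  # illustrative date only
    print(reporting_deadline(aware))
    print(days_remaining(aware, date(2026, 3, 10)))
```

    Because the obligation is to report immediately, a tracker like this is best treated as a backstop alarm rather than a target date.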

    The Ten Dimensions of AI-System Monitoring

    In practice, monitoring an AI system means watching ten dimensions across its lifetime. You do not need a data-science team to populate them: most can be read from logs and reviews a compliance owner already controls.

    Dimension | What you watch | A KPI you can actually report
    Performance vs. baseline | Live accuracy or quality against the figure declared at deployment | Observed accuracy vs. declared (e.g. 92% → 88.5%, −3.5pp)
    Data & concept drift | Whether inputs (and the input-to-outcome relationship) have shifted from training | Drift score per key feature vs. threshold; % out-of-distribution inputs
    Data quality | Integrity of the data entering the system before the model sees it | % of records passing validation; null/stale rate; records quarantined
    Reliability & latency | Whether the service is available and fast enough as an operational system | Uptime % vs. SLA; p95 latency; technical error rate
    Bias & fairness | Whether outcomes differ unjustifiably across protected or sensitive groups | Selection-rate disparity (80% rule); error-rate gap across groups
    Security & abuse | Attacks and misuse: prompt injection, adversarial inputs, data leakage | Detected jailbreak attempts; guardrail trigger rate; blocked leaks
    Human oversight | That a person can and does meaningfully oversee and override outputs | Override rate; % of high-impact decisions human-reviewed; escalations
    Incidents & near-misses | Harmful or wrong outcomes, and the ones that were caught in time | Incidents by severity; time to detect/resolve; % reported within deadline
    Change & versioning | An auditable history of every change to the deployed system | Live version + promotion date; % of changes through the approval gate; rollbacks
    Periodic review | A scheduled re-attestation that the system is still fit for its intended use | % of reviews on time; days overdue; systems with a valid current sign-off
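
    Two of the KPIs in the table reduce to simple arithmetic: the percentage-point gap between declared and observed accuracy, and the selection-rate disparity behind the 80% rule. The sketch below shows one way to compute them. The function names and the group labels are hypothetical, and the 0.8 threshold is the rule of thumb named in the table, not a legal test.

```python
from typing import Mapping


def percentage_point_gap(declared_pct: float, observed_pct: float) -> float:
    """Observed minus declared, in percentage points (negative = degradation)."""
    return round(observed_pct - declared_pct, 2)


def selection_rate(selected: int, total: int) -> float:
    """Share of a group that received the favourable outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return selected / total


def disparity_ratio(rates: Mapping[str, float]) -> float:
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has a positive selection rate")
    return min(rates.values()) / highest


def passes_four_fifths(rates: Mapping[str, float], threshold: float = 0.8) -> bool:
    """True when the disparity ratio meets the 80% rule of thumb."""
    return disparity_ratio(rates) >= threshold


if __name__ == "__main__":
    # The accuracy figures are the ones used as an example in the table.
    print(percentage_point_gap(92.0, 88.5))
    # Illustrative counts only; not real data.
    rates = {"group_a": selection_rate(50, 100), "group_b": selection_rate(35, 100)}
    print(round(disparity_ratio(rates), 2), passes_four_fifths(rates))
```

    With the table's example figures the gap is −3.5 percentage points. With the illustrative counts, the ratio is 0.7, which falls below the 0.8 threshold and would warrant a closer look.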

    Who Monitors What: Provider vs. Deployer

    Responsibility is split between the organisation that builds the system (the provider) and the organisation that uses it (the deployer). Many companies are both — they buy a model and embed it in their own product.

    The provider builds the monitoring capability into the system, runs the post-market monitoring plan, keeps the logs under its control, declares the accuracy metrics in the instructions for use, and reports serious incidents to the authorities.

    The deployer monitors day-to-day operation against the instructions for use, keeps its own logs for at least six months, informs the provider and authorities of risks or incidents, and — for public bodies and certain services — completes a fundamental-rights impact assessment before going live.
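
    The six-month log-retention floor mentioned above can be enforced with a small guard before any deletion job runs. This is a minimal sketch under stated assumptions: "six months" is read as six calendar months, the helper names are hypothetical, and a longer period may be appropriate depending on the system's purpose or other applicable law.

```python
import calendar
from datetime import date

# Floor stated in the text above. Assumption: calendar months, not 180 days.
MIN_RETENTION_MONTHS = 6


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_deletion_date(logged_on: date) -> date:
    """First day on which a log entry has been kept for the minimum period."""
    return add_months(logged_on, MIN_RETENTION_MONTHS)


def may_delete(logged_on: date, today: date) -> bool:
    """True only once the minimum retention period has fully elapsed."""
    return today >= earliest_deletion_date(logged_on)
```

    For example, under these assumptions an entry logged on 15 January 2026 could not be deleted before 15 July 2026.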

    Monitor in Proportion to Risk

    The AI Act is risk-based, and so is the monitoring effort. Do not apply the full apparatus to every system — match the depth to the risk and to your size:

    • High-risk (Annex III) systems — the full stack: a documented post-market monitoring plan, automatic logging, human oversight, drift and performance KPIs, and serious-incident reporting.
    • Limited-risk / transparency-only systems — a lighter touch: basic usage tracking, a named oversight contact, and a periodic review — not the full apparatus.
    • SMEs and small mid-caps — the Act expressly allows simplified, proportionate documentation and quality management. Scale depth to risk and company size, rather than maximising everywhere.

    The Digital Omnibus (provisional agreement, May 2026) deferred high-risk obligations to 2 December 2027 (Annex III) and 2 August 2028 (Annex I), and replaced the mandatory monitoring-plan template with a flexible framework. The lesson for any monitoring set-up: build a configurable baseline you can re-point as guidance lands, not a one-off built to a fixed form. (These changes are agreed but pending final publication, so treat the dates as near-final rather than settled.)
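
    One way to keep the baseline configurable is to hold the controls for each risk tier as data rather than hard-coding them into procedures. The sketch below is illustrative only: the tier keys, control names, and the `controls_for` helper are hypothetical labels for the items listed in this section, not terms defined by the Act.

```python
from typing import Iterable

# Control sets transcribed from the tier descriptions in this section.
# The identifiers themselves are illustrative labels, not official terms.
BASELINE_CONTROLS = {
    "high_risk": frozenset({
        "post_market_monitoring_plan",
        "automatic_logging",
        "human_oversight",
        "drift_and_performance_kpis",
        "serious_incident_reporting",
    }),
    "limited_risk": frozenset({
        "basic_usage_tracking",
        "named_oversight_contact",
        "periodic_review",
    }),
}


def controls_for(
    tier: str,
    add: Iterable[str] = (),
    drop: Iterable[str] = (),
) -> frozenset:
    """Return the baseline for a tier, adjusted by per-system additions or removals."""
    if tier not in BASELINE_CONTROLS:
        raise ValueError(f"unknown risk tier: {tier!r}")
    return (BASELINE_CONTROLS[tier] | frozenset(add)) - frozenset(drop)


if __name__ == "__main__":
    # A limited-risk system where the owner opts in to logging anyway.
    print(sorted(controls_for("limited_risk", add=["automatic_logging"])))
```

    When guidance changes, only the `BASELINE_CONTROLS` table needs to be re-pointed; per-system adjustments stay explicit in the `add` and `drop` arguments, which also makes them easy to audit.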

    System Monitoring Is Not Employee Monitoring

    Monitoring an AI system is not the same as monitoring the people who use AI. The first watches the model and its outputs and is governed by the AI Act. The second watches your own staff — who is using which AI tool, how much, for what — and that is workplace surveillance, governed by the GDPR, the ePrivacy rules, and national labour law, not the AI Act.

    Tracking employees' use of a tool such as an AI assistant typically needs a lawful basis, a data-protection impact assessment, advance transparency, and, in several EU states, works-council involvement before you switch it on (for example a German Betriebsvereinbarung, a French CSE consultation, or an Italian agreement under Article 4 of the Workers' Statute). Keep the two programmes separate: different owners, different legal bases, different evidence. Folding staff surveillance into "AI monitoring" is a costly mistake.

    What Good Evidence Looks Like

    When an auditor — or, increasingly, a claimant under the revised Product Liability Directive — asks how you monitor, these are the artifacts that answer:

    • A documented post-market monitoring plan with named KPIs, thresholds, and a review cadence.
    • Time-series metric records, and the alerts raised when a threshold was breached.
    • Immutable logs of system use and of every human override or intervention.
    • An incident register with root-cause analysis and proof of timely regulatory notification.
    • A signed periodic-review record naming the accountable owner and the continued-use decision.
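
    For the "immutable logs" item above, one common technique is a hash chain, where each entry stores the hash of the previous one so that any later edit becomes detectable. The sketch below is an assumption about how such a log could be built, not a description of any particular product. Note that it makes a log tamper-evident rather than physically immutable; write-once storage or an external anchor would still be needed for stronger guarantees. All names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a payload, linking it to the hash of the preceding entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; return False if any entry or link was altered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"event": "prediction", "system": "demo-model"})
    append_entry(log, {"event": "human_override", "reason": "manual review"})
    print(verify_chain(log))
```

    Recording human overrides as ordinary entries in the same chain keeps system use and interventions in a single verifiable sequence.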

    How LandingRed helps: LandingRed turns this into a working system — a post-market monitoring plan per AI system, KPI tracking with threshold alerts that auto-escalate into incidents, immutable Article 12 logs with enforced retention, human-oversight records, and an audit-ready evidence pack — scaled to each system's risk level.
