Voice Hotline Intake: STT Pipeline for Sapin II Compliance
A compliant voice hotline intake under France’s Loi Waserman, the act that modernised Sapin II to transpose EU Directive 2019/1937, is one pipeline, not three. Capture audio in the browser via the MediaRecorder API, encrypt and upload it into the same report bundle as the text fields using libsodium SealedBox to the recipient’s Curve25519 public key, produce a draft transcript on the recipient side using a self-hosted STT model, and let the reporter verify, rectify, and approve through an anonymous one-time receipt code (never an email or phone re-prompt). The same five-stage pipeline satisfies Article 9(2) and Article 18 of the directive, France’s verify/rectify/approve cycle, and Italy’s D.lgs. 24/2023 oral-report rule. The only deltas across regimes are the consent UX wording and the retention period.
Key Takeaways
- Article 9(2) of EU Directive 2019/1937 makes oral reporting mandatory: phone, voice messaging, or on-request in-person meeting.
- Article 18 sets three documentation modes (recording, transcript, or minutes) and requires reporter verify, rectify, and approve, as covered in the 12-row engineering checklist for Articles 8 to 18.
- France’s Loi Waserman extends the verify, rectify, and approve cycle to anonymous reports, which forces a one-time-receipt callback mechanism.
- Self-hosted whisper-class STT keeps transcription inside the platform’s compliance perimeter and avoids cloud vendors that may train on the audio.
- The same MediaRecorder-plus-SealedBox pipeline satisfies France, Italy, and Germany with two switches: consent UX and retention period.
What does Sapin II / Loi Waserman require for oral reports?
The Loi Waserman (Law 2022-401 of 21 March 2022) modernised Sapin II to transpose EU Directive 2019/1937. As of April 2026, verified against Légifrance, it carries the channel-format and oral-report rules into French law and makes the verify/rectify/approve cycle explicit.
Channels must accept three input modes:
- written reports,
- oral reports by telephone or other voice messaging systems,
- on request by the reporter, a physical meeting within a reasonable timeframe.
Oral reports must be documented in one of three modes: a recording in durable retrievable form, a complete and accurate transcript, or (for unrecorded conversations) accurate minutes prepared by the recipient. The reporter must be offered the opportunity to verify, rectify, and approve the recording, transcript, or minutes by signing them. This applies to anonymous reports too, which is the requirement that forces an anonymous callback mechanism rather than an email reply-to address.
The implementing decree (Décret n° 2022-1284 of 3 October 2022) sets retention rules and operational detail; the consolidated text on Légifrance is the canonical source. Where this differs from the EU floor: the French law makes the verify/rectify/approve cycle explicit, while Article 18 of the directive implies it but is less prescriptive in wording.
How does Article 18 specify the documentation modes?
Article 18 is the directive’s instruction to the recipient on what to do with an oral report once it has been received. Three documentation modes follow from the article, and a per-report consent flag distinguishes them.
| Mode | Channel state | Output | Reporter approves |
|---|---|---|---|
| 1 | Recorded line, audio retained | Audio file in durable retrievable form | The recording itself |
| 2 | Recorded line, transcript prepared | Complete and accurate transcript | The transcript |
| 3 | Unrecorded conversation | Accurate minutes by the recipient | The minutes |
In every mode the reporter has the right to verify, rectify, and approve by signing. For anonymous reporters, “signing” means a one-way acknowledgement that does not unmask them: a return visit through the receipt-code session and an explicit approval click.
The physical-meeting variant is the same three modes by analogy, with the same approval right. The consent flag is per-report, not per-channel: the platform must capture and store whether the reporter consented to recording at the moment of capture, and then respect that decision in both retention and disclosure paths.
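As a minimal sketch of how a stored per-report consent flag could select among the three modes in the table above, consider the following. The `OralReport` fields and the `documentation_mode` function are hypothetical names introduced for illustration, and the fallback to minutes when consent is absent is an assumption about how a platform might respect the flag, not wording taken from the directive.

```python
from dataclasses import dataclass
from enum import Enum


class DocumentationMode(Enum):
    RECORDING = 1   # audio retained in durable, retrievable form
    TRANSCRIPT = 2  # complete and accurate transcript
    MINUTES = 3     # accurate minutes written by the recipient


@dataclass(frozen=True)
class OralReport:
    report_id: str
    consented_to_recording: bool   # captured per report, at the moment of intake
    transcript_prepared: bool = False


def documentation_mode(report: OralReport) -> DocumentationMode:
    """Pick the documentation mode that the stored consent flag permits."""
    if not report.consented_to_recording:
        # Assumption: without consent no recording is kept, so minutes are used.
        return DocumentationMode.MINUTES
    if report.transcript_prepared:
        return DocumentationMode.TRANSCRIPT
    return DocumentationMode.RECORDING
```

Because the flag lives on the report rather than on the channel, the retention and disclosure paths can read the same field instead of inferring consent from channel settings.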
What does the pipeline look like end-to-end?
The directive and its French and Italian transpositions describe behaviour, not architecture. Here is the five-stage pipeline that translates the legal text into something an engineering team can build and audit.
Stage 1, browser-side audio capture. The MediaRecorder API captures microphone audio, typically as WebM/Opus. The intake form treats the audio field as mandatory and enforces that with a real byte-count check: a recording that captured zero bytes must fail validation, not silently submit. This is more than a UX nicety. One real-world platform shipped a regression in which mandatory voice fields could be saved without content because the validator checked only field presence, not byte length. An empty mandatory voice report is a compliance failure as much as a UX one.
A pseudocode validator that closes the gap:
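function validateVoiceField(blob, durationSeconds, fieldConfig):
    // A Blob exposes size and type but no duration, so the caller measures
    // durationSeconds separately (for example from recorder start/stop times).
    if blob is null or blob.size == 0:
        if fieldConfig.required:
            return ValidationError("voice field empty")
        return Ok
    if blob.size < fieldConfig.minBytes:
        return ValidationError("voice field empty or too short")
    if durationSeconds < fieldConfig.minDuration:
        return ValidationError("recording too short")
    return Ok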
Stage 2, encrypted upload into the report bundle. The audio Blob is uploaded as an attachment alongside the text fields and encrypted with libsodium SealedBox to the recipient’s Curve25519 public key. A sealed box encrypts with an ephemeral sender key pair whose secret half is discarded after encryption, so the reporter’s browser cannot decrypt the upload afterwards; only the holder of the recipient’s private key can. This reuses the same primitive the platform already applies to text reports and file attachments, so the audio does not introduce a separate cryptographic surface to audit.
Stage 3, recipient-side playback and STT-assisted transcription. When the recipient opens the case, the audio is decrypted in the browser (or on a sandboxed transcription host) and a self-hosted STT model produces a draft transcript. Whisper-class open-weight models running on CPU or a small GPU keep the audio inside the platform’s compliance perimeter and out of any third-party vendor’s storage.
Stage 4, reporter callback for verify/rectify/approve. The reporter logs back in with a one-time receipt code (typically a 16-digit numeric token), reviews the recording or transcript, requests rectifications via the case messaging channel, and approves by re-confirming the receipt code. No email, no phone re-prompt, no account.
Stage 5, retention and disclosure. The platform’s per-channel retention period, not the telco’s, applies to both the audio and the transcript. Disclosure events (for example a court order under Article 16 of the directive) are logged in the same audit trail as text-report disclosures.
How do you keep STT on-platform without leaking audio?
The directive does not name STT, but at any non-trivial scale a transcript editor without a machine-assisted draft becomes a bottleneck. The hosting choices below keep audio inside the platform’s compliance perimeter.
For small to medium volumes, self-hosted whisper.cpp on a CPU box is practical and needs no GPU. The model weights are open, the inference runs locally, and the audio never leaves the host. As of April 2026, whisper.cpp ships large-v3 and turbo variants that run on commodity x86 boxes. Deployments handling high voice-report volumes use the same pipeline with GPU-accelerated inference for faster turnaround, still on-platform.
Voice reports arrive in any language the channel supports, so the platform should route to the right model rather than running everything through a single multilingual one and accepting accuracy degradation on uncommon languages. Italian, French, and German all have well-supported whisper-class checkpoints.
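The routing idea can be sketched as a lookup from the report's language tag to a checkpoint, with a multilingual fallback. The checkpoint paths and the `select_model` helper below are hypothetical; a real deployment would substitute whatever models it actually hosts.

```python
# Hypothetical checkpoint paths, used only to illustrate the routing table.
MODEL_BY_LANGUAGE = {
    "fr": "/models/stt-fr.bin",
    "it": "/models/stt-it.bin",
    "de": "/models/stt-de.bin",
}
FALLBACK_MODEL = "/models/stt-multilingual.bin"


def select_model(language_tag: str) -> str:
    """Return the checkpoint for a language tag such as 'fr' or 'fr-FR'."""
    # Keep only the primary language subtag, normalised to lower case.
    primary = language_tag.strip().lower().replace("_", "-").split("-")[0]
    return MODEL_BY_LANGUAGE.get(primary, FALLBACK_MODEL)
```

Keeping the table in configuration rather than in code means a newly supported language is a new entry, and anything unlisted still gets a draft from the fallback model.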
A general-purpose speech API from a major cloud vendor is risky for whistleblowing audio because vendor terms may allow audio to be stored or used for model training unless an explicit opt-out is agreed, for example in the DPA. The platform must be able to prove its audio path, which is hard to do across a third-party vendor boundary.
Two operational rules round this out. The STT output is always a draft transcript, never the formal record: the recipient edits it before passing it to the reporter for approval. And the audit trail logs which model produced which transcript, on which host, at which version, against which audio hash. That log is the proof that the STT step did not leak.
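One way to picture the audit entry described above is a small record that binds the transcript to the model, host, version, and audio hash. This is a sketch under assumptions: the field names and the `transcript_audit_entry` function are hypothetical, and SHA-256 over the bytes handed to the model is one reasonable choice of hash rather than a stated requirement.

```python
import hashlib
from datetime import datetime, timezone


def transcript_audit_entry(audio: bytes, transcript: str, model_name: str,
                           model_version: str, host: str) -> dict:
    """Build one audit-trail entry tying a draft transcript to its inputs."""
    return {
        "event": "stt_draft_created",
        # Hash of the exact audio bytes that were fed to the model.
        "audio_sha256": hashlib.sha256(audio).hexdigest(),
        "transcript_sha256": hashlib.sha256(transcript.encode("utf-8")).hexdigest(),
        "model": model_name,
        "model_version": model_version,
        "host": host,
        "created_at": datetime.now(timezone.utc).isoformat(),
        # STT output stays a draft until the recipient edits it and the
        # reporter approves it.
        "status": "draft",
    }
```

Storing hashes rather than content keeps the audit trail free of report material while still letting an auditor confirm which audio produced which draft.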
Why is the anonymous callback the hardest part?
The verify/rectify/approve cycle is what separates a Loi-Waserman-compliant pipeline from a glorified voicemail. Anonymous reporters force a particular callback design, and platforms that get this wrong tend to get it wrong in the same way: they ask for an email address.
The naive design fails because an email address or phone number can deanonymise the reporter. A reporter who is willing to leave their work email is, by definition, willing to identify themselves; a reporter who is not, will abandon the channel. The directive’s right to verify, rectify, and approve must work for the second group too.
The compliant design uses a one-time receipt code generated at submission. Sixteen digits is a common shape because it is short enough to write down and unguessable in practice. The code is displayed to the reporter once, and the platform stores only its hash, never the plaintext. The reporter returns to the same intake URL, enters the receipt code, and lands on a session bound to the case, with no account, no email, and no second factor that ties to identity.
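A minimal sketch of the receipt-code handling follows, using only the Python standard library. Sixteen decimal digits give 10^16 possible codes, roughly 53 bits. The function names are hypothetical, and the keyed digest (HMAC-SHA-256 with a server-side secret, here called `pepper`) is an assumption about how the hash might be computed; the source says only that a hash is stored.

```python
import hashlib
import hmac
import secrets

CODE_DIGITS = 16


def new_receipt_code() -> str:
    """Return a uniformly random 16-digit code, zero-padded on the left."""
    return f"{secrets.randbelow(10 ** CODE_DIGITS):0{CODE_DIGITS}d}"


def receipt_digest(code: str, pepper: bytes) -> str:
    """Keyed digest of the code; this value, not the code, is what is stored."""
    # Drop spaces or dashes the reporter may have typed between digit groups.
    normalized = "".join(ch for ch in code if ch in "0123456789")
    return hmac.new(pepper, normalized.encode("ascii"), hashlib.sha256).hexdigest()


def matches(code: str, stored_digest: str, pepper: bytes) -> bool:
    """Compare a presented code with the stored digest in constant time."""
    return hmac.compare_digest(receipt_digest(code, pepper), stored_digest)
```

The digest is deterministic, so the platform can look a case up by digest when the reporter returns, and keying it with a secret held outside the database means a leaked digest table alone is not enough to test guesses offline.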
Inside that session the reporter does three things. They listen to the recording or read the draft transcript inline (verify). They post any corrections back as a case message rather than editing the original recording, which is kept as immutable evidence (rectify). They click a confirm button that records their approval timestamp in the audit trail (approve).
Idle-session and storage hygiene matter too. The receipt-code session expires aggressively. Clipboard paste of the code is allowed, but the page does not re-render the code after first display. Every approval, rectification request, and idle expiry is logged with no reporter identifier; the reporter token is the only correlation key.
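The idle-expiry and logging rules can be sketched as below. The ten-minute limit, the `ReceiptSession` class, and the `audit_event` helper are hypothetical; the point is only that the log entry carries the token and nothing that identifies the reporter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

IDLE_LIMIT = timedelta(minutes=10)  # hypothetical value, tune per deployment


@dataclass
class ReceiptSession:
    reporter_token: str   # opaque correlation key, not derived from identity
    last_seen: datetime

    def is_expired(self, now: datetime) -> bool:
        """True once the session has been idle for IDLE_LIMIT or longer."""
        return now - self.last_seen >= IDLE_LIMIT

    def touch(self, now: datetime) -> None:
        """Record activity so the idle timer restarts."""
        self.last_seen = now


def audit_event(session: ReceiptSession, kind: str, now: datetime) -> dict:
    """Audit entry keyed only by the token: no IP, user agent, or name."""
    return {"reporter_token": session.reporter_token, "event": kind,
            "at": now.isoformat()}
```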
Two configuration switches: consent UX and retention period
The same five-stage pipeline serves multiple jurisdictions because only two parameters change across them.
| Switch | France (Loi Waserman) | Italy (D.lgs. 24/2023) | Germany (HinSchG) |
|---|---|---|---|
| Consent UX | Informed consent at capture; in-flow click | Informed consent at capture; in-flow click | Informed consent at capture; in-flow click |
| Retention period | Set by Décret 2022-1284, per case-state | Set by D.lgs. 24/2023 retention rules | Set by HinSchG retention rules |
Switch 1 is the per-report consent flag for recording. France, Italy, and Germany all require informed consent at capture. Some regimes accept a channel-level standing consent disclosed in the privacy notice; others require an in-flow consent click; the conservative implementation ships the in-flow click everywhere and records the consent decision per report.
Switch 2 is the duration after closure for which the recording and transcript are retained. The French Décret 2022-1284 fixes a specific window; D.lgs. 24/2023 fixes another; HinSchG yet another. Retention is the sole jurisdictional dial that matters for storage planning, and it slots cleanly into a per-channel configuration value.
What does not change across regimes: the encryption primitive, the upload path, the verify/rectify/approve loop, the audit trail. Engineering changes for a new EU jurisdiction are configuration, not new code. For multinationals, each in-scope subsidiary’s channel can run with its own switch values while sharing the underlying pipeline; tenant-level configuration is the right abstraction, not per-regime forks.
Where this breaks: countries outside the directive’s scope (US two-party-consent states, certain Asian jurisdictions) need additional switches or are out of scope of the channel and routed elsewhere.
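The two switches can be sketched as a tenant-level configuration object. Everything named here is hypothetical (`ChannelConfig`, `ConsentUX`, the tenant names), and the retention numbers are placeholders for illustration only, not the periods fixed by Décret 2022-1284, D.lgs. 24/2023, or HinSchG; the real values must come from the consolidated legal texts and counsel.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class ConsentUX(Enum):
    IN_FLOW_CLICK = "in_flow_click"      # conservative default
    STANDING_NOTICE = "standing_notice"  # channel-level consent in the notice


@dataclass(frozen=True)
class ChannelConfig:
    tenant: str
    jurisdiction: str
    consent_ux: ConsentUX
    retention_days_after_closure: int    # supplied by counsel per regime


def purge_date(config: ChannelConfig, closed_on: date) -> date:
    """Date on which the recording and transcript become due for deletion."""
    return closed_on + timedelta(days=config.retention_days_after_closure)


# Placeholder retention values, NOT statutory periods.
fr_channel = ChannelConfig("example-fr", "FR", ConsentUX.IN_FLOW_CLICK, 30)
it_channel = ChannelConfig("example-it", "IT", ConsentUX.IN_FLOW_CLICK, 60)
```

Each subsidiary's channel gets its own `ChannelConfig` while the capture, encryption, and approval code stays shared, which is the tenant-level abstraction the section argues for.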
When NOT to Use This
- The organisation only operates external (regulator) channels. Article 11 puts the obligation on the competent authority, not on the in-scope employer, and the verify/rectify/approve cycle has different audit requirements there.
- Reports flow through a privileged law-firm intake under attorney-client privilege. The law firm owns the recording-and-transcript obligation, not the platform.
- The deployment is in a non-EU jurisdiction where Article 9(2) has no force and where local recording-consent law (for example US two-party-consent states) imposes a stricter rule than the directive.
- The use case is a one-off internal investigation rather than a standing reporting channel. The durable-record and verify/rectify/approve obligations apply to channels held open for reports, not ad-hoc interviews.
- The organisation cannot host its own STT model and is forbidden by data-protection counsel from using cloud STT. In that case the transcript mode falls back to manual transcription, which is allowed by Article 18 but slower at scale.
FAQ
Does Sapin II / Loi Waserman require recording phone reports?
Can I just point a phone number at voicemail and email the MP3?
How do anonymous reporters approve a recording without identifying themselves?
Is cloud speech-to-text safe for whistleblowing recordings?
Is the same pipeline good for France, Italy, and Germany?
Where in the EU directive is the recording rule actually written?