Anonymization Entity Review

Status: Proposed

This feature is part of the batch anonymization workflow. It provides a consolidated view of all detected entities across a batch of documents, allowing users to selectively include or exclude entities before anonymization is applied.

Overview

After text extraction is complete for all files in a batch, DocuDesk presents a unified entity list deduplicated by value (case-insensitive). Each entity shows its type, highest confidence score, and the number of files in which it appears. Entities are pre-selected based on the active WOO anonymization profile.
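The consolidation described above can be sketched roughly as follows. This is a minimal illustration, not DocuDesk's actual implementation: the `DetectedEntity` and `ConsolidatedEntity` shapes and the `consolidateEntities` function are hypothetical names, and the sketch assumes each detection carries the identifier of the file it came from.

```typescript
// Hypothetical shape of a single detection in one file (assumed, not from DocuDesk).
interface DetectedEntity {
  value: string;
  type: string;
  confidence: number; // assumed range 0..1
  fileId: string;
}

// Hypothetical shape of one row in the consolidated review list.
interface ConsolidatedEntity {
  value: string;      // spelling of the first occurrence
  type: string;       // type of the highest-confidence occurrence
  confidence: number; // highest confidence across all occurrences
  fileCount: number;  // number of distinct files containing the value
}

function consolidateEntities(detections: DetectedEntity[]): ConsolidatedEntity[] {
  const byKey = new Map<string, { entity: ConsolidatedEntity; files: Set<string> }>();

  for (const d of detections) {
    // Deduplicate by value, ignoring case.
    const key = d.value.toLowerCase();
    const existing = byKey.get(key);

    if (existing === undefined) {
      byKey.set(key, {
        entity: { value: d.value, type: d.type, confidence: d.confidence, fileCount: 1 },
        files: new Set([d.fileId]),
      });
      continue;
    }

    // Count distinct files, not occurrences.
    existing.files.add(d.fileId);
    existing.entity.fileCount = existing.files.size;

    // Keep the highest confidence score seen so far.
    if (d.confidence > existing.entity.confidence) {
      existing.entity.confidence = d.confidence;
      existing.entity.type = d.type;
    }
  }

  // Map preserves insertion order, so rows appear in first-seen order.
  return Array.from(byKey.values(), (entry) => entry.entity);
}
```

Tracking file identifiers in a set means that a value detected several times in the same file still counts as one file. Whether whitespace or diacritics should also be normalized before comparison is not specified here, so the sketch lowercases only.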

Users can toggle individual entities on or off. The final selection is sent to the backend when the user triggers anonymization.

Key Capabilities

  • Consolidated, deduplicated entity list across all batch files
  • Pre-selection based on active WOO anonymize/keep profiles
  • Confidence threshold filter (default: only entities with a confidence score above 0.7 are included)
  • Per-entity toggle (frontend-only state, no intermediate API call)
  • Batch anonymization triggered with the reviewed entity list
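
The pre-selection and toggle behaviour listed above might look like the sketch below. The `WooProfile` shape (lists of entity types to anonymize and to keep), the function names, and the rule that a type must be in the anonymize list, absent from the keep list, and strictly above the threshold are all assumptions made for illustration; the profile format and the exact way profile and threshold combine are not defined in this document.

```typescript
// Hypothetical shapes; the real profile format is not specified here.
interface ReviewEntity {
  value: string;
  type: string;
  confidence: number;
}

interface WooProfile {
  anonymize: string[]; // entity types to anonymize by default (assumed)
  keep: string[];      // entity types to leave untouched by default (assumed)
}

const DEFAULT_CONFIDENCE_THRESHOLD = 0.7;

// Returns the set of pre-selected entity keys (lowercased values).
function preselectEntities(
  entities: ReviewEntity[],
  profile: WooProfile,
  threshold: number = DEFAULT_CONFIDENCE_THRESHOLD,
): Set<string> {
  const anonymizeTypes = new Set(profile.anonymize);
  const keepTypes = new Set(profile.keep);
  const selected = new Set<string>();

  for (const e of entities) {
    const typeSelected = anonymizeTypes.has(e.type) && !keepTypes.has(e.type);
    // "Above 0.7" is read as strictly greater than the threshold.
    if (typeSelected && e.confidence > threshold) {
      selected.add(e.value.toLowerCase());
    }
  }
  return selected;
}

// Frontend-only toggle: returns a new set and performs no API call.
function toggleEntity(selected: ReadonlySet<string>, value: string): Set<string> {
  const next = new Set(selected);
  const key = value.toLowerCase();
  if (!next.delete(key)) {
    next.add(key);
  }
  return next;
}
```

Returning a new set from `toggleEntity` instead of mutating the existing one keeps the selection easy to use as reactive UI state; this is a design choice of the sketch, not a stated requirement.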

API Endpoints

Method | Path | Description
GET | /api/anonymization/batch/{batchId}/entities | Retrieve consolidated entity list for review (batch must be in "review" status)
POST | /api/anonymization/batch/{batchId}/anonymize | Start anonymization with the reviewed entity list
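
A client could build the two requests along these lines. Only the methods and paths come from the table above; the `RequestSpec` type, the function names, and in particular the `{ "entities": [...] }` request body are assumptions, since the payload format is not specified here.

```typescript
// Hypothetical description of an HTTP request, ready to hand to an HTTP client.
interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  body?: string;
}

// GET the consolidated entity list for review.
function buildEntitiesRequest(batchId: string): RequestSpec {
  return {
    method: "GET",
    url: `/api/anonymization/batch/${encodeURIComponent(batchId)}/entities`,
  };
}

// POST the reviewed selection to start anonymization.
// The body shape { entities: string[] } is an assumption, not a documented contract.
function buildAnonymizeRequest(batchId: string, selected: Iterable<string>): RequestSpec {
  return {
    method: "POST",
    url: `/api/anonymization/batch/${encodeURIComponent(batchId)}/anonymize`,
    // Sorting makes the payload deterministic regardless of toggle order.
    body: JSON.stringify({ entities: Array.from(selected).sort() }),
  };
}
```

Because the per-entity toggles are frontend-only state, the POST request is the single point at which the reviewed selection reaches the backend.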

Standards

  • GDPR / AVG — Entity data is not persisted after anonymization; reviewed list is transient
  • WOO — Default entity profiles align with WOO publication anonymization requirements
  • TEC-DMS-7 (Workflow Management) — Entity review is a step in the document workflow