Azure-Language-Text-PII-redaction
Version: 1
Microsoft
Last updated: December 2025
PII Redaction for Text automatically detects and masks sensitive information such as names, addresses, phone numbers, credit card details, and other personally identifiable information (PII) in unstructured text.

Azure Language

Azure Language service helps you understand and process text at scale with advanced capabilities like sentiment analysis, entity recognition, summarization, and translation. It empowers businesses to unlock insights, automate workflows, and deliver personalized, multilingual experiences with enterprise-grade security and reliability.

Key capabilities

About this model

The PII Redaction model in Azure Language automatically detects and masks sensitive information in text, ensuring privacy and compliance. It is designed for real-time and batch processing, making it ideal for enterprise workflows that require secure handling of personal data.

Key model capabilities

  • Comprehensive PII Detection: Identifies a wide range of sensitive entities like names, addresses, IDs, and financial data.
  • Automatic Redaction: Replaces detected PII with placeholders to prevent exposure in downstream systems.
  • Multilingual Support: Detects PII across multiple languages for global applications.
  • Seamless Integration: Works with REST APIs, SDKs, and Azure AI Foundry Tools for easy deployment and scaling.

Use cases

See Responsible Use of AI for additional considerations for responsible use.

Key use cases

  • Generative AI Preprocessing: Remove PII before sending data to LLMs for summarization or content generation.
  • Customer Support Logs: Redact sensitive details in chat transcripts or emails for compliance.
  • Healthcare & Finance: Protect patient or client data in documents and reports.
  • Data Sharing & Analytics: Anonymize datasets for safe sharing and analysis.

Out of scope use cases

The model is not intended for:
  • Detecting non-textual PII (e.g., images, audio).
  • Guaranteeing compliance without human oversight.
  • Any use that violates Microsoft's Responsible Use of AI.

Pricing

Pricing is based on the number of text records processed and the selected tier. See the Azure pricing page for more details.

Technical specs

PII Redaction for Text is a cloud-based service that uses advanced machine learning and Named Entity Recognition (NER) models to identify and redact sensitive information in text. It supports multiple entity categories (e.g., financial, medical, personal identifiers) and works across a wide range of languages. The model is optimized for accuracy and compliance, enabling real-time or batch processing through REST APIs or SDKs. It integrates seamlessly with other Azure Language services and Generative AI workflows to ensure sensitive data is protected before downstream processing.

Input formats

The PII Redaction model expects UTF-8 encoded text as input. You can interact with the model through the Foundry portal, REST API (JSON payload), and SDKs (available for .NET, Python, Java, and JavaScript).
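
As a rough illustration of the REST path, the sketch below builds (but does not send) a UTF-8 encoded JSON request using only the Python standard library. The route `/language/:analyze-text`, the `api-version` value, the `kind` field, the body layout, and the `Ocp-Apim-Subscription-Key` header are assumptions drawn from general Azure Language REST conventions, and the endpoint and key are placeholders. Confirm all of them against the current API reference before relying on this.

```python
import json
import urllib.request


def build_pii_request(endpoint: str, key: str, documents: list[dict],
                      api_version: str = "2023-04-01") -> urllib.request.Request:
    """Build (without sending) a POST request for PII detection.

    The route, api-version, and body layout are assumptions; verify them
    against the API reference for your resource.
    """
    body = {
        "kind": "PiiEntityRecognition",
        "analysisInput": {"documents": documents},
    }
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={api_version}"
    return urllib.request.Request(
        url,
        # The service expects UTF-8 text, so encode the JSON body explicitly.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


request = build_pii_request(
    "https://example-resource.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-key>",                                             # placeholder key
    [{"id": "1", "language": "en",
      "text": "My name is John Doe and my phone number is 555-123-4567."}],
)
```

Passing the resulting object to `urllib.request.urlopen` would send it. The SDKs wrap the same exchange and are usually the more convenient choice.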

Supported languages

The feature supports multiple languages for PII detection and redaction. Detected entities are returned with their type, confidence score, and redacted text. See the full list of supported languages linked here.

Supported Azure regions

See the full list of supported Azure regions for Azure Language linked here.

Sample JSON response

Sample input

{
    "documents": [
        {
            "id": "1",
            "text": "My name is John Doe and my phone number is 555-123-4567."
        }
    ]
}

Sample output

{
    "documents": [
        {
            "redactedText": "My name is ******** and my phone number is ************.",
            "entities": [
                {
                    "text": "John Doe",
                    "category": "Person",
                    "offset": 11,
                    "length": 8,
                    "confidenceScore": 0.99
                },
                {
                    "text": "555-123-4567",
                    "category": "PhoneNumber",
                    "offset": 43,
                    "length": 12,
                    "confidenceScore": 0.98
                }
            ],
            "id": "1",
            "warnings": []
        }
    ],
    "errors": [],
    "modelVersion": "2022-10-01"
}
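
To make the relationship between `entities` and `redactedText` concrete, here is a minimal sketch that rebuilds the masked string from each entity's `offset` and `length`. It assumes offsets index into a Python `str` (code points), which matches the service's counting for plain ASCII text like this sample but may differ for other scripts. The `mask_entities` helper is hypothetical and not part of any SDK.

```python
def mask_entities(text: str, entities: list[dict], mask_char: str = "*") -> str:
    """Replace each entity span with mask characters of the same length."""
    chars = list(text)
    for entity in entities:
        start, length = entity["offset"], entity["length"]
        # Sanity check: the span must match the entity text that was reported.
        if text[start:start + length] != entity["text"]:
            raise ValueError(f"offset/length mismatch for {entity['text']!r}")
        chars[start:start + length] = mask_char * length
    return "".join(chars)


text = "My name is John Doe and my phone number is 555-123-4567."
entities = [
    {"text": "John Doe", "category": "Person", "offset": 11, "length": 8},
    {"text": "555-123-4567", "category": "PhoneNumber", "offset": 43, "length": 12},
]
print(mask_entities(text, entities))
# prints: My name is ******** and my phone number is ************.
```

The mismatch check is useful when redacting a source document yourself: if the text was altered between detection and masking, it fails loudly instead of masking the wrong characters.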

Model architecture

Transformer-based multilingual NER architecture optimized for entity detection and redaction, leveraging attention mechanisms for high accuracy and contextual understanding.

Long context

If you're sending requests asynchronously, PII Redaction supports up to 125,000 characters. For synchronous requests, it supports up to 5,120 characters.
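
One way to stay under the synchronous limit is to split long text client-side. The sketch below is a simple whitespace-aware splitter, not a service feature. It counts characters with Python's `len`, which is an assumption about how the service measures length, and the `chunk_text` helper is hypothetical.

```python
SYNC_LIMIT = 5_120  # synchronous character limit stated above


def chunk_text(text: str, limit: int = SYNC_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break after whitespace so words stay intact; falls back to a
    hard cut when a window contains no usable whitespace.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Look for the last space or newline inside the current window.
            split = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if split > start:
                end = split + 1  # keep the whitespace with the left-hand chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Joining the chunks reproduces the original text exactly, so entity offsets can be mapped back by adding each chunk's starting position. A multi-word entity can still straddle a boundary, so overlapping chunks or sentence-level splitting may be worth considering for sensitive workloads.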

Optimizing model performance

Efficiency

  • Batch Processing: Combine multiple documents into a single API call to reduce network overhead and improve throughput.
  • Selective Redaction: Use entity category filters (e.g., only redact financial or healthcare PII) to minimize unnecessary processing.
  • Streaming or Chunking: For large documents, process text in chunks to maintain responsiveness and avoid timeouts.
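
The batching and category-filter ideas above can be sketched as a generator that groups texts into request bodies. The body layout, the `piiCategories` parameter name, and the default batch size of 5 are assumptions to check against the API reference and the service's documented limits. The `batched_payloads` helper is hypothetical.

```python
from itertools import islice


def batched_payloads(texts, batch_size=5, categories=None, language="en"):
    """Yield one request body per batch of documents.

    batch_size and the field names are assumptions, not confirmed limits.
    """
    numbered = iter(enumerate(texts, start=1))
    while True:
        batch = list(islice(numbered, batch_size))
        if not batch:
            return
        parameters = {}
        if categories:
            # Restrict detection to the listed entity categories.
            parameters["piiCategories"] = list(categories)
        yield {
            "kind": "PiiEntityRecognition",
            "parameters": parameters,
            "analysisInput": {"documents": [
                {"id": str(i), "language": language, "text": t} for i, t in batch
            ]},
        }


payloads = list(batched_payloads(["first", "second", "third"],
                                 batch_size=2, categories=["PhoneNumber"]))
```

Document ids keep counting across batches, so results from separate calls can be matched back to the original inputs.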

Accuracy

  • Deterministic Redaction: The PII redaction process is deterministic, ensuring consistent and repeatable results for the same input text—critical for compliance and auditing.
  • Pre-cleaning Text: Normalize text by removing unnecessary symbols, correcting encoding issues, and stripping HTML tags for better detection.
  • Confidence Thresholding: Apply thresholds to handle ambiguous detections—e.g., flag low-confidence entities for human review.
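
A minimal sketch of the pre-cleaning and thresholding ideas above, using only the standard library. The tag-stripping regex is deliberately naive, the 0.8 threshold is an arbitrary illustration, and both helper names are hypothetical.

```python
import html
import re


def preclean(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)   # naive tag removal, not a full parser
    text = html.unescape(text)            # e.g. "&amp;" becomes "&"
    return re.sub(r"\s+", " ", text).strip()


def triage_entities(entities, threshold=0.8):
    """Split entities into (auto_redact, needs_review) by confidence score."""
    auto, review = [], []
    for entity in entities:
        target = auto if entity["confidenceScore"] >= threshold else review
        target.append(entity)
    return auto, review


cleaned = preclean("<p>Contact: <b>John Doe</b> &amp; team</p>")
auto, review = triage_entities([
    {"text": "John Doe", "confidenceScore": 0.99},
    {"text": "team", "confidenceScore": 0.42},
])
```

Cleaning changes character positions, so offsets returned for the cleaned text do not apply to the raw input. Redact the cleaned text, or keep a mapping between the two.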

Cost-Effectiveness

  • Hybrid Approach: Use regex or lightweight rules for simple patterns (e.g., ZIP codes, phone formats) before calling the full model.
  • Adaptive Sampling: For large datasets, redact representative samples when full coverage isn't required (e.g., analytics or testing).
  • Autoscaling & Rate Limiting: Configure autoscaling for peak loads and apply throttling to avoid unnecessary compute costs.
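
The hybrid approach could look roughly like the following: mask trivially recognizable patterns locally, then send the remaining text to the service for context-dependent entities such as names. The two patterns are illustrative, US-centric assumptions and will miss many real formats. `regex_prefilter` is a hypothetical helper.

```python
import re

# Illustrative patterns only: US-style phone numbers and ZIP codes.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def _mask(match: re.Match) -> str:
    """Replace a match with asterisks of the same length."""
    return "*" * len(match.group())


def regex_prefilter(text: str) -> tuple[str, int]:
    """Mask simple patterns locally; return the masked text and match count."""
    masked, n_phone = PHONE_RE.subn(_mask, text)
    masked, n_zip = ZIP_RE.subn(_mask, masked)
    return masked, n_phone + n_zip


masked, hits = regex_prefilter("Call 555-123-4567 or write to ZIP 98052.")
print(masked)
# prints: Call ************ or write to ZIP *****.
```

Masking with same-length asterisks preserves character offsets, so entities the service later finds in the pre-filtered text still line up with the original.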

Additional assets


Distribution

More information

Responsible AI considerations

Safety techniques

N/A

Safety evaluations

N/A

Known limitations

Depending on your scenario, input data and the entities you wish to extract, you could experience different levels of performance. The following sections are designed to help you understand key concepts about performance as they apply to using the Azure Language PII service.

Understand and measure performance

Because both false positive and false negative errors can occur, it is important to understand how each type of error might affect your overall system. In redaction scenarios, for example, false negatives could lead to personal information leakage, so consider a process for human review to account for this type of error. In sensitivity label scenarios, both false positives and false negatives could lead to misclassification of documents: the audience may be unnecessarily limited for documents labeled as confidential where a false positive occurred, and PII could be leaked where a false negative occurred and a public label was applied.

You can tune your system by adjusting the confidence score threshold it uses. If it is more important to identify all potential instances of PII, use a lower threshold. This means that you may get more false positives (non-PII data being recognized as PII entities), but fewer false negatives (PII entities not recognized as PII). If it is more important for your system to recognize only true PII data, use a higher threshold. Threshold values may not behave consistently across individual categories of PII entities, so it is critical that you test your system with the real data it will process in production.
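
The threshold trade-off described above can be measured on a labeled sample. The sketch below computes precision and recall at a given threshold using exact span matching, which is a simplification since partial overlaps count as misses. The data is a made-up toy example, not measured service output, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(predictions, gold_spans, threshold):
    """Compute (precision, recall) for predictions at or above `threshold`.

    predictions: list of (offset, length, confidence) tuples from the service.
    gold_spans: set of (offset, length) tuples labeled by human reviewers.
    """
    kept = {(offset, length)
            for offset, length, score in predictions if score >= threshold}
    true_pos = len(kept & gold_spans)
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / len(gold_spans) if gold_spans else 1.0
    return precision, recall


# Toy data: two correct detections and one low-confidence false positive.
predictions = [(11, 8, 0.99), (43, 12, 0.98), (24, 2, 0.40)]
gold = {(11, 8), (43, 12)}

for threshold in (0.3, 0.5, 0.985):
    print(threshold, precision_recall(predictions, gold, threshold))
```

In this toy data, the lowest threshold admits the false positive (precision drops to 2/3), while the highest threshold drops a true entity (recall falls to 0.5). Running the same sweep per entity category on production-like data shows where each category's threshold should sit.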

System limitations and best practices for enhancing performance

  • Make sure you understand all the entity categories that can be recognized by the system. Depending on your scenario, your data may include other information that could be considered personal but is not covered by the categories the service currently supports.
  • Context is important for all entity categories to be correctly recognized by the system, as it often is for humans to recognize an entity. For example, without context a ten-digit number is just a number, not a PII entity. However, given context like "You can reach me at my office number 2345678901", both the system and a human can recognize the ten-digit number as a phone number. Always include context when sending text to the system to obtain the best possible performance.
  • Person names in particular require linguistic context. Send as much context as possible for better person name detection.
  • For conversational data, consider sending more than a single turn in the conversation to ensure higher likelihood that the required context is included with the actual entities.
    In the following conversation, if you send a single row at a time, the passport number will not have any context associated with it and the EU Passport Number PII category will not be recognized.
        Hi, how can I help you today?
        I want to renew my passport
        Sure, what is your current passport number?
        It's 123456789, thanks.
    However, if you send the whole conversation it will be recognized because the context is included.
  • Sometimes multiple entity categories can be recognized for the same entity. Take the previous example:
        Hi, how can I help you today?
        I want to renew my passport
        Sure, what is your current passport number?
        It's 123456789, thanks.
    Several different countries have the same format for passport numbers, so several different specific entity categories may be recognized. In some cases, using the highest confidence score may not be sufficient to choose the right entity class. If your scenario depends on the specific entity category being recognized, you may need to disambiguate the result elsewhere in your system, either through human review or additional validation code. Thorough testing on real-life data can help you identify whether you're likely to see multiple entity categories recognized in your scenario.
  • Although many international entities are supported, currently the service only supports English text. Consider verifying the language of the input text if you're not sure it will all be in English.
  • The PII service only takes text as an input. If you are redacting information from documents in other formats, make sure to carefully test your redaction code to ensure identified entities are not accidentally leaked.
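
For the conversational guidance above, one possible approach is to send each turn together with a few preceding turns so that the context travels with the entity. This is a sketch under that assumption. The window size of 3 is arbitrary and `windowed_turns` is a hypothetical helper.

```python
def windowed_turns(turns, window=3):
    """Yield (index, text), joining each turn with up to window-1 prior turns."""
    for i in range(len(turns)):
        start = max(0, i - window + 1)
        yield i, "\n".join(turns[start:i + 1])


turns = [
    "Hi, how can I help you today?",
    "I want to renew my passport",
    "Sure, what is your current passport number?",
    "It's 123456789, thanks.",
]

for index, text in windowed_turns(turns):
    print(index, repr(text))
```

With this window, the final document contains the question about the passport number alongside the answer. Because windows overlap, the same entity can be detected in more than one document, so results need to be deduplicated when mapping back to individual turns.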

Acceptable use

Acceptable use policy

Microsoft wants to help you responsibly develop and deploy solutions that use Azure Language. We are taking a principled approach to upholding personal agency and dignity by considering the fairness, reliability & safety, privacy & security, inclusiveness, transparency, and human accountability of our AI systems. These considerations are in line with our commitment to developing Responsible AI. This article discusses Azure Language features and the key considerations for making use of this technology responsibly. Consider the following factors when you decide how to use and implement AI-powered products and features.

General guidelines

When you're getting ready to deploy AI-powered products or features, the following activities help to set you up for success:
  • Understand what it can do: Fully assess any AI model you are using to understand its capabilities and limitations. Understand how it will perform in your particular scenario and context.
  • Test with real, diverse data: Understand how your system will perform in your scenario by thoroughly testing it with real life conditions and data that reflects the diversity in your users, geography and deployment contexts. Small datasets, synthetic data and tests that don't reflect your end-to-end scenario are unlikely to sufficiently represent your production performance.
  • Respect an individual's right to privacy: Only collect data and information from individuals for lawful and justifiable purposes. Only use data and information that you have consent to use for this purpose.
  • Legal review: Obtain appropriate legal advice to review your solution, particularly if you will use it in sensitive or high-risk applications. Understand what restrictions you might need to work within and your responsibility to resolve any issues that might come up in the future. Do not provide any legal advice or guidance.
  • System review: If you're planning to integrate an AI-powered product or feature into an existing system of software, customers, or organizational processes and use it responsibly, take the time to understand how each part of your system will be affected. Consider how your AI solution aligns with Microsoft's Responsible AI principles.
  • Human in the loop: Keep a human in the loop, and treat human oversight as a consistent pattern to explore. This means constant human oversight of the AI-powered product or feature and maintaining the role of humans in decision-making. Ensure you can have real-time human intervention in the solution to prevent harm. This enables you to manage situations where the AI model doesn't perform as required.
  • Security: Ensure your solution is secure and has adequate controls to preserve the integrity of your content and prevent unauthorized access.
  • Customer feedback loop: Provide a feedback channel that allows users and individuals to report issues with the service once it's been deployed. Once you've deployed an AI-powered product or feature it requires ongoing monitoring and improvement – be ready to implement any feedback and suggestions for improvement.

Terms of Service

Your use of the Azure service is governed by the terms and conditions of the agreement under which you obtained the services.
  • For customers who purchase or renew a subscription (including free trials) online from Microsoft, your use is governed by either the Microsoft Customer Agreement ("MCA"), or the Microsoft Online Subscription Agreement ("MOSA"). Your use is governed by the latter if the MCA is not available in your geography. Visit the MCA page for availability details.
  • For customers who purchase through another Microsoft Commercial Licensing Program, such as an Enterprise Agreement, your use is governed by the licensing agreement under which you purchased the services. You can obtain a copy of your licensing agreement by contacting your Microsoft account representative or Commercial Licensing.
  • If you do not have an Azure subscription, the Microsoft Terms of Use will govern your use of the limited Azure services which can be used without a subscription.

Model Specifications

Last Updated: December 2025
Input Type: Text
Output Type: Text
Provider: Microsoft