Azure-Language-Text-Analytics-for-Health
Version: 1
Microsoft
Last updated: May 2026
Text Analytics for Health extracts and labels relevant medical information from unstructured clinical text, including doctors' notes, discharge summaries, and electronic health records, using named entity recognition, relation extraction, entity linking, and assertion detection.

Azure Language

Azure Language adds advanced natural language processing to your apps using task‑optimized AI models. It helps you extract key information from text, transcripts, and files as well as detect language to build multilingual, conversational experiences—all with enterprise‑grade security and flexible customization.

Key capabilities

About this model

The Text Analytics for Health model in Azure Language extracts and labels relevant medical information from unstructured clinical text. It performs named entity recognition, relation extraction, entity linking, and assertion detection to surface structured insights from doctors' notes, discharge summaries, clinical documents, and electronic health records. It supports both real-time and batch processing, making it suitable for a wide range of healthcare and life sciences workflows.

Key model capabilities

  • Named Entity Recognition: Identifies medical entities such as diagnoses, medications, symptoms/signs, body structures, dosages, and Social Determinants of Health (SDOH).
  • Relation Extraction: Detects semantic relationships between medical entities (e.g., medication–dosage, diagnosis–body structure, treatment–condition).
  • Entity Linking: Maps recognized entities to standardized medical ontologies via the Unified Medical Language System (UMLS) for interoperability.
  • Assertion Detection: Identifies negation, uncertainty, conditionality, and association in clinical text to correctly contextualize extracted entities.
  • FHIR Support: Returns results in the Fast Healthcare Interoperability Resources (FHIR 4.0.1) format for seamless EHR integration.
  • Multilingual Support: Processes clinical text in English and additional preview languages for global healthcare applications.

Use cases

See Responsible Use of AI for additional considerations for responsible use.

Key use cases

  • Clinical Document Analysis: Extract structured data from unstructured EHR notes, discharge summaries, and clinical trial documents.
  • Healthcare Data Pipelines: Power downstream analytics by converting free-text clinical notes into structured, searchable data.
  • Medical Coding Assistance: Surface diagnoses, procedures, and medications from notes to assist with clinical coding workflows.
  • Drug Safety & Pharmacovigilance: Identify adverse events, medications, and dosages in patient records and medical literature.
  • Population Health Analytics: Aggregate and analyze SDOH and clinical entity data across patient populations.

Out of scope use cases

The model is not intended for:
  • Use as a medical device, clinical support, or diagnostic tool, or for the diagnosis, cure, mitigation, treatment, or prevention of disease.
  • Replacing professional medical advice, healthcare opinion, or the clinical judgment of a healthcare professional.
  • Any use that violates Microsoft's Responsible Use of AI.

Pricing

Pricing is based on the number of text records processed and the selected tier. See the Azure pricing page for more details.

Technical specs

Text Analytics for Health is a cloud-based service using advanced transformer-based NLP models pre-trained on clinical and biomedical text. It supports named entity recognition across a wide range of healthcare entity categories, relation extraction between co-occurring entities, entity linking to UMLS, and assertion detection for negation, uncertainty, and conditionality. The service can be accessed via REST API, Azure SDKs, Microsoft Foundry, or deployed on-premises using Docker containers.

Input formats

The Text Analytics for Health model expects UTF-8 encoded plain text as input. You can interact with the model through the Foundry portal, REST API (JSON payload), SDKs (available for .NET, Python, Java, and JavaScript), or a self-hosted Docker container.

Supported languages

English is generally available, and additional languages are available in preview. See the full list of supported languages for details.

Supported Azure regions

See the full list of supported Azure regions for Azure Language.

Sample JSON response

Sample input

{
    "documents": [
        {
            "language": "en",
            "id": "1",
            "text": "Patient is a 45-year-old male diagnosed with Type 2 diabetes mellitus. He is currently taking Metformin 500mg twice daily."
        }
    ]
}

Sample output

{
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {
                        "offset": 13,
                        "length": 11,
                        "text": "45-year-old",
                        "category": "Age",
                        "confidenceScore": 0.98
                    },
                    {
                        "offset": 25,
                        "length": 4,
                        "text": "male",
                        "category": "Gender",
                        "confidenceScore": 0.99
                    },
                    {
                        "offset": 45,
                        "length": 24,
                        "text": "Type 2 diabetes mellitus",
                        "category": "Diagnosis",
                        "confidenceScore": 0.97,
                        "links": [
                            {
                                "dataSource": "UMLS",
                                "id": "C0011860"
                            }
                        ]
                    },
                    {
                        "offset": 94,
                        "length": 9,
                        "text": "Metformin",
                        "category": "MedicationName",
                        "confidenceScore": 0.99
                    },
                    {
                        "offset": 104,
                        "length": 5,
                        "text": "500mg",
                        "category": "Dosage",
                        "confidenceScore": 0.98
                    },
                    {
                        "offset": 110,
                        "length": 11,
                        "text": "twice daily",
                        "category": "Frequency",
                        "confidenceScore": 0.97
                    }
                ],
                "relations": [
                    {
                        "relationType": "DosageOfMedication",
                        "entities": [
                            { "ref": "#/results/documents/0/entities/3", "role": "Medication" },
                            { "ref": "#/results/documents/0/entities/4", "role": "Dosage" }
                        ]
                    }
                ],
                "warnings": []
            }
        ],
        "errors": [],
        "modelVersion": "2022-08-15"
    }
}
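Each relation in the sample output points at its entities through a "ref" string rather than repeating them. The following Python sketch shows one way to resolve those references into entity text. It assumes the response has already been parsed into a dictionary and that every "ref" is a path from the root of the response, as in the sample; `resolve_ref` and `describe_relations` are hypothetical helper names, not part of any SDK.

```python
def resolve_ref(response: dict, ref: str):
    """Follow a '#/key/0/key' style reference from the root of the response."""
    if not ref.startswith("#/"):
        raise ValueError(f"Unexpected reference format: {ref!r}")
    node = response
    for part in ref[2:].split("/"):
        # List positions are numeric path segments; everything else is a key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def describe_relations(response: dict) -> list:
    """Return (relationType, {role: entity text}) pairs for every document."""
    described = []
    for doc in response["results"]["documents"]:
        for relation in doc.get("relations", []):
            roles = {
                member["role"]: resolve_ref(response, member["ref"])["text"]
                for member in relation["entities"]
            }
            described.append((relation["relationType"], roles))
    return described


# Trimmed-down response in the same shape as the sample output.
response = {
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {"text": "Metformin", "category": "MedicationName"},
                    {"text": "500mg", "category": "Dosage"},
                ],
                "relations": [
                    {
                        "relationType": "DosageOfMedication",
                        "entities": [
                            {"ref": "#/results/documents/0/entities/0", "role": "Medication"},
                            {"ref": "#/results/documents/0/entities/1", "role": "Dosage"},
                        ],
                    }
                ],
            }
        ]
    }
}

print(describe_relations(response))
# [('DosageOfMedication', {'Medication': 'Metformin', 'Dosage': '500mg'})]
```

Resolving references from the response root, rather than from the current document, keeps the helper correct when a single response contains several documents.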

Model architecture

Transformer-based multilingual NER architecture pre-trained on biomedical and clinical corpora, fine-tuned for health entity recognition, relation extraction, entity linking to UMLS, and assertion detection in clinical text.

Long context

For synchronous requests, Text Analytics for Health supports up to 5,120 characters per document. Asynchronous requests support up to 125,000 characters per document. Results from asynchronous requests are available for 24 hours after ingestion.

Optimizing model performance

Efficiency

  • Batch Processing: Combine multiple documents into a single API call to reduce network overhead and improve throughput.
  • Asynchronous API: Use asynchronous requests for large documents or high-volume workloads to maximize throughput.
  • Docker Container: For on-premises or air-gapped scenarios, deploy the Docker container to bring the service closer to your data.
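As a rough illustration of the batch processing tip, the Python sketch below groups plain-text notes into request payloads shaped like the sample input (a "documents" list with "language", "id", and "text"). The `build_batches` helper and its `batch_size` default are assumptions made for this example; check the service limits for the actual maximum number of documents per request.

```python
from typing import Dict, Iterable, Iterator, List


def build_batches(
    texts: Iterable[str],
    batch_size: int = 10,  # assumed value; not a documented service limit
    language: str = "en",
) -> Iterator[Dict[str, List[dict]]]:
    """Yield request payloads that each carry up to batch_size documents."""
    batch: List[dict] = []
    for index, text in enumerate(texts, start=1):
        # Document IDs are sequential strings so results can be matched back.
        batch.append({"language": language, "id": str(index), "text": text})
        if len(batch) == batch_size:
            yield {"documents": batch}
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield {"documents": batch}


notes = ["Patient reports mild headache.", "No known drug allergies.", "Started Metformin."]
payloads = list(build_batches(notes, batch_size=2))
print(len(payloads))  # 2
```

Document IDs stay unique across batches here, which makes it simpler to join results from several requests back to the original notes.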

Accuracy

  • Full Document Context: Submit complete clinical notes rather than fragmented sentences to ensure the model has sufficient context for accurate entity detection and relation extraction.
  • FHIR Output: Use the FHIR response format for structured, standards-compliant output when integrating with EHR systems.
  • Assertion Detection: Always review assertion detection results (negation, uncertainty) before using extracted entities in downstream clinical workflows.
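One way to make the assertion review step explicit is to separate entities that carry an assertion from those that do not before anything reaches a downstream system. The Python sketch below assumes that a qualified entity carries an "assertion" object (for example with a "certainty" key) and that unqualified entities omit it. That field layout is an assumption for illustration and is not shown in the sample output, so verify it against the responses you actually receive.

```python
from typing import List, Tuple


def split_by_assertion(entities: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate entities with no assertion from those that need review.

    Assumption: negated, uncertain, conditional, or other-subject mentions
    carry an "assertion" object; plain affirmed mentions do not.
    """
    affirmed, needs_review = [], []
    for entity in entities:
        if entity.get("assertion"):
            needs_review.append(entity)
        else:
            affirmed.append(entity)
    return affirmed, needs_review


# Hypothetical entities for illustration only.
entities = [
    {"text": "fever", "category": "SymptomOrSign", "assertion": {"certainty": "negative"}},
    {"text": "cough", "category": "SymptomOrSign"},
]

affirmed, needs_review = split_by_assertion(entities)
print([e["text"] for e in affirmed])      # ['cough']
print([e["text"] for e in needs_review])  # ['fever']
```

Routing every qualified entity to review is deliberately conservative: a negated symptom such as "no fever" should not be recorded as a finding just because the entity itself was recognized.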

Cost-Effectiveness

  • Selective Analysis: Use the API's feature selection parameters to enable only the capabilities you need (e.g., NER only, without relation extraction or entity linking).
  • Chunking Large Documents: For documents exceeding the character limit, chunk them at natural boundaries (e.g., section headers) to maintain context.
  • Autoscaling & Rate Limiting: Configure autoscaling for peak loads and apply throttling to avoid unnecessary compute costs.
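The chunking tip can be sketched in Python as follows. This example treats blank lines as the natural boundary and uses the 5,120-character synchronous limit as the default; both choices are assumptions for illustration, and `chunk_text` is a hypothetical helper. A paragraph that is longer than the limit on its own is hard-split as a last resort.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 5120) -> List[str]:
    """Split text at blank lines so that no chunk exceeds max_chars."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Last resort: a single oversized paragraph is cut at the limit.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        if not paragraph:
            continue
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            # Keep adjacent paragraphs together to preserve context.
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


note = "A" * 10 + "\n\n" + "B" * 10 + "\n\n" + "C" * 10
print([len(chunk) for chunk in chunk_text(note, max_chars=25)])  # [22, 10]
```

Entity offsets in the results are relative to each chunk, so keep track of where every chunk starts if you need positions in the original document.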

Additional assets


Distribution

More information

Responsible AI considerations

Safety techniques

N/A

Safety evaluations

N/A

Important disclaimer

Text Analytics for Health is provided "AS IS" and "WITH ALL FAULTS." It is not intended or made available for use as a medical device, clinical support, diagnostic tool, or other technology intended to be used in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions. No license or right is granted by Microsoft to use this capability for such purposes. This capability is not designed or intended to be a substitute for professional medical advice or healthcare opinion, diagnosis, treatment, or the clinical judgment of a healthcare professional, and should not be used as such. The customer is solely responsible for any use of Text Analytics for Health. Customers must separately license any and all source vocabularies they intend to use under the terms set for the UMLS Metathesaurus License Agreement. The customer is responsible for ensuring compliance with those license terms, including any geographic or other applicable restrictions. All decisions leveraging outputs of Text Analytics for Health that impact individuals or resource allocation (including, but not limited to, those related to billing, human resources, or treatment and managing care) should be made with human oversight and should not be based solely on the findings of the model.

Known limitations

Depending on your scenario, input data and the entities you wish to extract, you could experience different levels of performance. The following sections are designed to help you understand key concepts about performance as they apply to using the Azure Language Text Analytics for Health service.

Understand and measure performance

Since both false positive and false negative errors can occur, it is important to understand how each type of error might affect your overall system. In clinical workflows, false negatives could lead to missed medical entities that are relevant to patient care. False positives could introduce incorrect entities into downstream systems. You can tune this trade-off by adjusting the confidence score threshold your system applies to extracted entities. Threshold values may not behave consistently across individual categories of health entities. Therefore, it is critical that you test your system with the real clinical data it will process in production.
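Because a single threshold may behave differently from one entity category to another, one option is to keep a threshold per category with a fallback default. The Python sketch below filters entities shaped like those in the sample output. The threshold values and the `filter_entities` helper are hypothetical and would need to be tuned against your own evaluation data.

```python
from typing import Dict, List

# Hypothetical thresholds for illustration; tune them on real clinical data.
DEFAULT_THRESHOLD = 0.5
CATEGORY_THRESHOLDS: Dict[str, float] = {
    "Diagnosis": 0.90,
    "MedicationName": 0.95,
}


def filter_entities(
    entities: List[dict],
    thresholds: Dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> List[dict]:
    """Keep entities whose confidenceScore meets their category's threshold."""
    return [
        entity
        for entity in entities
        if entity["confidenceScore"] >= thresholds.get(entity["category"], default)
    ]


entities = [
    {"text": "Type 2 diabetes mellitus", "category": "Diagnosis", "confidenceScore": 0.97},
    {"text": "Metformin", "category": "MedicationName", "confidenceScore": 0.91},
    {"text": "male", "category": "Gender", "confidenceScore": 0.60},
]

kept = filter_entities(entities, CATEGORY_THRESHOLDS)
print([entity["text"] for entity in kept])
# ['Type 2 diabetes mellitus', 'male']
```

Raising a category's threshold trades false positives for false negatives in that category only, which makes it easier to reason about the clinical impact of each change.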

System limitations and best practices for enhancing performance

  • Make sure you understand all the health entity categories that can be recognized by the system. Your clinical data may include information that is not covered by the categories the service currently supports.
  • Context is critical for clinical entity recognition. Submit complete clinical notes or full document sections rather than isolated sentences to maximize accuracy for entity detection, relation extraction, and assertion detection.
  • Assertion detection (negation, uncertainty, conditionality) is important in clinical text. Always verify assertion detection outputs before using extracted entities in downstream clinical or compliance workflows.
  • The service extracts Social Determinants of Health (SDOH) and ethnicity mentions in text. This capability may not cover all potential SDOH and does not derive inferences based on SDOH or ethnicity (for example, substance use information is surfaced, but substance abuse is not inferred). The SDOH extraction capability is intended to help providers improve health outcomes and should not be used to stigmatize or draw negative inferences about patient populations.
  • The service is optimized for English. For other languages currently in preview, performance may vary. Consider verifying the language of your input text before processing.
  • Text Analytics for Health processes plain text. If you are extracting text from clinical documents in other formats (e.g., PDF, scanned images), ensure your preprocessing accurately captures the full text without truncation or corruption.

Acceptable use

Acceptable use policy

Microsoft wants to help you responsibly develop and deploy solutions that use Azure Language. We are taking a principled approach to upholding personal agency and dignity by considering the fairness, reliability & safety, privacy & security, inclusiveness, transparency, and human accountability of our AI systems. These considerations are in line with our commitment to developing Responsible AI. This article discusses Azure Language features and the key considerations for making use of this technology responsibly. Consider the following factors when you decide how to use and implement AI-powered products and features.

General guidelines

When you're getting ready to deploy AI-powered products or features, the following activities help to set you up for success:
  • Understand what it can do: Fully assess any AI model you use to understand its capabilities and limitations, and how it will perform in your particular scenario and context.
  • Test with real, diverse data: Understand how your system will perform in your scenario by thoroughly testing it with real-life conditions and data that reflect the diversity of your users, geography, and deployment contexts. Small datasets, synthetic data, and tests that don't reflect your end-to-end scenario are unlikely to sufficiently represent your production performance.
  • Respect an individual's right to privacy: Only collect data and information from individuals for lawful and justifiable purposes. Only use data and information that you have consent to use for this purpose. Handle all clinical and health data in accordance with applicable laws and regulations (e.g., HIPAA, GDPR).
  • Legal review: Obtain appropriate legal advice to review your solution, particularly if you will use it in sensitive or high-risk applications. Understand what restrictions you might need to work within and your responsibility to resolve any issues that might come up in the future. Do not provide any legal advice or guidance.
  • System review: If you're planning to integrate an AI-powered product or feature into an existing system of software, customers, or organizational processes, and to use it responsibly, take the time to understand how each part of your system will be affected. Consider how your AI solution aligns with Microsoft's Responsible AI principles.
  • Human in the loop: Keep a human in the loop, and treat human oversight as a consistent pattern to explore. This means constant human oversight of the AI-powered product or feature and maintaining the role of humans in decision-making. Ensure that humans can intervene in the solution in real time to prevent harm. This is especially critical for clinical applications, where AI outputs should always be reviewed by qualified healthcare professionals.
  • Security: Ensure your solution is secure and has adequate controls to preserve the integrity of your content and prevent unauthorized access. Clinical and health data requires particularly robust security controls.
  • Customer feedback loop: Provide a feedback channel that allows users and individuals to report issues with the service once it's been deployed. Once you've deployed an AI-powered product or feature it requires ongoing monitoring and improvement – be ready to implement any feedback and suggestions for improvement.

Terms of Service


Your use of the Azure service is governed by the terms and conditions of the agreement under which you obtained the services.
  • For customers who purchase or renew a subscription (including free trials) online from Microsoft, your use is governed by either the Microsoft Customer Agreement ("MCA"), or the Microsoft Online Subscription Agreement ("MOSA"). Your use is governed by the latter if the MCA is not available in your geography. Visit the MCA page for availability details.
  • For customers who purchase through another Microsoft Commercial Licensing Program, such as an Enterprise Agreement, your use is governed by the licensing agreement under which you purchased the services. You can obtain a copy of your licensing agreement by contacting your Microsoft account representative or Commercial Licensing.
  • If you do not have an Azure subscription, the Microsoft Terms of Use will govern your use of the limited Azure services which can be used without a subscription.

Model Specifications

  • Last Updated: May 2026
  • Input Type: Text
  • Output Type: Text
  • Provider: Microsoft