microsoft-xlm-align-base
Version: 3
HuggingFace · Last updated August 2025

XLM-Align

XLM-Align (ACL 2021): Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment. XLM-Align is a pretrained cross-lingual language model that supports 94 languages. See our paper for details.

Example

from transformers import AutoModel
model = AutoModel.from_pretrained("microsoft/xlm-align-base")
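The one-liner above only loads the encoder. Below is a minimal sketch of turning its output into sentence vectors. It assumes `transformers` and `torch` are installed and that the checkpoint's tokenizer loads with `AutoTokenizer`; the tokenizer files listed under MD5 suggest it does. Mean pooling over non-padding tokens is one common choice and an assumption here, since the card does not prescribe a pooling method. `masked_mean` and `embed` are hypothetical helper names.

```python
from typing import List, Sequence


def masked_mean(vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1, skipping padding."""
    kept = [vec for vec, keep in zip(vectors, mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    # zip(*kept) walks the hidden dimensions column by column.
    return [sum(column) / len(kept) for column in zip(*kept)]


def embed(texts: List[str], name: str = "microsoft/xlm-align-base") -> List[List[float]]:
    """One mean-pooled vector per input text from the encoder's last hidden layer."""
    # Heavy dependencies are imported lazily so masked_mean() works without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Loaded on every call for brevity; cache these in real code.
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()

    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)
    return [
        masked_mean(states.tolist(), mask.tolist())
        for states, mask in zip(hidden, batch["attention_mask"])
    ]
```

For example, `embed(["Hello world", "Bonjour le monde"])` returns two vectors of equal length. Passing the attention mask matters because shorter sentences in a padded batch would otherwise average in padding positions.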

Evaluation Results

XTREME cross-lingual understanding tasks:
Model       POS   NER   XQuAD        MLQA         TyDiQA       XNLI  PAWS-X  Avg
XLM-R_base  75.6  61.8  71.9 / 56.4  65.1 / 47.2  55.4 / 38.3  75.0  84.9    66.4
XLM-Align   76.0  63.7  74.7 / 59.0  68.1 / 49.8  62.1 / 44.8  76.2  86.8    68.9
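The card does not say how the Avg column is computed. A quick check suggests it is the plain mean of the seven task scores, where each QA task (XQuAD, MLQA, TyDiQA) contributes the mean of its two numbers. Those pairs are assumed here to be F1 / EM, the usual XTREME convention, although the ordering does not affect the average. The sketch below reproduces both Avg values under that assumption. It uses `Decimal` with round-half-up because both exact means (66.35 and 68.85) sit on a rounding boundary where binary floating point could round either way. `task_score` and `xtreme_avg` are hypothetical helper names.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, Tuple, Union

# A task score is either one number or a pair for QA tasks
# (assumed to be F1 / EM, the usual XTREME convention).
Score = Union[str, Tuple[str, str]]

# Values copied from the table above, kept as strings so Decimal stays exact.
ROWS: Dict[str, Dict[str, Score]] = {
    "XLM-R_base": {
        "POS": "75.6", "NER": "61.8",
        "XQuAD": ("71.9", "56.4"), "MLQA": ("65.1", "47.2"), "TyDiQA": ("55.4", "38.3"),
        "XNLI": "75.0", "PAWS-X": "84.9",
    },
    "XLM-Align": {
        "POS": "76.0", "NER": "63.7",
        "XQuAD": ("74.7", "59.0"), "MLQA": ("68.1", "49.8"), "TyDiQA": ("62.1", "44.8"),
        "XNLI": "76.2", "PAWS-X": "86.8",
    },
}


def task_score(score: Score) -> Decimal:
    """One number per task: QA pairs are averaged, single scores pass through."""
    if isinstance(score, tuple):
        return (Decimal(score[0]) + Decimal(score[1])) / 2
    return Decimal(score)


def xtreme_avg(row: Dict[str, Score]) -> Decimal:
    """Mean over tasks, rounded half-up to one decimal as in the table."""
    total = sum((task_score(s) for s in row.values()), Decimal(0))
    return (total / len(row)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for model_name, row in ROWS.items():
    print(model_name, xtreme_avg(row))
```

Running it prints `XLM-R_base 66.4` and `XLM-Align 68.9`, which match the table.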

MD5

b9d214025837250ede2f69c9385f812c  config.json
6005db708eb4bab5b85fa3976b9db85b  pytorch_model.bin
bf25eb5120ad92ef5c7d8596b5dc4046  sentencepiece.bpe.model
eedbd60a7268b9fc45981b849664f747  tokenizer.json
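A minimal sketch for checking downloaded files against the checksums above uses only Python's standard library. It assumes all four files sit in one local directory. `md5_of` and `mismatched_files` are hypothetical helper names, and the directory path in the usage note is an example only.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Checksums as listed on this card.
EXPECTED_MD5: Dict[str, str] = {
    "config.json": "b9d214025837250ede2f69c9385f812c",
    "pytorch_model.bin": "6005db708eb4bab5b85fa3976b9db85b",
    "sentencepiece.bpe.model": "bf25eb5120ad92ef5c7d8596b5dc4046",
    "tokenizer.json": "eedbd60a7268b9fc45981b849664f747",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large weight files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(model_dir: Path, expected: Dict[str, str] = EXPECTED_MD5) -> List[str]:
    """Names of files that are missing or whose MD5 differs from the expected value."""
    bad = []
    for name, checksum in expected.items():
        path = model_dir / name
        if not path.is_file() or md5_of(path) != checksum:
            bad.append(name)
    return bad
```

For example, `mismatched_files(Path("xlm-align-base"))` returns an empty list when every file is present and intact.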

About

Contact: chizewen@outlook.com

BibTeX:
@inproceedings{xlmalign,
  title = "Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment",
  author = "Zewen Chi and Li Dong and Bo Zheng and Shaohan Huang and Xian-Ling Mao and Heyan Huang and Furu Wei",
  booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
  month = aug,
  year = "2021",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2021.acl-long.265",
  doi = "10.18653/v1/2021.acl-long.265",
  pages = "3418--3430",
}

microsoft/xlm-align-base powered by Hugging Face Inference Toolkit

Send Request

You can use cURL or any REST client to send a request to the AzureML endpoint, authenticating with your AzureML token.
curl <AZUREML_ENDPOINT_URL> \
    -X POST \
    -H "Authorization: Bearer <AZUREML_TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{"inputs":"The answer to the universe is <mask>."}'
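For callers scripting in Python, here is a rough standard-library equivalent of the cURL call. `build_request` and `send` are hypothetical helper names, and the endpoint URL and token are the same placeholders as above. The response format is not documented on this card, so `send` simply returns whatever JSON the endpoint sends back.

```python
import json
import urllib.request
from typing import Any, Dict


def build_request(endpoint_url: str, token: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build the same POST request as the cURL example: bearer token plus JSON body."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_request("<AZUREML_ENDPOINT_URL>", "<AZUREML_TOKEN>", {"inputs": "The answer to the universe is <mask>."}))`. The `<mask>` token here assumes the XLM-R style tokenizer.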

Supported Parameters

  • inputs (string): The text with masked tokens
  • parameters (object):
    • top_k (integer): When passed, overrides the number of predictions to return.
    • targets (string[]): When passed, the model limits the scores to these targets instead of scoring the whole vocabulary. Targets that are not in the model vocabulary are tokenized and the first resulting token is used; this emits a warning and may be slower.
Check the full API specification in the Hugging Face Inference documentation.
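The parameters above can be assembled into a request body as sketched below. `fill_mask_payload` is a hypothetical helper, and the `<mask>` token is an assumption based on XLM-R style tokenizers. The checks that reject a non-positive `top_k` or text without a mask token are this sketch's own choice, not documented endpoint behavior.

```python
from typing import Any, Dict, List, Optional

# Assumption: XLM-R style mask token; check the tokenizer if in doubt.
MASK_TOKEN = "<mask>"


def fill_mask_payload(
    text: str,
    top_k: Optional[int] = None,
    targets: Optional[List[str]] = None,
    mask_token: str = MASK_TOKEN,
) -> Dict[str, Any]:
    """Build a JSON-ready body from the documented fields: inputs, top_k, targets."""
    if mask_token not in text:
        raise ValueError(f"inputs must contain the mask token {mask_token!r}")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be a positive integer")

    parameters: Dict[str, Any] = {}
    if top_k is not None:
        parameters["top_k"] = top_k
    if targets:
        # Restrict scoring to these candidate words instead of the whole vocabulary.
        parameters["targets"] = list(targets)

    payload: Dict[str, Any] = {"inputs": text}
    if parameters:  # leave "parameters" out when nothing is overridden
        payload["parameters"] = parameters
    return payload


print(fill_mask_payload("Paris is the <mask> of France.", top_k=3, targets=["capital", "city"]))
```

Omitting `parameters` when nothing is overridden keeps the body identical to the minimal cURL example in the common case.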
Model Specifications
License: Unknown
Last Updated: August 2025
Provider: HuggingFace