microsoft-markuplm-large-finetuned-websrc
Version: 2
MarkupLM, fine-tuned on WebSRC
Multimodal (text + markup language) pre-training for Document AI

Introduction
MarkupLM is a simple but effective multimodal pre-training method over text and markup language for visually rich document understanding and information extraction tasks, such as webpage question answering and webpage information extraction. MarkupLM achieves state-of-the-art (SOTA) results on multiple datasets. For more details, please refer to our paper:

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding, by Junlong Li, Yiheng Xu, Lei Cui, and Furu Wei.

Usage
We refer to the docs and demo notebooks. microsoft/markuplm-large-finetuned-websrc is powered by the Hugging Face Inference Toolkit.
Send Request
You can use cURL or any REST client to send a request to the AzureML endpoint with your AzureML token:

curl <AZUREML_ENDPOINT_URL> \
  -X POST \
  -H "Authorization: Bearer <AZUREML_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is my name?", "context": "My name is Clara and I live in Berkeley."}'
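The same request can be sent programmatically. Below is a minimal Python sketch using only the standard library; the endpoint URL and token are placeholders you must replace with your own values:

```python
import json
import urllib.request

# Placeholders: substitute your own AzureML endpoint URL and token.
AZUREML_ENDPOINT_URL = "https://<your-endpoint>/score"
AZUREML_TOKEN = "<AZUREML_TOKEN>"

payload = {
    "question": "What is my name?",
    "context": "My name is Clara and I live in Berkeley.",
}
headers = {
    "Authorization": f"Bearer {AZUREML_TOKEN}",
    "Content-Type": "application/json",
}

def answer_question(url: str = AZUREML_ENDPOINT_URL) -> dict:
    """POST the QA payload to the AzureML endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `answer_question()` with a live endpoint returns the model's answer as JSON.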
Supported Parameters
- inputs (object): One (context, question) pair to answer
- context (string): The context to be used for answering the question
- question (string): The question to be answered
- parameters (object):
- top_k (integer): The number of answers to return (chosen in order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.
- doc_stride (integer): If the context is too long to fit together with the question, it will be split into several overlapping chunks. This argument controls the size of that overlap.
- max_answer_len (integer): The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
- max_seq_len (integer): The maximum length in tokens of each chunk (context + question) passed to the model. The context will be split into several chunks (using doc_stride as overlap) if needed.
- max_question_len (integer): The maximum length of the question after tokenization. It will be truncated if needed.
- handle_impossible_answer (boolean): Whether to accept an impossible (empty) answer.
- align_to_words (boolean): Attempts to align the answer to real words. Improves quality on space-separated languages; might hurt on non-space-separated languages such as Japanese or Chinese.
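Putting the schema above together, a request body that sets some of these parameters might look like the sketch below. The values are illustrative, not defaults; everything under "parameters" is optional:

```python
import json

# Illustrative request body following the documented (inputs, parameters) schema.
request_body = {
    "inputs": {
        "question": "What is my name?",
        "context": "My name is Clara and I live in Berkeley.",
    },
    "parameters": {
        "top_k": 3,               # return up to 3 answers, best first
        "doc_stride": 128,        # token overlap between context chunks
        "max_answer_len": 15,     # discard longer answer spans
        "max_seq_len": 384,       # max tokens per (question + context) chunk
        "max_question_len": 64,   # truncate the question beyond this
        "handle_impossible_answer": False,
        "align_to_words": True,
    },
}

# Serialize to JSON for the POST body.
body = json.dumps(request_body)
```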
Model Specifications
License: Unknown
Last Updated: August 2025
Provider: HuggingFace