microsoft-tapex-large-finetuned-wtq
Version: 4
TAPEX (large-sized model)
TAPEX was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, and Jian-Guang Lou. The original repo can be found here.
Model description
TAPEX (Table Pre-training via Execution) is a conceptually simple and empirically powerful pre-training approach that equips existing models with table reasoning skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries. TAPEX is based on the BART architecture, a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. This model is the tapex-large model fine-tuned on the WikiTableQuestions dataset.
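The checkpoint's configuration can be inspected to confirm this encoder-decoder setup. The snippet below is a minimal sketch that only loads the config (not the weights); the printed layer and hidden-size values depend on the checkpoint itself.
from transformers import AutoConfig

# Load only the configuration of the fine-tuned checkpoint
config = AutoConfig.from_pretrained("microsoft/tapex-large-finetuned-wtq")
print(config.model_type)          # "bart"
print(config.is_encoder_decoder)  # True
print(config.encoder_layers, config.decoder_layers, config.d_model)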
Intended Uses
You can use the model for table question answering on complex questions. Some solvable questions are shown below (corresponding tables not shown); a minimal pipeline sketch follows the table.
| Question | Answer |
|---|---|
| according to the table, what is the last title that spicy horse produced? | Akaneiro: Demon Hunters |
| what is the difference in runners-up from coleraine academical institution and royal school dungannon? | 20 |
| what were the first and last movies greenstreet acted in? | The Maltese Falcon, Malaya |
| in which olympic games did arasay thondike not finish in the top 20? | 2012 |
| which broadcaster hosted 3 titles but they had only 1 episode? | Channel 4 |
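Questions like these can also be asked through the high-level pipeline API. The snippet below is a sketch under the assumption that the table-question-answering pipeline accepts this generative (BART-based) checkpoint; the table reuses the Olympic Games example from the usage section below, with cell values passed as strings.
from transformers import pipeline

# Sketch: assumes the table-question-answering pipeline supports this
# generative (BART-based) checkpoint; table cells are passed as strings.
tqa = pipeline("table-question-answering", model="microsoft/tapex-large-finetuned-wtq")
table = {
    "year": ["1896", "1900", "1904", "2004", "2008", "2012"],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
}
print(tqa(table=table, query="In which year did beijing host the Olympic Games?"))
# expected to return something like {'answer': ' 2008.0'}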
How to Use
Here is how to use this model in transformers:
from transformers import TapexTokenizer, BartForConditionalGeneration
import pandas as pd
tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-large-finetuned-wtq")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-large-finetuned-wtq")
data = {
"year": [1896, 1900, 1904, 2004, 2008, 2012],
"city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
}
table = pd.DataFrame.from_dict(data)
# tapex accepts uncased input since it is pre-trained on the uncased corpus
query = "In which year did beijing host the Olympic Games?"
encoding = tokenizer(table=table, query=query, return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# [' 2008.0']
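Continuing the snippet above, several questions about the same table can be encoded in one batch. This is a sketch that assumes the batched input format of the tokenizer, i.e. a list of tables paired with a list of queries, so the DataFrame is simply repeated once per query.
# Batched sketch (continues the example above): one table entry per query
queries = [
    "In which year did beijing host the Olympic Games?",
    "which city hosted the olympic games in 1900?",
]
encoding = tokenizer(table=[table] * len(queries), query=queries, padding=True, return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))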
How to Eval
Please find the eval script here.
BibTeX entry and citation info
@inproceedings{
liu2022tapex,
title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=O50443AsCP}
}
microsoft/tapex-large-finetuned-wtq is a pre-trained model available on the Hugging Face Hub, designed for the table-question-answering task in the transformers library. More information about the model's architecture, hyperparameters, limitations, and biases can be found on its dedicated model card on the Hugging Face Hub.
Here's an example API request payload for the table-question-answering task; the query is paired with the table it refers to, with cell values given as strings:
{
  "inputs": {
    "query": "In which year did beijing host the Olympic Games?",
    "table": {
      "year": ["1896", "1900", "1904", "2004", "2008", "2012"],
      "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
    }
  }
}
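As a rough illustration, such a payload could be sent to the hosted Inference API with requests. Treat this as a sketch: the endpoint URL follows the usual api-inference.huggingface.co pattern and YOUR_HF_TOKEN is a placeholder.
import requests

# Sketch: hosted Inference API call; replace YOUR_HF_TOKEN with a real token.
API_URL = "https://api-inference.huggingface.co/models/microsoft/tapex-large-finetuned-wtq"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
payload = {
    "inputs": {
        "query": "In which year did beijing host the Olympic Games?",
        "table": {
            "year": ["1896", "1900", "1904", "2004", "2008", "2012"],
            "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"],
        },
    }
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())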
Model Specifications
License: MIT
Last Updated: July 2025
Provider: Hugging Face