microsoft-tapex-large-finetuned-wikisql
Version: 3
HuggingFaceLast updated July 2025

TAPEX (large-sized model)

TAPEX was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found here .

Model description

TAPEX (Table Pre-training via Execution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with table reasoning skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries. TAPEX is based on the BART architecture, the transformer encoder-encoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. This model is the tapex-base model fine-tuned on the WikiSQL dataset.

Intended Uses

You can use the model for table question answering on relatively simple questions. Some solveable questions are shown below (corresponding tables now shown):
QuestionAnswer
tell me what the notes are for south australiano slogan on current series
what position does the player who played for butler cc (ks) play?guard-forward
how many schools did player number 3 play at?1.0
how many winning drivers in the kraco twin 125 (r2) race were there?1.0
for the episode(s) aired in the u.s. on 4 april 2008, what were the names?"bust a move" part one, "bust a move" part two

How to Use

Here is how to use this model in transformers:
from transformers import TapexTokenizer, BartForConditionalGeneration
import pandas as pd

tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-large-finetuned-wikisql")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-large-finetuned-wikisql")

data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
}
table = pd.DataFrame.from_dict(data)

# tapex accepts uncased input since it is pre-trained on the uncased corpus
query = "In which year did beijing host the Olympic Games?"
encoding = tokenizer(table=table, query=query, return_tensors="pt")

outputs = model.generate(**encoding)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# [' 2008.0']

How to Eval

Please find the eval script here .

BibTeX entry and citation info

@inproceedings{
    liu2022tapex,
    title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
    author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=O50443AsCP}
}

microsoft/tapex-large-finetuned-wikisql is a pre-trained language model available on the Hugging Face Hub. It's specifically designed for the table-question-answering task in the transformers library. If you want to learn more about the model's architecture, hyperparameters, limitations, and biases, you can find this information on the model's dedicated Model Card on the Hugging Face Hub . Here's an example API request payload that you can use to obtain predictions from the model:
{
  "inputs": "How many stars does the transformers repository have?"
}
Model Specifications
LicenseMit
Last UpdatedJuly 2025
ProviderHuggingFace