microsoft-tapex-base
microsoft-tapex-base
Version: 6
Hugging FaceLast updated July 2025

TAPEX (base-sized model)

TAPEX was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. The original repo can be found here .

Model description

TAPEX (Table Pre-training via Execution) is a conceptually simple and empirically powerful pre-training approach to empower existing models with table reasoning skills. TAPEX realizes table pre-training by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries. TAPEX is based on the BART architecture, the transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.

Intended Uses

You can use the raw model for simulating neural SQL execution, i.e., employ TAPEX to execute a SQL query on a given table. However, the model is mostly meant to be fine-tuned on a supervised dataset. Currently TAPEX can be fine-tuned to tackle table question answering tasks and table fact verification tasks. See the model hub to look for fine-tuned versions on a task that interests you.

How to Use

Here is how to use this model in transformers:
from transformers import TapexTokenizer, BartForConditionalGeneration
import pandas as pd

tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base")

data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
}
table = pd.DataFrame.from_dict(data)

# tapex accepts uncased input since it is pre-trained on the uncased corpus
query = "select year where city = beijing"
encoding = tokenizer(table=table, query=query, return_tensors="pt")

outputs = model.generate(**encoding)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# ['2008']

How to Fine-tuning

Please find the fine-tuning script here .

BibTeX entry and citation info

@inproceedings{
    liu2022tapex,
    title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
    author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=O50443AsCP}
}

microsoft/tapex-base is a pre-trained language model available on the Hugging Face Hub. It's specifically designed for the table-question-answering task in the transformers library. If you want to learn more about the model's architecture, hyperparameters, limitations, and biases, you can find this information on the model's dedicated Model Card on the Hugging Face Hub . Here's an example API request payload that you can use to obtain predictions from the model:
{
  "inputs": "How many stars does the transformers repository have?"
}
Model Specifications
LicenseMit
Last UpdatedJuly 2025
ProviderHugging Face