databricks-dbrx-base
Version: 4
Databricks. Last updated February 2026

Key capabilities

About this model

DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. The training mix used for DBRX contains both natural-language and code examples.

Key model capabilities

Use cases

See Responsible AI for additional considerations for responsible use.

Key use cases

There are several general ways to use the DBRX models:
  • DBRX Base and DBRX Instruct are available for download on Hugging Face.
  • The DBRX model repository is available on GitHub.
  • DBRX Base and DBRX Instruct are available through the Databricks Foundation Model API via both Pay-per-token and Provisioned throughput endpoints. These are enterprise-ready deployments.

Out of scope use cases

DBRX does not have multimodal capabilities, and it was not tested for non-English proficiency. DBRX should therefore be considered a generalist model for text-based use in the English language.

Pricing

Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs

DBRX uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any input. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA). It uses the GPT-4 tokenizer as provided in the tiktoken repository. We made these choices based on exhaustive evaluation and scaling experiments.
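
The gap between total and active parameters comes from routing: each token is sent to only a few experts. The following is a minimal sketch of top-k routing in plain Python, not the DBRX implementation. The expert count, the value of k, the router scores, and the softmax-then-renormalize scheme are all illustrative assumptions that this card does not specify.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

NUM_EXPERTS = 16  # assumption for illustration, not stated in this card
TOP_K = 4         # assumption for illustration, not stated in this card

# Hypothetical router scores for one token: later experts score higher.
logits = [0.1 * i for i in range(NUM_EXPERTS)]
weights = route_top_k(logits, TOP_K)

# Fraction of parameters used per input, from the figures stated in the card.
active_fraction = 36 / 132
```

Only the selected experts run for that token, so compute scales with the active parameter count rather than the total. The active fraction need not equal k divided by the number of experts, because non-expert parameters such as attention layers are used for every token.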

Training cut-off date

The DBRX models were trained on 12T tokens of text, with a knowledge cutoff date of January 2024.

Training time

The provider has not supplied this information.

Input formats

DBRX accepts only text-based inputs, with a context length of up to 32,768 tokens.
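
A caller typically has to keep the prompt within this limit. The sketch below trims a list of token IDs so that the prompt plus the requested generation length fits in the context window. It assumes the limit covers both prompt and generated tokens, which this card does not state, and `fit_prompt` is a hypothetical helper name.

```python
MAX_CONTEXT_TOKENS = 32768  # context length stated in this card

def fit_prompt(token_ids, max_new_tokens, max_context=MAX_CONTEXT_TOKENS):
    """Keep the most recent tokens so that prompt + generation fits in the context.

    Assumes the context limit is shared by prompt and generated tokens.
    """
    if max_new_tokens >= max_context:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    budget = max_context - max_new_tokens
    if len(token_ids) > budget:
        return list(token_ids[-budget:])  # drop the oldest tokens
    return list(token_ids)

# Usage: a 40,000-token prompt with room reserved for 100 new tokens.
trimmed = fit_prompt(list(range(40000)), max_new_tokens=100)
```

Dropping the oldest tokens is only one policy; summarizing or chunking the input may suit some applications better.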

Output formats

DBRX only produces text-based outputs.

Supported languages

The vast majority of our training data is in the English language. We did not test DBRX for non-English proficiency. Therefore, DBRX should be considered a generalist model for text-based use in the English language.

Sample JSON response

[
  {
    "0": "Write me a poem about Databricks. I want it to be a sonnet, 14 lines, iambic pentameter, and I want it to be about the company's mission to accelerate innovation for its customers. I want it to mention how Databricks unifies data science, engineering, and business, and how it provides a collaborative workspace for data teams to work on big data and AI projects. I want it to mention how Databricks is built on Apache Spark and how it provides a managed platform for data engineering"
  }
]
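
A response of this shape can be unpacked with the standard library. The sketch below assumes the body is a JSON list of objects whose keys are integer-like strings ("0", "1", ...), as in the sample; the helper names are hypothetical. Because the sample output begins with the prompt text, a second helper strips that prefix when it is present.

```python
import json

def extract_generations(raw_body):
    """Return generated strings from a response shaped like the sample above."""
    payload = json.loads(raw_body)
    texts = []
    for item in payload:
        # Assumption: each object maps a string index to one generated text.
        for key in sorted(item, key=int):
            texts.append(item[key])
    return texts

def strip_prompt(text, prompt):
    """Remove the echoed prompt from the start of the generated text, if present."""
    if text.startswith(prompt):
        return text[len(prompt):].lstrip()
    return text

# Shortened stand-in for the sample response.
sample = '[{"0": "Write me a poem about Databricks. I want it to be a sonnet"}]'
texts = extract_generations(sample)
continuation = strip_prompt(texts[0], "Write me a poem about Databricks.")
```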

Model architecture

More detailed information about DBRX Instruct and DBRX Base can be found in our technical blog post.

Long context

DBRX was pretrained on 12T tokens of carefully curated data with a maximum context length of 32K tokens.

Optimizing model performance

The provider has not supplied this information.

Training disclosure

Training, testing and validation

DBRX was pretrained on 12T tokens of carefully curated data with a maximum context length of 32K tokens. We estimate that this data is at least 2x better token-for-token than the data we used to pretrain the MPT family of models. This new dataset was developed using the full suite of Databricks tools, including Apache Spark™ and Databricks notebooks for data processing, and Unity Catalog for data management and governance. We used curriculum learning for pretraining, changing the data mix during training in ways we found to substantially improve model quality.
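
To make "changing the data mix during training" concrete, here is a minimal sketch of a data-mix schedule. It is not the DBRX curriculum: the source names, the weights, and the linear interpolation are all invented for illustration, since this card does not describe the actual schedule.

```python
def mix_at(progress, start_mix, end_mix):
    """Linearly interpolate sampling weights between two data mixes.

    progress is the fraction of training completed, from 0.0 to 1.0.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    sources = sorted(set(start_mix) | set(end_mix))
    raw = {
        s: (1 - progress) * start_mix.get(s, 0.0) + progress * end_mix.get(s, 0.0)
        for s in sources
    }
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}  # renormalize to sum to 1

# Hypothetical mixes: both the source names and the weights are made up.
START = {"web_text": 0.8, "code": 0.2}
END = {"web_text": 0.5, "code": 0.5}

halfway = mix_at(0.5, START, END)
```

A data loader would sample each batch according to the weights returned for the current training step.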

More information

Inference samples

Inference type | Python sample (Notebook) | CLI with YAML
Real time | text-generation-online-endpoint.ipynb | text-generation-online-endpoint.sh

Sample Inputs and Outputs (for real-time inference)

Sample Input

{
  "input_data": {
    "input_string": ["Write me a poem about Databricks."],
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.9,
      "do_sample": true,
      "max_new_tokens": 100
    }
  }
}
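
The same payload can be built programmatically. The sketch below mirrors the field names in the sample input and uses only the standard library. The endpoint URL, the API key, and the bearer-token header scheme are assumptions that depend on the deployment; `build_request` and `post_json` are hypothetical helper names.

```python
import json
import urllib.request

def build_request(prompts, temperature=0.1, top_p=0.9, do_sample=True, max_new_tokens=100):
    """Build a request body with the same fields as the sample input above."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return {
        "input_data": {
            "input_string": list(prompts),
            "parameters": {
                "temperature": temperature,
                "top_p": top_p,
                "do_sample": do_sample,
                "max_new_tokens": max_new_tokens,
            },
        }
    }

def post_json(url, api_key, payload):
    """Send the payload to a scoring endpoint (URL and auth scheme are assumptions)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth header
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

body = json.dumps(build_request(["Write me a poem about Databricks."]))
```

`post_json` is defined but not called here, because it needs a live endpoint and credentials.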

Training Stack

MoE models are complicated to train, and the training of DBRX Base and DBRX Instruct was heavily supported by Databricks' infrastructure for data processing and large-scale LLM training (e.g., Composer, Streaming, MegaBlocks, and LLM Foundry).

Composer is our core library for large-scale training. It provides an optimized training loop, easy checkpointing and logging, FSDP-based model sharding, convenient abstractions, extreme customizability via callbacks, and more.

Streaming enables fast, low-cost, and scalable training on large datasets from cloud storage. It handles a variety of challenges around deterministic resumption as node counts change, avoiding redundant downloads across devices, high-quality shuffling at scale, sample-level random access, and speed.

MegaBlocks is a lightweight library for MoE training. Crucially, it supports "dropless MoE," which avoids inefficient padding and is intended to provide deterministic outputs for a given sequence no matter what other sequences are in the batch.

LLM Foundry ties all of these libraries together to create a simple LLM pretraining, fine-tuning, and inference experience.

DBRX was trained using proprietary optimized versions of the above open source libraries, along with our LLM training platform.
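
The difference between capacity-limited routing and "dropless MoE" can be illustrated with simple bookkeeping. The sketch below is not how MegaBlocks is implemented. It only counts what happens to tokens under each scheme, using a made-up top-1 assignment of tokens to experts and hypothetical function names.

```python
from collections import defaultdict

def group_with_capacity(assignments, num_experts, capacity):
    """Capacity-limited routing: overflow tokens are dropped, short experts are padded."""
    buckets = defaultdict(list)
    dropped = []
    for token, expert in enumerate(assignments):
        if len(buckets[expert]) < capacity:
            buckets[expert].append(token)
        else:
            dropped.append(token)  # expert is full, so this token is skipped
    # Every expert processes exactly `capacity` slots, so unused slots are padding.
    padding = sum(capacity - len(buckets.get(e, [])) for e in range(num_experts))
    return dict(buckets), dropped, padding

def group_dropless(assignments):
    """Dropless routing: every token is kept and group sizes vary per expert."""
    buckets = defaultdict(list)
    for token, expert in enumerate(assignments):
        buckets[expert].append(token)
    return dict(buckets)

# Hypothetical routing of 8 tokens to 4 experts (top-1 for simplicity).
assignments = [0, 0, 0, 1, 0, 2, 0, 1]

capped, dropped, padding = group_with_capacity(assignments, num_experts=4, capacity=2)
dropless = group_dropless(assignments)
```

With a fixed capacity, whether a token is dropped depends on which other tokens share the batch. That dependence is what a dropless formulation is intended to remove.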

Evaluation

We find that DBRX Instruct outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details can be found in our technical blog post.

Acknowledgements

The DBRX models were made possible thanks in large part to the open-source community, especially:
  • The MegaBlocks library, which established a foundation for our MoE implementation
  • PyTorch FSDP, which we built on for distributed training

Responsible AI considerations

Safety techniques

The provider has not supplied this information.

Safety evaluations

The provider has not supplied this information.

Known limitations

The DBRX models were trained on 12T tokens of text, with a knowledge cutoff date of January 2024. The training mix used for DBRX contains both natural-language and code examples. The vast majority of our training data is in the English language. We did not test DBRX for non-English proficiency. Therefore, DBRX should be considered a generalist model for text-based use in the English language. DBRX does not have multimodal capabilities.

Acceptable use

Acceptable use policy

Databricks Open Model Acceptable Use Policy

Quality and performance evaluations

Source: Databricks. We find that DBRX Instruct outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details can be found in our technical blog post.

Benchmarking methodology

Source: Databricks. The provider has not supplied this information.

Public data summary

Source: Databricks. The provider has not supplied this information.

Model Specifications

License: Other
Last updated: February 2026
Provider: Databricks