databricks-dbrx-base
Version: 4
About this model
DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. The training mix used for DBRX contains both natural-language and code examples.
Key model capabilities
- Inputs: DBRX accepts only text-based inputs, with a context length of up to 32,768 tokens.
- Outputs: DBRX produces only text-based outputs.
- Model Architecture: More detailed information about DBRX Instruct and DBRX Base can be found in our technical blog post.
- License: Databricks Open Model License
- Acceptable Use Policy: Databricks Open Model Acceptable Use Policy
- Version: 1.0
- Owner: Databricks, Inc.
Use cases
See Responsible AI for additional considerations for responsible use.
Key use cases
There are several general ways to use the DBRX models:
- DBRX Base and DBRX Instruct are available for download on Hugging Face.
- The DBRX model repository can be found on GitHub.
- DBRX Base and DBRX Instruct are available through the Databricks Foundation Model API via both Pay-per-token and Provisioned throughput endpoints. These are enterprise-ready deployments.
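For the Foundation Model API route, requests follow an OpenAI-style chat format. A minimal sketch that only assembles the request, without sending it; the workspace URL, endpoint name, and token placeholder are illustrative assumptions, not fixed values:

```python
import json

# Hypothetical workspace URL and endpoint name -- substitute your own.
WORKSPACE_URL = "https://example.cloud.databricks.com"
ENDPOINT_NAME = "databricks-dbrx-instruct"

def build_chat_request(prompt, max_tokens=100, temperature=0.1):
    """Assemble an OpenAI-style chat request for a Foundation Model API
    pay-per-token serving endpoint (URL shape is an assumption here)."""
    url = f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations"
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    headers = {
        "Authorization": "Bearer <DATABRICKS_TOKEN>",  # personal access token
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(payload)

url, headers, body = build_chat_request("Write me a poem about Databricks.")
# Send with any HTTP client, e.g. urllib.request or requests (not shown).
```

The request body can then be POSTed as-is; only the authorization header and endpoint name change between workspaces.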
Out of scope use cases
DBRX does not have multimodal capabilities. Therefore, DBRX should be considered a generalist model for text-based use in the English language.
Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.
Technical specs
DBRX uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any input. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped-query attention (GQA). It uses the GPT-4 tokenizer as provided in the tiktoken repository. We made these choices based on exhaustive evaluation and scaling experiments.
Training cut-off date
The DBRX models were trained on 12T tokens of text, with a knowledge cutoff date of January 2024.
Training time
The provider has not supplied this information.
Input formats
DBRX accepts only text-based inputs, with a context length of up to 32,768 tokens.
Output formats
DBRX produces only text-based outputs.
Supported languages
The vast majority of our training data is in the English language. We did not test DBRX for non-English proficiency. Therefore, DBRX should be considered a generalist model for text-based use in the English language.
Sample JSON response
[
  {
    "0": "Write me a poem about Databricks. I want it to be a sonnet, 14 lines, iambic pentameter, and I want it to be about the company's mission to accelerate innovation for its customers. I want it to mention how Databricks unifies data science, engineering, and business, and how it provides a collaborative workspace for data teams to work on big data and AI projects. I want it to mention how Databricks is built on Apache Spark and how it provides a managed platform for data engineering"
  }
]
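The response format above is a JSON list with one object per request, keyed by string input index. A small parsing sketch, assuming that shape; the helper name is ours, not part of any SDK:

```python
import json

def extract_outputs(raw):
    """Map input index -> generated text for a sample-style JSON response."""
    outputs = {}
    for obj in json.loads(raw):          # one object per batched request
        for idx, text in obj.items():    # keys are string indices like "0"
            outputs[int(idx)] = text
    return outputs

raw = '[{"0": "Write me a poem about Databricks. ..."}]'
first_completion = extract_outputs(raw)[0]
```

Indexing by integer keeps completions aligned with the original batch order even if the serializer reorders objects.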
Model architecture
More detailed information about DBRX Instruct and DBRX Base can be found in our technical blog post.
Long context
DBRX was pretrained on 12T tokens of carefully curated data and a maximum context length of 32K tokens.
Optimizing model performance
The provider has not supplied this information.
Additional assets
There are several general ways to use the DBRX models:
- DBRX Base and DBRX Instruct are available for download on Hugging Face.
- The DBRX model repository can be found on GitHub.
- DBRX Base and DBRX Instruct are available through the Databricks Foundation Model API via both Pay-per-token and Provisioned throughput endpoints. These are enterprise-ready deployments.
Training disclosure
Training, testing and validation
DBRX was pretrained on 12T tokens of carefully curated data and a maximum context length of 32K tokens. We estimate that this data is at least 2x better token-for-token than the data we used to pretrain the MPT family of models. This new dataset was developed using the full suite of Databricks tools, including Apache Spark™ and Databricks notebooks for data processing, and Unity Catalog for data management and governance. We used curriculum learning for pretraining, changing the data mix during training in ways we found to substantially improve model quality.
Distribution
Distribution channels
There are several general ways to use the DBRX models:
- DBRX Base and DBRX Instruct are available for download on Hugging Face.
- The DBRX model repository can be found on GitHub.
- DBRX Base and DBRX Instruct are available through the Databricks Foundation Model API via both Pay-per-token and Provisioned throughput endpoints. These are enterprise-ready deployments.
More information
Inference samples
| Inference type | Python sample (Notebook) | CLI with YAML |
|---|---|---|
| Real time | text-generation-online-endpoint.ipynb | text-generation-online-endpoint.sh |
Sample Inputs and Outputs (for real-time inference)
Sample Input
{
  "input_data": {
    "input_string": ["Write me a poem about Databricks."],
    "parameters": {
      "temperature": 0.1,
      "top_p": 0.9,
      "do_sample": true,
      "max_new_tokens": 100
    }
  }
}
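The payload above can be assembled with just the standard library; a minimal sketch in which the field names mirror the sample input and the parameter defaults are taken from it:

```python
import json

def make_payload(prompt, temperature=0.1, top_p=0.9, max_new_tokens=100):
    """Build the real-time inference request body shown in the sample."""
    return {
        "input_data": {
            "input_string": [prompt],  # a list, so multiple prompts can batch
            "parameters": {
                "temperature": temperature,
                "top_p": top_p,
                "do_sample": True,
                "max_new_tokens": max_new_tokens,
            },
        }
    }

body = json.dumps(make_payload("Write me a poem about Databricks."))
```

Keeping the parameters in one helper avoids hand-editing nested JSON for each request.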
Training Stack
MoE models are complicated to train, and the training of DBRX Base and DBRX Instruct was heavily supported by Databricks' infrastructure for data processing and large-scale LLM training (e.g., Composer, Streaming, MegaBlocks, and LLM Foundry). Composer is our core library for large-scale training. It provides an optimized training loop, easy checkpointing and logging, FSDP-based model sharding, convenient abstractions, extreme customizability via callbacks, and more. Streaming enables fast, low-cost, and scalable training on large datasets from cloud storage. It handles a variety of challenges around deterministic resumption as node counts change, avoiding redundant downloads across devices, high-quality shuffling at scale, sample-level random access, and speed. MegaBlocks is a lightweight library for MoE training. Crucially, it supports "dropless MoE," which avoids inefficient padding and is intended to provide deterministic outputs for a given sequence no matter what other sequences are in the batch. LLM Foundry ties all of these libraries together to create a simple LLM pretraining, fine-tuning, and inference experience. DBRX was trained using proprietary optimized versions of the above open-source libraries, along with our LLM training platform.
Evaluation
We find that DBRX Instruct outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details can be found in our technical blog post.
Acknowledgements
The DBRX models were made possible thanks in large part to the open-source community, especially:
- The MegaBlocks library, which established a foundation for our MoE implementation
- PyTorch FSDP, which we built on for distributed training
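As a rough illustration of the top-k expert routing at the heart of an MoE layer like those MegaBlocks supports: each token's router scores are ranked, the top k experts are kept, and their probabilities are renormalized over just those experts. The sizes and logits below are illustrative; this is not DBRX's actual routing code.

```python
import math

def top_k_gate(router_logits, k):
    """Pick the top-k experts for one token and softmax-normalize
    their routing weights over only the chosen experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(router_logits[i]) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}

# 16 router scores with 4 active experts -- made-up logits for illustration.
weights = top_k_gate([0.3, 2.0, -1.0, 0.9, 1.5, 0.1, -0.5, 0.7,
                      0.0, 1.1, -0.2, 0.4, 0.6, -1.3, 0.8, 0.2], k=4)
assert len(weights) == 4
assert abs(sum(weights.values()) - 1.0) < 1e-9
```

In a dropless formulation, every token keeps its chosen experts rather than being dropped when an expert's capacity buffer fills, which is what makes the output independent of batch composition.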
Responsible AI considerations
Safety techniques
The provider has not supplied this information.
Safety evaluations
The provider has not supplied this information.
Known limitations
The DBRX models were trained on 12T tokens of text, with a knowledge cutoff date of January 2024. The training mix used for DBRX contains both natural-language and code examples. The vast majority of our training data is in the English language. We did not test DBRX for non-English proficiency. Therefore, DBRX should be considered a generalist model for text-based use in the English language. DBRX does not have multimodal capabilities.
Acceptable use
Acceptable use policy
Databricks Open Model Acceptable Use Policy
Quality and performance evaluations
Source: Databricks
We find that DBRX Instruct outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details can be found in our technical blog post.
Benchmarking methodology
Source: Databricks
The provider has not supplied this information.
Public data summary
Source: Databricks
The provider has not supplied this information.
Model Specifications
License: Other
Last Updated: February 2026
Provider: Databricks