tiiuae-falcon-40b
Version: 10
Publisher: tiiuae
Last updated: August 2024

Description

Falcon-40B is a 40-billion-parameter large language model (LLM) developed by the Technology Innovation Institute (TII). It is a causal decoder-only model trained on 1 trillion tokens of the RefinedWeb dataset, enhanced with curated corpora. Falcon-40B supports English, German, Spanish, and French, with limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It is available under the Apache 2.0 license. Its original model card describes it as the best open-source model available at the time of release, and its architecture is optimized for inference with FlashAttention and multiquery attention. Training used 384 A100 40GB GPUs and took two months. The model carries the biases and stereotypes commonly encountered online, so production use requires appropriate precautions: fine-tune the model for the specific task and consider adding guardrails. Technical specifications, training details, and evaluation information are provided below.
The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

Training Details

Training Data

Falcon-40B was trained on 1,000B tokens of RefinedWeb, a high-quality filtered and deduplicated web dataset which we enhanced with curated corpora. Significant components from our curated corpora were inspired by The Pile (Gao et al., 2020).
Data source | Fraction | Tokens | Sources
RefinedWeb-English | 75% | 750B | massive web crawl
RefinedWeb-Europe | 7% | 70B | European massive web crawl
Books | 6% | 60B |
Conversations | 5% | 50B | Reddit, StackOverflow, HackerNews
Code | 5% | 50B |
Technical | 2% | 20B | arXiv, PubMed, USPTO, etc.
RefinedWeb-Europe is made of the following languages:
Language | Fraction of multilingual data | Tokens
German | 26% | 18B
Spanish | 24% | 17B
French | 23% | 16B
Italian | 7% | 5B
Portuguese | 4% | 3B
Polish | 4% | 3B
Dutch | 4% | 3B
Romanian | 3% | 2B
Czech | 3% | 2B
Swedish | 2% | 1B
The data was tokenized with the Falcon-7B/40B tokenizer.
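The two tables above can be cross-checked with a little arithmetic. The sketch below copies the percentages from the tables and assumes that each token count is simply the fraction applied to the relevant total (1,000B overall, 70B for RefinedWeb-Europe), rounded to the nearest billion. The dictionary and function names are illustrative only.

```python
# Percentages copied from the tables above; names are illustrative.
TOTAL_TOKENS_B = 1000   # 1,000B training tokens overall
EUROPE_TOKENS_B = 70    # RefinedWeb-Europe share of the total

DATA_MIX_PCT = {
    "RefinedWeb-English": 75,
    "RefinedWeb-Europe": 7,
    "Books": 6,
    "Conversations": 5,
    "Code": 5,
    "Technical": 2,
}

EUROPE_MIX_PCT = {
    "German": 26, "Spanish": 24, "French": 23, "Italian": 7,
    "Portuguese": 4, "Polish": 4, "Dutch": 4,
    "Romanian": 3, "Czech": 3, "Swedish": 2,
}


def tokens_in_billions(mix_pct, total_b):
    """Apply each percentage to a token total and round to whole billions."""
    return {name: round(pct * total_b / 100) for name, pct in mix_pct.items()}


source_tokens = tokens_in_billions(DATA_MIX_PCT, TOTAL_TOKENS_B)
europe_tokens = tokens_in_billions(EUROPE_MIX_PCT, EUROPE_TOKENS_B)

for name, tokens in source_tokens.items():
    print(f"{name}: {tokens}B")
for name, tokens in europe_tokens.items():
    print(f"{name}: {tokens}B")
```

Under that rounding assumption, the computed values agree with the token counts listed in both tables.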

Training Procedure

Falcon-40B was trained on 384 A100 40GB GPUs, using a 3D parallelism strategy (TP=8, PP=4, DP=12) combined with ZeRO.
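As a quick sanity check on the parallelism degrees, the product TP x PP x DP should equal the GPU count. The sketch below also maps a flat rank to its position in the 3D grid. The rank ordering (tensor-parallel index varying fastest) is an assumption made for illustration; the source does not describe how ranks were actually laid out.

```python
# Parallelism degrees quoted in the text above.
TP, PP, DP = 8, 4, 12


def world_size(tp=TP, pp=PP, dp=DP):
    """Total number of GPUs implied by the three parallelism degrees."""
    return tp * pp * dp


def rank_to_coords(rank, tp=TP, pp=PP, dp=DP):
    """Map a flat rank to (dp_idx, pp_idx, tp_idx).

    Assumed layout: tensor-parallel index varies fastest, then pipeline,
    then data parallel. This ordering is hypothetical.
    """
    if not 0 <= rank < tp * pp * dp:
        raise ValueError("rank out of range")
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx


print(world_size())          # GPUs in total
print(TP * PP)               # GPUs holding one full model replica
print(rank_to_coords(383))   # coordinates of the last rank
```

With these degrees, one model replica spans TP x PP = 32 GPUs, and 12 such replicas process data in parallel.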

Training Hyperparameters

Hyperparameter | Value | Comment
Precision | bfloat16 |
Optimizer | AdamW |
Learning rate | 1.85e-4 | 4B tokens warm-up, cosine decay to 1.85e-5
Weight decay | 1e-1 |
Z-loss | 1e-4 |
Batch size | 1152 | 100B tokens ramp-up
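The learning-rate row can be read as a warm-up followed by a cosine decay. The sketch below is one plausible reading, not the training code: it assumes a linear warm-up from zero over the first 4B tokens and a cosine decay that ends at 1.85e-5 after the full 1,000B tokens. It also assumes the batch size is counted in sequences of 2,048 tokens (the sequence length given under Technical Specifications) to estimate tokens per batch.

```python
import math

PEAK_LR = 1.85e-4
MIN_LR = 1.85e-5
WARMUP_TOKENS = 4e9
TOTAL_TOKENS = 1e12          # assumption: decay spans the whole 1,000B-token run

BATCH_SIZE = 1152            # assumption: counted in sequences
SEQUENCE_LENGTH = 2048
TOKENS_PER_BATCH = BATCH_SIZE * SEQUENCE_LENGTH


def learning_rate(tokens_seen):
    """Learning rate after `tokens_seen` tokens under the assumed schedule."""
    if tokens_seen < WARMUP_TOKENS:
        # Linear warm-up from 0 to the peak (assumed shape).
        return PEAK_LR * tokens_seen / WARMUP_TOKENS
    # Cosine decay from the peak down to the minimum.
    progress = (tokens_seen - WARMUP_TOKENS) / (TOTAL_TOKENS - WARMUP_TOKENS)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


for tokens in (0, 2e9, 4e9, 5e11, 1e12):
    print(f"{tokens:.0e} tokens -> lr {learning_rate(tokens):.3e}")
print(f"tokens per full batch: {TOKENS_PER_BATCH}")
```

The batch-size ramp-up over the first 100B tokens is not modeled here, since the source gives no detail on its shape.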

Speeds, Sizes, Times

Training started in December 2022 and took two months.
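For a rough sense of scale, the figures above imply an average training throughput. The sketch below assumes that "two months" means about 60 days of uninterrupted training on all 384 GPUs, which the source does not state, so the results are order-of-magnitude estimates only.

```python
TOTAL_TOKENS = 1e12      # 1,000B training tokens
NUM_GPUS = 384
TRAINING_DAYS = 60       # assumption: "two months" taken as 60 days, no downtime

seconds = TRAINING_DAYS * 24 * 3600
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
tokens_per_second = TOTAL_TOKENS / seconds
tokens_per_second_per_gpu = tokens_per_second / NUM_GPUS

print(f"GPU-hours: {gpu_hours}")
print(f"cluster throughput: {tokens_per_second:,.0f} tokens/s")
print(f"per-GPU throughput: {tokens_per_second_per_gpu:,.0f} tokens/s")
```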

Evaluation

Paper coming soon. See the OpenLLM Leaderboard for early results.

Technical Specifications

Model Architecture and Objective

Falcon-40B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token). The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with some differences. For multiquery attention, we are using an internal variant which uses independent keys and values per tensor parallel degree.
Hyperparameter | Value | Comment
Layers | 60 |
d_model | 8192 |
head_dim | 64 | Reduced to optimise for FlashAttention
Vocabulary | 65024 |
Sequence length | 2048 |
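The table values are enough for a back-of-the-envelope parameter count. The sketch below rests on several assumptions that are not in the source: a 4x MLP expansion, no bias terms, layer norms ignored, a single embedding matrix shared between input and output, and 8 key/value heads (one per tensor-parallel degree, following the multiquery note above with TP=8). Treat the result as a plausibility check, not an exact figure.

```python
LAYERS = 60
D_MODEL = 8192
HEAD_DIM = 64
VOCAB = 65024

KV_HEADS = 8        # assumption: one K/V head per tensor-parallel degree (TP=8)
MLP_EXPANSION = 4   # assumption: GPT-3 style 4x feed-forward width

# Number of query heads follows directly from the table.
num_heads = D_MODEL // HEAD_DIM


def approx_parameter_count():
    """Rough weight count; biases and layer norms are ignored."""
    q_and_out = 2 * D_MODEL * D_MODEL                  # query and output projections
    k_and_v = 2 * D_MODEL * KV_HEADS * HEAD_DIM        # shared key/value projections
    mlp = 2 * D_MODEL * (MLP_EXPANSION * D_MODEL)      # up and down projections
    per_layer = q_and_out + k_and_v + mlp
    embedding = VOCAB * D_MODEL                        # counted once (assumed tied)
    return LAYERS * per_layer + embedding


total = approx_parameter_count()
print(f"query heads: {num_heads}")
print(f"approximate parameters: {total / 1e9:.1f}B")
```

Under these assumptions the estimate lands close to the 40 billion parameters stated in the description.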

Compute Infrastructure

Hardware

Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.

Software

Falcon-40B was trained using a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.).

License

Falcon-40B is made available under the Apache 2.0 license.

Finetuning samples

Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML
Text Classification | Emotion Detection | Emotion | emotion-detection.ipynb | emotion-detection.sh

Model Evaluation Sample

Task | Use case | Dataset | Python sample (Notebook) | CLI with YAML
Text generation | Text generation | cnn_dailymail | evaluate-model-text-generation.ipynb | evaluate-model-text-generation.yml

Inference samples

Inference type | Python sample (Notebook) | CLI with YAML
Real time | text-generation-online-endpoint.ipynb | text-generation-online-endpoint.sh
Batch | text-generation-batch-endpoint.ipynb | coming soon

Sample input (for real-time inference)

{
  "input_data": {
      "input_string":["The meaning of the life is"]
  }
}

Sample output

[
  {
    "0": "The meaning of the life is to find your gift. The purpose of life is to give it away"
  }
]
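
The request and response shapes shown above can be built and parsed with the standard library. This is a minimal sketch that infers the schema only from the two samples: the helper names `build_payload` and `extract_generations` are hypothetical, and sending the request to a deployed endpoint (URL, authentication) is deliberately left out.

```python
import json


def build_payload(prompts):
    """Wrap a list of prompt strings in the sample input structure."""
    return {"input_data": {"input_string": list(prompts)}}


def extract_generations(response_body):
    """Collect generated strings from a response shaped like the sample output.

    Assumes a JSON list of objects whose keys are stringified integer indices.
    """
    texts = []
    for item in json.loads(response_body):
        for key in sorted(item, key=int):
            texts.append(item[key])
    return texts


request_body = json.dumps(build_payload(["The meaning of the life is"]))
print(request_body)

sample_response = json.dumps(
    [{"0": "The meaning of the life is to find your gift. "
           "The purpose of life is to give it away"}]
)
print(extract_generations(sample_response))
```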

Model Specifications

License: Apache-2.0
Last Updated: August 2024
Publisher: tiiuae
Languages: 4 Languages