Llama-3.2-3B
Version: 4
Key capabilities
About this model
Llama 3.2 is intended for commercial and research use in multiple languages. The instruction-tuned, text-only models are intended for assistant-like chat and agentic applications such as knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks.

Key model capabilities
- Multilingual dialogue use cases
- Agentic retrieval and summarization tasks
- Assistant-like chat applications
- Knowledge retrieval and summarization
- Mobile AI powered writing assistants
- Query and prompt rewriting
- Natural language generation tasks
Use cases
See Responsible AI for additional considerations for responsible use.

Key use cases
Llama 3.2 is intended for commercial and research use in multiple languages. The instruction-tuned, text-only models are intended for assistant-like chat and agentic applications such as knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks.

Out of scope use cases
The following uses are out of scope:
- Use in any manner that violates applicable laws or regulations (including trade compliance laws).
- Use in any other way that is prohibited by the Acceptable Use Policy and the Llama 3.2 Community License.
- Use in languages beyond those explicitly referenced as supported in this model card.

Pricing
Pricing is based on a number of factors, including deployment type and tokens used. See pricing details here.

Technical specs
Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

| Model | Training Data | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count | Knowledge cutoff |
|---|---|---|---|---|---|---|---|---|---|
| Llama 3.2 (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 |
| Llama 3.2 (text only) | A new mix of publicly available online data. | 3B (3.21B) | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 |
Training cut-off date
The pretraining data has a cutoff of December 2023.

Training time
Training utilized a cumulative of 916k GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency.

| Model | Training Time (GPU hours) | Logit Generation Time (GPU Hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
|---|---|---|---|---|---|
| Llama 3.2 1B | 370k | - | 700 | 107 | 0 |
| Llama 3.2 3B | 460k | - | 700 | 133 | 0 |
| Total | 830k | 86k | - | 240 | 0 |
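As a quick consistency check, the per-model rows in the table sum to the reported totals, and the training and logit-generation totals together account for the cumulative figure given in the prose (a sketch using only the numbers stated above):

```python
# GPU hours from the table, in thousands.
train_1b = 370   # Llama 3.2 1B training time
train_3b = 460   # Llama 3.2 3B training time
logit_gen = 86   # logit generation time (total row)

total_training = train_1b + train_3b      # the "Total" training row: 830k
cumulative = total_training + logit_gen   # the 916k cumulative figure

# Location-based emissions (tons CO2eq) likewise sum to the 240 total.
emissions_total = 107 + 133

print(total_training, cumulative, emissions_total)
```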
Input formats
Multilingual Text

Output formats

Multilingual Text and code

Supported languages
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.

Sample JSON response
Sample input
```json
{
  "input_data": {
    "input_string": ["I believe the meaning of life is"],
    "parameters": {
      "top_p": 0.8,
      "temperature": 0.8,
      "max_new_tokens": 100,
      "do_sample": true
    }
  }
}
```
Sample output
```json
[
  {
    "0": "I believe the meaning of life is to be happy. I think we should always strive to be happy and live our lives to the fullest. I don't think we should always worry about what other people think. We should always do what makes us happy and what we think is right. I think it's important to always be yourself and never try to be someone you're not. I think it's important to always be positive and never give up. I think it's important to always believe in yourself and never let anyone tell you that"
  }
]
```
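The sample request above can be assembled and sent with standard HTTP tooling. A minimal Python sketch, assuming a hosted inference endpoint; the endpoint URL and API key are placeholders from your own deployment, not part of this model card:

```python
import json
import urllib.request

def build_request(prompt, *, top_p=0.8, temperature=0.8, max_new_tokens=100):
    """Build a payload matching the "Sample input" schema shown above."""
    return {
        "input_data": {
            "input_string": [prompt],
            "parameters": {
                "top_p": top_p,
                "temperature": temperature,
                "max_new_tokens": max_new_tokens,
                "do_sample": True,
            },
        }
    }

def send(url, api_key, payload):
    # Hypothetical endpoint call; URL and key come from your deployment.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send("https://<your-endpoint>/score", "<your-key>", build_request("I believe the meaning of life is"))` would return a list shaped like the sample output above.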
Model architecture
Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.

Long context

Context Length: 128k

| Category | Benchmark | # Shots | Metric | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B |
|---|---|---|---|---|---|---|
| Long Context | Needle in Haystack | 0 | em | 96.8 | 1 | 1 |

| Capability | Benchmark | # Shots | Metric | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B |
|---|---|---|---|---|---|---|
| Long Context | InfiniteBench/En.QA | 0 | longbook_qa/f1 | 20.3 | 19.8 | 27.3 |
| Long Context | InfiniteBench/En.MC | 0 | longbook_choice/acc | 38.0 | 63.3 | 72.2 |
| Long Context | NIH/Multi-needle | 0 | recall | 75.0 | 84.7 | 98.8 |
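Grouped-Query Attention, used by all Llama 3.2 versions, shares one key/value head across a group of query heads, shrinking the KV cache at inference time. A minimal NumPy sketch of the idea (illustrative only, not Meta's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), n_kv_heads < n_q_heads."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads  # query heads sharing each KV head
    # Repeat each KV head so every query head in a group attends to it.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ v
```

With 8 query heads and 2 KV heads, the KV cache is a quarter of the multi-head-attention size while the query path is unchanged.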
Optimizing model performance
The provider has not supplied this information.

Additional assets

Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go here.

Training disclosure
Training, testing and validation
Llama 3.2 was pretrained on up to 9 trillion tokens of data from publicly available sources. For the 1B and 3B Llama 3.2 models, we incorporated logits from the Llama 3.1 8B and 70B models into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. Knowledge distillation was used after pruning to recover performance. In post-training we used a similar recipe as Llama 3.1 and produced final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involved Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).

Distribution
Distribution channels
Use of Llama 3.2 is governed by the Llama 3.2 Community License (a custom, commercial license agreement).

More information
Model Developer: Meta

Model Release Date: Sept 25, 2024

Status: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.

Training Factors: We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure.

Training Greenhouse Gas Emissions: Estimated total location-based greenhouse gas emissions were 240 tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.

Responsible AI considerations
Safety techniques
As part of our responsible release approach, we followed a three-pronged strategy to managing trust & safety risks:

- Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama
- Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm
- Provide protections for the community to help prevent the misuse of our models
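The protections above are typically realized at the system level by screening both the input prompt and the model response with a safety classifier such as Llama Guard. A minimal sketch of that wrapper pattern; `generate` and `classify` here are placeholder callables, not a real Llama Guard API:

```python
from typing import Callable

REFUSAL = "I can't help with that request."

def safeguarded_chat(
    generate: Callable[[str], str],   # the underlying chat model
    classify: Callable[[str], bool],  # safety classifier: True if safe
    prompt: str,
) -> str:
    # 1. Screen the user prompt before it reaches the model.
    if not classify(prompt):
        return REFUSAL
    # 2. Generate a response with the underlying model.
    response = generate(prompt)
    # 3. Screen the model output before returning it to the user.
    if not classify(response):
        return REFUSAL
    return response
```

The same classifier is applied on both sides, so an unsafe prompt and an unsafe completion are both stopped before reaching the user.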
Safety evaluations
Scaled Evaluations: We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Purple Llama safeguards to filter the input prompt and output response. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case.

Red Teaming: We conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting, and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity, in addition to multilingual content specialists with backgrounds in integrity issues in specific geographic markets.

Critical Risks Assessment: In addition to our safety work above, we took extra care on measuring and/or mitigating the following critical risk areas:

1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive Weapons): Llama 3.2 1B and 3B models are smaller and less capable derivatives of Llama 3.1. For Llama 3.1 70B and 405B, to assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons, and we have determined that such testing also applies to the smaller 1B and 3B models.

2. Child Safety: Child Safety risk assessments were conducted using a team of experts to assess the model's capability to produce outputs that could result in Child Safety risks, and to inform on any necessary and appropriate risk mitigations via fine-tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective-based methodologies to assess the model risks along multiple attack vectors, including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market-specific nuances or experiences.

3. Cyber Attacks: For Llama 3.1 405B, our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed. Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Because Llama 3.2's 1B and 3B models are smaller and less capable than Llama 3.1 405B, we broadly believe that the testing conducted for the 405B model also applies to Llama 3.2 models.

Known limitations
Values: The core values of Llama 3.2 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.2 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress.

Testing: Llama 3.2 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.2's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.2 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our Responsible Use Guide, Trust and Safety solutions, and other resources to learn more about responsible development.

Status: This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.

Constrained Environments: Llama 3.2 1B and 3B models are expected to be deployed in highly constrained environments, such as mobile devices. LLM systems using smaller models will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems. Developers should ensure the safety of their system meets the requirements of their use case. We recommend using lighter system safeguards for such use cases, like Llama Guard 3-1B or its mobile-optimized version.

Acceptable use
Acceptable use policy
Out of Scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and the Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.

Intended Use Cases: Llama 3.2 is intended for commercial and research use in multiple languages. The instruction-tuned, text-only models are intended for assistant-like chat and agentic applications such as knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks.

Model Specifications
License: Custom

Last Updated: January 2026

Provider: Meta

Languages: 1 Language