Nemotron-3-8B-Chat-RLHF
Version: 1
Model Overview
Description
Nemotron-3-8B-Chat-4k-RLHF is a large language model instruction-tuned from an 8B base model. It accepts input with a context length of up to 4,096 tokens. The model has been further fine-tuned for instruction following using Reinforcement Learning from Human Feedback (RLHF). Nemotron-3-8B-Chat-4k-RLHF is part of Nemotron-3, a family of enterprise-ready, decoder-only generative text models compatible with the NeMo Framework. For other models in this collection, see here.

NVIDIA NeMo is an end-to-end, cloud-native framework for building, customizing, and deploying generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.
License
The use of this model is governed by the NVIDIA AI Foundational Models Community License Agreement.
Model Architecture
Architecture Type: Transformer
Network Architecture: Generative Pre-Trained Transformer (GPT-3)
Prompt Format
| Single Turn | Multi-Turn or Few-shot/In-context prompting |
| --- | --- |
| <extra_id_0>System <extra_id_1>User {prompt} <extra_id_1>Assistant | <extra_id_0>System <extra_id_1>User {prompt 1} <extra_id_1>Assistant {response 1} <extra_id_1>User {prompt 2} <extra_id_1>Assistant {response 2} ... <extra_id_1>User {prompt N} <extra_id_1>Assistant |
Example prompt formation code

```python
PROMPT_TEMPLATE = """<extra_id_0>System
{system}
<extra_id_1>User
{prompt}
<extra_id_1>Assistant
"""

system = ""
prompt = "Write a poem on NVIDIA in the style of Shakespeare"

prompt = PROMPT_TEMPLATE.format(prompt=prompt, system=system)
print(prompt)
```
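For multi-turn conversations, the same template tokens are repeated per exchange, as shown in the Multi-Turn column of the table above. A minimal sketch of multi-turn prompt assembly, assuming that turn format; the helper name build_multi_turn_prompt and the example history are illustrative, not part of the official card:

```python
# Minimal sketch: assemble a multi-turn prompt following the
# Multi-Turn format in the table above. The helper name and the
# example history are illustrative assumptions.
def build_multi_turn_prompt(system, history, prompt):
    parts = [f"<extra_id_0>System\n{system}\n"]
    for user_turn, assistant_turn in history:
        parts.append(f"<extra_id_1>User\n{user_turn}\n")
        parts.append(f"<extra_id_1>Assistant\n{assistant_turn}\n")
    # The final user turn is left open for the model to complete.
    parts.append(f"<extra_id_1>User\n{prompt}\n<extra_id_1>Assistant\n")
    return "".join(parts)

history = [
    ("What is NVIDIA NeMo?",
     "NeMo is NVIDIA's end-to-end framework for building generative AI models."),
]
print(build_multi_turn_prompt("", history, "How do I deploy a NeMo model?"))
```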
Samples
Inference samples
| Inference type | Python sample (Notebook) |
| --- | --- |
| Real time | text-generation-online-endpoint-nemotron.ipynb |
Software Integration
Runtime Engine(s): NVIDIA AI Enterprise
Toolkit: NeMo Framework

See the document here for details on how to set up an inference server with the PyTriton and TensorRT-LLM backend.
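Once such a server is running, it can be queried over Triton's standard KServe-style v2 HTTP inference API. A minimal client sketch, assuming a hypothetical endpoint at localhost:8000, a model served under the name nemotron-3-8b-chat-4k-rlhf, and string tensors named prompts and outputs; all of these names depend on your actual server configuration:

```python
import requests

# Hypothetical endpoint, model name, and tensor names; adjust them
# to match your PyTriton/TensorRT-LLM server configuration.
URL = "http://localhost:8000/v2/models/nemotron-3-8b-chat-4k-rlhf/infer"

prompt = (
    "<extra_id_0>System\n\n"
    "<extra_id_1>User\nWrite a poem on NVIDIA in the style of Shakespeare\n"
    "<extra_id_1>Assistant\n"
)

# KServe v2 HTTP inference request body: one BYTES (string) input tensor.
payload = {
    "inputs": [
        {"name": "prompts", "shape": [1], "datatype": "BYTES", "data": [prompt]}
    ]
}

response = requests.post(URL, json=payload, timeout=60)
response.raise_for_status()
# The generated text is returned in the first output tensor.
print(response.json()["outputs"][0]["data"][0])
```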
Training data
NVIDIA models are trained on a diverse set of public and proprietary datasets. This model was trained on a dataset containing 3.5 trillion tokens of text. The dataset spans 53 human languages and 37 programming languages. NVIDIA is committed to the responsible development of large language models and conducts reviews of all datasets included in training.
Intended use
- Nemotron-3-8B-Chat-4k-RLHF is best suited for chat use cases, including question answering, search, summarization, and instruction following.
- Ethical use: Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions, following the guidelines in the NVIDIA NeMo Foundational Models Community License Agreement.
Limitations
- The model was trained on data that contains toxic language and societal biases originally crawled from the Internet. It may therefore amplify those biases and return toxic responses, especially when given toxic prompts.
- The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and may produce socially unacceptable or undesirable output even if the prompt does not contain anything explicitly offensive.
References
https://developer.nvidia.com/blog/nvidia-ai-foundation-models-build-custom-enterprise-chatbots-and-co-pilots-with-production-ready-llms/
Model Specifications
License: Custom
Last Updated: November 2023
Publisher: nvidia-ai
Languages: 1 Language