Nemotron-3-8B-Chat-SteerLM
Version: 2
Model Overview
Description
Nemotron-3-8B-SteerLM is an 8 billion parameter generative language model based on the NVIDIA 8B GPT base model. It has been customized using the SteerLM method developed by NVIDIA to allow user control of model outputs during inference.
Key capabilities enabled by SteerLM:
- Dynamic steering of responses by specifying desired attributes like quality, helpfulness, and toxicity at inference time.
- Simplified training compared to RLHF techniques like fine-tuning and bootstrapping.
License
The use of this model is governed by the NVIDIA AI Foundational Models Community License Agreement.
Model Architecture
Architecture Type: Transformer
Network Architecture: Generative Pre-Trained Transformer (GPT-3)
The SteerLM method involves the following key steps:
- Train an attribute prediction model on human-annotated data to evaluate response quality.
- Use this model to annotate diverse datasets and enrich training data.
- Perform conditioned fine-tuning to align responses with specified combinations of attributes.
- (Optionally) Bootstrap training through model sampling and further fine-tuning.
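The annotation and conditioning steps above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_attribute_predictor`, `annotate`, and `to_conditioned_example` are hypothetical names, the predictor is a stub rather than a trained model, and only three attributes are used. It is not NVIDIA's implementation, only a picture of how predicted attributes might end up in the fine-tuning input.

```python
def toy_attribute_predictor(prompt, response):
    """Stand-in for the trained attribute prediction model (step 1).

    Assumption: a real predictor is a learned model. This stub only
    derives a verbosity score from the response length.
    """
    verbosity = min(4, len(response.split()) // 10)
    return {"quality": 4, "verbosity": verbosity, "toxicity": 0}


def annotate(dataset, predictor):
    """Step 2: attach predicted attributes to each (prompt, response) pair."""
    return [
        {"prompt": p, "response": r, "attributes": predictor(p, r)}
        for p, r in dataset
    ]


def to_conditioned_example(record):
    """Step 3: place the attributes in the input so that the response
    is learned conditioned on them."""
    attrs = ",".join(f"{k}:{v}" for k, v in record["attributes"].items())
    source = (
        f"<extra_id_1>User\n{record['prompt']}\n"
        f"<extra_id_1>Assistant\n<extra_id_2>{attrs}\n"
    )
    return {"input": source, "target": record["response"]}


data = [("What is a GPU?", "A GPU is a processor for parallel workloads.")]
example = to_conditioned_example(annotate(data, toy_attribute_predictor)[0])
print(example["input"] + example["target"])
```

Because the attributes sit in the input during fine-tuning, changing them at inference time is what lets a user steer the response.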
Prompt Format:
Single Turn | Multi-Turn or Few-shot/In-context prompting |
---|---|
<extra_id_0>System A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. <extra_id_1>User {prompt} <extra_id_1>Assistant <extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en | <extra_id_0>System A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. <extra_id_1>User {prompt 1} <extra_id_1>Assistant <extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en {response 1} <extra_id_1>User {prompt 2} <extra_id_1>Assistant <extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en |
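The multi-turn layout in the table can be assembled programmatically. The following is a minimal sketch: `build_multi_turn_prompt` is a hypothetical helper name, and the placement of line breaks between segments is an assumption, since the table flattens them into spaces.

```python
SYSTEM_TEXT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
ATTRIBUTES = (
    "quality:4,understanding:4,correctness:4,coherence:4,complexity:4,"
    "verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,"
    "not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,"
    "political_content:0,moral_judgement:0,lang:en"
)


def build_multi_turn_prompt(history, next_prompt, attributes=ATTRIBUTES):
    """Build a prompt from earlier (user_prompt, response) pairs plus a new user turn.

    Assumption: segments are joined with newlines.
    """
    lines = ["<extra_id_0>System", SYSTEM_TEXT]
    for user_prompt, response in history:
        lines += [
            "<extra_id_1>User",
            user_prompt,
            "<extra_id_1>Assistant",
            f"<extra_id_2>{attributes}",
            response,
        ]
    # The prompt ends with the attribute string of the turn to be generated.
    lines += [
        "<extra_id_1>User",
        next_prompt,
        "<extra_id_1>Assistant",
        f"<extra_id_2>{attributes}",
    ]
    return "\n".join(lines)


print(build_multi_turn_prompt([("Hi", "Hello!")], "What is SteerLM?"))
```

Passing an empty `history` list yields the single-turn layout from the first column.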
Example prompt formation code
# Single-turn template: system text, one user turn, then the attribute string that steers the response.
PROMPT_TEMPLATE = """<extra_id_0>System
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<extra_id_1>User
{prompt}
<extra_id_1>Assistant
<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en"""
question = "Write a poem on NVIDIA in the style of Shakespeare"
# Fill the {prompt} placeholder with the user's question.
prompt = PROMPT_TEMPLATE.format(prompt=question)
print(prompt)
Each of the properties (e.g. humor, toxicity…) can receive integer values in the range [0..4].
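To change individual attributes without editing the long string by hand, a small helper can rebuild it and enforce the [0..4] range. This is only a sketch: `format_attributes` and `DEFAULT_ATTRIBUTES` are hypothetical names, the defaults are copied from the template above, and treating `lang` as a separate non-integer field is an assumption.

```python
# Defaults copied from the prompt template, in the same order.
DEFAULT_ATTRIBUTES = {
    "quality": 4, "understanding": 4, "correctness": 4, "coherence": 4,
    "complexity": 4, "verbosity": 4, "toxicity": 0, "humor": 0,
    "creativity": 0, "violence": 0, "helpfulness": 4, "not_appropriate": 0,
    "hate_speech": 0, "sexual_content": 0, "fails_task": 0,
    "political_content": 0, "moral_judgement": 0,
}


def format_attributes(overrides=None, lang="en"):
    """Return the attribute string, applying overrides validated to 0..4."""
    values = dict(DEFAULT_ATTRIBUTES)
    for name, value in (overrides or {}).items():
        if name not in values:
            raise KeyError(f"unknown attribute: {name}")
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{name} must be an integer in [0..4], got {value!r}")
        values[name] = value
    parts = [f"{name}:{value}" for name, value in values.items()]
    parts.append(f"lang:{lang}")
    return ",".join(parts)


# Example: request a more humorous, more creative, less verbose answer.
print(format_attributes({"humor": 4, "creativity": 4, "verbosity": 1}))
```

The returned string can be placed after `<extra_id_2>` in the prompt.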
Samples
Inference samples
Inference type | Python sample (Notebook) |
---|---|
Real time | text-generation-online-endpoint-nemotron.ipynb |
Software Integration
Runtime Engine(s): NVIDIA AI Enterprise
Toolkit: NeMo Framework
See the document here for details on how to set up an inference server with the pyTriton and TensorRT-LLM backend.
Dataset
NVIDIA models are trained on a diverse set of public and proprietary datasets. NVIDIA is committed to the responsible development of large language models and conducts reviews of all datasets included in training.
Intended use
- The 8B-Chat-SteerLM model is for users who want to customize a model’s response during inference.
- Ethical use: Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the NVIDIA AI Foundational Models Community License Agreement.
Limitations
- The model was trained on data crawled from the Internet that contains toxic language and societal biases. It may therefore amplify those biases and return toxic responses, especially when prompted with toxic prompts.
- The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text. It may also produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.
References
https://developer.nvidia.com/blog/nvidia-ai-foundation-models-build-custom-enterprise-chatbots-and-co-pilots-with-production-ready-llms/
Model Specifications
License: Custom
Last Updated: November 2023
Publisher: nvidia-ai
Languages: 1 Language