tsuzumi is a lightweight large language model developed by NTT, designed to handle both Japanese and English with high efficiency. For details of model features, please refer to the user guide 「tsuzumi on Azure MaaSユーザーガイド」 ("tsuzumi on Azure MaaS User Guide"), available from the Download page.
Model Information
See the Model Information Table below.
Model Variations
There are no model variations at the moment.
Model Input
The model accepts text input only.
Model Output
The model generates text output only.
Model Architecture
tsuzumi is an auto-regressive language model based on an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT).
Model Dates
tsuzumi was trained on data through August 2024; the knowledge cutoff is May 2024.
Model Information Table

| Name | Training Data | Params | Context Length | GQA | Tokens |
|---|---|---|---|---|---|
| tsuzumi-7B | A mix of publicly available online and private data | 7B | 8k | ✔ | 1.4T |
Training Data
Data Freshness
The pretraining data has a cutoff of May 2024.
Evaluation Results
In this section, we report results for the tsuzumi model on the Japanese MT-bench, a standard Japanese benchmark. Scores are shown as Japanese / English.

| Model | Size | writing | stem | humanities | roleplay | extraction | coding | math | reasoning |
|---|---|---|---|---|---|---|---|---|---|
| tsuzumi-7B | 7B | 8.6 / 8.2 | 7.6 / 7.1 | 8.45 / 8.2 | 6.3 / 6.05 | 5.6 / 2.9 | 2.3 / 2.2 | 1.1 / 1.1 | 2.1 / 4.3 |

※Evaluated on turn 1 only.
Sample inputs and outputs (for real-time inference)
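A minimal sketch of a real-time inference call is shown below, assuming an OpenAI-compatible chat-completions endpoint; the endpoint URL, API key, and response schema are placeholders to be replaced with your deployment's actual values.

```python
import json
import urllib.request

# Placeholder values -- substitute your deployment's endpoint and key.
ENDPOINT = "https://<your-deployment>.inference.ai.azure.com/v1/chat/completions"
API_KEY = "<your-api-key>"

def build_request(prompt: str, temperature: float = 0.15, max_tokens: int = 4096) -> dict:
    """Assemble a chat-completions request body using the documented defaults."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def infer(prompt: str) -> str:
    """Send the request and extract the generated text from the response."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

The defaults in `build_request` mirror the basic-parameter defaults documented in the next section.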
Supported Parameters
# Basic parameters:

| name | Default | Explanation |
|---|---|---|
| temperature | 0.15 | Controls randomness in the model. Lower values make the model more deterministic; higher values make it more random. |
| max_tokens | 4096 | The maximum number of tokens to generate. |
| top_p | 1.0 | The cumulative probability threshold for nucleus sampling: only the highest-probability vocabulary tokens whose probabilities sum to top_p are kept. |
| frequency_penalty | 0.0 | Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
| logit_bias | null | Modifies the likelihood of specified tokens appearing in the completion. |
| logprobs | false | Whether to return log probabilities of the output tokens. |
| top_logprobs | 0 | An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. |
| n | 1 | The number of generated response variations. |
| presence_penalty | 0.0 | Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| stop | null | Stop sequences at which generation ends. |
| stream | false | Whether the response is returned in partial message deltas. |
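The basic parameters above can be combined in a single request body. The sketch below is illustrative, assuming an OpenAI-compatible chat-completions schema; the non-default values are chosen only to demonstrate each knob.

```python
# A request body exercising the basic parameters listed above.
# Values that differ from the documented defaults are for illustration.
request_body = {
    "messages": [{"role": "user", "content": "自己紹介をしてください。"}],
    "temperature": 0.7,        # more creative than the 0.15 default
    "max_tokens": 512,         # cap the response length
    "top_p": 0.9,              # nucleus-sampling cutoff
    "frequency_penalty": 0.5,  # discourage verbatim repetition
    "presence_penalty": 0.2,   # nudge the model toward new topics
    "n": 1,                    # a single response variation
    "stop": ["\n\n"],          # stop generating at a blank line
    "stream": False,           # return the full message at once
}
```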
# Advanced Parameters – To specify, set “extra-params: allow” in your HTTP request header:

| name | Default | Explanation |
|---|---|---|
| min_tokens | 0 | The minimum number of tokens to generate. |
| top_k | -1 (no filter) | The number of highest-probability vocabulary tokens to keep for top-k filtering. |
| repetition_penalty | 1.0 | The weight of the penalty for repeated phrases. Higher values suppress repetition of similar phrases. |
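To use the advanced parameters, the request must carry the “extra-params: allow” HTTP header noted above. A minimal sketch, assuming an OpenAI-compatible request body (the API key is a placeholder):

```python
# Headers for a request that opts in to advanced parameters.
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-api-key>",  # placeholder
    "extra-params": "allow",                   # required for the parameters below
}

# Body mixing basic and advanced parameters.
body = {
    "messages": [{"role": "user", "content": "Summarize tsuzumi in one sentence."}],
    "max_tokens": 128,
    "min_tokens": 16,           # advanced: floor on generated tokens
    "top_k": 50,                # advanced: keep only the 50 most likely tokens
    "repetition_penalty": 1.1,  # advanced: mildly suppress repeated phrases
}
```

Without the header, deployments may reject or silently ignore the advanced parameters, so set it whenever `min_tokens`, `top_k`, or `repetition_penalty` appear in the body.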