Domyn Large
Version: 1
Domyn
Last updated March 2026
Agents
Instruction
Coding
Domyn Large is a 260B-parameter large language model designed for efficient and effective reasoning in constrained environments. It supports extended multi-step reasoning, long-document comprehension, and tool-use workflows. The model can run with reasoning on, to improve performance on complex tasks that require detailed analysis, or with reasoning off, for faster and more compact outputs. Optimized for mathematics and coding, Domyn Large combines strong reasoning capabilities with a practical deployment footprint: a single node with 8x NVIDIA H100 GPUs is sufficient for inference.
Domyn Large supports more than 50 languages, with high proficiency in European languages such as Italian, Spanish, French, and German, and strong capabilities in Japanese, Russian, Chinese, Arabic, Hindi, Indonesian, Korean, and more. It delivers strong performance with contexts up to 128K tokens, enabling extended reasoning and comprehension over very long documents.
Development included supervised fine-tuning (SFT) to strengthen instruction following and ensure consistent performance across tasks. The model achieves state-of-the-art results on enterprise-critical benchmarks including Text-to-SQL, Text-to-Cypher, knowledge graph extraction, and safety classification, making it well suited to use cases in regulated industries such as financial services, defense, and advanced manufacturing.

Intended Use

Primary Use Cases

Domyn Large is a reasoning-first model for agentic AI scenarios, combining reliable instruction following, robust tool and function calling, orchestration support, and strong multilingual coverage (50+ languages). It offers explicit reasoning control via the system prompt: "thinking on" invokes step-by-step structured reasoning for complex tasks such as code, math, long-context analysis, and grounded RAG, while "thinking off" produces concise, direct answers at lower latency when depth is unnecessary. With a context window of up to 128K tokens, it handles long documents, integrates with retrieval pipelines, and maintains grounded, context-aware outputs, letting developers trade precision against speed in production workflows. Its results on enterprise-critical benchmarks (Text-to-SQL, Text-to-Cypher, knowledge graph extraction, and safety classification) make it well suited to use cases in regulated industries such as financial services, defense, and advanced manufacturing.

Out-of-Scope Use Cases

Our models are not specifically designed or evaluated for all downstream purposes.
As with any language model, developers should carefully evaluate accuracy, safety, and fairness before applying it to specific downstream scenarios, particularly those that may be high-risk. They should also ensure compliance with all applicable laws and regulations (including, but not limited to, privacy and trade compliance) that are relevant to their use case.

Responsible AI Considerations

Like other language models, Domyn Large can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
  • Representation of Harms & Perpetuation of Stereotypes: These models can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups, cultural contexts, or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases.
  • Inappropriate or Offensive Content: These models may produce other types of inappropriate or offensive content, which may make them unsuitable for deployment in sensitive contexts without additional mitigations specific to the use case.
  • Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated.
  • Limited Scope for Code: The majority of Domyn Large's training data is based on Python and uses common packages such as "typing, math, random, collections, datetime, itertools". If the model generates Python scripts that use other packages, or scripts in other languages, we strongly recommend that users manually verify all API uses.
  • Long Conversation: Domyn Large, like other models, can in some cases generate responses that are repetitive, unhelpful, or inconsistent in very long chat sessions, in both English and non-English languages. Developers are encouraged to put appropriate mitigations in place, such as limiting conversation turns, to account for possible conversational drift.
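The turn-limiting mitigation mentioned in the last bullet can be sketched as a small history-trimming helper. This is an illustrative sketch only: the function name `limit_turns` and the choice to count a turn from each non-tool user message are assumptions, not part of the Domyn tooling. Tool responses are skipped when counting turns because, under the conventions described in the Tool Calling section, they also arrive with the user role.

```python
from typing import Dict, List

Message = Dict[str, str]


def limit_turns(messages: List[Message], max_turns: int) -> List[Message]:
    """Keep all system messages plus the most recent `max_turns` user turns.

    A turn is assumed to start at a user message that is not a tool response
    and to include every message that follows it until the next such message.
    """
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")

    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]

    # Indexes in `dialogue` where a new user turn begins.
    turn_starts = [
        i
        for i, m in enumerate(dialogue)
        if m["role"] == "user"
        and not m["content"].lstrip().startswith("<tool_response>")
    ]
    if len(turn_starts) <= max_turns:
        return system + dialogue

    # Drop everything before the first turn we want to keep.
    return system + dialogue[turn_starts[-max_turns]:]
```

Dropping older turns also drops their context, so a summary of the removed turns may be worth carrying forward in applications that depend on earlier parts of the conversation.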
Developers should adopt responsible AI best practices, including identifying, assessing, and mitigating risks specific to their use case and cultural or linguistic context. While Domyn Large is a general-purpose model, developers are encouraged to fine-tune it for their specific applications and integrate it into broader AI systems with appropriate language-specific safeguards.
Recommended mitigations include:
  • Audit & Monitoring: Use third-party safety tools (e.g., Azure AI Content Safety) to detect and filter harmful or unsafe outputs.
  • Transparency: Clearly inform users when they are interacting with an AI system, and incorporate feedback loops for continual improvement.
  • RAG Integration: Improve factuality and domain relevance through Retrieval-Augmented Generation (RAG).
  • Immutable Logging: For sovereign or customized use, embed audit logging at the architectural level to ensure traceability and prevent tampering.
  • High-Stakes Applications: Avoid deploying the model in sensitive scenarios (e.g., healthcare, legal advice, credit scoring) without extensive validation and safeguards.
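The "Limited Scope for Code" note above recommends manually verifying API uses whenever generated Python reaches beyond the listed common packages. As a rough aid for deciding which scripts need that review, the sketch below uses Python's standard `ast` module to list the top-level packages a generated script imports and flags any that fall outside the list. The helper name `packages_to_review` is hypothetical, and the check only inspects static `import` statements; it is not a substitute for the manual verification itself.

```python
import ast
from typing import FrozenSet, Set

# Packages named in the "Limited Scope for Code" note.
COMMON_PACKAGES: FrozenSet[str] = frozenset(
    {"typing", "math", "random", "collections", "datetime", "itertools"}
)


def packages_to_review(
    source: str, known: FrozenSet[str] = COMMON_PACKAGES
) -> Set[str]:
    """Return top-level packages imported by `source` that are not in `known`."""
    tree = ast.parse(source)
    found: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # e.g. "import os.path" -> "os"
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"), which have level > 0.
            if node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found - known


generated = (
    "import math\n"
    "import numpy as np\n"
    "from collections import Counter\n"
    "from os.path import join\n"
)
flagged = packages_to_review(generated)  # packages outside the common list
```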

Training Data

Domyn Large was post-trained using Supervised Fine-Tuning (SFT), relying on carefully curated data pipelines that combine open-source datasets and synthetically generated data. The focus was on reasoning, code, and task-following capabilities, with a strong emphasis on safety and diversity. The dataset includes a curated mix of:
  • Safety datasets designed to teach the model to provide safe, aligned responses and avoid harmful or inappropriate content.
  • Long-context datasets that leverage the model's ability to handle long inputs, including long-form Q&A and document summarization tasks.
  • Chat datasets to improve natural and coherent dialogue.
  • Instruction-following datasets focused on explicit task instructions.
  • Multilingual datasets in multiple languages to strengthen multilingual capabilities.
  • Code datasets containing source code to improve programming and code reasoning skills, including structured query languages such as SQL and Cypher used for enterprise data access.
  • Math datasets with mathematical problems and solutions to enhance problem-solving abilities.
  • Function-calling datasets that teach the model when and how to use external tools and APIs.
  • STEM datasets centered on science, technology, engineering, and mathematics topics.
All data passed through a multi-step curation pipeline, where examples were scored for:
  • Helpfulness (instruction clarity, completeness of response)
  • Complexity (depth of reasoning, cognitive load)
  • Contamination (data leakage with respect to benchmark evaluations)
Only the top-scoring examples were retained to ensure high-quality supervision signals.

Reasoning Control

This model supports reasoning control:
  • Use "thinking on" in the system prompt to enable reasoning mode.
  • Use "thinking off" to disable reasoning mode.

Example Usage

Enable Reasoning Mode

[
    { "role": "system", "content": "My system prompt.\n\nthinking on" },
    { "role": "user", "content": "My question" }
]

Disable Reasoning Mode

[
    { "role": "system", "content": "My system prompt.\n\nthinking off" },
    { "role": "user", "content": "My question" }
]

Default Reasoning Mode

[
    { "role": "system", "content": "My system prompt" },
    { "role": "user", "content": "My question" }
]
If the system prompt does not explicitly specify a reasoning mode, the chat template defaults to "thinking off".
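The three message layouts above can be produced by one small helper. The following is a minimal sketch: the function name `build_messages` and its `thinking` parameter are illustrative assumptions, and the only behavior taken from this document is that the flag is appended to the system prompt after a blank line, or omitted to fall back to the chat template default.

```python
from typing import Dict, List, Optional


def build_messages(
    system_prompt: str,
    user_prompt: str,
    thinking: Optional[bool] = None,
) -> List[Dict[str, str]]:
    """Build a chat message list with an optional reasoning flag.

    thinking=True  -> appends "thinking on" to the system prompt
    thinking=False -> appends "thinking off"
    thinking=None  -> leaves the system prompt untouched (template default)
    """
    if thinking is None:
        system_content = system_prompt
    else:
        flag = "thinking on" if thinking else "thinking off"
        system_content = f"{system_prompt}\n\n{flag}"
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("My system prompt.", "My question", thinking=True)
```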

Tool Calling

This model has been explicitly trained to support tool calling (function calling).
Follow the guidelines below to achieve the best results.

1. Defining Tools

Tools and their definitions must be included in the system prompt, under the system role.
We recommend wrapping the tool specification in <tools> and </tools> XML tags, as shown below:
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags.
You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
Here are the available tools:
<tools>
[{'type': 'function', 'function': {'name': 'create_calendar_event', 'description': 'Create a new event in the calendar', 'parameters': {'type': 'object', 'properties': {'event_title': {'type': 'string', 'description': 'The title of the event'}, 'event_date': {'type': 'string', 'description': 'The date of the event'}, 'event_location': {'type': 'string', 'description': 'The location of the event'}}, 'required': ['event_title', 'event_date']}}}, {'type': 'function', 'function': {'name': 'convert_currency', 'description': 'Convert currency from one unit to another', 'parameters': {'type': 'object', 'properties': {'amount': {'type': 'number', 'description': 'The amount to be converted'}, 'from_currency': {'type': 'string', 'description': 'The currency to convert from'}, 'to_currency': {'type': 'string', 'description': 'The currency to convert to'}}, 'required': ['amount', 'from_currency', 'to_currency']}}}]
</tools>
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{'arguments': <args-dict>, 'name': <function-name>}
</tool_call>
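A system prompt in the layout above can be assembled programmatically from a list of tool definitions. The sketch below is one possible way to do it: `build_tool_system_prompt` is a hypothetical helper, and it serializes the tools with `json.dumps`, which yields double-quoted JSON rather than the single-quoted form shown in the example. Which of the two forms works best with the model is not stated here, so treat that choice as an assumption.

```python
import json
from typing import Any, Dict, List

# Instruction text copied from the example prompt above.
PREAMBLE = (
    "You are a function calling AI model. You are provided with function "
    "signatures within <tools></tools> XML tags.\n"
    "You may call one or more functions to assist with the user query. "
    "Don't make assumptions about what values to plug into functions.\n"
    "Here are the available tools:"
)
CALL_FORMAT = (
    "For each function call return a json object with function name and "
    "arguments within <tool_call></tool_call> XML tags as follows:\n"
    "<tool_call>\n"
    "{'arguments': <args-dict>, 'name': <function-name>}\n"
    "</tool_call>"
)


def build_tool_system_prompt(tools: List[Dict[str, Any]]) -> str:
    """Wrap the serialized tool definitions in <tools> tags inside the prompt."""
    return f"{PREAMBLE}\n<tools>\n{json.dumps(tools)}\n</tools>\n{CALL_FORMAT}"


tools = [
    {
        "type": "function",
        "function": {
            "name": "convert_currency",
            "description": "Convert currency from one unit to another",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {"type": "number", "description": "The amount to be converted"},
                    "from_currency": {"type": "string", "description": "The currency to convert from"},
                    "to_currency": {"type": "string", "description": "The currency to convert to"},
                },
                "required": ["amount", "from_currency", "to_currency"],
            },
        },
    }
]
system_prompt = build_tool_system_prompt(tools)
```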

2. Representing Tool Calls and Tool Responses

You should follow these conventions:
Tool calls
  • Must appear as an assistant message.
  • Must be wrapped in <tool_call> and </tool_call> tags.
  • Must contain a JSON object with "name" and "arguments".
Tool responses (Observations)
  • Must appear as a user message.
  • Must be wrapped in <tool_response> and </tool_response> tags.
  • Should contain the raw result returned by the tool.
Below is a complete example of message formatting:
[
  {
    "role": "system",
    "content": "You are a function-calling AI model. Here are the available tools..."
  },
  {
    "role": "user",
    "content": "Can you find Italian restaurants in New York?"
  },
  {
    "role": "assistant",
    "content": "<tool_call>{\"name\": \"find_nearby_restaurants\", \"arguments\": {\"location\": \"New York\", \"cuisine\": \"Italian\", \"radius\": 5000}}</tool_call>"
  },
  {
    "role": "user",
    "content": "<tool_response>[{\"name\": \"Da Pippo\", \"stars\": 5, \"address\": \"Central Park\"}]</tool_response>"
  }
]
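On the application side, these conventions translate into two small steps: pulling the tool call out of the assistant message, and wrapping the tool's result for the next request. The sketch below illustrates both. The helper names are hypothetical, and it assumes the model emits valid JSON inside the tags, as in the message example above; if the output uses single-quoted dictionaries instead, `json.loads` will raise and a more tolerant parser would be needed.

```python
import json
import re
from typing import Any, Dict, List

# Non-greedy match so several <tool_call> blocks in one message are split correctly.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(assistant_content: str) -> List[Dict[str, Any]]:
    """Parse every <tool_call>...</tool_call> block into a dict."""
    calls = []
    for raw in TOOL_CALL_RE.findall(assistant_content):
        call = json.loads(raw.strip())
        if "name" not in call or "arguments" not in call:
            raise ValueError(f"tool call is missing 'name' or 'arguments': {raw!r}")
        calls.append(call)
    return calls


def tool_response_message(result: Any) -> Dict[str, str]:
    """Wrap a tool result as a user message, per the conventions above."""
    return {
        "role": "user",
        "content": f"<tool_response>{json.dumps(result)}</tool_response>",
    }


assistant_content = (
    '<tool_call>{"name": "find_nearby_restaurants", "arguments": '
    '{"location": "New York", "cuisine": "Italian", "radius": 5000}}</tool_call>'
)
calls = extract_tool_calls(assistant_content)
reply = tool_response_message(
    [{"name": "Da Pippo", "stars": 5, "address": "Central Park"}]
)
```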

Point of Contact for Copyright-Related Complaints

We have designated a point of contact for electronic communication with affected rightsholders regarding copyright and related rights concerns related to this model.
Affected rightsholders and their authorised representatives, including collective management organisations, may submit sufficiently precise and adequately substantiated complaints electronically concerning any non-compliance with our commitments under the Copyright Chapter of the GPAI Code of Practice.
We commit to handling such complaints diligently, impartially, and within a reasonable timeframe, except in cases where the complaint is manifestly unfounded or has already been addressed.
Contact email: copyright@domyn.com
This mechanism complements, but does not limit, the available legal measures, remedies, and sanctions under Union and national copyright law.

Evaluation

To evaluate Domyn Large's reasoning capabilities, we selected benchmarks across math, coding, and other aggregated categories. We also tested enterprise-critical tasks: Text-to-SQL, Text-to-Cypher, knowledge-graph extraction (entity/relation triplets), and safety classification.
These tasks evaluate functional correctness (valid queries and accurate triplets) as well as operational behavior that matters in production: robustness, precision/recall, and safety trade-offs. See Enterprise-Task-Evaluation for details.
We evaluated all benchmarks using a maximum context length of 32k and recommended parameters: temperature=0.6, top_k=25, and min_p=0.1.
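| Category | Benchmark | Domyn Large |
|---|---|---|
| Math & Reasoning | MATH-500 (avg@10) | 93.5 |
| Math & Reasoning | AIME 2025 (avg@10) | 38.7 |
| Math & Reasoning | GPQA-D (avg@10) | 59.1 |
| Coding | LiveCodeBench (pass@1) | 66.7 |
| Coding | MBPP score (pass@1) | 83.2 |
| Coding | openai_humaneval (pass@1) | 94.5 |
| Aggregated benchmark | MMLU | 85.2 |
| Aggregated benchmark | MMLU-PRO | 75.5 |
| Instruction Following | IFEval Inst-level-strict-accuracy | 87.2 |
| Multilingual | MMMLU | 72.8 |
| Multilingual | MGSM | 85.2 |
| Long Context | Ruler 32k | 84.5 |
| Safety | SafetyBench | 83.17 |
| Enterprise-Tasks | Text2SQL (FINCH) | 48.0 |
| Enterprise-Tasks | Text2Cypher | 35.1 |
| Enterprise-Tasks | KG Triplet Extraction | 72.8 |
| Enterprise-Tasks | Safety Classification | 77.0 |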
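To reproduce the sampling setup used for the benchmark results above, the recommended parameters (temperature=0.6, top_k=25, min_p=0.1) can be kept in one place and merged into each request. The sketch below only builds a request dictionary. The `model` and `messages` field names mirror common OpenAI-style chat APIs, the model identifier "domyn-large" is a placeholder, and whether a given serving endpoint accepts `top_k` and `min_p` as top-level fields is an assumption to check against that endpoint's documentation.

```python
from typing import Any, Dict, List

# Recommended sampling parameters quoted in the evaluation notes above.
RECOMMENDED_SAMPLING: Dict[str, float] = {
    "temperature": 0.6,
    "top_k": 25,
    "min_p": 0.1,
}


def chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    **overrides: Any,
) -> Dict[str, Any]:
    """Build a request body with the recommended sampling defaults applied.

    Keyword overrides take precedence over the defaults.
    """
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    payload.update(RECOMMENDED_SAMPLING)
    payload.update(overrides)
    return payload


payload = chat_payload(
    "domyn-large",  # placeholder model identifier
    [
        {"role": "system", "content": "My system prompt.\n\nthinking on"},
        {"role": "user", "content": "My question"},
    ],
)
```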
To assess Domyn Large's tool-calling capabilities, we report results on the Berkeley Function Calling Leaderboard (BFCL) V3 with reasoning mode disabled.
| Category | Benchmark | Domyn Large |
|---|---|---|
| Tool Calling | BFCL V3 Non Live | 92.5 |
| Tool Calling | BFCL V3 Live | 82.0 |
| Tool Calling | BFCL V3 Multiturn | 35.1 |

Model Specifications

License: Custom
Training Data: January 2024
Last Updated: March 2026
Input Type: Text
Output Type: Text
Provider: Domyn
Languages: 5 Languages