Fara1.5-9B
Version: 1
Fara1.5-9B is a multimodal web agent developed by Microsoft Research AI Frontiers. It observes the browser via screenshots and acts on behalf of the user by emitting structured tool calls (click, type, scroll, web_search, visit_url, and others) to complete multi-step web tasks end-to-end. Primary use cases include filling forms, shopping, booking travel and restaurants, information seeking, and account-driven workflows. Fara1.5-9B is trained to recognize "critical points" in a task (situations involving missing user information, ambiguous instructions, or irreversible actions such as completing a purchase or sending a message) and to pause for user confirmation rather than proceed unilaterally. It is intended for human-in-the-loop, sandboxed deployment; the recommended deployment vehicle is MagenticLite, which provides allow-listed navigation, watch-mode action monitoring, an immediate pause control, and a Docker-isolated browser environment.
Fara1.5-9B is a 9B-parameter multimodal decoder-only language model fine-tuned from the Qwen 3.5 9B base. It accepts the user's textual goal, the current browser screenshot(s), and a textual history of prior thoughts and actions, and emits a chain-of-thought block followed by a structured tool-call block. Its view of the browser is vision-only: it perceives pages exclusively through UI screenshots. It supports a context window of up to 262K tokens, which allows long multi-step trajectories that accumulate substantial screenshot and action history.
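As an illustration of what consuming a structured tool call might look like on the harness side, here is a minimal sketch. It assumes the tool-call block is serialized as JSON with `name` and `arguments` fields; this card does not specify the wire format, so that shape, the `ToolCall` class, and the `parse_tool_call` helper are all hypothetical. Only the tool names come from the description above.

```python
import json
from dataclasses import dataclass, field

# Tool names listed in this card; the card says "and others", so the real set is larger.
KNOWN_TOOLS = {"click", "type", "scroll", "web_search", "visit_url"}


@dataclass
class ToolCall:
    """Hypothetical in-memory form of one tool call (not the official schema)."""
    name: str
    arguments: dict = field(default_factory=dict)


def parse_tool_call(raw: str) -> ToolCall:
    """Parse an assumed JSON tool call and reject tool names the harness does not know."""
    payload = json.loads(raw)
    name = payload["name"]
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unsupported tool: {name}")
    return ToolCall(name=name, arguments=payload.get("arguments", {}))


call = parse_tool_call('{"name": "click", "arguments": {"x": 412, "y": 230}}')
print(call.name, call.arguments)
```

Rejecting unknown tool names before execution is one simple way for a harness to fail closed when the model emits something unexpected.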
Intended Use
Primary Use Cases
Fara1.5-9B is intended as a computer-use agent for browser-based task automation, deployed inside a sandboxed harness with human-in-the-loop supervision. MagenticLite is the recommended deployment vehicle, providing allow-listed navigation, watch-mode action monitoring, an immediate pause control, and a Docker-isolated browser. Within this configuration, Fara1.5-9B observes the browser via screenshots and emits structured tool calls to advance multi-step tasks. Primary use cases include:
- Web task automation in productivity scenarios: filling forms, online shopping, booking travel and restaurants, information seeking, and other repetitive web workflows where automating clicks, typing, and navigation saves user time.
- Web-agent role inside broader agentic systems: serving as the browser-acting sub-agent for orchestrators (for example, MagenticBrain or other planning models) that delegate web-bound subtasks.
- Grounding model for screen-action prediction: predicting coordinates and actions on a screenshot when integrated as a vision-grounded action head inside a larger agentic stack.
- Research and benchmarking on computer-use agents: studying web-agent behavior, evaluating safety controls, and developing new agentic capabilities in controlled sandboxed environments.
Out-of-Scope Use Cases
Fara1.5-9B is not designed for fully autonomous web operation without human-in-the-loop supervision, deployment in highly regulated or high-stakes domains (legal, medical, or financial advice; safety-critical workflows; decisions that affect a user's legal status, health, employment, or finances), commercial applications without further testing and development beyond the conditions evaluated here, multilingual contexts (the model supports English only; non-English content may produce degraded or unsafe behavior), or unsandboxed execution against the open web with unrestricted permissions. Use that involves passing passwords, payment credentials, authentication tokens, or sensitive personal information directly to the model is unsupported. Developers should consider common limitations of language models as they select use cases, evaluate and mitigate for accuracy, safety, and fairness before deploying within a specific downstream context, and adhere to applicable laws and regulations (including privacy, trade compliance, and sectoral rules) relevant to their deployment.
Responsible AI Considerations
Like other language models, Fara1.5-9B can behave in ways that are unfair, unreliable, or unsafe. Because it is a computer-use agent that acts on the live web on behalf of a user, the most material risks relate to how its actions interact with browser state and online content rather than to free-form text generation. The safety properties described here apply specifically to deployment within MagenticLite, the recommended user-facing harness, which provides allow-listed navigation, watch-mode action monitoring, an immediate pause control, and a Docker-isolated browser environment. Limiting behaviors to be aware of include:
- Unintended or unsafe browser actions: ambiguous user instructions, misleading UI elements, and unexpected on-screen content can lead to incorrect clicks, form submissions, or navigation. Fara1.5-9B is trained to pause at "critical points" (entering personal information, payment or shipping details, completing purchases or bookings, sending emails, submitting job applications, signing into accounts, making phone calls), but the safety property depends on the model correctly identifying these situations.
- Prompt injection from web content: harmful or adversarial content on visited pages can attempt to manipulate the model's behavior. Allow-lists, block-lists, and watch-mode monitoring are the primary mitigations.
- Quality of service: the model is trained primarily on English and is not intended for multilingual use; performance may be worse on non-English content. Performance may also be worse on English varieties that are less represented in the training data than standard American English.
- Representation harms and stereotypes: like other language models, Fara1.5-9B may over- or under-represent groups of people or reinforce demeaning or negative stereotypes despite safety post-training.
- Inappropriate or offensive content: the model may produce content unsuitable for sensitive deployment contexts without additional, use-case-specific mitigations.
- Information reliability: the model may generate nonsensical or fabricated content, misattribute sources, or be misled by deceptive or low-quality online content. Outputs require verification before being acted on.
- Data exposure: sensitive credentials (passwords, authentication tokens, payment data, personal identifiers) should never be passed to the model. Sandboxing, scoped permissions, and strict access controls are required to prevent leakage.
Recommended practices for mitigating these risks include:
- Use MagenticLite where possible: the harness is built for safe Fara1.5-9B deployment and provides allow-lists, watch-mode, an immediate pause control, and sandboxing.
- Always run with human-in-the-loop: a human should monitor Fara1.5-9B's actions on the live web and be able to halt them immediately.
- Sandbox the browser: run Fara1.5-9B in a containerized environment (Docker, VM) that does not expose host files, environment variables, or saved credentials.
- Restrict network reach: use allow-lists or block-lists to limit which sites Fara1.5-9B can visit; this reduces exposure to prompt-injection attacks from adversarial pages.
- Never share sensitive credentials: passwords, payment data, and personal identifiers should not be entered into the model context.
- Layer safety services: use Azure AI Content Safety or equivalent guardrails on inputs and outputs.
- High-risk scenarios: do not use Fara1.5-9B for legal, medical, financial, or other regulated advice, nor in safety-critical workflows. Add application-layer safeguards appropriate to the deployment context.
- Misuse and misinformation: build feedback mechanisms and monitor for fraud, spam, or malware-style abuse patterns. Inform end-users they are interacting with an AI system.
- Allocation decisions: Fara1.5-9B is not suitable for scenarios with consequential impact on legal status or the allocation of resources or life opportunities (housing, employment, credit, and similar) without further assessment and additional debiasing.
- Verify outputs: treat model outputs as recommendations rather than authoritative decisions. Hallucination, misattribution, and adversarial manipulation by web content are all possible.
- Commercial deployment: this release is intended for research and development. Commercial use requires additional testing and validation in the deployment context.
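The "restrict network reach" recommendation above can be sketched as a simple host allow-list check applied before any navigation. This is an illustrative sketch only: `ALLOWED_HOSTS` and `is_allowed` are hypothetical names, the example hosts are placeholders, and this is not how MagenticLite implements its allow-lists.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from harness configuration.
ALLOWED_HOSTS = {"example.com", "docs.example.org"}


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is allow-listed or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"}:
        return False  # blocks file://, javascript:, data:, and similar schemes
    host = (parts.hostname or "").lower()
    # Match on a full label boundary so "example.com.evil.test" does not pass.
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


print(is_allowed("https://shop.example.com/cart"))    # True
print(is_allowed("https://example.com.evil.test/"))   # False
print(is_allowed("file:///etc/passwd"))               # False
```

Matching on the parsed hostname rather than on a substring of the URL avoids the common mistake of accepting look-alike domains that merely contain an allowed name.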
Training Data
The Fara1.5-9B post-training corpus is composed primarily of synthetic web-task trajectories generated by FaraGen2.0, Microsoft Research AI Frontiers' multi-agent data pipeline, alongside curated public datasets. Both image and text modalities are present. The data mix consists of:
- Synthetic trajectory data:
- Grounding data: curated datasets for predicting actions and coordinates on the screen, combining screenshots, textual annotations, and bounding-box labels.
- UI understanding data: visual question answering, captioning, and OCR over web-page screenshots collected by the data generation pipeline.
- Safety and instruction-following data: refusal datasets containing harmful task statements that the model is trained to refuse rather than act on. This subset is text-only.
- Public web and reasoning datasets: open-source datasets used alongside the synthetic corpus to broaden coverage.
Fara1.5-9B is evaluated on web-agent benchmarks because its intended capability surface is multi-step browser-based task completion rather than open-ended generation. The headline benchmarks are Online-Mind2Web (open-web task completion against real, unmodified websites) and WebVoyager (open-web information seeking and transaction tasks). Both are reported as task success rate or accuracy in percent. Scores for Fara1.5-9B and Fara-7B are averaged over three runs to account for run-to-run variance on live web targets. Comparators include the prior Fara-7B release and two contemporaneous open-weight web-agent baselines: MolmoWeb-8B and GUI-Owl-1.5-8B.
Headline takeaway: Fara1.5-9B substantially outperforms prior open-weight web-agent baselines on both benchmarks. The largest gains are on Online-Mind2Web, the harder open-web evaluation, where Fara1.5-9B reaches 63.4%, 14.8 points above the next-best open baseline (GUI-Owl-1.5-8B at 48.6%) and about 1.9 times the prior Fara-7B release (34.1%). On WebVoyager, Fara1.5-9B reaches 86.6%, compared with 78.1% to 78.2% for the two open baselines and 73.5% for Fara-7B, indicating consistent gains on both transactional and information-seeking tasks. Reported scores reflect Fara1.5-9B in isolation under standardized evaluation harnesses; in production use within MagenticLite, end-user-facing behavior is the combined output of the model and the harness (allow-listed navigation, watch-mode monitoring, pause controls, and a sandboxed browser environment), and end-to-end product scenarios are tracked separately.
| Model | Online-Mind2Web (%) | WebVoyager (%) |
|---|---|---|
| Fara1.5-9B | 63.4 | 86.6 |
| GUI-Owl-1.5-8B | 48.6 | 78.1 |
| MolmoWeb-8B | 35.3 | 78.2 |
| Fara-7B | 34.1 | 73.5 |
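The margins implied by the table above can be recomputed with a few lines of arithmetic. The scores below are copied from the table; the `summarize` helper and the choice to report a point margin and a ratio are illustrative, not part of any official evaluation tooling.

```python
# Scores in percent, copied from the table above.
SCORES = {
    "Fara1.5-9B": {"Online-Mind2Web": 63.4, "WebVoyager": 86.6},
    "GUI-Owl-1.5-8B": {"Online-Mind2Web": 48.6, "WebVoyager": 78.1},
    "MolmoWeb-8B": {"Online-Mind2Web": 35.3, "WebVoyager": 78.2},
    "Fara-7B": {"Online-Mind2Web": 34.1, "WebVoyager": 73.5},
}


def summarize(benchmark: str, model: str = "Fara1.5-9B", prior: str = "Fara-7B"):
    """Return (best other model, point margin over it, ratio to the prior release)."""
    ours = SCORES[model][benchmark]
    others = {m: s[benchmark] for m, s in SCORES.items() if m != model}
    best = max(others, key=others.get)
    margin = round(ours - others[best], 1)
    ratio = round(ours / SCORES[prior][benchmark], 2)
    return best, margin, ratio


print(summarize("Online-Mind2Web"))  # ('GUI-Owl-1.5-8B', 14.8, 1.86)
print(summarize("WebVoyager"))       # ('MolmoWeb-8B', 8.4, 1.18)
```

The relative improvement over Fara-7B is much larger on Online-Mind2Web (about 1.86 times) than on WebVoyager (about 1.18 times), which matches the statement that the largest gains are on the harder benchmark.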
Model Specifications
License: MIT
Last Updated: May 2026
Input Type: Image, Text
Output Type: Text
Provider: Microsoft
Languages: 1 language (English)