microsoft-magma-8b
Version: 7
microsoft/Magma-8B
powered by Hugging Face Inference Toolkit
Send Request
You can use cURL or any REST client to send a request to the AzureML endpoint with your AzureML token.
curl <AZUREML_ENDPOINT_URL>/v1/chat/completions \
-X POST \
-d '{"model":"microsoft/Magma-8B","messages":[{"role":"system","content":"You are agent that can see, talk and act."},{"role":"user","content":[{"type":"text","text":"What is in this image?"},{"type":"image_url","image_url":{"url":"https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png"}}]}],"temperature":1.0,"stream":true}' \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json"
curl <AZUREML_ENDPOINT_URL>/v1/chat/completions \
-X POST \
-d '{"model":"microsoft/Magma-8B","messages":[{"role":"user","content":[{"type":"text","text":"What is the next action token for the robot in the image to move its end-effector straight up in vertical position?\nProvide the action tokens for the robot."},{"type":"image_url","image_url":{"url":"https://rhinotoolhouse.com/wp-content/uploads/60107346_m_normal_none.jpg"}}]}],"temperature":0.1,"stream":false}' \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json"
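The same request can be issued from any REST client. Below is a minimal Python sketch using only the standard library; the helper names `build_chat_payload` and `post_json` are illustrative and not part of any SDK, and `post_json` assumes a non-streaming reply (`"stream": false`) that arrives as a single JSON document.

```python
import json
import urllib.request


def build_chat_payload(prompt, image_url, system_prompt=None, temperature=0.1, stream=False):
    """Build a /v1/chat/completions body with one text part and one image part."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    })
    return {
        "model": "microsoft/Magma-8B",
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
    }


def post_json(endpoint_url, path, token, payload):
    """POST a JSON body with a bearer token and return the decoded JSON reply.

    Hypothetical helper: assumes the reply is one JSON document (no streaming).
    """
    request = urllib.request.Request(
        endpoint_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "What is in this image?",
    "https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png",
    system_prompt="You are agent that can see, talk and act.",
    temperature=1.0,
)
# With a real endpoint and token, the call would look like:
# reply = post_json("<AZUREML_ENDPOINT_URL>", "/v1/chat/completions", "<AZUREML_TOKEN>", payload)
```

Keeping payload construction separate from the HTTP call makes it easy to inspect the request body before sending it.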
The /v1/action-tokens endpoint converts the generated output tokens into extracted actions for the given robot (either bridge_orig or google_robot, or using custom normalization stats supplied via normalization_stats), as follows:
curl <AZUREML_ENDPOINT_URL>/v1/action-tokens \
-X POST \
-d '{"text":"立刻ẳn сирiв衝ційнаiв","robot_type":"bridge_orig"}' \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json"
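When custom statistics are supplied through normalization_stats rather than robot_type, the lists mask, q01 and q99 each need exactly 7 items. The following Python sketch builds and checks such a request body before it is sent. The helper names are illustrative, the numeric values are placeholders rather than real robot statistics, and sending only one of robot_type or normalization_stats is an assumption of this sketch, not documented endpoint behavior.

```python
ACTION_DIMS = 7  # items per list, as required for mask, q01 and q99
KNOWN_ROBOT_TYPES = ("bridge_orig", "google_robot")


def validate_normalization_stats(stats):
    """Check that mask, q01 and q99 are present, typed, and 7 items long."""
    for key in ("mask", "q01", "q99"):
        if key not in stats:
            raise ValueError(f"normalization_stats is missing '{key}'")
        if len(stats[key]) != ACTION_DIMS:
            raise ValueError(f"'{key}' must have exactly {ACTION_DIMS} items")
    if not all(isinstance(flag, bool) for flag in stats["mask"]):
        raise ValueError("'mask' must be a list of bool")
    for key in ("q01", "q99"):
        for value in stats[key]:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"'{key}' must be a list of float")
    return stats


def build_action_tokens_payload(text, robot_type=None, normalization_stats=None):
    """Build a /v1/action-tokens body from either custom stats or a robot type."""
    if normalization_stats is not None:
        # Assumption: custom stats are sent in place of robot_type.
        return {
            "text": text,
            "normalization_stats": validate_normalization_stats(normalization_stats),
        }
    if robot_type not in KNOWN_ROBOT_TYPES:
        raise ValueError(f"robot_type must be one of {KNOWN_ROBOT_TYPES}")
    return {"text": text, "robot_type": robot_type}


# Placeholder statistics for illustration only.
example_stats = {
    "mask": [True, True, True, True, True, True, False],
    "q01": [-1.0] * ACTION_DIMS,
    "q99": [1.0] * ACTION_DIMS,
}

default_payload = build_action_tokens_payload("立刻ẳn сирiв衝ційнаiв", robot_type="bridge_orig")
custom_payload = build_action_tokens_payload("立刻ẳn сирiв衝ційнаiв", normalization_stats=example_stats)
```

Validating locally catches a wrong list length before the request reaches the endpoint.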
I/O Example
Visual Language Models (VLMs)
The following input payload:
{
"model": "microsoft/Magma-8B",
"messages": [
{"role": "system", "content": "You are agent that can see, talk and act."},
{"role": "user", "content": [
{"type": "text", "text":"What is in this image?"},
{"type": "image_url", "image_url": {"url": "https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png"}}
]}
],
"temperature": 1.0,
"stream": false
}
returns the following output:
{
"id": "chatcmpl-4f37b5c82d",
"object": "chat.completion",
"created": 1747162017,
"model": "microsoft/Magma-8B",
"choices": [
{
"index": 0,
"message": {
"content": "The image presents a bar graph titled \"Percentage of respondents who agree\". The graph is divided into four sections, each representing a different level of agreement with a statement. The statements are \"Having clear and pre-defined goals for meetings\", \"Knowing where to find the information I need for a meeting\", \"Understanding my exact role and responsibilities\", and \"Having more focus time to prepare for meetings\".\nThe bars for the first two statements are green, indicating a high level of agreement. The bars for the third and fourth statements are yellow, suggesting a lower level of agreement. The bars for the first and second statements are slightly taller than those for the third and fourth statements, indicating that more respondents agreed with the first two statements than the third and fourth.\nThe graph does not provide any additional text or context beyond the title and the statements. The relative positions of the bars suggest a comparison between the levels of agreement for the different statements. However, without additional information, it's difficult to determine the exact significance of these results.",
"refusal": null,
"role": "assistant",
"annotations": []
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 37,
"completion_tokens": 206,
"reasoning_tokens": 0,
"total_tokens": 243,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
"service_tier": "default",
"system_fingerprint": "43d325a5-34c9-4bd4-a2e9-e2af6330d1e7"
}
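A client usually needs only a few fields from a response like the one above. The Python sketch below pulls out the generated text, the finish reason and the token counts; `summarize_completion` is an illustrative helper, and the sample dictionary is a trimmed copy of the response with the long content string shortened.

```python
def summarize_completion(response):
    """Extract the assistant text, finish reason and token usage from a reply."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


# Trimmed copy of the sample response; the content string is shortened here.
sample_response = {
    "id": "chatcmpl-4f37b5c82d",
    "object": "chat.completion",
    "model": "microsoft/Magma-8B",
    "choices": [
        {
            "index": 0,
            "message": {"content": "The image presents a bar graph", "role": "assistant"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 37, "completion_tokens": 206, "total_tokens": 243},
}

summary = summarize_completion(sample_response)
# In the sample, prompt and completion tokens add up to the reported total.
tokens_add_up = (
    summary["prompt_tokens"] + summary["completion_tokens"] == summary["total_tokens"]
)
```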
Visual-Language-Action (VLA)
First, send a request as follows:
curl <AZUREML_ENDPOINT_URL>/v1/chat/completions \
-X POST \
-d '{"model":"microsoft/Magma-8B","messages":[{"role":"user","content":[{"type":"text","text":"What is the next action token for the robot in the image to move its end-effector straight up in vertical position?\nProvide the action tokens for the robot."},{"type":"image_url","image_url":{"url":"https://rhinotoolhouse.com/wp-content/uploads/60107346_m_normal_none.jpg"}}]}],"temperature":0.1,"stream":false}' \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json"
{
"id": "chatcmpl-e3317d6d9b",
"object": "chat.completion",
"created": 1750069687,
"model": "microsoft/Magma-8B",
"choices": [
{
"index": 0,
"message": {
"content": "立刻ẳn сирiв衝ційнаiв",
"refusal": null,
"role": "assistant",
"annotations": []
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 47,
"completion_tokens": 8,
"reasoning_tokens": 0,
"total_tokens": 55,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
"service_tier": "default",
"system_fingerprint": "38b490d1-fadc-4db9-b423-e3e998452fbf"
}
Then, send the generated content to the /v1/action-tokens endpoint to extract the action:
curl <AZUREML_ENDPOINT_URL>/v1/action-tokens \
-X POST \
-d '{"text":"立刻ẳn сирiв衝ційнаiв","robot_type":"bridge_orig"}' \
-H "Authorization: Bearer <AZUREML_TOKEN>" \
-H "Content-Type: application/json"
{
"x": 0.006277765673749571,
"y": -0.0017191617944661772,
"z": -0.004658273500116439,
"roll": 0.0004997134208677839,
"pitch": 0.00591804922503579,
"yaw": -0.004902105561658515,
"gripper": 0
}
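The two requests above can be chained in code: the content of the chat completion becomes the text field of the action-tokens request. The Python sketch below shows that flow with a caller-supplied `post(path, payload)` function, which is a hypothetical transport hook and not part of any SDK. To keep the example runnable offline, `fake_post` stands in for the real endpoint and returns the sample outputs shown above.

```python
def run_vla_step(post, prompt, image_url, robot_type="bridge_orig"):
    """Ask the model for action tokens, then convert them into a robot action.

    `post(path, payload)` must perform the HTTP POST and return the parsed JSON.
    """
    chat_reply = post("/v1/chat/completions", {
        "model": "microsoft/Magma-8B",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "temperature": 0.1,
        "stream": False,
    })
    # The generated action tokens are returned as the assistant message content.
    action_text = chat_reply["choices"][0]["message"]["content"]
    return post("/v1/action-tokens", {"text": action_text, "robot_type": robot_type})


def fake_post(path, payload):
    """Offline stand-in for the endpoint, replaying the sample outputs."""
    if path == "/v1/chat/completions":
        return {"choices": [{"message": {"content": "立刻ẳn сирiв衝ційнаiв"}}]}
    if path == "/v1/action-tokens" and payload["text"] == "立刻ẳn сирiв衝ційнаiв":
        return {
            "x": 0.006277765673749571,
            "y": -0.0017191617944661772,
            "z": -0.004658273500116439,
            "roll": 0.0004997134208677839,
            "pitch": 0.00591804922503579,
            "yaw": -0.004902105561658515,
            "gripper": 0,
        }
    raise ValueError(f"unexpected request to {path}")


action = run_vla_step(
    fake_post,
    "What is the next action token for the robot in the image to move its "
    "end-effector straight up in vertical position?\n"
    "Provide the action tokens for the robot.",
    "https://rhinotoolhouse.com/wp-content/uploads/60107346_m_normal_none.jpg",
)
```

In real use, `fake_post` would be replaced by a function that sends the payload to `<AZUREML_ENDPOINT_URL>` with the `Authorization: Bearer <AZUREML_TOKEN>` header.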
Alternatively, you can provide custom normalization stats via the normalization_stats parameter instead of relying on the default normalization stats based on the robot_type parameter. The provided normalization_stats should contain mask (a list of bool) and both q01 and q99 (lists of float), with every list containing exactly 7 items (corresponding to the respective action tokens).
Model Specifications
License: MIT
Last Updated: June 2025
Publisher: HuggingFace