Phi-4-Reasoning-Vision-15B
Phi-4-Reasoning-Vision-15B is a broadly capable model that can be used for a wide array of vision-language tasks such as image captioning, asking questions about images, reading documents and receipts, helping with homework, interfering about changes in sequences of images, and much more. Beyond these general capabilities it excels at math and science reasoning and at understanding and grounding elements on computer and mobile screens.
Quick facts
Model providerMicrosoft
TypeChat completion, Visual question answering, Image analysis, Image classification, Image to text
LifecycleGenerally available (GA)
Input typeimage, text
Output typetext
PricingView pricing