
Vision

VIKTOR includes a built-in vision model that lets you analyze images directly in your app. Upload an image, send it to the model, and get a text description back — no external API key or third-party account needed. For basic text chat, see the Viktor LLM overview.

Chat Completions API only

The built-in vision model is only supported via the Chat Completions API. The Responses API is not supported for this model.

Example

The following example shows a complete app that accepts an image upload and sends it to the vision model for analysis. Note that vkt.ViktorOpenAI handles authentication automatically — no API key setup is required on your end:

import base64
from html import escape

import viktor as vkt
from openai import OpenAI

# vkt.ViktorOpenAI supplies the endpoint URL and the credentials for the built-in LLM service.
client = OpenAI(
    base_url=vkt.ViktorOpenAI.get_base_url(version="v1"),
    api_key=vkt.ViktorOpenAI.get_api_key(),
)


class Parametrization(vkt.Parametrization):
    title = vkt.Text("# AI Image Analyzer")
    image_file = vkt.FileField(
        "Upload Image",
        file_types=[".jpg", ".jpeg"],
        description="Upload a JPG image to be analyzed by the AI model.",
    )


class Controller(vkt.Controller):
    parametrization = Parametrization

    @vkt.WebView("Image Analysis")
    def analyze_image(self, params, **kwargs):
        if not params.image_file:
            return vkt.WebResult(html="<p>Please upload an image to get started.</p>")

        # Read the uploaded file and wrap it in a base64 data URI.
        image_bytes = params.image_file.file.getvalue_binary()
        base64_image = base64.b64encode(image_bytes).decode("utf-8")
        image_url = f"data:image/jpeg;base64,{base64_image}"

        # A vision message carries a list of content items: the prompt and the image.
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe what is shown in this image."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ]

        response = client.chat.completions.create(
            model="mistral.ministral-3-14b-instruct",
            messages=messages,
        )

        # Escape the model output so characters such as < and & render as plain text.
        result_text = escape(response.choices[0].message.content or "")
        html = f"""
        <html><body style="font-family:sans-serif;max-width:900px;margin:auto;padding:20px">
        <img src="{image_url}" style="max-width:100%;border-radius:8px">
        <p style="margin-top:16px;line-height:1.6">{result_text}</p>
        </body></html>
        """
        return vkt.WebResult(html=html)

This is how your application should look:

Sample app showing image analysis with vision LLM

Key patterns

  • Base64 encoding — read the image as bytes, encode with base64.b64encode, and wrap in a data:image/jpeg;base64,... data URI.
  • Vision message format — the content field is a list containing a text item (your prompt) and an image_url item (the base64 data URI).
  • Model — use mistral.ministral-3-14b-instruct as the model ID.
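
The first two patterns can be combined in one small helper. The following is a minimal sketch, not part of the VIKTOR SDK: build_vision_messages is a hypothetical name, and the default prompt and MIME type are assumptions taken from the example above.

```python
import base64


def build_vision_messages(
    image_bytes: bytes,
    prompt: str = "Describe what is shown in this image.",
    mime_type: str = "image/jpeg",
) -> list[dict]:
    """Build a Chat Completions `messages` list with one text item and one image item.

    Hypothetical helper for illustration; it is not part of the VIKTOR SDK.
    """
    # Encode the raw bytes and wrap them in a data URI.
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    image_url = f"data:{mime_type};base64,{base64_image}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


# Example call with placeholder bytes (not a real image).
messages = build_vision_messages(b"fake-image-bytes")
```

In the controller above, the returned list could be passed as the messages argument of client.chat.completions.create.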

Image extension and MIME type

VIKTOR's LLM endpoint supports several image formats. For example, to send a PNG instead of a JPEG in the previous example, keep the base64 encoding unchanged and change the MIME type in the data URI assigned to image_url from image/jpeg to image/png.

Use the matching MIME type for other supported formats:

  • JPG / JPEG → image/jpeg
  • PNG → image/png
  • WEBP → image/webp

# PNG upload
image_url = f"data:image/png;base64,{base64_image}"
# WEBP upload
image_url = f"data:image/webp;base64,{base64_image}"

Make sure the file_types list of the FileField also includes the matching extensions, such as ".png" and ".webp".
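
Instead of hard-coding the MIME type, it can be derived from the name of the uploaded file. The following is a minimal sketch that covers only the three formats listed above. MIME_TYPES and to_data_uri are hypothetical names, and obtaining the name from params.image_file.filename is an assumption about how the app reads it.

```python
import base64
from pathlib import Path

# Extension-to-MIME mapping for the formats listed above (illustrative, not exhaustive).
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def to_data_uri(filename: str, image_bytes: bytes) -> str:
    """Return a base64 data URI whose MIME type matches the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix or '(none)'}")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"
```

The keys of MIME_TYPES could also be reused for the upload filter, for example file_types=list(MIME_TYPES), so that the accepted extensions and the MIME lookup stay in sync.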

Limitations

  • VIKTOR's LLM endpoint only supports base64-encoded data URIs for images. HTTP URLs, such as https://example.com/image.jpg, are not supported. Images must be encoded inline.
  • Although the chat.completions endpoint supports the optional detail field in image_url, VIKTOR's LLM endpoint does not support detail levels such as "low", "high", or "original".
  • The service enforces fair-usage limits. If your app exceeds the allowed token usage, the API returns a 429 Too Many Requests response, which the openai library raises as openai.RateLimitError. Catch it and raise a UserError to show the user a clear message.
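
The rate-limit handling described in the last item can be sketched as a small wrapper. To keep the sketch runnable without the openai and viktor packages installed, the exception classes are passed in as arguments. with_rate_limit_message is a hypothetical name and the message text is an assumption. In an app, the two classes would be openai.RateLimitError and vkt.UserError.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def with_rate_limit_message(
    call: Callable[[], T],
    rate_limit_error: Type[Exception],
    user_error: Type[Exception],
) -> T:
    """Run `call` and convert a rate-limit exception into a user-facing error."""
    try:
        return call()
    except rate_limit_error as exc:
        # Chain the original exception so the cause stays visible in logs.
        raise user_error(
            "The AI usage limit has been reached. Please try again later."
        ) from exc


# Intended use inside the controller (assumes `import openai` and `import viktor as vkt`):
# response = with_rate_limit_message(
#     lambda: client.chat.completions.create(
#         model="mistral.ministral-3-14b-instruct", messages=messages
#     ),
#     openai.RateLimitError,
#     vkt.UserError,
# )
```

Any other exception raised by the call passes through unchanged, so only the rate-limit case is turned into a user message.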