# Integrating with LLMs
Viktor LLM is a built-in large language model service — no external API key or third-party account required. It uses an OpenAI-compatible API, so you can use the standard `openai` Python client directly in your app.
## Why use Viktor LLM

- No setup — no API key, no third-party account, no environment variables to manage
- OpenAI-compatible — use the standard `openai` Python client
- Works in App Builder — the `openai` package is pre-installed
## How chat completions work
LLMs are stateless — the model has no memory of previous messages. Every request must include the complete conversation history from the beginning of the session. The model reads the full history, generates the next reply, and discards all state. The next message will need to include that reply in the history too.
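As a minimal sketch of this accumulation, using a hypothetical two-turn exchange (the questions and answers are illustrative, not real model output):

```python
# Turn 1: the request carries only the first user message.
history = [
    {"role": "user", "content": "What is the yield strength of S235 steel?"},
]

# The model replies and then forgets everything, so the app must
# append the reply to its own copy of the history.
history.append({"role": "assistant", "content": "235 MPa (nominal)."})

# Turn 2: the new request repeats the full history plus the new question.
history.append({"role": "user", "content": "And for S355?"})
# The second request now carries all three messages.
```

Each request is self-contained: delete `history` and the model has no way to recover the earlier turns.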
Messages are passed as an ordered list, each with a `role` and `content`:
| Role | Who sends it | Purpose |
|---|---|---|
| `system` | You (developer) | Sets the model's persona, constraints, and task context. Sent first, before any user messages. |
| `user` | The end user | The user's input in each turn. |
| `assistant` | The model | The model's reply from previous turns. |
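Put together, the message list for a second user turn might look like the following sketch (the contents are made up for illustration):

```python
messages = [
    # Sent first: sets persona and constraints for the whole conversation.
    {"role": "system", "content": "You are a concise engineering assistant."},
    # Previous turn: the user's question and the model's earlier reply.
    {"role": "user", "content": "How many metres are in a mile?"},
    {"role": "assistant", "content": "One mile is 1609.344 metres."},
    # Current turn: the new input the model should respond to.
    {"role": "user", "content": "And how many feet?"},
]
```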
VIKTOR's `ChatConversation` (from `params.chat`) tracks only the user and assistant turns — it does not store system messages. Call `conversation.get_messages()` to retrieve the full conversation history, then prepend your system message before passing it to the API.
## Basic usage
The following example shows a complete chat integration with a system prompt and streaming. The `ViktorOpenAI` helper provides the base URL and API key needed to configure the client:
```python
import openai
import viktor as vkt
from openai import OpenAI  # add openai to requirements.txt if not using App Builder

client = OpenAI(
    base_url=vkt.ViktorOpenAI.get_base_url(version="v1"),
    api_key=vkt.ViktorOpenAI.get_api_key(),
)


class Parametrization(vkt.Parametrization):
    chat = vkt.Chat("Chat", method="call_llm")


class Controller(vkt.Controller):
    parametrization = Parametrization

    def call_llm(self, params, **kwargs):
        conversation = params.chat
        if not conversation:
            return None

        # Prepend the system message: ChatConversation stores only
        # the user and assistant turns.
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            *conversation.get_messages(),
        ]

        try:
            stream = client.chat.completions.create(
                model="openai.gpt-oss-120b",
                messages=messages,
                stream=True,
            )
        except openai.RateLimitError:
            raise vkt.UserError("Fair usage limit reached. Please try again later.")

        # Yield only the chunks that carry text; the final chunk's delta is empty.
        text_stream = (
            chunk.choices[0].delta.content
            for chunk in stream
            if chunk.choices[0].delta.content is not None
        )
        return vkt.ChatResult(conversation, text_stream)
```
The VIKTOR SDK supports only the `user` and `assistant` roles in `ChatConversation`. Prepend system messages manually, as shown above.
The service enforces fair-usage limits. If your app exceeds the allowed token usage, the API returns a `429 Too Many Requests` response, which the `openai` library raises as `openai.RateLimitError`. Catch it and raise a `UserError` to surface a clear message, as shown above.
## Next step
To give the LLM access to your app's functions — running calculations, fetching data, querying models — see Tool Use.