LLMling-Agent supports a wide range of model types thanks to Pydantic-AI. In the simplest form, a model is specified by its "identifier", a string of the form PROVIDER_NAME:MODEL_NAME (example: "openai:gpt-5-nano").
For more advanced scenarios, you can assign a more detailed model config that includes model settings such as temperature.
In addition, some more experimental (meta-)models are supported via LLMling-models.
These include models that let the user step into the role of an agent, as well as fallback models and a lot more (a sketch of a fallback configuration follows the example below).
```yaml
agents:
  my_agent:
    model: openai:gpt-5-nano  # simple model identifier
  my_agent2:
    model:  # extended model config
      provider: openai
      model: gpt-5-nano
      temperature: 0.5
```
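As a rough illustration of the meta-models mentioned above, a fallback chain could be configured along the lines of the sketch below. Note that the `type: fallback` name and its `models` field are assumptions patterned on the type-tagged configs in the rest of this section; consult the LLMling-models reference for the authoritative schema.

```yaml
# Hypothetical sketch only: the field names here are assumed, not taken
# from the LLMling-models schema. The intent is a chain that tries each
# model in order, moving on to the next one when a request fails.
- type: fallback
  models:
    - openai:gpt-5-nano
    - anthropic:claude-sonnet-4-5
```

The remaining sections document the available (meta-)model configurations. The augmented model wraps a main model with optional prompt pre- and post-processing: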
```yaml
- type: augmented
  main_model: openai:gpt-5-nano  # The primary model identifier.
  pre_prompt: null  # Optional configuration for prompt preprocessing.
  post_prompt: null  # Optional configuration for prompt postprocessing.
```
This model uses the Claude Agent SDK to communicate with the Claude Code CLI, providing access to Claude along with filesystem access, code execution, and other agentic capabilities.
```yaml
- type: claude_code
  model: sonnet  # The Claude model to use. Supports aliases (sonnet, opus, haiku) or full names.
  cwd: null  # Working directory for Claude Code operations.
  permission_mode: bypassPermissions  # Permission mode for tool execution.
  system_prompt: null  # Custom system prompt to use.
  max_turns: null  # Maximum number of conversation turns (1-100).
  max_thinking_tokens: null  # Maximum tokens for extended thinking.
```
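For example, assuming these type-tagged configs can be embedded under an agent's `model:` key just like the extended config in the first example above, a Claude Code-backed agent might be declared as in this sketch (the agent name `coder` is purely illustrative):

```yaml
agents:
  coder:  # illustrative agent name, not part of the schema
    model:
      type: claude_code
      model: sonnet
      permission_mode: bypassPermissions
```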
```yaml
- type: delegation
  selector_model: openai:gpt-5-nano  # Model responsible for selecting which model to use.
  models:  # List of available models to choose from.
    - openai:gpt-5-nano
    - anthropic:claude-sonnet-4-5
  selection_prompt: Choose the best model for this task  # Prompt used to guide the selector model's decision.
  model_descriptions: null  # Optional descriptions of each model for selection purposes.
```
```yaml
- type: input
  prompt_template: '👤 Please respond to: {prompt}'  # Template for displaying the prompt to the user.
  show_system: true  # Whether to show system messages.
  input_prompt: 'Your response:'  # Text displayed when requesting input.
  handler: !!python/name:llmling_models.models.input_model.input_handlers.DefaultInputHandler ''  # Handler for processing user input.
```
```yaml
- type: remote-input
  url: ws://localhost:8000/v1/chat/stream  # WebSocket URL for connecting to the remote input service.
  api_key: null  # Optional API key for authentication.
```
```yaml
- type: remote-proxy
  url: ws://localhost:8000/v1/completion/stream  # WebSocket URL for connecting to the remote model service.
  api_key: null  # Optional API key for authentication.
```
```yaml
- type: string
  identifier: openai:gpt-5-nano  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
```
```yaml
- type: user-select
  models:  # List of models the user can choose from.
    - openai:gpt-5-nano
    - anthropic:claude-sonnet-4-5
  prompt_template: '🤖 Choose a model for: {prompt}'  # Template for displaying the choice prompt to the user.
  show_system: true  # Whether to show system messages during selection.
  input_prompt: 'Enter model number (0-{max}):'  # Text displayed when requesting model selection.
  handler: !!python/name:llmling_models.models.input_model.input_handlers.DefaultInputHandler ''  # Handler for processing user selection input.
```
```yaml
- type: openai
  identifier: openai:gpt-5  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
  reasoning_effort: null  # Constrains effort on reasoning for reasoning models.
  logprobs: null  # Include log probabilities in the response.
  top_logprobs: null  # Include log probabilities of the top n tokens in the response.
  user: null  # A unique identifier representing the end-user.
  service_tier: null  # The service tier to use for the model request.
  prompt_cache_key: a  # Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.
  prompt_cache_retention: in-memory  # The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps...
```
```yaml
- type: anthropic
  identifier: anthropic:claude-haiku-4-5  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
  metadata: null  # An object describing metadata about the request.
  cache_tool_definitions: null  # Whether to add cache_control to the last tool definition.
  cache_instructions: null  # Whether to add cache_control to the last system prompt block.
  cache_messages: null  # Convenience setting to enable caching for the last user message.
```
```yaml
- type: gemini
  identifier: gemini:aqa  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
  safety_settings: null  # Safety settings options for Gemini model request.
  thinking_config: null  # Thinking features configuration.
  labels: null  # User-defined metadata to break down billed charges.
```