LLMling-Agent supports a wide range of model types thanks to Pydantic-AI. In the simplest form, a model is specified by its "identifier", a string of the form PROVIDER_NAME:MODEL_NAME (example: "openai:gpt-5-nano").
For more advanced scenarios, you can assign a more detailed model config that includes model settings such as temperature.
In addition, some more experimental (meta-)models are supported via LLMling-models.
These include models that let the user step into the role of an agent, as well as fallback models and a lot more (a sketch of a fallback configuration follows the example below).
```yaml
agents:
  my_agent:
    model: openai:gpt-5-nano  # simple model identifier
  my_agent2:
    model:  # extended model config
      provider: openai
      model: gpt-5-nano
      temperature: 0.5
```
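As a rough illustration of the meta-models mentioned above, a fallback chain could be configured along the lines of the sketch below. Note that the `type: fallback` name and its `models` field are assumptions patterned on the type-tagged configs in the rest of this section; consult the LLMling-models reference for the authoritative schema.

```yaml
# Hypothetical sketch only: the field names here are assumed, not taken
# from the LLMling-models schema. The intent is a chain that tries each
# model in order, moving on to the next one when a request fails.
- type: fallback
  models:
    - openai:gpt-5-nano
    - anthropic:claude-sonnet-4-5
```

The remaining sections document the available (meta-)model configurations. The augmented model wraps a main model with optional prompt pre- and post-processing: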
```yaml
- type: augmented
  main_model: openai:gpt-5-nano  # The primary model identifier.
  pre_prompt: null  # Optional configuration for prompt preprocessing.
  post_prompt: null  # Optional configuration for prompt postprocessing.
```
This model uses the Claude Agent SDK to communicate with the Claude Code CLI, providing access to Claude along with filesystem access, code execution, and other agentic capabilities.
```yaml
- type: claude_code
  model: sonnet  # The Claude model to use. Supports aliases (sonnet, opus, haiku) or full names.
  cwd: null  # Working directory for Claude Code operations.
  permission_mode: bypassPermissions  # Permission mode for tool execution.
  system_prompt: null  # Custom system prompt to use.
  max_turns: null  # Maximum number of conversation turns (1-100).
  max_thinking_tokens: null  # Maximum tokens for extended thinking.
```
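For example, assuming these type-tagged configs can be embedded under an agent's `model:` key just like the extended config in the first example above, a Claude Code-backed agent might be declared as in this sketch (the agent name `coder` is purely illustrative):

```yaml
agents:
  coder:  # illustrative agent name, not part of the schema
    model:
      type: claude_code
      model: sonnet
      permission_mode: bypassPermissions
```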
```yaml
- type: delegation
  selector_model: openai:gpt-5-nano  # Model responsible for selecting which model to use.
  models:  # List of available models to choose from.
    - openai:gpt-5-nano
    - anthropic:claude-sonnet-4-5
  selection_prompt: Choose the best model for this task  # Prompt used to guide the selector model's decision.
  model_descriptions: null  # Optional descriptions of each model for selection purposes.
```
```yaml
- type: input
  prompt_template: '👤 Please respond to: {prompt}'  # Template for displaying the prompt to the user.
  show_system: true  # Whether to show system messages.
  input_prompt: 'Your response:'  # Text displayed when requesting input.
  handler: !!python/name:llmling_models.models.input_model.input_handlers.DefaultInputHandler ''  # Handler for processing user input.
```
```yaml
- type: remote-input
  url: ws://localhost:8000/v1/chat/stream  # WebSocket URL for connecting to the remote input service.
  api_key: null  # Optional API key for authentication.
```
```yaml
- type: remote-proxy
  url: ws://localhost:8000/v1/completion/stream  # WebSocket URL for connecting to the remote model service.
  api_key: null  # Optional API key for authentication.
```
```yaml
- type: string
  identifier: openai:gpt-5-nano  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
```
```yaml
- type: user-select
  models:  # List of models the user can choose from.
    - openai:gpt-5-nano
    - anthropic:claude-sonnet-4-5
  prompt_template: '🤖 Choose a model for: {prompt}'  # Template for displaying the choice prompt to the user.
  show_system: true  # Whether to show system messages during selection.
  input_prompt: 'Enter model number (0-{max}):'  # Text displayed when requesting model selection.
  handler: !!python/name:llmling_models.models.input_model.input_handlers.DefaultInputHandler ''  # Handler for processing user selection input.
```
```yaml
- type: openai
  identifier: openai:gpt-5  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
  reasoning_effort: null  # Constrains effort on reasoning for reasoning models.
  logprobs: null  # Include log probabilities in the response.
  top_logprobs: null  # Include log probabilities of the top n tokens in the response.
  user: null  # A unique identifier representing the end-user.
  service_tier: null  # The service tier to use for the model request.
  prompt_cache_key: a  # Used by OpenAI to cache responses for similar requests to optimize your cache hit rates.
  prompt_cache_retention: in-memory  # The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps...
```
```yaml
- type: anthropic
  identifier: anthropic:claude-haiku-4-5  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
  metadata: null  # An object describing metadata about the request.
  cache_tool_definitions: null  # Whether to add cache_control to the last tool definition.
  cache_instructions: null  # Whether to add cache_control to the last system prompt block.
  cache_messages: null  # Convenience setting to enable caching for the last user message.
```
```yaml
- type: gemini
  identifier: gemini:aqa  # String identifier for the model.
  max_tokens: null  # The maximum number of tokens to generate before stopping.
  temperature: null  # Amount of randomness injected into the response.
  top_p: null  # An alternative to sampling with temperature, called nucleus sampling.
  timeout: null  # Override the client-level default timeout for a request, in seconds.
  parallel_tool_calls: null  # Whether to allow parallel tool calls.
  seed: null  # The random seed to use for the model, theoretically allowing for deterministic results.
  presence_penalty: null  # Penalize new tokens based on whether they have appeared in the text so far.
  frequency_penalty: null  # Penalize new tokens based on their existing frequency in the text so far.
  logit_bias: null  # Modify the likelihood of specified tokens appearing in the completion.
  stop_sequences: null  # Sequences that will cause the model to stop generating.
  extra_headers: null  # Extra headers to send to the model.
  extra_body: null  # Extra body to send to the model.
  safety_settings: null  # Safety settings options for Gemini model request.
  thinking_config: null  # Thinking features configuration.
  labels: null  # User-defined metadata to break down billed charges.
```