Provider models

While LLMling-agent supports both LiteLLM and pydantic-ai, choosing pydantic-ai as the provider gives you considerably more features and flexibility.

In addition to the regular pydantic-ai models, LLMling-agent supports all model types from llmling-models through YAML configuration. Each model is identified by its type field. Many of these models are "meta-models" that enable model-selection patterns as well as human-in-the-loop interactions.

Basic Configuration

agents:
  my_agent:
    model:
      type: string           # Basic string model identifier
      identifier: gpt-4      # Model name

Available Model Types

Fallback Model

Tries models in sequence until one succeeds. Perfect for handling rate limits.

agents:
  resilient_agent:
    model:
      type: fallback
      models:
        - openai:gpt-4          # First choice
        - openai:gpt-3.5-turbo  # Fallback option
        - anthropic:claude-2    # Last resort
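
Conceptually, fallback is a plain try-in-order loop. A minimal Python sketch of the semantics (not the library's actual implementation; call_model is a hypothetical stand-in for a real provider call):

def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for an actual provider call."""
    raise RuntimeError(f"{model} is rate-limited")

def fallback_complete(models: list[str], prompt: str) -> str:
    errors: list[Exception] = []
    for model in models:  # first choice first, then each fallback in order
        try:
            return call_model(model, prompt)
        except Exception as exc:  # rate limit, timeout, outage, ...
            errors.append(exc)
    raise RuntimeError(f"All models failed: {errors}")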

Cost-Optimized Model

Selects models based on budget constraints:

agents:
  budget_agent:
    model:
      type: cost-optimized
      models:
        - openai:gpt-4
        - openai:gpt-3.5-turbo
      max_input_cost: 0.1     # Maximum USD per request
      strategy: cheapest_possible  # Or best_within_budget
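
How the two strategies differ can be sketched as follows; the cost table and the use of price as a rough capability proxy are illustrative assumptions, not the library's exact logic:

# Illustrative sketch; per-request costs are made-up numbers.
COST = {"openai:gpt-4": 0.09, "openai:gpt-3.5-turbo": 0.002}

def pick_model(models: list[str], max_input_cost: float, strategy: str) -> str:
    affordable = [m for m in models if COST[m] <= max_input_cost]
    if not affordable:
        raise ValueError("No model fits the budget")
    if strategy == "cheapest_possible":  # minimize spend outright
        return min(affordable, key=COST.get)
    # best_within_budget: most capable model still affordable
    # (price used here as a crude stand-in for capability)
    return max(affordable, key=COST.get)

print(pick_model(list(COST), 0.1, "best_within_budget"))  # openai:gpt-4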

Token-Optimized Model

Selects models based on context length:

agents:
  context_aware_agent:
    model:
      type: token-optimized
      models:
        - openai:gpt-4-32k     # 32k context
        - openai:gpt-4         # 8k context
        - openai:gpt-3.5-turbo # 4k context
      strategy: efficient  # Or maximum_context
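
The two strategies can be read as "smallest window that fits" versus "largest window available". A sketch under that assumption, with window sizes matching the comments above:

# Illustrative sketch of context-length selection; not the library's code.
CONTEXT_WINDOW = {
    "openai:gpt-4-32k": 32_000,
    "openai:gpt-4": 8_000,
    "openai:gpt-3.5-turbo": 4_000,
}

def pick_model(models: list[str], needed_tokens: int, strategy: str) -> str:
    fitting = [m for m in models if CONTEXT_WINDOW[m] >= needed_tokens]
    if not fitting:
        raise ValueError("Prompt exceeds every model's context window")
    if strategy == "efficient":  # smallest context that still fits
        return min(fitting, key=CONTEXT_WINDOW.get)
    return max(fitting, key=CONTEXT_WINDOW.get)  # maximum_context

print(pick_model(list(CONTEXT_WINDOW), 6_000, "efficient"))  # openai:gpt-4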

Delegation Model

Uses a selector model to choose the appropriate model for each task:

agents:
  smart_delegator:
    model:
      type: delegation
      selector_model: openai:gpt-4-turbo
      models:
        - openai:gpt-4
        - openai:gpt-3.5-turbo
      selection_prompt: |
        Pick gpt-4 for complex tasks,
        gpt-3.5-turbo for simple queries.
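
The flow is: send the task plus the selection prompt to the selector model, then route the task to whichever model it names. A conceptual sketch (ask_selector is a hypothetical stand-in for the selector-model call):

def ask_selector(selection_prompt: str, task: str, options: list[str]) -> str:
    """Hypothetical stand-in: a real implementation queries selector_model."""
    return options[0] if len(task) > 100 else options[1]

def delegate(task: str) -> str:
    options = ["openai:gpt-4", "openai:gpt-3.5-turbo"]
    prompt = "Pick gpt-4 for complex tasks, gpt-3.5-turbo for simple queries."
    chosen = ask_selector(prompt, task, options)
    return chosen  # the chosen model then answers the task itself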

Input Model

For human-in-the-loop operations:

agents:
  human_assisted:
    model:
      type: input
      prompt_template: "🤖 Question: {prompt}"
      show_system: true
      input_prompt: "Your answer: "
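
In effect, the human plays the role of the model: the prompt is rendered with prompt_template and the typed reply is returned as the completion. A minimal sketch of that behavior:

# Conceptual sketch of the input model's behavior.
def human_complete(prompt: str, system: str | None = None) -> str:
    prompt_template = "🤖 Question: {prompt}"
    if system:  # show_system: true
        print(f"[system] {system}")
    print(prompt_template.format(prompt=prompt))
    return input("Your answer: ")  # input_prompt; the reply is the "completion"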

Remote Input Model

Connects to a remote human operator:

agents:
  remote_operator:
    model:
      type: remote-input
      url: ws://operator:8000/v1/chat/stream
      protocol: websocket  # or rest
      api_key: your-api-key
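
Over the websocket protocol, this amounts to sending the prompt to the operator's endpoint and waiting for a typed reply. A hypothetical client-side sketch using the third-party websockets package; the message schema and auth handling here are assumptions, not the documented wire format:

import asyncio
import json

import websockets  # third-party: pip install websockets

async def ask_operator(prompt: str) -> str:
    # Assumed wire format: one JSON message out, one text reply back.
    async with websockets.connect("ws://operator:8000/v1/chat/stream") as ws:
        await ws.send(json.dumps({"prompt": prompt, "api_key": "your-api-key"}))
        return await ws.recv()

# asyncio.run(ask_operator("Approve this refund?"))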

LLM Library Integration

Use models from the LLM library (the llm Python package):

agents:
  llm_based:
    model:
      type: llm
      model_name: gpt-4o-mini

AISuite Integration

Use models from AISuite:

agents:
  aisuite_agent:
    model:
      type: aisuite
      model: anthropic:claude-3-opus-20240229
      config:
        anthropic:
          api_key: your-api-key

Augmented Model

Enhance prompts with pre/post processing:

agents:
  enhanced_agent:
    model:
      type: augmented
      main_model: openai:gpt-4
      pre_prompt:
        text: "Expand this question: {input}"
        model: openai:gpt-3.5-turbo
      post_prompt:
        text: "Summarize response: {output}"
        model: openai:gpt-3.5-turbo
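
The pipeline runs three calls: the pre_prompt model rewrites the input, the main model answers, and the post_prompt model rewrites the answer. A conceptual sketch (call_model is a hypothetical stand-in for invoking the named models):

def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for invoking the named model."""
    return f"<{model} output>"

def augmented_complete(user_input: str) -> str:
    expanded = call_model("openai:gpt-3.5-turbo",
                          f"Expand this question: {user_input}")  # pre_prompt
    answer = call_model("openai:gpt-4", expanded)                 # main_model
    return call_model("openai:gpt-3.5-turbo",
                      f"Summarize response: {answer}")            # post_prompt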

Common Settings

Most models support these common settings:

agents:
  example_agent:
    model:
      type: any_model_type
      # Common optional settings:
      name: optional_name          # Custom name for the model
      description: "Description"   # Model description

Model Settings

Both providers (PydanticAI and LiteLLM) support common model settings to fine-tune LLM behavior:

agents:
  tuned_agent:
    model_settings:
      max_tokens: 2000          # Maximum tokens to generate
      temperature: 0.7          # Randomness (0.0 - 2.0)
      top_p: 0.9               # Nucleus sampling threshold
      timeout: 30.0            # Request timeout in seconds
      parallel_tool_calls: true # Allow parallel tool execution
      seed: 42                 # Random seed for reproducibility
      presence_penalty: 0.5     # (-2.0 to 2.0) Penalize token reuse
      frequency_penalty: 0.3    # (-2.0 to 2.0) Penalize token frequency
      logit_bias:              # Modify token likelihood
        "1234": 100  # Increase likelihood
        "5678": -100 # Decrease likelihood

Example with Provider and Model Settings

agents:
  advanced_agent:
    provider:
      type: pydantic_ai
      name: "Advanced GPT-4"
      model: openai:gpt-4
      end_strategy: early
      validation_enabled: true
      allow_text_fallback: true
      model_settings:
        temperature: 0.8
        max_tokens: 1000
        presence_penalty: 0.2
        timeout: 60.0

  cautious_agent:
    provider:
      type: litellm
      name: "Careful Claude"
      model: anthropic:claude-2
      retries: 3
      model_settings:
        temperature: 0.3  # More deterministic
        max_tokens: 2000
        timeout: 120.0    # Longer timeout

Provider-Specific Behavior

  • PydanticAI: Uses settings directly with the underlying model
  • LiteLLM: Maps settings to provider-specific parameters (e.g., 'timeout' becomes 'request_timeout')

All settings are optional and providers will use their defaults if not specified.
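
To make the LiteLLM mapping concrete, here is an illustrative sketch; only the timeout -> request_timeout rename is documented above, and the pass-through of the remaining keys is an assumption:

RENAMES = {"timeout": "request_timeout"}  # documented example above

def to_litellm_kwargs(model_settings: dict) -> dict:
    # Rename known keys, pass everything else through unchanged (assumed).
    return {RENAMES.get(key, key): value
            for key, value in model_settings.items()}

print(to_litellm_kwargs({"timeout": 30.0, "temperature": 0.7}))
# -> {'request_timeout': 30.0, 'temperature': 0.7}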

Setting pydantic-ai models by identifier

LLMling-agent also extends pydantic-ai by letting you define additional models via simple string identifiers. The supported providers are:

  • OpenRouter (openrouter:provider/model-name, requires OPENROUTER_API_KEY env var)
  • Grok (X) (grok:grok-2-1212, requires X_AI_API_KEY env var)
  • DeepSeek (deepseek:deepseek-chat, requires DEEPSEEK_API_KEY env var)

For detailed model documentation and features, see the llmling-models repository.