
converters

Class info

Classes

Name Children Inherits
BaseConverterConfig
llmling_agent.models.converters
Base configuration for document converters.
ConversionConfig
llmling_agent.models.converters
Global conversion configuration.
DoclingConverterConfig
llmling_agent.models.converters
Configuration for docling-based converter.
DocumentConverter
llmling_agent_converters.base
Base class for document converters.
GoogleSpeechConfig
llmling_agent.models.converters
Configuration for Google Cloud Speech-to-Text.
LocalWhisperConfig
llmling_agent.models.converters
Configuration for local Whisper model.
MarkItDownConfig
llmling_agent.models.converters
Configuration for MarkItDown-based converter.
PlainConverterConfig
llmling_agent.models.converters
Configuration for plain text fallback converter.
WhisperAPIConfig
llmling_agent.models.converters
Configuration for OpenAI's Whisper API.
YouTubeConverterConfig
llmling_agent.models.converters
Configuration for YouTube transcript converter.

                    🛈 DocStrings

                    BaseConverterConfig

                    Bases: BaseModel

                    Base configuration for document converters.

                    Source code in src/llmling_agent/models/converters.py
                    class BaseConverterConfig(BaseModel):
                        """Base configuration for document converters."""
                    
                        type: str = Field(init=False)
                        """Type discriminator for converter configs."""
                    
                        enabled: bool = True
                        """Whether this converter is currently active."""
                    
                        model_config = ConfigDict(frozen=True, use_attribute_docstrings=True, extra="forbid")
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            raise NotImplementedError
                    

                    enabled class-attribute instance-attribute

                    enabled: bool = True
                    

                    Whether this converter is currently active.

                    type class-attribute instance-attribute

                    type: str = Field(init=False)
                    

                    Type discriminator for converter configs.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        raise NotImplementedError
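The base class only defines the contract: a frozen config with an `enabled` flag and a `get_converter()` hook that each subclass overrides. The sketch below mirrors that pattern with standard-library dataclasses instead of pydantic, so it runs without extra dependencies. `BaseConfigSketch`, `EchoConfigSketch`, and `EchoConverter` are hypothetical names for illustration and are not part of llmling_agent.

```python
from dataclasses import FrozenInstanceError, dataclass


class EchoConverter:
    """Hypothetical converter that just remembers its config."""

    def __init__(self, config: "EchoConfigSketch") -> None:
        self.config = config


@dataclass(frozen=True)
class BaseConfigSketch:
    """Stand-in for BaseConverterConfig: frozen, with an `enabled` flag."""

    enabled: bool = True

    def get_converter(self) -> object:
        # The base class has no converter of its own; subclasses override this.
        raise NotImplementedError


@dataclass(frozen=True)
class EchoConfigSketch(BaseConfigSketch):
    """Hypothetical subclass that knows which converter to build."""

    type: str = "echo"

    def get_converter(self) -> EchoConverter:
        # The documented configs import their converter inside this method.
        return EchoConverter(self)


config = EchoConfigSketch()
converter = config.get_converter()

try:
    config.enabled = False  # frozen configs reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

try:
    BaseConfigSketch().get_converter()
    base_failed = False
except NotImplementedError:
    base_failed = True
```

Importing the converter inside `get_converter()`, as the documented subclasses do, presumably keeps optional converter dependencies from loading until a converter is actually requested.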
                    

                    ConversionConfig

                    Bases: BaseModel

                    Global conversion configuration.

                    Source code in src/llmling_agent/models/converters.py
                    class ConversionConfig(BaseModel):
                        """Global conversion configuration."""
                    
                        providers: list[ConverterConfig] | None = None
                        """List of configured converter providers."""
                    
                        default_provider: str | None = None
                        """Name of default provider for conversions."""
                    
                        max_size: int | None = None
                        """Global size limit for all converters."""
                    
                        model_config = ConfigDict(frozen=True, use_attribute_docstrings=True, extra="forbid")
                    

                    default_provider class-attribute instance-attribute

                    default_provider: str | None = None
                    

                    Name of default provider for conversions.

                    max_size class-attribute instance-attribute

                    max_size: int | None = None
                    

                    Global size limit for all converters.

                    providers class-attribute instance-attribute

                    providers: list[ConverterConfig] | None = None
                    

                    List of configured converter providers.
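Each entry in `providers` carries a `type` discriminator, which suggests the entries are dispatched to a config class by that value. The definition of `ConverterConfig` is not shown on this page, so the following is only a sketch of type-based dispatch under that assumption. It uses standard-library dataclasses, and `PlainSketch`, `YouTubeSketch`, `REGISTRY`, and `parse_provider` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class PlainSketch:
    """Hypothetical stand-in for PlainConverterConfig."""

    force: bool = False
    enabled: bool = True
    type: str = "plain"


@dataclass(frozen=True)
class YouTubeSketch:
    """Hypothetical stand-in for YouTubeConverterConfig."""

    timeout: int = 30
    enabled: bool = True
    type: str = "youtube"


# Map each discriminator value to the class that handles it.
REGISTRY = {"plain": PlainSketch, "youtube": YouTubeSketch}


def parse_provider(raw: Dict[str, Any]):
    """Build the config class selected by the `type` key."""
    options = dict(raw)  # copy so the caller's dict is left untouched
    kind = options.pop("type")
    if kind not in REGISTRY:
        raise ValueError(f"unknown converter type: {kind!r}")
    return REGISTRY[kind](**options)


raw_providers = [
    {"type": "plain", "force": True},
    {"type": "youtube", "timeout": 10, "enabled": False},
]
providers = [parse_provider(raw) for raw in raw_providers]

# Keep only the providers whose `enabled` flag is set.
active = [p.type for p in providers if p.enabled]
```

How `default_provider` is matched against the provider list is not documented here, so the sketch leaves it out.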

                    DoclingConverterConfig

                    Bases: BaseConverterConfig

                    Configuration for docling-based converter.

                    Source code in src/llmling_agent/models/converters.py
                    class DoclingConverterConfig(BaseConverterConfig):
                        """Configuration for docling-based converter."""
                    
                        type: Literal["docling"] = Field("docling", init=False)
                        """Type discriminator for docling converter."""
                    
                        max_size: int | None = None
                        """Optional size limit in bytes."""
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            from llmling_agent_converters.docling import DoclingConverter
                    
                            return DoclingConverter(self)
                    

                    max_size class-attribute instance-attribute

                    max_size: int | None = None
                    

                    Optional size limit in bytes.
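A minimal sketch of how an optional byte limit like `max_size` could be checked. It assumes that `None` means "no limit", which the field description implies but does not state. `within_limit` is a hypothetical helper and not part of the library.

```python
from typing import Optional


def within_limit(size_bytes: int, max_size: Optional[int]) -> bool:
    """Return True when a payload fits the optional byte limit."""
    if max_size is None:  # assumed meaning: no limit configured
        return True
    return size_bytes <= max_size


payload = b"x" * 2048
fits_unlimited = within_limit(len(payload), None)
fits_one_kib = within_limit(len(payload), 1024)
fits_four_kib = within_limit(len(payload), 4096)
```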

                    type class-attribute instance-attribute

                    type: Literal['docling'] = Field('docling', init=False)
                    

                    Type discriminator for docling converter.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        from llmling_agent_converters.docling import DoclingConverter
                    
                        return DoclingConverter(self)
                    

                    GoogleSpeechConfig

                    Bases: BaseConverterConfig

                    Configuration for Google Cloud Speech-to-Text.

                    Source code in src/llmling_agent/models/converters.py
                    class GoogleSpeechConfig(BaseConverterConfig):
                        """Configuration for Google Cloud Speech-to-Text."""
                    
                        type: Literal["google_speech"] = Field("google_speech", init=False)
                        """Type discriminator for converter config."""
                    
                        language: str = "en-US"
                        """Language code for transcription."""
                    
                        model: str = "default"
                        """Speech model to use."""
                    
                        encoding: Literal["LINEAR16", "FLAC", "MP3"] = "LINEAR16"
                        """Audio encoding format."""
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            from llmling_agent_converters.google_speech import GoogleSpeechConverter
                    
                            return GoogleSpeechConverter(self)
                    

                    encoding class-attribute instance-attribute

                    encoding: Literal['LINEAR16', 'FLAC', 'MP3'] = 'LINEAR16'
                    

                    Audio encoding format.

                    language class-attribute instance-attribute

                    language: str = 'en-US'
                    

                    Language code for transcription.

                    model class-attribute instance-attribute

                    model: str = 'default'
                    

                    Speech model to use.

                    type class-attribute instance-attribute

                    type: Literal['google_speech'] = Field('google_speech', init=False)
                    

                    Type discriminator for converter config.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        from llmling_agent_converters.google_speech import GoogleSpeechConverter
                    
                        return GoogleSpeechConverter(self)
                    

                    LocalWhisperConfig

                    Bases: BaseConverterConfig

                    Configuration for local Whisper model.

                    Source code in src/llmling_agent/models/converters.py
                    class LocalWhisperConfig(BaseConverterConfig):
                        """Configuration for local Whisper model."""
                    
                        type: Literal["local_whisper"] = Field("local_whisper", init=False)
                        """Type discriminator for converter config."""
                    
                        model: str | None = None
                        """Optional model name."""
                    
                        model_size: Literal["tiny", "base", "small", "medium", "large"] = "base"
                        """Size of the Whisper model to use."""
                    
                        device: Literal["cpu", "cuda"] | None = None
                        """Device to run model on (None for auto-select)."""
                    
                        compute_type: Literal["float32", "float16"] = "float16"
                        """Compute precision to use."""
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            from llmling_agent_converters.local_whisper import LocalWhisperConverter
                    
                            return LocalWhisperConverter(self)
                    

                    compute_type class-attribute instance-attribute

                    compute_type: Literal['float32', 'float16'] = 'float16'
                    

                    Compute precision to use.

                    device class-attribute instance-attribute

                    device: Literal['cpu', 'cuda'] | None = None
                    

                    Device to run model on (None for auto-select).

                    model class-attribute instance-attribute

                    model: str | None = None
                    

                    Optional model name.

                    model_size class-attribute instance-attribute

                    model_size: Literal['tiny', 'base', 'small', 'medium', 'large'] = 'base'
                    

                    Size of the Whisper model to use.

                    type class-attribute instance-attribute

                    type: Literal['local_whisper'] = Field('local_whisper', init=False)
                    

                    Type discriminator for converter config.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        from llmling_agent_converters.local_whisper import LocalWhisperConverter
                    
                        return LocalWhisperConverter(self)
                    

                    MarkItDownConfig

                    Bases: BaseConverterConfig

                    Configuration for MarkItDown-based converter.

                    Source code in src/llmling_agent/models/converters.py
                    class MarkItDownConfig(BaseConverterConfig):
                        """Configuration for MarkItDown-based converter."""
                    
                        type: Literal["markitdown"] = Field("markitdown", init=False)
                        """Type discriminator for MarkItDown converter."""
                    
                        max_size: int | None = None
                        """Optional size limit in bytes."""
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            from llmling_agent_converters.markitdown_converter import MarkItDownConverter
                    
                            return MarkItDownConverter(self)
                    

                    max_size class-attribute instance-attribute

                    max_size: int | None = None
                    

                    Optional size limit in bytes.

                    type class-attribute instance-attribute

                    type: Literal['markitdown'] = Field('markitdown', init=False)
                    

                    Type discriminator for MarkItDown converter.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        from llmling_agent_converters.markitdown_converter import MarkItDownConverter
                    
                        return MarkItDownConverter(self)
                    

                    PlainConverterConfig

                    Bases: BaseConverterConfig

                    Configuration for plain text fallback converter.

                    Source code in src/llmling_agent/models/converters.py
                    class PlainConverterConfig(BaseConverterConfig):
                        """Configuration for plain text fallback converter."""
                    
                        type: Literal["plain"] = Field("plain", init=False)
                        """Type discriminator for plain text converter."""
                    
                        force: bool = False
                        """Whether to attempt converting any file type."""
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            from llmling_agent_converters.plain_converter import PlainConverter
                    
                            return PlainConverter(self)
                    

                    force class-attribute instance-attribute

                    force: bool = False
                    

                    Whether to attempt converting any file type.
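The `force` flag can be pictured as a switch that bypasses a file-type check. The actual check performed by `PlainConverter` is not shown on this page. The sketch below assumes a MIME-type test for `text/*` purely for illustration, and `supports_file` is a hypothetical helper.

```python
import mimetypes


def supports_file(path: str, force: bool = False) -> bool:
    """Decide whether a plain-text fallback should handle `path`."""
    if force:
        # With force enabled, any file type is attempted.
        return True
    # Assumed rule: otherwise accept only files guessed to be text.
    mime_type, _ = mimetypes.guess_type(path)
    return mime_type is not None and mime_type.startswith("text/")


text_ok = supports_file("notes.txt")
zip_ok = supports_file("archive.zip")
zip_forced = supports_file("archive.zip", force=True)
```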

                    type class-attribute instance-attribute

                    type: Literal['plain'] = Field('plain', init=False)
                    

                    Type discriminator for plain text converter.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        from llmling_agent_converters.plain_converter import PlainConverter
                    
                        return PlainConverter(self)
                    

                    WhisperAPIConfig

                    Bases: BaseConverterConfig

                    Configuration for OpenAI's Whisper API.

                    Source code in src/llmling_agent/models/converters.py
                    class WhisperAPIConfig(BaseConverterConfig):
                        """Configuration for OpenAI's Whisper API."""
                    
                        type: Literal["whisper_api"] = Field("whisper_api", init=False)
                        """Type discriminator for converter config."""
                    
                        model: str | None = None
                        """Optional model name."""
                    
                        api_key: SecretStr | None = None
                        """OpenAI API key."""
                    
                        language: str | None = None
                        """Optional language code."""
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            from llmling_agent_converters.whisper_api import WhisperAPIConverter
                    
                            return WhisperAPIConverter(self)
                    

                    api_key class-attribute instance-attribute

                    api_key: SecretStr | None = None
                    

                    OpenAI API key.

                    language class-attribute instance-attribute

                    language: str | None = None
                    

                    Optional language code.

                    model class-attribute instance-attribute

                    model: str | None = None
                    

                    Optional model name.

                    type class-attribute instance-attribute

                    type: Literal['whisper_api'] = Field('whisper_api', init=False)
                    

                    Type discriminator for converter config.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        from llmling_agent_converters.whisper_api import WhisperAPIConverter
                    
                        return WhisperAPIConverter(self)
                    

                    YouTubeConverterConfig

                    Bases: BaseConverterConfig

                    Configuration for YouTube transcript converter.

                    Source code in src/llmling_agent/models/converters.py
                    class YouTubeConverterConfig(BaseConverterConfig):
                        """Configuration for YouTube transcript converter."""
                    
                        type: Literal["youtube"] = Field("youtube", init=False)
                        """Type discriminator for converter config."""
                    
                        languages: list[str] = Field(default_factory=lambda: ["en"])
                        """Preferred language codes in priority order. Defaults to ['en']."""
                    
                        format: FormatterType = "text"
                        """Output format. One of: text, json, vtt, srt."""
                    
                        preserve_formatting: bool = False
                        """Whether to keep HTML formatting elements like <i> and <b>."""
                    
                        cookies_path: str | None = None
                        """Optional path to cookies file for age-restricted videos."""
                    
                        https_proxy: str | None = None
                        """Optional HTTPS proxy URL (format: https://user:pass@domain:port)."""
                    
                        max_retries: int = 3
                        """Maximum number of retries for failed requests."""
                    
                        timeout: int = 30
                        """Request timeout in seconds."""
                    
                        def get_converter(self) -> DocumentConverter:
                            """Get the converter instance."""
                            from llmling_agent_converters.youtubeconverter import YouTubeTranscriptConverter
                    
                            return YouTubeTranscriptConverter(self)
                    

                    cookies_path class-attribute instance-attribute

                    cookies_path: str | None = None
                    

                    Optional path to cookies file for age-restricted videos.

                    format class-attribute instance-attribute

                    format: FormatterType = 'text'
                    

                    Output format. One of: text, json, vtt, srt.

                    https_proxy class-attribute instance-attribute

                    https_proxy: str | None = None
                    

                    Optional HTTPS proxy URL (format: https://user:pass@domain:port).

                    languages class-attribute instance-attribute

                    languages: list[str] = Field(default_factory=lambda: ['en'])
                    

                    Preferred language codes in priority order. Defaults to ['en'].
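"Priority order" here most plausibly means that the first preferred code with an available transcript wins. A minimal sketch of that selection rule follows. `pick_language` is a hypothetical helper, and the converter's real lookup may differ.

```python
from typing import Optional, Sequence


def pick_language(
    preferred: Sequence[str], available: Sequence[str]
) -> Optional[str]:
    """Return the first preferred code that is available, else None."""
    for code in preferred:
        if code in available:
            return code
    return None


first_match = pick_language(["de", "en"], ["en", "fr"])
no_match = pick_language(["en"], ["fr"])
```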

                    max_retries class-attribute instance-attribute

                    max_retries: int = 3
                    

                    Maximum number of retries for failed requests.

                    preserve_formatting class-attribute instance-attribute

                    preserve_formatting: bool = False
                    

                    Whether to keep HTML formatting elements like <i> and <b>.

                    timeout class-attribute instance-attribute

                    timeout: int = 30
                    

                    Request timeout in seconds.

                    type class-attribute instance-attribute

                    type: Literal['youtube'] = Field('youtube', init=False)
                    

                    Type discriminator for converter config.

                    get_converter

                    get_converter() -> DocumentConverter
                    

                    Get the converter instance.

                    Source code in src/llmling_agent/models/converters.py
                    def get_converter(self) -> DocumentConverter:
                        """Get the converter instance."""
                        from llmling_agent_converters.youtubeconverter import YouTubeTranscriptConverter
                    
                        return YouTubeTranscriptConverter(self)