API Reference

Core

Collector

class aiobs.collector.Collector[source]

Bases: object

Simple, global-style collector with pluggable provider instrumentation.

API (see the sketch after this list):
  • observe(): enable instrumentation and start a session

  • end(): finish current session

  • flush(): write captured data to JSON (default: ./<session-id>.json)
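
A minimal lifecycle sketch based on these three calls (whether you instantiate Collector directly, as here, or use a package-provided singleton depends on your setup; the API key value is a placeholder):

    from aiobs.collector import Collector

    collector = Collector()
    session_id = collector.observe(
        session_name="demo",
        api_key="aiobs_sk_...",  # or set AIOBS_API_KEY in the environment
    )
    # ... make instrumented provider calls here ...
    collector.end()            # finish the current session
    path = collector.flush()   # writes ./<session-id>.json and returns the path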

add_label(key: str, value: str) None[source]

Add a single label to the current session.

Parameters:
  • key – Label key (lowercase alphanumeric with underscores).

  • value – Label value (UTF-8 string, max 256 chars).

Raises:
  • RuntimeError – If no active session.

  • ValueError – If key or value is invalid.

end() None[source]

Finish the current session.

flush(path: str | None = None, include_trace_tree: bool = True, exporter: 'BaseExporter' | None = None, **exporter_kwargs: Any) str | 'ExportResult'[source]

Flush all sessions and events to a file or custom exporter.

Parameters:
  • path – Output file path. Defaults to the LLM_OBS_OUT env var or '<session-id>.json'. Ignored if exporter is provided.

  • include_trace_tree – Whether to include the nested trace_tree structure. Defaults to True.

  • exporter – Optional exporter instance (e.g., GCSExporter, CustomExporter). If provided, data is exported using this exporter instead of writing to a local file.

  • **exporter_kwargs – Additional keyword arguments passed to the exporter’s export() method.

Returns:

If an exporter is provided: the ExportResult from the exporter. Otherwise: the output file path used.

Return type:

str | ExportResult
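
For example (the exporter call in the last line is commented out because GCSExporter's constructor arguments are an illustrative assumption, not a documented signature):

    out_path = collector.flush()                    # default: ./<session-id>.json
    out_path = collector.flush(path="run1.json", include_trace_tree=False)
    # result = collector.flush(exporter=GCSExporter(bucket="my-bucket"))  # -> ExportResult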

get_current_span_id() str | None[source]

Get the current span ID from context (for parent-child linking).

get_labels() Dict[str, str][source]

Get all labels for the current session.

Returns:

Dictionary of current labels (empty dict if none).

Raises:

RuntimeError – If no active session.

observe(session_name: str | None = None, api_key: str | None = None, labels: Dict[str, str] | None = None) str[source]

Enable instrumentation (once) and start a new session.

Parameters:
  • session_name – Optional name for the session.

  • api_key – API key (aiobs_sk_…) for usage tracking with shepherd-server. Can also be set via the AIOBS_API_KEY environment variable.

  • labels – Optional dictionary of key-value labels for filtering and categorization. Keys must be lowercase alphanumeric with underscores (matching ^[a-z][a-z0-9_]{0,62}$). Values are UTF-8 strings (max 256 chars). Labels from AIOBS_LABEL_* environment variables are automatically merged.

Returns:

The new session id.

Raises:
  • ValueError – If no API key is provided, the API key is invalid, or labels contain invalid keys/values.

  • RuntimeError – If unable to connect to the shepherd server.
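
For example (session name and label values here are illustrative):

    import os

    session_id = collector.observe(
        session_name="nightly_eval",
        api_key=os.environ["AIOBS_API_KEY"],
        labels={"env": "staging", "team": "ml_platform"},
    )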

register_provider(provider: Any) None[source]

Register a provider instance with the collector for instrumentation.

remove_label(key: str) None[source]

Remove a label from the current session.

Parameters:

key – Label key to remove.

Raises:
  • RuntimeError – If no active session.

  • ValueError – If trying to remove a system label.

reset() None[source]

Reset collector state and unpatch providers (for tests/dev).

reset_span_id(token: Token) None[source]

Reset the span ID to its previous value using the token.

set_current_span_id(span_id: str | None) Token[source]

Set the current span ID in context. Returns a token to restore previous value.
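
A sketch of the token-based save/restore pattern these span methods support:

    token = collector.set_current_span_id("span-123")
    try:
        assert collector.get_current_span_id() == "span-123"
        # ... events recorded here link to span-123 as their parent ...
    finally:
        collector.reset_span_id(token)  # restore the previous span ID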

set_labels(labels: Dict[str, str], merge: bool = True) None[source]

Set or update labels for the current session.

Parameters:
  • labels – Dictionary of labels to set.

  • merge – If True, merge with existing labels. If False, replace all user labels (system labels are preserved).

Raises:
  • RuntimeError – If no active session.

  • ValueError – If labels contain invalid keys or values.
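
Putting the label methods together (keys and values here are illustrative):

    collector.add_label("stage", "retrieval")
    collector.set_labels({"env": "prod"}, merge=True)  # keep existing labels
    print(collector.get_labels())  # e.g. {'stage': 'retrieval', 'env': 'prod'}
    collector.remove_label("stage")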

Observe Decorator

@observe decorator for tracing function execution.

aiobs.observe.observe(func: F) F[source]
aiobs.observe.observe(*, name: str | None = None, capture_args: bool = True, capture_result: bool = True, enh_prompt: bool = False, auto_enhance_after: int | None = None) Callable[[F], F]

Decorator to trace function execution.

Can be used with or without arguments:

    @observe
    def my_func(): ...

    @observe(name="custom_name")
    def my_func(): ...

    @observe(enh_prompt=True, auto_enhance_after=10)
    def my_func(): ...

Parameters:
  • func – The function to wrap (when used without parentheses)

  • name – Optional custom name for the traced function

  • capture_args – Whether to capture function arguments (default: True)

  • capture_result – Whether to capture the return value (default: True)

  • enh_prompt – Whether to include this trace in enh_prompt_traces for enhanced prompt analysis (default: False)

  • auto_enhance_after – Number of traces after which to run auto prompt enhancer (only relevant when enh_prompt=True)

Returns:

The wrapped function that records execution traces
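
A runnable sketch (the import path follows the module name above; an active session is assumed so the trace is actually recorded):

    from aiobs.observe import observe

    @observe(name="summarize")
    def summarize(text: str) -> str:
        return text[:80]

    summarize("This call is captured as a FunctionEvent.")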

Models

class aiobs.models.observability.Callsite(*, file: str | None = None, line: int | None = None, function: str | None = None)[source]

Bases: BaseModel

file: str | None
function: str | None
line: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class aiobs.models.observability.Event(*, provider: str, api: str, request: Any, response: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None)[source]

Bases: BaseModel

api: str
callsite: Callsite | None
duration_ms: float
ended_at: float
error: str | None
model_config: ClassVar[ConfigDict] = {}

parent_span_id: str | None
provider: str
request: Any
response: Any | None
span_id: str | None
started_at: float
class aiobs.models.observability.FunctionEvent(*, provider: str = 'function', api: str, name: str, module: str | None = None, args: List[Any] | None = None, kwargs: dict | None = None, result: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None, enh_prompt: bool = False, enh_prompt_id: str | None = None, auto_enhance_after: int | None = None)[source]

Bases: BaseModel

Event model for tracing decorated functions.

api: str
args: List[Any] | None
auto_enhance_after: int | None
callsite: Callsite | None
duration_ms: float
ended_at: float
enh_prompt: bool
enh_prompt_id: str | None
error: str | None
kwargs: dict | None
model_config: ClassVar[ConfigDict] = {}

module: str | None
name: str
parent_span_id: str | None
provider: str
result: Any | None
span_id: str | None
started_at: float
class aiobs.models.observability.ObservabilityExport(*, sessions: List[Session], events: List[ObservedEvent], function_events: List[ObservedFunctionEvent] = <factory>, trace_tree: List[Any] | None = None, enh_prompt_traces: List[str] | None = None, generated_at: float, version: int = 1)[source]

Bases: BaseModel

enh_prompt_traces: List[str] | None
events: List[ObservedEvent]
function_events: List[ObservedFunctionEvent]
generated_at: float
model_config: ClassVar[ConfigDict] = {}

sessions: List[Session]
trace_tree: List[Any] | None
version: int
class aiobs.models.observability.ObservedEvent(*, provider: str, api: str, request: Any, response: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None, session_id: str)[source]

Bases: Event

model_config: ClassVar[ConfigDict] = {}

session_id: str
class aiobs.models.observability.ObservedFunctionEvent(*, provider: str = 'function', api: str, name: str, module: str | None = None, args: List[Any] | None = None, kwargs: dict | None = None, result: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None, enh_prompt: bool = False, enh_prompt_id: str | None = None, auto_enhance_after: int | None = None, session_id: str)[source]

Bases: FunctionEvent

Function event with session_id for export.

model_config: ClassVar[ConfigDict] = {}

session_id: str
class aiobs.models.observability.Session(*, id: str, name: str, started_at: float, ended_at: float | None = None, meta: SessionMeta, labels: Dict[str, str] | None = None)[source]

Bases: BaseModel

ended_at: float | None
id: str
labels: Dict[str, str] | None
meta: SessionMeta
model_config: ClassVar[ConfigDict] = {}

name: str
started_at: float
class aiobs.models.observability.SessionMeta(*, pid: int, cwd: str)[source]

Bases: BaseModel

cwd: str
model_config: ClassVar[ConfigDict] = {}

pid: int

Providers

Base Provider

class aiobs.providers.base.BaseProvider[source]

Bases: ABC

Abstract base class for provider instrumentation.

Subclasses install monkeypatches or hooks to capture request/response details and call collector._record_event(…) with normalized payloads.

abstractmethod install(collector: Any) Callable[[], None] | None[source]

Apply instrumentation and return an optional unpatch function.

classmethod is_available() bool[source]

Return True if the provider can be instrumented (deps present).

name: str = 'provider'
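
A minimal custom-provider sketch against this ABC (the collector._record_event payload shape follows the class docstring but is otherwise an assumption):

    from typing import Any, Callable, Optional

    from aiobs.providers.base import BaseProvider

    class MyProvider(BaseProvider):
        name = "my_provider"

        @classmethod
        def is_available(cls) -> bool:
            return True  # no optional dependencies to check

        def install(self, collector: Any) -> Optional[Callable[[], None]]:
            # Monkeypatch the target SDK here and forward normalized
            # payloads via collector._record_event(...).
            def unpatch() -> None:
                pass  # undo the monkeypatches
            return unpatch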

OpenAI Provider

class aiobs.providers.openai.provider.OpenAIProvider[source]

Bases: BaseProvider

install(collector: Any) Callable[[], None] | None[source]

Apply instrumentation and return an optional unpatch function.

classmethod is_available() bool[source]

Return True if the provider can be instrumented (deps present).

name: str = 'openai'

OpenAI API Modules

class aiobs.providers.openai.apis.base_api.BaseOpenAIAPIModule[source]

Bases: ABC

Abstract interface for an OpenAI API module.

abstractmethod install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'openai-api'
class aiobs.providers.openai.apis.chat_completions.ChatCompletionsAPI[source]

Bases: BaseOpenAIAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'chat.completions'
class aiobs.providers.openai.apis.embeddings.EmbeddingsAPI[source]

Bases: BaseOpenAIAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'embeddings'

OpenAI API Models

class aiobs.providers.openai.apis.models.base.BaseOpenAIRequest(*, model: str | None = None)[source]

Bases: BaseModel

Base class for OpenAI request capture models.

model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseOpenAIRequest[source]

Return a copy safe for logging (override in subclasses).
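
An assumed override pattern for redacted() in a subclass (the field name is hypothetical; model_copy is the pydantic v2 copy API):

    class MyRequest(BaseOpenAIRequest):
        api_secret: str | None = None  # hypothetical sensitive field

        def redacted(self) -> "BaseOpenAIRequest":
            safe = self.model_copy()
            safe.api_secret = "[REDACTED]"  # scrub before logging
            return safe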

class aiobs.providers.openai.apis.models.base.BaseOpenAIResponse(*, id: str | None = None, model: str | None = None, usage: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Base class for OpenAI response capture models.

id: str | None
model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseOpenAIResponse[source]

Return a copy safe for logging (override in subclasses).

usage: Dict[str, Any] | None
class aiobs.providers.openai.apis.models.chat_completions.ChatCompletionsRequest(*, model: str | None = None, messages: List[Message] | None = None, temperature: float | None = None, max_tokens: int | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseOpenAIRequest

max_tokens: int | None
messages: List[Message] | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
temperature: float | None
class aiobs.providers.openai.apis.models.chat_completions.ChatCompletionsResponse(*, id: str | None = None, model: str | None = None, usage: Dict[str, Any] | None = None, text: str | None = None)[source]

Bases: BaseOpenAIResponse

model_config: ClassVar[ConfigDict] = {}

text: str | None
class aiobs.providers.openai.apis.models.chat_completions.Message(*, role: str, content: Any)[source]

Bases: BaseModel

content: Any
model_config: ClassVar[ConfigDict] = {}

role: str
class aiobs.providers.openai.apis.models.embeddings.EmbeddingData(*, index: int, embedding: List[float] = <factory>, object: str = 'embedding')[source]

Bases: BaseModel

Single embedding object in the response.

embedding: List[float]
index: int
model_config: ClassVar[ConfigDict] = {}

object: str
class aiobs.providers.openai.apis.models.embeddings.EmbeddingsRequest(*, model: str | None = None, input: str | List[str] | List[int] | List[List[int]] | None = None, encoding_format: str | None = None, dimensions: int | None = None, user: str | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseOpenAIRequest

Request model for OpenAI embeddings.create API.

dimensions: int | None
encoding_format: str | None
input: str | List[str] | List[int] | List[List[int]] | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
user: str | None
class aiobs.providers.openai.apis.models.embeddings.EmbeddingsResponse(*, id: str | None = None, model: str | None = None, usage: Dict[str, Any] | None = None, object: str | None = None, data: List[EmbeddingData] | None = None, embedding_dimensions: int | None = None)[source]

Bases: BaseOpenAIResponse

Response model for OpenAI embeddings.create API.

data: List[EmbeddingData] | None
embedding_dimensions: int | None
model_config: ClassVar[ConfigDict] = {}

object: str | None

Gemini Provider

class aiobs.providers.gemini.provider.GeminiProvider[source]

Bases: BaseProvider

install(collector: Any) Callable[[], None] | None[source]

Apply instrumentation and return an optional unpatch function.

classmethod is_available() bool[source]

Return True if the provider can be instrumented (deps present).

name: str = 'gemini'

Gemini API Modules

class aiobs.providers.gemini.apis.base_api.BaseGeminiAPIModule[source]

Bases: ABC

Abstract interface for a Gemini API module.

abstractmethod install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'gemini-api'
class aiobs.providers.gemini.apis.generate_content.GenerateContentAPI[source]

Bases: BaseGeminiAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'models.generate_content'
class aiobs.providers.gemini.apis.generate_videos.GenerateVideosAPI[source]

Bases: BaseGeminiAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'models.generate_videos'

Gemini API Models

class aiobs.providers.gemini.apis.models.base.BaseGeminiRequest(*, model: str | None = None)[source]

Bases: BaseModel

Base class for Gemini request capture models.

model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseGeminiRequest[source]

Return a copy safe for logging (override in subclasses).

class aiobs.providers.gemini.apis.models.base.BaseGeminiResponse(*, model: str | None = None, usage: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Base class for Gemini response capture models.

model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseGeminiResponse[source]

Return a copy safe for logging (override in subclasses).

usage: Dict[str, Any] | None
class aiobs.providers.gemini.apis.models.generate_content.Content(*, role: str | None = None, parts: List[ContentPart] | None = None)[source]

Bases: BaseModel

Content structure for Gemini messages.

model_config: ClassVar[ConfigDict] = {}

parts: List[ContentPart] | None
role: str | None
class aiobs.providers.gemini.apis.models.generate_content.ContentPart(*, text: str | None = None, inline_data: Dict[str, Any] | None = None)[source]

Bases: BaseModel

A part of content (text, image, etc.).

inline_data: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

text: str | None
class aiobs.providers.gemini.apis.models.generate_content.GenerateContentRequest(*, model: str | None = None, contents: str | List[Content] | Any | None = None, system_instruction: Any | None = None, config: Dict[str, Any] | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseGeminiRequest

Request model for generate_content API.

config: Dict[str, Any] | None
contents: str | List[Content] | Any | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
system_instruction: Any | None
class aiobs.providers.gemini.apis.models.generate_content.GenerateContentResponse(*, model: str | None = None, usage: Dict[str, Any] | None = None, text: str | None = None, candidates: List[Dict[str, Any]] | None = None)[source]

Bases: BaseGeminiResponse

Response model for generate_content API.

candidates: List[Dict[str, Any]] | None
model_config: ClassVar[ConfigDict] = {}

text: str | None
class aiobs.providers.gemini.apis.models.generate_videos.GenerateVideosRequest(*, model: str | None = None, prompt: str | None = None, image: Dict[str, Any] | None = None, video: Dict[str, Any] | None = None, config: Dict[str, Any] | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseGeminiRequest

Request model for generate_videos API.

config: Dict[str, Any] | None
image: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
prompt: str | None
video: Dict[str, Any] | None
class aiobs.providers.gemini.apis.models.generate_videos.GenerateVideosResponse(*, model: str | None = None, usage: Dict[str, Any] | None = None, operation_name: str | None = None, done: bool | None = None, generated_videos: List[Dict[str, Any]] | None = None)[source]

Bases: BaseGeminiResponse

Response model for generate_videos API.

done: bool | None
generated_videos: List[Dict[str, Any]] | None
model_config: ClassVar[ConfigDict] = {}

operation_name: str | None
class aiobs.providers.gemini.apis.models.generate_videos.GeneratedVideo(*, video: Dict[str, Any] | None = None)[source]

Bases: BaseModel

A generated video result.

model_config: ClassVar[ConfigDict] = {}

video: Dict[str, Any] | None
class aiobs.providers.gemini.apis.models.generate_videos.VideoGenerationConfig(*, aspect_ratio: str | None = None, number_of_videos: int | None = None, resolution: str | None = None, duration_seconds: int | None = None, negative_prompt: str | None = None, generate_audio: bool | None = None, enhance_prompt: bool | None = None, person_generation: str | None = None, seed: int | None = None, output_gcs_uri: str | None = None)[source]

Bases: BaseModel

Configuration for video generation.

aspect_ratio: str | None
duration_seconds: int | None
enhance_prompt: bool | None
generate_audio: bool | None
model_config: ClassVar[ConfigDict] = {}

negative_prompt: str | None
number_of_videos: int | None
output_gcs_uri: str | None
person_generation: str | None
resolution: str | None
seed: int | None
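
Illustrative construction of this config model (all field values are examples only):

    from aiobs.providers.gemini.apis.models.generate_videos import VideoGenerationConfig

    cfg = VideoGenerationConfig(
        aspect_ratio="16:9",
        number_of_videos=1,
        duration_seconds=8,
        generate_audio=False,
    )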

Classifiers

Base Classifier

Base classifier interface for aiobs.

class aiobs.classifier.base.BaseClassifier(config: ClassificationConfig | None = None)[source]

Bases: ABC

Abstract base class for response classifiers.

Classifiers evaluate model outputs against user inputs and system prompts to determine if the response quality is good, bad, or uncertain.

Subclasses must implement:
  • classify(): Synchronous classification

  • classify_async(): Asynchronous classification

  • classify_batch(): Batch classification for multiple inputs

Example usage:

    from aiobs.classifier import OpenAIClassifier

    classifier = OpenAIClassifier(api_key="...")
    result = classifier.classify(
        system_prompt="You are a helpful assistant.",
        user_input="What is 2+2?",
        model_output="2+2 equals 4.",
    )
    print(result.verdict)  # ClassificationVerdict.GOOD

abstractmethod classify(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response synchronously.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments for the classifier.

Returns:

ClassificationResult with verdict, confidence, and reasoning.

abstractmethod async classify_async(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response asynchronously.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments for the classifier.

Returns:

ClassificationResult with verdict, confidence, and reasoning.

abstractmethod classify_batch(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses in batch.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.

abstractmethod async classify_batch_async(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses asynchronously in batch.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.
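
A batch sketch, given a concrete classifier instance such as the OpenAIClassifier documented below (inputs are illustrative):

    from aiobs.classifier.models.classification import ClassificationInput

    inputs = [
        ClassificationInput(user_input="What is 2+2?", model_output="4"),
        ClassificationInput(user_input="Capital of France?", model_output="Berlin"),
    ]
    results = classifier.classify_batch(inputs)
    for r in results:
        print(r.verdict, r.confidence)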

classmethod is_available() bool[source]

Check if this classifier can be used (dependencies present).

Returns:

True if all required dependencies are available.

name: str = 'base'

Classification Models

Pydantic models for classification inputs and outputs.

class aiobs.classifier.models.classification.ClassificationConfig(*, model: str = 'gpt-4o-mini', temperature: Annotated[float, Ge(ge=0.0), Le(le=2.0)] = 0.0, max_tokens: int = 1024, classification_prompt: str | None = None, confidence_threshold: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.7)[source]

Bases: BaseModel

Configuration for classifier behavior.

classification_prompt: str | None
confidence_threshold: float
max_tokens: int
model: str
model_config: ClassVar[ConfigDict] = {}

temperature: float
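
For example, passing a config to the OpenAI classifier documented below (values are illustrative):

    from aiobs.classifier import OpenAIClassifier
    from aiobs.classifier.models.classification import ClassificationConfig

    config = ClassificationConfig(model="gpt-4o-mini", temperature=0.0, confidence_threshold=0.8)
    classifier = OpenAIClassifier(api_key="sk-...", config=config)
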
class aiobs.classifier.models.classification.ClassificationInput(*, system_prompt: str | None = None, user_input: str, model_output: str, context: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Input model for classification.

Contains the system prompt, user input, and model output that will be evaluated by the classifier.

context: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

model_output: str
system_prompt: str | None
user_input: str
class aiobs.classifier.models.classification.ClassificationResult(*, verdict: ClassificationVerdict, confidence: Annotated[float, Ge(ge=0.0), Le(le=1.0)], reasoning: str | None = None, categories: List[str] | None = None, raw_response: Any | None = None, metadata: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Result model for classification.

Contains the verdict (good/bad/uncertain), confidence score, reasoning, and any additional metadata.

categories: List[str] | None
confidence: float
metadata: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

raw_response: Any | None
reasoning: str | None
verdict: ClassificationVerdict
class aiobs.classifier.models.classification.ClassificationVerdict(value)[source]

Bases: str, Enum

Verdict for classification result.

BAD = 'bad'
GOOD = 'good'
UNCERTAIN = 'uncertain'

OpenAI Classifier

OpenAI-based classifier implementation.

class aiobs.classifier.openai.classifier.OpenAIClassifier(api_key: str | None = None, config: ClassificationConfig | None = None, client: Any | None = None, async_client: Any | None = None)[source]

Bases: BaseClassifier

Classifier using OpenAI’s models to evaluate response quality.

Uses OpenAI’s chat completion API to analyze model outputs and determine if they are good, bad, or uncertain.

Example

    from aiobs.classifier import OpenAIClassifier

    classifier = OpenAIClassifier(api_key="sk-...")
    result = classifier.classify(
        user_input="What is the capital of France?",
        model_output="The capital of France is Paris.",
        system_prompt="You are a helpful geography assistant.",
    )

    if result.verdict == ClassificationVerdict.GOOD:
        print("Response is good!")
    else:
        print(f"Issues: {result.categories}")

classify(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response synchronously using OpenAI.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments (passed to context).

Returns:

ClassificationResult with verdict, confidence, and reasoning.

async classify_async(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response asynchronously using OpenAI.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments (passed to context).

Returns:

ClassificationResult with verdict, confidence, and reasoning.

classify_batch(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses in batch (sequential).

Note: This runs classifications sequentially. For true parallel execution, use classify_batch_async.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.

async classify_batch_async(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses asynchronously in parallel.

Uses asyncio.gather for concurrent classification requests.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.

classmethod is_available() bool[source]

Check if OpenAI library is available.

name: str = 'openai'

LLM Abstraction

The LLM module provides a unified interface for interacting with different LLM providers. It is used internally by LLM-based evaluators like HallucinationDetectionEval.

LLM Factory

LLM factory for auto-detecting and creating LLM adapters.

class aiobs.llm.factory.LLM[source]

Bases: object

Factory class for creating LLM adapters.

Provides a unified interface for different LLM providers through automatic client detection or explicit provider specification.

Example

    from openai import OpenAI
    from aiobs.llm import LLM

    # Auto-detect from client
    client = OpenAI()
    llm = LLM.from_client(client, model="gpt-4o")

    response = llm.complete("What is 2+2?")
    print(response.content)  # "4"

    # Async usage (within an async context)
    response = await llm.complete_async("What is 2+2?")

static anthropic(client: Any, model: str = 'claude-3-sonnet-20240229', temperature: float = 0.0, max_tokens: int | None = 1024) AnthropicLLM[source]

Create an Anthropic LLM adapter explicitly.

Parameters:
  • client – Anthropic client instance.

  • model – Model name (default: "claude-3-sonnet-20240229").

  • temperature – Sampling temperature.

  • max_tokens – Maximum tokens to generate (default: 1024).

Returns:

AnthropicLLM adapter instance.

static from_client(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None) BaseLLM[source]

Create an LLM adapter by auto-detecting the client type.

Parameters:
  • client – The LLM provider’s client instance.

  • model – Model name/identifier.

  • temperature – Sampling temperature (0.0 = deterministic).

  • max_tokens – Maximum tokens to generate.

Returns:

Appropriate LLM adapter instance.

Raises:

ValueError – If client type is not recognized.

Example

    from openai import OpenAI
    llm = LLM.from_client(OpenAI(), model="gpt-4o")

    from google import genai
    llm = LLM.from_client(genai.Client(), model="gemini-2.0-flash")

    from anthropic import Anthropic
    llm = LLM.from_client(Anthropic(), model="claude-3-sonnet-20240229")

static gemini(client: Any, model: str = 'gemini-2.0-flash', temperature: float = 0.0, max_tokens: int | None = None) GeminiLLM[source]

Create a Gemini LLM adapter explicitly.

Parameters:
  • client – Google GenAI client instance.

  • model – Model name (default: "gemini-2.0-flash").

  • temperature – Sampling temperature.

  • max_tokens – Maximum tokens to generate.

Returns:

GeminiLLM adapter instance.

static openai(client: Any, model: str = 'gpt-4o-mini', temperature: float = 0.0, max_tokens: int | None = None) OpenAILLM[source]

Create an OpenAI LLM adapter explicitly.

Parameters:
  • client – OpenAI client instance.

  • model – Model name (default: "gpt-4o-mini").

  • temperature – Sampling temperature.

  • max_tokens – Maximum tokens to generate.

Returns:

OpenAILLM adapter instance.

Base LLM

Base LLM interface for aiobs.

class aiobs.llm.base.BaseLLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None)[source]

Bases: ABC

Abstract base class for LLM adapters.

Provides a unified interface for different LLM providers.

abstractmethod complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.

abstractmethod async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.

async complete_messages_async(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from messages asynchronously.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.
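
A sketch of the message-based interface (assuming an llm adapter created via LLM.from_client; LLMMessage is documented below):

    from aiobs.llm.base import LLMMessage

    messages = [
        LLMMessage(role="system", content="You are terse."),
        LLMMessage(role="user", content="Define observability in one line."),
    ]
    response = llm.complete_messages(messages)
    print(response.content)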

provider: str = 'base'
class aiobs.llm.base.LLMMessage(*, role: str, content: str)[source]

Bases: BaseModel

A message in a conversation.

content: str
model_config: ClassVar[ConfigDict] = {}

role: str
class aiobs.llm.base.LLMResponse(*, content: str, model: str, usage: Dict[str, int] | None = None, raw_response: Any | None = None)[source]

Bases: BaseModel

Response from an LLM completion.

content: str
model: str
model_config: ClassVar[ConfigDict] = {}

raw_response: Any | None
usage: Dict[str, int] | None

OpenAI LLM

OpenAI LLM adapter.

class aiobs.llm.openai.OpenAILLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None)[source]

Bases: BaseLLM

LLM adapter for OpenAI and OpenAI-compatible APIs.

Works with:
  • OpenAI

  • Azure OpenAI

  • Groq

  • Together AI

  • Any OpenAI-compatible API

Example

    from openai import OpenAI
    from aiobs.llm import LLM

    client = OpenAI()
    llm = LLM.from_client(client, model="gpt-4o")
    response = llm.complete("Hello!")

complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

classmethod is_compatible(client: Any) bool[source]

Check if client is OpenAI-compatible.

Parameters:

client – Client instance to check.

Returns:

True if client has OpenAI-compatible interface.

provider: str = 'openai'

Gemini LLM

Google Gemini LLM adapter.

class aiobs.llm.gemini.GeminiLLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None)[source]

Bases: BaseLLM

LLM adapter for Google Gemini API.

Example

    from google import genai
    from aiobs.llm import LLM

    client = genai.Client()
    llm = LLM.from_client(client, model="gemini-2.0-flash")
    response = llm.complete("Hello!")

complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

classmethod is_compatible(client: Any) bool[source]

Check if client is Gemini-compatible.

Parameters:

client – Client instance to check.

Returns:

True if client has Gemini-compatible interface.

provider: str = 'gemini'

Anthropic LLM

Anthropic Claude LLM adapter.

class aiobs.llm.anthropic.AnthropicLLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = 1024)[source]

Bases: BaseLLM

LLM adapter for Anthropic Claude API.

Example

    from anthropic import Anthropic
    from aiobs.llm import LLM

    client = Anthropic()
    llm = LLM.from_client(client, model="claude-3-sonnet-20240229")
    response = llm.complete("Hello!")

complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

classmethod is_compatible(client: Any) bool[source]

Check if client is Anthropic-compatible.

Parameters:

client – Client instance to check.

Returns:

True if client has Anthropic-compatible interface.

provider: str = 'anthropic'