API Reference

Core

Collector

class aiobs.collector.Collector[source]

Bases: object

Simple, global-style collector with pluggable provider instrumentation.

API (see the sketch after this list):
  • observe(): enable instrumentation and start a session

  • end(): finish current session

  • flush(): write captured data to JSON (default: ./<session-id>.json)
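
A minimal lifecycle sketch based on these three calls (whether you instantiate Collector directly, as here, or use a package-provided singleton depends on your setup; the API key value is a placeholder):

    from aiobs.collector import Collector

    collector = Collector()
    session_id = collector.observe(
        session_name="demo",
        api_key="aiobs_sk_...",  # or set AIOBS_API_KEY in the environment
    )
    # ... make instrumented provider calls here ...
    collector.end()            # finish the current session
    path = collector.flush()   # writes ./<session-id>.json and returns the path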

add_label(key: str, value: str) None[source]

Add a single label to the current session.

Parameters:
  • key – Label key (lowercase alphanumeric with underscores).

  • value – Label value (UTF-8 string, max 256 chars).

Raises:
  • RuntimeError – If no active session.

  • ValueError – If key or value is invalid.

end() None[source]

Finish the current session.

flush(path: str | None = None, include_trace_tree: bool = True, exporter: 'BaseExporter' | None = None, **exporter_kwargs: Any) str | 'ExportResult'[source]

Flush all sessions and events to a file or custom exporter.

Parameters:
  • path – Output file path. Defaults to the LLM_OBS_OUT env var or '<session-id>.json'. Ignored if exporter is provided.

  • include_trace_tree – Whether to include the nested trace_tree structure. Defaults to True.

  • exporter – Optional exporter instance (e.g., GCSExporter, CustomExporter). If provided, data is exported using this exporter instead of writing to a local file.

  • **exporter_kwargs – Additional keyword arguments passed to the exporter’s export() method.

Returns:

If an exporter is provided: the ExportResult from the exporter. Otherwise: the output file path used.

Return type:

str | ExportResult
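
For example (the exporter call in the last line is commented out because GCSExporter's constructor arguments are an illustrative assumption, not a documented signature):

    out_path = collector.flush()                    # default: ./<session-id>.json
    out_path = collector.flush(path="run1.json", include_trace_tree=False)
    # result = collector.flush(exporter=GCSExporter(bucket="my-bucket"))  # -> ExportResult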

get_current_span_id() str | None[source]

Get the current span ID from context (for parent-child linking).

get_labels() Dict[str, str][source]

Get all labels for the current session.

Returns:

Dictionary of current labels (empty dict if none).

Raises:

RuntimeError – If no active session.

observe(session_name: str | None = None, api_key: str | None = None, labels: Dict[str, str] | None = None) str[source]

Enable instrumentation (once) and start a new session.

Parameters:
  • session_name – Optional name for the session.

  • api_key – API key (aiobs_sk_…) for usage tracking with shepherd-server. Can also be set via the AIOBS_API_KEY environment variable.

  • labels – Optional dictionary of key-value labels for filtering and categorization. Keys must be lowercase alphanumeric with underscores (matching ^[a-z][a-z0-9_]{0,62}$). Values are UTF-8 strings (max 256 chars). Labels from AIOBS_LABEL_* environment variables are automatically merged.

Returns:

The new session id.

Raises:
  • ValueError – If no API key is provided, the API key is invalid, or labels contain invalid keys/values.

  • RuntimeError – If unable to connect to the shepherd server.
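
For example (session name and label values here are illustrative):

    import os

    session_id = collector.observe(
        session_name="nightly_eval",
        api_key=os.environ["AIOBS_API_KEY"],
        labels={"env": "staging", "team": "ml_platform"},
    )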

register_provider(provider: Any) None[source]

Register a provider instance with the collector for instrumentation.

remove_label(key: str) None[source]

Remove a label from the current session.

Parameters:

key – Label key to remove.

Raises:
  • RuntimeError – If no active session.

  • ValueError – If trying to remove a system label.

reset() None[source]

Reset collector state and unpatch providers (for tests/dev).

reset_span_id(token: Token) None[source]

Reset the span ID to its previous value using the token.

set_current_span_id(span_id: str | None) Token[source]

Set the current span ID in context. Returns a token to restore previous value.
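
A sketch of the token-based save/restore pattern these span methods support:

    token = collector.set_current_span_id("span-123")
    try:
        assert collector.get_current_span_id() == "span-123"
        # ... events recorded here link to span-123 as their parent ...
    finally:
        collector.reset_span_id(token)  # restore the previous span ID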

set_labels(labels: Dict[str, str], merge: bool = True) None[source]

Set or update labels for the current session.

Parameters:
  • labels – Dictionary of labels to set.

  • merge – If True, merge with existing labels. If False, replace all user labels (system labels are preserved).

Raises:
  • RuntimeError – If no active session.

  • ValueError – If labels contain invalid keys or values.
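
Putting the label methods together (keys and values here are illustrative):

    collector.add_label("stage", "retrieval")
    collector.set_labels({"env": "prod"}, merge=True)  # keep existing labels
    print(collector.get_labels())  # e.g. {'stage': 'retrieval', 'env': 'prod'}
    collector.remove_label("stage")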

Observe Decorator

@observe decorator for tracing function execution.

aiobs.observe.observe(func: F) F[source]
aiobs.observe.observe(*, name: str | None = None, capture_args: bool = True, capture_result: bool = True, enh_prompt: bool = False, auto_enhance_after: int | None = None) Callable[[F], F]

Decorator to trace function execution.

Can be used with or without arguments:

    @observe
    def my_func(): ...

    @observe(name="custom_name")
    def my_func(): ...

    @observe(enh_prompt=True, auto_enhance_after=10)
    def my_func(): ...

Parameters:
  • func – The function to wrap (when used without parentheses)

  • name – Optional custom name for the traced function

  • capture_args – Whether to capture function arguments (default: True)

  • capture_result – Whether to capture the return value (default: True)

  • enh_prompt – Whether to include this trace in enh_prompt_traces for enhanced prompt analysis (default: False)

  • auto_enhance_after – Number of traces after which to run auto prompt enhancer (only relevant when enh_prompt=True)

Returns:

The wrapped function that records execution traces
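
A runnable sketch (the import path follows the module name above; an active session is assumed so the trace is actually recorded):

    from aiobs.observe import observe

    @observe(name="summarize")
    def summarize(text: str) -> str:
        return text[:80]

    summarize("This call is captured as a FunctionEvent.")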

Models

class aiobs.models.observability.Callsite(*, file: str | None = None, line: int | None = None, function: str | None = None)[source]

Bases: BaseModel

file: str | None
function: str | None
line: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class aiobs.models.observability.Event(*, provider: str, api: str, request: Any, response: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None)[source]

Bases: BaseModel

api: str
callsite: Callsite | None
duration_ms: float
ended_at: float
error: str | None
model_config: ClassVar[ConfigDict] = {}

parent_span_id: str | None
provider: str
request: Any
response: Any | None
span_id: str | None
started_at: float
class aiobs.models.observability.FunctionEvent(*, provider: str = 'function', api: str, name: str, module: str | None = None, args: List[Any] | None = None, kwargs: dict | None = None, result: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None, enh_prompt: bool = False, enh_prompt_id: str | None = None, auto_enhance_after: int | None = None)[source]

Bases: BaseModel

Event model for tracing decorated functions.

api: str
args: List[Any] | None
auto_enhance_after: int | None
callsite: Callsite | None
duration_ms: float
ended_at: float
enh_prompt: bool
enh_prompt_id: str | None
error: str | None
kwargs: dict | None
model_config: ClassVar[ConfigDict] = {}

module: str | None
name: str
parent_span_id: str | None
provider: str
result: Any | None
span_id: str | None
started_at: float
class aiobs.models.observability.ObservabilityExport(*, sessions: List[Session], events: List[ObservedEvent], function_events: List[ObservedFunctionEvent] = <factory>, trace_tree: List[Any] | None = None, enh_prompt_traces: List[str] | None = None, generated_at: float, version: int = 1)[source]

Bases: BaseModel

enh_prompt_traces: List[str] | None
events: List[ObservedEvent]
function_events: List[ObservedFunctionEvent]
generated_at: float
model_config: ClassVar[ConfigDict] = {}

sessions: List[Session]
trace_tree: List[Any] | None
version: int
class aiobs.models.observability.ObservedEvent(*, provider: str, api: str, request: Any, response: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None, session_id: str)[source]

Bases: Event

model_config: ClassVar[ConfigDict] = {}

session_id: str
class aiobs.models.observability.ObservedFunctionEvent(*, provider: str = 'function', api: str, name: str, module: str | None = None, args: List[Any] | None = None, kwargs: dict | None = None, result: Any | None = None, error: str | None = None, started_at: float, ended_at: float, duration_ms: float, callsite: Callsite | None = None, span_id: str | None = None, parent_span_id: str | None = None, enh_prompt: bool = False, enh_prompt_id: str | None = None, auto_enhance_after: int | None = None, session_id: str)[source]

Bases: FunctionEvent

Function event with session_id for export.

model_config: ClassVar[ConfigDict] = {}

session_id: str
class aiobs.models.observability.Session(*, id: str, name: str, started_at: float, ended_at: float | None = None, meta: SessionMeta, labels: Dict[str, str] | None = None)[source]

Bases: BaseModel

ended_at: float | None
id: str
labels: Dict[str, str] | None
meta: SessionMeta
model_config: ClassVar[ConfigDict] = {}

name: str
started_at: float
class aiobs.models.observability.SessionMeta(*, pid: int, cwd: str)[source]

Bases: BaseModel

cwd: str
model_config: ClassVar[ConfigDict] = {}

pid: int

Providers

Base Provider

class aiobs.providers.base.BaseProvider[source]

Bases: ABC

Abstract base class for provider instrumentation.

Subclasses install monkeypatches or hooks to capture request/response details and call collector._record_event(…) with normalized payloads.

abstractmethod install(collector: Any) Callable[[], None] | None[source]

Apply instrumentation and return an optional unpatch function.

classmethod is_available() bool[source]

Return True if the provider can be instrumented (deps present).

name: str = 'provider'
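
A minimal custom-provider sketch against this ABC (the collector._record_event payload shape follows the class docstring but is otherwise an assumption):

    from typing import Any, Callable, Optional

    from aiobs.providers.base import BaseProvider

    class MyProvider(BaseProvider):
        name = "my_provider"

        @classmethod
        def is_available(cls) -> bool:
            return True  # no optional dependencies to check

        def install(self, collector: Any) -> Optional[Callable[[], None]]:
            # Monkeypatch the target SDK here and forward normalized
            # payloads via collector._record_event(...).
            def unpatch() -> None:
                pass  # undo the monkeypatches
            return unpatch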

OpenAI Provider

class aiobs.providers.openai.provider.OpenAIProvider[source]

Bases: BaseProvider

install(collector: Any) Callable[[], None] | None[source]

Apply instrumentation and return an optional unpatch function.

classmethod is_available() bool[source]

Return True if the provider can be instrumented (deps present).

name: str = 'openai'

OpenAI API Modules

class aiobs.providers.openai.apis.base_api.BaseOpenAIAPIModule[source]

Bases: ABC

Abstract interface for an OpenAI API module.

abstractmethod install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'openai-api'
class aiobs.providers.openai.apis.chat_completions.ChatCompletionsAPI[source]

Bases: BaseOpenAIAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'chat.completions'
class aiobs.providers.openai.apis.embeddings.EmbeddingsAPI[source]

Bases: BaseOpenAIAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'embeddings'

OpenAI API Models

class aiobs.providers.openai.apis.models.base.BaseOpenAIRequest(*, model: str | None = None)[source]

Bases: BaseModel

Base class for OpenAI request capture models.

model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseOpenAIRequest[source]

Return a copy safe for logging (override in subclasses).
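
An assumed override pattern for redacted() in a subclass (the field name is hypothetical; model_copy is the pydantic v2 copy API):

    class MyRequest(BaseOpenAIRequest):
        api_secret: str | None = None  # hypothetical sensitive field

        def redacted(self) -> "BaseOpenAIRequest":
            safe = self.model_copy()
            safe.api_secret = "[REDACTED]"  # scrub before logging
            return safe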

class aiobs.providers.openai.apis.models.base.BaseOpenAIResponse(*, id: str | None = None, model: str | None = None, usage: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Base class for OpenAI response capture models.

id: str | None
model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseOpenAIResponse[source]

Return a copy safe for logging (override in subclasses).

usage: Dict[str, Any] | None
class aiobs.providers.openai.apis.models.chat_completions.ChatCompletionsRequest(*, model: str | None = None, messages: List[Message] | None = None, temperature: float | None = None, max_tokens: int | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseOpenAIRequest

max_tokens: int | None
messages: List[Message] | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
temperature: float | None
class aiobs.providers.openai.apis.models.chat_completions.ChatCompletionsResponse(*, id: str | None = None, model: str | None = None, usage: Dict[str, Any] | None = None, text: str | None = None)[source]

Bases: BaseOpenAIResponse

model_config: ClassVar[ConfigDict] = {}

text: str | None
class aiobs.providers.openai.apis.models.chat_completions.Message(*, role: str, content: Any)[source]

Bases: BaseModel

content: Any
model_config: ClassVar[ConfigDict] = {}

role: str
class aiobs.providers.openai.apis.models.embeddings.EmbeddingData(*, index: int, embedding: List[float] = <factory>, object: str = 'embedding')[source]

Bases: BaseModel

Single embedding object in the response.

embedding: List[float]
index: int
model_config: ClassVar[ConfigDict] = {}

object: str
class aiobs.providers.openai.apis.models.embeddings.EmbeddingsRequest(*, model: str | None = None, input: str | List[str] | List[int] | List[List[int]] | None = None, encoding_format: str | None = None, dimensions: int | None = None, user: str | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseOpenAIRequest

Request model for OpenAI embeddings.create API.

dimensions: int | None
encoding_format: str | None
input: str | List[str] | List[int] | List[List[int]] | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
user: str | None
class aiobs.providers.openai.apis.models.embeddings.EmbeddingsResponse(*, id: str | None = None, model: str | None = None, usage: Dict[str, Any] | None = None, object: str | None = None, data: List[EmbeddingData] | None = None, embedding_dimensions: int | None = None)[source]

Bases: BaseOpenAIResponse

Response model for OpenAI embeddings.create API.

data: List[EmbeddingData] | None
embedding_dimensions: int | None
model_config: ClassVar[ConfigDict] = {}

object: str | None

Gemini Provider

class aiobs.providers.gemini.provider.GeminiProvider[source]

Bases: BaseProvider

install(collector: Any) Callable[[], None] | None[source]

Apply instrumentation and return an optional unpatch function.

classmethod is_available() bool[source]

Return True if the provider can be instrumented (deps present).

name: str = 'gemini'

Gemini API Modules

class aiobs.providers.gemini.apis.base_api.BaseGeminiAPIModule[source]

Bases: ABC

Abstract interface for a Gemini API module.

abstractmethod install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'gemini-api'
class aiobs.providers.gemini.apis.generate_content.GenerateContentAPI[source]

Bases: BaseGeminiAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'models.generate_content'
class aiobs.providers.gemini.apis.generate_videos.GenerateVideosAPI[source]

Bases: BaseGeminiAPIModule

install(collector: Any) Callable[[], None] | None[source]

Install instrumentation and return optional unpatch function.

classmethod is_available() bool[source]
name: str = 'models.generate_videos'

Gemini API Models

class aiobs.providers.gemini.apis.models.base.BaseGeminiRequest(*, model: str | None = None)[source]

Bases: BaseModel

Base class for Gemini request capture models.

model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseGeminiRequest[source]

Return a copy safe for logging (override in subclasses).

class aiobs.providers.gemini.apis.models.base.BaseGeminiResponse(*, model: str | None = None, usage: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Base class for Gemini response capture models.

model: str | None
model_config: ClassVar[ConfigDict] = {}

redacted() BaseGeminiResponse[source]

Return a copy safe for logging (override in subclasses).

usage: Dict[str, Any] | None
class aiobs.providers.gemini.apis.models.generate_content.Content(*, role: str | None = None, parts: List[ContentPart] | None = None)[source]

Bases: BaseModel

Content structure for Gemini messages.

model_config: ClassVar[ConfigDict] = {}

parts: List[ContentPart] | None
role: str | None
class aiobs.providers.gemini.apis.models.generate_content.ContentPart(*, text: str | None = None, inline_data: Dict[str, Any] | None = None)[source]

Bases: BaseModel

A part of content (text, image, etc.).

inline_data: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

text: str | None
class aiobs.providers.gemini.apis.models.generate_content.GenerateContentRequest(*, model: str | None = None, contents: str | List[Content] | Any | None = None, system_instruction: Any | None = None, config: Dict[str, Any] | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseGeminiRequest

Request model for generate_content API.

config: Dict[str, Any] | None
contents: str | List[Content] | Any | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
system_instruction: Any | None
class aiobs.providers.gemini.apis.models.generate_content.GenerateContentResponse(*, model: str | None = None, usage: Dict[str, Any] | None = None, text: str | None = None, candidates: List[Dict[str, Any]] | None = None)[source]

Bases: BaseGeminiResponse

Response model for generate_content API.

candidates: List[Dict[str, Any]] | None
model_config: ClassVar[ConfigDict] = {}

text: str | None
class aiobs.providers.gemini.apis.models.generate_videos.GenerateVideosRequest(*, model: str | None = None, prompt: str | None = None, image: Dict[str, Any] | None = None, video: Dict[str, Any] | None = None, config: Dict[str, Any] | None = None, other: Dict[str, Any] = <factory>)[source]

Bases: BaseGeminiRequest

Request model for generate_videos API.

config: Dict[str, Any] | None
image: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

other: Dict[str, Any]
prompt: str | None
video: Dict[str, Any] | None
class aiobs.providers.gemini.apis.models.generate_videos.GenerateVideosResponse(*, model: str | None = None, usage: Dict[str, Any] | None = None, operation_name: str | None = None, done: bool | None = None, generated_videos: List[Dict[str, Any]] | None = None)[source]

Bases: BaseGeminiResponse

Response model for generate_videos API.

done: bool | None
generated_videos: List[Dict[str, Any]] | None
model_config: ClassVar[ConfigDict] = {}

operation_name: str | None
class aiobs.providers.gemini.apis.models.generate_videos.GeneratedVideo(*, video: Dict[str, Any] | None = None)[source]

Bases: BaseModel

A generated video result.

model_config: ClassVar[ConfigDict] = {}

video: Dict[str, Any] | None
class aiobs.providers.gemini.apis.models.generate_videos.VideoGenerationConfig(*, aspect_ratio: str | None = None, number_of_videos: int | None = None, resolution: str | None = None, duration_seconds: int | None = None, negative_prompt: str | None = None, generate_audio: bool | None = None, enhance_prompt: bool | None = None, person_generation: str | None = None, seed: int | None = None, output_gcs_uri: str | None = None)[source]

Bases: BaseModel

Configuration for video generation.

aspect_ratio: str | None
duration_seconds: int | None
enhance_prompt: bool | None
generate_audio: bool | None
model_config: ClassVar[ConfigDict] = {}

negative_prompt: str | None
number_of_videos: int | None
output_gcs_uri: str | None
person_generation: str | None
resolution: str | None
seed: int | None
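
Illustrative construction of this config model (all field values are examples only):

    from aiobs.providers.gemini.apis.models.generate_videos import VideoGenerationConfig

    cfg = VideoGenerationConfig(
        aspect_ratio="16:9",
        number_of_videos=1,
        duration_seconds=8,
        generate_audio=False,
    )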

Classifiers

Base Classifier

Base classifier interface for aiobs.

class aiobs.classifier.base.BaseClassifier(config: ClassificationConfig | None = None)[source]

Bases: ABC

Abstract base class for response classifiers.

Classifiers evaluate model outputs against user inputs and system prompts to determine if the response quality is good, bad, or uncertain.

Subclasses must implement:
  • classify(): Synchronous classification

  • classify_async(): Asynchronous classification

  • classify_batch(): Batch classification for multiple inputs

Example usage:

    from aiobs.classifier import OpenAIClassifier

    classifier = OpenAIClassifier(api_key="...")
    result = classifier.classify(
        system_prompt="You are a helpful assistant.",
        user_input="What is 2+2?",
        model_output="2+2 equals 4.",
    )
    print(result.verdict)  # ClassificationVerdict.GOOD

abstractmethod classify(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response synchronously.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments for the classifier.

Returns:

ClassificationResult with verdict, confidence, and reasoning.

abstractmethod async classify_async(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response asynchronously.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments for the classifier.

Returns:

ClassificationResult with verdict, confidence, and reasoning.

abstractmethod classify_batch(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses in batch.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.

abstractmethod async classify_batch_async(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses asynchronously in batch.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.
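
A batch sketch, given a concrete classifier instance such as the OpenAIClassifier documented below (inputs are illustrative):

    from aiobs.classifier.models.classification import ClassificationInput

    inputs = [
        ClassificationInput(user_input="What is 2+2?", model_output="4"),
        ClassificationInput(user_input="Capital of France?", model_output="Berlin"),
    ]
    results = classifier.classify_batch(inputs)
    for r in results:
        print(r.verdict, r.confidence)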

classmethod is_available() bool[source]

Check if this classifier can be used (dependencies present).

Returns:

True if all required dependencies are available.

name: str = 'base'

Classification Models

Pydantic models for classification inputs and outputs.

class aiobs.classifier.models.classification.ClassificationConfig(*, model: str = 'gpt-4o-mini', temperature: Annotated[float, Ge(ge=0.0), Le(le=2.0)] = 0.0, max_tokens: int = 1024, classification_prompt: str | None = None, confidence_threshold: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.7)[source]

Bases: BaseModel

Configuration for classifier behavior.

classification_prompt: str | None
confidence_threshold: float
max_tokens: int
model: str
model_config: ClassVar[ConfigDict] = {}

temperature: float
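
For example, passing a config to the OpenAI classifier documented below (values are illustrative):

    from aiobs.classifier import OpenAIClassifier
    from aiobs.classifier.models.classification import ClassificationConfig

    config = ClassificationConfig(model="gpt-4o-mini", temperature=0.0, confidence_threshold=0.8)
    classifier = OpenAIClassifier(api_key="sk-...", config=config)
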
class aiobs.classifier.models.classification.ClassificationInput(*, system_prompt: str | None = None, user_input: str, model_output: str, context: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Input model for classification.

Contains the system prompt, user input, and model output that will be evaluated by the classifier.

context: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

model_output: str
system_prompt: str | None
user_input: str
class aiobs.classifier.models.classification.ClassificationResult(*, verdict: ClassificationVerdict, confidence: Annotated[float, Ge(ge=0.0), Le(le=1.0)], reasoning: str | None = None, categories: List[str] | None = None, raw_response: Any | None = None, metadata: Dict[str, Any] | None = None)[source]

Bases: BaseModel

Result model for classification.

Contains the verdict (good/bad/uncertain), confidence score, reasoning, and any additional metadata.

categories: List[str] | None
confidence: float
metadata: Dict[str, Any] | None
model_config: ClassVar[ConfigDict] = {}

raw_response: Any | None
reasoning: str | None
verdict: ClassificationVerdict
class aiobs.classifier.models.classification.ClassificationVerdict(value)[source]

Bases: str, Enum

Verdict for classification result.

BAD = 'bad'
GOOD = 'good'
UNCERTAIN = 'uncertain'

OpenAI Classifier

OpenAI-based classifier implementation.

class aiobs.classifier.openai.classifier.OpenAIClassifier(api_key: str | None = None, config: ClassificationConfig | None = None, client: Any | None = None, async_client: Any | None = None)[source]

Bases: BaseClassifier

Classifier using OpenAI’s models to evaluate response quality.

Uses OpenAI’s chat completion API to analyze model outputs and determine if they are good, bad, or uncertain.

Example

    from aiobs.classifier import OpenAIClassifier

    classifier = OpenAIClassifier(api_key="sk-...")
    result = classifier.classify(
        user_input="What is the capital of France?",
        model_output="The capital of France is Paris.",
        system_prompt="You are a helpful geography assistant.",
    )

    if result.verdict == ClassificationVerdict.GOOD:
        print("Response is good!")
    else:
        print(f"Issues: {result.categories}")

classify(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response synchronously using OpenAI.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments (passed to context).

Returns:

ClassificationResult with verdict, confidence, and reasoning.

async classify_async(user_input: str, model_output: str, system_prompt: str | None = None, **kwargs: Any) ClassificationResult[source]

Classify a model response asynchronously using OpenAI.

Parameters:
  • user_input – The user’s input/query to the model.

  • model_output – The model’s generated response.

  • system_prompt – Optional system prompt provided to the model.

  • **kwargs – Additional arguments (passed to context).

Returns:

ClassificationResult with verdict, confidence, and reasoning.

classify_batch(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses in batch (sequential).

Note: This runs classifications sequentially. For true parallel execution, use classify_batch_async.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.

async classify_batch_async(inputs: List[ClassificationInput], **kwargs: Any) List[ClassificationResult][source]

Classify multiple model responses asynchronously in parallel.

Uses asyncio.gather for concurrent classification requests.

Parameters:
  • inputs – List of ClassificationInput objects to classify.

  • **kwargs – Additional arguments for the classifier.

Returns:

List of ClassificationResult objects, one per input.

classmethod is_available() bool[source]

Check if OpenAI library is available.

name: str = 'openai'

LLM Abstraction

The LLM module provides a unified interface for interacting with different LLM providers. It is used internally by LLM-based evaluators like HallucinationDetectionEval.

LLM Factory

LLM factory for auto-detecting and creating LLM adapters.

class aiobs.llm.factory.LLM[source]

Bases: object

Factory class for creating LLM adapters.

Provides a unified interface for different LLM providers through automatic client detection or explicit provider specification.

Example

    from openai import OpenAI
    from aiobs.llm import LLM

    # Auto-detect from client
    client = OpenAI()
    llm = LLM.from_client(client, model="gpt-4o")

    response = llm.complete("What is 2+2?")
    print(response.content)  # "4"

    # Async usage (within an async context)
    response = await llm.complete_async("What is 2+2?")

static anthropic(client: Any, model: str = 'claude-3-sonnet-20240229', temperature: float = 0.0, max_tokens: int | None = 1024) AnthropicLLM[source]

Create an Anthropic LLM adapter explicitly.

Parameters:
  • client – Anthropic client instance.

  • model – Model name (default: "claude-3-sonnet-20240229").

  • temperature – Sampling temperature.

  • max_tokens – Maximum tokens to generate (default: 1024).

Returns:

AnthropicLLM adapter instance.

static from_client(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None) BaseLLM[source]

Create an LLM adapter by auto-detecting the client type.

Parameters:
  • client – The LLM provider’s client instance.

  • model – Model name/identifier.

  • temperature – Sampling temperature (0.0 = deterministic).

  • max_tokens – Maximum tokens to generate.

Returns:

Appropriate LLM adapter instance.

Raises:

ValueError – If client type is not recognized.

Example

    from openai import OpenAI
    llm = LLM.from_client(OpenAI(), model="gpt-4o")

    from google import genai
    llm = LLM.from_client(genai.Client(), model="gemini-2.0-flash")

    from anthropic import Anthropic
    llm = LLM.from_client(Anthropic(), model="claude-3-sonnet-20240229")

static gemini(client: Any, model: str = 'gemini-2.0-flash', temperature: float = 0.0, max_tokens: int | None = None) GeminiLLM[source]

Create a Gemini LLM adapter explicitly.

Parameters:
  • client – Google GenAI client instance.

  • model – Model name (default: "gemini-2.0-flash").

  • temperature – Sampling temperature.

  • max_tokens – Maximum tokens to generate.

Returns:

GeminiLLM adapter instance.

static openai(client: Any, model: str = 'gpt-4o-mini', temperature: float = 0.0, max_tokens: int | None = None) OpenAILLM[source]

Create an OpenAI LLM adapter explicitly.

Parameters:
  • client – OpenAI client instance.

  • model – Model name (default: "gpt-4o-mini").

  • temperature – Sampling temperature.

  • max_tokens – Maximum tokens to generate.

Returns:

OpenAILLM adapter instance.

Base LLM

Base LLM interface for aiobs.

class aiobs.llm.base.BaseLLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None)[source]

Bases: ABC

Abstract base class for LLM adapters.

Provides a unified interface for different LLM providers.

abstractmethod complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.

abstractmethod async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.

async complete_messages_async(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from messages asynchronously.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional provider-specific arguments.

Returns:

LLMResponse with generated content.
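
A sketch of the message-based interface (assuming an llm adapter created via LLM.from_client; LLMMessage is documented below):

    from aiobs.llm.base import LLMMessage

    messages = [
        LLMMessage(role="system", content="You are terse."),
        LLMMessage(role="user", content="Define observability in one line."),
    ]
    response = llm.complete_messages(messages)
    print(response.content)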

provider: str = 'base'
class aiobs.llm.base.LLMMessage(*, role: str, content: str)[source]

Bases: BaseModel

A message in a conversation.

content: str
model_config: ClassVar[ConfigDict] = {}

role: str
class aiobs.llm.base.LLMResponse(*, content: str, model: str, usage: Dict[str, int] | None = None, raw_response: Any | None = None)[source]

Bases: BaseModel

Response from an LLM completion.

content: str
model: str
model_config: ClassVar[ConfigDict] = {}

raw_response: Any | None
usage: Dict[str, int] | None

OpenAI LLM

OpenAI LLM adapter.

class aiobs.llm.openai.OpenAILLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None)[source]

Bases: BaseLLM

LLM adapter for OpenAI and OpenAI-compatible APIs.

Works with:
  • OpenAI

  • Azure OpenAI

  • Groq

  • Together AI

  • Any OpenAI-compatible API

Example

    from openai import OpenAI
    from aiobs.llm import LLM

    client = OpenAI()
    llm = LLM.from_client(client, model="gpt-4o")
    response = llm.complete("Hello!")

complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

classmethod is_compatible(client: Any) bool[source]

Check if client is OpenAI-compatible.

Parameters:

client – Client instance to check.

Returns:

True if client has OpenAI-compatible interface.

provider: str = 'openai'

Gemini LLM

Google Gemini LLM adapter.

class aiobs.llm.gemini.GeminiLLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = None)[source]

Bases: BaseLLM

LLM adapter for Google Gemini API.

Example

    from google import genai
    from aiobs.llm import LLM

    client = genai.Client()
    llm = LLM.from_client(client, model="gemini-2.0-flash")
    response = llm.complete("Hello!")

complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

classmethod is_compatible(client: Any) bool[source]

Check if client is Gemini-compatible.

Parameters:

client – Client instance to check.

Returns:

True if client has Gemini-compatible interface.

provider: str = 'gemini'

Anthropic LLM

Anthropic Claude LLM adapter.

class aiobs.llm.anthropic.AnthropicLLM(client: Any, model: str, temperature: float = 0.0, max_tokens: int | None = 1024)[source]

Bases: BaseLLM

LLM adapter for Anthropic Claude API.

Example

    from anthropic import Anthropic
    from aiobs.llm import LLM

    client = Anthropic()
    llm = LLM.from_client(client, model="claude-3-sonnet-20240229")
    response = llm.complete("Hello!")

complete(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion synchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

async complete_async(prompt: str, system_prompt: str | None = None, **kwargs: Any) LLMResponse[source]

Generate a completion asynchronously.

Parameters:
  • prompt – The user prompt.

  • system_prompt – Optional system prompt.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

complete_messages(messages: List[LLMMessage], **kwargs: Any) LLMResponse[source]

Generate a completion from a list of messages.

Parameters:
  • messages – List of conversation messages.

  • **kwargs – Additional arguments passed to the API.

Returns:

LLMResponse with generated content.

classmethod is_compatible(client: Any) bool[source]

Check if client is Anthropic-compatible.

Parameters:

client – Client instance to check.

Returns:

True if client has Anthropic-compatible interface.

provider: str = 'anthropic'