Usage

Session Labels

Add labels to sessions for filtering and categorization in enterprise dashboards:

from aiobs import observer

observer.observe(
    session_name="my-session",
    labels={
        "environment": "production",
        "team": "ml-platform",
        "project": "recommendation-engine",
        "version": "v2.3.1",
    }
)

# ... your LLM calls ...

observer.end()
observer.flush()

Labels are key-value string pairs that enable:

  • Dashboard filtering: Filter sessions by environment, team, project, etc.

  • Cost attribution: Track usage by team or project

  • Comparison: Compare metrics across environments (prod vs staging)

Label Constraints

  • Key format: lowercase alphanumeric with underscores, matching ^[a-z][a-z0-9_]{0,62}$

  • Value format: UTF-8 string, max 256 characters

  • Max labels: 64 per session

  • Reserved prefix: aiobs_ is reserved for system labels
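
A small client-side pre-check can catch violations before a session starts. The validate_labels helper below is a hypothetical sketch mirroring the documented rules, not part of the SDK:

import re

_KEY_RE = re.compile(r"^[a-z][a-z0-9_]{0,62}$")

def validate_labels(labels: dict) -> None:
    """Hypothetical pre-check mirroring the documented constraints."""
    if len(labels) > 64:
        raise ValueError("at most 64 labels per session")
    for key, value in labels.items():
        if key.startswith("aiobs_"):
            raise ValueError(f"the aiobs_ prefix is reserved: {key!r}")
        if not _KEY_RE.match(key):
            raise ValueError(f"invalid label key: {key!r}")
        if len(value) > 256:
            raise ValueError(f"value for {key!r} exceeds 256 characters")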

Dynamic Label Updates

Update labels during an active session:

from aiobs import observer

observer.observe(labels={"environment": "staging"})

# Add a single label
observer.add_label("user_tier", "enterprise")

# Update multiple labels (merge with existing)
observer.set_labels({"experiment_id": "exp-42", "feature_flag": "new_model"})

# Replace all user labels (system labels preserved)
observer.set_labels({"environment": "production"}, merge=False)

# Remove a label
observer.remove_label("experiment_id")

# Get current labels
labels = observer.get_labels()
print(labels)  # {'environment': 'production', 'aiobs_sdk_version': '0.1.0', ...}

observer.end()
observer.flush()

Environment Variable Labels

Labels can be auto-populated from environment variables:

# Set in shell or .env
export AIOBS_LABEL_ENVIRONMENT=production
export AIOBS_LABEL_TEAM=ml-platform
export AIOBS_LABEL_SERVICE=api-gateway

These are automatically merged with explicit labels (explicit takes precedence):

# AIOBS_LABEL_ENVIRONMENT=staging is set in env

observer.observe(labels={"environment": "production"})
labels = observer.get_labels()
print(labels["environment"])  # "production" (explicit overrides env)

System Labels

The following labels are automatically added to every session:

  • aiobs_sdk_version: SDK version

  • aiobs_python_version: Python runtime version

  • aiobs_hostname: Machine hostname

  • aiobs_os: Operating system
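
Because system labels always carry the reserved aiobs_ prefix, they are easy to separate from user labels. A minimal sketch using the documented get_labels():

from aiobs import observer

observer.observe(labels={"environment": "production"})

all_labels = observer.get_labels()
# System labels carry the reserved aiobs_ prefix
system_labels = {k: v for k, v in all_labels.items() if k.startswith("aiobs_")}
user_labels = {k: v for k, v in all_labels.items() if not k.startswith("aiobs_")}

observer.end()
observer.flush()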

Simple Chat Completions (OpenAI)

The repository includes a simple example at example/simple-chat-completion/chat.py.

Key lines:

from aiobs import observer

observer.observe()
# Call OpenAI Chat Completions via openai>=1
observer.end()
observer.flush()
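
Filled in, a minimal runnable version might look like this (the model name and prompt are illustrative, and OPENAI_API_KEY is assumed to be set):

from aiobs import observer
from openai import OpenAI

observer.observe()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)

observer.end()
observer.flush()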

OpenAI Embeddings

aiobs automatically instruments OpenAI’s embeddings.create API:

from aiobs import observer
from openai import OpenAI

observer.observe()

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)

observer.end()
observer.flush()

The captured data includes:

  • Request: model, input text(s), encoding_format, dimensions

  • Response: embedding vectors, dimensions, usage statistics

  • Timing: start/end timestamps, duration_ms

For batch embeddings with multiple inputs:

from aiobs import observer
from openai import OpenAI

observer.observe()

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Hello world", "Goodbye world", "How are you?"]
)

observer.end()
observer.flush()

Gemini Generate Content

The repository includes a Gemini example at example/gemini/main.py.

Key lines:

from aiobs import observer
from google import genai

observer.observe()

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain quantum computing"
)

observer.end()
observer.flush()

Gemini Video Generation (Veo)

aiobs automatically instruments Google’s Veo video generation API (models.generate_videos):

import time

from aiobs import observer
from google import genai

observer.observe()

client = genai.Client()
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A cinematic shot of waves crashing on a beach at sunset",
)

# Poll until video is ready
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

observer.end()
observer.flush()

The captured data includes:

  • Request: model, prompt, image (for image-to-video), video (for video extension), config

  • Response: operation_name, done status, generated_videos metadata

  • Timing: start/end timestamps, duration_ms

  • Config options: aspect_ratio, resolution, number_of_videos, generate_audio, etc.

For image-to-video generation:

from aiobs import observer
from google import genai

observer.observe()

client = genai.Client()
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="Animate this landscape",
    image=image_object,  # Generated or loaded image
    config={
        "aspect_ratio": "16:9",
        "resolution": "720p",
    }
)

observer.end()
observer.flush()

Function Tracing with @observe

Trace any function (sync or async) with the @observe decorator:

from aiobs import observer, observe

@observe
def research(query: str) -> list:
    # your logic here
    return results

@observe(name="custom_name")
async def fetch_data(url: str) -> dict:
    # async logic here
    return data

observer.observe(session_name="my-pipeline")
research("What is an API?")
observer.end()
observer.flush()

Decorator Options

  • name (default: the function name): custom display name for the traced function

  • capture_args (default: True): whether to capture function arguments

  • capture_result (default: True): whether to capture the return value

  • enh_prompt (default: False): mark the trace for enhanced prompt analysis

  • auto_enhance_after (default: None): number of traces after which the auto prompt enhancer runs

Examples:

# Don't capture sensitive arguments
@observe(capture_args=False)
def login(username: str, password: str):
    ...

# Don't capture large return values
@observe(capture_result=False)
def get_large_dataset():
    ...

Enhanced Prompt Tracing

Mark functions for automatic prompt enhancement analysis:

from aiobs import observer, observe

@observe(enh_prompt=True, auto_enhance_after=10)
def summarize(text: str) -> str:
    """After 10 traces, auto prompt enhancer will run."""
    response = client.chat.completions.create(...)
    return response.choices[0].message.content

@observe(enh_prompt=True, auto_enhance_after=5)
def analyze(data: dict) -> dict:
    """Different threshold for this function."""
    return process(data)

When enh_prompt=True, the decorator generates a unique enh_prompt_id for each function call. The JSON output includes:

  • enh_prompt_id: Unique identifier for each enhanced prompt trace

  • auto_enhance_after: Configured threshold for auto-enhancement

  • enh_prompt_traces: List of all enh_prompt_id values in the export

This allows collecting traces across multiple JSON files and rendering them in a UI for analysis.
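
For example, a collection script might gather the IDs across several exports. The sketch below assumes enh_prompt_traces sits at the top level of each JSON file, which may vary by SDK version:

import glob
import json

trace_ids = []
for path in glob.glob("exports/*.json"):
    with open(path) as f:
        export = json.load(f)
    # enh_prompt_traces lists every enh_prompt_id in the export
    trace_ids.extend(export.get("enh_prompt_traces", []))

print(f"collected {len(trace_ids)} enhanced-prompt trace IDs")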

Pipeline Example

Chained tasks with multiple API calls:

python -m example.pipeline.main "Your prompt here"

This runs a three-step pipeline (research → summarize → critique) and writes a single JSON file with all events.

Output

By default, observer.flush() writes ./llm_observability.json. Override with the LLM_OBS_OUT environment variable:

LLM_OBS_OUT=/path/to/output.json python your_script.py
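
The variable can also be set from Python, as long as that happens before the SDK reads it; whether the read occurs at import time or at flush time is an assumption to verify for your version:

import os

# Must run before aiobs reads LLM_OBS_OUT (assumed: before flush)
os.environ["LLM_OBS_OUT"] = "/tmp/run-output.json"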

What Gets Captured

For each session:

  • Session ID: Unique identifier

  • Session name: Optional custom name

  • Labels: Key-value pairs for filtering (user-defined + system labels)

  • Metadata: Process ID, working directory

  • Timing: start/end timestamps

For each LLM API call:

  • Provider: openai or gemini

  • API: e.g., chat.completions.create, embeddings.create, models.generate_content, or models.generate_videos

  • Request: model, messages/contents/input/prompt, core parameters

  • Response: text (for completions), embeddings (for embeddings API), operation info (for video generation), model, token usage (when available)

  • Timing: start/end timestamps, duration_ms

  • Errors: exception name and message if the call fails

  • Callsite: file path, line number, and function name where the API was called

For decorated functions (@observe):

  • Function name and module

  • Input arguments (args/kwargs)

  • Return value

  • Timing: start/end timestamps, duration_ms

  • Errors: exception name and message if the call fails

  • Callsite: file path and line number where the function was defined

  • Enhanced prompt metadata (enh_prompt_id, auto_enhance_after) when enabled
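
As a rough illustration of consuming the export, the sketch below totals call durations per provider. The top-level "events" key and the exact field names are assumptions; adjust them to the file your SDK version actually writes:

import json
from collections import defaultdict

with open("llm_observability.json") as f:
    export = json.load(f)

# Assumes a top-level "events" list whose entries carry the documented
# "provider" and "duration_ms" fields; adjust to the real export layout.
totals = defaultdict(float)
for event in export.get("events", []):
    totals[event.get("provider", "unknown")] += event.get("duration_ms", 0.0)

for provider, total_ms in sorted(totals.items()):
    print(f"{provider}: {total_ms:.0f} ms")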