feat(instrumentation): Add litellm package instrumentation by galzilber · Pull Request #4322 · traceloop/openllmetry

galzilber · 2026-06-28T10:34:16Z

I have added tests that cover my changes.
If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
(If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

New Features
- Added LiteLLM tracing across chat/completions (sync/async), streaming, and embeddings with spans, metrics, and structured event logging.
- Added tool-calling serialization for inputs/outputs and consistent provider resolution/finish-reason mapping.
- Added Traceloop SDK support for the LiteLLM instrument and new sync/async tool-calling sample apps.
Documentation
- Added README with install/setup and content-redaction/privacy guidance.
Tests
- Added/expanded coverage for provider prefix resolution, streaming completion lifecycle, tool definitions/tool calls, metrics emission, content tracing gating, and custom-provider behavior.

CLAassistant · 2026-06-28T10:34:23Z

All committers have signed the CLA.

coderabbitai · 2026-06-28T10:34:31Z

📝 Walkthrough

Walkthrough

Adds a new LiteLLM OpenTelemetry instrumentation package with span, event, and metric handling, plus SDK integration, tests, and sample tool-calling apps.

Changes

LiteLLM OpenTelemetry Instrumentation Package

Layer / File(s)	Summary
Package scaffold and config `packages/opentelemetry-instrumentation-litellm/README.md`, `.../config.py`, `.../utils.py`, `.../event_models.py`, `.../version.py`, `packages/opentelemetry-instrumentation-litellm/project.json`, `.../pyproject.toml`, `.../pytest.ini`	Adds package metadata, defaults, utility toggles, event payload models, version, build/test config, dependency metadata, and README usage/privacy text.
Event emission helpers `packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_emitter.py`	Emits LiteLLM message and choice log records with role mapping and prompt-suppression redaction.
Span attributes, serialization, and metrics `packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`	Resolves providers, sets request/response attributes, serializes messages and tool calls, normalizes finish reasons, emits events or attributes, and builds metric instruments.
Streaming wrappers and LiteLLMInstrumentor `packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`	Accumulates streaming chunks, finalizes spans once, and registers LiteLLM wrappers through `LiteLLMInstrumentor`.
SDK wiring for LiteLLM `packages/traceloop-sdk/traceloop/sdk/instruments.py`, `.../tracing/tracing.py`, `packages/traceloop-sdk/pyproject.toml`	Adds LiteLLM to SDK instrument selection, initializes the instrumentor, unfreezes span attributes during postprocessing, and wires the dependency into the SDK package.
Test coverage `packages/opentelemetry-instrumentation-litellm/tests/*`	Adds fixture setup and tests for completions, embeddings, streaming, events, provider resolution, nesting, suppression, and metrics.
Sample applications `packages/sample-app/sample_app/litellm_completion.py`, `.../litellm_async_streaming.py`	Adds synchronous and async LiteLLM agent examples with tool calling and Traceloop workflows.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

traceloop/openllmetry#4133: Threads the LiteLLM use_attributes/use_legacy_attributes flag through SDK initialization, which directly connects to the new LiteLLM instrumentor behavior here.

Suggested reviewers

doronkopit5
max-deygin-traceloop
netanel-tl

Poem

🐇 I hopped through spans from dusk till dawn,
Litellm traces now prance along.
Tools and streams in tidy rows,
Metrics bloom where the data flows.
A whiskered wink, a happy beat—
This tracer patch is neat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.46% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding LiteLLM package instrumentation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch gz/litellm_insturments

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.23.0)

packages/opentelemetry-instrumentation-litellm/tests/test_completion.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.12][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.14][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.26][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

2 others

_{Comment @coderabbitai help to get the list of available commands.}

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

max-deygin-traceloop

Semconv + Finish Reason Compliance Review

Overall this is a well-structured instrumentation. The parts-based message schema, provider normalization, streaming accumulation, and nesting suppression are all solid. Below are findings organized by severity.

P1 -- Embedding input uses deprecated indexed `gen_ai.prompt.{i}.*` pattern

File: __init__.py, _set_prompts() (embedding branch)

for index, prompt in enumerate(embedding_input or []):
    _set_span_attribute(
        span, f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.role", "user"
    )
    _set_span_attribute(
        span, f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.content", prompt
    )

GEN_AI_PROMPT is explicitly deprecated upstream (Deprecated: Removed, no replacement at this time). The OpenAI instrumentation in this same repo already uses gen_ai.input.messages JSON for embeddings:

# opentelemetry-instrumentation-openai/shared/embeddings_wrappers.py
messages = [
    {"role": "user", "parts": [{"type": "text", "content": p}]}
    for p in prompts
]
_set_span_attribute(span, GenAIAttributes.GEN_AI_INPUT_MESSAGES, json.dumps(messages))

Fix: Replace the indexed pattern with gen_ai.input.messages JSON, matching the OpenAI embeddings instrumentation. Update test_embeddings.py to assert the JSON structure.

P2 -- Events path uses `"unknown"` fallback instead of `""`

File: __init__.py, _emit_choice_events()

finish_reason=_map_finish_reason(choice.get("finish_reason")) or "unknown",

When _map_finish_reason returns "" (for None input), the or "unknown" kicks in. This diverges from the repo-wide convention of "" for unavailable finish reasons (used by OpenAI, Anthropic, Google GenAI, and OpenAI Agents instrumentors in this repo). The ChoiceEvent dataclass also defaults to "unknown".

The attribute path (_set_output_messages) correctly uses "" -- the event path should match.

Fix: Change or "unknown" to just pass through the "", and update ChoiceEvent.finish_reason default from "unknown" to "".

P2 -- `gen_ai.tool.definitions` not captured

LiteLLM's completion() accepts a tools kwarg. The instrumentation doesn't capture tool definitions in gen_ai.tool.definitions (opt-in, gated by should_send_prompts()). The OpenAI instrumentation sets this. Not a blocker but worth adding for parity -- tool-calling is a key LiteLLM use-case.

P2 -- Streaming tests don't verify finish_reason

test_streaming_completion and test_astreaming_completion verify output message content but don't assert:

output[0]["finish_reason"] (per-message)
attrs[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS] (top-level span attr)

The non-streaming test_completion does verify both. Add the same assertions to streaming tests to catch any streaming-specific finish_reason accumulation regressions.

P3 -- Minor: key order inconsistency in `_map_content_parts`

# str branch -- content first
return [{"content": content, "type": "text"}]

# dict branch in _map_content_block -- type first
return {"type": "text", "content": block.get("text", "")}

Semantically identical but inconsistent. Prefer {"type": "text", "content": ...} everywhere (type-first, matching the OTel schema examples).

Finish Reason Checklist

Check	Status
`_map_finish_reason(None)` returns `""` (per-message fallback)	Pass
`_map_finish_reason("tool_calls")` returns `"tool_call"` (plural to singular)	Pass
`_map_finish_reason("function_call")` returns `"tool_call"` (legacy)	Pass
Per-message `finish_reason` always present in output JSON (required by schema)	Pass
Per-message `finish_reason` is always a string, never null	Pass
Top-level `gen_ai.response.finish_reasons` omitted when no meaningful reasons	Pass
Top-level attr NOT gated by `should_send_prompts()`	Pass
Top-level attr deduplicates while preserving order	Pass
Streaming accumulates finish_reason correctly	Pass
All code paths (sync, async, streaming sync, streaming async, error) covered	Pass

Semconv Checklist

Check	Status
`gen_ai.operation.name` set at span creation	Pass (`chat` / `embeddings`)
`gen_ai.provider.name` set (not deprecated `gen_ai.system`)	Pass
Provider aliases normalized to OTel well-known values	Pass
`gen_ai.input.messages` uses parts-based JSON (chat path)	Pass
`gen_ai.output.messages` uses parts-based JSON with `finish_reason`	Pass
Text parts use `{"type": "text", "content": ...}` (not `"text"` key)	Pass
Tool calls use `{"type": "tool_call", ...}` with parsed `arguments`	Pass
Tool responses use `{"type": "tool_call_response", ...}`	Pass
Image content mapped to `BlobPart`/`UriPart`	Pass
`gen_ai.input.messages` for embeddings (not deprecated indexed)	Fail (P1 above)
No indexed `gen_ai.prompt.{i}.` / `gen_ai.completion.{i}.` emitted alongside JSON	N/A (only embedding path uses indexed)
semconv-ai version range `>=0.5.1,<0.6.0` compatible with other packages	Pass

coderabbitai

Actionable comments posted: 13

🧹 Nitpick comments (1)

packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py (1)
826-827: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Preserve the LiteLLM init traceback.

Line 827 only logs str(e), which hides whether the failure came from import, constructor wiring, or patching. logging.exception(...) or exc_info=True would make broken instrumentation much easier to debug.
Proposed fix
-    except Exception as e:
-        logging.error(f"Error initializing LiteLLM instrumentor: {e}")
+    except Exception:
+        logging.exception("Error initializing LiteLLM instrumentor")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py` around lines 826 -
827, The LiteLLM instrumentor initialization error handling in tracing.py only
logs the exception message, which drops the traceback and makes the failure
source unclear. Update the exception handler around the LiteLLM initialization
path in the tracing module to log with traceback information using the existing
logging call site, so failures during import, constructor setup, or patching are
all visible when debugging.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`:
- Around line 831-834: The exception counter in the Litellm instrumentation is
mislabeled with a time unit even though it records occurrences. Update the
metric declaration in the counter setup near the exception_counter creation to
use a count-like unit instead of unit="time", keeping the existing
Meters.LLM_COMPLETIONS_EXCEPTIONS counter and its description intact.
- Around line 748-754: The streaming error path in _finish_error currently
ignores the provider already resolved by _accumulate_chunk and re-derives it
from kwargs, which can lose the real provider for exception telemetry. Update
_finish_error in the streaming helper to preserve and pass through the cached
resolved provider state from the chunk accumulator when calling
_handle_exception, so failures after the first chunk report under the correct
provider instead of falling back to generic litellm.
- Around line 645-657: The duration timing in the LiteLLM instrumentation
currently uses a wall-clock source, which can produce skewed or negative values
when the system clock changes. Update the timing in the wrapped call path and
the streaming/direct histogram paths to use a monotonic clock instead of
time.time(), keeping the existing start/end measurement flow in __init__.py and
the related span/metrics handling intact.
- Around line 882-883: The _uninstrument() method in the LiteLLM instrumentor
still assumes litellm is installed, which can raise ModuleNotFoundError during
cleanup even though _instrument() treats it as optional. Update _uninstrument()
to guard the import of litellm and return early when the package is missing,
keeping the cleanup path safe in environments without LiteLLM.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py`:
- Around line 5-7: The _FunctionToolCall TypedDict does not match the runtime
tool-call payload shape used by ToolCall and downstream consumers. Update the
typed contract in event_models.py so the nested function data is represented
with a name field matching tool_call["function"]["name"], and keep arguments as
the raw emitted value instead of a dict-only shape. Make sure any related
ToolCall typing or helpers in this module stay consistent with the actual
emitter schema.
- Around line 18-23: `CompletionMessage.role` is currently annotated like it has
a default, but `TypedDict` does not enforce or supply runtime defaults. Update
`CompletionMessage` in `event_models.py` to make `role` a required field, and
then verify the Litellm emitter/consumer code that reads `event.message["role"]`
is aligned with that contract. If any call sites can still omit the field,
adjust them to set `role` explicitly or switch those reads to a safe
`get("role")` access.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/utils.py`:
- Around line 40-41: The dont_throw path in the utils helper should remain
non-throwing even if Config.exception_logger itself fails. Update the exception
handling around the Config.exception_logger call so any error from that callback
is caught in its own try/except and only logged, while preserving the original
swallowed behavior in the surrounding wrapper. Use the dont_throw logic and
Config.exception_logger reference to locate the fix.

In `@packages/opentelemetry-instrumentation-litellm/README.md`:
- Around line 3-5: Add alt text to the PyPI badge image in the README by
updating the inline badge markup so the img element includes a descriptive alt
attribute; locate the badge link near the top of the README and keep the
existing badge URL while adding accessible text that identifies the
package/version badge for screen readers.

In `@packages/opentelemetry-instrumentation-litellm/tests/conftest.py`:
- Around line 56-67: The fixture teardown in instrument_legacy is not protected
against failures during LiteLLMInstrumentor.instrument, which can leave partial
patches behind and affect later tests. Update instrument_legacy to follow the
same try/finally pattern used by instrument_with_content: create the
LiteLLMInstrumentor, run instrument(...) in the protected block, yield the
instrumentor, and ensure instrumentor.uninstrument() always runs in the finally
path even if instrumentation raises.

In `@packages/opentelemetry-instrumentation-litellm/tests/test_nesting.py`:
- Around line 66-87: The nesting test only verifies the top-level LiteLLM span
because mock_response="hello" may bypass the OpenAI SDK entirely. Update
test_both_openai_and_litellm_instrumented to drive a stubbed provider path that
actually reaches OpenAIInstrumentor, and then assert the downstream OpenAI
boundary behavior as well as the litellm.chat span count so the test proves both
instrumentors interact correctly without double-counting.

In `@packages/sample-app/sample_app/litellm_async_streaming.py`:
- Around line 98-102: The get_current_weather tool in litellm_async_streaming.py
currently returns the fake DB temperature unchanged even when unit is
"fahrenheit". Update get_current_weather to detect Fahrenheit requests and
convert the Celsius value before building the json.dumps response, while keeping
the existing behavior for Celsius and preserving the same function signature and
output shape.

In `@packages/sample-app/sample_app/litellm_completion.py`:
- Around line 117-121: The issue is that issue_refund always returns a
successful refund for any model-supplied order_id and amount. Update
issue_refund to validate the order against known refund-eligible records first,
including that the order exists, was delivered/paid, and the requested amount is
within the refundable limit; only then return the refunded response. If the
order is unknown or ineligible, have issue_refund return a non-success result
instead of confirming the refund.

In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py`:
- Around line 151-158: The span post-processing in tracing.py can prevent export
when span_postprocess_callback() raises because original_on_end() is skipped.
Update the on_end wrapper to catch and log callback failures in the
span_postprocess_callback path, always restore attributes._immutable in the
finally block, and then unconditionally call original_on_end(span) so the real
processor still receives every span.

---

Nitpick comments:
In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py`:
- Around line 826-827: The LiteLLM instrumentor initialization error handling in
tracing.py only logs the exception message, which drops the traceback and makes
the failure source unclear. Update the exception handler around the LiteLLM
initialization path in the tracing module to log with traceback information
using the existing logging call site, so failures during import, constructor
setup, or patching are all visible when debugging.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4ac35d93-7407-45c9-b4b0-0be118e912aa

📥 Commits

Reviewing files that changed from the base of the PR and between fb292d0 and 839b787.

⛔ Files ignored due to path filters (3)

packages/opentelemetry-instrumentation-litellm/uv.lock is excluded by !**/*.lock
packages/sample-app/uv.lock is excluded by !**/*.lock
packages/traceloop-sdk/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (23)

packages/opentelemetry-instrumentation-litellm/README.md
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/config.py
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_emitter.py
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/utils.py
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/version.py
packages/opentelemetry-instrumentation-litellm/project.json
packages/opentelemetry-instrumentation-litellm/pyproject.toml
packages/opentelemetry-instrumentation-litellm/pytest.ini
packages/opentelemetry-instrumentation-litellm/tests/conftest.py
packages/opentelemetry-instrumentation-litellm/tests/test_completion.py
packages/opentelemetry-instrumentation-litellm/tests/test_custom_provider.py
packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py
packages/opentelemetry-instrumentation-litellm/tests/test_events.py
packages/opentelemetry-instrumentation-litellm/tests/test_nesting.py
packages/opentelemetry-instrumentation-litellm/tests/test_provider_resolution.py
packages/opentelemetry-instrumentation-litellm/tests/test_streaming.py
packages/sample-app/sample_app/litellm_async_streaming.py
packages/sample-app/sample_app/litellm_completion.py
packages/traceloop-sdk/pyproject.toml
packages/traceloop-sdk/traceloop/sdk/instruments.py
packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py

coderabbitai · 2026-06-28T10:49:48Z

+        start = time.time()
+        token = context_api.attach(
+            context_api.set_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True)
+        )
+        try:
+            response = wrapped(*args, **kwargs)
+        except Exception as error:
+            _handle_exception(span, metrics, error, model, kwargs)
+            raise
+        finally:
+            context_api.detach(token)
+
+        duration = time.time() - start


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Use a monotonic clock for duration measurement.

time.time() can jump on wall-clock adjustments, so both the direct and streaming histograms can record skewed or even negative durations. Capture elapsed time with a monotonic source instead.

Proposed fix

- start = time.time() + start = time.perf_counter() @@ - duration = time.time() - start + duration = time.perf_counter() - start @@ - start = time.time() + start = time.perf_counter() @@ - duration = time.time() - start + duration = time.perf_counter() - start @@ - time.time() - start, accumulated_provider=accumulated.get("provider"), + time.perf_counter() - start, accumulated_provider=accumulated.get("provider"),

Also applies to: 681-693, 706-710

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py` around lines 645 - 657, The duration timing in the LiteLLM instrumentation currently uses a wall-clock source, which can produce skewed or negative values when the system clock changes. Update the timing in the wrapped call path and the streaming/direct histogram paths to use a monotonic clock instead of time.time(), keeping the existing start/end measurement flow in __init__.py and the related span/metrics handling intact.

coderabbitai · 2026-06-28T10:49:48Z

+    def _finish_error(self, error):
+        if self._self_done:
+            return
+        self._self_done = True
+        _handle_exception(
+            self._self_span, self._self_metrics, error, self._self_model, self._self_kwargs
+        )


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Preserve the resolved provider on streaming error paths.

_accumulate_chunk() already captures the provider once a chunk exposes it, but _finish_error() ignores that state and falls back to kwargs-only resolution. A stream that fails after the first chunk can therefore emit exception telemetry under generic litellm instead of the real provider.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py` around lines 748 - 754, The streaming error path in _finish_error currently ignores the provider already resolved by _accumulate_chunk and re-derives it from kwargs, which can lose the real provider for exception telemetry. Update _finish_error in the streaming helper to preserve and pass through the cached resolved provider state from the chunk accumulator when calling _handle_exception, so failures after the first chunk report under the correct provider instead of falling back to generic litellm.

coderabbitai · 2026-06-28T10:49:48Z

+        "exception_counter": meter.create_counter(
+            name=Meters.LLM_COMPLETIONS_EXCEPTIONS,
+            unit="time",
+            description="Number of exceptions occurred during chat completions",


🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Fix the exception counter unit.

This counter tracks occurrences, but it is declared with unit="time". That mislabels the metric in downstream backends and dashboards. Use a count-like unit instead.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py` around lines 831 - 834, The exception counter in the Litellm instrumentation is mislabeled with a time unit even though it records occurrences. Update the metric declaration in the counter setup near the exception_counter creation to use a count-like unit instead of unit="time", keeping the existing Meters.LLM_COMPLETIONS_EXCEPTIONS counter and its description intact.

coderabbitai · 2026-06-28T10:49:48Z

+    def _uninstrument(self, **kwargs):
+        import litellm


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make _uninstrument() tolerate a missing optional LiteLLM install.

_instrument() already treats litellm as optional, but this unconditional import will raise ModuleNotFoundError during cleanup in the same environment. Guard the import and return early.

Proposed fix

def _uninstrument(self, **kwargs): - import litellm + try: + import litellm + except ModuleNotFoundError: + return

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _uninstrument(self, **kwargs):

import litellm

def _uninstrument(self, **kwargs):

try:

import litellm

except ModuleNotFoundError:

return

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py` around lines 882 - 883, The _uninstrument() method in the LiteLLM instrumentor still assumes litellm is installed, which can raise ModuleNotFoundError during cleanup even though _instrument() treats it as optional. Update _uninstrument() to guard the import of litellm and return early when the package is missing, keeping the cleanup path safe in environments without LiteLLM.

coderabbitai · 2026-06-28T10:49:48Z

+class _FunctionToolCall(TypedDict):
+    function_name: str
+    arguments: Optional[dict[str, Any]]


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Align ToolCall.function with the actual payload shape.

Downstream code and tests use tool_call["function"]["name"] and preserve arguments as a raw value, but this TypedDict advertises function_name plus dict arguments. That makes the public type contract diverge from the runtime schema and will mislead any typed caller integrating with the emitter.

Proposed fix

class _FunctionToolCall(TypedDict): - function_name: str - arguments: Optional[dict[str, Any]] + name: str + arguments: Optional[Any]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

class _FunctionToolCall(TypedDict):

function_name: str

arguments: Optional[dict[str, Any]]

class _FunctionToolCall(TypedDict):

name: str

arguments: Optional[Any]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py` around lines 5 - 7, The _FunctionToolCall TypedDict does not match the runtime tool-call payload shape used by ToolCall and downstream consumers. Update the typed contract in event_models.py so the nested function data is represented with a name field matching tool_call["function"]["name"], and keep arguments as the raw emitted value instead of a dict-only shape. Make sure any related ToolCall typing or helpers in this module stay consistent with the actual emitter schema.

coderabbitai · 2026-06-28T10:49:49Z

+@pytest.fixture(scope="function")
+def instrument_legacy(tracer_provider, meter_provider):
+    instrumentor = LiteLLMInstrumentor()
+    instrumentor.instrument(
+        tracer_provider=tracer_provider,
+        meter_provider=meter_provider,
+    )
+
+    yield instrumentor
+
+    instrumentor.uninstrument()
+


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Wrap instrument_legacy teardown in try/finally.

If instrumentor.instrument(...) fails after partially patching LiteLLM, this fixture skips uninstrument() and can leak wrappers into later tests. Mirror the instrument_with_content pattern here.

Suggested fix

`@pytest.fixture`(scope="function") def instrument_legacy(tracer_provider, meter_provider): instrumentor = LiteLLMInstrumentor() - instrumentor.instrument( - tracer_provider=tracer_provider, - meter_provider=meter_provider, - ) - - yield instrumentor - - instrumentor.uninstrument() + try: + instrumentor.instrument( + tracer_provider=tracer_provider, + meter_provider=meter_provider, + ) + yield instrumentor + finally: + instrumentor.uninstrument()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@pytest.fixture(scope="function")

def instrument_legacy(tracer_provider, meter_provider):

instrumentor = LiteLLMInstrumentor()

instrumentor.instrument(

tracer_provider=tracer_provider,

meter_provider=meter_provider,

)

yield instrumentor

instrumentor.uninstrument()

`@pytest.fixture`(scope="function")

def instrument_legacy(tracer_provider, meter_provider):

instrumentor = LiteLLMInstrumentor()

try:

instrumentor.instrument(

tracer_provider=tracer_provider,

meter_provider=meter_provider,

)

yield instrumentor

finally:

instrumentor.uninstrument()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/tests/conftest.py` around lines 56 - 67, The fixture teardown in instrument_legacy is not protected against failures during LiteLLMInstrumentor.instrument, which can leave partial patches behind and affect later tests. Update instrument_legacy to follow the same try/finally pattern used by instrument_with_content: create the LiteLLMInstrumentor, run instrument(...) in the protected block, yield the instrumentor, and ensure instrumentor.uninstrument() always runs in the finally path even if instrumentation raises.

coderabbitai · 2026-06-28T10:49:49Z

+def test_both_openai_and_litellm_instrumented(span_exporter, tracer_provider, meter_provider):
+    """Instrumenting both must not raise and must not double-count."""
+    pytest.importorskip("opentelemetry.instrumentation.openai")
+    from opentelemetry.instrumentation.litellm import LiteLLMInstrumentor
+    from opentelemetry.instrumentation.openai import OpenAIInstrumentor
+
+    openai_instrumentor = OpenAIInstrumentor()
+    litellm_instrumentor = LiteLLMInstrumentor()
+    try:
+        openai_instrumentor.instrument(
+            tracer_provider=tracer_provider, meter_provider=meter_provider
+        )
+        litellm_instrumentor.instrument(
+            tracer_provider=tracer_provider, meter_provider=meter_provider
+        )
+        litellm.completion(
+            model="gpt-3.5-turbo",
+            messages=MESSAGES,
+            mock_response="hello",
+        )
+        spans = span_exporter.get_finished_spans()
+        assert [span.name for span in spans] == ["litellm.chat"]


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Exercise the OpenAI boundary here
mock_response="hello" can short-circuit before the OpenAI SDK is invoked, so this only proves the litellm.chat span. Use a stubbed provider call that actually reaches OpenAIInstrumentor, or assert suppression at that downstream boundary too.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/tests/test_nesting.py` around lines 66 - 87, The nesting test only verifies the top-level LiteLLM span because mock_response="hello" may bypass the OpenAI SDK entirely. Update test_both_openai_and_litellm_instrumented to drive a stubbed provider path that actually reaches OpenAIInstrumentor, and then assert the downstream OpenAI boundary behavior as well as the litellm.chat span count so the test proves both instrumentors interact correctly without double-counting.

coderabbitai · 2026-06-28T10:49:49Z

+def get_current_weather(location: str, unit: str = "celsius") -> str:
+    """Pretend tool implementation — a real one would call a weather API."""
+    fake_db = {"paris": 18, "london": 14, "new york": 22}
+    temp = fake_db.get(location.lower(), 20)
+    return json.dumps({"location": location, "temperature": temp, "unit": unit})


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Convert temperatures when unit="fahrenheit".

The tool schema allows Fahrenheit, but the fake DB values are returned unchanged, so 18°C can be reported as 18°F.

Proposed fix

def get_current_weather(location: str, unit: str = "celsius") -> str: """Pretend tool implementation — a real one would call a weather API.""" fake_db = {"paris": 18, "london": 14, "new york": 22} temp = fake_db.get(location.lower(), 20) + if unit == "fahrenheit": + temp = temp * 9 / 5 + 32 + elif unit != "celsius": + return json.dumps({"error": f"unsupported unit {unit}"}) return json.dumps({"location": location, "temperature": temp, "unit": unit})

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_current_weather(location: str, unit: str = "celsius") -> str:

"""Pretend tool implementation — a real one would call a weather API."""

fake_db = {"paris": 18, "london": 14, "new york": 22}

temp = fake_db.get(location.lower(), 20)

return json.dumps({"location": location, "temperature": temp, "unit": unit})

def get_current_weather(location: str, unit: str = "celsius") -> str:

"""Pretend tool implementation — a real one would call a weather API."""

fake_db = {"paris": 18, "london": 14, "new york": 22}

temp = fake_db.get(location.lower(), 20)

if unit == "fahrenheit":

temp = temp * 9 / 5 + 32

elif unit != "celsius":

return json.dumps({"error": f"unsupported unit {unit}"})

return json.dumps({"location": location, "temperature": temp, "unit": unit})

🧰 Tools

🪛 ast-grep (0.44.0)

[info] 101-101: use jsonify instead of json.dumps for JSON output
Context: json.dumps({"location": location, "temperature": temp, "unit": unit})
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sample-app/sample_app/litellm_async_streaming.py` around lines 98 - 102, The get_current_weather tool in litellm_async_streaming.py currently returns the fake DB temperature unchanged even when unit is "fahrenheit". Update get_current_weather to detect Fahrenheit requests and convert the Celsius value before building the json.dumps response, while keeping the existing behavior for Celsius and preserving the same function signature and output shape.

coderabbitai · 2026-06-28T10:49:49Z

+def issue_refund(order_id: str, amount: float) -> str:
+    # A real implementation would call a payments API; here we just confirm.
+    return json.dumps(
+        {"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}
+    )


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Validate refund eligibility before confirming.

issue_refund currently confirms any model-supplied order_id/amount, even for unknown or undelivered orders. That can make the sample report an invalid refund as successful.

Proposed fix

def issue_refund(order_id: str, amount: float) -> str: # A real implementation would call a payments API; here we just confirm. + order = _ORDERS.get(order_id) + if order is None: + return json.dumps({"error": f"no order {order_id}"}) + if order["status"] != "delivered": + return json.dumps({"error": f"order {order_id} is not eligible for refund"}) + if amount <= 0 or amount > order["total"]: + return json.dumps({"error": f"invalid refund amount {amount}"}) return json.dumps( {"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"} )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def issue_refund(order_id: str, amount: float) -> str:

# A real implementation would call a payments API; here we just confirm.

return json.dumps(

{"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}

)

def issue_refund(order_id: str, amount: float) -> str:

# A real implementation would call a payments API; here we just confirm.

order = _ORDERS.get(order_id)

if order is None:

return json.dumps({"error": f"no order {order_id}"})

if order["status"] != "delivered":

return json.dumps({"error": f"order {order_id} is not eligible for refund"})

if amount <= 0 or amount > order["total"]:

return json.dumps({"error": f"invalid refund amount {amount}"})

return json.dumps(

{"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}

)

🧰 Tools

🪛 ast-grep (0.44.0)

[info] 118-120: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}
)
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sample-app/sample_app/litellm_completion.py` around lines 117 - 121, The issue is that issue_refund always returns a successful refund for any model-supplied order_id and amount. Update issue_refund to validate the order against known refund-eligible records first, including that the order exists, was delivered/paid, and the requested amount is within the refundable limit; only then return the refunded response. If the order is unknown or ineligible, have issue_refund return a non-success result instead of confirming the refund.

coderabbitai · 2026-06-28T10:49:49Z

+                        try:
+                            # Call the custom on_end first
+                            span_postprocess_callback(span)
+                        finally:
+                            if was_immutable:
+                                attributes._immutable = True
                        # Then call the original to ensure normal processing
                        original_on_end(span)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Always forward spans to the real processor.

If span_postprocess_callback() raises on Line 153, original_on_end() on Line 158 never runs, so that span is never exported. Catch/log callback failures after restoring immutability, then always invoke the original on_end.

Proposed fix

def wrapped_on_end(span): # OTel freezes a span's attributes (sets `_immutable`) in # Span.end() before on_end runs, so the postprocess callback # can't modify them. Temporarily unfreeze around the callback # so it can redact/mutate attributes, then restore the flag. attributes = getattr(span, "_attributes", None) was_immutable = getattr(attributes, "_immutable", False) if was_immutable: attributes._immutable = False try: - # Call the custom on_end first - span_postprocess_callback(span) + span_postprocess_callback(span) + except Exception: + logging.exception("span_postprocess_callback failed") finally: if was_immutable: attributes._immutable = True # Then call the original to ensure normal processing original_on_end(span)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

try:

# Call the custom on_end first

span_postprocess_callback(span)

finally:

if was_immutable:

attributes._immutable = True

# Then call the original to ensure normal processing

original_on_end(span)

try:

span_postprocess_callback(span)

except Exception:

logging.exception("span_postprocess_callback failed")

finally:

if was_immutable:

attributes._immutable = True

# Then call the original to ensure normal processing

original_on_end(span)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py` around lines 151 - 158, The span post-processing in tracing.py can prevent export when span_postprocess_callback() raises because original_on_end() is skipped. Update the on_end wrapper to catch and log callback failures in the span_postprocess_callback path, always restore attributes._immutable in the finally block, and then unconditionally call original_on_end(span) so the real processor still receives every span.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py`:
- Around line 42-44: The list-input embeddings test currently reads attributes
from spans[0] without verifying the emitted span contract, so it can miss extra
wrapper or internal spans. Update the test around
span_exporter.get_finished_spans() in the list-input case to assert there is
exactly one span and that its name is litellm.embeddings before accessing attrs
or GenAIAttributes.GEN_AI_INPUT_MESSAGES, mirroring the existing test_embedding
assertion.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e96e6aa2-0e1d-4ce9-84cd-5d903d2c53cf

📥 Commits

Reviewing files that changed from the base of the PR and between 839b787 and fdbe72a.

📒 Files selected for processing (5)

packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py
packages/opentelemetry-instrumentation-litellm/tests/test_completion.py
packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py
packages/opentelemetry-instrumentation-litellm/tests/test_streaming.py

🚧 Files skipped from review as they are similar to previous changes (3)

packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py
packages/opentelemetry-instrumentation-litellm/tests/test_streaming.py
packages/opentelemetry-instrumentation-litellm/tests/test_completion.py

coderabbitai · 2026-06-28T10:53:49Z

+    spans = span_exporter.get_finished_spans()
+    attrs = spans[0].attributes
+    messages = json.loads(attrs[GenAIAttributes.GEN_AI_INPUT_MESSAGES])


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the emitted span contract in the list-input test.

This test indexes spans[0] but never checks that the call still emits exactly one litellm.embeddings span. If list-input handling starts producing an extra wrapper/internal span, this case will still pass while missing the regression. Mirror the test_embedding span-count/name assertion here before reading attributes.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py` around lines 42 - 44, The list-input embeddings test currently reads attributes from spans[0] without verifying the emitted span contract, so it can miss extra wrapper or internal spans. Update the test around span_exporter.get_finished_spans() in the list-input case to assert there is exactly one span and that its name is litellm.embeddings before accessing attrs or GenAIAttributes.GEN_AI_INPUT_MESSAGES, mirroring the existing test_embedding assertion.

max-deygin-traceloop

All five findings from the previous review are addressed in fdbe72a:

P1 fixed: Embedding input now uses gen_ai.input.messages JSON (matching OpenAI instrumentation). New test_embedding_list_input covers multi-input case.
P2 fixed: Events path finish_reason fallback changed from "unknown" to "", ChoiceEvent default updated to match.
P2 fixed: gen_ai.tool.definitions now captured via _build_tool_def / _set_tool_definitions. Implementation matches the OpenAI instrumentation's flattened format. Test added.
P2 fixed: Streaming tests now assert both per-message finish_reason and top-level gen_ai.response.finish_reasons.
P3 fixed: Key order in _map_content_parts is now type-first everywhere.

All 30 tests pass locally. Semconv and finish reason compliance verified.

feat(litellm): align instrumentation with OTel GenAI semconv

839b787

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

galzilber force-pushed the gz/litellm_insturments branch from cdbc173 to 839b787 Compare June 28, 2026 10:38

max-deygin-traceloop requested changes Jun 28, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

fx

fdbe72a

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

max-deygin-traceloop approved these changes Jun 28, 2026

View reviewed changes

galzilber merged commit 1dc1e89 into main Jun 28, 2026
12 checks passed

Uh oh!

Conversation

galzilber commented Jun 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

CLAassistant commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

max-deygin-traceloop left a comment

Choose a reason for hiding this comment

Semconv + Finish Reason Compliance Review

P1 -- Embedding input uses deprecated indexed gen_ai.prompt.{i}.* pattern

P2 -- Events path uses "unknown" fallback instead of ""

P2 -- gen_ai.tool.definitions not captured

P2 -- Streaming tests don't verify finish_reason

P3 -- Minor: key order inconsistency in _map_content_parts

Finish Reason Checklist

Semconv Checklist

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

max-deygin-traceloop left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

galzilber commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

CLAassistant commented Jun 28, 2026 •

edited

Loading

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading

P1 -- Embedding input uses deprecated indexed `gen_ai.prompt.{i}.*` pattern

P2 -- Events path uses `"unknown"` fallback instead of `""`

P2 -- `gen_ai.tool.definitions` not captured

P3 -- Minor: key order inconsistency in `_map_content_parts`