Skip to content

feat(instrumentation): Add litellm package instrumentation#4322

Merged
galzilber merged 2 commits into
mainfrom
gz/litellm_insturments
Jun 28, 2026
Merged

feat(instrumentation): Add litellm package instrumentation#4322
galzilber merged 2 commits into
mainfrom
gz/litellm_insturments

Conversation

@galzilber

@galzilber galzilber commented Jun 28, 2026

Copy link
Copy Markdown
Contributor
  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

  • New Features
    • Added LiteLLM tracing across chat/completions (sync/async), streaming, and embeddings with spans, metrics, and structured event logging.
    • Added tool-calling serialization for inputs/outputs and consistent provider resolution/finish-reason mapping.
    • Added Traceloop SDK support for the LiteLLM instrument and new sync/async tool-calling sample apps.
  • Documentation
    • Added README with install/setup and content-redaction/privacy guidance.
  • Tests
    • Added/expanded coverage for provider prefix resolution, streaming completion lifecycle, tool definitions/tool calls, metrics emission, content tracing gating, and custom-provider behavior.

@CLAassistant

CLAassistant commented Jun 28, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a new LiteLLM OpenTelemetry instrumentation package with span, event, and metric handling, plus SDK integration, tests, and sample tool-calling apps.

Changes

LiteLLM OpenTelemetry Instrumentation Package

Layer / File(s) Summary
Package scaffold and config
packages/opentelemetry-instrumentation-litellm/README.md, .../config.py, .../utils.py, .../event_models.py, .../version.py, packages/opentelemetry-instrumentation-litellm/project.json, .../pyproject.toml, .../pytest.ini
Adds package metadata, defaults, utility toggles, event payload models, version, build/test config, dependency metadata, and README usage/privacy text.
Event emission helpers
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_emitter.py
Emits LiteLLM message and choice log records with role mapping and prompt-suppression redaction.
Span attributes, serialization, and metrics
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py
Resolves providers, sets request/response attributes, serializes messages and tool calls, normalizes finish reasons, emits events or attributes, and builds metric instruments.
Streaming wrappers and LiteLLMInstrumentor
packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py
Accumulates streaming chunks, finalizes spans once, and registers LiteLLM wrappers through LiteLLMInstrumentor.
SDK wiring for LiteLLM
packages/traceloop-sdk/traceloop/sdk/instruments.py, .../tracing/tracing.py, packages/traceloop-sdk/pyproject.toml
Adds LiteLLM to SDK instrument selection, initializes the instrumentor, unfreezes span attributes during postprocessing, and wires the dependency into the SDK package.
Test coverage
packages/opentelemetry-instrumentation-litellm/tests/*
Adds fixture setup and tests for completions, embeddings, streaming, events, provider resolution, nesting, suppression, and metrics.
Sample applications
packages/sample-app/sample_app/litellm_completion.py, .../litellm_async_streaming.py
Adds synchronous and async LiteLLM agent examples with tool calling and Traceloop workflows.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • traceloop/openllmetry#4133: Threads the LiteLLM use_attributes/use_legacy_attributes flag through SDK initialization, which directly connects to the new LiteLLM instrumentor behavior here.

Suggested reviewers

  • doronkopit5
  • max-deygin-traceloop
  • netanel-tl

Poem

🐇 I hopped through spans from dusk till dawn,
Litellm traces now prance along.
Tools and streams in tidy rows,
Metrics bloom where the data flows.
A whiskered wink, a happy beat—
This tracer patch is neat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding LiteLLM package instrumentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gz/litellm_insturments

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.23.0)
packages/opentelemetry-instrumentation-litellm/tests/test_completion.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.12][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.14][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py

┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

[00.26][ERROR]: unable to find a config; path .coderabbit-opengrep-fallback.yml does not exist

  • 2 others

Comment @coderabbitai help to get the list of available commands.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@galzilber galzilber force-pushed the gz/litellm_insturments branch from cdbc173 to 839b787 Compare June 28, 2026 10:38

@max-deygin-traceloop max-deygin-traceloop left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semconv + Finish Reason Compliance Review

Overall this is a well-structured instrumentation. The parts-based message schema, provider normalization, streaming accumulation, and nesting suppression are all solid. Below are findings organized by severity.


P1 -- Embedding input uses deprecated indexed gen_ai.prompt.{i}.* pattern

File: __init__.py, _set_prompts() (embedding branch)

for index, prompt in enumerate(embedding_input or []):
    _set_span_attribute(
        span, f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.role", "user"
    )
    _set_span_attribute(
        span, f"{GenAIAttributes.GEN_AI_PROMPT}.{index}.content", prompt
    )

GEN_AI_PROMPT is explicitly deprecated upstream (Deprecated: Removed, no replacement at this time). The OpenAI instrumentation in this same repo already uses gen_ai.input.messages JSON for embeddings:

# opentelemetry-instrumentation-openai/shared/embeddings_wrappers.py
messages = [
    {"role": "user", "parts": [{"type": "text", "content": p}]}
    for p in prompts
]
_set_span_attribute(span, GenAIAttributes.GEN_AI_INPUT_MESSAGES, json.dumps(messages))

Fix: Replace the indexed pattern with gen_ai.input.messages JSON, matching the OpenAI embeddings instrumentation. Update test_embeddings.py to assert the JSON structure.


P2 -- Events path uses "unknown" fallback instead of ""

File: __init__.py, _emit_choice_events()

finish_reason=_map_finish_reason(choice.get("finish_reason")) or "unknown",

When _map_finish_reason returns "" (for None input), the or "unknown" kicks in. This diverges from the repo-wide convention of "" for unavailable finish reasons (used by OpenAI, Anthropic, Google GenAI, and OpenAI Agents instrumentors in this repo). The ChoiceEvent dataclass also defaults to "unknown".

The attribute path (_set_output_messages) correctly uses "" -- the event path should match.

Fix: Change or "unknown" to just pass through the "", and update ChoiceEvent.finish_reason default from "unknown" to "".


P2 -- gen_ai.tool.definitions not captured

LiteLLM's completion() accepts a tools kwarg. The instrumentation doesn't capture tool definitions in gen_ai.tool.definitions (opt-in, gated by should_send_prompts()). The OpenAI instrumentation sets this. Not a blocker but worth adding for parity -- tool-calling is a key LiteLLM use-case.


P2 -- Streaming tests don't verify finish_reason

test_streaming_completion and test_astreaming_completion verify output message content but don't assert:

  • output[0]["finish_reason"] (per-message)
  • attrs[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS] (top-level span attr)

The non-streaming test_completion does verify both. Add the same assertions to streaming tests to catch any streaming-specific finish_reason accumulation regressions.


P3 -- Minor: key order inconsistency in _map_content_parts

# str branch -- content first
return [{"content": content, "type": "text"}]

# dict branch in _map_content_block -- type first
return {"type": "text", "content": block.get("text", "")}

Semantically identical but inconsistent. Prefer {"type": "text", "content": ...} everywhere (type-first, matching the OTel schema examples).


Finish Reason Checklist

Check Status
_map_finish_reason(None) returns "" (per-message fallback) Pass
_map_finish_reason("tool_calls") returns "tool_call" (plural to singular) Pass
_map_finish_reason("function_call") returns "tool_call" (legacy) Pass
Per-message finish_reason always present in output JSON (required by schema) Pass
Per-message finish_reason is always a string, never null Pass
Top-level gen_ai.response.finish_reasons omitted when no meaningful reasons Pass
Top-level attr NOT gated by should_send_prompts() Pass
Top-level attr deduplicates while preserving order Pass
Streaming accumulates finish_reason correctly Pass
All code paths (sync, async, streaming sync, streaming async, error) covered Pass

Semconv Checklist

Check Status
gen_ai.operation.name set at span creation Pass (chat / embeddings)
gen_ai.provider.name set (not deprecated gen_ai.system) Pass
Provider aliases normalized to OTel well-known values Pass
gen_ai.input.messages uses parts-based JSON (chat path) Pass
gen_ai.output.messages uses parts-based JSON with finish_reason Pass
Text parts use {"type": "text", "content": ...} (not "text" key) Pass
Tool calls use {"type": "tool_call", ...} with parsed arguments Pass
Tool responses use {"type": "tool_call_response", ...} Pass
Image content mapped to BlobPart/UriPart Pass
gen_ai.input.messages for embeddings (not deprecated indexed) Fail (P1 above)
No indexed gen_ai.prompt.{i}.* / gen_ai.completion.{i}.* emitted alongside JSON N/A (only embedding path uses indexed)
semconv-ai version range >=0.5.1,<0.6.0 compatible with other packages Pass

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py (1)

826-827: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Preserve the LiteLLM init traceback.

Line 827 only logs str(e), which hides whether the failure came from import, constructor wiring, or patching. logging.exception(...) or exc_info=True would make broken instrumentation much easier to debug.

Proposed fix
-    except Exception as e:
-        logging.error(f"Error initializing LiteLLM instrumentor: {e}")
+    except Exception:
+        logging.exception("Error initializing LiteLLM instrumentor")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py` around lines 826 -
827, The LiteLLM instrumentor initialization error handling in tracing.py only
logs the exception message, which drops the traceback and makes the failure
source unclear. Update the exception handler around the LiteLLM initialization
path in the tracing module to log with traceback information using the existing
logging call site, so failures during import, constructor setup, or patching are
all visible when debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`:
- Around line 831-834: The exception counter in the Litellm instrumentation is
mislabeled with a time unit even though it records occurrences. Update the
metric declaration in the counter setup near the exception_counter creation to
use a count-like unit instead of unit="time", keeping the existing
Meters.LLM_COMPLETIONS_EXCEPTIONS counter and its description intact.
- Around line 748-754: The streaming error path in _finish_error currently
ignores the provider already resolved by _accumulate_chunk and re-derives it
from kwargs, which can lose the real provider for exception telemetry. Update
_finish_error in the streaming helper to preserve and pass through the cached
resolved provider state from the chunk accumulator when calling
_handle_exception, so failures after the first chunk report under the correct
provider instead of falling back to generic litellm.
- Around line 645-657: The duration timing in the LiteLLM instrumentation
currently uses a wall-clock source, which can produce skewed or negative values
when the system clock changes. Update the timing in the wrapped call path and
the streaming/direct histogram paths to use a monotonic clock instead of
time.time(), keeping the existing start/end measurement flow in __init__.py and
the related span/metrics handling intact.
- Around line 882-883: The _uninstrument() method in the LiteLLM instrumentor
still assumes litellm is installed, which can raise ModuleNotFoundError during
cleanup even though _instrument() treats it as optional. Update _uninstrument()
to guard the import of litellm and return early when the package is missing,
keeping the cleanup path safe in environments without LiteLLM.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py`:
- Around line 5-7: The _FunctionToolCall TypedDict does not match the runtime
tool-call payload shape used by ToolCall and downstream consumers. Update the
typed contract in event_models.py so the nested function data is represented
with a name field matching tool_call["function"]["name"], and keep arguments as
the raw emitted value instead of a dict-only shape. Make sure any related
ToolCall typing or helpers in this module stay consistent with the actual
emitter schema.
- Around line 18-23: `CompletionMessage.role` is currently annotated like it has
a default, but `TypedDict` does not enforce or supply runtime defaults. Update
`CompletionMessage` in `event_models.py` to make `role` a required field, and
then verify the Litellm emitter/consumer code that reads `event.message["role"]`
is aligned with that contract. If any call sites can still omit the field,
adjust them to set `role` explicitly or switch those reads to a safe
`get("role")` access.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/utils.py`:
- Around line 40-41: The dont_throw path in the utils helper should remain
non-throwing even if Config.exception_logger itself fails. Update the exception
handling around the Config.exception_logger call so any error from that callback
is caught in its own try/except and only logged, while preserving the original
swallowed behavior in the surrounding wrapper. Use the dont_throw logic and
Config.exception_logger reference to locate the fix.

In `@packages/opentelemetry-instrumentation-litellm/README.md`:
- Around line 3-5: Add alt text to the PyPI badge image in the README by
updating the inline badge markup so the img element includes a descriptive alt
attribute; locate the badge link near the top of the README and keep the
existing badge URL while adding accessible text that identifies the
package/version badge for screen readers.

In `@packages/opentelemetry-instrumentation-litellm/tests/conftest.py`:
- Around line 56-67: The fixture teardown in instrument_legacy is not protected
against failures during LiteLLMInstrumentor.instrument, which can leave partial
patches behind and affect later tests. Update instrument_legacy to follow the
same try/finally pattern used by instrument_with_content: create the
LiteLLMInstrumentor, run instrument(...) in the protected block, yield the
instrumentor, and ensure instrumentor.uninstrument() always runs in the finally
path even if instrumentation raises.

In `@packages/opentelemetry-instrumentation-litellm/tests/test_nesting.py`:
- Around line 66-87: The nesting test only verifies the top-level LiteLLM span
because mock_response="hello" may bypass the OpenAI SDK entirely. Update
test_both_openai_and_litellm_instrumented to drive a stubbed provider path that
actually reaches OpenAIInstrumentor, and then assert the downstream OpenAI
boundary behavior as well as the litellm.chat span count so the test proves both
instrumentors interact correctly without double-counting.

In `@packages/sample-app/sample_app/litellm_async_streaming.py`:
- Around line 98-102: The get_current_weather tool in litellm_async_streaming.py
currently returns the fake DB temperature unchanged even when unit is
"fahrenheit". Update get_current_weather to detect Fahrenheit requests and
convert the Celsius value before building the json.dumps response, while keeping
the existing behavior for Celsius and preserving the same function signature and
output shape.

In `@packages/sample-app/sample_app/litellm_completion.py`:
- Around line 117-121: The issue is that issue_refund always returns a
successful refund for any model-supplied order_id and amount. Update
issue_refund to validate the order against known refund-eligible records first,
including that the order exists, was delivered/paid, and the requested amount is
within the refundable limit; only then return the refunded response. If the
order is unknown or ineligible, have issue_refund return a non-success result
instead of confirming the refund.

In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py`:
- Around line 151-158: The span post-processing in tracing.py can prevent export
when span_postprocess_callback() raises because original_on_end() is skipped.
Update the on_end wrapper to catch and log callback failures in the
span_postprocess_callback path, always restore attributes._immutable in the
finally block, and then unconditionally call original_on_end(span) so the real
processor still receives every span.

---

Nitpick comments:
In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py`:
- Around line 826-827: The LiteLLM instrumentor initialization error handling in
tracing.py only logs the exception message, which drops the traceback and makes
the failure source unclear. Update the exception handler around the LiteLLM
initialization path in the tracing module to log with traceback information
using the existing logging call site, so failures during import, constructor
setup, or patching are all visible when debugging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4ac35d93-7407-45c9-b4b0-0be118e912aa

📥 Commits

Reviewing files that changed from the base of the PR and between fb292d0 and 839b787.

⛔ Files ignored due to path filters (3)
  • packages/opentelemetry-instrumentation-litellm/uv.lock is excluded by !**/*.lock
  • packages/sample-app/uv.lock is excluded by !**/*.lock
  • packages/traceloop-sdk/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • packages/opentelemetry-instrumentation-litellm/README.md
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/config.py
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_emitter.py
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/utils.py
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/version.py
  • packages/opentelemetry-instrumentation-litellm/project.json
  • packages/opentelemetry-instrumentation-litellm/pyproject.toml
  • packages/opentelemetry-instrumentation-litellm/pytest.ini
  • packages/opentelemetry-instrumentation-litellm/tests/conftest.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_completion.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_custom_provider.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_events.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_nesting.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_provider_resolution.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_streaming.py
  • packages/sample-app/sample_app/litellm_async_streaming.py
  • packages/sample-app/sample_app/litellm_completion.py
  • packages/traceloop-sdk/pyproject.toml
  • packages/traceloop-sdk/traceloop/sdk/instruments.py
  • packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py

Comment on lines +645 to +657
start = time.time()
token = context_api.attach(
context_api.set_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True)
)
try:
response = wrapped(*args, **kwargs)
except Exception as error:
_handle_exception(span, metrics, error, model, kwargs)
raise
finally:
context_api.detach(token)

duration = time.time() - start

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Use a monotonic clock for duration measurement.

time.time() can jump on wall-clock adjustments, so both the direct and streaming histograms can record skewed or even negative durations. Capture elapsed time with a monotonic source instead.

Proposed fix
-        start = time.time()
+        start = time.perf_counter()
@@
-        duration = time.time() - start
+        duration = time.perf_counter() - start
@@
-        start = time.time()
+        start = time.perf_counter()
@@
-        duration = time.time() - start
+        duration = time.perf_counter() - start
@@
-        time.time() - start, accumulated_provider=accumulated.get("provider"),
+        time.perf_counter() - start, accumulated_provider=accumulated.get("provider"),

Also applies to: 681-693, 706-710

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`
around lines 645 - 657, The duration timing in the LiteLLM instrumentation
currently uses a wall-clock source, which can produce skewed or negative values
when the system clock changes. Update the timing in the wrapped call path and
the streaming/direct histogram paths to use a monotonic clock instead of
time.time(), keeping the existing start/end measurement flow in __init__.py and
the related span/metrics handling intact.

Comment on lines +748 to +754
def _finish_error(self, error):
if self._self_done:
return
self._self_done = True
_handle_exception(
self._self_span, self._self_metrics, error, self._self_model, self._self_kwargs
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Preserve the resolved provider on streaming error paths.

_accumulate_chunk() already captures the provider once a chunk exposes it, but _finish_error() ignores that state and falls back to kwargs-only resolution. A stream that fails after the first chunk can therefore emit exception telemetry under generic litellm instead of the real provider.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`
around lines 748 - 754, The streaming error path in _finish_error currently
ignores the provider already resolved by _accumulate_chunk and re-derives it
from kwargs, which can lose the real provider for exception telemetry. Update
_finish_error in the streaming helper to preserve and pass through the cached
resolved provider state from the chunk accumulator when calling
_handle_exception, so failures after the first chunk report under the correct
provider instead of falling back to generic litellm.

Comment on lines +831 to +834
"exception_counter": meter.create_counter(
name=Meters.LLM_COMPLETIONS_EXCEPTIONS,
unit="time",
description="Number of exceptions occurred during chat completions",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Fix the exception counter unit.

This counter tracks occurrences, but it is declared with unit="time". That mislabels the metric in downstream backends and dashboards. Use a count-like unit instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`
around lines 831 - 834, The exception counter in the Litellm instrumentation is
mislabeled with a time unit even though it records occurrences. Update the
metric declaration in the counter setup near the exception_counter creation to
use a count-like unit instead of unit="time", keeping the existing
Meters.LLM_COMPLETIONS_EXCEPTIONS counter and its description intact.

Comment on lines +882 to +883
def _uninstrument(self, **kwargs):
import litellm

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make _uninstrument() tolerate a missing optional LiteLLM install.

_instrument() already treats litellm as optional, but this unconditional import will raise ModuleNotFoundError during cleanup in the same environment. Guard the import and return early.

Proposed fix
     def _uninstrument(self, **kwargs):
-        import litellm
+        try:
+            import litellm
+        except ModuleNotFoundError:
+            return
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _uninstrument(self, **kwargs):
import litellm
def _uninstrument(self, **kwargs):
try:
import litellm
except ModuleNotFoundError:
return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py`
around lines 882 - 883, The _uninstrument() method in the LiteLLM instrumentor
still assumes litellm is installed, which can raise ModuleNotFoundError during
cleanup even though _instrument() treats it as optional. Update _uninstrument()
to guard the import of litellm and return early when the package is missing,
keeping the cleanup path safe in environments without LiteLLM.

Comment on lines +5 to +7
class _FunctionToolCall(TypedDict):
function_name: str
arguments: Optional[dict[str, Any]]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Align ToolCall.function with the actual payload shape.

Downstream code and tests use tool_call["function"]["name"] and preserve arguments as a raw value, but this TypedDict advertises function_name plus dict arguments. That makes the public type contract diverge from the runtime schema and will mislead any typed caller integrating with the emitter.

Proposed fix
 class _FunctionToolCall(TypedDict):
-    function_name: str
-    arguments: Optional[dict[str, Any]]
+    name: str
+    arguments: Optional[Any]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class _FunctionToolCall(TypedDict):
function_name: str
arguments: Optional[dict[str, Any]]
class _FunctionToolCall(TypedDict):
name: str
arguments: Optional[Any]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py`
around lines 5 - 7, The _FunctionToolCall TypedDict does not match the runtime
tool-call payload shape used by ToolCall and downstream consumers. Update the
typed contract in event_models.py so the nested function data is represented
with a name field matching tool_call["function"]["name"], and keep arguments as
the raw emitted value instead of a dict-only shape. Make sure any related
ToolCall typing or helpers in this module stay consistent with the actual
emitter schema.

Comment on lines +56 to +67
@pytest.fixture(scope="function")
def instrument_legacy(tracer_provider, meter_provider):
instrumentor = LiteLLMInstrumentor()
instrumentor.instrument(
tracer_provider=tracer_provider,
meter_provider=meter_provider,
)

yield instrumentor

instrumentor.uninstrument()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Wrap instrument_legacy teardown in try/finally.

If instrumentor.instrument(...) fails after partially patching LiteLLM, this fixture skips uninstrument() and can leak wrappers into later tests. Mirror the instrument_with_content pattern here.

Suggested fix
 `@pytest.fixture`(scope="function")
 def instrument_legacy(tracer_provider, meter_provider):
     instrumentor = LiteLLMInstrumentor()
-    instrumentor.instrument(
-        tracer_provider=tracer_provider,
-        meter_provider=meter_provider,
-    )
-
-    yield instrumentor
-
-    instrumentor.uninstrument()
+    try:
+        instrumentor.instrument(
+            tracer_provider=tracer_provider,
+            meter_provider=meter_provider,
+        )
+        yield instrumentor
+    finally:
+        instrumentor.uninstrument()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.fixture(scope="function")
def instrument_legacy(tracer_provider, meter_provider):
instrumentor = LiteLLMInstrumentor()
instrumentor.instrument(
tracer_provider=tracer_provider,
meter_provider=meter_provider,
)
yield instrumentor
instrumentor.uninstrument()
`@pytest.fixture`(scope="function")
def instrument_legacy(tracer_provider, meter_provider):
instrumentor = LiteLLMInstrumentor()
try:
instrumentor.instrument(
tracer_provider=tracer_provider,
meter_provider=meter_provider,
)
yield instrumentor
finally:
instrumentor.uninstrument()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opentelemetry-instrumentation-litellm/tests/conftest.py` around
lines 56 - 67, The fixture teardown in instrument_legacy is not protected
against failures during LiteLLMInstrumentor.instrument, which can leave partial
patches behind and affect later tests. Update instrument_legacy to follow the
same try/finally pattern used by instrument_with_content: create the
LiteLLMInstrumentor, run instrument(...) in the protected block, yield the
instrumentor, and ensure instrumentor.uninstrument() always runs in the finally
path even if instrumentation raises.

Comment on lines +66 to +87
def test_both_openai_and_litellm_instrumented(span_exporter, tracer_provider, meter_provider):
"""Instrumenting both must not raise and must not double-count."""
pytest.importorskip("opentelemetry.instrumentation.openai")
from opentelemetry.instrumentation.litellm import LiteLLMInstrumentor
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

openai_instrumentor = OpenAIInstrumentor()
litellm_instrumentor = LiteLLMInstrumentor()
try:
openai_instrumentor.instrument(
tracer_provider=tracer_provider, meter_provider=meter_provider
)
litellm_instrumentor.instrument(
tracer_provider=tracer_provider, meter_provider=meter_provider
)
litellm.completion(
model="gpt-3.5-turbo",
messages=MESSAGES,
mock_response="hello",
)
spans = span_exporter.get_finished_spans()
assert [span.name for span in spans] == ["litellm.chat"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Exercise the OpenAI boundary here
mock_response="hello" can short-circuit before the OpenAI SDK is invoked, so this only proves the litellm.chat span. Use a stubbed provider call that actually reaches OpenAIInstrumentor, or assert suppression at that downstream boundary too.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opentelemetry-instrumentation-litellm/tests/test_nesting.py` around
lines 66 - 87, The nesting test only verifies the top-level LiteLLM span because
mock_response="hello" may bypass the OpenAI SDK entirely. Update
test_both_openai_and_litellm_instrumented to drive a stubbed provider path that
actually reaches OpenAIInstrumentor, and then assert the downstream OpenAI
boundary behavior as well as the litellm.chat span count so the test proves both
instrumentors interact correctly without double-counting.

Comment on lines +98 to +102
def get_current_weather(location: str, unit: str = "celsius") -> str:
"""Pretend tool implementation — a real one would call a weather API."""
fake_db = {"paris": 18, "london": 14, "new york": 22}
temp = fake_db.get(location.lower(), 20)
return json.dumps({"location": location, "temperature": temp, "unit": unit})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Convert temperatures when unit="fahrenheit".

The tool schema allows Fahrenheit, but the fake DB values are returned unchanged, so 18°C can be reported as 18°F.

Proposed fix
 def get_current_weather(location: str, unit: str = "celsius") -> str:
     """Pretend tool implementation — a real one would call a weather API."""
     fake_db = {"paris": 18, "london": 14, "new york": 22}
     temp = fake_db.get(location.lower(), 20)
+    if unit == "fahrenheit":
+        temp = temp * 9 / 5 + 32
+    elif unit != "celsius":
+        return json.dumps({"error": f"unsupported unit {unit}"})
     return json.dumps({"location": location, "temperature": temp, "unit": unit})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_current_weather(location: str, unit: str = "celsius") -> str:
"""Pretend tool implementation — a real one would call a weather API."""
fake_db = {"paris": 18, "london": 14, "new york": 22}
temp = fake_db.get(location.lower(), 20)
return json.dumps({"location": location, "temperature": temp, "unit": unit})
def get_current_weather(location: str, unit: str = "celsius") -> str:
"""Pretend tool implementation — a real one would call a weather API."""
fake_db = {"paris": 18, "london": 14, "new york": 22}
temp = fake_db.get(location.lower(), 20)
if unit == "fahrenheit":
temp = temp * 9 / 5 + 32
elif unit != "celsius":
return json.dumps({"error": f"unsupported unit {unit}"})
return json.dumps({"location": location, "temperature": temp, "unit": unit})
🧰 Tools
🪛 ast-grep (0.44.0)

[info] 101-101: use jsonify instead of json.dumps for JSON output
Context: json.dumps({"location": location, "temperature": temp, "unit": unit})
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/sample-app/sample_app/litellm_async_streaming.py` around lines 98 -
102, The get_current_weather tool in litellm_async_streaming.py currently
returns the fake DB temperature unchanged even when unit is "fahrenheit". Update
get_current_weather to detect Fahrenheit requests and convert the Celsius value
before building the json.dumps response, while keeping the existing behavior for
Celsius and preserving the same function signature and output shape.

Comment on lines +117 to +121
def issue_refund(order_id: str, amount: float) -> str:
# A real implementation would call a payments API; here we just confirm.
return json.dumps(
{"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Validate refund eligibility before confirming.

issue_refund currently confirms any model-supplied order_id/amount, even for unknown or undelivered orders. That can make the sample report an invalid refund as successful.

Proposed fix
 def issue_refund(order_id: str, amount: float) -> str:
     # A real implementation would call a payments API; here we just confirm.
+    order = _ORDERS.get(order_id)
+    if order is None:
+        return json.dumps({"error": f"no order {order_id}"})
+    if order["status"] != "delivered":
+        return json.dumps({"error": f"order {order_id} is not eligible for refund"})
+    if amount <= 0 or amount > order["total"]:
+        return json.dumps({"error": f"invalid refund amount {amount}"})
     return json.dumps(
         {"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def issue_refund(order_id: str, amount: float) -> str:
# A real implementation would call a payments API; here we just confirm.
return json.dumps(
{"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}
)
def issue_refund(order_id: str, amount: float) -> str:
# A real implementation would call a payments API; here we just confirm.
order = _ORDERS.get(order_id)
if order is None:
return json.dumps({"error": f"no order {order_id}"})
if order["status"] != "delivered":
return json.dumps({"error": f"order {order_id} is not eligible for refund"})
if amount <= 0 or amount > order["total"]:
return json.dumps({"error": f"invalid refund amount {amount}"})
return json.dumps(
{"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}
)
🧰 Tools
🪛 ast-grep (0.44.0)

[info] 118-120: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
{"order_id": order_id, "amount": amount, "refund_id": f"RF-{order_id}", "status": "refunded"}
)
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/sample-app/sample_app/litellm_completion.py` around lines 117 - 121,
The issue is that issue_refund always returns a successful refund for any
model-supplied order_id and amount. Update issue_refund to validate the order
against known refund-eligible records first, including that the order exists,
was delivered/paid, and the requested amount is within the refundable limit;
only then return the refunded response. If the order is unknown or ineligible,
have issue_refund return a non-success result instead of confirming the refund.

Comment on lines +151 to 158
try:
# Call the custom on_end first
span_postprocess_callback(span)
finally:
if was_immutable:
attributes._immutable = True
# Then call the original to ensure normal processing
original_on_end(span)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Always forward spans to the real processor.

If span_postprocess_callback() raises on Line 153, original_on_end() on Line 158 never runs, so that span is never exported. Catch/log callback failures after restoring immutability, then always invoke the original on_end.

Proposed fix
                     def wrapped_on_end(span):
                         # OTel freezes a span's attributes (sets `_immutable`) in
                         # Span.end() before on_end runs, so the postprocess callback
                         # can't modify them. Temporarily unfreeze around the callback
                         # so it can redact/mutate attributes, then restore the flag.
                         attributes = getattr(span, "_attributes", None)
                         was_immutable = getattr(attributes, "_immutable", False)
                         if was_immutable:
                             attributes._immutable = False
                         try:
-                            # Call the custom on_end first
-                            span_postprocess_callback(span)
+                            span_postprocess_callback(span)
+                        except Exception:
+                            logging.exception("span_postprocess_callback failed")
                         finally:
                             if was_immutable:
                                 attributes._immutable = True
                         # Then call the original to ensure normal processing
                         original_on_end(span)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
# Call the custom on_end first
span_postprocess_callback(span)
finally:
if was_immutable:
attributes._immutable = True
# Then call the original to ensure normal processing
original_on_end(span)
try:
span_postprocess_callback(span)
except Exception:
logging.exception("span_postprocess_callback failed")
finally:
if was_immutable:
attributes._immutable = True
# Then call the original to ensure normal processing
original_on_end(span)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py` around lines 151 -
158, The span post-processing in tracing.py can prevent export when
span_postprocess_callback() raises because original_on_end() is skipped. Update
the on_end wrapper to catch and log callback failures in the
span_postprocess_callback path, always restore attributes._immutable in the
finally block, and then unconditionally call original_on_end(span) so the real
processor still receives every span.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py`:
- Around line 42-44: The list-input embeddings test currently reads attributes
from spans[0] without verifying the emitted span contract, so it can miss extra
wrapper or internal spans. Update the test around
span_exporter.get_finished_spans() in the list-input case to assert there is
exactly one span and that its name is litellm.embeddings before accessing attrs
or GenAIAttributes.GEN_AI_INPUT_MESSAGES, mirroring the existing test_embedding
assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e96e6aa2-0e1d-4ce9-84cd-5d903d2c53cf

📥 Commits

Reviewing files that changed from the base of the PR and between 839b787 and fdbe72a.

📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/__init__.py
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_completion.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_streaming.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-litellm/opentelemetry/instrumentation/litellm/event_models.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_streaming.py
  • packages/opentelemetry-instrumentation-litellm/tests/test_completion.py

Comment on lines +42 to +44
spans = span_exporter.get_finished_spans()
attrs = spans[0].attributes
messages = json.loads(attrs[GenAIAttributes.GEN_AI_INPUT_MESSAGES])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Assert the emitted span contract in the list-input test.

This test indexes spans[0] but never checks that the call still emits exactly one litellm.embeddings span. If list-input handling starts producing an extra wrapper/internal span, this case will still pass while missing the regression. Mirror the test_embedding span-count/name assertion here before reading attributes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opentelemetry-instrumentation-litellm/tests/test_embeddings.py`
around lines 42 - 44, The list-input embeddings test currently reads attributes
from spans[0] without verifying the emitted span contract, so it can miss extra
wrapper or internal spans. Update the test around
span_exporter.get_finished_spans() in the list-input case to assert there is
exactly one span and that its name is litellm.embeddings before accessing attrs
or GenAIAttributes.GEN_AI_INPUT_MESSAGES, mirroring the existing test_embedding
assertion.

@max-deygin-traceloop max-deygin-traceloop left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All five findings from the previous review are addressed in fdbe72a:

  • P1 fixed: Embedding input now uses gen_ai.input.messages JSON (matching OpenAI instrumentation). New test_embedding_list_input covers multi-input case.
  • P2 fixed: Events path finish_reason fallback changed from "unknown" to "", ChoiceEvent default updated to match.
  • P2 fixed: gen_ai.tool.definitions now captured via _build_tool_def / _set_tool_definitions. Implementation matches the OpenAI instrumentation's flattened format. Test added.
  • P2 fixed: Streaming tests now assert both per-message finish_reason and top-level gen_ai.response.finish_reasons.
  • P3 fixed: Key order in _map_content_parts is now type-first everywhere.

All 30 tests pass locally. Semconv and finish reason compliance verified.

@galzilber galzilber merged commit 1dc1e89 into main Jun 28, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants