Audit Together AI models and reasoning controls #2135

Open
rekram1-node wants to merge 3 commits into dev from audit/togetherai-reasoning-20260610

rekram1-node (Collaborator) commented Jun 10, 2026

Summary

  • audit every Together AI serverless chat model against the official provider catalog, the exact model pages, the changelog, the reasoning docs, and the deprecation schedule
  • add 9 missing active models, yielding coverage of all 22 currently listed serverless chat models
  • retain 7 historical model files as deprecated, preserving their capabilities independently of lifecycle status
  • encode reasoning controls only where Together documents exact provider behavior

Evidence

Reasoning semantics

  • MiniMax M2.7 and deprecated DeepSeek R1: fixed reasoning (reasoning_options = []) based on positive fixed-mode evidence.
  • Documented hybrid models use toggle; GPT-OSS uses the exact effort values low, medium, and high; DeepSeek V4 Pro uses toggle plus the exact effort values high and max.
  • Nemotron retains its verified hybrid toggle. Its medium/high depth switch is intentionally not encoded as generic effort because Together exposes chat_template_kwargs={"medium_effort": true}, not reasoning_effort, and the generic schema cannot represent that transport-specific boolean accurately.
  • The exact endpoint pages for Google and Pearl Gemma 4 explicitly describe configurable thinking, so both retain reasoning = true. Together’s serverless reasoning table omits both endpoints, which conflicts with those pages, and no current first-party wire parameter or set of accepted values was found, so reasoning_options is intentionally omitted.
  • Deprecated MiniMax M2.5 retains reasoning = true because its exact endpoint page classifies it as a reasoning model. No positive Together evidence establishes fixed/toggle/effort/budget semantics, so its reasoning_options remain unresolved rather than fabricated.
  • Qwen3 235B A22B Instruct 2507 remains reasoning = false: Together recommends it for reasoning workloads and markets reasoning aptitude, but does not document reasoning-token output or a fixed/toggle/effort/budget mode.
  • All reasoning = false models omit reasoning_options.
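The rules above amount to a small classification over each model's reasoning fields. The sketch below is illustrative only: the ModelEntry shape, the option literals ("toggle", "low", "max", ...), and the classifyControls helper are hypothetical, not the repository's actual schema. It mirrors the semantics described in this PR: an empty list means fixed reasoning, an omitted list on a reasoning model means unresolved, and non-reasoning models carry no options at all.

```typescript
// Hypothetical option literals; the real schema may spell these differently.
type ReasoningOption = "toggle" | "low" | "medium" | "high" | "max";

interface ModelEntry {
  id: string;
  reasoning: boolean;
  // Omitted when control semantics are unknown, and always omitted when reasoning is false.
  reasoning_options?: ReasoningOption[];
}

type ControlKind = "none" | "fixed" | "toggle" | "effort" | "toggle+effort" | "unresolved";

function classifyControls(m: ModelEntry): ControlKind {
  if (!m.reasoning) {
    // Non-reasoning models must not carry any reasoning controls.
    if (m.reasoning_options !== undefined) {
      throw new Error(`${m.id}: non-reasoning model must omit reasoning_options`);
    }
    return "none";
  }
  const opts = m.reasoning_options;
  if (opts === undefined) return "unresolved"; // capability known, control semantics not
  if (opts.length === 0) return "fixed"; // always reasons, nothing to configure
  const hasToggle = opts.includes("toggle");
  const hasEffort = opts.some((o) => o !== "toggle"); // any non-toggle entry is an effort level
  if (hasToggle && hasEffort) return "toggle+effort";
  return hasToggle ? "toggle" : "effort";
}

// Usage: an allowlisted unresolved id from this PR and a hypothetical fixed-mode entry.
console.log(classifyControls({ id: "google/gemma-4-31B-it", reasoning: true })); // -> unresolved
console.log(classifyControls({ id: "example/fixed-model", reasoning: true, reasoning_options: [] })); // -> fixed
```

Treating "toggle" as the only non-effort literal keeps the toggle+effort case (as described for DeepSeek V4 Pro) distinct from pure effort models without a separate field.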

Metadata corrections

  • DeepSeek V4 Pro: $1.74 input / $3.48 output / $0.20 cached input per 1M tokens, effective June 9.
  • Google Gemma 4: $0.39 input / $0.97 output per 1M tokens, effective May 21; structured outputs enabled. The current catalog table still shows the pre-May-21 price, while the dated changelog and the exact model page show the effective values used here.
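The corrected prices can be sanity-checked by working out a request's cost directly. This is a minimal sketch under two assumptions: prices are USD per 1M tokens (Together's usual pricing unit, but assumed here), and cached input tokens are a subset of the input tokens billed at the cached rate instead of the full input rate. The Price shape, its cache_read field, and the requestCost helper are hypothetical.

```typescript
interface Price {
  input: number; // USD per 1M input tokens (assumed unit)
  output: number; // USD per 1M output tokens
  cache_read?: number; // USD per 1M cached input tokens, if a cached rate is listed
}

// Prices from the metadata corrections above.
const deepseekV4Pro: Price = { input: 1.74, output: 3.48, cache_read: 0.2 };
const gemma4: Price = { input: 0.39, output: 0.97 };

// Assumes cachedInput is counted inside inputTokens and billed at the cached rate.
function requestCost(p: Price, inputTokens: number, outputTokens: number, cachedInput = 0): number {
  if (cachedInput > inputTokens) throw new Error("cached tokens exceed input tokens");
  const cacheRate = p.cache_read ?? p.input; // no cached rate listed: bill at the full input rate
  const uncached = inputTokens - cachedInput;
  return (uncached * p.input + cachedInput * cacheRate + outputTokens * p.output) / 1_000_000;
}

// 100k input tokens (half of them cached) and 20k output tokens on DeepSeek V4 Pro.
console.log(requestCost(deepseekV4Pro, 100_000, 20_000, 50_000).toFixed(4));
```

With these assumptions, the example works out to (50,000 × 1.74 + 50,000 × 0.20 + 20,000 × 3.48) / 1,000,000 ≈ $0.1666.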

Matrix

  • 29 files: 22 active, 7 deprecated.
  • Active: 14 reasoning, 8 non-reasoning. Controls: 1 fixed, 8 toggle, 2 effort, 1 toggle+effort, 2 unresolved.
  • Deprecated: 4 reasoning, 3 non-reasoning. Controls: 1 fixed, 2 toggle, 1 unresolved.
  • Explicit unresolved allowlist: google/gemma-4-31B-it, pearl-ai/gemma-4-31b-it, MiniMaxAI/MiniMax-M2.5.
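The matrix counts are meant to be internally consistent: active plus deprecated files add up to 29, each group's control counts cover exactly its reasoning models, and the allowlist size equals the unresolved total. A quick arithmetic check over the numbers transcribed from the matrix (not computed from the model files themselves) might look like this; the variable names are illustrative.

```typescript
// Counts transcribed from the Matrix section, not derived from the repository.
const active = {
  reasoning: 14,
  nonReasoning: 8,
  controls: { fixed: 1, toggle: 8, effort: 2, toggleEffort: 1, unresolved: 2 },
};
const deprecated = {
  reasoning: 4,
  nonReasoning: 3,
  controls: { fixed: 1, toggle: 2, unresolved: 1 },
};
const allowlist = ["google/gemma-4-31B-it", "pearl-ai/gemma-4-31b-it", "MiniMaxAI/MiniMax-M2.5"];

const sum = (o: Record<string, number>) => Object.values(o).reduce((a, b) => a + b, 0);
const activeFiles = active.reasoning + active.nonReasoning;
const deprecatedFiles = deprecated.reasoning + deprecated.nonReasoning;

const checks: [string, boolean][] = [
  ["22 active files", activeFiles === 22],
  ["7 deprecated files", deprecatedFiles === 7],
  ["29 files total", activeFiles + deprecatedFiles === 29],
  ["active controls cover every active reasoning model", sum(active.controls) === active.reasoning],
  ["deprecated controls cover every deprecated reasoning model", sum(deprecated.controls) === deprecated.reasoning],
  ["allowlist size matches unresolved count", allowlist.length === active.controls.unresolved + deprecated.controls.unresolved],
];

for (const [name, ok] of checks) console.log(ok ? "ok  " : "FAIL", name);
```

All six checks pass for the stated numbers (for example, 1 + 8 + 2 + 1 + 2 = 14 active reasoning models).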

Verification

  • bun validate
  • bun test packages/core/test/sync-runner.test.ts (9 passed)
  • full 29-file matrix check with explicit unresolved allowlist
  • generated Together output assertions for corrected reasoning, status, structured output, and prices
  • git diff --check
  • no Together API credential was available, so no live inference tests were run
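The matrix check itself is not included in this description. The following is a hypothetical sketch of how an explicit unresolved allowlist could be enforced, assuming each model file parses to a record with id, status, reasoning, and an optional reasoning_options field. The ModelRecord shape and checkMatrix helper are illustrative, not the repository's code.

```typescript
interface ModelRecord {
  id: string;
  status: "active" | "deprecated";
  reasoning: boolean;
  reasoning_options?: string[];
}

// The explicit unresolved allowlist from the Matrix section.
const UNRESOLVED_ALLOWLIST = new Set([
  "google/gemma-4-31B-it",
  "pearl-ai/gemma-4-31b-it",
  "MiniMaxAI/MiniMax-M2.5",
]);

// Returns human-readable problems; an empty array means the matrix passes.
// status is deliberately not consulted: capabilities are kept independent of lifecycle.
function checkMatrix(models: ModelRecord[]): string[] {
  const problems: string[] = [];
  for (const m of models) {
    const hasOptions = m.reasoning_options !== undefined;
    if (!m.reasoning && hasOptions) {
      problems.push(`${m.id}: reasoning = false but reasoning_options is set`);
    }
    if (m.reasoning && !hasOptions && !UNRESOLVED_ALLOWLIST.has(m.id)) {
      problems.push(`${m.id}: reasoning_options missing and not allowlisted`);
    }
    if (hasOptions && UNRESOLVED_ALLOWLIST.has(m.id)) {
      problems.push(`${m.id}: allowlisted as unresolved but now has reasoning_options`);
    }
  }
  // Catch stale allowlist entries that no longer match any model file.
  const ids = new Set(models.map((m) => m.id));
  UNRESOLVED_ALLOWLIST.forEach((id) => {
    if (!ids.has(id)) problems.push(`${id}: allowlisted but no model file found`);
  });
  return problems;
}
```

Failing in both directions (a missing option set outside the allowlist, or an allowlisted model that gains options) keeps the allowlist from silently drifting as Together's documentation changes.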

Unresolved gaps

  • Gemma 4: exact pages establish configurable thinking, but current reasoning docs omit the endpoints and no exact Together wire control is documented.
  • MiniMax M2.5: historical reasoning capability is established, but exact control semantics are not.
  • Nemotron medium/high depth is documented but is not representable by current generic reasoning option types without mislabeling it as reasoning_effort.
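The Nemotron gap is easiest to see in the request body. This sketch contrasts what a generic effort option would send (a reasoning_effort string) with the boolean this PR says Together documents for Nemotron's depth switch (chat_template_kwargs.medium_effort). The ChatBody shape, both helper functions, and the model id are hypothetical placeholders; only the two field names come from this PR.

```typescript
type GenericEffort = "low" | "medium" | "high";

interface ChatBody {
  model: string;
  messages: { role: "user"; content: string }[];
  reasoning_effort?: GenericEffort;
  chat_template_kwargs?: Record<string, boolean>;
}

// What a generic effort option would emit: a top-level reasoning_effort string.
function genericEffortBody(model: string, prompt: string, effort: GenericEffort): ChatBody {
  return { model, messages: [{ role: "user", content: prompt }], reasoning_effort: effort };
}

// What Nemotron's medium/high depth switch needs instead: a boolean nested
// inside chat_template_kwargs, with no reasoning_effort field at all.
function nemotronDepthBody(model: string, prompt: string, mediumEffort: boolean): ChatBody {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    chat_template_kwargs: { medium_effort: mediumEffort },
  };
}

const nemotronModel = "example/nemotron"; // hypothetical placeholder, not a real Together model id
console.log(JSON.stringify(genericEffortBody(nemotronModel, "hi", "medium")));
console.log(JSON.stringify(nemotronDepthBody(nemotronModel, "hi", true)));
```

Because the two bodies differ in both field name and value type, encoding Nemotron's switch as a generic "medium"/"high" effort would produce a request shape the PR does not document as accepted.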
