Impact analysis: URL normalization long-term fix #248

@Kakudou

Context

A temporary dependency constraint is being tracked separately to keep requests below 2.34.0 while preserving the security update from 2.33.0.

Related issue:

The reason for this constraint is that requests 2.34.0 changed URL path handling and no longer strips duplicate leading slashes in URI paths.

Official Requests release notes:

This can expose unsafe URL construction patterns where a configured base URL ending with / is combined with a path starting with /.

Example:

base_url = "https://example.com/"
endpoint = base_url + "/graphql"

Result:

https://example.com//graphql

With requests >=2.34.0, the duplicated slash may be preserved when the request is sent, which can break servers or routers that treat //graphql differently from /graphql.
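
To make the difference concrete, here is a minimal sketch that inspects the concatenated URL with the standard library only. It does not call requests; it just shows the path a server or router would receive if the client sends the URL unchanged.

```python
from urllib.parse import urlsplit

base_url = "https://example.com/"
endpoint = base_url + "/graphql"

# The path component keeps both slashes.
path = urlsplit(endpoint).path

# Splitting on "/" reveals the extra empty segment that a strict router
# may fail to match against a route registered as "/graphql".
segments = path.split("/")

print(path)      # //graphql
print(segments)  # ['', '', 'graphql']
```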

Goal

Evaluate the impact of applying the long-term fix: normalize URL construction across the codebase so we no longer rely on requests to clean malformed paths.

This issue is only for analysis and estimation. The implementation can be handled in a follow-up issue if needed.

Scope

Review places where URLs are built manually, especially when a configurable base URL is combined with an application path.

Patterns to check include:

url = base_url + "/path"
url = f"{base_url}/path"
url = "{}/path".format(base_url)
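
As a starting point for the review, the three patterns above could be located with a rough line-based scan. The sketch below is a heuristic only: the names `RISKY_PATTERNS` and `find_risky_lines` are hypothetical, the regular expressions will miss multi-line expressions and may flag harmless strings, and an AST-based check would likely be more reliable.

```python
import re

# Hypothetical heuristics, one per pattern listed in the scope.
RISKY_PATTERNS = {
    # something + "/path"
    "concatenation": re.compile(r"""\+\s*f?["']/"""),
    # f"{base_url}/path"
    "f-string": re.compile(r"""\bf["'][^"']*\{[^{}]+\}/"""),
    # "{}/path".format(base_url)
    "str.format": re.compile(r"""["']\{[^{}]*\}/[^"']*["']\.format\("""),
}


def find_risky_lines(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, pattern name, stripped line) for each match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name, line.strip()))
    return findings


sample = (
    'url = base_url + "/path"\n'
    'url = f"{base_url}/path"\n'
    'url = "{}/path".format(base_url)\n'
    'url = build_url(base_url, "path")\n'
)
for lineno, name, line in find_risky_lines(sample):
    print(lineno, name, line)
```

Each hit still needs manual review, since whether it is a real problem depends on whether the base URL can end with a slash at runtime.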

The review should focus on API clients, connector clients, SDK helpers, authentication endpoints, GraphQL endpoints, REST endpoints, health-check endpoints, upload/download URLs, webhook URLs, and callback URLs.

Expected analysis output

The analysis should produce a short summary with:

  • affected files or modules
  • examples of risky URL construction
  • risk level for each area: low / medium / high
  • recommended fix strategy
  • estimated implementation effort
  • tests that should be added or updated

Long-term fix direction

The preferred direction is to centralize URL construction instead of repeating raw string concatenation throughout the codebase.

Python's standard URL utilities in urllib.parse should be evaluated for this, especially urljoin or a small wrapper around it.

References:

Example direction:

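from urllib.parse import urljoin


def build_url(base_url: str, path: str) -> str:
    # Force exactly one trailing slash so urljoin keeps the last base segment.
    normalized_base = base_url.rstrip("/") + "/"
    # Drop leading slashes so the path resolves relative to the base path.
    normalized_path = path.lstrip("/")
    return urljoin(normalized_base, normalized_path)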
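
The sketch below exercises that helper with the slash combinations most likely to appear in configuration, and contrasts it with calling urljoin directly. It could serve as a seed for the tests the analysis should identify; the URLs are illustrative values, not taken from the codebase.

```python
from urllib.parse import urljoin


def build_url(base_url: str, path: str) -> str:
    # Same helper as in the example direction.
    normalized_base = base_url.rstrip("/") + "/"
    normalized_path = path.lstrip("/")
    return urljoin(normalized_base, normalized_path)


# Every slash combination should collapse to a single separator.
cases = [
    ("https://example.com", "graphql"),
    ("https://example.com/", "graphql"),
    ("https://example.com", "/graphql"),
    ("https://example.com/", "/graphql"),
]
results = [build_url(base, path) for base, path in cases]

# A base URL with its own path prefix keeps that prefix.
prefixed = build_url("https://example.com/api", "/v1/items")

# Plain urljoin, by contrast, drops the "api" prefix in both of these calls.
raw_absolute = urljoin("https://example.com/api", "/v1/items")
raw_relative = urljoin("https://example.com/api", "v1/items")

print(results[0])    # https://example.com/graphql
print(prefixed)      # https://example.com/api/v1/items
print(raw_absolute)  # https://example.com/v1/items
print(raw_relative)  # https://example.com/v1/items
```

The contrast is the reason for normalizing both sides before the join: deployments that serve the API under a path prefix would otherwise lose that prefix silently.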

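The analysis should also check whether any path values can be user-controlled, because urljoin lets its second argument override or escape the base: an absolute URL, a network-path reference such as //host/path, or .. segments can all redirect the request away from the configured base URL.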
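
To illustrate the concern, the following sketch shows how urljoin resolves three kinds of hostile path, followed by one possible guard. `build_url_checked` is a hypothetical name and its rejection rules are an assumption about what a stricter wrapper might enforce, not a decided design.

```python
from urllib.parse import urljoin, urlsplit

BASE = "https://example.com/api/"

# urljoin follows RFC 3986 reference resolution, so the second argument
# can replace the host or climb out of the base path.
absolute = urljoin(BASE, "https://evil.example/x")
network_path = urljoin(BASE, "//evil.example/x")
parent = urljoin(BASE, "../admin")


def build_url_checked(base_url: str, path: str) -> str:
    """Join base_url and path, rejecting paths that could leave the base."""
    parts = urlsplit(path)
    if parts.scheme or parts.netloc:
        raise ValueError(f"path must not carry a scheme or host: {path!r}")
    if ".." in parts.path.split("/"):
        raise ValueError(f"path must not contain '..' segments: {path!r}")
    normalized_base = base_url.rstrip("/") + "/"
    return urljoin(normalized_base, path.lstrip("/"))


print(absolute)      # https://evil.example/x
print(network_path)  # https://evil.example/x
print(parent)        # https://example.com/admin
print(build_url_checked("https://example.com/api", "/v1/items"))
# https://example.com/api/v1/items
```

Rejecting such input outright, rather than silently rewriting it, keeps a misconfigured or malicious path visible to the caller. Whether that is the right trade-off for each call site is part of what the analysis should decide.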

Acceptance criteria

  • URL construction patterns have been reviewed across the relevant codebase.
  • Potentially affected locations have been listed.
  • Each affected area has a risk level: low / medium / high.
  • A recommended URL normalization strategy has been proposed.
  • The implementation effort has been estimated.
  • Required tests have been identified.
  • A follow-up implementation issue can be created from the analysis.

Labels: dependencies, filigran team, needs triage
