99 changes: 97 additions & 2 deletions README.md
@@ -85,8 +85,103 @@ asyncio.run(main())
```

For request-level enforcement, use `SupertabConnect.handle_request()` with an
-`httpx.Request`. See the `examples` directory for complete merchant and customer
-examples.
`httpx.Request`. It extracts the license token from the `Authorization` header,
verifies it, optionally emits a relay analytics event, and applies bot detection
and enforcement mode when no token is present. It returns either
`{"action": HandlerAction.ALLOW, ...}` or
`{"action": HandlerAction.BLOCK, "status": ..., "body": ..., "headers": ...}`.
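
As a rough sketch of consuming that result, the snippet below turns an ALLOW or BLOCK result into something a web framework could act on. It assumes the result behaves like a mapping, as the shapes above suggest. The `HandlerAction` enum is a local stand-in so the snippet runs without the SDK installed (its values are assumptions), `to_http_response` is a hypothetical helper, and the status, body, and header values are purely illustrative.

```python
from __future__ import annotations

from enum import Enum
from typing import Any, Mapping


class HandlerAction(Enum):
    """Local stand-in for the SDK's HandlerAction; the values are assumptions."""

    ALLOW = "allow"
    BLOCK = "block"


def to_http_response(result: Mapping[str, Any]) -> tuple[int, dict[str, str], str] | None:
    """Return (status, headers, body) for a BLOCK result, or None for ALLOW.

    Hypothetical helper: None means "let the request continue to the origin".
    """
    if result["action"] is HandlerAction.BLOCK:
        headers = dict(result.get("headers") or {})
        return result["status"], headers, result.get("body") or ""
    return None


# Illustrative results only; real statuses, bodies, and headers come from the SDK.
blocked = {
    "action": HandlerAction.BLOCK,
    "status": 401,
    "body": "license token required",
    "headers": {"WWW-Authenticate": "License"},
}
allowed = {"action": HandlerAction.ALLOW}

print(to_http_response(allowed))
print(to_http_response(blocked))
```

In a real integration you would import `HandlerAction` from `supertab_connect` instead of defining it locally.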

`handle_request()` accepts an optional second argument, a `HandleRequestContext`,
which carries per-request signals supplied by an upstream CDN/proxy
(`source_cdn`, `client_ip`, `request_id`, `request_country`, `request_asn`,
`tls_fingerprint`, and `cdn_signals`). These are recorded on the analytics event
when present; for direct SDK use the context can be omitted.

`cdn_signals` is a `CdnRequestSignals` object carrying the richer
spoof-detection signals that cannot be read from the portable request — TLS
fingerprinting fields, the verified-bot category, the negotiated protocol, and
so on. These are platform-specific (for example, Cloudflare exposes them on
`request.cf`), so the SDK takes them from the caller rather than extracting them
itself. Everything left unset stays `null` on the event.

See the `examples` directory for complete merchant and customer examples.

## Analytics

The SDK can emit one analytics event per request to the Supertab Connect
**relay** endpoint at `{base_url}/ingest/events`. This is **off by default** —
enable it by passing `analytics_enabled=True`:

```python
from supertab_connect import SupertabConnect, SupertabConnectConfig

client = SupertabConnect(
    SupertabConnectConfig(
        api_key="stc_live_your_api_key",
        analytics_enabled=True,
    )
)
```

**No extra credentials are required.** Analytics requests are authenticated with
your configured merchant `api_key` using `Authorization: Bearer <api_key>`. The
backend derives merchant identity from the API key, so the SDK sends **no
merchant identifier** in the analytics payload.

Each `AnalyticsEvent` captures the request id, source CDN, a normalized client
IP, the request path (with percent-encoding preserved), method, and selected
headers — plus, when an upstream CDN exposes them via `HandleRequestContext`, the
request country, ASN, TLS fingerprint, and HTTP Message Signature headers — along
with the verification/enforcement decision for the request.
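
To make "normalized client IP" concrete, here is a condensed stand-in built on the standard `ipaddress` module. It follows the rules described in `supertab_connect/analytics/ip.py` in this change: IPv4 becomes `::ffff:<v4>`, valid IPv6 passes through with its original text, and anything else collapses to `::`. The name `normalize_ip_sketch` is hypothetical; the SDK's own function is `normalize_client_ip`.

```python
from __future__ import annotations

import ipaddress

UNSPECIFIED = "::"


def normalize_ip_sketch(raw: str | None) -> str:
    """Condensed stand-in for the SDK's client-IP normalization (name is hypothetical)."""
    candidate = (raw or "").strip()
    try:
        parsed = ipaddress.ip_address(candidate)
    except ValueError:
        # Empty, missing, or malformed input collapses to the unspecified address.
        return UNSPECIFIED
    # IPv4 is rewritten to the IPv4-mapped IPv6 form; IPv6 keeps its original text.
    return f"::ffff:{candidate}" if parsed.version == 4 else candidate


for sample in ("203.0.113.7", " 2001:db8::1 ", "not-an-ip", None):
    print(repr(sample), "->", normalize_ip_sketch(sample))
```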

Events emit at **`schema_version: 2`** ("capture v2"), which adds raw
spoof-detection signals for query-time classification in the warehouse (the SDK
never classifies — it emits raw signals only):

- **Portable header signals**, read directly from the request: `sec_fetch_*`,
the `sec_ch_ua*` client hints, `accept`, `host`, `has_cookies`, and
`header_names` — the lowercased, deduped, sorted set of request-header names
with edge-injected headers (`cf-*`, `fastly-*`, `cloudfront-*`,
`x-forwarded-*`, `x-real-ip`, the synthesized `Host`, …) stripped so it
reflects only what the client sent.
- **Query-string derived signals**: `query_length`, `query_param_count`, and
`query_suspicious` (a coarse exploit-marker heuristic). The raw query string
is **never** stored.
- **CDN plumbing** supplied via `HandleRequestContext.cdn_signals`:
`accept_encoding`, `http_protocol`, `tls_version`, `tls_cipher`,
`tls_client_hello_length`, `tls_client_extensions_sha1`, `as_organization`,
`client_tcp_rtt`, `cdn_verified_bot_category`, `request_priority`, and
`tls_fingerprint_ja4`.
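
The `header_names` derivation described in the first bullet can be sketched with the standard library alone. The prefix and name sets below are copied from `build_analytics_event.py` in this change; the function name `client_header_names` is hypothetical, and it takes a plain list of header names rather than an `httpx.Request` so that it runs without extra dependencies.

```python
EDGE_PREFIXES = ("cf-", "fastly-", "cloudfront-", "x-forwarded-")
EDGE_NAMES = frozenset({"x-real-ip", "x-original-request-url", "host"})


def client_header_names(raw_names):
    """Lowercase, dedupe, drop edge-injected names, and sort what remains."""
    lowered = {name.lower() for name in raw_names}
    return sorted(
        name
        for name in lowered
        if name not in EDGE_NAMES and not name.startswith(EDGE_PREFIXES)
    )


names = client_header_names(
    ["Host", "User-Agent", "CF-Ray", "X-Forwarded-For", "Accept", "accept", "X-Real-IP"]
)
print(names)  # ['accept', 'user-agent']
```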

`accept`, `sec_ch_ua`, and `as_organization` are truncated to 512 characters.
Every capture-v2 field is fail-open: anything unavailable is emitted as `null`.
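
The query-string signals listed above can be approximated as follows. This is a minimal sketch that mirrors the heuristic in `build_analytics_event.py` from this change, operating on a raw query string instead of a request object; the function name `query_signals` is an illustrative choice. Only the three derived values leave the function, never the query itself.

```python
from urllib.parse import unquote

SUSPICIOUS_MARKERS = ("../", "..\\", "union select", "<script", "onerror=", "/etc/passwd")


def query_signals(raw_query: str) -> tuple[int, int, bool]:
    """Return (query_length, query_param_count, query_suspicious) for a raw query string."""
    params = [part for part in raw_query.split("&") if part]
    # Check both the raw and the percent-decoded form so encoded payloads are caught.
    haystack = raw_query.lower() + "\n" + unquote(raw_query).lower()
    suspicious = any(marker in haystack for marker in SUSPICIOUS_MARKERS)
    return len(raw_query), len(params), suspicious


print(query_signals("page=2&sort=asc"))              # (15, 2, False)
print(query_signals("file=..%2F..%2Fetc%2Fpasswd"))  # (27, 1, True)
```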

**Fail-open:** analytics emission is fire-and-forget and can never block, slow,
or alter request handling. If emission fails, the error is swallowed and the
request proceeds exactly as it would with analytics disabled. Analytics is sent
only to the relay at `/ingest/events`, independent of billing event recording.
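
The SDK's actual emission code is not shown in this README, but the fail-open, fire-and-forget behaviour can be illustrated with a common `asyncio` pattern: schedule the send as a background task, swallow any exception inside it, and return the decision without awaiting it. Every name below (`flaky_send`, `emit_fire_and_forget`, `handle`) is hypothetical.

```python
from __future__ import annotations

import asyncio

_background_tasks: set[asyncio.Task] = set()


async def flaky_send(event: dict) -> None:
    """Hypothetical transport call that always fails, standing in for a network error."""
    raise ConnectionError("relay unreachable")


async def _emit_safely(send, event: dict) -> None:
    try:
        await send(event)
    except Exception:
        # Fail-open: analytics errors are swallowed and never reach the request path.
        pass


def emit_fire_and_forget(send, event: dict) -> None:
    """Schedule emission without awaiting it; must be called from a running event loop."""
    task = asyncio.create_task(_emit_safely(send, event))
    # Keep a strong reference so the task is not garbage-collected before it finishes.
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)


async def handle(event: dict) -> str:
    emit_fire_and_forget(flaky_send, event)
    return "allow"  # The decision is returned without waiting for analytics.


async def main() -> str:
    decision = await handle({"path": "/article"})
    await asyncio.sleep(0)  # Give the background task a chance to run before exit.
    return decision


result = asyncio.run(main())
print(result)
```

Even though the transport raises on every call, the handler still returns its decision and no exception propagates.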

Point analytics at another environment by setting `supertab_base_url` on the
config (or `SupertabConnect.set_base_url(...)`).

For advanced use, the `AnalyticsTransport` protocol lets you inject a custom
transport (for example, an in-memory recorder in tests) via the internal
`analytics_transport` config field; `AnalyticsEvent` and `HandleRequestContext`
are exported from the package root.
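
An in-memory recorder for tests might look like the sketch below. The method name and signature of `AnalyticsTransport` are not shown in this README, so the `send` coroutine and the `TransportLike` protocol are assumptions; check `supertab_connect/analytics/types.py` for the real protocol before adapting this.

```python
import asyncio
from typing import Any, Protocol


class TransportLike(Protocol):
    """Stand-in for AnalyticsTransport; the `send` name and signature are assumptions."""

    async def send(self, event: Any) -> None: ...


class RecordingTransport:
    """Collects events in memory so a test can assert on what would have been emitted."""

    def __init__(self) -> None:
        self.events: list[Any] = []

    async def send(self, event: Any) -> None:
        self.events.append(event)


async def emit(transport: TransportLike, event: Any) -> None:
    # Stands in for the SDK handing one event to whichever transport is configured.
    await transport.send(event)


recorder = RecordingTransport()
asyncio.run(emit(recorder, {"schema_version": 2, "path": "/article"}))
print(recorder.events)
```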

### Native Fastly logging (not applicable to the Python SDK)

The TypeScript SDK can deliver analytics through a **native Fastly Compute
logging endpoint** (`FastlyLogTransport` / the `logEndpoint` option on
`fastlyHandleRequests`) instead of the HTTP relay, letting Fastly ship events
off-path to S3. That path is intentionally **not ported here**: Python does not
run on Fastly Compute (the `fastly:logger` built-in has no Python equivalent),
and — consistent with this SDK's design — the Python SDK does not embed CDN edge
handlers, receiving CDN-derived signals through `HandleRequestContext` instead.

If you need to deliver analytics somewhere other than the relay (for example, to
a log shipper that forwards to S3/Tinybird), implement the `AnalyticsTransport`
protocol and pass it via the `analytics_transport` config field.

## Error Handling

2 changes: 1 addition & 1 deletion examples/merchant_handle_request.py
@@ -16,7 +16,7 @@ async def main() -> None:
     client = SupertabConnect(
         SupertabConnectConfig(
             api_key="your_api_key",
-            enforcement=EnforcementMode.STRICT,
+            enforcement=EnforcementMode.ENFORCE,
             debug=True,
         )
     )
2 changes: 1 addition & 1 deletion examples/merchant_verify_and_record_event.py
@@ -14,7 +14,7 @@ async def main() -> None:
     client = SupertabConnect(
         SupertabConnectConfig(
             api_key="your_api_key",
-            enforcement=EnforcementMode.SOFT,
+            enforcement=EnforcementMode.OBSERVE,
             debug=True,
         )
     )
10 changes: 10 additions & 0 deletions supertab_connect/__init__.py
@@ -1,12 +1,18 @@
"""Supertab Connect SDK."""

from supertab_connect.analytics.types import (
    AnalyticsEvent,
    AnalyticsTransport,
    CdnRequestSignals,
)
from supertab_connect.customer.token import obtain_license_token
from supertab_connect.exceptions import SupertabConnectError
from supertab_connect.merchant.bots import default_bot_detector
from supertab_connect.merchant.client import SupertabConnect
from supertab_connect.merchant.license import verify_license_token
from supertab_connect.types import (
    EnforcementMode,
    HandleRequestContext,
    HandlerAction,
    HandlerResult,
    RSLVerificationResult,
@@ -15,7 +21,11 @@
)

__all__ = [
    "AnalyticsEvent",
    "AnalyticsTransport",
    "CdnRequestSignals",
    "EnforcementMode",
    "HandleRequestContext",
    "HandlerAction",
    "HandlerResult",
    "RSLVerificationResult",
43 changes: 43 additions & 0 deletions supertab_connect/analytics/__init__.py
@@ -0,0 +1,43 @@
"""Relay analytics for Supertab Connect (mirrors the TS SDK `analytics/` module)."""

from supertab_connect.analytics.build_analytics_event import (
    BuildAnalyticsEventContext,
    build_analytics_event,
)
from supertab_connect.analytics.ip import normalize_client_ip
from supertab_connect.analytics.transport import (
    ANALYTICS_EVENTS_PATH,
    HttpAnalyticsTransport,
    NoopAnalyticsTransport,
    aclose_http_client as aclose_analytics_http_client,
)
from supertab_connect.analytics.types import (
    SCHEMA_VERSION,
    TOKEN_OUTCOME_BY_REASON,
    AnalyticsEvent,
    AnalyticsTransport,
    CdnRequestSignals,
    Decision,
    FinalAction,
    SourceCdn,
    TokenOutcome,
)

__all__ = [
    "ANALYTICS_EVENTS_PATH",
    "SCHEMA_VERSION",
    "TOKEN_OUTCOME_BY_REASON",
    "AnalyticsEvent",
    "AnalyticsTransport",
    "BuildAnalyticsEventContext",
    "CdnRequestSignals",
    "Decision",
    "FinalAction",
    "HttpAnalyticsTransport",
    "NoopAnalyticsTransport",
    "SourceCdn",
    "TokenOutcome",
    "aclose_analytics_http_client",
    "build_analytics_event",
    "normalize_client_ip",
]
172 changes: 172 additions & 0 deletions supertab_connect/analytics/build_analytics_event.py
@@ -0,0 +1,172 @@
"""Build a relay AnalyticsEvent from a request + decision (mirrors TS `buildAnalyticsEvent.ts`)."""

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import unquote

from httpx import Request

from supertab_connect.analytics.ip import normalize_client_ip
from supertab_connect.analytics.types import (
    SCHEMA_VERSION,
    AnalyticsEvent,
    CdnRequestSignals,
    Decision,
    EnforcementWire,
    SourceCdn,
)
from supertab_connect.types import EnforcementMode

# Defensive cap on client-controlled free-form strings, applied at the edge (mirrored by the relay).
MAX_FIELD_LENGTH = 512

# Edge-injected headers are CDN artifacts, not client signals — strip them so ``header_names``
# reflects only what the client actually sent. Covers all three CDNs: Cloudflare (``cf-*``),
# Fastly (``fastly-*``), CloudFront (``cloudfront-*``), the shared ``x-forwarded-*`` / ``x-real-ip``,
# and the SDK's own routing header ``x-original-request-url``.
_EDGE_HEADER_PREFIXES = ("cf-", "fastly-", "cloudfront-", "x-forwarded-")
# ``host`` is included here because httpx synthesizes a Host header on Request construction; the JS
# fetch ``Request`` hides it as a forbidden header, so the TS SDK never emits it in ``header_names``.
# Stripping it keeps the cross-SDK header-name set consistent (host is captured in its own field).
_EDGE_HEADER_NAMES = frozenset({"x-real-ip", "x-original-request-url", "host"})

# Mechanical exploit markers for the query-string heuristic, matched case-insensitively against the
# raw and URL-decoded query. A coarse signal only — real classification stays query-time in the
# warehouse.
_SUSPICIOUS_QUERY_MARKERS = (
    "../",
    "..\\",
    "union select",
    "<script",
    "onerror=",
    "/etc/passwd",
)


@dataclass(frozen=True)
class BuildAnalyticsEventContext:
    # Omitted (None) when the request did not pass through a CDN (e.g. invoked directly via the SDK).
    source_cdn: SourceCdn | None = None
    request_id: str | None = None
    client_ip: str | None = None
    timestamp: datetime | None = None
    request_country: str | None = None
    request_asn: int | None = None
    tls_fingerprint: str | None = None
    # CDN plumbing not derivable from the portable Request (request.cf, etc.).
    cdn_signals: CdnRequestSignals | None = None


def _iso_utc(value: datetime) -> str:
    """Format as ``YYYY-MM-DDTHH:MM:SS.mmmZ`` to match the TS `Date.toISOString()` wire form."""
    return value.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def _safe_pathname(request: Request) -> str:
    """Return the request path with percent-encoding preserved.

    ``request.url.path`` percent-*decodes* (``/a%2Fb`` → ``/a/b``), which loses encoded path
    semantics. We read ``raw_path`` (``path[?query]`` bytes), drop the query, and decode without
    URL-decoding — matching the TS SDK's ``new URL(request.url).pathname``.
    """
    path_bytes = request.url.raw_path.split(b"?", 1)[0]
    return path_bytes.decode("utf-8", "replace")


def _enforcement_to_wire(mode: EnforcementMode) -> EnforcementWire:
    # EnforcementMode values are already the wire strings ("observe"/"enforce"/"disabled").
    return mode.value  # type: ignore[return-value]


def _truncate(value: str | None, max_length: int = MAX_FIELD_LENGTH) -> str | None:
    if value is None:
        return None
    return value[:max_length] if len(value) > max_length else value


def _is_edge_header(name: str) -> bool:
    if name in _EDGE_HEADER_NAMES:
        return True
    return any(name.startswith(prefix) for prefix in _EDGE_HEADER_PREFIXES)


def _collect_header_names(request: Request) -> list[str]:
    names = {name.lower() for name in request.headers.keys()}
    return sorted(name for name in names if not _is_edge_header(name))


def _query_signals(request: Request) -> tuple[int, int, bool]:
    # request.url.query is the raw, percent-encoded query bytes (no leading "?"), matching the
    # TS SDK's ``url.search.slice(1)``. The raw query itself is never stored on the event.
    raw = request.url.query.decode("utf-8", "replace")
    params = [p for p in raw.split("&") if p] if raw else []

    haystack = raw.lower() + "\n" + unquote(raw).lower()
    suspicious = any(marker in haystack for marker in _SUSPICIOUS_QUERY_MARKERS)

    return len(raw), len(params), suspicious


def build_analytics_event(
    request: Request,
    decision: Decision,
    context: BuildAnalyticsEventContext,
) -> AnalyticsEvent:
    headers = request.headers
    timestamp = context.timestamp if context.timestamp is not None else datetime.now(timezone.utc)
    request_id = context.request_id if context.request_id is not None else str(uuid.uuid4())
    query_length, query_param_count, query_suspicious = _query_signals(request)
    cdn = context.cdn_signals if context.cdn_signals is not None else CdnRequestSignals()

    return AnalyticsEvent(
        timestamp=_iso_utc(timestamp),
        request_id=request_id,
        schema_version=SCHEMA_VERSION,
        source_cdn=context.source_cdn,
        user_agent=headers.get("user-agent", ""),
        client_ip=normalize_client_ip(context.client_ip),
        path=_safe_pathname(request),
        method=request.method,
        referer=headers.get("referer", ""),
        accept_language=headers.get("accept-language", ""),
        request_country=context.request_country,
        request_asn=context.request_asn,
        tls_fingerprint=context.tls_fingerprint,
        has_token=decision.has_token,
        token_outcome=decision.token_outcome,
        final_action=decision.final_action,
        enforcement_mode=_enforcement_to_wire(decision.enforcement_mode),
        signature_agent=headers.get("signature-agent"),
        signature_input=headers.get("signature-input"),
        signature=headers.get("signature"),
        # --- Capture v2: portable header signals ---
        sec_fetch_mode=headers.get("sec-fetch-mode"),
        sec_fetch_site=headers.get("sec-fetch-site"),
        sec_fetch_dest=headers.get("sec-fetch-dest"),
        sec_fetch_user=headers.get("sec-fetch-user"),
        sec_ch_ua=_truncate(headers.get("sec-ch-ua")),
        sec_ch_ua_mobile=headers.get("sec-ch-ua-mobile"),
        sec_ch_ua_platform=headers.get("sec-ch-ua-platform"),
        accept=_truncate(headers.get("accept")),
        # httpx synthesizes the Host header from the URL, so this is effectively the parsed host.
        host=headers.get("host") or request.url.host or None,
        has_cookies="cookie" in headers,
        header_names=_collect_header_names(request),
        # Query-string derived signals (raw query never stored).
        query_length=query_length,
        query_param_count=query_param_count,
        query_suspicious=query_suspicious,
        # --- Capture v2: CDN plumbing (passthrough from the handler context) ---
        accept_encoding=cdn.accept_encoding,
        http_protocol=cdn.http_protocol,
        tls_version=cdn.tls_version,
        tls_cipher=cdn.tls_cipher,
        tls_client_hello_length=cdn.tls_client_hello_length,
        tls_client_extensions_sha1=cdn.tls_client_extensions_sha1,
        as_organization=_truncate(cdn.as_organization),
        client_tcp_rtt=cdn.client_tcp_rtt,
        cdn_verified_bot_category=cdn.cdn_verified_bot_category,
        request_priority=cdn.request_priority,
        tls_fingerprint_ja4=cdn.tls_fingerprint_ja4,
    )
27 changes: 27 additions & 0 deletions supertab_connect/analytics/ip.py
@@ -0,0 +1,27 @@
"""Client-IP normalization (mirrors TS `analytics/ip.ts`).

IPv4 addresses are converted to their IPv4-mapped IPv6 form (``::ffff:<v4>``); valid IPv6
addresses pass through unchanged; anything else collapses to the unspecified address (``::``).
"""

import ipaddress

UNSPECIFIED = "::"


def normalize_client_ip(raw: str | None) -> str:
    if not raw:
        return UNSPECIFIED
    trimmed = raw.strip()
    if not trimmed:
        return UNSPECIFIED

    try:
        parsed = ipaddress.ip_address(trimmed)
    except ValueError:
        return UNSPECIFIED

    if parsed.version == 4:
        return f"::ffff:{trimmed}"
    # IPv6 passes through unchanged (the original textual form, not a re-compressed one).
    return trimmed