Monitors the Salesforce Agentforce Life Sciences admin guide for content changes and sends Telegram notifications when documentation is updated.
- TOC extraction — fetches the main guide page and extracts all article URLs from the table of contents using Puppeteer + browserless/chrome (required because the site is fully JavaScript-rendered via Salesforce Experience Cloud / LWC).
- New section detection — compares the current TOC against the previous run and notifies if new articles appear.
- Content change detection — fetches each article, extracts visible text, and computes a SHA-256 hash. If the hash differs from the stored one, the content has changed.
- Git-based diffing: full article text is saved as `.txt` files in a `content/` git repository. On the first run a baseline commit is made. On subsequent runs, any changes produce a new commit, so you can review exactly what changed with `git diff` or `git log -p`.
- Telegram notifications: you receive a message when any of the following happens.
  - A new section appears in the guide
  - One or more articles change (with a link to the content repo)
  - A critical error occurs (browser unreachable, TOC unavailable)
  - More than 20% of pages fail in a single run
- Optional push: if `CONTENT_REMOTE` is set, the content repo is pushed to GitHub automatically after every commit.

The watcher runs inside a Docker container that stays up permanently. A cron job inside the container triggers the script at the configured time (default: 2:00 AM daily). You do not need to configure anything on the host machine; just keep the container running with `docker compose up -d`.

To change the schedule, set `CRON_SCHEDULE` in your `.env` using standard cron syntax:

    CRON_SCHEDULE=0 2 * * *   # 2:00 AM every day (default)
    CRON_SCHEDULE=0 8 * * 1   # 8:00 AM every Monday

- Docker and Docker Compose
- A Telegram bot and chat ID (see below)

Clone the repo:

    git clone git@github.com:dcarnicer/sf-documentation-watcher.git
    cd sf-documentation-watcher

- Open Telegram and search for @BotFather
- Send `/newbot` and follow the prompts, then copy the token you receive
- Send any message to your new bot (so it has a chat to respond to)
- Run the helper script to get your `chat_id`:

      TELEGRAM_TOKEN=123456:ABC... node get-chat-id.mjs

It prints your `chat_id`. Copy it for the next step.
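
For context, a helper of this kind can read the chat ID from the Telegram Bot API's `getUpdates` method, which returns recent messages sent to the bot. The sketch below is an assumption about how such a helper might work, not the contents of `get-chat-id.mjs`; `extractChatIds` and `printChatIds` are hypothetical names.

```javascript
// Hypothetical sketch: pull chat IDs out of a Telegram Bot API getUpdates response.
function extractChatIds(response) {
  if (!response || response.ok !== true || !Array.isArray(response.result)) return [];
  const ids = new Set();
  for (const update of response.result) {
    // Direct messages arrive under "message", channel posts under "channel_post".
    const chat = update.message?.chat ?? update.channel_post?.chat;
    if (chat && chat.id !== undefined) ids.add(chat.id);
  }
  return [...ids];
}

// Fetch pending updates for the bot and print every distinct chat ID found.
async function printChatIds(token) {
  const res = await fetch(`https://api.telegram.org/bot${token}/getUpdates`);
  const ids = extractChatIds(await res.json());
  console.log(ids.length ? ids.join('\n') : 'No messages yet: send one to the bot first.');
}
```

Calling `printChatIds(process.env.TELEGRAM_TOKEN)` would perform the lookup. An empty result usually means the bot has not received a message yet, which is why the step of messaging the bot comes first.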

    cp .env.example .env

Edit `.env` with your values:

    TELEGRAM_TOKEN=123456:ABC-your-token-here
    TELEGRAM_CHAT_ID=123456789

If you want the downloaded article snapshots pushed to GitHub so you can browse diffs online:

- Create a new empty repository on GitHub (no README, no .gitignore)
- Add its SSH URL to `.env`:

      CONTENT_REMOTE=git@github.com:your-username/your-content-repo.git

- Make sure the machine running the watcher has an SSH key added to your GitHub account. Inside Docker, mount your SSH key by adding this to the `watcher` service in `docker-compose.yml`:

      volumes:
        - watcher-data:/data
        - ~/.ssh:/root/.ssh:ro

The first run pushes a baseline commit with all 192 articles. Subsequent runs push only when content changes.

    docker compose up -d

Docker pulls the images, builds the watcher container, and starts the cron schedule. The first run creates the baseline (no Telegram notification is sent). From the second run onwards, any changes trigger a notification.

All commands below run on the host machine; you do not need to enter the container.

    # Live logs from the watcher (updated on each cron run)
    docker compose logs -f watcher

    # Run manually without waiting for the cron schedule
    docker exec sfdc-watcher node /app/sfdc-watcher.mjs

    # See which articles changed in the last commit
    docker exec sfdc-watcher git -C /data/content log -1 --name-only

    # See exactly what changed (full diff)
    docker exec sfdc-watcher git -C /data/content log -p -1

    # Open a shell inside the container if you need to explore further
    docker exec -it sfdc-watcher bash

To stop the watcher:

    docker compose down

Data in the `watcher-data` volume (content repo, state, logs) is preserved across restarts. To delete it as well:

    docker compose down -v

- Install Docker
- Clone the repo:

      git clone git@github.com:dcarnicer/sf-documentation-watcher.git
      cd sf-documentation-watcher

- Create `.env` with your credentials (same as step 3 above)
- Start:

      docker compose up -d

The first run rebuilds the baseline (no notification). From the second run onwards, changes trigger Telegram notifications.
When run interactively, the watcher shows a live interface that updates in place:

    ────────────────────────────────────────────────────────
    SF Documentation Watcher · 2026-04-01 02:00:00
    ────────────────────────────────────────────────────────
    ◆ TOC: 192 pages found
    ████████░░░░░░░░░░░░░░░░░░░░░░ 52/192 27% eta 213s
    ⟳ ind.lsc_customer_engagement_personas.htm
    ~ 3 changed
    ✗ 1 error(s) ind.lsc_something.htm
    ────────────────────────────────────────────────────────
    ✓ 192 pages · 3 changed · 1 error · 47m 12s
    ────────────────────────────────────────────────────────
    Failed pages:
    • ind.lsc_something.htm

When running via cron (no TTY), it automatically falls back to plain line-by-line output with no ANSI codes, suitable for log files.
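
The TTY fallback can be pictured with a small sketch like the one below. It is only an illustration under stated assumptions: `renderBar`, `formatProgress`, and `report` are hypothetical names rather than the functions in `ui.mjs`, and the 30-character bar width is inferred from the sample output above.

```javascript
// Hypothetical sketch: a progress line that degrades to plain text without a TTY.

// Build a fixed-width bar, e.g. 52/192 at width 30 gives 8 filled cells.
function renderBar(done, total, width = 30) {
  const ratio = total > 0 ? done / total : 0;
  const filled = Math.min(width, Math.max(0, Math.round(ratio * width)));
  return '█'.repeat(filled) + '░'.repeat(width - filled);
}

// Interactive terminals get the bar; log files get a plain ASCII line.
function formatProgress(done, total, isTTY) {
  const pct = total > 0 ? Math.round((done / total) * 100) : 0;
  return isTTY
    ? `${renderBar(done, total)} ${done}/${total} ${pct}%`
    : `[progress] ${done}/${total} (${pct}%)`;
}

// Write one progress update to the given stream (stdout by default).
function report(done, total, stream = process.stdout) {
  const isTTY = Boolean(stream.isTTY);
  const line = formatProgress(done, total, isTTY);
  // On a TTY, "\r" moves back to the start of the line so the next write overwrites it.
  stream.write(isTTY ? `\r${line}` : `${line}\n`);
}
```

Node.js sets `process.stdout.isTTY` only when output is attached to a terminal, so under cron or when piped to a file the plain branch is taken automatically.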

| Package | Version | Purpose |
|---|---|---|
| `puppeteer-core` | latest | Headless browser automation; connects to the browserless/chrome container |
| `chalk` | latest | Terminal colors and styling |

Docker images:

| Image | Purpose |
|---|---|
| `browserless/chrome` | Headless Chrome over WebSocket; required because Salesforce Help pages are fully JS-rendered |

- 15 second cooldown between pages
- Reconnects to the browser every 20 pages to prevent connection drops
- Each page times out after 60 seconds; failed pages are retried once
- Full run takes ~50 minutes for 192 pages
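
The pacing rules above can be sketched as a simple loop. This is a minimal illustration, not the watcher's implementation: `crawl`, `withTimeout`, and the option names are hypothetical, and `fetchPage` and `reconnect` stand in for whatever the script does with Puppeteer. The defaults mirror the numbers in the list.

```javascript
// Hypothetical sketch of the cooldown, reconnect, timeout, and retry rules.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Reject if the promise does not settle within ms milliseconds.
// Note: this does not cancel the underlying work, it only stops waiting for it.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function crawl(urls, fetchPage, reconnect, opts = {}) {
  const { cooldownMs = 15_000, reconnectEvery = 20, timeoutMs = 60_000 } = opts;
  const failed = [];
  for (let i = 0; i < urls.length; i++) {
    // Refresh the browser connection every N pages.
    if (i > 0 && i % reconnectEvery === 0) await reconnect();
    let ok = false;
    // One initial attempt plus a single retry.
    for (let attempt = 0; attempt < 2 && !ok; attempt++) {
      try {
        await withTimeout(fetchPage(urls[i]), timeoutMs);
        ok = true;
      } catch {
        // Fall through to the retry, or record the failure below.
      }
    }
    if (!ok) failed.push(urls[i]);
    // Cooldown between pages, skipped after the last one.
    if (i < urls.length - 1) await sleep(cooldownMs);
  }
  // More than 20% failed pages in one run is treated as alert-worthy.
  const alert = urls.length > 0 && failed.length / urls.length > 0.2;
  return { failed, alert };
}
```

With these defaults the cooldown alone accounts for 191 × 15 s, or about 48 minutes, which is consistent with the roughly 50-minute full run quoted above.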
- Content validation: before saving a page, check that the downloaded text is plausible, using a minimum length threshold, the absence of error strings ("Sorry to interrupt", "Page not found", "Access denied", "CSS Error"), and basic sanity checks. Invalid pages would be skipped and retried on the next run rather than overwriting a good snapshot with a bad one. A standalone audit script could also scan the entire `content/` repo and report suspicious files.
- Multi-guide support: the watcher is currently hardcoded to the Agentforce Life Sciences admin guide. All Salesforce Help guides share the same URL pattern (`help.salesforce.com/s/articleView?id=...`) and page structure (LWC TOC, same DOM layout), so adding support for multiple guides via configuration would be straightforward. The main change would be accepting a list of source URLs in `.env` and namespacing the content files by guide.
- Salesforce Developer Guides: developer documentation lives at `developer.salesforce.com` and uses a different site structure (static HTML, different navigation). It would need a separate fetcher and TOC extractor, but could share the same git-based diffing and Telegram notification logic.
- AI-powered change summaries: instead of sending a raw list of changed files, use the Claude API to summarise the diff in plain language before the Telegram notification. The git diff is already available after each commit, so the flow would be: diff → Claude API → human-readable summary (e.g. "The installation prerequisites for package X have changed and a new step has been added to section Y") → Telegram. Optional feature, requires an Anthropic API key.
- RAG-ready export: generate chunked, structured files from the article text suitable for ingestion into a vector database or knowledge base. Each chunk would include metadata (article ID, URL, section title, last updated) alongside the content, making it straightforward to build a retrieval-augmented generation pipeline on top of the documentation for AI agents to consume.
- Google Drive integration: automatically upload the latest article snapshots (or the RAG-ready export) to a Google Drive folder after each run, so the documentation is accessible to other tools and team members without needing access to the git repo. It would use the Google Drive API with a service account for authentication.
- Cascade failure detection: if a configurable number of consecutive errors is reached mid-run (e.g. 10 in a row), assume the browser or the site is temporarily unavailable, send a Telegram alert, sleep for a few hours, and then resume from where it left off rather than aborting the entire run. This would avoid losing a full night's run due to a transient issue.
- Confluence integration (low priority): publish the downloaded documentation to a Confluence space, keeping pages in sync with the Salesforce source. When changes are detected, the corresponding Confluence page would be updated automatically via the Confluence REST API.
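
The content validation idea from the list above could look roughly like the sketch below. Since the feature is only proposed, everything here is an assumption: `validateContent` is a hypothetical name, the 200-character minimum is an arbitrary placeholder, and the error strings are the ones quoted in the roadmap item.

```javascript
// Hypothetical sketch of the proposed plausibility check for downloaded pages.
const ERROR_MARKERS = ['Sorry to interrupt', 'Page not found', 'Access denied', 'CSS Error'];

// Returns { valid, reason }. minLength is an assumed threshold, not a tuned value.
function validateContent(text, minLength = 200) {
  const trimmed = (text ?? '').trim();
  if (trimmed.length < minLength) {
    return { valid: false, reason: `too short (${trimmed.length} < ${minLength} characters)` };
  }
  // Case-insensitive search for known error-page strings.
  const lower = trimmed.toLowerCase();
  const marker = ERROR_MARKERS.find((m) => lower.includes(m.toLowerCase()));
  if (marker) {
    return { valid: false, reason: `contains error string "${marker}"` };
  }
  return { valid: true, reason: null };
}
```

A plain substring match would also flag a legitimate article that happens to quote one of these strings, so a real implementation might restrict the check to short pages or to the first few lines of text.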

| File | Description |
|---|---|
| `sfdc-watcher.mjs` | Main watcher script |
| `ui.mjs` | Terminal UI: live progress bar, status and error tracking |
| `get-chat-id.mjs` | Helper to find your Telegram chat ID |
| `Dockerfile` | Watcher container image |
| `docker-compose.yml` | Orchestrates watcher + browserless |
| `entrypoint.sh` | Container startup: initialises git repo, sets up cron |
| `.env.example` | Credentials template |
| `.env` | Your credentials (never commit this file) |