SF Documentation Watcher

Monitors the Salesforce Agentforce Life Sciences admin guide for content changes and sends Telegram notifications when documentation is updated.

How it works

TOC extraction — fetches the main guide page and extracts all article URLs from the table of contents using Puppeteer + browserless/chrome (required because the site is fully JavaScript-rendered via Salesforce Experience Cloud / LWC).
New section detection — compares the current TOC against the previous run and notifies if new articles appear.
Content change detection — fetches each article, extracts visible text, and computes a SHA-256 hash. If the hash differs from the stored one, the content has changed.
Git-based diffing — full article text is saved as .txt files in a content/ git repository. On the first run a baseline commit is made. On subsequent runs, any changes produce a new commit so you can review exactly what changed with git diff or git log -p.
Telegram notifications — you receive a message when:
- A new section appears in the guide
- One or more articles change (with a link to the content repo)
- A critical error occurs (browser unreachable, TOC unavailable)
- More than 20% of pages fail in a single run
Optional push — if CONTENT_REMOTE is set, the content repo is pushed to GitHub automatically after every commit.
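
The comparison steps above can be sketched as three small functions. This is a minimal illustration, not the watcher's actual code: the function names and data shapes (arrays of URLs, plain objects mapping article ID to hash) are assumptions. The hashes are assumed to have been computed already, for example with Node's `crypto.createHash('sha256')`.

```javascript
// Hypothetical helpers; names and data shapes are assumptions for illustration.

// URLs present in the current TOC but absent from the previous run.
function findNewArticles(previousUrls, currentUrls) {
  const seen = new Set(previousUrls);
  return currentUrls.filter((url) => !seen.has(url));
}

// Article IDs whose fresh hash differs from the stored one.
// Articles with no stored hash are treated as new, not as changed.
function findChangedArticles(storedHashes, freshHashes) {
  return Object.keys(freshHashes).filter(
    (id) => Object.hasOwn(storedHashes, id) && storedHashes[id] !== freshHashes[id]
  );
}

// True when strictly more than `threshold` of the pages failed in one run.
function failureRateExceeded(failedCount, totalCount, threshold = 0.2) {
  if (totalCount === 0) return false;
  return failedCount / totalCount > threshold;
}
```

With 192 pages and a 20% threshold, 38 failures stay below the limit and 39 cross it.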

How scheduling works

The watcher runs inside a Docker container that is kept running continuously. A cron job inside the container triggers the script at the configured time (default: 2:00 AM daily). Nothing needs to be configured on the host machine; just keep the container running with docker compose up -d.

To change the schedule, set CRON_SCHEDULE in your .env using standard cron syntax:

CRON_SCHEDULE=0 2 * * *   # 2:00 AM every day (default)
CRON_SCHEDULE=0 8 * * 1   # 8:00 AM every Monday
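
For readers unfamiliar with cron syntax, the five fields are minute, hour, day of month, month, and day of week. The sketch below only splits an expression into those named fields; `parseCronFields` is a hypothetical helper for illustration and is not part of the watcher.

```javascript
// Hypothetical helper: splits a five-field cron expression into named fields.
function parseCronFields(expression) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;
  return { minute, hour, dayOfMonth, month, dayOfWeek };
}

console.log(parseCronFields('0 2 * * *')); // the default schedule
console.log(parseCronFields('0 8 * * 1')); // the Monday example
```

In the second expression the day-of-week field is 1, which cron interprets as Monday.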

Quick start (Docker)

Requirements

Docker and Docker Compose
A Telegram bot and chat ID (see below)

1. Clone the repo

git clone git@github.com:dcarnicer/sf-documentation-watcher.git
cd sf-documentation-watcher

2. Create a Telegram bot

Open Telegram and search for @BotFather
Send /newbot and follow the prompts — copy the token you receive
Send any message to your new bot (so it has a chat to respond to)
Run the helper script to get your chat_id:
```
TELEGRAM_TOKEN=123456:ABC... node get-chat-id.mjs
```
It prints your chat_id. Copy it for the next step.

3. Create the `.env` file

cp .env.example .env

Edit .env with your values:

TELEGRAM_TOKEN=123456:ABC-your-token-here
TELEGRAM_CHAT_ID=123456789
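
These two values are what the Telegram Bot API's `sendMessage` method needs: the token goes in the URL and the chat ID in the request body. The sketch below builds such a request without sending it. `buildTelegramRequest` is a hypothetical helper for illustration, and how the watcher itself constructs its requests is an assumption.

```javascript
// Hypothetical helper: builds (but does not send) a Bot API sendMessage request.
function buildTelegramRequest(token, chatId, text) {
  return {
    url: `https://api.telegram.org/bot${token}/sendMessage`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ chat_id: chatId, text }),
    },
  };
}

// Placeholder credentials, matching the example values above.
const request = buildTelegramRequest('123456:ABC', '123456789', 'Test message');
console.log(request.url);
```

On Node 18 or later, passing `request.url` and `request.options` to the global `fetch` would send the message, which is a quick way to confirm that the token and chat ID are correct.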

4. (Optional) Track content changes in your own GitHub repo

If you want the downloaded article snapshots pushed to GitHub so you can browse diffs online:

Create a new empty repository on GitHub (no README, no .gitignore)

Add its SSH URL to .env:

CONTENT_REMOTE=git@github.com:your-username/your-content-repo.git

Make sure the machine running the watcher has an SSH key added to your GitHub account. Inside Docker, mount your SSH key by adding this to the watcher service in docker-compose.yml:
```
volumes:
  - watcher-data:/data
  - ~/.ssh:/root/.ssh:ro
```

The first run pushes a baseline commit containing every article (192 at the time of writing). Subsequent runs push only when content changes.

5. Start

docker compose up -d

Docker pulls the images, builds the watcher container, and starts the cron schedule. The first run creates the baseline (no Telegram notification sent). From the second run onwards, any changes trigger a notification.

6. Check logs

All commands below run on the host machine — you do not need to enter the container.

# Live logs from the watcher (updated on each cron run)
docker compose logs -f watcher

# Run manually without waiting for the cron schedule
docker exec sfdc-watcher node /app/sfdc-watcher.mjs

7. Inspect content and diffs

# See which articles changed in the last commit
docker exec sfdc-watcher git -C /data/content log -1 --name-only

# See exactly what changed (full diff)
docker exec sfdc-watcher git -C /data/content log -p -1

# Open a shell inside the container if you need to explore further
docker exec -it sfdc-watcher bash

8. Stop

docker compose down

Data in the watcher-data volume (content repo, state, logs) is preserved across restarts. To delete it as well:

docker compose down -v

Moving to another machine

Install Docker

Clone the repo:

git clone git@github.com:dcarnicer/sf-documentation-watcher.git
cd sf-documentation-watcher

Create .env with your credentials (same as step 3 above)
Start:
```
docker compose up -d
```

The first run rebuilds the baseline (no notification). From the second run onwards, changes trigger Telegram notifications.

Terminal UI

When run interactively, the watcher shows a live interface that updates in place:

────────────────────────────────────────────────────────
  SF Documentation Watcher  ·  2026-04-01 02:00:00
────────────────────────────────────────────────────────

  ◆  TOC: 192 pages found

  ████████░░░░░░░░░░░░░░░░░░░░░░  52/192  27%  eta 213s
  ⟳  ind.lsc_customer_engagement_personas.htm
  ~  3 changed
  ✗  1 error(s)  ind.lsc_something.htm

────────────────────────────────────────────────────────
  ✓  192 pages  ·  3 changed  ·  1 error  ·  47m 12s
────────────────────────────────────────────────────────

  Failed pages:
    • ind.lsc_something.htm

When running via cron (no TTY), it automatically falls back to plain line-by-line output with no ANSI codes, suitable for log files.
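
A common way to implement this kind of fallback in Node.js is to check `process.stdout.isTTY`. The sketch below assumes that approach; `chooseOutputMode` and the mode names are hypothetical and may not match what `ui.mjs` actually does.

```javascript
// Hypothetical helper: picks the output mode from the stream's TTY status.
function chooseOutputMode(stream = process.stdout) {
  // isTTY is true only when the stream is attached to a terminal;
  // it is undefined when output is piped or redirected (as under cron).
  return stream.isTTY ? 'live' : 'plain';
}

console.log(`Output mode: ${chooseOutputMode()}`);
```

Accepting the stream as a parameter keeps the check easy to exercise with a stub object instead of a real terminal.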

Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| `puppeteer-core` | latest | Headless browser automation; connects to the browserless/chrome container |
| `chalk` | latest | Terminal colors and styling |

Docker images:

| Image | Purpose |
| --- | --- |
| `browserless/chrome` | Headless Chrome over WebSocket; required because Salesforce Help pages are fully JS-rendered |

Performance

15-second cooldown between pages
Reconnects to the browser every 20 pages to prevent connection drops
Each page times out after 60 seconds; failed pages are retried once
Full run takes ~50 minutes for 192 pages
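
The figures above are consistent with each other: the cooldown alone accounts for most of the run time. The sketch below works through that arithmetic and shows one way the reconnect interval could be expressed. Both helpers are hypothetical, and the assumption that the cooldown applies only between consecutive pages (not after the last one) is mine.

```javascript
// Hypothetical helpers; parameter names are assumptions for illustration.

// Estimated run time: one cooldown between each pair of consecutive pages,
// plus an assumed average fetch time per page.
function estimateRunSeconds(pageCount, cooldownSeconds = 15, avgFetchSeconds = 0) {
  if (pageCount <= 0) return 0;
  return (pageCount - 1) * cooldownSeconds + pageCount * avgFetchSeconds;
}

// True when a fresh browser connection is due before fetching page `index` (0-based).
function shouldReconnect(index, every = 20) {
  return index > 0 && index % every === 0;
}

const cooldownOnly = estimateRunSeconds(192);
console.log(`Cooldown alone: ${cooldownOnly} s (${(cooldownOnly / 60).toFixed(1)} min)`);
```

For 192 pages the cooldown adds 191 × 15 = 2865 seconds, or about 48 minutes, so page loads and retries make up the remainder of the roughly 50-minute run.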

Roadmap

Content validation — before saving a page, check that the downloaded text is plausible: minimum length threshold, absence of error strings ("Sorry to interrupt", "Page not found", "Access denied", "CSS Error"), and basic sanity checks. Invalid pages would be skipped and retried on the next run rather than overwriting a good snapshot with a bad one. A standalone audit script could also scan the entire content/ repo and report suspicious files.
Multi-guide support — the watcher is currently hardcoded to the Agentforce Life Sciences admin guide. All Salesforce Help guides share the same URL pattern (help.salesforce.com/s/articleView?id=...) and page structure (LWC TOC, same DOM layout), so adding support for multiple guides via configuration would be straightforward. The main change would be accepting a list of source URLs in .env and namespacing the content files by guide.
Salesforce Developer Guides — developer documentation lives at developer.salesforce.com and uses a different site structure (static HTML, different navigation). Would need a separate fetcher and TOC extractor, but could share the same git-based diffing and Telegram notification logic.
AI-powered change summaries — instead of sending a raw list of changed files, use the Claude API to summarise the diff in plain language before the Telegram notification. The git diff is already available after each commit, so the flow would be: diff → Claude API → human-readable summary (e.g. "The installation prerequisites for package X have changed and a new step has been added to section Y") → Telegram. Optional feature, requires an Anthropic API key.
RAG-ready export — generate chunked, structured files from the article text suitable for ingestion into a vector database or knowledge base. Each chunk would include metadata (article ID, URL, section title, last updated) alongside the content, making it straightforward to build a retrieval-augmented generation pipeline on top of the documentation for AI agents to consume.
Google Drive integration — automatically upload the latest article snapshots (or the RAG-ready export) to a Google Drive folder after each run, so the documentation is accessible to other tools and team members without needing access to the git repo. Would use the Google Drive API with a service account for authentication.
Cascade failure detection — if a configurable number of consecutive errors is reached mid-run (e.g. 10 in a row), assume the browser or the site is temporarily unavailable, send a Telegram alert, sleep for a few hours, and then resume from where it left off rather than aborting the entire run. This would avoid losing a full night's run due to a transient issue.
Confluence integration (low priority) — publish the downloaded documentation to a Confluence space, keeping pages in sync with the Salesforce source. When changes are detected, the corresponding Confluence page would be updated automatically via the Confluence REST API.
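
Two of the roadmap items above, content validation and cascade failure detection, are simple enough to sketch. Neither exists in the watcher yet; the function names, the 200-character minimum, and the return shapes are all assumptions chosen for illustration. The error strings and the limit of 10 are the ones named in the roadmap.

```javascript
// Hypothetical sketch of two roadmap ideas; thresholds and names are assumptions.

const ERROR_MARKERS = ['Sorry to interrupt', 'Page not found', 'Access denied', 'CSS Error'];

// Returns { ok, reason } for a downloaded article's text.
function validateContent(text, minLength = 200) {
  if (text.length < minLength) {
    return { ok: false, reason: `too short (${text.length} < ${minLength})` };
  }
  const marker = ERROR_MARKERS.find((m) => text.includes(m));
  if (marker) {
    return { ok: false, reason: `contains error string "${marker}"` };
  }
  return { ok: true, reason: null };
}

// Tracks consecutive failures and reports when the limit is reached.
function createFailureTracker(limit = 10) {
  let consecutive = 0;
  return {
    record(success) {
      // Any success resets the streak; a failure extends it.
      consecutive = success ? 0 : consecutive + 1;
      return consecutive >= limit; // true means "pause and alert"
    },
  };
}
```

A plain substring match can misfire on an article that legitimately discusses one of these error messages, so a real implementation would likely combine it with the length check or restrict it to short pages.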

Files

| File | Description |
| --- | --- |
| `sfdc-watcher.mjs` | Main watcher script |
| `ui.mjs` | Terminal UI: live progress bar, status and error tracking |
| `get-chat-id.mjs` | Helper to find your Telegram chat ID |
| `Dockerfile` | Watcher container image |
| `docker-compose.yml` | Orchestrates watcher + browserless |
| `entrypoint.sh` | Container startup: initialises git repo, sets up cron |
| `.env.example` | Credentials template |
| `.env` | Your credentials; never commit this file |
