revcli

A command-line tool that scrapes Google Maps location reviews using browser automation. No API key required.

Why revcli?

No Google API key, no API quotas – reads the same public pages a regular browser sees via Playwright automation
EEA auth wall handled – persistent Chrome profile + one-shot revcli auth command sign you in once, then reuse the session across runs
Rich review data – bilingual text (translated + original), owner responses, photo counts, author URLs
Schema-validated output – every review round-trips through a Zod schema before landing in JSON/CSV, so data quality is enforced at the boundary
Resumable batches – interrupted large runs pick up where they left off via an atomic state file

Features

Single or batch scraping – scrape one location or hundreds from a file
Full review data – author, rating, text, photos count, owner responses
Bilingual support – captures both translated and original language text
Sort control – newest, most relevant, highest, or lowest rated
Sort verification – verifies the selected sort order is active before scraping begins
Persistent authentication – sign in to Google once, session reused across runs
JSON and CSV output – structured data ready for analysis
Resumable batches – interrupted batch runs pick up where they left off
Schema validation – verify output files against the expected schema

Prerequisites

Node.js 22+
Chromium (downloaded automatically via Playwright)

Installation

Not yet published on npm. Install from source:

git clone https://github.com/qodeca/revcli.git
cd revcli
npm install
npm run build
npx playwright install chromium
npm link              # optional: exposes `revcli` as a global command

After npm link, the revcli binary is available system-wide. Without linking, use npm run dev -- scrape '<url>' from the project directory.

Note: The first install requires npx playwright install chromium to download the browser binary (~165 MB).

Quick start

# Authenticate with Google (required once – opens browser)
revcli auth

# Scrape 50 reviews and save to file
revcli scrape 'https://maps.app.goo.gl/MTVGWdpd8vVqTouv9' -m 50 -o reviews.json

# Batch scrape multiple locations
revcli batch locations.txt -d ./output

# Validate output
revcli validate reviews.json

CLI reference

Global options

Option	Description
`-V, --version`	Print version number
`-h, --help`	Display help for any command

`revcli scrape <url>`

Scrape reviews from a single Google Maps location.

Arguments:

Argument	Description
`<url>`	Google Maps URL, short URL (`maps.app.goo.gl/...`), CID URL (`maps?cid=...`), ftid URL (`maps?ftid=0x...:0x...`), or Place ID (`ChIJ...`)

Options:

Option	Default	Description
`-m, --max-reviews <n>`	all	Maximum number of reviews to collect. Must be a positive integer. When capped, `business.totalReviews` equals the capped count and `business.headerTotalReviews` preserves Google's full count
`-s, --sort <order>`	`newest`	Review sort order. Choices: `newest`, `relevant`, `highest`, `lowest`
`-o, --output <path>`	stdout	Write output to file instead of stdout
`-f, --format <type>`	`json`	Output format. Choices: `json`, `csv`
`--headless`	`false`	Hide the browser window. Browser shows by default so you can observe scraping and handle auth prompts
`--delay <ms>`	`3000`	Delay in milliseconds between scroll actions. Increase if getting rate-limited
`-v, --verbose`	`false`	Enable debug-level logging

Examples:

Important: Always use single quotes ('...') around Google Maps URLs. Double quotes cause zsh/bash to interpret ! characters in the URL's data= parameter as history expansion, mangling the URL.

revcli scrape 'https://maps.app.goo.gl/MTVGWdpd8vVqTouv9'
revcli scrape 'https://maps.app.goo.gl/MTVGWdpd8vVqTouv9' -m 50 -o reviews.json
revcli scrape 'https://maps.app.goo.gl/MTVGWdpd8vVqTouv9' --sort relevant --format csv -o reviews.csv
revcli scrape 'ChIJN1t_tDeuEmsRUsoyG83frY4' -m 20 -o place.json
revcli scrape 'https://www.google.com/maps?ftid=0x3e2ee3641f7016d7:0x8e3a8bcf52dad296'
revcli scrape 'https://maps.app.goo.gl/MTVGWdpd8vVqTouv9' --headless --verbose --delay 5000

`revcli batch <file>`

Scrape reviews from multiple locations listed in a file. Produces one output file per location.

Arguments:

Argument	Description
`<file>`	Path to a file containing Google Maps URLs – one per line (lines starting with `#` are comments) or a JSON array of URL strings

Options:

Option	Default	Description
`-d, --output-dir <path>`	`./output`	Directory for output files. Created automatically if it doesn't exist
`-m, --max-reviews <n>`	all	Maximum number of reviews per location. Must be a positive integer. When capped, `business.totalReviews` equals the capped count and `business.headerTotalReviews` preserves Google's full count
`-s, --sort <order>`	`newest`	Review sort order. Choices: `newest`, `relevant`, `highest`, `lowest`
`-f, --format <type>`	`json`	Output format. Choices: `json`, `csv`
`--headless`	`false`	Hide the browser window
`--delay <ms>`	`3000`	Delay in milliseconds between scroll actions within a location
`--location-delay <ms>`	`10000`	Delay in milliseconds between locations. Increase to reduce rate-limiting risk
`--location-timeout <ms>`	`300000`	Timeout per location in milliseconds (default: 5 minutes). Prevents indefinite hangs
`--resume`	`false`	Skip locations already scraped in a previous run. State tracked via `.revcli-state.json` in the output directory
`-v, --verbose`	`false`	Enable debug-level logging

Examples:

revcli batch locations.txt -d ./output
revcli batch locations.txt -d ./output -m 100 --resume
revcli batch locations.txt -d ./output --format csv --location-delay 15000
revcli batch urls.json -d ./reviews --sort relevant --verbose

Input file formats:

# Newline-delimited (comments and blank lines ignored)
# locations.txt
https://maps.app.goo.gl/MTVGWdpd8vVqTouv9
https://maps.app.goo.gl/n2kgnxXZnYPYb25CA

// JSON array
["https://maps.app.goo.gl/MTVGWdpd8vVqTouv9", "https://maps.app.goo.gl/n2kgnxXZnYPYb25CA"]

`revcli validate <file>`

Validate a JSON output file against the expected schema. Exits with code 1 on failure.

Arguments:

Argument	Description
`<file>`	Path to a JSON file to validate

Examples:

revcli validate reviews.json
revcli validate output/bfit-yasmeen-mens.json

`revcli auth`

Manage Google account authentication. Required for EEA users where Google shows a "limited view" without sign-in.

Subcommand	Description
(none)	Open browser and sign in to Google interactively
`status`	Check if currently signed in (headless, exit code 1 if not)
`logout`	Clear saved browser session

Examples:

revcli auth                # Sign in (opens browser)
revcli auth status         # Check auth state
revcli auth logout         # Clear session

How it works: revcli uses a persistent Chrome profile at ~/.revcli/chrome-profile/. Sign in once with revcli auth, and all subsequent scrapes reuse that session. Google auth cookies persist between CLI runs.

Output schema

JSON output structure

{
  "business": {
    "name": "Business Name",
    "placeId": "0x...:0x...",
    "url": "https://...",
    "address": "Full address",
    "rating": 4.5,
    "totalReviews": 312,
    "headerTotalReviews": 568,
    "scrapeDate": "2026-04-08T15:00:00.000Z"
  },
  "reviews": [
    {
      "id": "google-review-id",
      "author": "Reviewer Name",
      "authorUrl": "https://www.google.com/maps/contrib/...",
      "publishTime": "2 weeks ago",
      "rating": 5,  // 0–5 (0 = could not parse stars)
      "text": "Review text (translated if applicable)",
      "originalText": "Original language text",
      "originalLanguage": "arabic",
      "photos": 2,
      "ownerResponse": {
        "text": "Owner response text",
        "originalText": "Original response",
        "originalLanguage": "english",
        "publishTime": "a month ago"
      }
    }
  ],
  "metadata": {
    "provider": "playwright",
    "scrapeDurationMs": 8243,
    "reviewsCollected": 5,
    "sortOrder": "newest"
  }
}

Business field notes:

business.totalReviews – Number of reviews collected in this file (equals reviews.length). Internally consistent with the reviews array.
business.headerTotalReviews – Google Maps header-reported review count, or null if the header could not be parsed. May differ from totalReviews when Google under-reports, when --max-reviews caps collection, or when the header parser cannot read the value.

Output schema changes

business.totalReviews now reflects the collected review count, not Google's header value.

Previously, business.totalReviews was populated from the Google Maps business listing header at the start of the scrape (e.g., "568 reviews"). As of this release it equals reviews.length in the same file, so the metadata is internally consistent with the payload.

The original Google-reported header value is preserved in a new field, business.headerTotalReviews. It is nullable (null when parsing failed). Old JSON files written by previous revcli versions still load via revcli validate – the missing field defaults to null.

Note: when you pass --max-reviews N, totalReviews === N (the capped collected count), and headerTotalReviews holds Google's larger number.

How it works

revcli uses Playwright to automate a Chromium browser with a persistent profile:

Launches browser using a saved Chrome profile (~/.revcli/chrome-profile/)
Navigates to the Google Maps place URL (strips tracking params, forces English locale)
Handles cookie consent and checks for Google's "limited view" (EEA auth wall)
Opens the Reviews tab and sets the sort order
Scrolls the review panel using mouse wheel events to trigger lazy loading
Extracts review data from the DOM in bulk via page.evaluate()
Validates each review through Zod schemas
Deduplicates by review ID and repeats until all reviews are collected or no more are available

No Google API key is needed – the tool reads the same public page a regular browser would see. The browser shows by default (use --headless to hide it).

Troubleshooting

Symptom	Diagnosis	Fix
"Limited view" appears instead of the Reviews tab	Google's EEA auth wall for unauthenticated users	Run `revcli auth` once to sign in; the persistent profile remembers you
URL in command gets mangled, `!` characters rewritten	zsh/bash history expansion on `!`	Always wrap URLs in single quotes (`'...'`), never double quotes
Scraper returns 0 reviews with no errors	Selectors may be stale – Google rotates obfuscated classes	Run with `--verbose` and see docs/selector-maintenance.md to update selectors
Warning: "N reviews have rating=0 – stars selector may be stale"	The stars selector went stale; reviews still collected, just without star ratings	Update `SELECTORS.stars` in `src/scraper/selectors.ts`
Scraper hangs or times out mid-scroll	Rate limiting or anti-bot throttling	Increase `--delay` (default 3000 ms); for batches, increase `--location-delay` (default 10000 ms)
`business.headerTotalReviews` is `null` in the output	Header parser couldn't read the count; `business.totalReviews` still equals `reviews.length`	Cosmetic – the scrape itself succeeded
Scrape plateaus before reaching the place's full review count	Google Maps filters the `newest` sort by recency/verification; the header number is not ground truth	Expected. Use `--sort relevant` if you need more coverage
"captcha detected" / "UnrecoverableError"	Google challenged the browser; running scrapes from that IP may be throttled	Wait 15–30 minutes; increase delays; consider a different network

Run any command with -v, --verbose for debug-level logs including selector parse decisions, scroll state, and navigation verification.

Limitations

Selector fragility – Google Maps uses obfuscated CSS class names that change periodically. When this happens, the scraper returns zero reviews. All selectors are centralized in src/scraper/selectors.ts for easy updating.
Relative timestamps – Google Maps shows review times as "2 weeks ago" rather than exact dates. These are captured as-is.
No translation toggle – The tool captures whatever text Google displays (usually auto-translated). The original language text requires clicking "See original" which is not currently automated.
Language detection – The originalLanguage field uses a simple Arabic/Latin script heuristic, not full language identification.
newest sort filtering – Google Maps applies hidden recency/verification filters when sorting by newest. The collected review count can plateau below the header-reported total (this is Google-side behaviour, not a scraper bug).

Maintenance

Google rotates the obfuscated CSS class names every few weeks to months. When that happens, the scraper silently returns zero reviews. All selectors live in src/scraper/selectors.ts – update them in one place. The full update procedure, diagnosis steps, and list of stable vs fragile patterns is in docs/selector-maintenance.md. Read the "Expand button lesson" section there before touching SELECTORS.expandButton – it documents a three-revision debugging story that's easy to accidentally undo.

Contributing

git clone https://github.com/qodeca/revcli.git
cd revcli
npm install
npx playwright install chromium

Development workflow

npm run dev -- scrape 'https://maps.app.goo.gl/...' -m 5    # Run from source
npm test                                                      # Run all tests (238)
npx vitest run tests/parser.test.ts                           # Run single test file
npm run typecheck                                             # Type check
npm run build                                                 # Build to dist/

Architecture, data-flow patterns, and coding conventions are documented in CLAUDE.md. Read it before making structural changes – it captures hard-won lessons (Playwright locator scoping, typed unrecoverable errors, hl=en invariant, etc.) that aren't obvious from the source.

Project structure

src/
├── commands/       # CLI command handlers (scrape, batch, validate, auth)
├── scraper/        # Playwright automation
│   ├── browser.ts              # Persistent Chrome profile, anti-detection
│   ├── auth.ts                 # Google sign-in detection, limited view handling
│   ├── navigator.ts            # Orchestrates page navigation flow
│   ├── consent.ts              # Google consent + locale + URL normalization
│   ├── business-extractor.ts   # Business info extraction from DOM
│   ├── scroller.ts, extractor.ts, parser.ts  # Review collection pipeline
│   └── selectors.ts            # All Google Maps CSS selectors (update here when they break)
├── core/           # Schema definitions (Zod), types (SortOrder, OutputFormat), retry, UnrecoverableError, rate limiter
├── output/         # writeOutput() dispatcher, JSON and CSV writers
└── utils/          # URL parser, logger, batch progress

See CLAUDE.md for architecture details and coding conventions.

Reporting a bug

Before opening an issue, please include the following so maintainers can reproduce:

revcli --version output
Command you ran, with the URL in single quotes
--verbose log from the failing run (trim or redact reviewer names / review text if you're worried about PII – the tool's built-in PII scrubbing only covers header-parser debug logs, not full extraction)
OS and Node version (uname -a, node --version)
A Google Maps place URL that reproduces the issue, if possible – selectors sometimes behave differently on different place types (restaurants, hotels, gyms, etc.)

Known classes of issues and where to look:

Zero reviews returned or rating=0 warnings → Google rotated a selector. See docs/selector-maintenance.md.
UnrecoverableError with NAV_VERIFY → cross-contamination guard tripped; the scraper intentionally bails rather than return stale data. Open an issue with the verbose log.
captcha detected → you're being throttled. Not a bug, but an issue helps track frequency.

Legal notice

This tool automates a web browser to access publicly available information on Google Maps. It is provided for personal, educational, and research purposes.

Users are solely responsible for ensuring their use complies with Google's Terms of Service and all applicable laws
The authors do not encourage or condone use of this tool in violation of any terms of service
Use reasonable delays between requests to avoid excessive load on Google's servers
This tool is not affiliated with, endorsed by, or connected to Google in any way

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
docs		docs
fixtures		fixtures
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

revcli

Why revcli?

Table of contents

Features

Prerequisites

Installation

Quick start

CLI reference

Global options

`revcli scrape <url>`

`revcli batch <file>`

`revcli validate <file>`

`revcli auth`

Output schema

Output schema changes

How it works

Troubleshooting

Limitations

Maintenance

Contributing

Development workflow

Project structure

Reporting a bug

Legal notice

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

revcli

Why revcli?

Table of contents

Features

Prerequisites

Installation

Quick start

CLI reference

Global options

revcli scrape <url>

revcli batch <file>

revcli validate <file>

revcli auth

Output schema

Output schema changes

How it works

Troubleshooting

Limitations

Maintenance

Contributing

Development workflow

Project structure

Reporting a bug

Legal notice

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`revcli scrape <url>`

`revcli batch <file>`

`revcli validate <file>`

`revcli auth`

Packages