ACTIVE January 20, 2024

Google Maps Review Scraper

TypeScript npm package for collecting Google Maps place reviews through reverse-engineered Maps endpoints with pagination and parsing.

TypeScript Web Scraping Google Maps Reviews npm package Reverse Engineering

Problem Statement

Extracting reviews from Google Maps places manually is time-consuming and inefficient. Developers and businesses need a programmatic way to collect review data for analysis, monitoring, or integration into other applications.

Solution Architecture

The Google Maps Review Scraper reverse-engineers Google Maps’ internal APIs to extract review data without using the official Google APIs, which don’t provide comprehensive review access (the Places API, for example, returns at most five reviews per place). The scraper follows this architecture:

Figure: Google Maps Review Scraper solution architecture, showing URL parsing, session token fetching, review API requests, pagination, and parsing.

Technical Implementation

Core Components

URL Parser (index.ts): Validates and extracts Place ID from Google Maps URLs
Session Token Fetcher (extraction.ts): Retrieves authentication tokens from Google Maps pages
Review API Client (utils.ts): Makes requests to Google Maps’ internal review endpoints
Pagination Handler (utils.ts): Manages continuation tokens for fetching multiple pages
Data Parser (parser.ts): Transforms raw API responses into structured review objects

Detailed Workflow

URL Validation & Place ID Extraction
- Validates the input URL format using the URL constructor
- Ensures the host includes “google.com” (desktop web URLs only)
- Extracts the Place ID using regex pattern matching (!1s([a-zA-Z0-9_:]+)!)
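- Example: a URL whose data segment contains !1sChIJN1t_tDeuEmsRUsoyG83frY4! yields ChIJN1t_tDeuEmsRUsoyG83frY4 (a URL with no !1s…! segment, such as https://maps.google.com/maps/place/ChIJN1t_tDeuEmsRUsoyG83frY4, would not match this pattern)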
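The validation and extraction step can be sketched as follows. This is a minimal illustration, not the package's actual code: the function name `extractPlaceId` and the error messages are assumptions, and only the `URL` constructor check, the “google.com” host check, and the regex come from the description above.

```typescript
// Pattern quoted in the workflow: captures the ID between "!1s" and the next "!".
const PLACE_ID_PATTERN = /!1s([a-zA-Z0-9_:]+)!/;

// Hypothetical helper name; the real package may structure this differently.
function extractPlaceId(input: string): string {
  let url: URL;
  try {
    url = new URL(input); // the URL constructor throws on malformed input
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }
  if (!url.host.includes("google.com")) {
    throw new Error(`Not a Google Maps URL: ${url.host}`);
  }
  // Match against the raw input so no URL normalization affects the segment.
  const placeId = PLACE_ID_PATTERN.exec(input)?.[1];
  if (!placeId) {
    throw new Error("No place ID segment (!1s...!) found in URL");
  }
  return placeId;
}
```

Throwing early on a bad URL keeps later stages from issuing network requests with an undefined Place ID.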
Session Authentication
- Fetches the Google Maps place page at https://maps.google.com/maps/place/{placeId}?hl=en&gl=US
- Extracts the kEI token from a JavaScript variable using html.split("var kEI='")[1]?.split("'")[0]
- This token acts as a session identifier for subsequent API requests
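A small sketch of the token step, assuming the page HTML has already been downloaded. The helper names `buildPlacePageUrl` and `extractKei` are hypothetical; the URL template and the split expression are the ones quoted above.

```typescript
// Builds the place page URL in the form quoted in the workflow.
function buildPlacePageUrl(placeId: string): string {
  return `https://maps.google.com/maps/place/${placeId}?hl=en&gl=US`;
}

// Returns the text between "var kEI='" and the next single quote.
function extractKei(html: string): string {
  const token = html.split("var kEI='")[1]?.split("'")[0];
  if (!token) {
    // Covers both a missing assignment and an empty value.
    throw new Error("kEI token not found; the page markup may have changed");
  }
  return token;
}
```

Failing loudly when the token is absent makes a change in Google's markup visible immediately, rather than surfacing later as an unexplained empty result.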
Review Data Fetching
- Makes GET requests to https://www.google.com/maps/rpc/listugcposts (the URL is built by the listugcposts function)
- Includes required parameters:
  - Place ID
  - Sort type (1=Most Relevant, 2=Newest, 3=Highest Rating, 4=Lowest Rating)
  - Search query (optional, encoded as !3s{query})
  - Session token (kEI)
  - Pagination token (Base64 encoded page number)
- Handles Google’s XSSI protection by stripping )]}' prefix from responses before JSON parsing
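The sort codes and the XSSI handling can be illustrated as below. The names `SORT_TYPE` and `parseGuardedJson` are assumptions made for this sketch. The exact encoding of the request parameters is not reproduced here because the source does not spell it out.

```typescript
// Sort codes as listed above; the constant name is an assumption.
const SORT_TYPE = {
  mostRelevant: 1,
  newest: 2,
  highestRating: 3,
  lowestRating: 4,
} as const;

// Anti-XSSI prefix that precedes the JSON payload in the response body.
const XSSI_PREFIX = ")]}'";

// Strips the prefix when present, then parses the remainder as JSON.
function parseGuardedJson(body: string): unknown {
  const trimmed = body.trimStart();
  const json = trimmed.startsWith(XSSI_PREFIX)
    ? trimmed.slice(XSSI_PREFIX.length)
    : trimmed;
  return JSON.parse(json); // JSON.parse tolerates the newline left after the prefix
}
```

Returning `unknown` forces callers to check the shape of the payload before indexing into it, which suits an undocumented response format.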
Pagination Handling
- Processes continuation tokens from API responses (found in response[1])
- Continues fetching until:
  - Requested page limit is reached (specified by pages parameter)
  - No more reviews are available (empty response[2] array)
  - No new continuation token is returned (indicating end of results)
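The three stop conditions can be expressed as a single loop. In this sketch the network call is injected as a hypothetical `fetchPage` callback so the logic stays independent of the HTTP client; the positions `response[1]` (continuation token) and `response[2]` (reviews) follow the description above, and everything else is an assumption.

```typescript
// Hypothetical callback type: fetches one raw page for a continuation token.
type FetchPage = (token: string | null) => Promise<unknown[]>;

async function collectReviews(
  fetchPage: FetchPage,
  maxPages: number,
): Promise<unknown[]> {
  const reviews: unknown[] = [];
  let token: string | null = null;

  // Stop condition 1: the requested page limit is reached.
  for (let page = 0; page < maxPages; page++) {
    const response = await fetchPage(token);

    // Stop condition 2: no reviews in this page.
    const batch = response[2];
    if (!Array.isArray(batch) || batch.length === 0) break;
    reviews.push(...batch);

    // Stop condition 3: no new continuation token.
    const next = response[1];
    if (typeof next !== "string" || next === "" || next === token) break;
    token = next;
  }
  return reviews;
}
```

Comparing the new token with the previous one is an extra guard added in this sketch to avoid looping on a repeated token; the source only mentions a missing token.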
Data Parsing & Structuring
- Transforms raw array-based API responses into structured objects using the parser function
- Extracts nested data including:
  - Review metadata (ID, timestamps for published/last_edited)
  - Author information (name, profile URLs, profile page URL, author ID)
  - Review content (rating as number, text content, language code)
  - Images (array with ID, URL, dimensions, location data, caption)
  - Owner responses (if any, with text and timestamps)
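One possible shape for the structured output is sketched below. Every interface and field name is an illustrative assumption derived from the list above, not the package's published types; in particular, the timestamp unit and the layout of the location data are not stated in the source.

```typescript
// All names here are illustrative assumptions.
interface ReviewAuthor {
  id: string;
  name: string;
  profileUrl: string; // "profile URLs" in the list above; exact meaning assumed
  profilePageUrl: string;
}

interface ReviewImage {
  id: string;
  url: string;
  width: number;
  height: number;
  location: { latitude: number; longitude: number } | null; // shape assumed
  caption: string | null;
}

interface OwnerResponse {
  text: string;
  published: number; // timestamp; unit not stated in the source
  lastEdited: number;
}

interface Review {
  id: string;
  published: number;
  lastEdited: number;
  author: ReviewAuthor;
  rating: number;
  text: string | null;
  language: string | null;
  images: ReviewImage[];
  ownerResponse: OwnerResponse | null;
}

// Example consumer: mean rating of a batch, or null when the batch is empty.
function averageRating(reviews: Review[]): number | null {
  if (reviews.length === 0) return null;
  const total = reviews.reduce((sum, review) => sum + review.rating, 0);
  return total / reviews.length;
}
```

Modelling optional parts such as the owner response as `null` rather than omitting them keeps downstream code from having to distinguish "missing" from "undefined".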

Key Implementation Details

HTTP Client: Uses impit for efficient HTTP requests with connection pooling
Cookie Management: Leverages tough-cookie for handling session cookies automatically
Rate Limiting: Built-in 2-second delay between initial requests to avoid detection
Error Handling: Comprehensive error handling with graceful degradation
Type Safety: Full TypeScript support with explicit type definitions

Data Flow Diagram

Figure: Google Maps Review Scraper data flow diagram, from Google Maps URL input through review extraction to structured output.

Dependencies

Production Dependencies

impit (@apify/impit): Lightweight HTTP client for making requests with automatic cookie handling
tough-cookie (@salesforce/tough-cookie): Robust cookie handling for session management

Development Dependencies

TypeScript: For type-safe development
@types/node: TypeScript definitions for Node.js
@types/tough-cookie: TypeScript definitions for tough-cookie
rimraf: For removing dist/ directory before builds
tsx: For TypeScript execution

Technical Limitations & Considerations

Rate Limiting

Built-in 2-second delay between initial requests to avoid detection
Respects reasonable usage patterns
May require additional delays for high-volume scraping
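For callers who need extra spacing between requests, a delay wrapper along these lines is one option. The helper names `sleep` and `runWithDelay` are hypothetical, and the 2000 ms default simply mirrors the built-in delay mentioned above.

```typescript
// Resolves after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs tasks one at a time, pausing between consecutive tasks.
async function runWithDelay<T>(
  tasks: Array<() => Promise<T>>,
  delayMs = 2000,
): Promise<T[]> {
  const results: T[] = [];
  let first = true;
  for (const task of tasks) {
    if (!first) await sleep(delayMs); // no pause before the first task
    first = false;
    results.push(await task());
  }
  return results;
}
```

Running tasks sequentially rather than with `Promise.all` is deliberate: parallel requests would defeat the purpose of the delay.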

HTML Structure Dependency

Relies on specific Google Maps page markup for token extraction (the var kEI=' assignment) and on the URL format for Place ID extraction (regex pattern !1s([a-zA-Z0-9_:]+)!)
May break if Google significantly changes its frontend
Internal API endpoints may change over time (observed endpoint: https://www.google.com/maps/rpc/listugcposts)

Legal & Ethical Considerations

Intended for educational and proof-of-concept purposes
Users must comply with Google’s Terms of Service
Not intended for large-scale commercial scraping operations
Should be used responsibly to avoid IP blocking

Data Completeness

May not capture all reviews due to Google’s personalization
Some reviews might be filtered based on location or account status
Response data may be incomplete for very old reviews

Use Cases

Business Applications

Competitive Analysis: Monitor competitor reviews and ratings
Customer Experience: Track sentiment changes over time
Market Research: Analyze trends in customer feedback
Reputation Management: Identify and respond to negative feedback promptly

Technical Applications

Data Enrichment: Enhance business directories with review data
Review Aggregation: Collect reviews from multiple sources
Sentiment Analysis: Feed data into NLP models for opinion mining
API Development: Build custom review-based services

Research Applications

Academic Research: Study consumer behavior patterns
Social Science: Analyze geographic distribution of satisfaction
Linguistics: Study language patterns in reviews
Economics: Correlate reviews with business performance metrics

Architecture Deep Dive

Module Responsibilities

`index.ts` (Entry Point)

Orchestrates the entire scraping process
Validates inputs and handles errors
Coordinates between subsystems
Provides the public API interface

`extraction.ts` (Session Management)

Handles authentication with Google Maps
Extracts session tokens from page content
Manages the initial HTTP request to establish context

`utils.ts` (Core Logic)

Implements parameter validation
Manages API communication with Google’s endpoints
Handles pagination through continuation tokens
Coordinates data fetching across multiple pages
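Parameter validation of this kind might look like the sketch below. The option names (`sort`, `pages`, `query`) and their defaults are assumptions for illustration; only the 1 to 4 sort range, the page limit, and the optional search query are taken from the workflow description.

```typescript
// Hypothetical option names; the package's real option object may differ.
interface ScrapeOptions {
  sort?: number;
  pages?: number;
  query?: string;
}

interface NormalizedOptions {
  sort: number;
  pages: number;
  query: string;
}

function validateOptions(options: ScrapeOptions = {}): NormalizedOptions {
  // Defaults are assumptions made for this sketch.
  const { sort = 1, pages = 1, query = "" } = options;

  if (!Number.isInteger(sort) || sort < 1 || sort > 4) {
    throw new RangeError(`sort must be an integer from 1 to 4, got ${sort}`);
  }
  if (!Number.isInteger(pages) || pages < 1) {
    throw new RangeError(`pages must be a positive integer, got ${pages}`);
  }
  if (typeof query !== "string") {
    throw new TypeError("query must be a string");
  }
  return { sort, pages, query: query.trim() };
}
```

Validating before the first request means a typo in the options fails immediately instead of after the 2-second initial delay.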

`parser.ts` (Data Transformation)

Converts Google’s internal array format to structured objects
Extracts nested data safely with null checks
Maps API responses to meaningful review properties
Filters out malformed or incomplete data entries
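Null-safe access into nested arrays, plus filtering of malformed entries, can be sketched as follows. The index paths in `PATHS` are invented for illustration only; the real positions inside Google's response are not documented in this write-up, and `dig`, `ParsedReview`, and `parseReviews` are hypothetical names.

```typescript
// Walks a nested array by index path; returns undefined when a step is missing.
function dig(value: unknown, path: number[]): unknown {
  let current: unknown = value;
  for (const index of path) {
    if (!Array.isArray(current)) return undefined;
    current = current[index];
  }
  return current;
}

// Hypothetical index paths, NOT the real response layout.
const PATHS = { id: [0], rating: [1, 0], text: [1, 1] };

interface ParsedReview {
  id: string;
  rating: number;
  text: string | null;
}

function parseReviews(rawReviews: unknown[]): ParsedReview[] {
  const parsed: ParsedReview[] = [];
  for (const raw of rawReviews) {
    const id = dig(raw, PATHS.id);
    const rating = dig(raw, PATHS.rating);
    const text = dig(raw, PATHS.text);
    // Skip entries that lack the required fields instead of throwing.
    if (typeof id !== "string" || typeof rating !== "number") continue;
    parsed.push({ id, rating, text: typeof text === "string" ? text : null });
  }
  return parsed;
}
```

Keeping the paths in one table means a change in the response layout needs an update in a single place.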

`types.ts` (Type Definitions)

Defines TypeScript interfaces for all data structures
Ensures type safety throughout the application
Provides clear contracts between modules
Documents expected data shapes for consumers

Security Considerations

No sensitive data storage (tokens are ephemeral)
All requests made client-side (no server component)
Rate-limits requests to reduce server load (rate limiting alone does not amount to robots.txt compliance)
No attempt to bypass security measures beyond standard session handling

Performance Characteristics

Time Complexity

O(n) where n is the number of pages requested
Each page request involves one HTTP call
Parsing complexity is linear with review count

Space Complexity

O(r) where r is the total number of reviews retrieved
Memory usage scales linearly with collected data
Intermediate buffers are released after processing

Network Efficiency

Connection reuse through HTTP keep-alive
Minimal header overhead
Efficient cookie management
Batched processing where possible

Testing & Reliability

Code Quality Features

Parameter validation logic
URL parsing and place ID extraction
Session token extraction from HTML
Pagination logic and continuation handling
Data parsing and transformation
Error handling pathways

Reliability Features

Graceful degradation on partial failures
Detailed error logging for debugging
Validation at each processing stage

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This project is not affiliated with, endorsed by, or associated with Google LLC. It reverse-engineers publicly accessible Google Maps interfaces for educational purposes only. Users are responsible for ensuring their usage complies with applicable laws, regulations, and Google’s Terms of Service.