ACTIVE January 20, 2024

Google Maps Review Scraper

TypeScript npm package for collecting Google Maps place reviews through reverse-engineered Maps endpoints with pagination and parsing.

TypeScript Web Scraping Google Maps Reviews npm package Reverse Engineering

Problem Statement

Extracting reviews from Google Maps places manually is time-consuming and inefficient. Developers and businesses need a programmatic way to collect review data for analysis, monitoring, or integration into other applications.

Solution Architecture

The Google Maps Review Scraper reverse-engineers Google Maps’ internal APIs to extract review data without using official Google APIs (which don’t provide comprehensive review access). The scraper follows this architecture:

[Figure: Google Maps Review Scraper solution architecture, showing URL parsing, session token fetching, review API requests, pagination, and parsing]

Technical Implementation

Core Components

  1. URL Parser (index.ts): Validates and extracts Place ID from Google Maps URLs
  2. Session Token Fetcher (extraction.ts): Retrieves authentication tokens from Google Maps pages
  3. Review API Client (utils.ts): Makes requests to Google Maps’ internal review endpoints
  4. Pagination Handler (utils.ts): Manages continuation tokens for fetching multiple pages
  5. Data Parser (parser.ts): Transforms raw API responses into structured review objects

Detailed Workflow

  1. URL Validation & Place ID Extraction

    • Validates the input URL format using the URL constructor
    • Ensures the host includes “google.com” (only the desktop web version is supported)
    • Extracts the Place ID using regex pattern matching (!1s([a-zA-Z0-9_:]+)!)
    • Example: From https://maps.google.com/maps/place/ChIJN1t_tDeuEmsRUsoyG83frY4 extracts ChIJN1t_tDeuEmsRUsoyG83frY4
  2. Session Authentication

    • Fetches the Google Maps place page at https://maps.google.com/maps/place/{placeId}?hl=en&gl=US
    • Extracts the kEI token from a JavaScript variable in the page source using html.split("var kEI='")[1]?.split("'")[0]
    • This token acts as a session identifier for subsequent API requests
  3. Review Data Fetching

    • Makes GET requests to https://www.google.com/maps/rpc/listugcposts (the request URL is built by the listugcposts function)
    • Includes required parameters:
      • Place ID
      • Sort type (1=Most Relevant, 2=Newest, 3=Highest Rating, 4=Lowest Rating)
      • Search query (optional, encoded as !3s{query})
      • Session token (kEI)
      • Pagination token (Base64 encoded page number)
    • Handles Google’s XSSI protection by stripping )]}' prefix from responses before JSON parsing
  4. Pagination Handling

    • Processes continuation tokens from API responses (found in response[1])
    • Continues fetching until:
      • Requested page limit is reached (specified by pages parameter)
      • No more reviews are available (empty response[2] array)
      • No new continuation token is returned (indicating end of results)
  5. Data Parsing & Structuring

    • Transforms raw array-based API responses into structured objects using the parser function
    • Extracts nested data including:
      • Review metadata (ID, timestamps for published/last_edited)
      • Author information (name, profile URLs, profile page URL, author ID)
      • Review content (rating as number, text content, language code)
      • Images (array with ID, URL, dimensions, location data, caption)
      • Owner responses (if any, with text and timestamps)
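
The XSSI handling in step 3 can be sketched as follows. This is a minimal illustration, not the package's actual code: the helper name parseXssiJson and the sample payload are made up, and the only facts taken from the workflow are the )]}' prefix and the positions of the continuation token (index 1) and review entries (index 2).

```typescript
// Hypothetical helper: removes the XSSI guard prefix, then parses the JSON.
const XSSI_PREFIX = ")]}'";

function parseXssiJson(body: string): unknown {
  const trimmed = body.trimStart();
  const json = trimmed.startsWith(XSSI_PREFIX)
    ? trimmed.slice(XSSI_PREFIX.length)
    : trimmed;
  // JSON.parse tolerates the newline that usually follows the prefix.
  return JSON.parse(json);
}

// Made-up payload shaped like the description above:
// index 1 holds the continuation token, index 2 the review entries.
const sampleBody = ")]}'\n[null, \"next-token\", [[\"review-a\"], [\"review-b\"]]]";
const sampleResponse = parseXssiJson(sampleBody) as unknown[];
console.log(sampleResponse[1]); // continuation token
console.log((sampleResponse[2] as unknown[]).length); // number of review entries
```

Checking for the prefix before slicing keeps the helper safe to call on bodies that arrive without it.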

Key Implementation Details

  • HTTP Client: Uses impit for efficient HTTP requests with connection pooling
  • Cookie Management: Leverages tough-cookie for handling session cookies automatically
  • Rate Limiting: Built-in 2-second delay between initial requests to avoid detection
  • Error Handling: Comprehensive error handling with graceful degradation
  • Type Safety: Full TypeScript support with explicit type definitions
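
The rate-limiting behaviour listed above could look roughly like this. It is a sketch under assumptions: sleep and runWithDelay are hypothetical names, and only the 2-second default comes from the description; where exactly the package applies its delay is not stated.

```typescript
// Hypothetical delay helper.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs tasks one after another, pausing between consecutive tasks.
// The 2000 ms default mirrors the 2-second delay described above.
async function runWithDelay<T>(
  tasks: Array<() => Promise<T>>,
  delayMs = 2000,
): Promise<T[]> {
  const results: T[] = [];
  let first = true;
  for (const task of tasks) {
    if (!first) await sleep(delayMs); // no pause before the first task
    first = false;
    results.push(await task());
  }
  return results;
}
```

Running requests sequentially with a fixed pause is the simplest policy; high-volume use would likely need longer or randomized delays.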

Data Flow Diagram

[Figure: Google Maps Review Scraper data flow, from Google Maps URL input through review extraction to structured output]

Dependencies

Production Dependencies

  • impit (@apify/impit): Lightweight HTTP client for making requests with automatic cookie handling
  • tough-cookie (@salesforce/tough-cookie): Robust cookie handling for session management

Development Dependencies

  • TypeScript: For type-safe development
  • @types/node: TypeScript definitions for Node.js
  • @types/tough-cookie: TypeScript definitions for tough-cookie
  • rimraf: For removing dist/ directory before builds
  • tsx: For TypeScript execution

Technical Limitations & Considerations

Rate Limiting

  • Built-in 2-second delay between initial requests to avoid detection
  • Respects reasonable usage patterns
  • May require additional delays for high-volume scraping

HTML Structure Dependency

  • Relies on the Google Maps page source containing the var kEI=' marker for token extraction, and on the URL format matched by the Place ID regex (!1s([a-zA-Z0-9_:]+)!)
  • May break if Google significantly changes their frontend
  • Internal API endpoints may change over time (observed endpoint: https://www.google.com/maps/rpc/listugcposts)

Usage and Compliance

  • Intended for educational and proof-of-concept purposes
  • Users must comply with Google’s Terms of Service
  • Not intended for large-scale commercial scraping operations
  • Should be used responsibly to avoid IP blocking

Data Completeness

  • May not capture all reviews due to Google’s personalization
  • Some reviews might be filtered based on location or account status
  • Response data may be incomplete for very old reviews

Use Cases

Business Applications

  • Competitive Analysis: Monitor competitor reviews and ratings
  • Customer Experience: Track sentiment changes over time
  • Market Research: Analyze trends in customer feedback
  • Reputation Management: Identify and respond to negative feedback promptly

Technical Applications

  • Data Enrichment: Enhance business directories with review data
  • Review Aggregation: Collect reviews from multiple sources
  • Sentiment Analysis: Feed data into NLP models for opinion mining
  • API Development: Build custom review-based services

Research Applications

  • Academic Research: Study consumer behavior patterns
  • Social Science: Analyze geographic distribution of satisfaction
  • Linguistics: Study language patterns in reviews
  • Economics: Correlate reviews with business performance metrics

Architecture Deep Dive

Module Responsibilities

index.ts (Entry Point)

  • Orchestrates the entire scraping process
  • Validates inputs and handles errors
  • Coordinates between subsystems
  • Provides the public API interface
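
As an illustration of the input validation this module performs, here is a minimal sketch of Place ID extraction. The function name extractPlaceId and the sample URL are hypothetical; the URL constructor, the “google.com” host check, and the regex are the ones named in the workflow.

```typescript
// Regex quoted in the workflow: captures the value between "!1s" and the next "!".
const PLACE_ID_PATTERN = /!1s([a-zA-Z0-9_:]+)!/;

// Hypothetical function name; not necessarily the package's own.
function extractPlaceId(input: string): string {
  let url: URL;
  try {
    url = new URL(input); // throws on malformed input
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }
  if (!url.host.includes("google.com")) {
    throw new Error(`Not a Google Maps URL: ${url.host}`);
  }
  const placeId = PLACE_ID_PATTERN.exec(url.href)?.[1];
  if (!placeId) {
    throw new Error("No Place ID found in URL");
  }
  return placeId;
}

// Made-up URL for illustration only.
const sampleUrl = "https://www.google.com/maps/place/Example/data=!4m5!3m4!1s0x0:0x1234abcd!8m2";
console.log(extractPlaceId(sampleUrl)); // prints "0x0:0x1234abcd"
```

Throwing distinct errors for a malformed URL, a wrong host, and a missing Place ID lets the caller report exactly which check failed.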

extraction.ts (Session Management)

  • Handles authentication with Google Maps
  • Extracts session tokens from page content
  • Manages the initial HTTP request to establish context
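
The token extraction can be sketched with the split expression quoted in the workflow. The function name extractKei and the HTML fragment are assumptions for illustration; fetching the page itself is left out.

```typescript
// Hypothetical wrapper around the split expression from the workflow.
function extractKei(html: string): string {
  // Take the text after "var kEI='" and cut it at the closing quote.
  const token = html.split("var kEI='")[1]?.split("'")[0];
  if (!token) {
    throw new Error("kEI token not found in page HTML");
  }
  return token;
}

// Made-up HTML fragment for illustration only.
const samplePage = "<script>var kEI='abc123XYZ';var other=1;</script>";
console.log(extractKei(samplePage)); // prints "abc123XYZ"
```

Failing loudly when the marker is absent makes a change in Google's page structure show up as a clear error rather than as an empty token in later requests.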

utils.ts (Core Logic)

  • Implements parameter validation
  • Manages API communication with Google’s endpoints
  • Handles pagination through continuation tokens
  • Coordinates data fetching across multiple pages
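
The pagination logic and its three stop conditions can be sketched as a loop. Everything named here is hypothetical: RawPage, FetchPage, and collectPages are illustrative, and fetchPage stands in for the real API client. The response shape (token at index 1, entries at index 2) follows the workflow description.

```typescript
// Assumed response shape: index 1 = continuation token, index 2 = raw review entries.
type RawPage = [unknown, string | null | undefined, unknown[] | null | undefined];

// Hypothetical callback standing in for the real API client.
type FetchPage = (token: string | null) => Promise<RawPage>;

async function collectPages(fetchPage: FetchPage, maxPages: number): Promise<unknown[]> {
  const reviews: unknown[] = [];
  let token: string | null = null;

  // Stop condition 1: requested page limit reached.
  for (let page = 0; page < maxPages; page++) {
    const response: RawPage = await fetchPage(token);

    // Stop condition 2: no more reviews available.
    const entries = response[2] ?? [];
    if (entries.length === 0) break;
    reviews.push(...entries);

    // Stop condition 3: no new continuation token returned.
    const next = response[1];
    if (!next || next === token) break;
    token = next;
  }
  return reviews;
}
```

Passing the fetcher in as a callback keeps the loop testable without network access.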

parser.ts (Data Transformation)

  • Converts Google’s internal array format to structured objects
  • Extracts nested data safely with null checks
  • Maps API responses to meaningful review properties
  • Filters out malformed or incomplete data entries
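
A null-safe way to read Google's nested arrays might look like the sketch below. The dig helper, the ParsedReview shape, and especially the index paths are placeholders for illustration; they are not the real positions in Google's response.

```typescript
// Safely walks a nested array by index path; returns undefined if any level is missing.
function dig(value: unknown, path: number[]): unknown {
  let current: unknown = value;
  for (const index of path) {
    if (!Array.isArray(current)) return undefined;
    current = current[index];
  }
  return current;
}

// Hypothetical, reduced review shape.
interface ParsedReview {
  reviewId: string;
  rating: number | null;
  text: string | null;
}

// Index paths are placeholders, NOT the real response layout.
function parseReview(raw: unknown): ParsedReview | null {
  const reviewId = dig(raw, [0]);
  if (typeof reviewId !== "string") return null; // drop malformed entries
  const rating = dig(raw, [1, 0]);
  const text = dig(raw, [1, 1]);
  return {
    reviewId,
    rating: typeof rating === "number" ? rating : null,
    text: typeof text === "string" ? text : null,
  };
}

// Made-up entries: one complete, one partial, one malformed.
const rawEntries: unknown[] = [["r1", [5, "Great place"]], ["r2", null], [null]];
const parsedReviews = rawEntries
  .map((entry) => parseReview(entry))
  .filter((r): r is ParsedReview => r !== null);
console.log(parsedReviews.length); // the malformed entry is filtered out
```

Centralizing the index walking in one helper keeps each field lookup to a single line and avoids chains of optional accesses.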

types.ts (Type Definitions)

  • Defines TypeScript interfaces for all data structures
  • Ensures type safety throughout the application
  • Provides clear contracts between modules
  • Documents expected data shapes for consumers
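
To give a sense of what such type definitions might contain, here is a sketch based on the fields listed under Data Parsing & Structuring. All interface and property names are assumptions, as are the timestamp units; the package's real types may differ.

```typescript
// Hypothetical interfaces; names and nesting are illustrative only.
interface ReviewAuthor {
  id: string;
  name: string;
  profileUrl: string;
  profilePictureUrl: string | null;
}

interface ReviewImage {
  id: string;
  url: string;
  width: number;
  height: number;
  location: { lat: number; lng: number } | null;
  caption: string | null;
}

interface OwnerResponse {
  text: string;
  published: number; // timestamp; unit is an assumption
  last_edited: number | null;
}

interface Review {
  reviewId: string;
  time: { published: number; last_edited: number | null };
  author: ReviewAuthor;
  rating: number;
  text: string | null;
  language: string | null;
  images: ReviewImage[];
  ownerResponse: OwnerResponse | null;
}

// Type guard narrowing a review to one that has an owner response.
function hasOwnerResponse(review: Review): review is Review & { ownerResponse: OwnerResponse } {
  return review.ownerResponse !== null;
}

// Made-up example value.
const exampleReview: Review = {
  reviewId: "example-id",
  time: { published: 1700000000000, last_edited: null },
  author: { id: "a1", name: "Jane Doe", profileUrl: "https://example.com/jane", profilePictureUrl: null },
  rating: 4,
  text: "Friendly staff.",
  language: "en",
  images: [],
  ownerResponse: null,
};
console.log(hasOwnerResponse(exampleReview)); // false for this example
```

Modeling optional data as explicit null rather than omitting the property makes consumers handle the missing case at compile time.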

Security Considerations

  • No sensitive data storage (tokens are ephemeral)
  • All requests made client-side (no server component)
  • Keeps request volume low through built-in rate limiting
  • No attempt to bypass security measures beyond standard session handling

Performance Characteristics

Time Complexity

  • O(n) where n is the number of pages requested
  • Each page request involves one HTTP call
  • Parsing complexity is linear with review count

Space Complexity

  • O(r) where r is the total number of reviews retrieved
  • Memory usage scales linearly with collected data
  • Intermediate buffers are released after processing

Network Efficiency

  • Connection reuse through HTTP keep-alive
  • Minimal header overhead
  • Efficient cookie management
  • Batched processing where possible

Testing & Reliability

Code Quality Features

  • Parameter validation logic
  • URL parsing and place ID extraction
  • Session token extraction from HTML
  • Pagination logic and continuation handling
  • Data parsing and transformation
  • Error handling pathways

Reliability Features

  • Graceful degradation on partial failures
  • Detailed error logging for debugging
  • Validation at each processing stage

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This project is not affiliated with, endorsed by, or associated with Google LLC. It reverse-engineers publicly accessible Google Maps interfaces for educational purposes only. Users are responsible for ensuring their usage complies with applicable laws, regulations, and Google’s Terms of Service.