Google Maps Review Scraper
A TypeScript npm module that scrapes reviews from Google Maps places using reverse-engineered Google Maps APIs
Problem Statement
Extracting reviews from Google Maps places manually is time-consuming and inefficient. Developers and businesses need a programmatic way to collect review data for analysis, monitoring, or integration into other applications.
Solution Architecture
The Google Maps Review Scraper reverse-engineers Google Maps’ internal APIs to extract review data without using the official Google APIs, which don’t provide comprehensive review access. The sections below describe its architecture.
Technical Implementation
Core Components
- URL Parser (index.ts): Validates and extracts Place ID from Google Maps URLs
- Session Token Fetcher (extraction.ts): Retrieves authentication tokens from Google Maps pages
- Review API Client (utils.ts): Makes requests to Google Maps’ internal review endpoints
- Pagination Handler (utils.ts): Manages continuation tokens for fetching multiple pages
- Data Parser (parser.ts): Transforms raw API responses into structured review objects
Detailed Workflow
- URL Validation & Place ID Extraction
  - Validates the input URL format using the URL constructor
  - Ensures the host includes “google.com” (strictly desktop web version)
  - Extracts the Place ID using regex pattern matching (!1s([a-zA-Z0-9_:]+)!)
  - Example: From https://maps.google.com/maps/place/ChIJN1t_tDeuEmsRUsoyG83frY4 extracts ChIJN1t_tDeuEmsRUsoyG83frY4
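The validation and extraction steps above could look roughly like the following. This is a minimal sketch, not the module's confirmed code: the function name extractPlaceId and the error messages are assumptions, and it only succeeds when the URL actually carries a !1s...! data segment that the regex can match.

```typescript
// Sketch only: extractPlaceId is a hypothetical name, not the module's confirmed API.
// The regex is the pattern quoted in the workflow description.
const PLACE_ID_PATTERN = /!1s([a-zA-Z0-9_:]+)!/;

function extractPlaceId(input: string): string {
  let parsed: URL;
  try {
    parsed = new URL(input); // the URL constructor throws on malformed input
  } catch {
    throw new Error(`Invalid URL: ${input}`);
  }
  // Host check as described: the hostname must include "google.com".
  if (!parsed.hostname.includes("google.com")) {
    throw new Error(`Not a Google Maps URL: ${parsed.hostname}`);
  }
  // Match against the raw input so no URL normalization gets in the way.
  const match = PLACE_ID_PATTERN.exec(input);
  if (!match) {
    throw new Error("No place ID found in URL");
  }
  return match[1];
}
```

Throwing on each failure keeps the three checks (format, host, pattern) distinguishable for the caller; returning null instead would be an equally reasonable design.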
- Session Authentication
  - Fetches the Google Maps place page at https://maps.google.com/maps/place/{placeId}?hl=en&gl=US
  - Extracts the kEI token from a JavaScript variable using html.split("var kEI='")[1]?.split("'")[0]
  - This token acts as a session identifier for subsequent API requests
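As an illustration of the token extraction step, the split expression quoted above can be wrapped in a small pure function. The name extractKei is an assumption made for this sketch, and the HTTP fetch itself is left out because it depends on the HTTP client.

```typescript
// Sketch only: extractKei is a hypothetical helper name.
// It applies the split expression from the workflow description to a page's HTML.
function extractKei(html: string): string | undefined {
  // Take whatever sits between "var kEI='" and the next single quote.
  const token = html.split("var kEI='")[1]?.split("'")[0];
  // Treat a missing marker or an empty value as "no token found".
  return token ? token : undefined;
}

// Usage with an inline HTML fragment standing in for the fetched page:
const kei = extractKei("<script>var kEI='abc123';</script>");
console.log(kei); // abc123
```

Returning undefined rather than throwing lets the caller decide whether a missing token is fatal or worth a retry.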
- Review Data Fetching
  - Makes GET requests to https://www.google.com/maps/rpc/listugcposts (built by the listugcposts function)
  - Includes required parameters:
    - Place ID
    - Sort type (1=Most Relevant, 2=Newest, 3=Highest Rating, 4=Lowest Rating)
    - Search query (optional, encoded as !3s{query})
    - Session token (kEI)
    - Pagination token (Base64 encoded page number)
  - Handles Google’s XSSI protection by stripping the )]}' prefix from responses before JSON parsing
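The XSSI handling mentioned above can be sketched as follows. The helper name parseGuardedJson is an assumption; building the request URL is deliberately omitted because the exact parameter encoding is not spelled out here.

```typescript
// Sketch only: parseGuardedJson is a hypothetical helper name.
// Google prefixes some JSON responses with )]}' so they cannot be executed as script.
const XSSI_PREFIX = ")]}'";

function parseGuardedJson(body: string): unknown {
  const trimmed = body.trimStart();
  // Drop the guard prefix when present; otherwise parse the body unchanged.
  const json = trimmed.startsWith(XSSI_PREFIX)
    ? trimmed.slice(XSSI_PREFIX.length)
    : trimmed;
  return JSON.parse(json); // JSON.parse tolerates the newline that follows the prefix
}
```

The result is typed as unknown on purpose, so that downstream code has to check the array shape before indexing into it.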
- Pagination Handling
  - Processes continuation tokens from API responses (found in response[1])
  - Continues fetching until:
    - Requested page limit is reached (specified by the pages parameter)
    - No more reviews are available (empty response[2] array)
    - No new continuation token is returned (indicating end of results)
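The three stop conditions above can be expressed as a single loop. This is a sketch under stated assumptions: RawPage, FetchPage, and collectReviews are hypothetical names, the page fetcher is injected so the loop stays independent of the HTTP client, and the response shape is only what the description states (token at index 1, reviews at index 2).

```typescript
// Assumed response shape, taken from the description above:
// index 1 holds the continuation token, index 2 holds the reviews array.
type RawPage = [unknown, string | null | undefined, unknown[] | null | undefined];

// Hypothetical fetcher signature: receives the previous token (null for the first page).
type FetchPage = (token: string | null) => Promise<RawPage>;

async function collectReviews(fetchPage: FetchPage, maxPages: number): Promise<unknown[]> {
  const reviews: unknown[] = [];
  let token: string | null = null;
  for (let page = 0; page < maxPages; page++) { // stop 1: page limit reached
    const response = await fetchPage(token);
    const batch = response[2] ?? [];
    if (batch.length === 0) break; // stop 2: no more reviews
    reviews.push(...batch);
    const next = response[1];
    if (!next) break; // stop 3: no continuation token
    token = next;
  }
  return reviews;
}
```

Injecting the fetcher also makes the loop easy to exercise with canned pages instead of live requests.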
- Data Parsing & Structuring
  - Transforms raw array-based API responses into structured objects using the parser function
  - Extracts nested data including:
    - Review metadata (ID, timestamps for published/last_edited)
    - Author information (name, profile URLs, profile page URL, author ID)
    - Review content (rating as number, text content, language code)
    - Images (array with ID, URL, dimensions, location data, caption)
    - Owner responses (if any, with text and timestamps)
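To make the array-to-object transformation concrete, here is a minimal sketch of null-safe extraction. Everything in it is illustrative: the Review interface covers only a few of the fields listed above, and the index paths in PATHS are invented for the example, since the real positions inside Google's response are not documented here.

```typescript
// Sketch only: a reduced Review shape with a few of the fields listed above.
interface Review {
  id: string;
  rating: number;
  text: string | null;
  language: string | null;
  authorName: string | null;
}

// Walks a nested array by index path; returns undefined when any step is missing.
function dig(value: unknown, path: number[]): unknown {
  let current: unknown = value;
  for (const index of path) {
    if (!Array.isArray(current)) return undefined;
    current = current[index];
  }
  return current;
}

const asString = (v: unknown): string | null => (typeof v === "string" ? v : null);

// HYPOTHETICAL index paths, for illustration only. They do not reflect
// the real layout of the listugcposts response.
const PATHS = {
  id: [0],
  authorName: [1, 0],
  rating: [2, 0],
  text: [2, 1],
  language: [2, 2],
};

function parseReview(raw: unknown): Review | null {
  const id = dig(raw, PATHS.id);
  const rating = dig(raw, PATHS.rating);
  // Entries without an ID or numeric rating are treated as malformed.
  if (typeof id !== "string" || typeof rating !== "number") return null;
  return {
    id,
    rating,
    text: asString(dig(raw, PATHS.text)),
    language: asString(dig(raw, PATHS.language)),
    authorName: asString(dig(raw, PATHS.authorName)),
  };
}

function parseReviews(rawList: unknown[]): Review[] {
  // Drop malformed entries instead of failing the whole page.
  return rawList.map(parseReview).filter((r): r is Review => r !== null);
}
```

Keeping the index paths in one table means a layout change on Google's side would only require updating PATHS, not the parsing logic.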
Key Implementation Details
- HTTP Client: Uses impit for efficient HTTP requests with connection pooling
- Cookie Management: Leverages tough-cookie for handling session cookies automatically
- Rate Limiting: Built-in 2-second delay between initial requests to avoid detection
- Error Handling: Comprehensive error handling with graceful degradation
- Type Safety: Full TypeScript support with explicit type definitions
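Data Flow Diagram
Google Maps URL -> URL Parser (index.ts) -> Session Token Fetcher (extraction.ts) -> Review API Client and Pagination Handler (utils.ts) -> Data Parser (parser.ts) -> structured review objects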
Dependencies
Production Dependencies
- impit (@apify/impit): Lightweight HTTP client for making requests with automatic cookie handling
- tough-cookie (@salesforce/tough-cookie): Robust cookie handling for session management
Development Dependencies
- TypeScript: For type-safe development
- @types/node: TypeScript definitions for Node.js
- @types/tough-cookie: TypeScript definitions for tough-cookie
- rimraf: For removing dist/ directory before builds
- tsx: For TypeScript execution
Technical Limitations & Considerations
Rate Limiting
- Built-in 2-second delay between initial requests to avoid detection
- Respects reasonable usage patterns
- May require additional delays for high-volume scraping
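For higher volumes, one way to add spacing between requests is a small sequential runner. This is a sketch, not part of the module: sleep and runSpaced are hypothetical helpers, and the 2000 ms default simply mirrors the 2-second figure mentioned above.

```typescript
// Sketch only: sleep and runSpaced are hypothetical helpers, not part of the module.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs tasks one at a time, pausing delayMs between consecutive tasks.
async function runSpaced<T>(
  tasks: Array<() => Promise<T>>,
  delayMs = 2000, // mirrors the 2-second delay described above
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i++) {
    if (i > 0) await sleep(delayMs); // wait between requests, not before the first
    results.push(await tasks[i]());
  }
  return results;
}
```

Each task would typically wrap one scrape call for one place; raising delayMs trades throughput for a lower chance of being blocked.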
HTML Structure Dependency
- Relies on the current Google Maps page and URL structure: the kEI token is read from inline JavaScript in the page HTML, and the Place ID is matched in the URL with the regex pattern !1s([a-zA-Z0-9_:]+)!
- May break if Google significantly changes their frontend
- Internal API endpoints may change over time (observed endpoint: https://www.google.com/maps/rpc/listugcposts)
Legal & Ethical Considerations
- Intended for educational and proof-of-concept purposes
- Users must comply with Google’s Terms of Service
- Not intended for large-scale commercial scraping operations
- Should be used responsibly to avoid IP blocking
Data Completeness
- May not capture all reviews due to Google’s personalization
- Some reviews might be filtered based on location or account status
- Response data may be incomplete for very old reviews
Use Cases
Business Applications
- Competitive Analysis: Monitor competitor reviews and ratings
- Customer Experience: Track sentiment changes over time
- Market Research: Analyze trends in customer feedback
- Reputation Management: Identify and respond to negative feedback promptly
Technical Applications
- Data Enrichment: Enhance business directories with review data
- Review Aggregation: Collect reviews from multiple sources
- Sentiment Analysis: Feed data into NLP models for opinion mining
- API Development: Build custom review-based services
Research Applications
- Academic Research: Study consumer behavior patterns
- Social Science: Analyze geographic distribution of satisfaction
- Linguistics: Study language patterns in reviews
- Economics: Correlate reviews with business performance metrics
Architecture Deep Dive
Module Responsibilities
index.ts (Entry Point)
- Orchestrates the entire scraping process
- Validates inputs and handles errors
- Coordinates between subsystems
- Provides the public API interface
extraction.ts (Session Management)
- Handles authentication with Google Maps
- Extracts session tokens from page content
- Manages the initial HTTP request to establish context
utils.ts (Core Logic)
- Implements parameter validation
- Manages API communication with Google’s endpoints
- Handles pagination through continuation tokens
- Coordinates data fetching across multiple pages
parser.ts (Data Transformation)
- Converts Google’s internal array format to structured objects
- Extracts nested data safely with null checks
- Maps API responses to meaningful review properties
- Filters out malformed or incomplete data entries
types.ts (Type Definitions)
- Defines TypeScript interfaces for all data structures
- Ensures type safety throughout the application
- Provides clear contracts between modules
- Documents expected data shapes for consumers
Security Considerations
- No sensitive data storage (tokens are ephemeral)
- All requests made client-side (no server component)
- Limits request volume through the built-in delay (rate limiting alone does not enforce robots.txt rules)
- No attempt to bypass security measures beyond standard session handling
Performance Characteristics
Time Complexity
- O(n) where n is the number of pages requested
- Each page request involves one HTTP call
- Parsing complexity is linear with review count
Space Complexity
- O(r) where r is the total number of reviews retrieved
- Memory usage scales linearly with collected data
- Intermediate buffers are released after processing
Network Efficiency
- Connection reuse through HTTP keep-alive
- Minimal header overhead
- Efficient cookie management
- Batched processing where possible
Testing & Reliability
Code Quality Features
- Parameter validation logic
- URL parsing and place ID extraction
- Session token extraction from HTML
- Pagination logic and continuation handling
- Data parsing and transformation
- Error handling pathways
Reliability Features
- Graceful degradation on partial failures
- Detailed error logging for debugging
- Validation at each processing stage
License
This project is licensed under the MIT License - see the LICENSE file for details.
Disclaimer
This project is not affiliated with, endorsed by, or associated with Google LLC. It reverse-engineers publicly accessible Google Maps interfaces for educational purposes only. Users are responsible for ensuring their usage complies with applicable laws, regulations, and Google’s Terms of Service.