Add X/Twitter scraper support and generic Web Scraper API access with discovery parameters #38

@Ashish-Soni08

Summary

The Python SDK advertises broad scraper and dataset coverage, but version
2.3.1 has no client.scrape.x or client.scrape.twitter namespace.

The repository's ScrapeService exposes Amazon, LinkedIn, ChatGPT, Facebook,
Instagram, Perplexity, TikTok, YouTube, DigiKey, and Reddit, but not X.
The generated dataset catalog also does not expose an X/Twitter module.

As a result, a working X Web Scraper API job cannot be represented through the
SDK. I had to use httpx directly against /datasets/v3/trigger.
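For reference, the direct REST workaround can be sketched roughly as follows. This is a minimal illustration, not the SDK's or the API's documented contract: the base URL, the bearer-token header, the helper name build_trigger_request, and the input field names (url, start_date, end_date, borrowed from the method shape proposed later in this issue) are all assumptions. The helper only assembles the request, so it runs without network access.

```python
API_BASE = "https://api.brightdata.com"  # assumed base URL
DATASET_ID = "gd_lwxkxvnf1cynvib9co"     # dataset ID quoted in this issue


def build_trigger_request(token, inputs, *, dataset_id=DATASET_ID,
                          discovery_type="discover_new",
                          discover_by="profile_url",
                          include_errors=True):
    """Assemble (but do not send) a POST to /datasets/v3/trigger.

    Hypothetical helper: the query parameter names mirror the options
    listed in this issue; everything else is an assumption.
    """
    params = {
        "dataset_id": dataset_id,
        "type": discovery_type,
        "discover_by": discover_by,
        "include_errors": "true" if include_errors else "false",
    }
    return {
        "url": f"{API_BASE}/datasets/v3/trigger",
        "params": params,
        "headers": {"Authorization": f"Bearer {token}"},  # assumed auth scheme
        "json": inputs,  # one dict per profile to discover
    }


request = build_trigger_request(
    "PLACEHOLDER_TOKEN",
    [{
        "url": "https://x.com/sama",
        "start_date": "2022-11-30T00:00:00Z",
        "end_date": "2026-06-08T23:59:59Z",
    }],
)
# With httpx installed, this could be sent as: httpx.post(**request)
```

Keeping request construction separate from sending makes the parameters easy to inspect and compare against what the Control Panel submits.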

Use case

Collect public X posts for verified profiles with:

  • type=discover_new
  • discover_by=profile_url
  • start_date
  • end_date
  • asynchronous polling
  • error inclusion
  • raw JSONL output

The corresponding dataset is:

gd_lwxkxvnf1cynvib9co

Equivalent jobs submitted through direct REST and the Control Panel returned
the same 454 records across six profiles, confirming that the REST integration
was correct.
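The polling step in the use case above can be sketched as a small loop. This is an illustration under stated assumptions: wait_for_snapshot is a hypothetical helper, the status strings "running", "ready", and "failed" are assumed rather than taken from the API reference, and the status lookup is injected as a callable so the example runs without a network call.

```python
import time


def wait_for_snapshot(fetch_status, snapshot_id, *, interval=5.0,
                      timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(snapshot_id) until the job is ready or has failed.

    fetch_status is any callable returning a status string; the values
    "ready" and "failed" are assumptions made for this sketch.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(snapshot_id)
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError(f"snapshot {snapshot_id} failed")
        if clock() >= deadline:
            raise TimeoutError(
                f"snapshot {snapshot_id} not ready after {timeout}s"
            )
        sleep(interval)


# Stand-in fetcher that reports "running" twice, then "ready".
_statuses = iter(["running", "running", "ready"])
result = wait_for_snapshot(
    lambda _snapshot_id: next(_statuses),
    "s_example",              # hypothetical snapshot ID
    sleep=lambda _seconds: None,  # skip real waiting in the example
)
```

In a real client the fetcher would wrap the status request, and an SDK-level method could expose the same loop behind an awaitable.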

Current gap

There is no supported equivalent to:

await client.scrape.x.posts_by_profile(
    url="https://x.com/sama",
    start_date="2022-11-30T00:00:00Z",
    end_date="2026-06-08T23:59:59Z",
)

There is also no clearly documented generic Web Scraper API method such as:

await client.web_scraper.trigger(
    dataset_id="gd_lwxkxvnf1cynvib9co",
    discovery={"type": "discover_new", "discover_by": "profile_url"},
    inputs=[...],
    include_errors=True,
)

The generic dataset catalog API is not a replacement for triggering a current
Web Scraper API collector with discovery inputs.

Requested improvement

Please add:

  1. An X/Twitter scraper namespace generated from the available X scraper
    endpoints.
  2. A generic typed Web Scraper API client that accepts dataset_id, query
    parameters, structured inputs, and asynchronous trigger/poll/fetch.
  3. Introspection for supported discovery modes and their distinct input
    schemas.
  4. Access to snapshot ID, submitted input, status, error rows, delivered
    record count, and cost when available.
  5. JSONL/NDJSON streaming or file export for large jobs.
  6. Documentation that distinguishes:
    • ready-made scraper methods;
    • marketplace/pre-collected datasets;
    • Scraper Studio collectors; and
    • generic Web Scraper API jobs.
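As an illustration of items 4 and 5, a JSONL export that also tallies delivered and error rows could look like the sketch below. The write_jsonl helper and the sample record fields are hypothetical, and treating any row with a non-empty "error" key as an error row is an assumption about the payload shape.

```python
import io
import json


def write_jsonl(records, out):
    """Write one JSON object per line to out.

    Returns (delivered, errors). A row counts as an error when it is a
    dict with a non-empty "error" key, which is an assumption here.
    """
    delivered = 0
    errors = 0
    for record in records:
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        if isinstance(record, dict) and record.get("error"):
            errors += 1
        else:
            delivered += 1
    return delivered, errors


# Illustrative rows only; real records will have different fields.
buffer = io.StringIO()
counts = write_jsonl(
    [
        {"id": "1", "description": "example post"},
        {"error": "example failure"},
    ],
    buffer,
)
```

Because records is consumed as an iterable, the same function works with a generator that yields rows as they arrive, so large jobs do not need to be held in memory.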

Related documentation concern

The documentation describes hundreds of available scrapers, while the
high-level Python namespace exposes a much smaller set of platforms. Please
clarify whether the SDK is intended to provide full catalog parity or only
selected first-class wrappers.

Expected outcome

Users should not need to leave the SDK and write a separate REST client to
access a Bright Data Web Scraper API collector already available in their
account.
