A lightweight MCP server for web scraping — handles both static HTML and JavaScript-rendered pages through a single, consistent interface.
Most AI assistants can browse the web, but they often struggle with pages that require JavaScript to render content. When I tried to fetch data from sites like AWS blogs or dashboards, I kept getting back empty results — because the actual content only appears after JavaScript runs.
I already had Python scripts using requests + BeautifulSoup for static pages, but JS-rendered pages meant spinning up Playwright separately, writing boilerplate every time, and context-switching between tools.
So I built this MCP server to handle it all in one place. You tell it what URL and CSS selector you want — it figures out whether to use a lightweight HTTP fetch or headless Chrome, and gives you back the data.
Four tools, each for a different use case:
| Tool | Engine | When to use |
|---|---|---|
| `scrape_static` | Colly (HTTP) | Fast static HTML pages |
| `scrape_js` | chromedp (headless Chrome) | JS-rendered SPAs, dashboards |
| `scrape_multiple` | Colly parallel | Same selector across many URLs |
| `scrape_crawl` | Colly recursive | Follow links to a given depth |
All tools use the same interface: give a URL and a CSS selector, get back an array of matched values.
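To make the shared interface concrete, here is a minimal Go sketch of the JSON-RPC 2.0 `tools/call` request that an MCP client would send for any of the four tools. The type and function names (`toolCall`, `callParams`, `newScrapeCall`) are hypothetical and only illustrate the shape of the message; they are not taken from this server's source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall mirrors the JSON-RPC 2.0 envelope MCP uses for "tools/call".
// Hypothetical type, for illustration only.
type toolCall struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// callParams carries the tool name and its arguments.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newScrapeCall builds a request for one of the scraper tools using the
// two arguments every tool shares: a URL and a CSS selector.
func newScrapeCall(id int, tool, url, selector string) toolCall {
	return toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: callParams{
			Name:      tool,
			Arguments: map[string]any{"url": url, "selector": selector},
		},
	}
}

func main() {
	req := newScrapeCall(1, "scrape_static", "https://news.ycombinator.com", ".titleline > a")
	out, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In practice an MCP client such as Claude Desktop builds this message for you; the sketch only shows that switching tools means changing `name` while the argument shape stays the same.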
- Go 1.21+
- Chrome or Chromium installed on the host (only required for `scrape_js`)
macOS / Linux (recommended, no Go installation required):

```bash
curl -L "https://github.com/niceysam/scraper-mcp-server/releases/latest/download/scraper-mcp-server-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/')" -o ~/.local/bin/scraper-mcp-server && chmod +x ~/.local/bin/scraper-mcp-server && xattr -d com.apple.quarantine ~/.local/bin/scraper-mcp-server 2>/dev/null; echo "Installation complete"
```

If Go is already installed:

```bash
go install github.com/niceysam/scraper-mcp-server@latest
```

Or download a specific binary from Releases.
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
```json
{
  "mcpServers": {
    "scraper": {
      "command": "/Users/YOU/go/bin/scraper-mcp-server"
    }
  }
}
```

For a binary installed somewhere else, use its path instead:

```json
{
  "mcpServers": {
    "scraper": {
      "command": "/path/to/scraper-mcp-server"
    }
  }
}
```

`scrape_static`

Fetches raw HTML via HTTP and extracts values using a CSS selector. No JavaScript is executed, so it is fast and lightweight.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | Target URL |
| `selector` | string | required | CSS selector |
| `attribute` | string | `"text"` | `"text"` for inner text, or any attribute name (`"href"`, `"src"`, ...) |
Example
```json
{
  "url": "https://news.ycombinator.com",
  "selector": ".titleline > a",
  "attribute": "text"
}
```

`scrape_js`

Launches headless Chrome, waits for JavaScript to execute, then extracts values. Use this for any page that loads content dynamically.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | Target URL |
| `selector` | string | required | CSS selector |
| `attribute` | string | `"text"` | `"text"` or any attribute name |
| `wait_for` | string | none | CSS selector to wait for before extracting |
| `timeout_seconds` | number | `30` | Total timeout in seconds |
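The defaults in this table can be read as a small decoding step. The following Go sketch shows one way to apply them, assuming the arguments arrive as a JSON object; the `jsArgs` struct and `parseJSArgs` helper are hypothetical names and do not describe the server's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// jsArgs mirrors the scrape_js parameter table.
// Hypothetical struct, not taken from the server's source.
type jsArgs struct {
	URL            string  `json:"url"`
	Selector       string  `json:"selector"`
	Attribute      string  `json:"attribute"`
	WaitFor        string  `json:"wait_for"`
	TimeoutSeconds float64 `json:"timeout_seconds"`
}

// parseJSArgs decodes raw JSON arguments, rejects missing required
// fields, and fills in the documented defaults ("text" and 30 seconds).
func parseJSArgs(raw []byte) (jsArgs, error) {
	// Pre-fill defaults; fields absent from the JSON keep these values.
	args := jsArgs{Attribute: "text", TimeoutSeconds: 30}
	if err := json.Unmarshal(raw, &args); err != nil {
		return jsArgs{}, fmt.Errorf("decode arguments: %w", err)
	}
	if args.URL == "" || args.Selector == "" {
		return jsArgs{}, fmt.Errorf("url and selector are required")
	}
	if args.Attribute == "" {
		args.Attribute = "text"
	}
	if args.TimeoutSeconds <= 0 {
		args.TimeoutSeconds = 30
	}
	return args, nil
}

// timeout converts the numeric seconds value into a time.Duration.
func (a jsArgs) timeout() time.Duration {
	return time.Duration(a.TimeoutSeconds * float64(time.Second))
}

func main() {
	raw := []byte(`{"url":"https://aws.amazon.com/ko/blogs/tech/","selector":"article","timeout_seconds":25}`)
	args, err := parseJSArgs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(args.Attribute, args.timeout()) // prints: text 25s
}
```

The resulting duration is the kind of value you would hand to `context.WithTimeout` before driving a headless browser, so that page load and extraction share one overall deadline.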
Example
```json
{
  "url": "https://aws.amazon.com/ko/blogs/tech/",
  "selector": "article",
  "timeout_seconds": 25
}
```

`scrape_multiple`

Scrapes multiple URLs concurrently (5 parallel workers) with the same selector. Returns a map from each URL to its matched values.
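The "5 parallel workers" behaviour described above is a bounded worker pool. Below is a generic Go sketch of that pattern using only the standard library. It is not the server's Colly-based implementation: `fetchFunc` and `scrapeAll` are hypothetical names, and silently skipping failed URLs is an assumption made to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc stands in for scraping one page; hypothetical, for illustration.
type fetchFunc func(url string) ([]string, error)

// scrapeAll runs fetch over urls with at most `workers` goroutines and
// returns a map of URL -> matched values. URLs whose fetch fails are
// omitted from the map (an assumption, not documented server behaviour).
func scrapeAll(urls []string, workers int, fetch fetchFunc) map[string][]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string][]string, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				values, err := fetch(u)
				if err != nil {
					continue
				}
				mu.Lock() // the result map is shared between workers
				results[u] = values
				mu.Unlock()
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	fake := func(url string) ([]string, error) {
		return []string{"title from " + url}, nil
	}
	out := scrapeAll([]string{"https://example.com/a", "https://example.com/b"}, 5, fake)
	fmt.Println(len(out)) // prints: 2
}
```

Capping the worker count keeps a long URL list from opening dozens of connections to the same host at once.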
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `urls` | string[] | required | List of URLs |
| `selector` | string | required | CSS selector |
| `attribute` | string | `"text"` | `"text"` or any attribute name |
`scrape_crawl`

Starts at a URL and recursively follows links to a specified depth, collecting matched values from every page visited.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `url` | string | required | Starting URL |
| `selector` | string | required | CSS selector |
| `attribute` | string | `"text"` | `"text"` or any attribute name |
| `depth` | number | `2` | How many link levels to follow |
| `max_pages` | number | `20` | Maximum pages to visit |
| `same_domain_only` | boolean | `true` | Restrict crawl to the same domain |
| `timeout_seconds` | number | `60` | Total timeout in seconds |
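To show how `depth`, `max_pages`, and `same_domain_only` interact, here is a minimal breadth-first crawl in Go over an in-memory link graph. It is a sketch under stated assumptions rather than the server's Colly logic: the starting page is treated as depth 0, "same domain" is approximated by comparing URL hosts, and `crawlOptions`, `linksFunc`, and `crawl` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
)

// crawlOptions mirrors depth, max_pages and same_domain_only.
type crawlOptions struct {
	Depth          int
	MaxPages       int
	SameDomainOnly bool
}

// linksFunc returns the outgoing links of a page. It stands in for
// fetching and parsing a real page (hypothetical, for illustration).
type linksFunc func(page string) []string

// sameHost reports whether two URLs share a host (assumed meaning of
// "same domain" in this sketch).
func sameHost(a, b string) bool {
	ua, errA := url.Parse(a)
	ub, errB := url.Parse(b)
	return errA == nil && errB == nil && ua.Host == ub.Host
}

// crawl walks breadth-first from start and returns pages in visit order.
func crawl(start string, opts crawlOptions, links linksFunc) []string {
	type item struct {
		page  string
		depth int
	}
	seen := map[string]bool{start: true}
	queue := []item{{start, 0}}
	var visited []string

	for len(queue) > 0 && len(visited) < opts.MaxPages {
		cur := queue[0]
		queue = queue[1:]
		visited = append(visited, cur.page)
		if cur.depth >= opts.Depth {
			continue // do not follow links beyond the depth limit
		}
		for _, next := range links(cur.page) {
			if seen[next] || (opts.SameDomainOnly && !sameHost(start, next)) {
				continue
			}
			seen[next] = true
			queue = append(queue, item{next, cur.depth + 1})
		}
	}
	return visited
}

func main() {
	graph := map[string][]string{
		"https://example.com/":  {"https://example.com/a", "https://other.example/x"},
		"https://example.com/a": {"https://example.com/b"},
		"https://example.com/b": {"https://example.com/c"},
	}
	opts := crawlOptions{Depth: 2, MaxPages: 20, SameDomainOnly: true}
	pages := crawl("https://example.com/", opts, func(p string) []string { return graph[p] })
	fmt.Println(pages)
	// prints: [https://example.com/ https://example.com/a https://example.com/b]
}
```

In this run the off-site link is skipped by the host check and `/c` is never reached because it sits three levels below the start page.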
Example — crawl an AWS blog section two levels deep:
```json
{
  "url": "https://aws.amazon.com/ko/blogs/tech/",
  "selector": "article p",
  "depth": 2,
  "max_pages": 15
}
```

- All requests send `User-Agent: Mozilla/5.0 (compatible; scraper-mcp-server/1.0)`
- `scrape_js` requires Chrome or Chromium to be available on the host
- Empty and whitespace-only matches are dropped from results
- For RSS feeds, use a dedicated RSS parser instead — this tool is for HTML pages
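
The note about empty and whitespace-only matches can be pictured as a small filter over the raw results. This Go sketch uses a hypothetical `cleanMatches` helper; trimming the surrounding whitespace of the values that are kept is an assumption, since the notes only say that empty matches are dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanMatches trims each match and drops those that are empty afterwards.
// Hypothetical helper; trimming kept values is an assumption.
func cleanMatches(raw []string) []string {
	cleaned := make([]string, 0, len(raw))
	for _, m := range raw {
		if t := strings.TrimSpace(m); t != "" {
			cleaned = append(cleaned, t)
		}
	}
	return cleaned
}

func main() {
	fmt.Printf("%q\n", cleanMatches([]string{"  Hello ", "", " \n\t", "World"}))
	// prints: ["Hello" "World"]
}
```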
License: MIT