A pipe-based news article scraping and metadata extraction library for Python
Updated Mar 20, 2026 - Python
A configurable pipeline for extracting and filtering articles from large corpora, tailored for the Delpher Kranten corpus, with support for features like keyword filtering and tf-idf-based relevance scoring.
OpenClaw Skill: reads WeChat Official Account articles, identifies the account, and fetches its article list
A web-based article extractor and text-to-speech converter. Extract content from any URL and listen to articles with natural voice synthesis. Supports multiple extraction methods.
A Trafilatura-based API for extracting content information from HTML
Pure-Python article extraction library and HTTP API - Extract clean content from web pages as Markdown or HTML