From 6c18b6447ededad50ea325424a4b74fbfe086294 Mon Sep 17 00:00:00 2001 From: Vinta Chen Date: Mon, 4 May 2026 21:24:57 +0800 Subject: [PATCH] feat: use explicit Projects section in README --- README.md | 158 ++++++++++++++-------------- website/build.py | 9 +- website/readme_parser.py | 51 +++++---- website/tests/test_build.py | 62 +++++------ website/tests/test_readme_parser.py | 88 ++++++++-------- 5 files changed, 190 insertions(+), 178 deletions(-) diff --git a/README.md b/README.md index e37b01b1..9b40e57a 100644 --- a/README.md +++ b/README.md @@ -2,13 +2,13 @@ An opinionated guide to the best Python frameworks, libraries, tools, and resources. -# **Sponsors** +## **Sponsors** - **[pyr](https://pyrun.dev)**: Zero-config Python project manager. Bootstraps its own runtime, app-convention, and working imports - out the box. > The **#10 most-starred repo on GitHub**. Put your product in front of Python developers. [Become a sponsor](SPONSORSHIP.md). -# Categories +## Categories **AI & ML** @@ -123,11 +123,11 @@ An opinionated guide to the best Python frameworks, libraries, tools, and resour - [Microsoft Windows](#microsoft-windows) - [Miscellaneous](#miscellaneous) ---- +## Projects **AI & ML** -## AI and Agents +### AI and Agents _Libraries for building AI applications, LLM integrations, and autonomous agents._ @@ -163,7 +163,7 @@ _Libraries for building AI applications, LLM integrations, and autonomous agents - [vibevoice](https://github.com/microsoft/VibeVoice) - A family of open-source voice AI models from Microsoft for text-to-speech and long-form speech recognition. - [voxcpm](https://github.com/OpenBMB/VoxCPM) - A tokenizer-free text-to-speech foundation model for multilingual speech generation and voice cloning. -## Deep Learning +### Deep Learning _Frameworks for Neural Networks and Deep Learning. Also see [awesome-deep-learning](https://github.com/ChristosChristofidis/awesome-deep-learning)._ @@ -174,7 +174,7 @@ _Frameworks for Neural Networks and Deep Learning. Also see [awesome-deep-learni - [stable-baselines3](https://github.com/DLR-RM/stable-baselines3) - PyTorch implementations of Stable Baselines (deep) reinforcement learning algorithms. - [tensorflow](https://github.com/tensorflow/tensorflow) - The most popular Deep Learning framework created by Google. -## Machine Learning +### Machine Learning _Libraries for Machine Learning. Also see [awesome-machine-learning](https://github.com/josephmisiti/awesome-machine-learning#python)._ @@ -190,7 +190,7 @@ _Libraries for Machine Learning. Also see [awesome-machine-learning](https://git - [timesfm](https://github.com/google-research/timesfm) - A pretrained foundation model from Google Research for time-series forecasting. - [xgboost](https://github.com/dmlc/xgboost) - A scalable, portable, and distributed gradient boosting library. -## Natural Language Processing +### Natural Language Processing _Libraries for working with human languages._ @@ -203,7 +203,7 @@ _Libraries for working with human languages._ - [funnlp](https://github.com/fighting41love/funNLP) - A collection of tools and datasets for Chinese NLP. - [jieba](https://github.com/fxsjy/jieba) - The most popular Chinese text segmentation library. -## Computer Vision +### Computer Vision _Libraries for Computer Vision._ @@ -212,7 +212,7 @@ _Libraries for Computer Vision._ - [opencv](https://github.com/opencv/opencv-python) - Open Source Computer Vision Library. - [pytesseract](https://github.com/madmaze/pytesseract) - A wrapper for [Google Tesseract OCR](https://github.com/tesseract-ocr). -## Recommender Systems +### Recommender Systems _Libraries for building recommender systems._ @@ -222,7 +222,7 @@ _Libraries for building recommender systems._ **Web Development** -## Web Frameworks +### Web Frameworks _Traditional full stack web frameworks. Also see [Web APIs](#web-apis)._ @@ -245,7 +245,7 @@ _Traditional full stack web frameworks. Also see [Web APIs](#web-apis)._ - [starlette](https://github.com/Kludex/starlette) - A lightweight ASGI framework and toolkit for building high-performance async services. - [tornado](https://github.com/tornadoweb/tornado) - A web framework and asynchronous networking library. -## Web APIs +### Web APIs _Libraries for building RESTful and GraphQL APIs._ @@ -264,7 +264,7 @@ _Libraries for building RESTful and GraphQL APIs._ - [strawberry](https://github.com/strawberry-graphql/strawberry) - A GraphQL library that leverages Python type annotations for schema definition. - [webargs](https://github.com/marshmallow-code/webargs) - A friendly library for parsing HTTP request arguments with built-in support for popular web frameworks. -## Web Servers +### Web Servers _ASGI and WSGI compatible web servers._ @@ -281,7 +281,7 @@ _ASGI and WSGI compatible web servers._ - [grpcio](https://github.com/grpc/grpc) - HTTP/2-based RPC framework with Python bindings, built by Google. - [rpyc](https://github.com/tomerfiliba-org/rpyc) (Remote Python Call) - A transparent and symmetric RPC library for Python. -## WebSocket +### WebSocket _Libraries for working with WebSocket._ @@ -291,21 +291,21 @@ _Libraries for working with WebSocket._ - [picows](https://github.com/tarasko/picows) - Fastest WebSocket clients and servers with a frame level interface for the most demanding use-cases. - [websockets](https://github.com/python-websockets/websockets) - A library for building WebSocket servers and clients with a focus on correctness and simplicity. -## Template Engines +### Template Engines _Libraries and tools for templating and lexing._ - [jinja](https://github.com/pallets/jinja) - A modern and designer friendly templating language. - [mako](https://github.com/sqlalchemy/mako) - Hyperfast and lightweight templating for the Python platform. -## Web Asset Management +### Web Asset Management _Tools for managing, compressing and minifying website assets._ - [django-compressor](https://github.com/django-compressor/django-compressor) - Compresses linked and inline JavaScript or CSS into a single cached file. - [django-storages](https://github.com/jschneier/django-storages) - A collection of custom storage back ends for Django. -## Authentication +### Authentication _Libraries for implementing authentication schemes._ @@ -320,7 +320,7 @@ _Libraries for implementing authentication schemes._ - [django-guardian](https://github.com/django-guardian/django-guardian) - Implementation of per object permissions for Django 1.2+ - [django-rules](https://github.com/dfunckt/django-rules) - A tiny but powerful app providing object-level permissions to Django, without requiring a database. -## Admin Panels +### Admin Panels _Libraries for administrative interfaces._ @@ -332,7 +332,7 @@ _Libraries for administrative interfaces._ - [func-to-web](https://github.com/offerrall/FuncToWeb) - Instantly create web UIs from Python functions using type hints. Zero frontend code required. - [jet-bridge](https://github.com/jet-admin/jet-bridge) - Admin panel framework for any application with nice UI (ex Jet Django). -## CMS +### CMS _Content Management Systems._ @@ -340,7 +340,7 @@ _Content Management Systems._ - [indico](https://github.com/indico/indico) - A feature-rich event management system, made @ [CERN](https://en.wikipedia.org/wiki/CERN). - [wagtail](https://github.com/wagtail/wagtail) - A Django content management system. -## Static Site Generators +### Static Site Generators _Static site generator is a software that takes some text + templates as input and produces HTML files on the output._ @@ -350,7 +350,7 @@ _Static site generator is a software that takes some text + templates as input a **HTTP & Scraping** -## HTTP Clients +### HTTP Clients _Libraries for working with HTTP._ @@ -361,7 +361,7 @@ _Libraries for working with HTTP._ - [requests](https://github.com/psf/requests) - HTTP Requests for Humans. - [urllib3](https://github.com/urllib3/urllib3) - A HTTP library with thread-safe connection pooling, file post support, sanity friendly. -## Web Scraping +### Web Scraping _Libraries to automate web scraping and extract web content._ @@ -377,7 +377,7 @@ _Libraries to automate web scraping and extract web content._ - [sumy](https://github.com/miso-belica/sumy) - A module for automatic summarization of text documents and HTML pages. - [trafilatura](https://github.com/adbar/trafilatura) - A tool for gathering text and metadata from the web, with built-in content filtering. -## Email +### Email _Libraries for sending and parsing email, and mail server management._ @@ -386,7 +386,7 @@ _Libraries for sending and parsing email, and mail server management._ **Database & Storage** -## ORM +### ORM _Libraries that implement Object-Relational Mapping or data mapping techniques._ @@ -404,7 +404,7 @@ _Libraries that implement Object-Relational Mapping or data mapping techniques._ - [mongoengine](https://github.com/MongoEngine/mongoengine) - A Python Object-Document-Mapper for working with MongoDB. - [pynamodb](https://github.com/pynamodb/PynamoDB) - A Pythonic interface for [Amazon DynamoDB](https://aws.amazon.com/dynamodb/). -## Database Drivers +### Database Drivers _Libraries for connecting and operating databases._ @@ -425,7 +425,7 @@ _Libraries for connecting and operating databases._ - [pymongo](https://github.com/mongodb/mongo-python-driver) - The official Python client for MongoDB. - [redis-py](https://github.com/redis/redis-py) - The Python client for Redis. -## Database +### Database _Databases implemented in Python._ @@ -435,7 +435,7 @@ _Databases implemented in Python._ - [tinydb](https://github.com/msiemens/tinydb) - A tiny, document-oriented database. - [ZODB](https://github.com/zopefoundation/ZODB) - A native object database for Python. A key-value and object graph database. -## Caching +### Caching _Libraries for caching data._ @@ -444,7 +444,7 @@ _Libraries for caching data._ - [dogpile.cache](https://github.com/sqlalchemy/dogpile.cache) - dogpile.cache is a next generation replacement for Beaker made by the same authors. - [python-diskcache](https://github.com/grantjenks/python-diskcache) - SQLite and file backed cache backend with faster lookups than memcached and redis. -## Search +### Search _Libraries and software for indexing and performing search queries on data._ @@ -452,7 +452,7 @@ _Libraries and software for indexing and performing search queries on data._ - [elasticsearch-py](https://github.com/elastic/elasticsearch-py) - The official low-level Python client for [Elasticsearch](https://www.elastic.co/products/elasticsearch). - [pysolr](https://github.com/django-haystack/pysolr) - A lightweight Python wrapper for [Apache Solr](https://lucene.apache.org/solr/). -## Serialization +### Serialization _Libraries for serializing complex data types._ @@ -462,7 +462,7 @@ _Libraries for serializing complex data types._ **Data & Science** -## Data Analysis +### Data Analysis _Libraries for data analysis._ @@ -483,7 +483,7 @@ _Libraries for data analysis._ - [openbb](https://github.com/OpenBB-finance/OpenBB) - A financial data platform for analysts, quants and AI agents. - [yfinance](https://github.com/ranaroussi/yfinance) - Easy Pythonic way to download market and financial data from Yahoo Finance. -## Data Validation +### Data Validation _Libraries for validating data. Used for forms in many cases._ @@ -493,7 +493,7 @@ _Libraries for validating data. Used for forms in many cases._ - [pydantic](https://github.com/pydantic/pydantic) - Data validation using Python type hints. - [voluptuous](https://github.com/alecthomas/voluptuous) - A Python data validation library primarily intended for validating data from untrusted sources. -## Data Visualization +### Data Visualization _Libraries for visualizing data. Also see [awesome-javascript](https://github.com/sorrycc/awesome-javascript#data-visualization)._ @@ -516,7 +516,7 @@ _Libraries for visualizing data. Also see [awesome-javascript](https://github.co - [gradio](https://github.com/gradio-app/gradio) - Build and share machine learning apps, all in Python. - [streamlit](https://github.com/streamlit/streamlit) - A framework which lets you build dashboards, generate reports, or create chat apps in minutes. -## Geolocation +### Geolocation _Libraries for geocoding addresses and working with latitudes and longitudes._ @@ -526,7 +526,7 @@ _Libraries for geocoding addresses and working with latitudes and longitudes._ - [geopandas](https://github.com/geopandas/geopandas) - Python tools for geographic data (GeoSeries/GeoDataFrame) built on pandas. - [geopy](https://github.com/geopy/geopy) - Python Geocoding Toolbox. -## Science +### Science _Libraries for scientific computing. Also see [Python-for-Scientists](https://github.com/TomNicholas/Python-for-Scientists)._ @@ -557,7 +557,7 @@ _Libraries for scientific computing. Also see [Python-for-Scientists](https://gi - [networkx](https://github.com/networkx/networkx) - A high-productivity software for complex networks. - [shapely](https://github.com/shapely/shapely) - Manipulation and analysis of geometric objects in the Cartesian plane. -## Quantum Computing +### Quantum Computing _Libraries for quantum computing._ @@ -568,7 +568,7 @@ _Libraries for quantum computing._ **Developer Tools** -## Algorithms and Design Patterns +### Algorithms and Design Patterns _Python implementation of data structures, algorithms and design patterns. Also see [awesome-algorithms](https://github.com/tayllan/awesome-algorithms)._ @@ -580,7 +580,7 @@ _Python implementation of data structures, algorithms and design patterns. Also - [python-patterns](https://github.com/faif/python-patterns) - A collection of design patterns in Python. - [transitions](https://github.com/pytransitions/transitions) - A lightweight, object-oriented finite state machine implementation. -## Interactive Interpreter +### Interactive Interpreter _Interactive Python interpreters (REPL)._ @@ -589,7 +589,7 @@ _Interactive Python interpreters (REPL)._ - [marimo](https://github.com/marimo-team/marimo) - Transform data and train models, feels like a next-gen notebook, stored as Git-friendly Python. - [ptpython](https://github.com/prompt-toolkit/ptpython) - Advanced Python REPL built on top of the [python-prompt-toolkit](https://github.com/prompt-toolkit/python-prompt-toolkit). -## Code Analysis +### Code Analysis _Tools of static analysis, linters and code quality checkers. Also see [awesome-static-analysis](https://github.com/analysis-tools-dev/static-analysis)._ @@ -619,7 +619,7 @@ _Tools of static analysis, linters and code quality checkers. Also see [awesome- - [monkeytype](https://github.com/Instagram/MonkeyType) - A system for Python that generates static type annotations by collecting runtime types. - [pytype](https://github.com/google/pytype) - Pytype checks and infers types for Python code - without requiring type annotations. -## Testing +### Testing _Libraries for testing codebases and generating test data. Also see [awesome-python-testing](https://github.com/cleder/awesome-python-testing)._ @@ -655,7 +655,7 @@ _Libraries for testing codebases and generating test data. Also see [awesome-pyt - [faker](https://github.com/joke2k/faker) - A Python package that generates fake data. - [mimesis](https://github.com/lk-geimfari/mimesis) - is a Python library that help you generate fake data. -## Debugging Tools +### Debugging Tools _Libraries for debugging code._ @@ -674,7 +674,7 @@ _Libraries for debugging code._ - [icecream](https://github.com/gruns/icecream) - Inspect variables, expressions, and program execution with a single, simple function call. - [memory_graph](https://github.com/bterwijn/memory_graph) - Visualize Python data at runtime to debug references, mutability, and aliasing. -## Build Tools +### Build Tools _Compile software from source code._ @@ -685,7 +685,7 @@ _Compile software from source code._ - [doit](https://github.com/pydoit/doit) - A task runner and build tool. - [scons](https://github.com/SCons/scons) - A software construction tool. -## Documentation +### Documentation _Libraries for generating project documentation._ @@ -697,7 +697,7 @@ _Libraries for generating project documentation._ **DevOps** -## DevOps Tools +### DevOps Tools _Software and libraries for DevOps._ @@ -723,7 +723,7 @@ _Software and libraries for DevOps._ - [chaostoolkit](https://github.com/chaostoolkit/chaostoolkit) - A Chaos Engineering toolkit & Orchestration for Developers. - [pre-commit](https://github.com/pre-commit/pre-commit) - A framework for managing and maintaining multi-language pre-commit hooks. -## Distributed Computing +### Distributed Computing _Frameworks and libraries for Distributed Computing._ @@ -735,7 +735,7 @@ _Frameworks and libraries for Distributed Computing._ - [joblib](https://github.com/joblib/joblib) - A set of tools to provide lightweight pipelining in Python. - [ray](https://github.com/ray-project/ray/) - A system for parallel and distributed Python that unifies the machine learning ecosystem. -## Task Queues +### Task Queues _Libraries for working with task queues._ @@ -744,13 +744,13 @@ _Libraries for working with task queues._ - [huey](https://github.com/coleifer/huey) - Little multi-threaded task queue. - [rq](https://github.com/rq/rq) - Simple job queues for Python. -## Messaging +### Messaging _Libraries for working with message brokers and event streaming._ - [faststream](https://github.com/ag2ai/faststream) - A framework for building asynchronous services over Apache Kafka, RabbitMQ, NATS, MQTT and Redis. -## Job Schedulers +### Job Schedulers _Libraries for scheduling jobs._ @@ -761,7 +761,7 @@ _Libraries for scheduling jobs._ - [schedule](https://github.com/dbader/schedule) - Python job scheduling for humans. - [SpiffWorkflow](https://github.com/sartography/SpiffWorkflow) - A powerful workflow engine implemented in pure Python. -## Logging +### Logging _Libraries for generating and working with logs._ @@ -770,7 +770,7 @@ _Libraries for generating and working with logs._ - [loguru](https://github.com/Delgan/loguru) - Library which aims to bring enjoyable logging in Python. - [structlog](https://github.com/hynek/structlog) - Structured logging made easy. -## Network Virtualization +### Network Virtualization _Tools and libraries for Virtual Networking and SDN (Software Defined Networking)._ @@ -780,7 +780,7 @@ _Tools and libraries for Virtual Networking and SDN (Software Defined Networking **CLI & GUI** -## CLI Development +### CLI Development _Libraries for building command-line applications._ @@ -799,7 +799,7 @@ _Libraries for building command-line applications._ - [textual](https://github.com/Textualize/textual) - A framework for building interactive user interfaces that run in the terminal and the browser. - [tqdm](https://github.com/tqdm/tqdm) - Fast, extensible progress bar for loops and CLI. -## CLI Tools +### CLI Tools _Useful CLI-based tools for productivity._ @@ -818,7 +818,7 @@ _Useful CLI-based tools for productivity._ - [mycli](https://github.com/dbcli/mycli) - MySQL CLI with autocompletion and syntax highlighting. - [pgcli](https://github.com/dbcli/pgcli) - PostgreSQL CLI with autocompletion and syntax highlighting. -## GUI Development +### GUI Development _Libraries for working with graphical user interface applications._ @@ -846,7 +846,7 @@ _Libraries for working with graphical user interface applications._ **Text & Documents** -## Text Processing +### Text Processing _Libraries for parsing and manipulating plain texts._ @@ -872,7 +872,7 @@ _Libraries for parsing and manipulating plain texts._ - [python-user-agents](https://github.com/selwin/python-user-agents) - Browser user agent parser. - [sqlparse](https://github.com/andialbrecht/sqlparse) - A non-validating SQL parser. -## HTML Manipulation +### HTML Manipulation _Libraries for working with HTML and XML._ @@ -884,7 +884,7 @@ _Libraries for working with HTML and XML._ - [tinycss2](https://github.com/Kozea/tinycss2) - A low-level CSS parser and generator written in Python. - [xmltodict](https://github.com/martinblech/xmltodict) - Working with XML feel like you are working with JSON. -## File Format Processing +### File Format Processing _Libraries for parsing and manipulating specific text formats._ @@ -918,7 +918,7 @@ _Libraries for parsing and manipulating specific text formats._ - [pyyaml](https://github.com/yaml/pyyaml) - YAML implementations for Python. - [tomllib](https://docs.python.org/3/library/tomllib.html) - (Python standard library) Parse TOML files. -## File Manipulation +### File Manipulation _Libraries for file manipulation._ @@ -930,7 +930,7 @@ _Libraries for file manipulation._ **Media** -## Image Processing +### Image Processing _Libraries for manipulating images._ @@ -943,7 +943,7 @@ _Libraries for manipulating images._ - [thumbor](https://github.com/thumbor/thumbor) - A smart imaging service. It enables on-demand crop, re-sizing and flipping of images. - [wand](https://github.com/emcconville/wand) - Python bindings for [MagickWand](https://www.imagemagick.org/script/magick-wand.php), C API for ImageMagick. -## Audio & Video Processing +### Audio & Video Processing _Libraries for manipulating audio, video, and their metadata._ @@ -960,7 +960,7 @@ _Libraries for manipulating audio, video, and their metadata._ - [mutagen](https://github.com/quodlibet/mutagen) - A Python module to handle audio metadata. - [tinytag](https://github.com/devsnd/tinytag) - A library for reading music meta data of MP3, OGG, FLAC and Wave files. -## Game Development +### Game Development _Awesome game development libraries._ @@ -973,7 +973,7 @@ _Awesome game development libraries._ **Python Language** -## Implementations +### Implementations _Implementations of Python._ @@ -984,7 +984,7 @@ _Implementations of Python._ - [pyodide](https://github.com/pyodide/pyodide) - Python distribution for the browser and Node.js based on WebAssembly. - [pypy](https://github.com/pypy/pypy) - A very fast and compliant implementation of the Python language. -## Built-in Classes Enhancement +### Built-in Classes Enhancement _Libraries for enhancing Python built-in classes._ @@ -992,7 +992,7 @@ _Libraries for enhancing Python built-in classes._ - [bidict](https://github.com/jab/bidict) - Efficient, Pythonic bidirectional map data structures and related functionality. - [box](https://github.com/cdgriffith/Box) - Python dictionaries with advanced dot notation access. -## Functional Programming +### Functional Programming _Functional Programming with Python._ @@ -1003,7 +1003,7 @@ _Functional Programming with Python._ - [returns](https://github.com/dry-python/returns) - A set of type-safe monads, transformers, and composition utilities. - [toolz](https://github.com/pytoolz/toolz) - A collection of functional utilities for iterators, functions, and dictionaries. Also available as [cytoolz](https://github.com/pytoolz/cytoolz/) for Cython-accelerated performance. -## Asynchronous Programming +### Asynchronous Programming _Libraries for asynchronous, concurrent and parallel execution. Also see [awesome-asyncio](https://github.com/timofurrer/awesome-asyncio)._ @@ -1017,7 +1017,7 @@ _Libraries for asynchronous, concurrent and parallel execution. Also see [awesom - [twisted](https://github.com/twisted/twisted) - An event-driven networking engine. - [uvloop](https://github.com/MagicStack/uvloop) - Ultra fast asyncio event loop. -## Date and Time +### Date and Time _Libraries for working with dates and times._ @@ -1028,7 +1028,7 @@ _Libraries for working with dates and times._ **Python Toolchain** -## Environment Management +### Environment Management _Libraries for Python version and virtual environment management._ @@ -1038,7 +1038,7 @@ _Libraries for Python version and virtual environment management._ - [uv](https://github.com/astral-sh/uv) - An extremely fast Python version, package and project manager, written in Rust. - [virtualenv](https://github.com/pypa/virtualenv) - A tool to create isolated Python environments. -## Package Management +### Package Management _Libraries for package and dependency management._ @@ -1048,7 +1048,7 @@ _Libraries for package and dependency management._ - [poetry](https://github.com/python-poetry/poetry) - Python dependency management and packaging made easy. - [uv](https://github.com/astral-sh/uv) - An extremely fast Python version, package and project manager, written in Rust. -## Package Repositories +### Package Repositories _Local PyPI repository server and proxies._ @@ -1056,7 +1056,7 @@ _Local PyPI repository server and proxies._ - [devpi](https://github.com/devpi/devpi) - PyPI server and packaging/testing/release tool. - [warehouse](https://github.com/pypa/warehouse) - Next generation Python Package Repository (PyPI). -## Distribution +### Distribution _Libraries to create packaged executables for release distribution._ @@ -1066,7 +1066,7 @@ _Libraries to create packaged executables for release distribution._ - [pyinstaller](https://github.com/pyinstaller/pyinstaller) - Converts Python programs into stand-alone executables (cross-platform). - [shiv](https://github.com/linkedin/shiv) - A command line utility for building fully self-contained zipapps (PEP 441), but with all their dependencies included. -## Configuration Files +### Configuration Files _Libraries for storing and parsing configuration options._ @@ -1078,13 +1078,13 @@ _Libraries for storing and parsing configuration options._ **Security** -## Cryptography +### Cryptography - [cryptography](https://github.com/pyca/cryptography) - A package designed to expose cryptographic primitives and recipes to Python developers. - [paramiko](https://github.com/paramiko/paramiko) - The leading native Python SSHv2 protocol library. - [pynacl](https://github.com/pyca/pynacl) - Python binding to the Networking and Cryptography (NaCl) library. -## Penetration Testing +### Penetration Testing _Frameworks and tools for penetration testing._ @@ -1093,7 +1093,7 @@ _Frameworks and tools for penetration testing._ - [sherlock](https://github.com/sherlock-project/sherlock) - Hunt down social media accounts by username across social networks. - [sqlmap](https://github.com/sqlmapproject/sqlmap) - Automatic SQL injection and database takeover tool. -## Web Security +### Web Security _Libraries for application-layer web security._ @@ -1101,14 +1101,14 @@ _Libraries for application-layer web security._ **Other** -## Hardware +### Hardware _Libraries for programming with hardware._ - [bleak](https://github.com/hbldh/bleak) - A cross platform Bluetooth Low Energy Client for Python using asyncio. - [pynput](https://github.com/moses-palmer/pynput) - A library to control and monitor input devices. -## Microsoft Windows +### Microsoft Windows _Python programming on Microsoft Windows._ @@ -1116,7 +1116,7 @@ _Python programming on Microsoft Windows._ - [pywin32](https://github.com/mhammond/pywin32) - Python Extensions for Windows. - [winpython](https://github.com/winpython/winpython) - Portable development environment for Windows 10/11. -## Miscellaneous +### Miscellaneous _Useful libraries or tools that don't fit in the categories above._ @@ -1125,18 +1125,18 @@ _Useful libraries or tools that don't fit in the categories above._ - [itsdangerous](https://github.com/pallets/itsdangerous) - Various helpers to pass trusted data to untrusted environments. - [tryton](https://github.com/tryton/tryton) - A general-purpose business framework. -# Resources +## Resources Where to discover learning resources or new Python libraries. -## Newsletters +### Newsletters - [Awesome Python Newsletter](http://python.libhunt.com/newsletter) - [Pycoder's Weekly](https://pycoders.com/) - [Python Tricks](https://realpython.com/python-tricks/) - [Python Weekly](https://www.pythonweekly.com/) -## Podcasts +### Podcasts - [Django Chat](https://djangochat.com/) - [PyPodcats](https://pypodcats.live) @@ -1144,11 +1144,11 @@ Where to discover learning resources or new Python libraries. - [Talk Python To Me](https://talkpython.fm/) - [The Real Python Podcast](https://realpython.com/podcasts/rpp/) -## Websites +### Websites - [Python Developer Tooling Handbook](https://pydevtools.com/) - Comprehensive guide to modern Python developer tools covering package management, linting, type checking, testing, and more. -# Contributing +## Contributing Your contributions are always welcome! Please take a look at the [contribution guidelines](https://github.com/vinta/awesome-python/blob/master/CONTRIBUTING.md) first. diff --git a/website/build.py b/website/build.py index 88e69ab2..96cf0506 100644 --- a/website/build.py +++ b/website/build.py @@ -243,13 +243,14 @@ def write_sitemap_xml(path: Path, urls: Sequence[tuple[str, str]]) -> None: def top_level_heading_text(line: str) -> str | None: stripped = line.strip() - if not stripped.startswith("# "): + match = re.match(r"^(#{1,2})\s+(.+)$", stripped) + if match is None: return None - return stripped.removeprefix("#").strip().strip("#").strip().strip("*").strip() + return match.group(2).strip().strip("#").strip().strip("*").strip() def extract_categories_body(markdown: str) -> str: - """Return content under the `# Categories` heading, excluding the heading line itself.""" + """Return content from `Categories` through `Projects`, excluding later sections.""" lines = markdown.splitlines(keepends=True) start_idx = None end_idx = len(lines) @@ -261,7 +262,7 @@ def extract_categories_body(markdown: str) -> str: start_idx = i + 1 while start_idx < len(lines) and lines[start_idx].strip() == "": start_idx += 1 - elif start_idx is not None and i >= start_idx: + elif start_idx is not None and heading.lower() in ("resources", "contributing"): end_idx = i break if start_idx is None: diff --git a/website/readme_parser.py b/website/readme_parser.py index a80acc32..61bf1e98 100644 --- a/website/readme_parser.py +++ b/website/readme_parser.py @@ -114,6 +114,13 @@ def _heading_text(node: SyntaxTreeNode) -> str: return "" +def _heading_level(node: SyntaxTreeNode) -> int | None: + """Return the numeric level for a heading node.""" + if node.type != "heading" or not node.tag.startswith("h"): + return None + return int(node.tag[1:]) + + def _extract_description_children(nodes: list[SyntaxTreeNode]) -> list[SyntaxTreeNode]: """Extract description children from the first paragraph if it's a single block. @@ -303,7 +310,7 @@ def _parse_grouped_sections( ) -> list[ParsedGroup]: """Parse nodes into groups of categories using bold markers as group boundaries. - Bold-only paragraphs (**Group Name**) delimit groups. H2 headings under each + Bold-only paragraphs (**Group Name**) delimit groups. H3 headings under each bold marker become categories within that group. Categories appearing before any bold marker go into an "Other" group. """ @@ -341,7 +348,7 @@ def _parse_grouped_sections( flush_group() current_group_name = bold_name current_cat_body = [] - elif node.type == "heading" and node.tag == "h2": + elif node.type == "heading" and node.tag in ("h2", "h3"): flush_cat() current_cat_name = _heading_text(node) current_cat_body = [] @@ -383,7 +390,7 @@ def _parse_sponsor_item(inline: SyntaxTreeNode) -> ParsedSponsor | None: def parse_sponsors(text: str) -> list[ParsedSponsor]: - """Parse the `# Sponsors` section of README.md into a list of sponsors. + """Parse the `Sponsors` section of README.md into a list of sponsors. Expects bullets in the form `**[name](url)**: description`. Returns [] if no Sponsors section exists. @@ -395,14 +402,18 @@ def parse_sponsors(text: str) -> list[ParsedSponsor]: start_idx = None end_idx = len(children) + start_level = None for i, node in enumerate(children): - if node.type == "heading" and node.tag == "h1": - title = _heading_text(node).strip().lower() - if start_idx is None and title == "sponsors": - start_idx = i + 1 - elif start_idx is not None: - end_idx = i - break + level = _heading_level(node) + if level is None: + continue + title = _heading_text(node).strip().lower() + if start_idx is None and title == "sponsors": + start_idx = i + 1 + start_level = level + elif start_idx is not None and start_level is not None and level <= start_level: + end_idx = i + break if start_idx is None: return [] @@ -426,26 +437,26 @@ def parse_readme(text: str) -> list[ParsedGroup]: """Parse README.md text into grouped categories. Returns a list of ParsedGroup dicts containing nested categories. - Content between the thematic break (---) and # Resources or # Contributing - is parsed as categories grouped by bold markers (**Group Name**). + Content between the Projects heading and Resources or Contributing is parsed + as categories grouped by bold markers (**Group Name**). """ md = MarkdownIt("commonmark") tokens = md.parse(text) root = SyntaxTreeNode(tokens) children = root.children - # Find thematic break (---) and section boundaries in one pass - hr_idx = None + # Find Projects and section boundaries in one pass. + projects_idx = None cat_end_idx = None for i, node in enumerate(children): - if hr_idx is None and node.type == "hr": - hr_idx = i - elif node.type == "heading" and node.tag == "h1": + if _heading_level(node) in (1, 2): text_content = _heading_text(node) - if cat_end_idx is None and text_content in ("Resources", "Contributing"): + if projects_idx is None and text_content == "Projects": + projects_idx = i + elif cat_end_idx is None and text_content in ("Resources", "Contributing"): cat_end_idx = i - if hr_idx is None: + if projects_idx is None: return [] - cat_nodes = children[hr_idx + 1 : cat_end_idx or len(children)] + cat_nodes = children[projects_idx + 1 : cat_end_idx or len(children)] return _parse_grouped_sections(cat_nodes) diff --git a/website/tests/test_build.py b/website/tests/test_build.py index f4781bd6..0f99351a 100644 --- a/website/tests/test_build.py +++ b/website/tests/test_build.py @@ -137,31 +137,31 @@ class TestBuild: Intro. - --- + ## Projects **Tools** - ## Widgets + ### Widgets _Widget libraries. Also see [awesome-widgets](https://example.com/widgets)._ - [w1](https://example.com) - A widget. - ## Gadgets + ### Gadgets _Gadget tools._ - [g1](https://example.com) - A gadget. - # Resources + ## Resources Info. - ## Newsletters + ### Newsletters - [NL](https://example.com) - # Contributing + ## Contributing Help! """) @@ -179,17 +179,17 @@ class TestBuild: Intro. - --- + ## Projects **Tools** - ## Widgets + ### Widgets - Sync - [w1](https://example.com) - A widget. - # Contributing + ## Contributing Help! """) @@ -232,7 +232,7 @@ class TestBuild: Intro. - --- + ## Projects **Tools** @@ -298,28 +298,28 @@ class TestBuild: Intro. - # **Sponsors** + ## **Sponsors** - **[Sponsor](https://sponsor.example.com)**: Sponsored tool. > Become a sponsor: [Sponsor us](SPONSORSHIP.md). - # Categories + ## Categories **Tools** - [Widgets](#widgets) - --- + ## Projects **Tools** - ## Widgets + ### Widgets - [w1](https://example.com) - A widget. - [w2](https://github.com/owner/w2) - A starred widget. - # Contributing + ## Contributing Help! """) @@ -353,7 +353,7 @@ class TestBuild: assert "## Categories" in llms_txt assert "**Tools**" in llms_txt assert "- [Widgets](#widgets)" in llms_txt - assert "## Widgets" in llms_txt + assert "### Widgets" in llms_txt assert "- [w1](https://example.com) - A widget." in llms_txt assert "- [w2](https://github.com/owner/w2) - A starred widget. (GitHub stars: 42)" in llms_txt assert llms_txt != readme @@ -363,7 +363,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects ## Only @@ -387,7 +387,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects ## Stuff @@ -431,7 +431,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects **Widgets** @@ -538,7 +538,7 @@ class TestBuild: Intro. - --- + ## Projects **Tools** @@ -591,7 +591,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects **AI & ML** @@ -624,7 +624,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects **Web** @@ -659,7 +659,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects **Web** @@ -691,7 +691,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects **AI & ML** @@ -731,7 +731,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects ## Sneaky @@ -760,7 +760,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects **AI & ML** @@ -800,7 +800,7 @@ class TestBuild: readme = textwrap.dedent("""\ # T - --- + ## Projects **AI & ML** @@ -980,7 +980,7 @@ class TestExtractEntries: readme = textwrap.dedent("""\ # T - --- + ## Projects **Tools** @@ -1004,7 +1004,7 @@ class TestExtractEntries: readme = textwrap.dedent("""\ # T - --- + ## Projects **Tools** @@ -1031,7 +1031,7 @@ class TestExtractEntries: readme = textwrap.dedent("""\ # T - --- + ## Projects ## Stdlib @@ -1050,7 +1050,7 @@ class TestExtractEntries: readme = textwrap.dedent("""\ # T - --- + ## Projects **Tools** diff --git a/website/tests/test_readme_parser.py b/website/tests/test_readme_parser.py index 5273580f..4d0b7ae7 100644 --- a/website/tests/test_readme_parser.py +++ b/website/tests/test_readme_parser.py @@ -81,35 +81,35 @@ MINIMAL_README = textwrap.dedent("""\ Some intro text. - --- + ## Projects - ## Alpha + ### Alpha _Libraries for alpha stuff._ - [lib-a](https://example.com/a) - Does A. - [lib-b](https://example.com/b) - Does B. - ## Beta + ### Beta _Tools for beta._ - [lib-c](https://example.com/c) - Does C. - # Resources + ## Resources Where to discover resources. - ## Newsletters + ### Newsletters - [News One](https://example.com/n1) - [News Two](https://example.com/n2) - ## Podcasts + ### Podcasts - [Pod One](https://example.com/p1) - # Contributing + ## Contributing Please contribute! """) @@ -120,11 +120,11 @@ GROUPED_README = textwrap.dedent("""\ Some intro text. - --- + ## Projects **Group One** - ## Alpha + ### Alpha _Libraries for alpha stuff._ @@ -133,25 +133,25 @@ GROUPED_README = textwrap.dedent("""\ **Group Two** - ## Beta + ### Beta _Tools for beta._ - [lib-c](https://example.com/c) - Does C. - ## Gamma + ### Gamma - [lib-d](https://example.com/d) - Does D. - # Resources + ## Resources Where to discover resources. - ## Newsletters + ### Newsletters - [News One](https://example.com/n1) - # Contributing + ## Contributing Please contribute! """) @@ -191,7 +191,7 @@ class TestParseReadmeSections: all_names.extend(c["name"] for c in g["categories"]) assert "Contributing" not in all_names - def test_no_separator(self): + def test_no_projects_heading(self): groups = parse_readme("# Just a heading\n\nSome text.\n") assert groups == [] @@ -199,19 +199,19 @@ class TestParseReadmeSections: readme = textwrap.dedent("""\ # Title - --- + ## Projects - ## NullDesc + ### NullDesc - [item](https://x.com) - Thing. - # Resources + ## Resources - ## Tips + ### Tips - [tip](https://x.com) - # Contributing + ## Contributing Done. """) @@ -225,15 +225,15 @@ class TestParseReadmeSections: readme = textwrap.dedent("""\ # T - --- + ## Projects - ## Algos + ### Algos _Algorithms. Also see [awesome-algos](https://example.com)._ - [lib](https://x.com) - Lib. - # Contributing + ## Contributing Done. """) @@ -273,17 +273,17 @@ class TestParseGroupedReadme: readme = textwrap.dedent("""\ # T - --- + ## Projects **Empty** **HasCats** - ## Cat + ### Cat - [x](https://x.com) - X. - # Contributing + ## Contributing Done. """) @@ -295,15 +295,15 @@ class TestParseGroupedReadme: readme = textwrap.dedent("""\ # T - --- + ## Projects **Note:** This is not a group marker. - ## Cat + ### Cat - [x](https://x.com) - X. - # Contributing + ## Contributing Done. """) @@ -317,19 +317,19 @@ class TestParseGroupedReadme: readme = textwrap.dedent("""\ # T - --- + ## Projects - ## Orphan + ### Orphan - [x](https://x.com) - X. **A Group** - ## Grouped + ### Grouped - [y](https://x.com) - Y. - # Contributing + ## Contributing Done. """) @@ -405,15 +405,15 @@ class TestParseSectionEntries: readme = textwrap.dedent("""\ # T - --- + ## Projects - ## Async + ### Async - [asyncio](https://x.com) - Async I/O. - [awesome-asyncio](https://y.com) - [trio](https://z.com) - Friendly async. - # Contributing + ## Contributing Done. """) @@ -480,21 +480,21 @@ class TestParseRealReadme: md = MarkdownIt("commonmark") root = SyntaxTreeNode(md.parse(self.readme_text)) - # Find category section boundaries (between --- and # Resources/Contributing) - hr_idx = None + # Find category section boundaries (between Projects and Resources/Contributing) + projects_idx = None end_idx = None for i, node in enumerate(root.children): - if hr_idx is None and node.type == "hr": - hr_idx = i - elif node.type == "heading" and node.tag == "h1": + if node.type == "heading" and node.tag in ("h1", "h2"): text = render_inline_text(node.children[0].children) if node.children else "" - if end_idx is None and text in ("Resources", "Contributing"): + if projects_idx is None and text == "Projects": + projects_idx = i + elif end_idx is None and text in ("Resources", "Contributing"): end_idx = i - if hr_idx is None: + if projects_idx is None: return bad = [] - cat_nodes = root.children[hr_idx + 1 : end_idx or len(root.children)] + cat_nodes = root.children[projects_idx + 1 : end_idx or len(root.children)] for node in cat_nodes: if node.type != "bullet_list": continue