A Python script that scrapes BuiltWith.com pages for technology information used by websites, compiling a unique, sorted list of technologies.
This project compiles the technologies a website uses by scraping its BuiltWith.com page. You can list multiple websites in an endpoints.txt file (one per line) to process them in batch. The script deduplicates the results so that only unique technologies appear in the final output.
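As a rough illustration of the deduplication step, the sketch below merges per-site technology lists into one unique, sorted list. The function name `unique_sorted` and the case-insensitive ordering are assumptions made for this example; the actual script may normalize and sort differently.

```python
def unique_sorted(technologies):
    """Return technology names with duplicates removed, in a stable sorted order.

    Hypothetical helper: whitespace is trimmed and empty names are dropped
    before deduplicating. Sorting is case-insensitive, with the original
    spelling as a tie-breaker so the result is deterministic.
    """
    cleaned = {name.strip() for name in technologies if name.strip()}
    return sorted(cleaned, key=lambda name: (name.lower(), name))


# Two sites that share a technology contribute it only once.
site_a = ["nginx", "jQuery", "Google Analytics"]
site_b = ["jQuery ", "nginx", ""]
print(unique_sorted(site_a + site_b))
```

A set is used here because membership is all that matters for deduplication; ordering is applied once at the end.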
Input: endpoints.txt (one website endpoint per line)
Output: output.txt (the unique, sorted list of technologies)
Clone or Download the Repository
git clone https://github.com/thejessicafelts/builtwith-scraper.git
cd builtwith-scraper
Install Dependencies
Use pip to install the necessary libraries:
pip install requests beautifulsoup4
Prepare Endpoints File
Create an endpoints.txt file in the project directory. Each line should contain one website endpoint (the part of the URL following http://builtwith.com/). For example:
example1.com
example2.com
example3.com
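To show how an endpoints file in this format could map onto BuiltWith URLs, here is a minimal sketch. The helper names (`parse_endpoints`, `build_urls`, `load_endpoints`) are hypothetical and not taken from the script; skipping blank lines is an assumption about how stray whitespace should be handled.

```python
from pathlib import Path

BASE_URL = "http://builtwith.com/"


def parse_endpoints(text):
    """Split file contents into endpoints, ignoring blank lines and padding."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_urls(endpoints):
    """Prefix each endpoint with the BuiltWith base URL."""
    return [BASE_URL + endpoint for endpoint in endpoints]


def load_endpoints(path="endpoints.txt"):
    """Read and parse the endpoints file from disk."""
    return parse_endpoints(Path(path).read_text(encoding="utf-8"))


sample = "example1.com\nexample2.com\n\nexample3.com\n"
for url in build_urls(parse_endpoints(sample)):
    print(url)
```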
Run the Python script:
python3 script.py
The script will:
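Read the endpoints listed in endpoints.txt
Request each endpoint's page under http://builtwith.com/
Write the unique, sorted list of technologies to output.txt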
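The overall flow (read endpoints, fetch each BuiltWith page, collect unique technologies, write the output file) might look roughly like the sketch below. This is not the repository's script.py: every function name is hypothetical, and the CSS selector `"h2 a"` in particular is a placeholder assumption, since the real BuiltWith page markup is not described here and would need to be inspected.

```python
BASE_URL = "http://builtwith.com/"


def fetch_html(endpoint, timeout=10):
    """Download the BuiltWith page for one endpoint (requires `requests`)."""
    import requests  # imported lazily so run() can be exercised without it

    response = requests.get(BASE_URL + endpoint, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_technologies(html, selector="h2 a"):
    """Extract technology names from a page (requires `beautifulsoup4`).

    The default selector is an assumption for illustration only.
    """
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


def run(endpoints, fetch=fetch_html, extract=extract_technologies):
    """Fetch every endpoint and return the unique technologies, sorted."""
    found = set()
    for endpoint in endpoints:
        found.update(name for name in extract(fetch(endpoint)) if name)
    return sorted(found)


def main(in_path="endpoints.txt", out_path="output.txt"):
    """Read endpoints from in_path and write one technology per line to out_path."""
    with open(in_path, encoding="utf-8") as f:
        endpoints = [line.strip() for line in f if line.strip()]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(run(endpoints)) + "\n")
```

Calling `main()` from the project directory would process endpoints.txt and produce output.txt. Passing `fetch` and `extract` as parameters keeps the network and parsing steps swappable, which makes the loop easy to try against saved HTML.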
Contributions are welcome! Feel free to open an issue or submit a pull request for any improvements or bug fixes.