This project demonstrates a web crawler built with Cheerio.js, Puppeteer, and Socket.IO. The crawler extracts URLs from web pages and streams real-time progress updates to the user interface. The `citations.html` file serves as an example of how the project can be used to document contributions and iterative development.
- Dynamic Web Crawling: Extracts URLs from web pages using Cheerio.js and Puppeteer (see the sketch after this list).
- Real-Time Updates: Displays crawling progress dynamically using Socket.IO.
- Frontend GUI: Provides a user-friendly interface for configuring and starting the crawler.
- Exportable Results: Supports exporting discovered URLs to a CSV file (optional enhancement).
- Compliance with `robots.txt`: Ensures ethical crawling practices using the `robots-txt-parser` library.
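To make the first two features concrete, here is a minimal sketch of how Puppeteer and Cheerio can be combined to extract links from a rendered page. The function name `extractUrls` and its structure are illustrative assumptions, not the project's actual code:

```javascript
// Hypothetical sketch: render a page with Puppeteer, then parse its links with Cheerio.
// `extractUrls` is an illustrative name, not necessarily the project's own.
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function extractUrls(pageUrl) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(pageUrl, { waitUntil: 'networkidle2' });
  const html = await page.content(); // fully rendered HTML, including JS-injected links
  await browser.close();

  const $ = cheerio.load(html);
  const urls = new Set();
  $('a[href]').each((_, el) => {
    try {
      // Resolve relative hrefs against the page URL; the Set de-duplicates results
      urls.add(new URL($(el).attr('href'), pageUrl).href);
    } catch {
      // Skip hrefs that cannot be parsed as URLs
    }
  });
  return [...urls];
}
```

Rendering with Puppeteer first means links injected by client-side JavaScript are visible to Cheerio, which only parses static HTML.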
Before running the project, ensure you have the following installed:
- Node.js (v12 or higher): download from https://nodejs.org/
- npm (bundled with Node.js)
- Git: download from https://git-scm.com/
- Clone the repository:
  ```bash
  git clone https://github.com/your-username/web-crawler.git
  cd web-crawler
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Run the server: start the backend (a sketch of such a server follows these steps):
  ```bash
  node server.js
  ```
- Open the frontend: open the `citations.html` file in your browser, or navigate to http://localhost:3000 if the server is running.
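The repository's actual `server.js` is not reproduced here. The following is a minimal sketch, assuming an Express app with Socket.IO attached, of how a backend could emit crawl progress to the frontend; the event names `start-crawl`, `progress`, and `done` are illustrative assumptions:

```javascript
// Minimal sketch of an Express + Socket.IO backend (not the project's actual server.js).
// Event names 'start-crawl', 'progress', and 'done' are assumptions for illustration.
const express = require('express');
const http = require('http');
const { Server } = require('socket.io');

const app = express();
const server = http.createServer(app);
const io = new Server(server);

app.use(express.static(__dirname)); // serve citations.html and other static assets

io.on('connection', (socket) => {
  socket.on('start-crawl', async ({ baseUrl, pageUrl, maxUrls }) => {
    // A real implementation would crawl here; this loop only simulates progress events.
    for (let i = 1; i <= maxUrls; i++) {
      socket.emit('progress', { discovered: i, total: maxUrls });
    }
    socket.emit('done', { urls: [] });
  });
});

server.listen(3000, () => console.log('Listening on http://localhost:3000'));
```

Attaching Socket.IO to the same HTTP server that serves the static frontend keeps everything on one port.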
- Open the web interface (`citations.html`) in your browser.
- Enter the following details:
  - Base URL: The root domain of the website to crawl (e.g., https://www.nasa.gov).
  - Page URL to Crawl: The specific page to start crawling (e.g., https://www.nasa.gov/page).
  - Max URLs to Discover: The maximum number of unique URLs to extract.
- Click the "Start Crawling" button.
- Monitor the progress in the Progress Log section.
- View the results in the Results Log section.
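On the frontend, the Progress Log and Results Log can be driven by Socket.IO events. Here is a minimal client-side sketch, again assuming the illustrative `start-crawl`, `progress`, and `done` event names from the server sketch above; the element IDs are also assumptions, not necessarily those in `citations.html`:

```javascript
// Hypothetical client-side wiring for the Progress Log and Results Log.
// Assumes the Socket.IO client script is loaded; element IDs are illustrative.
const socket = io('http://localhost:3000');

document.querySelector('#start-button').addEventListener('click', () => {
  socket.emit('start-crawl', {
    baseUrl: document.querySelector('#base-url').value,
    pageUrl: document.querySelector('#page-url').value,
    maxUrls: Number(document.querySelector('#max-urls').value),
  });
});

socket.on('progress', ({ discovered, total }) => {
  // Append each update so the Progress Log shows the crawl history
  document.querySelector('#progress-log').textContent +=
    `Discovered ${discovered} of ${total} URLs\n`;
});

socket.on('done', ({ urls }) => {
  document.querySelector('#results-log').textContent = urls.join('\n');
});
```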
The `citations.html` file is included in this repository as an example of how to document contributions and iterative development. It highlights the collaboration between Joshua Greenfield (Prompt Engineer) and Qwen2.5-Max (Large Language Model).
- Cheerio.js Template: Provided a foundational template for web scraping.
- Frontend Development: Designed a static HTML, CSS, and JavaScript frontend.
- Backend Integration: Developed a Node.js backend using Express.
- Enhancements: Added features like real-time progress updates and `robots.txt` compliance (see the sketch below).
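For context on the `robots.txt` enhancement, here is a minimal sketch of a pre-crawl permission check. Note that it uses the `robots-parser` package as a stand-in for illustration; the project itself uses `robots-txt-parser`, whose API differs:

```javascript
// Illustrative robots.txt check using the `robots-parser` package as a stand-in
// (the project uses `robots-txt-parser`; this is NOT that library's API).
const robotsParser = require('robots-parser');

async function isAllowed(pageUrl, userAgent = 'web-crawler-demo') {
  const robotsUrl = new URL('/robots.txt', pageUrl).href;
  const res = await fetch(robotsUrl); // built-in fetch requires Node 18+
  const body = res.ok ? await res.text() : '';
  const robots = robotsParser(robotsUrl, body);
  // isAllowed returns true/false/undefined; a missing robots.txt permits crawling
  return robots.isAllowed(pageUrl, userAgent) !== false;
}
```

Running a check like this before each request is what keeps the crawler within a site's stated crawling policy.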
To view the citations page:
- Clone the repository.
- Open `citations.html` in your browser.
We welcome contributions to improve this project! If you'd like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix:
  ```bash
  git checkout -b feature-name
  ```
- Commit your changes:
  ```bash
  git commit -m "Add feature or fix"
  ```
- Push your changes to GitHub:
  ```bash
  git push origin feature-name
  ```
- Submit a pull request.
Special thanks to Qwen2.5-Max for their expertise, patience, and iterative approach to problem-solving. Their contributions were instrumental in bringing this project to life.
This project is licensed under the MIT License. See the LICENSE file for details.