# Java Web Scraper

A complete web scraping application demonstrating:

- **CLI Interface**
- **MVC Architecture**
- **Command Pattern**
- **Strategy Pattern**
- **Custom Exception Hierarchy**

## Features

- 3 different scraping strategies:
  - `news_scraper` - Scrapes quotes from http://quotes.toscrape.com
  - `books_scraper` - Scrapes books from https://books.toscrape.com
  - `tech_news_scraper` - Scrapes news from https://www.bbc.com/news
- Saves data to JSON files
- Command-line interface
- Extensible architecture

## Building

```bash
cd java-scraper
mvn clean package
```

## Usage

### List available scrapers:

```bash
mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="list"
```

### Scrape using a specific strategy:

```bash
mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="scrape news_scraper"
```

### Scrape all:

```bash
mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="scrape all"
```

### Custom output directory:

```bash
mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="scrape news_scraper --output my_data"
```

### Using the built JAR:

```bash
java -jar target/java-scraper-1.0-SNAPSHOT.jar list
java -jar target/java-scraper-1.0-SNAPSHOT.jar scrape news_scraper
```

## Architecture

### MVC

- **Model**: `ScrapedItem`, `ScrapedData`
- **View**: `ConsoleView`
- **Controller**: `ScraperController`

### Command Pattern

- `Command` interface
  - `ScrapeCommand`
  - `ListCommand`

### Strategy Pattern

- `ScraperStrategy` interface
  - `NewsScraperStrategy`
  - `BooksScraperStrategy`
  - `TechNewsScraperStrategy`

### Exception Hierarchy

- `ScraperException` (base)
  - `NetworkException`
  - `ParseException`
  - `StorageException`
  - `StrategyException`

## Requirements

- Java 11 or higher
- Maven
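
To show how the Command and Strategy patterns listed under Architecture might fit together, here is a minimal, self-contained sketch. It is not the project's actual source: the method names (`name()`, `scrape()`, `execute()`), the `StubStrategy` class, the `demoRegistry()` and `parse()` helpers, and the choice of an unchecked base exception are all assumptions made for illustration. The stub returns canned data instead of making HTTP requests, and `scrape all` and `--output` are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScraperSketch {

    /** Base of the exception hierarchy (unchecked here; that is an assumption). */
    static class ScraperException extends RuntimeException {
        ScraperException(String message) { super(message); }
    }

    /** Raised when a requested strategy name is not registered. */
    static class StrategyException extends ScraperException {
        StrategyException(String message) { super(message); }
    }

    /** Strategy: one implementation per target site. Method names are hypothetical. */
    interface ScraperStrategy {
        String name();
        List<String> scrape();
    }

    /** Stand-in strategy that returns canned items instead of fetching pages. */
    static class StubStrategy implements ScraperStrategy {
        private final String name;
        private final List<String> items;

        StubStrategy(String name, List<String> items) {
            this.name = name;
            this.items = items;
        }

        @Override public String name() { return name; }
        @Override public List<String> scrape() { return items; }
    }

    /** Command: one implementation per CLI verb. */
    interface Command {
        String execute();
    }

    /** Lists the registered strategy names in registration order. */
    static class ListCommand implements Command {
        private final Map<String, ScraperStrategy> registry;

        ListCommand(Map<String, ScraperStrategy> registry) { this.registry = registry; }

        @Override public String execute() {
            return String.join(", ", registry.keySet());
        }
    }

    /** Looks up one strategy by name and runs it. */
    static class ScrapeCommand implements Command {
        private final Map<String, ScraperStrategy> registry;
        private final String target;

        ScrapeCommand(Map<String, ScraperStrategy> registry, String target) {
            this.registry = registry;
            this.target = target;
        }

        @Override public String execute() {
            ScraperStrategy strategy = registry.get(target);
            if (strategy == null) {
                throw new StrategyException("Unknown scraper: " + target);
            }
            return strategy.name() + ": " + strategy.scrape().size() + " items";
        }
    }

    /** Hypothetical registry holding two stub strategies keyed by CLI name. */
    static Map<String, ScraperStrategy> demoRegistry() {
        Map<String, ScraperStrategy> registry = new LinkedHashMap<>();
        registry.put("news_scraper", new StubStrategy("news_scraper", List.of("quote one")));
        registry.put("books_scraper", new StubStrategy("books_scraper", List.of("book one", "book two")));
        return registry;
    }

    /** Maps CLI arguments to a Command; only "list" and "scrape <name>" are handled. */
    static Command parse(String[] args, Map<String, ScraperStrategy> registry) {
        if (args.length == 1 && args[0].equals("list")) {
            return new ListCommand(registry);
        }
        if (args.length == 2 && args[0].equals("scrape")) {
            return new ScrapeCommand(registry, args[1]);
        }
        throw new ScraperException("Usage: list | scrape <name>");
    }

    public static void main(String[] args) {
        Map<String, ScraperStrategy> registry = demoRegistry();
        System.out.println(parse(new String[] {"list"}, registry).execute());
        System.out.println(parse(new String[] {"scrape", "books_scraper"}, registry).execute());
    }
}
```

In this arrangement, adding a new site means registering one more `ScraperStrategy`, and adding a new CLI verb means adding one more `Command`; neither change touches the existing classes, which is presumably what "extensible architecture" refers to.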