Java Web Scraper

A command-line web scraping application that demonstrates:

  • CLI Interface
  • MVC Architecture
  • Command Pattern
  • Strategy Pattern
  • Custom Exception Hierarchy

Features

Building

cd java-scraper
mvn clean package

Usage

List available scrapers:

mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="list"

Scrape using a specific strategy:

mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="scrape news_scraper"

Run all scrapers:

mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="scrape all"

Write results to a custom output directory:

mvn exec:java -Dexec.mainClass="com.scraper.Main" -Dexec.args="scrape news_scraper --output my_data"
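
The argument shapes used above (a command, a strategy name or `all`, and an optional `--output` flag) could be parsed roughly as follows. This is only a sketch and not the project's actual parser: the `CliArgs` class, its fields, and the default directory `output` are hypothetical, since this README does not describe them.

```java
// Hypothetical holder for parsed command-line arguments; not taken from the project.
final class CliArgs {
    final String command;   // e.g. "list" or "scrape"
    final String target;    // e.g. "news_scraper" or "all"; null when no target is given
    final String outputDir; // value of --output, or an assumed default

    private CliArgs(String command, String target, String outputDir) {
        this.command = command;
        this.target = target;
        this.outputDir = outputDir;
    }

    static CliArgs parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("usage: <list|scrape> [strategy|all] [--output DIR]");
        }
        String target = null;
        String outputDir = "output"; // assumed default; the README does not state one
        for (int i = 1; i < args.length; i++) {
            if ("--output".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("--output requires a directory");
                }
                outputDir = args[++i]; // consume the flag's value
            } else if (target == null) {
                target = args[i];
            } else {
                throw new IllegalArgumentException("unexpected argument: " + args[i]);
            }
        }
        return new CliArgs(args[0], target, outputDir);
    }

    public static void main(String[] args) {
        CliArgs parsed = parse(new String[] {"scrape", "news_scraper", "--output", "my_data"});
        System.out.println(parsed.command + " " + parsed.target + " -> " + parsed.outputDir);
    }
}
```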

Using the built JAR:

java -jar target/java-scraper-1.0-SNAPSHOT.jar list
java -jar target/java-scraper-1.0-SNAPSHOT.jar scrape news_scraper

Architecture

MVC

  • Model: ScrapedItem, ScrapedData
  • View: ConsoleView
  • Controller: ScraperController
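
As an illustration of how these three roles might fit together, here is a minimal sketch. Only the class names come from the list above; every field, constructor, and method (`title`, `url`, `render`, `show`, `run`) is an assumption, and the controller returns placeholder data instead of fetching anything.

```java
import java.util.List;

// Model: a single scraped record. The title/url fields are assumptions.
final class ScrapedItem {
    private final String title;
    private final String url;
    ScrapedItem(String title, String url) { this.title = title; this.url = url; }
    String getTitle() { return title; }
    String getUrl() { return url; }
}

// Model: the items produced by one scraper run (assumed shape).
final class ScrapedData {
    private final String source;
    private final List<ScrapedItem> items;
    ScrapedData(String source, List<ScrapedItem> items) {
        this.source = source;
        this.items = List.copyOf(items); // defensive, unmodifiable copy
    }
    String getSource() { return source; }
    List<ScrapedItem> getItems() { return items; }
}

// View: turns model objects into console text and knows nothing about scraping.
final class ConsoleView {
    String render(ScrapedData data) {
        StringBuilder sb = new StringBuilder(data.getSource() + ": " + data.getItems().size() + " item(s)");
        for (ScrapedItem item : data.getItems()) {
            sb.append("\n  ").append(item.getTitle()).append(" <").append(item.getUrl()).append(">");
        }
        return sb.toString();
    }
    void show(ScrapedData data) { System.out.println(render(data)); }
}

// Controller: obtains model data and hands it to the view.
final class ScraperController {
    private final ConsoleView view;
    ScraperController(ConsoleView view) { this.view = view; }

    ScrapedData run(String source) {
        // Placeholder standing in for a real scrape.
        ScrapedData data = new ScrapedData(source,
                List.of(new ScrapedItem("Example headline", "https://example.com/1")));
        view.show(data);
        return data;
    }

    public static void main(String[] args) {
        new ScraperController(new ConsoleView()).run("news_scraper");
    }
}
```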

Command Pattern

  • Command interface
  • ScrapeCommand
  • ListCommand
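
The Command pattern named here typically maps the first CLI word to an object that carries out the request. The sketch below shows one plausible shape. The `execute(String[])` signature, the returned messages, and the `CommandDispatcher` class are assumptions made for illustration, not the project's real API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Assumed signature: each command receives its remaining arguments and returns a message.
interface Command {
    String execute(String[] args);
}

final class ListCommand implements Command {
    @Override
    public String execute(String[] args) {
        return "Available scrapers: news_scraper"; // placeholder output
    }
}

final class ScrapeCommand implements Command {
    @Override
    public String execute(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("scrape requires a strategy name or 'all'");
        }
        return "Scraping with: " + args[0]; // placeholder for the real work
    }
}

// Hypothetical dispatcher: looks up the command by name and delegates to it.
final class CommandDispatcher {
    private final Map<String, Command> commands = new LinkedHashMap<>();

    CommandDispatcher() {
        commands.put("list", new ListCommand());
        commands.put("scrape", new ScrapeCommand());
    }

    String dispatch(String[] argv) {
        if (argv.length == 0 || !commands.containsKey(argv[0])) {
            throw new IllegalArgumentException("unknown command");
        }
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        return commands.get(argv[0]).execute(rest);
    }

    public static void main(String[] args) {
        String[] argv = args.length == 0 ? new String[] {"list"} : args;
        System.out.println(new CommandDispatcher().dispatch(argv));
    }
}
```

With this arrangement, adding a new CLI verb means registering one more `Command` rather than extending a conditional chain in `main`.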

Strategy Pattern

  • ScraperStrategy interface
  • NewsScraperStrategy
  • BooksScraperStrategy
  • TechNewsScraperStrategy
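
To show how the strategies listed above could be selected by name (as in `scrape news_scraper` or `scrape all`), here is a hedged sketch. The interface methods `getName()` and `scrape()`, the `StrategyRegistry` class, the key `books_scraper`, and the returned placeholder strings are all assumptions. Only the class names and the key `news_scraper` appear in this README.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assumed contract: a strategy has a lookup name and produces a list of results.
interface ScraperStrategy {
    String getName();
    List<String> scrape();
}

final class NewsScraperStrategy implements ScraperStrategy {
    @Override public String getName() { return "news_scraper"; }
    @Override public List<String> scrape() { return List.of("placeholder news item"); }
}

final class BooksScraperStrategy implements ScraperStrategy {
    @Override public String getName() { return "books_scraper"; } // assumed key
    @Override public List<String> scrape() { return List.of("placeholder book item"); }
}

// Hypothetical registry that resolves a CLI name to a strategy.
final class StrategyRegistry {
    private final Map<String, ScraperStrategy> strategies = new LinkedHashMap<>();

    void register(ScraperStrategy strategy) {
        strategies.put(strategy.getName(), strategy);
    }

    List<String> names() {
        return new ArrayList<>(strategies.keySet());
    }

    List<String> run(String name) {
        if ("all".equals(name)) {
            // Run every registered strategy in registration order.
            List<String> combined = new ArrayList<>();
            for (ScraperStrategy strategy : strategies.values()) {
                combined.addAll(strategy.scrape());
            }
            return combined;
        }
        ScraperStrategy strategy = strategies.get(name);
        if (strategy == null) {
            throw new IllegalArgumentException("unknown strategy: " + name);
        }
        return strategy.scrape();
    }

    public static void main(String[] args) {
        StrategyRegistry registry = new StrategyRegistry();
        registry.register(new NewsScraperStrategy());
        registry.register(new BooksScraperStrategy());
        System.out.println(registry.names());
        System.out.println(registry.run("all"));
    }
}
```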

Exception Hierarchy

  • ScraperException (base)
  • NetworkException
  • ParseException
  • StorageException
  • StrategyException
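
A hierarchy like this lets callers catch a specific failure first and fall back to the base type. The sketch below assumes checked exceptions (extending `Exception`) and message-plus-cause constructors. The README states neither, and the `fetch` and `run` helpers are hypothetical.

```java
import java.io.IOException;

// Base type; whether the real class is checked or unchecked is an assumption.
class ScraperException extends Exception {
    ScraperException(String message) { super(message); }
    ScraperException(String message, Throwable cause) { super(message, cause); }
}

class NetworkException extends ScraperException {
    NetworkException(String message) { super(message); }
    NetworkException(String message, Throwable cause) { super(message, cause); }
}

class ParseException extends ScraperException {
    ParseException(String message) { super(message); }
    ParseException(String message, Throwable cause) { super(message, cause); }
}

class StorageException extends ScraperException {
    StorageException(String message) { super(message); }
    StorageException(String message, Throwable cause) { super(message, cause); }
}

class StrategyException extends ScraperException {
    StrategyException(String message) { super(message); }
    StrategyException(String message, Throwable cause) { super(message, cause); }
}

final class ExceptionDemo {
    // Hypothetical operation that wraps a low-level failure in a domain exception.
    static String fetch(boolean fail) throws ScraperException {
        if (fail) {
            throw new NetworkException("connection timed out", new IOException("timeout"));
        }
        return "ok";
    }

    static String run(boolean fail) {
        try {
            return fetch(fail);
        } catch (NetworkException e) {
            // Most specific handler first.
            return "retry later: " + e.getMessage();
        } catch (ScraperException e) {
            // Base type catches every other scraper failure.
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

Note that the JDK also ships a `java.text.ParseException`, so code that imports both would need to qualify one of them.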

Requirements

  • Java 11 or higher
  • Maven