diff --git a/DataCleaner.java b/DataCleaner.java
deleted file mode 100644
index 0b8b9be..0000000
--- a/DataCleaner.java
+++ /dev/null
@@ -1,31 +0,0 @@
-package Homework;
-
-public class DataCleaner {
-    public static void main(String[] args) {
-        int[] sensorData = {85, -5, 92, 0, 105, 999, 88, 76};
-
-        int validSum = 0;   // sum of the valid readings
-        int validCount = 0; // number of valid readings
-
-        // Write your flow-control code here
-        for (int i=0;i<sensorData.length;i++){
-            if(sensorData[i]>=1&&sensorData[i]<=100){
-                validSum+=sensorData[i];
-                validCount++;
-            }else if((sensorData[i]>100||sensorData[i]<=0)&&sensorData[i]!=999){
-                System.out.println("Warning: out-of-range data [value] found, skipped");
-                continue;
-            }else if(sensorData[i]==999){
-                System.out.println("Fatal error: sensor offline, processing aborted");
-                break;
-            }
-        }
-        if (validCount>0){
-            double average=0;
-            average=(double)validSum/validCount;
-            System.out.println("Average of valid data: "+average);
-        }else{
-            System.out.println("No valid data");
-        }
-}
-}
diff --git a/w11/java-cli-w11/.gitignore b/w11/java-cli-w11/.gitignore
deleted file mode 100644
index 0ebcf1a..0000000
--- a/w11/java-cli-w11/.gitignore
+++ /dev/null
@@ -1,4 +0,0 @@
-*.jar
-*.jar
-*.class
-*.log
\ No newline at end of file
diff --git a/w11/java-cli-w11/W10 PPT.md b/w11/java-cli-w11/W10 PPT.md
deleted file mode 100644
index d4ba310..0000000
--- a/w11/java-cli-w11/W10 PPT.md
+++ /dev/null
@@ -1,492 +0,0 @@
----
-id: "24"
-title: w10-Design Patterns
-slug: w10-design-patterns
-status: draft
-view_count: 0
-created_at: 2026-05-07T12:00:00+08:00
-updated_at: 2026-05-07T14:00:00.000000000+08:00
----
-
-# Advanced Programming · Week 10
-
-### Design Patterns: Flexibility and Extensibility
-
-### Strategy Pattern + Factory + Repository in Practice
-
----
-
-### 📌 This Week's Agenda
-
-- W9 recap: what the skeleton achieved and the risks it hides
-- Strategy pattern: a "plug standard" for parsers
-- Parser factory: the magic of automatic matching
-- Repository: armoring data access
-- Connecting the architecture: the full call chain
-- Implementation + hands-on tasks
-- Architecture retrospective + W11 preview
-
----
-
-## 1️⃣ W9 Recap: What the Skeleton Achieved and the Risks It Hides
-
-### We built a handsome house
-
-- ✅ Clear MVC layering
-- ✅ Command pattern: **adding a command requires zero changes to the Controller**
-- ✅ All output goes through `ConsoleView`
-- ✅ Standard project package structure
-
----
-
-### But problems came along with it
-
-```java
-// What about the parsing logic inside CrawlCommand?
-if (url.contains("blog.example.com")) {
-    // blog parsing...
-} else if (url.contains("news.example.com")) {
-    // news parsing...
-} else {
-    view.printError("Unsupported website!");
-}
-```
-
-> 😫 Every newly supported website needs another `else if`
-
----
-
-### And another piece of data left "naked"
-
-```java
-List<Article> articles = new ArrayList<>();
-// Every Command can do this:
-articles.clear();
-articles.add(null);
-articles.remove(0);
-```
-
-> 🚨 The data has no protection at all; verbal agreements cannot be relied on
-
----
-
-### This Week's Tasks
-
-1. **Pluggable parsing logic** → Strategy pattern + factory
-2. **Guarded data access** → Repository pattern
-
-> W9 built the skeleton; W10 puts on the armor
-
----
-
-## 2️⃣ Strategy Pattern: a "Plug Standard" for Parsers
-
-### Why does the wall socket accept any appliance?
-
-- The **three-pin socket** is a standard interface
-- TVs, computers, and phone chargers all implement that interface
-- The socket does not care which appliance you are
-
----
-
-### The crawler world works the same way
-
-- `CrawlStrategy` = the socket interface
-- `BlogStrategy`, `NewsStrategy` = concrete appliances
-- `CrawlCommand` = the person using the appliance
-- `StrategyFactory` = the socket panel
-
----
-
-### The Interface Is the Contract
-
-```java
-public interface CrawlStrategy {
-    List<Article> parse(String url, Document doc);
-    boolean supports(String url);
-}
-```
-
-- `supports()`: can I handle this URL?
-- `parse()`: how do I parse it?
-- **Any website that wants to be crawled signs this contract!**
-
----
-
-### Strategy vs. Hard-Coding
-
-| Dimension | if-else spaghetti | Strategy pattern |
-|------|-------------|----------|
-| Adding a website | Modify the Command | Create a new strategy class |
-| Changing the parsing | Hunt through the else-ifs | Change only the matching class |
-| Testing | Start the whole crawler | Test the strategy in isolation |
-| Open-closed principle | ❌ Open to modification | ✅ Open for extension, closed for modification |
-
----
-
-### A Concrete Strategy
-
-```java
-public class BlogStrategy implements CrawlStrategy {
-    public boolean supports(String url) {
-        return url.contains("blog.example.com");
-    }
-    public List<Article> parse(String url, Document doc) {
-        List<Article> articles = new ArrayList<>();
-        for (Element e : doc.select(".post-title")) {
-            articles.add(new Article(e.text(), url, ""));
-        }
-        return articles;
-    }
-}
-```
-
-> ✨ One new website, one standalone class; each minds its own business
-
----
-
-## 3️⃣ Parser Factory: the Magic of Automatic Matching
-
-### Who picks the strategy?
-
-- If `CrawlCommand` iterates over all strategies → the Strategy pattern was pointless
-- We need a black box: **drop in a URL, get back the right parser**
-
----
-
-### Enter the Factory
-
-```java
-public class StrategyFactory {
-    private final List<CrawlStrategy> strategies = new ArrayList<>();
-
-    public StrategyFactory() {
-        strategies.add(new BlogStrategy());
-        strategies.add(new NewsStrategy());
-    }
-
-    public CrawlStrategy getStrategy(String url) {
-        for (CrawlStrategy s : strategies) {
-            if (s.supports(url)) return s;
-        }
-        return null;
-    }
-}
-```
-
-> 🔧 Adding a website only takes a new strategy class plus one registration line in the factory
-
----
-
-### A Win for the Open-Closed Principle
-
-- ✅ `CrawlCommand` is not touched at all
-- ✅ Add `XxxStrategy` and one registration line
-- ✅ Every strategy is invoked in exactly the same way
-
-> This is **"open for extension, closed for modification"**
-
----
-
-### CrawlCommand After Refactoring
-
-```java
-public void execute(String[] args, ArticleRepository repository) {
-    String url = args[1];
-    CrawlStrategy strategy = strategyFactory.getStrategy(url);
-    if (strategy == null) {
-        view.printError("No strategy for: " + url);
-        return;
-    }
-    Document doc = Jsoup.connect(url).get();
-    List<Article> parsed = strategy.parse(url, doc);
-    for (Article a : parsed) {
-        repository.add(a);
-    }
-    view.printSuccess("Crawled " + parsed.size() + " articles.");
-}
-```
-
-> 🧠 CrawlCommand now only **"dispatches"**; it does no parsing
-
----
-
-## 4️⃣ Repository: Armoring Data Access
-
-### The Problem with a Shared List
-
-```java
-articles.clear();        // wipe everything
-articles.add(null);      // sneak in a null
-articles.remove(0);      // delete at will
-```
-
-> Order maintained by convention alone will eventually be broken
-
----
-
-### Put a Security Door on the Data
-
-```java
-public class ArticleRepository {
-    private final List<Article> articles = new ArrayList<>();
-
-    public void add(Article article) {
-        if (article == null) throw new IllegalArgumentException(...);
-        articles.add(article);
-    }
-
-    public List<Article> getAll() {
-        return Collections.unmodifiableList(articles);
-    }
-
-    public int size() { return articles.size(); }
-
-    public void clear() { articles.clear(); }
-}
-```
-
----
-
-### Three Lines of Defense
-
-| Mechanism | Effect |
-|------|------|
-| **add rejects null** | The rule lives in code, not in a verbal agreement |
-| **getAll returns an unmodifiable view** | Any modification attempt throws immediately |
-| **Access only through the repository** | Internal structure is encapsulated; only safe methods are exposed |
-
----
-
-### Every Command Signature Changes
-
-```java
-// W9
-public void execute(String[] args, List<Article> articles);
-
-// W10
-public void execute(String[] args, ArticleRepository repository);
-```
-
-> Semantic shift: from "here is the data, do whatever you like" → "here is a safe access channel"
-
----
-
-## 5️⃣ Connecting the Whole Architecture
-
-### The Full Journey of One `crawl` Command
-
-```
-User types "crawl https://blog.example.com"
-    ↓
-ConsoleView parses the input
-    ↓
-Controller routes → CrawlCommand
-    ↓
-StrategyFactory.getStrategy(url) → BlogStrategy
-    ↓
-Jsoup fetches → Document
-    ↓
-BlogStrategy.parse(url, doc) → List<Article>
-    ↓
-Repository.add() stores the result
-    ↓
-ConsoleView prints the success message
-```
-
----
-
-### Architecture Overview
-
-![mvc-strategy-repo](/api/v1/attachments/8 "width=70% center")
-
-```mermaid
-flowchart TD
-    User(["👤 User input<br/>crawl https://blog.example.com"]) --> View
-
-    subgraph View["🎨 View layer (ConsoleView)"]
-        ReadLine["readLine()"]
-        Display["display() / printSuccess()"]
-    end
-
-    ReadLine --> Controller
-
-    subgraph Controller["🧭 Controller layer"]
-        Router["CrawlerController<br/>Map routing"]
-    end
-
-    Router --> Command
-
-    subgraph Command["⚡ Command layer"]
-        CrawlCmd["CrawlCommand<br/>(dispatcher)"]
-    end
-
-    CrawlCmd --> Factory
-
-    subgraph Strategy["🧩 Strategy layer"]
-        Factory["StrategyFactory<br/>(automatic matching)"]
-        StrategyI["<<interface>> CrawlStrategy"]
-        BlogS["BlogStrategy"]
-        NewsS["NewsStrategy"]
-        Factory --> StrategyI --> BlogS
-        StrategyI --> NewsS
-    end
-
-    BlogS --> Repository
-
-    subgraph Repository["🔐 Repository layer"]
-        Repo["ArticleRepository<br/>(add / getAll)"]
-        RepoList["List<Article><br/>(private)"]
-        Repo --> RepoList
-    end
-
-    RepoList --> Model
-
-    subgraph Model["📦 Model layer"]
-        Article["Article"]
-    end
-
-    CrawlCmd --> Display
-    Repository --> Display
-```
-
-> 🗺️ Every layer has a clear responsibility, and every extension only adds code instead of modifying it
-
----
-
-## 6️⃣ Implementation (Step-by-Step Upgrade)
-
-### Change List for Upgrading from W9 to W10
-
-1. Create the `strategy/` package → `CrawlStrategy` interface
-2. Implement `BlogStrategy` and `NewsStrategy`
-3. Implement `StrategyFactory`
-4. Create the `repository/` package → `ArticleRepository`
-5. Change the `Command` interface signature
-6. Rewrite `CrawlCommand`
-7. Adapt every other `Command`
-8. Adapt `Controller` and `App.java`
-
----
-
-### Key Code Walkthrough
-
-- How `Collections.unmodifiableList()` is used
-- The iteration logic in `StrategyFactory.getStrategy()`
-- `CrawlCommand` going from "hard-coded parsing" to "dispatch and assembly"
-
-```java
-// One example change
-for (Article a : parsed) {
-    repository.add(a);  // old: articles.add(a);
-}
-```
-
----
-
-### Spot the Flaw
-
-- `StrategyFactory` returns `null` when no strategy matches
-- `CrawlCommand` checks for `null` and reports an error
-- Is there a more elegant way to avoid the `null` check?
-
-> 🔍 A prelude to exploring the "Null Object pattern" with AI after class
-
----
-
-## 7️⃣ Architecture Retrospective + Next Week's Preview
-
-### Weak Points of the Current Architecture
-
-- ❌ Exception handling is one-size-fits-all
-- ❌ No retry mechanism
-- ❌ No control over network timeouts
-- ❌ Logs go only to the terminal
-
----
-
-### W11 Goal: Robustness Engineering
-
-- ✅ **Custom exception hierarchy**: turn "something went wrong" into specific business exceptions
-- ✅ **Engineering-grade logging**: record who did what, and when
-- ✅ **Defensive programming + retry mechanism**: network jitter is no longer fatal
-
-> W9 builds the skeleton → W10 adds the armor → W11 makes it able to take a beating
-
----
-
-## 8️⃣ Hands-On Tasks (In Class)
-
-### Required
-
-1. Upgrade the W9 project to W10
-2. Implement at least 2 CrawlStrategy classes (mocks are fine)
-3. Implement `StrategyFactory` and `ArticleRepository`
-4. Test the full `crawl` → `list` flow
-
-### Acceptance Criteria
-
-- [ ] Adding a strategy only adds a class plus a registration, with zero changes to old code
-- [ ] `getAll()` returns an unmodifiable view
-- [ ] `CrawlCommand` contains no site-specific parsing
-- [ ] Every Command uses the Repository
-- [ ] Nothing operates directly on `List<Article>
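`
-
----
-
-## 9️⃣ Homework
-
-### Required
-
-1. Round out `ArticleRepository`: add `addAll` and guard against null
-2. **★ AnalyzeCommand**: reuse strategy parsing without storing anything, and print statistics
-3. **AI architecture audit**: send the class signatures to an AI and check strategy decoupling and encapsulation
-
-### Optional
-
-- Regex-based strategy matching, a default strategy, strategy priority
-- Food for thought: what happens when two strategies both `supports` the same URL?
-
----
-
-## 🤖 AI-Assisted Upgrade
-
-### Architecture Auditor (Required)
-
-- Draw the class dependency diagram
-- Send it to an AI: "Check how well the open-closed principle is met, how complete the Repository encapsulation is, and whether any circular dependencies exist"
-
-### Going Further
-
-- What is the difference between skipping the factory and storing strategies directly in a `Map`, versus using `StrategyFactory`?
-
----
-
-## 📚 Summary
-
-- ✅ Strategy pattern: pluggable algorithms; adding a website is painless
-- ✅ Factory: automatic matching, the URL → strategy magic
-- ✅ Repository: the data guard; rules go from verbal agreement to code-enforced
-- ✅ Architecture: from "taken apart" to "elegantly put together"; open for extension, closed for modification
-
-### W11 Preview
-
-Custom exception hierarchy + logging + retry mechanism
-
-> 🚀 Let the crawler we built stand up to the real world
-
----
-
-## Thank you!
-
-**Keep your engineering standards high; see you next week!**
-
----
-
-# Centered Title
-
-## Centered Subtitle
-
-### Centered Content
-
----
\ No newline at end of file
diff --git a/w11/java-cli-w11/pom.xml b/w11/java-cli-w11/pom.xml
deleted file mode 100644
index 9987b1c..0000000
--- a/w11/java-cli-w11/pom.xml
+++ /dev/null
@@ -1,62 +0,0 @@
-<project>
-  <modelVersion>4.0.0</modelVersion>
-  <groupId>com.example</groupId>
-  <artifactId>datacollect-cli</artifactId>
-  <version>0.1.0</version>
-  <properties>
-    <maven.compiler.source>11</maven.compiler.source>
-    <maven.compiler.target>11</maven.compiler.target>
-  </properties>
-  <dependencies>
-    <dependency>
-      <groupId>org.jsoup</groupId>
-      <artifactId>jsoup</artifactId>
-      <version>1.17.2</version>
-    </dependency>
-    <dependency>
-      <groupId>org.slf4j</groupId>
-      <artifactId>slf4j-api</artifactId>
-      <version>2.0.9</version>
-    </dependency>
-    <dependency>
-      <groupId>ch.qos.logback</groupId>
-      <artifactId>logback-classic</artifactId>
-      <version>1.4.14</version>
-    </dependency>
-  </dependencies>
-  <build>
-    <plugins>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-compiler-plugin</artifactId>
-        <version>3.8.1</version>
-      </plugin>
-      <plugin>
-        <groupId>org.apache.maven.plugins</groupId>
-        <artifactId>maven-assembly-plugin</artifactId>
-        <version>3.3.0</version>
-        <configuration>
-          <archive>
-            <manifest>
-              <mainClass>com.example.datacollect.Main</mainClass>
-            </manifest>
-          </archive>
-          <descriptorRefs>
-            <descriptorRef>jar-with-dependencies</descriptorRef>
-          </descriptorRefs>
-        </configuration>
-        <executions>
-          <execution>
-            <id>make-assembly</id>
-            <phase>package</phase>
-            <goals>
-              <goal>single</goal>
-            </goals>
-          </execution>
-        </executions>
-      </plugin>
-    </plugins>
-  </build>
-</project>
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/Main.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/Main.java
deleted file mode 100644
index ea9d151..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/Main.java
+++ /dev/null
@@ -1,41 +0,0 @@
-package com.example.datacollect;
-
-import com.example.datacollect.controller.CrawlerController;
-import com.example.datacollect.repository.ArticleRepository;
-import com.example.datacollect.strategy.StrategyFactory;
-import com.example.datacollect.view.ConsoleView;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-/*- Add a logger member
-- Log application startup
-- Add global exception handling */
-public class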
Main {
-    private static final Logger logger = LoggerFactory.getLogger(Main.class);
-
-    public static void main(String[] args) {
-        try {
-            logger.info("Starting CLI Crawler application");
-
-            ConsoleView view = new ConsoleView();
-            ArticleRepository repository = new ArticleRepository();
-            StrategyFactory strategyFactory = new StrategyFactory();
-            CrawlerController controller = new CrawlerController(view, repository, strategyFactory);
-
-            view.printSuccess("Welcome to CLI Crawler (w10_3)! Type help for commands.");
-            logger.info("Application initialized successfully");
-
-            while (true) {
-                try {
-                    controller.handle(view.readLine());
-                } catch (Exception e) {
-                    view.printError("Error: " + e.getMessage());
-                    logger.error("Error in main loop: {}", e.getMessage(), e);
-                }
-            }
-        } catch (Exception e) {
-            logger.error("Fatal error in application: {}", e.getMessage(), e);
-            System.err.println("Fatal error: " + e.getMessage());
-            System.exit(1);
-        }
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/AnalyzeCommand.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/command/AnalyzeCommand.java
deleted file mode 100644
index ec9bcc3..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/AnalyzeCommand.java
+++ /dev/null
@@ -1,103 +0,0 @@
-package com.example.datacollect.command;
-
-import com.example.datacollect.exception.NetworkException;
-import com.example.datacollect.exception.ParseException;
-import com.example.datacollect.model.Article;
-import com.example.datacollect.repository.ArticleRepository;
-import com.example.datacollect.strategy.CrawlStrategy;
-import com.example.datacollect.strategy.StrategyFactory;
-import com.example.datacollect.util.RetryUtils;
-import com.example.datacollect.view.ConsoleView;
-import org.jsoup.Jsoup;
-import org.jsoup.nodes.Document;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.util.List;
-import java.util.concurrent.Callable;
-
-public class AnalyzeCommand implements Command {
-    private static final Logger logger = LoggerFactory.getLogger(AnalyzeCommand.class);
-    private final ConsoleView view;
-    private final StrategyFactory strategyFactory;
-
-    public AnalyzeCommand(ConsoleView view, StrategyFactory strategyFactory) {
-        this.view = view;
-        this.strategyFactory = strategyFactory;
-    }
-
-    @Override
-    public String getName() {
-        return "analyze";
-    }
-
-    @Override
-    public void execute(String[] args, ArticleRepository repository) {
-        if (args.length < 2) {
-            view.printError("Usage: analyze <url>");
-            logger.warn("Invalid command: missing URL argument");
-            return;
-        }
-        String url = args[1];
-        logger.info("Analyze command executed for URL: {}", url);
-
-        try {
-            CrawlStrategy strategy = strategyFactory.getStrategy(url);
-            if (strategy == null) {
-                view.printError("No strategy found for: " + url);
-                logger.error("No strategy found for URL: {}", url);
-                return;
-            }
-
-            Callable<Document> fetchTask = () -> {
-                logger.debug("Fetching document from: {}", url);
-                try {
-                    return Jsoup.connect(url)
-                            .userAgent("Mozilla/5.0")
-                            .timeout(5000)
-                            .get();
-                } catch (IOException e) {
-                    throw new NetworkException("Failed to connect to " + url + ": " + e.getMessage(), e);
-                }
-            };
-
-            Document doc = RetryUtils.executeWithRetry(fetchTask);
-            logger.info("Successfully fetched document from: {}", url);
-
-            List<Article>
 articles = strategy.parse(url, doc);
-            logger.info("Parsed {} articles for analysis", articles.size());
-
-            int total = articles.size();
-            int totalTitleLen = 0;
-            int totalContentLen = 0;
-
-            for (Article a : articles) {
-                totalTitleLen += a.getTitle() == null ? 0 : a.getTitle().length();
-                totalContentLen += a.getContent() == null ? 0 : a.getContent().length();
-            }
-
-            view.printInfo("===== Analysis Statistics =====");
-            view.printInfo("Total articles: " + total);
-            view.printInfo("Total title length: " + totalTitleLen);
-            view.printInfo("Total content length: " + totalContentLen);
-            if (total > 0) {
-                view.printInfo("Average title length: " + (totalTitleLen / total));
-                view.printInfo("Average content length: " + (totalContentLen / total));
-            }
-            view.printInfo("======================");
-            view.printSuccess("Analysis complete (data not saved)");
-
-            logger.info("Analysis completed: {} articles analyzed", total);
-        } catch (NetworkException e) {
-            view.printError("Network error: " + e.getMessage());
-            logger.error("Network error while analyzing {}: {}", url, e.getMessage(), e);
-        } catch (ParseException e) {
-            view.printError("Parse error: " + e.getMessage());
-            logger.error("Parse error while analyzing {}: {}", url, e.getMessage(), e);
-        } catch (Exception e) {
-            view.printError("Analysis failed: " + e.getMessage());
-            logger.error("Unexpected error while analyzing {}: {}", url, e.getMessage(), e);
-        }
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/Command.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/command/Command.java
deleted file mode 100644
index 029cadc..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/Command.java
+++ /dev/null
@@ -1,8 +0,0 @@
-package com.example.datacollect.command;
-
-import com.example.datacollect.repository.ArticleRepository;
-
-public interface Command {
-    String getName();
-    void execute(String[] args, ArticleRepository repository);
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/CrawlCommand.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/command/CrawlCommand.java
deleted file mode 100644
index dd63594..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/CrawlCommand.java
+++ /dev/null
@@ -1,87 +0,0 @@
-package com.example.datacollect.command;
-
-import com.example.datacollect.exception.NetworkException;
-import com.example.datacollect.exception.ParseException;
-import com.example.datacollect.repository.ArticleRepository;
-import com.example.datacollect.strategy.CrawlStrategy;
-import com.example.datacollect.strategy.StrategyFactory;
-import com.example.datacollect.util.RetryUtils;
-import com.example.datacollect.view.ConsoleView;
-import org.jsoup.Jsoup;
-import org.jsoup.nodes.Document;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.util.concurrent.Callable;
-
-public class CrawlCommand implements Command {
-    private static final Logger logger = LoggerFactory.getLogger(CrawlCommand.class);
-    private final ConsoleView view;
-    private final StrategyFactory strategyFactory;
-
-    public CrawlCommand(ConsoleView view, StrategyFactory strategyFactory) {
-        this.view = view;
-        this.strategyFactory = strategyFactory;
-    }
-
-    @Override
-    public String getName() {
-        return "crawl";
-    }
-
-    @Override
-    public void execute(String[] args, ArticleRepository repository) {
-        if (args.length < 2) {
-            view.printError("Usage: crawl <url>");
-            logger.warn("Invalid command: missing URL argument");
-            return;
-        }
-        String url = args[1];
-        logger.info("Crawl started for: {}", url);
-
-        CrawlStrategy strategy = strategyFactory.getStrategy(url);
-        if (strategy == null) {
-            view.printError("No strategy found for: " + url);
-            logger.error("No strategy found for URL: {}", url);
-            return;
-        }
-
-        try {
-            view.printInfo("Crawling: " + url);
-
-            Callable<Document> fetchTask = () -> {
-                logger.debug("Fetching document from: {}", url);
-                try {
-                    return Jsoup.connect(url)
-                            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
-                            .timeout(10000)
-                            .get();
-                } catch (IOException e) {
-                    throw new NetworkException("Failed to connect to " + url + ": " + e.getMessage(), e);
-                }
-            };
-
-            Document doc = RetryUtils.executeWithRetry(fetchTask);
-            logger.info("Successfully fetched document from: {}", url);
-
-            var articles = strategy.parse(url, doc);
-            logger.info("Parsed {} articles", articles.size());
-
-            repository.addAll(articles);
-            logger.info("Successfully added {} articles to repository", articles.size());
-
-            view.printSuccess("Crawled " + articles.size() + " articles.");
-            logger.info("Successfully crawled {} articles from {}", articles.size(), url);
-        } catch (NetworkException e) {
-            view.printError("Network error: " + e.getMessage());
-            logger.error("Network error while crawling {}: {}", url, e.getMessage(), e);
-        } catch (ParseException e) {
-            view.printError("Parse error: " + e.getMessage());
-            logger.error("Parse error while crawling {}: {}", url, e.getMessage(), e);
-        } catch (Exception e) {
-            view.printError("Failed to crawl: " + e.getMessage());
-            logger.error("Unexpected error while crawling {}: {}", url, e.getMessage(), e);
-        }
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/ExitCommand.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/command/ExitCommand.java
deleted file mode 100644
index 0f1d7fd..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/ExitCommand.java
+++ /dev/null
@@ -1,27 +0,0 @@
-package com.example.datacollect.command;
-
-import com.example.datacollect.repository.ArticleRepository;
-import com.example.datacollect.view.ConsoleView;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-public class ExitCommand implements Command {
-    private static final Logger logger = LoggerFactory.getLogger(ExitCommand.class);
-    private final ConsoleView view;
-
-    public ExitCommand(ConsoleView view) {
-        this.view = view;
-    }
-
-    @Override
-    public String
 getName() {
-        return "exit";
-    }
-
-    @Override
-    public void execute(String[] args, ArticleRepository repository) {
-        logger.info("Exit command executed, shutting down");
-        view.printSuccess("Bye!");
-        System.exit(0);/* exit the program */
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/HelpCommand.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/command/HelpCommand.java
deleted file mode 100644
index 2087695..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/HelpCommand.java
+++ /dev/null
@@ -1,26 +0,0 @@
-package com.example.datacollect.command;
-
-import com.example.datacollect.repository.ArticleRepository;
-import com.example.datacollect.view.ConsoleView;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-public class HelpCommand implements Command {
-    private static final Logger logger = LoggerFactory.getLogger(HelpCommand.class);
-    private final ConsoleView view;
-
-    public HelpCommand(ConsoleView view) {
-        this.view = view;
-    }
-
-    @Override
-    public String getName() {
-        return "help";
-    }
-
-    @Override
-    public void execute(String[] args, ArticleRepository repository) {
-        logger.info("Help command executed");
-        view.printInfo("Commands: crawl <url>, list, help, exit, analyze");
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/ListCommand.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/command/ListCommand.java
deleted file mode 100644
index 9261a3d..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/command/ListCommand.java
+++ /dev/null
@@ -1,26 +0,0 @@
-package com.example.datacollect.command;
-
-import com.example.datacollect.repository.ArticleRepository;
-import com.example.datacollect.view.ConsoleView;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-public class ListCommand implements Command {
-    private static final Logger logger = LoggerFactory.getLogger(ListCommand.class);
-    private final ConsoleView view;
-
-    public ListCommand(ConsoleView view) {
-        this.view = view;
-    }
-
-    @Override
-    public String getName() {
-        return "list";
-    }
-
-    @Override
-    public void execute(String[] args, ArticleRepository repository) {
-        logger.info("List command executed, showing {} articles", repository.size());
-        view.display(repository.getAll());
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/controller/CrawlerController.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/controller/CrawlerController.java
deleted file mode 100644
index 5ef370a..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/controller/CrawlerController.java
+++ /dev/null
@@ -1,64 +0,0 @@
-package com.example.datacollect.controller;
-
-import com.example.datacollect.command.AnalyzeCommand;
-import com.example.datacollect.command.Command;
-import com.example.datacollect.command.CrawlCommand;
-import com.example.datacollect.command.ExitCommand;
-import com.example.datacollect.command.HelpCommand;
-import com.example.datacollect.command.ListCommand;
-import com.example.datacollect.repository.ArticleRepository;
-import com.example.datacollect.strategy.StrategyFactory;
-import com.example.datacollect.view.ConsoleView;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-import java.util.HashMap;
-import java.util.Map;
-
-public class CrawlerController {
-    private static final Logger logger = LoggerFactory.getLogger(CrawlerController.class);
-    private final Map<String, Command> commands = new HashMap<>();
-    private final ConsoleView view;
-    private final ArticleRepository repository;
-
-    public CrawlerController(ConsoleView view, ArticleRepository repository, StrategyFactory strategyFactory) {
-        this.view = view;
-        this.repository = repository;
-        register(new HelpCommand(view));
-        register(new ListCommand(view));
-        register(new CrawlCommand(view, strategyFactory));
-        register(new ExitCommand(view));
-        register(new AnalyzeCommand(view, strategyFactory));
-        logger.info("CrawlerController initialized with {} commands", commands.size());
-    }
-
-    private void register(Command command) {
-        commands.put(command.getName(), command);
-        logger.debug("Registered command: {}", command.getName());
-    }
-
-    public void handle(String input) {/* handle user input */
-        String text = input == null ? "" : input.trim();/* handle empty input */
-        if (text.isEmpty()) {
-            return;
-        }
-
-        String[] args = text.split("\\s+");/* parse command-line arguments */
-        String cmdName = args[0].toLowerCase();/* extract the command name and lowercase it */
-
-        logger.debug("Processing command: {}", cmdName);
-
-        Command command = commands.get(cmdName);/* look up the command object */
-        if (command == null) {
-            view.printError("Unknown command: " + cmdName);
-            logger.warn("Unknown command attempted: {}", cmdName);
-            return;
-        }
-
-        try {
-            command.execute(args, repository);/* execute the command */
-        } catch (Exception e) {
-            view.printError("Command execution failed: " + e.getMessage());
-            logger.error("Error executing command {}: {}", cmdName, e.getMessage(), e);
-        }
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/CrawlerException.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/CrawlerException.java
deleted file mode 100644
index 230adb3..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/CrawlerException.java
+++ /dev/null
@@ -1,10 +0,0 @@
-package com.example.datacollect.exception;
-
-public class CrawlerException extends Exception {
-    public CrawlerException(String message) {
-        super(message);
-    }
-    public CrawlerException(String message, Throwable cause) {
-        super(message, cause);
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/NetworkException.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/NetworkException.java
deleted file mode 100644
index 3a24c92..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/NetworkException.java
+++ /dev/null
@@ -1,10 +0,0 @@
-package com.example.datacollect.exception;
-
-public class NetworkException extends CrawlerException {
-    public NetworkException(String message) {
-        super(message);
-    }
-    public NetworkException(String message, Throwable cause) {
-        super(message, cause);
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/ParseException.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/ParseException.java
deleted file mode 100644
index 09f9f20..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/exception/ParseException.java
+++ /dev/null
@@ -1,10 +0,0 @@
-package com.example.datacollect.exception;
-
-public class ParseException extends CrawlerException {
-    public ParseException(String message) {
-        super(message);
-    }
-    public ParseException(String message, Throwable cause) {
-        super(message, cause);
-    }
-}
diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/model/Article.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/model/Article.java
deleted file mode 100644
index 53b138b..0000000
--- a/w11/java-cli-w11/src/main/java/com/example/datacollect/model/Article.java
+++ /dev/null
@@ -1,72 +0,0 @@
-package com.example.datacollect.model;
-/*- Article model class
-- Add field validation
-- Add a toString() method (already present)
-- Consider adding equals() and hashCode() */
-public class Article {
-    private String title;
-    private String url;
-    private String content;
-
-    public Article(String title, String url, String content) {
-        setTitle(title);
-        setUrl(url);
-        setContent(content);
-    }
-
-    public String getTitle() {
-        return title;
-    }
-
-    public void setTitle(String title) {
-        if (title == null) {
-            throw new IllegalArgumentException("Title cannot be null");
-        }
-        if (title.trim().isEmpty()) {
-            throw new IllegalArgumentException("Title cannot be empty");
-        }
-        if (title.length() > 500) {
-            throw new IllegalArgumentException("Title cannot exceed 500 characters");
-        }
-        this.title = title.trim();
-    }
-
-    public String getUrl() {
-        return url;
-    }
-
public void setUrl(String url) { - if (url == null) { - throw new IllegalArgumentException("URL cannot be null"); - } - if (url.trim().isEmpty()) { - throw new IllegalArgumentException("URL cannot be empty"); - } - if (!url.startsWith("http://") && !url.startsWith("https://")) { - throw new IllegalArgumentException("URL must start with http:// or https://"); - } - this.url = url.trim(); - } - - public String getContent() { - return content; - } - - public void setContent(String content) { - if (content == null) { - this.content = ""; - } else if (content.length() > 10000) { - this.content = content.substring(0, 10000);/* 截断内容到 10000 个字符 */ - } else { - this.content = content; - } - } - - @Override - public String toString() { - return "Article{" - + "title='" + title + '\'' - + ", url='" + url + '\'' - + '}'; - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/repository/ArticleRepository.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/repository/ArticleRepository.java deleted file mode 100644 index 8994efa..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/repository/ArticleRepository.java +++ /dev/null @@ -1,113 +0,0 @@ -package com.example.datacollect.repository; - -import com.example.datacollect.model.Article; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; -import java.util.ArrayList; -import java.util.Collections; -import java.util.HashSet; -import java.util.List; -import java.util.Set; -/* 文章仓库 -- 添加 logger 成员 -- 增强 add() 方法的防御检查 -- 增强 addALL() 方法的防御检查 -- 添加空值检查、重复检查、长度验证 -- 记录操作日志*/ -public class ArticleRepository { - private static final Logger logger = LoggerFactory.getLogger(ArticleRepository.class); - private static final int MAX_TITLE_LENGTH = 500;/* 最大标题长度 */ - private static final int MAX_CONTENT_LENGTH = 10000;/* 最大内容长度 */ - - private final List
articles = new ArrayList<>(); - private final Set urlSet = new HashSet<>(); - - public void add(Article article) { - if (article == null) { - logger.error("Attempted to add null article"); - throw new IllegalArgumentException("Article cannot be null"); - } - - String title = article.getTitle(); - String url = article.getUrl(); - String content = article.getContent(); - - if (title == null || title.trim().isEmpty()) { - logger.warn("Attempted to add article with empty title"); - throw new IllegalArgumentException("Article title cannot be null or empty"); - } - - if (url == null || url.trim().isEmpty()) { - logger.warn("Attempted to add article with empty URL"); - throw new IllegalArgumentException("Article URL cannot be null or empty"); - } - - if (title.length() > MAX_TITLE_LENGTH) { - logger.warn("Article title too long: {} characters (max: {})", title.length(), MAX_TITLE_LENGTH); - throw new IllegalArgumentException("Article title exceeds maximum length of " + MAX_TITLE_LENGTH); - } - - if (content != null && content.length() > MAX_CONTENT_LENGTH) { - logger.warn("Article content too long: {} characters (max: {})", content.length(), MAX_CONTENT_LENGTH); - content = content.substring(0, MAX_CONTENT_LENGTH); - } - - if (!url.startsWith("http://") && !url.startsWith("https://")) { - logger.warn("Invalid URL format: {}", url); - throw new IllegalArgumentException("Article URL must start with http:// or https://"); - } - - if (urlSet.contains(url)) { - logger.warn("Duplicate article URL detected: {}", url); - return;/* 跳过重复文章 */ - } - - Article validatedArticle = new Article(title.trim(), url.trim(), content != null ? content.trim() : "");/* 创建验证后的文章 */ - articles.add(validatedArticle);/* 添加文章到列表 */ - urlSet.add(url);/* 添加URL到集合 */ - logger.debug("Added article: {}", title);/* 记录添加日志 */ - } - - public void addAll(List
articleList) { - if (articleList == null) { - logger.error("Attempted to add null article list"); - throw new IllegalArgumentException("Article list cannot be null"); - } - - int successCount = 0;/* 成功添加的文章数量 */ - int skipCount = 0;/* 跳过的无效文章数量 */ - - for (Article article : articleList) { - if (article != null) { - try { - add(article); - successCount++; - } catch (IllegalArgumentException e) { - logger.warn("Skipped invalid article: {}", e.getMessage()); - skipCount++; - } - } else { - logger.warn("Skipped null article in list"); - skipCount++; - } - } - - logger.info("Added {} articles, skipped {} invalid articles", successCount, skipCount); - } - - public List
getAll() { - logger.debug("Retrieving all articles, total: {}", articles.size()); - return Collections.unmodifiableList(articles);/* return an unmodifiable list */ - } - - public int size() { - return articles.size();/* return the number of articles */ - } - - public void clear() { - int count = articles.size();/* remember the current number of articles */ - articles.clear(); - urlSet.clear(); - logger.info("Cleared repository, removed {} articles", count); - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/BlogStrategy.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/BlogStrategy.java deleted file mode 100644 index 1e23b2b..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/BlogStrategy.java +++ /dev/null @@ -1,25 +0,0 @@ -package com.example.datacollect.strategy; - -import com.example.datacollect.model.Article; -import org.jsoup.nodes.Document; -import org.jsoup.nodes.Element; -import org.jsoup.select.Elements; -import java.util.ArrayList; -import java.util.List; - -public class BlogStrategy implements CrawlStrategy { - @Override - public boolean supports(String url) { - return url.contains("blog.example.com"); - } - - @Override - public List<Article>
parse(String url, Document doc) { - List<Article>
articles = new ArrayList<>(); - Elements titles = doc.select(".post-title"); - for (Element e : titles) { - articles.add(new Article(e.text(), url, "")); - } - return articles; - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/CrawlStrategy.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/CrawlStrategy.java deleted file mode 100644 index ed69e19..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/CrawlStrategy.java +++ /dev/null @@ -1,11 +0,0 @@ -package com.example.datacollect.strategy; - -import com.example.datacollect.exception.ParseException; -import com.example.datacollect.model.Article; -import org.jsoup.nodes.Document; -import java.util.List; - -public interface CrawlStrategy { - List<Article>
parse(String url, Document doc) throws ParseException; - boolean supports(String url); -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/HnuNewsStrategy.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/HnuNewsStrategy.java deleted file mode 100644 index 6892510..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/HnuNewsStrategy.java +++ /dev/null @@ -1,77 +0,0 @@ -package com.example.datacollect.strategy; - -import com.example.datacollect.exception.ParseException; -import com.example.datacollect.model.Article; -import org.jsoup.nodes.Document; -import org.jsoup.nodes.Element; -import org.jsoup.select.Elements; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; -import java.util.ArrayList; -import java.util.List; - -/* HNU News strategy -- adds a logger member -- adds exception handling -- applies defensive programming */ -public class HnuNewsStrategy implements CrawlStrategy { - private static final Logger logger = LoggerFactory.getLogger(HnuNewsStrategy.class); - - @Override - public boolean supports(String url) { - return url.contains("news.hnu.edu.cn");/* supports the HNU News website */ - } - - @Override - public List<Article>
parse(String url, Document doc) throws ParseException { - logger.info("Starting to parse HNU News: {}", url); - List<Article>
articles = new ArrayList<>();/* holds the parsed articles */ - - try { - Elements listItems = doc.select("ul.list11 li");/* select the article list items */ - logger.debug("Found {} list items", listItems.size());/* log how many list items were found */ - - for (Element li : listItems) { - try { - Element link = li.selectFirst("a");/* select the link inside the list item */ - if (link == null) { - logger.warn("No link found in list item");/* log the missing link */ - continue; - } - - String articleUrl = link.attr("href");/* read the link's href attribute */ - if (!articleUrl.startsWith("http")) { - articleUrl = "https://news.hnu.edu.cn" + articleUrl.replace("..", "");/* complete the relative path */ - } - - String title = "";/* holds the article title */ - Element titleEl = link.selectFirst("h4.l2.h4s2");/* select the title element */ - if (titleEl != null) { - title = titleEl.text().trim();/* extract the title text and trim surrounding whitespace */ - } - - String content = "";/* holds the article content */ - Element contentEl = link.selectFirst("p.l3.ps3");/* select the content element */ - if (contentEl != null) { - content = contentEl.text().trim();/* extract the content text and trim surrounding whitespace */ - } - - if (!title.isEmpty()) { - Article article = new Article(title, articleUrl, content);/* create the article object */ - articles.add(article);/* add the article to the list */ - } else { - logger.warn("Empty title found, skipping article"); - } - } catch (Exception e) { - logger.error("Error parsing individual article: {}", e.getMessage()); - } - } - - logger.info("Successfully parsed {} articles from HNU News", articles.size()); - return articles; - } catch (Exception e) { - logger.error("Failed to parse HNU News page: {}", e.getMessage(), e); - throw new ParseException("Failed to parse HNU News: " + e.getMessage(), e); - } - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/NewsStrategy.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/NewsStrategy.java deleted file mode 100644 index f6eb4bd..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/NewsStrategy.java +++ /dev/null @@ -1,25 +0,0 @@ -package com.example.datacollect.strategy; - -import com.example.datacollect.model.Article; -import 
org.jsoup.nodes.Document; -import org.jsoup.nodes.Element; -import org.jsoup.select.Elements; -import java.util.ArrayList; -import java.util.List; - -public class NewsStrategy implements CrawlStrategy { - @Override - public boolean supports(String url) { - return url.contains("news.example.com"); - } - - @Override - public List<Article>
parse(String url, Document doc) { - List<Article>
articles = new ArrayList<>(); - Elements items = doc.select(".article-headline"); - for (Element e : items) { - articles.add(new Article(e.text(), url, "")); - } - return articles; - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/PeopleStrategy.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/PeopleStrategy.java deleted file mode 100644 index eb25935..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/PeopleStrategy.java +++ /dev/null @@ -1,83 +0,0 @@ -package com.example.datacollect.strategy; - -import com.example.datacollect.exception.ParseException; -import com.example.datacollect.model.Article; -import org.jsoup.nodes.Document; -import org.jsoup.nodes.Element; -import org.jsoup.select.Elements; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; -import java.util.ArrayList; -import java.util.List; -/* Strategy class for People's Daily Online (people.com.cn) */ -public class PeopleStrategy implements CrawlStrategy { - private static final Logger logger = LoggerFactory.getLogger(PeopleStrategy.class); - - @Override - public boolean supports(String url) { - return url.contains("people.com.cn");/* check whether the URL contains people.com.cn */ - } - - @Override - public List<Article>
parse(String url, Document doc) throws ParseException { - logger.info("Starting to parse People's Daily News: {}", url); - List<Article>
articles = new ArrayList<>();/* initialize the article list */ - - try { - Elements newsItems = doc.select("div.w1000, div.news-item, li.list_item");/* select the news containers */ - logger.debug("Found {} news containers", newsItems.size()); - - if (newsItems.isEmpty()) { - newsItems = doc.select("a[href*='/n1/']");/* fall back to an alternative selector */ - logger.debug("Trying alternative selector, found {} items", newsItems.size()); - } - - for (Element item : newsItems) { - try { - Element link = item.selectFirst("a");/* select the link element */ - if (link == null) { - link = item.tagName().equals("a") ? item : null;/* check whether the item itself is a link */ - } - - if (link == null) { - logger.warn("No link found in news item"); - continue; - } - - String articleUrl = link.attr("href");/* read the link URL */ - if (!articleUrl.startsWith("http")) {/* check whether the URL is absolute */ - if (articleUrl.startsWith("/")) { - articleUrl = "https://www.people.com.cn" + articleUrl; - } else { - articleUrl = "https://www.people.com.cn/" + articleUrl; - } - } - - String title = link.text().trim();/* read the title text */ - - String content = "";/* initialize the content text */ - Element contentEl = item.selectFirst("p, div.ed, div.summary");/* select the content element */ - if (contentEl != null) { - content = contentEl.text().trim();/* read the content text */ - } - - if (!title.isEmpty() && title.length() > 5) { - Article article = new Article(title, articleUrl, content);/* create the article object */ - articles.add(article);/* add the article to the list */ - logger.debug("Parsed article: {}", title);/* log the parsed article */ - } else { - logger.warn("Invalid title found, skipping article");/* log the invalid title */ - } - } catch (Exception e) { - logger.error("Error parsing individual article: {}", e.getMessage()); - } - } - - logger.info("Successfully parsed {} articles from People's Daily News", articles.size()); - return articles; - } catch (Exception e) { - logger.error("Failed to parse People's Daily News page: {}", e.getMessage(), e); - throw new ParseException("Failed to parse People's Daily News: " + e.getMessage(), e); - } - } -} diff --git 
a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/StrategyFactory.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/StrategyFactory.java deleted file mode 100644 index e28aaac..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/StrategyFactory.java +++ /dev/null @@ -1,36 +0,0 @@ -package com.example.datacollect.strategy; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; -import java.util.ArrayList; -import java.util.List; - -public class StrategyFactory { - private static final Logger logger = LoggerFactory.getLogger(StrategyFactory.class); - private final List<CrawlStrategy> strategies = new ArrayList<>(); - - public StrategyFactory() { - strategies.add(new HnuNewsStrategy()); - strategies.add(new YouthStrategy()); - strategies.add(new PeopleStrategy()); - strategies.add(new BlogStrategy()); - strategies.add(new NewsStrategy()); - logger.info("Initialized StrategyFactory with {} strategies", strategies.size()); - } - - public CrawlStrategy getStrategy(String url) { - for (CrawlStrategy s : strategies) { - if (s.supports(url)) { - logger.debug("Found strategy {} for URL: {}", s.getClass().getSimpleName(), url); - return s; - } - } - logger.warn("No strategy found for URL: {}", url); - return null; - } - - public void register(CrawlStrategy strategy) { - strategies.add(strategy); - logger.info("Registered new strategy: {}", strategy.getClass().getSimpleName()); - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/YouthStrategy.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/YouthStrategy.java deleted file mode 100644 index 2bdb8d1..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/strategy/YouthStrategy.java +++ /dev/null @@ -1,87 +0,0 @@ -package com.example.datacollect.strategy; - -import com.example.datacollect.exception.ParseException; -import com.example.datacollect.model.Article; -import org.jsoup.nodes.Document; -import 
org.jsoup.nodes.Element; -import org.jsoup.select.Elements; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; -import java.util.ArrayList; -import java.util.List; -/* News parsing strategy for Youth.cn */ -public class YouthStrategy implements CrawlStrategy { - private static final Logger logger = LoggerFactory.getLogger(YouthStrategy.class); - - @Override - public boolean supports(String url) { - return url.contains("youth.cn");/* check whether the URL contains the Youth.cn domain */ - } - - @Override - public List<Article>
parse(String url, Document doc) throws ParseException { - logger.info("Starting to parse Youth News: {}", url); - List<Article>
articles = new ArrayList<>(); - - try { - Elements newsItems = doc.select("div.news-item, div.article-item, li.news-list-item");/* select the news item elements */ - logger.debug("Found {} news items", newsItems.size()); - - if (newsItems.isEmpty()) { - newsItems = doc.select("a[href*='/n1/']");/* fall back to an alternative selector */ - logger.debug("Trying alternative selector, found {} items", newsItems.size()); - } - - for (Element item : newsItems) { - try { - Element link = item.selectFirst("a");/* select the link element */ - if (link == null) { - link = item.tagName().equals("a") ? item : null;/* check whether the item itself is a link */ - } - - if (link == null) { - logger.warn("No link found in news item"); - continue; - } - - String articleUrl = link.attr("href");/* read the link URL */ - - if (!articleUrl.startsWith("http")) {/* check whether the URL is absolute */ - if (articleUrl.startsWith("/")) { - articleUrl = "https://www.youth.cn" + articleUrl; - } else { - articleUrl = "https://www.youth.cn/" + articleUrl; - } - } - - String title = link.text().trim();/* read the link text */ - if (title.isEmpty()) {/* check whether the title is empty */ - continue; - } - - String content = "";/* initialize the content as an empty string */ - Element contentEl = item.selectFirst("p.summary, p.desc, div.brief");/* select the summary element */ - if (contentEl != null) { - content = contentEl.text().trim();/* read the summary text */ - } - - if (!title.isEmpty() && title.length() > 5) { - Article article = new Article(title, articleUrl, content); - articles.add(article); - logger.debug("Parsed article: {}", title); - } else { - logger.warn("Invalid title found, skipping article"); - } - } catch (Exception e) { - logger.error("Error parsing individual article: {}", e.getMessage()); - } - } - - logger.info("Successfully parsed {} articles from Youth News", articles.size()); - return articles; - } catch (Exception e) { - logger.error("Failed to parse Youth News page: {}", e.getMessage(), e); - throw new ParseException("Failed to parse Youth News: " + e.getMessage(), e); - } - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/util/RetryUtils.java 
b/w11/java-cli-w11/src/main/java/com/example/datacollect/util/RetryUtils.java deleted file mode 100644 index 96aee20..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/util/RetryUtils.java +++ /dev/null @@ -1,49 +0,0 @@ -package com.example.datacollect.util; - -import com.example.datacollect.exception.NetworkException; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; -import java.util.concurrent.Callable; - -public class RetryUtils { - private static final Logger logger = LoggerFactory.getLogger(RetryUtils.class); - - private static final int DEFAULT_MAX_RETRIES = 3; - private static final long DEFAULT_RETRY_DELAY_MS = 1000; - - public static <T> T executeWithRetry(Callable<T> task) throws Exception { - return executeWithRetry(task, DEFAULT_MAX_RETRIES, DEFAULT_RETRY_DELAY_MS); - } - - public static <T> T executeWithRetry(Callable<T> task, int maxRetries, long retryDelayMs) throws Exception { - Exception lastException = null; - - for (int attempt = 0; attempt <= maxRetries; attempt++) { - try { - if (attempt > 0) { - logger.info("Retry attempt {}/{} for task", attempt, maxRetries); - Thread.sleep(retryDelayMs); - } - - return task.call(); - } catch (Exception e) { - lastException = e; - - if (e instanceof NetworkException) { - logger.warn("Network error on attempt {}: {}", attempt, e.getMessage()); - - if (attempt < maxRetries) { - logger.info("Will retry in {} ms...", retryDelayMs); - continue; - } - } else { - logger.error("Non-retryable error: {}", e.getMessage()); - throw e; - } - } - } - - logger.error("All {} retry attempts failed", maxRetries + 1); - throw lastException; - } -} diff --git a/w11/java-cli-w11/src/main/java/com/example/datacollect/view/ConsoleView.java b/w11/java-cli-w11/src/main/java/com/example/datacollect/view/ConsoleView.java deleted file mode 100644 index 4665db0..0000000 --- a/w11/java-cli-w11/src/main/java/com/example/datacollect/view/ConsoleView.java +++ /dev/null @@ -1,46 +0,0 @@ -package com.example.datacollect.view; - 
-import com.example.datacollect.model.Article; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; -import java.util.List; -import java.util.Scanner; - -public class ConsoleView { - private static final Logger logger = LoggerFactory.getLogger(ConsoleView.class); - private static final String ANSI_RESET = "\u001B[0m"; - private static final String ANSI_GREEN = "\u001B[32m"; - private static final String ANSI_RED = "\u001B[31m"; - private static final String ANSI_BLUE = "\u001B[34m"; - - private final Scanner scanner = new Scanner(System.in); - - public String readLine() { - System.out.print("> "); - String input = scanner.nextLine(); - return input;/* return the user's input */ - } - - public void printSuccess(String msg) { - System.out.println(ANSI_GREEN + msg + ANSI_RESET); - } - - public void printError(String msg) { - System.out.println(ANSI_RED + msg + ANSI_RESET); - } - - public void printInfo(String msg) { - System.out.println(ANSI_BLUE + msg + ANSI_RESET); - } - - public void display(List<Article>
articles) { - if (articles.isEmpty()) { - printInfo("暂无文章,请先执行 crawl。"); - return; - } - for (int i = 0; i < articles.size(); i++) { - Article a = articles.get(i); - System.out.println((i + 1) + ". " + a.getTitle() + " | " + a.getUrl()); - } - } -} diff --git a/w11/java-cli-w11/src/main/resources/logback.xml b/w11/java-cli-w11/src/main/resources/logback.xml deleted file mode 100644 index aa0a06b..0000000 --- a/w11/java-cli-w11/src/main/resources/logback.xml +++ /dev/null @@ -1,24 +0,0 @@ - - - - - %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n - - - - - logs/crawler.log - - logs/crawler.%d{yyyy-MM-dd}.log - 30 - - - %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n - - - - - - - - diff --git a/w11/java-cli-w11/target/classes/logback.xml b/w11/java-cli-w11/target/classes/logback.xml deleted file mode 100644 index aa0a06b..0000000 --- a/w11/java-cli-w11/target/classes/logback.xml +++ /dev/null @@ -1,24 +0,0 @@ - - - - - %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n - - - - - logs/crawler.log - - logs/crawler.%d{yyyy-MM-dd}.log - 30 - - - %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n - - - - - - - - diff --git a/w11/java-cli-w11/target/maven-archiver/pom.properties b/w11/java-cli-w11/target/maven-archiver/pom.properties deleted file mode 100644 index 5c1de34..0000000 --- a/w11/java-cli-w11/target/maven-archiver/pom.properties +++ /dev/null @@ -1,3 +0,0 @@ -artifactId=datacollect-cli -groupId=com.example -version=0.1.0 diff --git a/w11/java-cli-w11/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst b/w11/java-cli-w11/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst deleted file mode 100644 index 1ead6c5..0000000 --- a/w11/java-cli-w11/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst +++ /dev/null @@ -1,22 +0,0 @@ -com\example\datacollect\command\ListCommand.class 
-com\example\datacollect\strategy\PeopleStrategy.class -com\example\datacollect\command\CrawlCommand.class -com\example\datacollect\strategy\BlogStrategy.class -com\example\datacollect\repository\ArticleRepository.class -com\example\datacollect\Main.class -com\example\datacollect\view\ConsoleView.class -com\example\datacollect\command\ExitCommand.class -com\example\datacollect\command\HelpCommand.class -com\example\datacollect\util\RetryUtils.class -com\example\datacollect\strategy\NewsStrategy.class -com\example\datacollect\command\Command.class -com\example\datacollect\controller\CrawlerController.class -com\example\datacollect\exception\CrawlerException.class -com\example\datacollect\exception\NetworkException.class -com\example\datacollect\command\AnalyzeCommand.class -com\example\datacollect\strategy\StrategyFactory.class -com\example\datacollect\strategy\HnuNewsStrategy.class -com\example\datacollect\strategy\YouthStrategy.class -com\example\datacollect\exception\ParseException.class -com\example\datacollect\strategy\CrawlStrategy.class -com\example\datacollect\model\Article.class diff --git a/w11/java-cli-w11/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst b/w11/java-cli-w11/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst deleted file mode 100644 index 937e5d7..0000000 --- a/w11/java-cli-w11/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst +++ /dev/null @@ -1,22 +0,0 @@ -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\strategy\NewsStrategy.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\controller\CrawlerController.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\repository\ArticleRepository.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\strategy\HnuNewsStrategy.java 
-C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\command\ExitCommand.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\command\Command.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\Main.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\command\CrawlCommand.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\exception\NetworkException.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\strategy\StrategyFactory.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\strategy\BlogStrategy.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\util\RetryUtils.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\command\HelpCommand.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\exception\CrawlerException.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\exception\ParseException.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\model\Article.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\view\ConsoleView.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\command\AnalyzeCommand.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\strategy\YouthStrategy.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\command\ListCommand.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\strategy\CrawlStrategy.java -C:\Users\27687\Desktop\java-cli\src\main\java\com\example\datacollect\strategy\PeopleStrategy.java diff --git a/w11/java-cli-w11/第10周——设计模式:灵活性与可扩展性.md b/w11/java-cli-w11/第10周——设计模式:灵活性与可扩展性.md deleted file mode 100644 index 9641102..0000000 --- a/w11/java-cli-w11/第10周——设计模式:灵活性与可扩展性.md +++ /dev/null @@ -1,705 +0,0 @@ -# 教案:《高级程序设计》第10周——设计模式:灵活性与可扩展性 - -| 项目 
| 内容 | -| -------- | ---------------------------------------------------------------------------- | -| **课程名称** | 高级程序设计 | -| **周次** | 第10周 | -| **主题** | 设计模式——灵活性与可扩展性 | -| **学时** | 2学时(90分钟) | -| **授课对象** | 已完成第9周CLI+MVC架构学习,具备Command模式基础 | -| **教学环境** | JDK 17+、IntelliJ IDEA、Maven | -| **前情提要** | W9搭建了CLI骨架:MVC分层 + Command路由,但留下了两大隐患——解析逻辑耦合在Command中、List\共享引用裸奔 | - ---- - -## 教学调整说明:为什么W10要在“骨架”上装“盔甲”? - -> **W9成果**:一个可扩展的命令行骨架 → **W9痛点**:解析器与数据存储仍在“裸奔” - -| 维度 | W9状态 | W10目标 | -|------|--------|---------| -| **架构** | MVC分层清晰 | MVC + 策略模式 + 仓库层 | -| **命令扩展** | 新增命令不改Controller | 新增解析器不改任何旧代码 | -| **数据安全** | List\全员可写 | Repository封装,只暴露安全接口 | -| **解析逻辑** | 硬编码在CrawlCommand内 | 策略模式,按URL自动匹配 | -| **代码量** | ~8个类 | ~12个类,但每个更小更纯粹 | - -**决策理由**: -1. W9学生已经感受到Command模式的好处——**多态带来的扩展性** -2. 策略模式是多态思想的又一次实战,是**接口抽象的深化** -3. 仓库层是“封装”这一OOP核心原则的落地,补上W9留下的课 -4. 解析器工厂让学生看到**“自动匹配”**的威力——增加网站支持只需新增一个类 - -**更深层的教育价值**: -> W9教会学生“怎么把代码分开”,W10要教会学生“怎么把代码分开后还能优雅地合上”——**接口即合同,工厂即自动匹配,仓库即数据守卫**。这三句话,就是本周的全部精华。 - ---- - -## 一、教学目标 - -| 目标维度 | 具体描述 | -|----------|----------| -| **知识掌握** | 理解策略模式的定义与多态本质;掌握工厂模式的两类变体(工厂方法/简单工厂)及适用场景;理解仓库模式对数据访问的封装原理。 | -| **工程实践** | 能在爬虫项目中用策略模式封装不同网站的解析逻辑;能实现解析器工厂,根据URL自动匹配解析策略;能用Repository模式替代裸List,提供安全的数据访问接口。 | -| **思维转型** | 从“写死逻辑”转向“策略可插拔”;从“直接操作集合”转向“通过仓库存取”;理解“对扩展开放,对修改关闭”的开闭原则。 | -| **工具应用** | 利用AI审查策略模式实现是否真正解耦;让AI扮演“网站结构分析师”辅助编写具体解析策略;用AI生成Repository的安全接口建议。 | - ---- - -## 二、教学重点与难点 - -| 项目 | 内容 | 突破方法 | -|------|------|----------| -| **重点** | 策略模式的多态本质、解析器工厂的自动匹配机制、Repository对数据访问的封装 | 以“新增网站需要改什么”为切入点,展示策略模式的开闭原则达成;通过“攻击”当前List裸奔的问题,引出Repository的必然性 | -| **难点** | 理解“接口即合同”的抽象思维、工厂模式中反射/Map注册的实现、仓库层与Strategy模式的协同 | 用“插座与电器”类比接口标准;现场演示从硬编码→工厂→反射的演进路径;用时序图展示“用户→Command→Strategy→Repository”的完整调用链 | - ---- - -## 三、教学过程设计(90分钟) - -| 环节 | 时间 | 教学内容 | 师生活动 | AI协同点 | -| -------------------------- | --- | ----------------------------------------------------------------- | -------------------------------------- | --------------------------- | -| **1. 
W9回顾与痛点暴露** | 8' | 回顾W9成果(CLI骨架),暴露两大隐患:①CrawlCommand里解析逻辑硬编码;②List\全员可读可写 | **教师演示**:展示W9代码,用“事故场景”引发思考 | — | -| **2. 策略模式:解析器的“插头标准化”** | 18' | 策略模式定义、接口设计、多态调用、与Command模式的对比 | **类比**:插座与电器;**教师演示**:从if-else到策略模式的演进 | 让AI生成“策略模式vs switch-case”对比 | -| **3. 解析器工厂:自动匹配的魔法** | 14' | 工厂模式的两种形态(简单工厂→Map注册工厂),解析器工厂实现 | **教师演示**:先用if-else判断host,再升级为Map注册工厂 | 让AI解释工厂模式与策略模式如何协同 | -| **4. Repository模式:武装数据访问** | 12' | Repository定义、接口设计、替换List\后的影响 | **教师演示**:在原代码中把List替换为Repository,展示改动点 | 学生用AI审计Repository接口的“最小完备性” | -| **5. 整体架构串联** | 8' | 用一张时序图串联:用户→CLI→Controller→Command→Strategy→Repository→Model | **师生互动**:让学生在白板上画出调用链 | — | -| **6. 代码落地** | 20' | 实现CrawlStrategy接口 + 两个策略 + 解析器工厂 + ArticleRepository | **教师演示**:分步写出代码,刻意埋入“策略匹配失败”的异常处理 | 完成后用AI检查策略模式实现 | -| **7. 架构反思与W11预告** | 5' | 当前架构还有什么隐患?(异常处理不统一、日志缺失)→ 预告W11健壮性工程 | **师生互动**:如果解析器工厂找不到匹配策略,会发生什么? | — | -| **8. 实践任务** | 5' | 实现策略模式和仓库层,完成本周代码升级 | 学生现场编码,教师巡视 | — | - ---- - -## 四、核心教学内容脚本 - -### 4.1 W9回顾与痛点暴露(8分钟) - -**教师口播**: -> "上节课我们搭了一个很漂亮的骨架——CLI+MVC+Command模式。我们先来表扬一下自己:新增一个命令,只要新建一个类,Controller零改动。但请大家想一个问题——" - -**投影展示W9的CrawlCommand存根**: -```java -public class CrawlCommand implements Command { - // ... - public void execute(String[] args, List
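articles) { - if (args.length < 2) { - view.printError("Usage: crawl <url>"); - return; - } - view.printInfo("Stub: Would crawl " + args[1]); - } -} -``` - -**Guiding questions**: -1. "This stub now has to be filled in. Suppose we implement crawling for real: where does the code go?" -2. "If I want to support two websites, say a tech blog and a news site, whose HTML structures are completely different, what will this `execute` method turn into?" - -**Show the "nightmare version" of CrawlCommand**: -```java -public void execute(String[] args, List<Article>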
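articles) { - String url = args[1]; - // fifty lines of if-else hell... - if (url.contains("blog.example.com")) { - // parse the tech blog's HTML - Document doc = Jsoup.connect(url).get(); - Elements titles = doc.select(".post-title"); - for (Element e : titles) { - articles.add(new Article(e.text(), url, "")); - } - } else if (url.contains("news.example.com")) { - // parse the news site's HTML - Document doc = Jsoup.connect(url).get(); - Elements items = doc.select(".article-headline"); - for (Element e : items) { - articles.add(new Article(e.text(), url, "")); - } - } else { - view.printError("Unsupported website!"); - } -} -``` - -**Distilling the pain point**: -> "See that? Every new website we support means another `else if` here. This is the 'change one thing and everything else shifts' problem we criticized in W1, except that the disaster has moved from `main` into `CrawlCommand`." -> -> "More importantly, we worked hard last class to implement the Command pattern. Is the parsing logic really going back to if-else hell? **This is the first problem W10 has to solve: how do we make the parsing logic pluggable too?**" - -**The second hazard: shared state revisited**: -> "One more thing, which we mentioned just before last class ended: `List<Article>
articles` is shared among all Commands. Any Command can stuff things into it, delete things from it, or even empty it. This is the second problem W10 has to solve: **how do we fit the data with a 'security door'?**" - ---- - -### 4.2 Strategy pattern: standardizing the parser "plug" (18 minutes) - -#### 4.2.1 Starting from an analogy - -**Teacher's narration**: -> "Let's start with an everyday scene. There is a three-hole socket on your wall at home. You can plug in a TV, a computer, or a phone charger: any appliance that meets the standard works. The socket does not care what kind of appliance you are; it only recognizes the interface standard." - -**Analogy mapping**: - -| Everyday scene | Code counterpart | -|----------|----------| -| Three-hole socket | `CrawlStrategy` interface | -| TV / computer charger | Concrete parsing strategies (BlogStrategy/NewsStrategy) | -| Electric current | Input: URL + Document; output: List\<Article\> | -| You (the user) | CrawlCommand | -| Socket panel | Parser factory | - -> "The core idea of the strategy pattern is: **define an algorithm interface so that concrete algorithm implementations can be swapped for one another without affecting the client that uses them.**" - -#### 4.2.2 Defining the strategy pattern - -```java -// src/main/java/com/crawler/strategy/CrawlStrategy.java -package com.crawler.strategy; - -import com.crawler.model.Article; -import org.jsoup.nodes.Document; -import java.util.List; - -public interface CrawlStrategy { - /** - * Parses the list of articles from an already fetched Document - * @param url the originally requested URL (used to populate Article) - * @param doc the Document parsed by Jsoup - * @return the list of parsed articles - */ - List<Article>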
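parse(String url, Document doc); - - /** - * Tells whether this strategy handles the given URL - * @param url the URL to check - * @return true if this strategy can process the URL - */ - boolean supports(String url); -} -``` - -**Teacher's narration**: -> "Note that the strategy interface has two methods. `parse` does the work; `supports` answers 'can I take this job?' What is that? **It is a contract!** Any website that wants to be supported by our crawler has to sign it: tell me whether you are my customer (supports) and how to parse you (parse)." - -#### 4.2.3 Concrete strategy examples - -```java -// BlogStrategy.java - parsing strategy for the tech blog -public class BlogStrategy implements CrawlStrategy { - @Override - public boolean supports(String url) { - return url.contains("blog.example.com"); - } - - @Override - public List<Article>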
parse(String url, Document doc) { - List<Article>
articles = new ArrayList<>(); - Elements titles = doc.select(".post-title"); - for (Element e : titles) { - articles.add(new Article(e.text(), url, "")); - } - return articles; - } -} - -// NewsStrategy.java - parsing strategy for the news site -public class NewsStrategy implements CrawlStrategy { - @Override - public boolean supports(String url) { - return url.contains("news.example.com"); - } - - @Override - public List<Article>
parse(String url, Document doc) { - List<Article>
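articles = new ArrayList<>(); - Elements items = doc.select(".article-headline"); - for (Element e : items) { - articles.add(new Article(e.text(), url, "")); - } - return articles; - } -} -``` - -**Comparison: strategy pattern vs hard-coded if-else** - -| Dimension | if-else mess | Strategy pattern | -|------|-------------|----------| -| Adding a website | Modify CrawlCommand, add an else if | Write a new class that implements CrawlStrategy | -| Changing parsing logic | Hunt for the matching else if inside CrawlCommand | Change only the corresponding strategy class | -| Testing | The whole crawler must be started | Unit-test the Strategy on its own | -| Open-closed principle | ❌ Open to modification | ✅ Open for extension, closed for modification | - -**Comparison with the Command pattern (to deepen understanding)**: -> "Last class, with the Command pattern, we defined one class per command; this class, with the strategy pattern, we define one class per website's parsing algorithm. **At heart both are the same OOP idea: replace conditional branches with polymorphism.** The only difference is that Command's interface is `execute()` and Strategy's interface is `parse()`." -> -> "You can note this down: **interfaces are the tool for eliminating if-else, and polymorphism is the soul of interfaces.**" - ---- - -### 4.3 Parser factory: the magic of automatic matching (14 minutes) - -#### 4.3.1 Raising the problem - -**Teacher's narration**: -> "Now we have a strategy for site A and a strategy for site B. Here is the question: who picks the strategy? Who walks through all the strategies to find one whose supports returns true?" -> -> "If that logic lives in CrawlCommand, the strategy pattern was for nothing, because CrawlCommand still has to 'know' which strategies exist. What we want is a black box: **drop a URL in and a suitable parser pops out automatically.**" - -#### 4.3.2 Implementing the parser factory - -```java -// src/main/java/com/crawler/strategy/StrategyFactory.java -package com.crawler.strategy; - -import java.util.ArrayList; -import java.util.List; - -public class StrategyFactory { - private final List<CrawlStrategy> strategies = new ArrayList<>(); - - // Register strategies: a new website only needs one more line here - public StrategyFactory() { - strategies.add(new BlogStrategy()); - strategies.add(new NewsStrategy()); - // To add a website later: strategies.add(new XxxStrategy()); - } - - /** - * Matches a parsing strategy to the URL automatically - * @param url the target URL - * @return the matching strategy, or null if none matches - */ - public CrawlStrategy getStrategy(String url) { - for (CrawlStrategy s : strategies) { - if (s.supports(url)) { - return s; - } - } - return null; // no matching strategy found - } -} -``` - -**Teacher's narration**: -> "This factory class is simple enough: a List holds all the strategies, and one method walks it to find the match. But simple does not mean weak." -> -> "**Key point**: to support a new website, all you need is:" -1. Write an `XxxStrategy` that implements `CrawlStrategy` -2. 
Add one line, `strategies.add(new XxxStrategy())`, to the factory constructor -> -> "Not a single line of CrawlCommand changes. That is the open-closed principle winning." - -#### 4.3.3 From a simple factory to more advanced registration (extended thinking) - -**Teacher's narration**: -> "Some of you may ask: we still have to add a line to the factory constructor, so can we get to truly zero changes? Of course: use reflection or SPI." - -**Concept demo (implementation not required)**: -```java -// Advanced idea: scan a given package for all CrawlStrategy implementations -// Register them automatically via reflection, so that "adding a class is enough" -// This is one of the core ideas of the Spring framework -``` - -> "We do not require you to master this technique for now, but I want you to know that every `new XxxStrategy()` you write today may later evolve into framework-level auto-wiring. **The habits of thought you build now decide how far you can go.**" - -#### 4.3.4 The refactored CrawlCommand - -```java -public class CrawlCommand implements Command { - private ConsoleView view; - private StrategyFactory strategyFactory; - private ArticleRepository repository; // note: this is a Repository now! - - public CrawlCommand(ConsoleView v, StrategyFactory f, ArticleRepository r) { - this.view = v; - this.strategyFactory = f; - this.repository = r; - } - - public String getName() { return "crawl"; } - - public void execute(String[] args, List<Article>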
articles) { - if (args.length < 2) { - view.printError("Usage: crawl <url>"); - return; - } - String url = args[1]; - - // 1. The factory picks the strategy automatically - CrawlStrategy strategy = strategyFactory.getStrategy(url); - if (strategy == null) { - view.printError("No strategy found for: " + url); - return; - } - - // 2. Fetch the page - view.printInfo("Crawling: " + url); - try { - Document doc = Jsoup.connect(url).get(); - List<Article>
parsed = strategy.parse(url, doc); - - // 3. 通过仓库存入(而不是直接操作List) - for (Article a : parsed) { - repository.add(a); - } - view.printSuccess("Crawled " + parsed.size() + " articles."); - } catch (IOException e) { - view.printError("Failed to crawl: " + e.getMessage()); - } - } -} -``` - -**教师口播**: -> "注意这个CrawlCommand现在的职责:拿到URL → 交给工厂选策略 → 执行解析 → 交给仓库存储。**它自己在干什么?在调度!** 这就是上节课我们讲的Controller的'调度思维',现在向Command内部延伸了。" - ---- - -### 4.4 Repository模式:武装数据访问(12分钟) - -#### 4.4.1 问题重提 - -**教师口播**: -> "回到上节课结束时的那个问题:`List
<Article>` is shared among all Commands. Any Command can do things like these:"
-```java
-articles.clear();      // wipe every article
-articles.add(null);    // insert a null
-articles.remove(0);    // delete at will
-```
-
-> "Suppose a new colleague takes over, doesn't know the unwritten rule 'don't touch this List', and writes `articles.clear()`. Your `list` command suddenly shows nothing. **Order that is maintained only by convention will be broken sooner or later. We need real rules: constraints enforced in code.**"
-
-#### 4.4.2 Defining ArticleRepository
-
-```java
-// src/main/java/com/crawler/repository/ArticleRepository.java
-package com.crawler.repository;
-
-import com.crawler.model.Article;
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.List;
-
-public class ArticleRepository {
-    private final List
<Article> articles = new ArrayList<>();
-
-    /**
-     * Add an article. Null is rejected: this is a rule enforced in code, not a verbal agreement.
-     */
-    public void add(Article article) {
-        if (article == null) {
-            throw new IllegalArgumentException("Article cannot be null");
-        }
-        articles.add(article);
-    }
-
-    /**
-     * Get a read-only view of all articles.
-     * Callers cannot modify the internal data through the returned list.
-     */
-    public List
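<Article> getAll() {
-        return Collections.unmodifiableList(articles);
-    }
-
-    /**
-     * Get the number of articles.
-     */
-    public int size() {
-        return articles.size();
-    }
-
-    /**
-     * Clear everything (admin only; next up: access control).
-     */
-    public void clear() {
-        articles.clear();
-    }
-}
-```
-
-**Instructor script**:
-> "Three key design points:"
->
-> - **`add()` rejects null**: the rule lives in the code, not in an email
-> - **`getAll()` returns an unmodifiable view**: with `Collections.unmodifiableList()`, a caller that tries add/remove gets **an exception thrown immediately**, not a 'silent bug'
-> - **`ClearCommand` needs to wipe the data? It calls `repository.clear()`** instead of touching the List directly
->
-> "This is lesson one of object orientation: encapsulation. Hide the data and expose only safe methods. Moving from 'operating on the collection directly' to 'going through the repository' is a watershed in a programmer's maturity."
-
-#### 4.4.3 Architectural changes after introducing the repository
-
-**Adjusting the `execute` method of the Command interface**:
-
-```java
-// Before (W9)
-public interface Command {
-    String getName();
-    void execute(String[] args, List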
<Article> articles);
-}
-
-// After (W10)
-public interface Command {
-    String getName();
-    void execute(String[] args, ArticleRepository repository);
-}
-```
-
-**Instructor script**:
-> "The change is small: replace `List
<Article>` with `ArticleRepository`. But the meaning is completely different: before it was 'here is the data, do whatever you like'; now it is 'here is a safe access channel'."
-
-**All Commands are adjusted accordingly**:
-
-```java
-// ListCommand.java - after
-public class ListCommand implements Command {
-    private ConsoleView view;
-    public ListCommand(ConsoleView v) { this.view = v; }
-    public String getName() { return "list"; }
-    public void execute(String[] args, ArticleRepository repository) {
-        view.display(repository.getAll()); // fetch data through the repository
-    }
-}
-
-// ClearCommand.java (new example)
-public class ClearCommand implements Command {
-    private ConsoleView view;
-    public ClearCommand(ConsoleView v) { this.view = v; }
-    public String getName() { return "clear"; }
-    public void execute(String[] args, ArticleRepository repository) {
-        repository.clear();
-        view.printSuccess("All articles cleared.");
-    }
-}
-```
-
-**Adjusting the Controller and main**:
-
-```java
-// App.java - after
-public class App {
-    public static void main(String[] args) {
-        ConsoleView view = new ConsoleView();
-        ArticleRepository repository = new ArticleRepository(); // replaces List
<Article>
-        StrategyFactory factory = new StrategyFactory(); // new
-
-        CrawlerController controller = new CrawlerController(view, repository, factory);
-
-        view.printSuccess("Welcome to CLI Crawler v2.0!");
-        view.printInfo("Type 'help' for commands.");
-
-        while (true) {
-            controller.handle(view.readLine());
-        }
-    }
-}
-```
-
----
-
-### 4.5 Putting the whole architecture together (8 minutes)
-
-**Instructor script**:
-> "Now let's connect all the parts and trace the full path of a `crawl https://blog.example.com` command."
-
-**Sequence diagram (narrated while drawing on the whiteboard)**:
-```
-User types "crawl https://blog.example.com"
-  │
-  ▼
-ConsoleView.readLine()
-  │
-  ▼
-CrawlerController.handle("crawl https://blog.example.com")
-  │  Map lookup "crawl" → CrawlCommand
-  ▼
-CrawlCommand.execute(args, repository)
-  │
-  ├─► StrategyFactory.getStrategy(url)
-  │     │  iterate over List<CrawlStrategy>
-  │     │  BlogStrategy.supports(url) → true!
-  │     ▼
-  │   returns BlogStrategy
-  │
-  ├─► Jsoup.connect(url).get() → Document
-  │
-  ├─► BlogStrategy.parse(url, doc) → List
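<Article>
-  │
-  └─► for each article: repository.add(article)
-        │
-        ▼
-      ArticleRepository.articles.add(article)
-
-Finally: ConsoleView.printSuccess("Crawled N articles.")
-```
-
-**Instructor script**:
-> "Seven calls, each with a clear responsibility: the View handles input and output, the Controller routes, the Command dispatches, the Factory matches, the Strategy parses, and the Repository stores. **No class does two people's jobs, and no class is unsure what its own job is.**"
->
-> "That is engineering: not writing code fast, but writing code right."
-
----
-
-### 4.6 Putting it into code (20 minutes)
-
-**Instructor preparation**: before class, prepare a "W9 to W10 upgrade" change list and demonstrate the key changes live.
-
-**Change list**:
-1. Create the `strategy/` package and the `CrawlStrategy` interface
-2. Create `strategy/BlogStrategy.java`
-3. Create `strategy/NewsStrategy.java`
-4. Create `strategy/StrategyFactory.java`
-5. Create the `repository/` package and `ArticleRepository.java`
-6. Change the `execute` signature of the `Command` interface
-7. Modify `CrawlCommand` to use `StrategyFactory` and `ArticleRepository`
-8. Modify all remaining `Command` implementations
-9. Modify the `CrawlerController` constructor
-10. Modify `App.java`
-
-**Key steps for the instructor to demonstrate** (focus of the demo):
-- `Collections.unmodifiableList()` in `ArticleRepository`
-- The iterate-and-match logic in `StrategyFactory`
-- The dispatch structure of the rewritten `CrawlCommand`
-
-**A deliberately planted "spot-the-flaw" point**:
-> "In `StrategyFactory.getStrategy()` I return `null` when no strategy matches, and then `CrawlCommand` checks for null. This is really a 'prelude to the Null Object pattern'. If I don't want the Command to check for null, how should I change the factory? Take this question and explore it with AI."
-
----
-
-### 4.7 Architecture reflection and W11 preview (5 minutes)
-
-**Instructor script**:
-> "Our architecture is now much sturdier than in W9: parsing logic is pluggable and data access is guarded. But some gaps remain:"
-
-**Point them out one by one**:
-1. **Exception handling**: `CrawlCommand` has a single `catch (IOException e)`. What happens if parsing throws some other exception?
-2. **Network timeout**: if the target site has not responded after 3 seconds, does the current code keep waiting?
-3. **Missing logs**: every success/failure message goes only to the terminal. If the program runs overnight and you want to see the next day how much it crawled, you can't.
-4. **Retry mechanism**: should a single failure be reported right away, or should there be a chance to retry?
-
-**W11 preview**:
-> "Next week we will do three things: **a custom exception hierarchy**, **an engineering-grade logging framework**, and **defensive programming with retries**. W9 built the skeleton, W10 put on the armor, and W11 will make the system **stand up to the beatings of the real world**."
-
----
-
-### 4.8 Practice task (5 minutes)
-
-**Requirements**:
-1. Start from the W9 code and complete the W10 upgrade
-2. Implement at least two `CrawlStrategy` classes (they may be simulated; real crawling is not required)
-3. Implement `StrategyFactory` and `ArticleRepository`
-4. Make sure every Command accesses data through the Repository
-5. Run and test the full flow
-
-**Acceptance criteria**:
-- [x] Adding a strategy needs only a new file plus one registration line in the factory; no other code changes
-- [x] `getAll()` in `ArticleRepository` returns an unmodifiable view
-- [x] `CrawlCommand` contains no website-specific parsing logic
-- [x] `StrategyFactory` automatically matches the correct strategy for a URL
-- [x] The `execute` signature of every Command has been updated to take `ArticleRepository`
-- [x] Nothing operates directly on `List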
<Article>`
-
----
-
-## 5. Homework
-
-### 5.1 Required tasks
-
-1. **Improve ArticleRepository**: add a batch-add method `addAll(List
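<Article>)`, taking care to guard against null
-2. **★ AnalyzeCommand (capstone assignment)**:
-   - Implement the `analyze <url>` command
-   - Internally, call `StrategyFactory` to match a strategy
-   - After the strategy parses the articles, **do not store them in the Repository**; instead compute statistics:
-     - Total number of articles
-     - Average title length
-     - Top 5 ranked by some rule of your choice
-   - Only print the results; do not store them
-   - **Hint**: this is strategy reuse. The same parsing strategy can serve `crawl` (store into the repository) and also `analyze` (analysis only)
-
-3. **AI architecture audit**: send the class diagram of the complete code (or a list of class names and method signatures) to an AI with the instruction:
-   > "As a Java architecture auditor, please check: (1) whether the Strategy pattern implementation is properly decoupled (does CrawlCommand still contain website-specific logic); (2) whether the Repository truly encapsulates data access (is there any place that bypasses the Repository and operates on the List directly); (3) whether the factory's matching logic has performance risks. Please give concrete improvement suggestions."
-
-### 5.2 Optional tasks
-
-1. **Regex strategy matching**: change the check in `supports()` from `url.contains()` to a regular expression, so that one strategy can match a whole class of URLs
-2. **Default strategy (DefaultStrategy)**: when no strategy matches, provide a generic "title extraction" fallback
-3. **Strategy priority**: give each strategy a `priority` field and have the factory match by priority (rather than by registration order)
-4. **Think and answer (200 words)**:
-   > "In the Strategy pattern, the `supports()` methods of two strategies may both return true. Which one should be chosen? How does the iteration order in `StrategyFactory` affect the result? What solution do you propose?"
-
-### 5.3 Questions for thought
-
-1. **What is the difference between a Repository and a List?** If the Repository merely wraps a List, why use it at all?
-2. **Evolution of the strategy factory**: if the number of websites grows to 100, is registering them one by one still appropriate? What solutions come to mind?
-3. **What does `Collections.unmodifiableList()` return?** Is it really "unmodifiable"? If the original List is modified, what happens to the unmodifiable view?
-
----
-
-## 6. AI-assisted upgrade
-
-### Architecture auditor task (required)
-
-**Steps for students**:
-1. Draw the class dependency diagram of the current project (by hand or with a tool)
-2. Send the class names and dependencies to an AI
-3. Enter the instruction:
-   > "As a Java architecture auditor, please review the architecture of this crawler project. Focus on: (1) whether the Strategy pattern truly achieves the open/closed principle (does adding a website really need only a new class); (2) whether the Repository encapsulation is complete (is there any path that bypasses the Repository); (3) whether there are circular dependencies. Point out each problem and suggest improvements."
-
-**Expected AI output**:
-- Points out whether any "change one place, affect many" coupling remains
-- Judges whether the Repository's API design is complete
-- Assesses how well the overall architecture meets the open/closed principle
-
-### Advanced AI exploration (optional)
-
-> "Suppose I have a CrawlStrategy interface and 10 implementations. Instead of the factory pattern, I store them directly in a Map whose key is the strategy name. How does this fundamentally differ from the StrategyFactory design? What are the pros and cons of each?"
-
----
-
-## 7. Teaching reflections and adjustment log
-
-| Date | Item | Adjustment |
-|------|------|----------|
-| 2026-05-01 | First draft | Built on the W9 skeleton; introduced Strategy pattern + factory + Repository |
-| 2026-05-07 | Structure improvement | Reordered the Strategy and factory explanations; strategy first, then factory, reads more naturally |
-
----
-
-## Appendix 1: W9 to W10 change comparison
-
-| Change | W9 code | W10 code |
-|--------|--------|---------|
-| Data storage | `List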
<Article> articles` | `ArticleRepository repository` |
-| Command interface | `execute(String[], List
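<Article>)` | `execute(String[], ArticleRepository)` |
-| Location of parsing logic | Inside `CrawlCommand` | Each `CrawlStrategy` implementation |
-| URL matching | None (hard-coded) | `StrategyFactory.getStrategy(url)` |
-| Adding data | `articles.add(article)` | `repository.add(article)` |
-| Reading data | Iterate over `articles` directly | `repository.getAll()` |
-
-## Appendix 2: Quick FAQ
-
-| Question | Answer |
-|------|------|
-| What is the difference between the Strategy pattern and the Command pattern? | Command encapsulates an "action" (what to do); Strategy encapsulates an "algorithm" (how to do it). In the crawler, crawl is the command (action) and how to parse is the strategy (algorithm). |
-| Does a factory have to be called Factory? | No. But the name Factory signals the responsibility of "providing objects" and follows the naming convention for the pattern. |
-| What is `Collections.unmodifiableList()` for? | It returns a read-only view; calling add/remove and similar methods on it throws `UnsupportedOperationException`. |
-| What is the difference between a Repository and a DAO? | In our context they can be treated as synonyms. Strictly speaking, Repository is a domain-driven design concept that leans toward "collection semantics", while DAO leans toward database operations. |
-| What if a strategy's `supports()` returns true but parsing fails? | That is a bug in the strategy implementation, and the strategy should be fixed. The Factory is not responsible for verifying that strategies are correct. |
-
-## Appendix 3: Teaching logic
-
-| Order | Content | Design rationale |
-|------|------|----------|
-| 1 | W9 review + exposing pain points | Bridges from known problems to new knowledge |
-| 2 | Strategy pattern | Solves the coupling of parsing logic; deepens understanding of polymorphism |
-| 3 | Parser factory | Solves strategy selection; introduces the factory pattern |
-| 4 | Repository pattern | Solves data safety; puts encapsulation into practice |
-| 5 | Architecture walkthrough | Unifies all the parts into a complete mental model |
-| 6 | Putting it into code | Verification through practice, from "understood" to "can do" |
-| 7 | Architecture reflection + preview | Exposes new problems and sets up W11's robustness engineering |
-
----
-
-## Version notes
-
-- **v1 (this version)**: first draft, following the W9 lesson-plan format; fully introduces the Strategy, Factory, and Repository patterns
\ No newline at end of file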
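
The FAQ entry on `Collections.unmodifiableList()` and the related thought question can be checked with a small experiment. The sketch below is only an illustration: `UnmodifiableViewDemo` and its method names are hypothetical and not part of the lesson's project, and plain `String` titles stand in for `Article` objects. It shows two behaviors of the standard library call: writes through the view are rejected, while changes made to the backing list remain visible through the view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the crawler project.
class UnmodifiableViewDemo {

    // Returns true if writing through the read-only view is rejected.
    static boolean writeThroughViewRejected() {
        List<String> backing = new ArrayList<>();
        List<String> view = Collections.unmodifiableList(backing);
        try {
            view.add("title");
            return false; // would mean the view accepted the write
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Returns the view's size after the backing list grows behind it.
    static int viewSizeAfterBackingAdd() {
        List<String> backing = new ArrayList<>();
        backing.add("first");
        List<String> view = Collections.unmodifiableList(backing);
        backing.add("second"); // modify the original list, not the view
        return view.size();
    }

    public static void main(String[] args) {
        System.out.println("write rejected: " + writeThroughViewRejected());
        System.out.println("view size: " + viewSizeAfterBackingAdd());
    }
}
```

So the view is read-only for its holder but is not a snapshot: it tracks the backing list. In a repository like the one above that is usually the desired behavior, since only the repository's own methods can change what callers see.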