LiZifan 3 weeks ago
parent
commit
26ceabbf0d
  1. BIN
      project/202506050226-李梓帆-期末实验报告报告.docx
  2. 4
      project/java1/java-cli/.gitignore
  3. 492
      project/java1/java-cli/W10 PPT.md
  4. 57
      project/java1/java-cli/pom.xml
  5. 113
      project/java1/java-cli/src/main/java/com/example/datacollect/Main.java
  6. 8
      project/java1/java-cli/src/main/java/com/example/datacollect/command/Command.java
  7. 96
      project/java1/java-cli/src/main/java/com/example/datacollect/command/CrawlCommand.java
  8. 28
      project/java1/java-cli/src/main/java/com/example/datacollect/command/ExitCommand.java
  9. 27
      project/java1/java-cli/src/main/java/com/example/datacollect/command/HelpCommand.java
  10. 27
      project/java1/java-cli/src/main/java/com/example/datacollect/command/ListCommand.java
  11. 57
      project/java1/java-cli/src/main/java/com/example/datacollect/controller/CrawlerController.java
  12. 11
      project/java1/java-cli/src/main/java/com/example/datacollect/exception/CrawlerException.java
  13. 11
      project/java1/java-cli/src/main/java/com/example/datacollect/exception/NetworkException.java
  14. 11
      project/java1/java-cli/src/main/java/com/example/datacollect/exception/ParseException.java
  15. 45
      project/java1/java-cli/src/main/java/com/example/datacollect/model/Article.java
  16. 80
      project/java1/java-cli/src/main/java/com/example/datacollect/repository/ArticleRepository.java
  17. 43
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/BlogStrategy.java
  18. 73
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/CctvNewsStrategy.java
  19. 12
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/CrawlStrategy.java
  20. 65
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/HnuNewsStrategy.java
  21. 43
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/NewsStrategy.java
  22. 58
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/StrategyFactory.java
  23. 94
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/WeatherStrategy.java
  24. 77
      project/java1/java-cli/src/main/java/com/example/datacollect/strategy/WeiboHotStrategy.java
  25. 52
      project/java1/java-cli/src/main/java/com/example/datacollect/view/ConsoleView.java
  26. 27
      project/java1/java-cli/src/main/resources/logback.xml
  27. 27
      project/java1/java-cli/target/classes/logback.xml
  28. 3
      project/java1/java-cli/target/maven-archiver/pom.properties
  29. 22
      project/java1/java-cli/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst
  30. 21
      project/java1/java-cli/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst
  31. 705
      project/java1/java-cli/第10周——设计模式:灵活性与可扩展性.md
  32. 358
      project/java1/logs/crawler.2026-05-19.log
  33. 10
      project/java1/logs/crawler.2026-05-29.log
  34. 295
      project/java1/logs/crawler.log

BIN
project/202506050226-李梓帆-期末实验报告报告.docx

Binary file not shown.

4
project/java1/java-cli/.gitignore

@@ -0,0 +1,4 @@
*.jar
*.jar
*.class
*.log

492
project/java1/java-cli/W10 PPT.md

@@ -0,0 +1,492 @@
---
id: "24"
title: w10-Design Patterns
slug: w10-design-patterns
status: draft
view_count: 0
created_at: 2026-05-07T12:00:00+08:00
updated_at: 2026-05-07T14:00:00.000000000+08:00
---
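# Advanced Programming · Week 10
### Design Patterns: Flexibility and Extensibility
### Strategy Pattern + Factory + Repository in Practice
---
### 📌 This Week's Roadmap
- W9 recap: what the skeleton achieved and the risks it left behind
- Strategy pattern: a "plug standard" for parsers
- Parser factory: the magic of automatic matching
- Repository: arming data access
- Tying the architecture together: the full call chain
- Hands-on code + practice tasks
- Architecture retrospective + W11 preview
---
## 1️⃣ W9 Recap: What the Skeleton Achieved and the Risks It Left Behind
### We built a beautiful house
- ✅ Clear MVC layering
- ✅ Command pattern: **adding a command requires zero changes to the Controller**
- ✅ All output goes through `ConsoleView`
- ✅ Standard project package structure
---
### But problems followed
```java
// What happens to the parsing logic inside CrawlCommand?
if (url.contains("blog.example.com")) {
    // parse the blog...
} else if (url.contains("news.example.com")) {
    // parse the news site...
} else {
    view.printError("Unsupported website!");
}
```
> 😫 Every newly supported website means one more `else if`
---
### And another piece of "exposed" data
```java
List<Article> articles = new ArrayList<>();
// Any Command can do this:
articles.clear();
articles.add(null);
articles.remove(0);
```
> 🚨 The data has no protection at all; verbal agreements cannot be relied on
---
### This week's tasks
1. **Pluggable parsing logic** → Strategy pattern + Factory
2. **Guarded data access** → Repository pattern
> W9 builds the skeleton, W10 puts on the armor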
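## 2️⃣ Strategy Pattern: A "Plug Standard" for Parsers
### Why does a wall socket accept any appliance?
- The **three-hole socket** is a standard interface
- TVs, computers, and phone chargers all implement that interface
- The socket does not care which appliance is plugged in
---
### The crawler world works the same way
- `CrawlStrategy` = the socket interface
- `BlogStrategy`, `NewsStrategy` = concrete appliances
- `CrawlCommand` = the person using the appliance
- `StrategyFactory` = the socket panel
---
### The interface is the contract
```java
public interface CrawlStrategy {
    List<Article> parse(String url, Document doc);
    boolean supports(String url);
}
```
- `supports()`: can I handle this URL?
- `parse()`: how do I parse it?
- **Any website that wants to be crawled signs this contract!**
---
### Strategy vs. hard-coding
| Dimension | if-else spaghetti | Strategy pattern |
|------|-------------|----------|
| Add a website | Modify the Command | Create a new strategy class |
| Change parsing | Hunt through the else-if chain | Edit only the matching class |
| Testing | Start the whole crawler | Test the strategy in isolation |
| Open-closed principle | ❌ Open to modification | ✅ Open to extension, closed to modification |
---
### A concrete strategy
```java
public class BlogStrategy implements CrawlStrategy {
    @Override
    public boolean supports(String url) {
        return url.contains("blog.example.com");
    }

    @Override
    public List<Article> parse(String url, Document doc) {
        List<Article> articles = new ArrayList<>();
        // One Article per element carrying the .post-title class
        for (Element e : doc.select(".post-title")) {
            articles.add(new Article(e.text(), url, ""));
        }
        return articles;
    }
}
```
> ✨ One new website, one independent class, each minding its own business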
---
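## 3️⃣ Parser Factory: The Magic of Automatic Matching
### Who picks the strategy?
- If `CrawlCommand` loops over every strategy itself → the Strategy pattern was for nothing
- We need a black box: **drop in a URL, get back the right parser**
---
### Enter the factory
```java
public class StrategyFactory {
    private final List<CrawlStrategy> strategies = new ArrayList<>();

    public StrategyFactory() {
        strategies.add(new BlogStrategy());
        strategies.add(new NewsStrategy());
    }

    // Returns the first registered strategy that supports the URL, or null if none does.
    public CrawlStrategy getStrategy(String url) {
        for (CrawlStrategy s : strategies) {
            if (s.supports(url)) return s;
        }
        return null;
    }
}
```
> 🔧 Adding a website only takes a new strategy class + one registration line in the factory
---
### A win for the open-closed principle
- ✅ `CrawlCommand` does not change at all
- ✅ Add `XxxStrategy` plus one registration line
- ✅ Every strategy is invoked in exactly the same way
> This is **"open for extension, closed for modification"**
---
### The refactored CrawlCommand
```java
public void execute(String[] args, ArticleRepository repository) {
    String url = args[1];
    CrawlStrategy strategy = strategyFactory.getStrategy(url);
    if (strategy == null) {
        view.printError("No strategy for: " + url);
        return;
    }
    Document doc;
    try {
        // Jsoup's get() throws the checked java.io.IOException
        doc = Jsoup.connect(url).get();
    } catch (IOException e) {
        view.printError("Fetch failed: " + e.getMessage());
        return;
    }
    List<Article> parsed = strategy.parse(url, doc);
    for (Article a : parsed) {
        repository.add(a);
    }
    view.printSuccess("Crawled " + parsed.size() + " articles.");
}
```
> 🧠 CrawlCommand now only **"dispatches"**; it no longer parses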
---
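## 4️⃣ Repository: Arming Data Access
### The problem with a shared List
```java
articles.clear();    // wipe everything
articles.add(null);  // slip in a null
articles.remove(0);  // delete at will
```
> Order maintained by convention will eventually be broken
---
### Put a security door on the data
```java
public class ArticleRepository {
    private final List<Article> articles = new ArrayList<>();

    public void add(Article article) {
        if (article == null) throw new IllegalArgumentException("article must not be null");
        articles.add(article);
    }

    public List<Article> getAll() {
        // Read-only view: callers can iterate but not mutate
        return Collections.unmodifiableList(articles);
    }

    public int size() { return articles.size(); }
    public void clear() { articles.clear(); }
}
```
---
### Three lines of defense
| Mechanism | Effect |
|------|------|
| **add rejects null** | The rule lives in code, not in a verbal agreement |
| **getAll returns an unmodifiable view** | Any attempt to modify it throws immediately |
| **All access goes through the repository** | Internal structure is encapsulated; only safe methods are exposed |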
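
The second line of defense can be checked in a few lines. This is a minimal sketch, not the course's implementation: `RepositoryDemo`, `mutationRejected`, and the title-only `Article` are hypothetical stand-ins so the example runs without the rest of the project.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RepositoryDemo {
    /** Simplified stand-in for the project's Article (assumption: title only). */
    static final class Article {
        final String title;
        Article(String title) { this.title = title; }
    }

    /** Same shape as the slide's ArticleRepository, trimmed to add/getAll. */
    static final class ArticleRepository {
        private final List<Article> articles = new ArrayList<>();

        void add(Article article) {
            if (article == null) {
                throw new IllegalArgumentException("article must not be null");
            }
            articles.add(article);
        }

        List<Article> getAll() {
            return Collections.unmodifiableList(articles);
        }
    }

    /** Returns true if writing through the view returned by getAll() is rejected. */
    static boolean mutationRejected(ArticleRepository repository) {
        try {
            repository.getAll().add(new Article("intruder"));
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        ArticleRepository repository = new ArticleRepository();
        repository.add(new Article("hello"));
        System.out.println("mutation rejected: " + mutationRejected(repository));
        System.out.println("size after attempt: " + repository.getAll().size());
    }
}
```

Note that `Collections.unmodifiableList` returns a live view rather than a copy: articles added later through `add` are visible through a view obtained earlier.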
---
### Every Command signature changes
```java
// W9
public void execute(String[] args, List<Article> articles);
// W10
public void execute(String[] args, ArticleRepository repository);
```
> Semantic shift: from "here's the data, do whatever you like" → "here's a safe access channel"
---
## 5️⃣ Tying the Architecture Together
### The full journey of one `crawl` command
```
User types "crawl https://blog.example.com"
ConsoleView parses the input
Controller routes → CrawlCommand
StrategyFactory.getStrategy(url) → BlogStrategy
Jsoup fetches → Document
BlogStrategy.parse(url, doc) → List<Article>
Repository.add() stores the articles
ConsoleView prints the success message
```
---
### Architecture overview
![mvc-strategy-repo](/api/v1/attachments/8 "width=70% center")
```mermaid
flowchart TD
User(["👤 User input<br/>crawl https://blog.example.com"]) --> View
subgraph View["🎨 View layer (ConsoleView)"]
ReadLine["readLine()"]
Display["display() / printSuccess()"]
end
ReadLine --> Controller
subgraph Controller["🧭 Controller layer"]
Router["CrawlerController<br/>Map routing"]
end
Router --> Command
subgraph Command["⚡ Command layer"]
CrawlCmd["CrawlCommand<br/>(dispatcher)"]
end
CrawlCmd --> Factory
subgraph Strategy["🧩 Strategy layer"]
Factory["StrategyFactory<br/>(auto-matching)"]
StrategyI["<<interface>> CrawlStrategy"]
BlogS["BlogStrategy"]
NewsS["NewsStrategy"]
Factory --> StrategyI --> BlogS
StrategyI --> NewsS
end
BlogS --> Repository
subgraph Repository["🔐 Repository layer"]
Repo["ArticleRepository<br/>(add / getAll)"]
RepoList["List<Article> (private)"]
Repo --> RepoList
end
RepoList --> Model
subgraph Model["📦 Model layer"]
Article["Article"]
end
CrawlCmd --> Display
Repository --> Display
```
> 🗺️ Every layer has a clear responsibility, and every extension point needs only additions, not modifications
---
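## 6️⃣ Hands-On Code (Step-by-Step Upgrade)
### Change list for upgrading from W9 to W10
1. Create the `strategy/` package → `CrawlStrategy` interface
2. Implement `BlogStrategy` and `NewsStrategy`
3. Implement `StrategyFactory`
4. Create the `repository/` package → `ArticleRepository`
5. Change the `Command` interface signature
6. Rewrite `CrawlCommand`
7. Adjust every other `Command`
8. Adjust `Controller` and `App.java`
---
### Key code walkthrough
- How to use `Collections.unmodifiableList()`
- The lookup loop in `StrategyFactory.getStrategy()`
- `CrawlCommand`: from "hard-coded parsing" to "dispatch and assembly"
```java
// One example of the change
for (Article a : parsed) {
    repository.add(a); // before: articles.add(a);
}
```
---
### Spot the flaw
- `StrategyFactory` returns `null` when no strategy matches
- `CrawlCommand` checks for `null` and reports an error
- Is there a more elegant way to avoid the `null` check?
> 🔍 After class, explore this with AI; it is a prelude to the "Null Object pattern"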
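
One possible answer, sketched under simplifying assumptions: the strategy contract here drops the jsoup `Document` and returns plain strings so the example is self-contained, and `NoOpStrategy` and `NullObjectDemo` are hypothetical names. The factory hands back a do-nothing strategy instead of `null`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NullObjectDemo {
    /** Simplified contract (assumption: no Document parameter, titles as strings). */
    interface CrawlStrategy {
        boolean supports(String url);
        List<String> parse(String url);
    }

    static final class BlogStrategy implements CrawlStrategy {
        public boolean supports(String url) {
            return url != null && url.contains("blog.example.com");
        }
        public List<String> parse(String url) {
            return List.of("post from " + url);
        }
    }

    /** Null object: a real strategy that accepts any URL and produces nothing. */
    static final class NoOpStrategy implements CrawlStrategy {
        public boolean supports(String url) { return true; }
        public List<String> parse(String url) { return Collections.emptyList(); }
    }

    static final class StrategyFactory {
        private final List<CrawlStrategy> strategies = new ArrayList<>();
        private final CrawlStrategy fallback = new NoOpStrategy();

        StrategyFactory() {
            strategies.add(new BlogStrategy());
        }

        /** Never returns null: unmatched URLs get the no-op fallback. */
        CrawlStrategy getStrategy(String url) {
            for (CrawlStrategy s : strategies) {
                if (s.supports(url)) return s;
            }
            return fallback;
        }
    }

    public static void main(String[] args) {
        StrategyFactory factory = new StrategyFactory();
        String known = "https://blog.example.com/a";
        String unknown = "https://unknown.example.com";
        System.out.println(factory.getStrategy(known).parse(known));
        System.out.println(factory.getStrategy(unknown).parse(unknown));
    }
}
```

The trade-off: the caller can no longer tell "no strategy matched" from "the page had no articles" unless it inspects the result. Returning `Optional<CrawlStrategy>` is an alternative that keeps the distinction explicit.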
---
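## 7️⃣ Architecture Retrospective + Next Week's Preview
### Weak points of the current architecture
- ❌ Exception handling is one-size-fits-all
- ❌ No retry mechanism
- ❌ No control over network timeouts
- ❌ Logs go only to the terminal
---
### W11 goal: robustness engineering
- ✅ **Custom exception hierarchy**: turn "something went wrong" into specific business exceptions
- ✅ **Engineering-grade logging**: record who did what, and when
- ✅ **Defensive programming + retry mechanism**: network jitter is no longer fatal
> W9 builds the skeleton → W10 puts on the armor → W11 makes it withstand a beating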
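
As a preview of the retry idea, here is a minimal sketch using only the standard library. `withRetry` and `RetryDemo` are hypothetical names, and the choice to retry only on `IOException` with a fixed delay is an assumption, not a requirement of the course.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryDemo {
    /**
     * Runs the task up to maxAttempts times, sleeping delayMs between failed attempts.
     * Only IOException is treated as retryable; any other exception propagates at once.
     */
    static <T> T withRetry(Callable<T> task, int maxAttempts, long delayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                }
            }
        }
        // Every attempt failed: surface the most recent failure.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated fetch that fails twice before succeeding.
        String body = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new IOException("simulated timeout");
            return "<html>ok</html>";
        }, 3, 10L);
        System.out.println(body + " after " + calls[0] + " attempts");
    }
}
```

Keeping the retry loop in a helper like this would let a command stay focused on dispatching, in the same spirit as moving parsing into strategies.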
---
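## 8️⃣ Practice Tasks (In Class)
### Required
1. Upgrade the W9 project to W10
2. Implement at least 2 `CrawlStrategy` classes (mocks are fine)
3. Implement `StrategyFactory` and `ArticleRepository`
4. Test the full `crawl` → `list` flow
### Acceptance criteria
- [ ] A new strategy needs only a new class + registration, with zero changes to existing code
- [ ] `getAll()` returns an unmodifiable view
- [ ] `CrawlCommand` contains no site-specific parsing
- [ ] Every Command uses the Repository
- [ ] Nothing operates on `List<Article>` directly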
---
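## 9️⃣ Homework
### Required
1. Improve `ArticleRepository`: add `addAll` and guard against null
2. **★ AnalyzeCommand**: reuse strategy parsing without storing the results, and print statistics
3. **AI architecture audit**: send the class signatures to an AI and have it check strategy decoupling and encapsulation
### Optional
- Regex-based strategy matching, a default strategy, strategy priorities
- Question to consider: what happens when two strategies both `supports` the same URL?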
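
With the factory shown earlier, the first registered match wins, so the answer depends on registration order. One way to make the choice explicit is a priority, sketched below. The `priority()` method, the `name()` helper, and the `strategy(...)` builder are hypothetical additions for illustration, and the contract is trimmed to `supports` so the example needs no jsoup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PriorityDemo {
    interface CrawlStrategy {
        boolean supports(String url);
        String name();
        /** Higher value wins when several strategies match (hypothetical addition). */
        default int priority() { return 0; }
    }

    /** Builds a strategy that matches URLs containing the given marker. */
    static CrawlStrategy strategy(String label, String marker, int rank) {
        return new CrawlStrategy() {
            public boolean supports(String url) { return url != null && url.contains(marker); }
            public String name() { return label; }
            public int priority() { return rank; }
        };
    }

    static final class StrategyFactory {
        private final List<CrawlStrategy> strategies = new ArrayList<>();

        void register(CrawlStrategy s) {
            strategies.add(s);
            // Highest priority first; the sort is stable, so ties keep registration order.
            strategies.sort(Comparator.comparingInt(CrawlStrategy::priority).reversed());
        }

        /** Returns the highest-priority strategy that supports the URL, or null. */
        CrawlStrategy getStrategy(String url) {
            for (CrawlStrategy s : strategies) {
                if (s.supports(url)) return s;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        StrategyFactory factory = new StrategyFactory();
        factory.register(strategy("generic", "example.com", 0));
        factory.register(strategy("blog", "blog.example.com", 10));
        // Both strategies match this URL; the higher priority is chosen.
        System.out.println(factory.getStrategy("https://blog.example.com/p/1").name());
    }
}
```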
---
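## 🤖 Leveling Up with AI
### Architecture auditor (required)
- Draw the class dependency diagram
- Send it to an AI: "Check how well the open-closed principle is met, how complete the Repository's encapsulation is, and whether any circular dependencies exist"
### Going deeper
- Skipping the factory and storing strategies directly in a `Map<String, CrawlStrategy>` vs. using `StrategyFactory`: what is the difference?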
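
One difference can be seen in a small sketch, assuming the map is keyed by exact host name (the question leaves the key open, so this is only one reading). `MapLookupDemo`, `lookup`, and `matchesByContains` are hypothetical names.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class MapLookupDemo {
    /** Trimmed-down strategy (assumption: only a name, to keep the sketch short). */
    interface CrawlStrategy {
        String name();
    }

    /** Map variant: one strategy per exact host name. */
    static final Map<String, CrawlStrategy> BY_HOST = new HashMap<>();
    static {
        BY_HOST.put("blog.example.com", () -> "blog");
    }

    /** Exact-key lookup: O(1), but only the registered host matches. */
    static CrawlStrategy lookup(String url) {
        String host = URI.create(url).getHost();
        return host == null ? null : BY_HOST.get(host);
    }

    /** Predicate-style match, as in the slides' supports(): any rule can be expressed. */
    static boolean matchesByContains(String url) {
        return url != null && url.contains("blog.example.com");
    }

    public static void main(String[] args) {
        String plain = "https://blog.example.com/post/1";
        String subdomain = "https://www.blog.example.com/post/1";
        System.out.println(lookup(plain) != null);
        System.out.println(lookup(subdomain) != null);
        System.out.println(matchesByContains(subdomain));
    }
}
```

The map is fast and simple when one key maps to one strategy. The factory's `supports()` loop costs a linear scan but lets each strategy decide for itself, which also covers substring, regex, or path-based rules.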
---
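## 📚 Summary
- ✅ Strategy pattern: pluggable algorithms, painless new websites
- ✅ Factory: automatic matching, the URL → strategy magic
- ✅ Repository: a data guard; rules move from verbal agreement to code enforcement
- ✅ Architecture: from "separated" to "elegantly joined", open for extension, closed for modification
### W11 preview
Custom exception hierarchy + logging + retry mechanism
> 🚀 Let's make the crawler we build stand up to the real world
---
## Thank you!
**Stay picky about engineering cleanliness. See you next week!**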
---
# Centered Title
## Centered Subtitle
### Centered Content
---

57
project/java1/java-cli/pom.xml

@@ -0,0 +1,57 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example</groupId>
<artifactId>datacollect-cli</artifactId>
<version>0.1.0</version>
<properties>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.17.2</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.4.11</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.1</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.3.0</version>
<configuration>
<archive>
<manifest>
<mainClass>com.example.datacollect.Main</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>

113
project/java1/java-cli/src/main/java/com/example/datacollect/Main.java

@@ -0,0 +1,113 @@
package com.example.datacollect;
import com.example.datacollect.controller.CrawlerController;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import com.example.datacollect.repository.ArticleRepository;
import com.example.datacollect.strategy.CrawlStrategy;
import com.example.datacollect.strategy.StrategyFactory;
import com.example.datacollect.view.ConsoleView;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) {
logger.info("Starting CLI Crawler application");
ConsoleView view = new ConsoleView();
ArticleRepository repository = new ArticleRepository();
StrategyFactory strategyFactory = new StrategyFactory();
if (args.length > 0 && "-test".equals(args[0])) {
logger.info("Running in test mode");
runTest(view, repository, strategyFactory);
return;
}
CrawlerController controller = new CrawlerController(view, repository, strategyFactory);
view.printSuccess("Welcome to CLI Crawler (w10_3)! Type help for commands.");
logger.info("Application ready, waiting for input");
while (true) {
controller.handle(view.readLine());
}
}
private static void runTest(ConsoleView view, ArticleRepository repository, StrategyFactory strategyFactory) {
strategyFactory.register(new MockStrategy());
CrawlerController controller = new CrawlerController(view, repository, strategyFactory);
view.printSuccess("=== Testing the full crawl → list flow ===");
view.printInfo("\n1. Empty list state:");
controller.handle("list");
view.printInfo("\n2. Invalid URL (no matching strategy):");
controller.handle("crawl https://unknown.example.com");
view.printInfo("\n3. Crawl mock.example.com:");
controller.handle("crawl https://mock.example.com");
view.printInfo("\n4. list shows the crawled results:");
controller.handle("list");
view.printInfo("\n5. Crawl blog.example.com:");
controller.handle("crawl https://blog.example.com");
view.printInfo("\n6. list shows the accumulated results:");
controller.handle("list");
view.printInfo("\n7. getAll() returns an unmodifiable view:");
testUnmodifiableView(repository);
view.printInfo("\n8. Repository defensive checks:");
testRepositoryDefense(repository);
view.printSuccess("\n=== Tests finished ===");
}
private static void testUnmodifiableView(ArticleRepository repository) {
try {
repository.getAll().add(new Article("Test", "http://test.com", ""));
System.out.println("ERROR: expected UnsupportedOperationException");
} catch (UnsupportedOperationException e) {
System.out.println("SUCCESS: getAll() returned an unmodifiable view and threw as expected");
}
}
private static void testRepositoryDefense(ArticleRepository repository) {
try {
repository.add(null);
System.out.println("ERROR: expected NullPointerException");
} catch (NullPointerException e) {
System.out.println("SUCCESS: adding a null article threw as expected");
}
try {
repository.add(new Article("", "http://test.com", ""));
System.out.println("ERROR: expected IllegalArgumentException");
} catch (IllegalArgumentException e) {
System.out.println("SUCCESS: adding an article with an empty title threw as expected");
}
}
public static class MockStrategy implements CrawlStrategy {
@Override
public boolean supports(String url) {
return url != null && url.contains("mock.example.com");
}
@Override
public java.util.List<Article> parse(String url, org.jsoup.nodes.Document doc) throws ParseException {
java.util.List<Article> articles = new java.util.ArrayList<>();
articles.add(new Article("Mock article 1", url + "/article1", "Mock content 1"));
articles.add(new Article("Mock article 2", url + "/article2", "Mock content 2"));
articles.add(new Article("Mock article 3", url + "/article3", "Mock content 3"));
return articles;
}
}
}

8
project/java1/java-cli/src/main/java/com/example/datacollect/command/Command.java

@@ -0,0 +1,8 @@
package com.example.datacollect.command;
import com.example.datacollect.repository.ArticleRepository;
public interface Command {
String getName();
void execute(String[] args, ArticleRepository repository);
}

96
project/java1/java-cli/src/main/java/com/example/datacollect/command/CrawlCommand.java

@@ -0,0 +1,96 @@
package com.example.datacollect.command;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.repository.ArticleRepository;
import com.example.datacollect.strategy.CrawlStrategy;
import com.example.datacollect.strategy.StrategyFactory;
import com.example.datacollect.view.ConsoleView;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class CrawlCommand implements Command {
private static final Logger logger = LoggerFactory.getLogger(CrawlCommand.class);
private static final int MAX_RETRY = 3;
private static final long RETRY_DELAY_MS = 1000;
private final ConsoleView view;
private final StrategyFactory strategyFactory;
public CrawlCommand(ConsoleView view, StrategyFactory strategyFactory) {
this.view = view;
this.strategyFactory = strategyFactory;
}
@Override
public String getName() {
return "crawl";
}
@Override
public void execute(String[] args, ArticleRepository repository) {
if (args.length < 2) {
String errorMsg = "Usage: crawl <url>";
logger.warn(errorMsg);
view.printError(errorMsg);
return;
}
String url = args[1];
logger.info("Crawl started for: {}", url);
CrawlStrategy strategy = strategyFactory.getStrategy(url);
if (strategy == null) {
String errorMsg = "No strategy found for: " + url;
logger.warn(errorMsg);
view.printError(errorMsg);
return;
}
logger.info("Starting crawl for URL: {}", url);
view.printInfo("Crawling: " + url);
Document doc = null;
int attempt = 0;
boolean success = false;
while (attempt < MAX_RETRY && !success) {
attempt++;
try {
logger.debug("Attempt {} to fetch URL: {}", attempt, url);
doc = Jsoup.connect(url).get();
success = true;
} catch (Exception e) {
logger.warn("Attempt {} failed for URL {}: {}", attempt, url, e.getMessage());
if (attempt < MAX_RETRY) {
logger.info("Retrying in {}ms...", RETRY_DELAY_MS);
try {
Thread.sleep(RETRY_DELAY_MS);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
break;
}
}
}
}
if (!success) {
// Report the attempts actually made; an interrupt can end the loop before MAX_RETRY.
String errorMsg = "Failed to fetch URL after " + attempt + " attempt(s): " + url;
logger.error(errorMsg);
view.printError(errorMsg);
return;
}
try {
var articles = strategy.parse(url, doc);
for (var article : articles) {
repository.add(article);
}
logger.info("Successfully crawled {} articles from {}", articles.size(), url);
view.printSuccess("Crawled " + articles.size() + " articles.");
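} catch (ParseException e) {
logger.error("Parse error for URL {}: {}", url, e.getMessage(), e);
view.printError("Parse error: " + e.getMessage());
} catch (IllegalArgumentException e) {
// ArticleRepository.add rejects articles with a blank title or URL; report it instead of crashing the CLI loop.
logger.error("Rejected article from {}: {}", url, e.getMessage(), e);
view.printError("Invalid article: " + e.getMessage());
}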
}
}

28
project/java1/java-cli/src/main/java/com/example/datacollect/command/ExitCommand.java

@@ -0,0 +1,28 @@
package com.example.datacollect.command;
import com.example.datacollect.repository.ArticleRepository;
import com.example.datacollect.view.ConsoleView;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ExitCommand implements Command {
private static final Logger logger = LoggerFactory.getLogger(ExitCommand.class);
private final ConsoleView view;
public ExitCommand(ConsoleView view) {
this.view = view;
}
@Override
public String getName() {
return "exit";
}
@Override
public void execute(String[] args, ArticleRepository repository) {
logger.info("Exiting application");
view.printSuccess("Bye!");
System.exit(0);
}
}

27
project/java1/java-cli/src/main/java/com/example/datacollect/command/HelpCommand.java

@@ -0,0 +1,27 @@
package com.example.datacollect.command;
import com.example.datacollect.repository.ArticleRepository;
import com.example.datacollect.view.ConsoleView;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class HelpCommand implements Command {
private static final Logger logger = LoggerFactory.getLogger(HelpCommand.class);
private final ConsoleView view;
public HelpCommand(ConsoleView view) {
this.view = view;
}
@Override
public String getName() {
return "help";
}
@Override
public void execute(String[] args, ArticleRepository repository) {
logger.debug("Displaying help");
view.printInfo("Commands: crawl <url>, list, help, exit");
}
}

27
project/java1/java-cli/src/main/java/com/example/datacollect/command/ListCommand.java

@@ -0,0 +1,27 @@
package com.example.datacollect.command;
import com.example.datacollect.repository.ArticleRepository;
import com.example.datacollect.view.ConsoleView;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ListCommand implements Command {
private static final Logger logger = LoggerFactory.getLogger(ListCommand.class);
private final ConsoleView view;
public ListCommand(ConsoleView view) {
this.view = view;
}
@Override
public String getName() {
return "list";
}
@Override
public void execute(String[] args, ArticleRepository repository) {
logger.debug("Listing articles");
view.display(repository.getAll());
}
}

57
project/java1/java-cli/src/main/java/com/example/datacollect/controller/CrawlerController.java

@@ -0,0 +1,57 @@
package com.example.datacollect.controller;
import com.example.datacollect.command.Command;
import com.example.datacollect.command.CrawlCommand;
import com.example.datacollect.command.ExitCommand;
import com.example.datacollect.command.HelpCommand;
import com.example.datacollect.command.ListCommand;
import com.example.datacollect.repository.ArticleRepository;
import com.example.datacollect.strategy.StrategyFactory;
import com.example.datacollect.view.ConsoleView;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.HashMap;
import java.util.Map;
public class CrawlerController {
private static final Logger logger = LoggerFactory.getLogger(CrawlerController.class);
private final Map<String, Command> commands = new HashMap<>();
private final ConsoleView view;
private final ArticleRepository repository;
public CrawlerController(ConsoleView view, ArticleRepository repository, StrategyFactory strategyFactory) {
this.view = view;
this.repository = repository;
register(new HelpCommand(view));
register(new ListCommand(view));
register(new CrawlCommand(view, strategyFactory));
register(new ExitCommand(view));
logger.info("CrawlerController initialized with {} commands", commands.size());
}
private void register(Command command) {
commands.put(command.getName(), command);
logger.debug("Registered command: {}", command.getName());
}
public void handle(String input) {
String text = input == null ? "" : input.trim();
if (text.isEmpty()) {
return;
}
String[] args = text.split("\\s+");
String cmdName = args[0].toLowerCase();
Command command = commands.get(cmdName);
if (command == null) {
String errorMsg = "Unknown command: " + cmdName;
logger.warn(errorMsg);
view.printError(errorMsg);
return;
}
logger.debug("Executing command: {}", cmdName);
command.execute(args, repository);
}
}

11
project/java1/java-cli/src/main/java/com/example/datacollect/exception/CrawlerException.java

@@ -0,0 +1,11 @@
package com.example.datacollect.exception;
public class CrawlerException extends Exception {
public CrawlerException(String message) {
super(message);
}
public CrawlerException(String message, Throwable cause) {
super(message, cause);
}
}

11
project/java1/java-cli/src/main/java/com/example/datacollect/exception/NetworkException.java

@@ -0,0 +1,11 @@
package com.example.datacollect.exception;
public class NetworkException extends CrawlerException {
public NetworkException(String message) {
super(message);
}
public NetworkException(String message, Throwable cause) {
super(message, cause);
}
}

11
project/java1/java-cli/src/main/java/com/example/datacollect/exception/ParseException.java

@@ -0,0 +1,11 @@
package com.example.datacollect.exception;
public class ParseException extends CrawlerException {
public ParseException(String message) {
super(message);
}
public ParseException(String message, Throwable cause) {
super(message, cause);
}
}

45
project/java1/java-cli/src/main/java/com/example/datacollect/model/Article.java

@@ -0,0 +1,45 @@
package com.example.datacollect.model;
public class Article {
private String title;
private String url;
private String content;
public Article(String title, String url, String content) {
this.title = title;
this.url = url;
this.content = content;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public String getUrl() {
return url;
}
public void setUrl(String url) {
this.url = url;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
@Override
public String toString() {
return "Article{"
+ "title='" + title + '\''
+ ", url='" + url + '\''
+ '}';
}
}

80
project/java1/java-cli/src/main/java/com/example/datacollect/repository/ArticleRepository.java

@@ -0,0 +1,80 @@
package com.example.datacollect.repository;
import com.example.datacollect.model.Article;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
public class ArticleRepository {
private static final Logger logger = LoggerFactory.getLogger(ArticleRepository.class);
private final List<Article> articles = new ArrayList<>();
public void add(Article article) {
Objects.requireNonNull(article, "Article cannot be null");
if (article.getTitle() == null || article.getTitle().trim().isEmpty()) {
logger.warn("Attempted to add article with empty title");
throw new IllegalArgumentException("Article title cannot be null or empty");
}
if (article.getUrl() == null || article.getUrl().trim().isEmpty()) {
logger.warn("Attempted to add article with empty URL");
throw new IllegalArgumentException("Article URL cannot be null or empty");
}
articles.add(article);
logger.debug("Added article: {}", article.getTitle());
}
public void addAll(List<Article> articleList) {
Objects.requireNonNull(articleList, "Article list cannot be null");
if (articleList.isEmpty()) {
logger.debug("Empty article list provided, nothing to add");
return;
}
for (Article article : articleList) {
add(article);
}
logger.info("Added {} articles", articleList.size());
}
public List<Article> getAll() {
List<Article> result = Collections.unmodifiableList(articles);
logger.debug("Returning {} articles (unmodifiable)", result.size());
return result;
}
public Article get(int index) {
if (index < 0 || index >= articles.size()) {
logger.warn("Index out of bounds: {} (size: {})", index, articles.size());
throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + articles.size());
}
return articles.get(index);
}
public int size() {
return articles.size();
}
public boolean isEmpty() {
return articles.isEmpty();
}
public void clear() {
int size = articles.size();
articles.clear();
logger.info("Cleared {} articles", size);
}
public boolean contains(Article article) {
Objects.requireNonNull(article, "Article cannot be null");
return articles.contains(article);
}
}

43
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/BlogStrategy.java

@@ -0,0 +1,43 @@
package com.example.datacollect.strategy;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
public class BlogStrategy implements CrawlStrategy {
private static final Logger logger = LoggerFactory.getLogger(BlogStrategy.class);
@Override
public boolean supports(String url) {
boolean supported = url != null && url.contains("blog.example.com");
logger.debug("BlogStrategy supports URL {}: {}", url, supported);
return supported;
}
@Override
public List<Article> parse(String url, Document doc) throws ParseException {
try {
logger.debug("Parsing blog page: {}", url);
List<Article> articles = new ArrayList<>();
Elements titles = doc.select(".post-title");
logger.debug("Found {} titles", titles.size());
for (Element e : titles) {
articles.add(new Article(e.text(), url, ""));
}
logger.info("Parsed {} articles from blog", articles.size());
return articles;
} catch (Exception e) {
logger.error("Failed to parse blog page: {}", e.getMessage(), e);
throw new ParseException("Failed to parse blog page", e);
}
}
}

73
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/CctvNewsStrategy.java

@@ -0,0 +1,73 @@
package com.example.datacollect.strategy;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
public class CctvNewsStrategy implements CrawlStrategy {
private static final Logger logger = LoggerFactory.getLogger(CctvNewsStrategy.class);
@Override
public boolean supports(String url) {
boolean supported = url != null && url.contains("tv.cctv.com");
logger.debug("CctvNewsStrategy supports URL {}: {}", url, supported);
return supported;
}
@Override
public List<Article> parse(String url, Document doc) throws ParseException {
try {
logger.debug("Parsing CCTV news page: {}", url);
List<Article> articles = new ArrayList<>();
Elements newsItems = doc.select("div.list_item");
logger.debug("Found {} news items", newsItems.size());
for (Element item : newsItems) {
Element titleEl = item.selectFirst("a.title");
Element descEl = item.selectFirst("p.description");
if (titleEl != null) {
String title = titleEl.text().trim();
String articleUrl = titleEl.attr("href");
if (!articleUrl.startsWith("http")) {
articleUrl = "https://news.cctv.com" + articleUrl;
}
String content = descEl != null ? descEl.text().trim() : "";
if (!title.isEmpty()) {
articles.add(new Article(title, articleUrl, content));
}
}
}
if (articles.isEmpty()) {
Elements listItems = doc.select("li");
for (Element item : listItems) {
Element link = item.selectFirst("a");
if (link != null) {
String title = link.text().trim();
String articleUrl = link.attr("abs:href");
if (!title.isEmpty() && articleUrl != null && !articleUrl.isEmpty()) {
articles.add(new Article(title, articleUrl, ""));
}
}
}
}
logger.info("Parsed {} news from CCTV", articles.size());
return articles;
} catch (Exception e) {
logger.error("Failed to parse CCTV news page: {}", e.getMessage(), e);
throw new ParseException("Failed to parse CCTV news page", e);
}
}
}

12
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/CrawlStrategy.java

@@ -0,0 +1,12 @@
package com.example.datacollect.strategy;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import org.jsoup.nodes.Document;
import java.util.List;
public interface CrawlStrategy {
List<Article> parse(String url, Document doc) throws ParseException;
boolean supports(String url);
}

65
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/HnuNewsStrategy.java

@@ -0,0 +1,65 @@
package com.example.datacollect.strategy;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
public class HnuNewsStrategy implements CrawlStrategy {
private static final Logger logger = LoggerFactory.getLogger(HnuNewsStrategy.class);
@Override
public boolean supports(String url) {
boolean supported = url != null && url.contains("news.hnu.edu.cn");
logger.debug("HnuNewsStrategy supports URL {}: {}", url, supported);
return supported;
}
@Override
public List<Article> parse(String url, Document doc) throws ParseException {
try {
logger.debug("Parsing Hnu news page: {}", url);
List<Article> articles = new ArrayList<>();
Elements listItems = doc.select("ul.list11 li");
logger.debug("Found {} list items", listItems.size());
for (Element li : listItems) {
Element link = li.selectFirst("a");
if (link == null) continue;
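// Resolve relative links against the document's base URI; fall back to the site root if none is set
String articleUrl = link.absUrl("href");
if (articleUrl.isEmpty()) {
articleUrl = "https://news.hnu.edu.cn/" + link.attr("href").replaceFirst("^(\\.{0,2}/)+", "");
}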
String title = "";
Element titleEl = link.selectFirst("h4.l2.h4s2");
if (titleEl != null) {
title = titleEl.text().trim();
}
String content = "";
Element contentEl = link.selectFirst("p.l3.ps3");
if (contentEl != null) {
content = contentEl.text().trim();
}
if (!title.isEmpty()) {
articles.add(new Article(title, articleUrl, content));
}
}
logger.info("Parsed {} articles from Hnu news", articles.size());
return articles;
} catch (Exception e) {
logger.error("Failed to parse Hnu news page: {}", e.getMessage(), e);
throw new ParseException("Failed to parse Hnu news page", e);
}
}
}

43
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/NewsStrategy.java

@ -0,0 +1,43 @@
package com.example.datacollect.strategy;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
public class NewsStrategy implements CrawlStrategy {
private static final Logger logger = LoggerFactory.getLogger(NewsStrategy.class);
@Override
public boolean supports(String url) {
boolean supported = url != null && url.contains("news.example.com");
logger.debug("NewsStrategy supports URL {}: {}", url, supported);
return supported;
}
@Override
public List<Article> parse(String url, Document doc) throws ParseException {
try {
logger.debug("Parsing news page: {}", url);
List<Article> articles = new ArrayList<>();
Elements items = doc.select(".article-headline");
logger.debug("Found {} headlines", items.size());
for (Element e : items) {
articles.add(new Article(e.text(), url, ""));
}
logger.info("Parsed {} articles from news", articles.size());
return articles;
} catch (Exception e) {
logger.error("Failed to parse news page: {}", e.getMessage(), e);
throw new ParseException("Failed to parse news page", e);
}
}
}

58
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/StrategyFactory.java

@ -0,0 +1,58 @@
package com.example.datacollect.strategy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class StrategyFactory {
private static final Logger logger = LoggerFactory.getLogger(StrategyFactory.class);
private final List<CrawlStrategy> strategies = new ArrayList<>();
public StrategyFactory() {
strategies.add(new HnuNewsStrategy());
strategies.add(new BlogStrategy());
strategies.add(new NewsStrategy());
strategies.add(new WeiboHotStrategy());
strategies.add(new CctvNewsStrategy());
strategies.add(new WeatherStrategy());
logger.info("StrategyFactory initialized with {} strategies", strategies.size());
}
public CrawlStrategy getStrategy(String url) {
Objects.requireNonNull(url, "URL cannot be null");
if (url.trim().isEmpty()) {
logger.warn("Empty URL provided");
return null;
}
for (CrawlStrategy s : strategies) {
if (s.supports(url)) {
logger.debug("Found strategy {} for URL: {}", s.getClass().getSimpleName(), url);
return s;
}
}
logger.warn("No strategy found for URL: {}", url);
return null;
}
public void register(CrawlStrategy strategy) {
Objects.requireNonNull(strategy, "Strategy cannot be null");
if (strategies.contains(strategy)) {
logger.warn("Strategy {} already registered", strategy.getClass().getSimpleName());
return;
}
strategies.add(strategy);
logger.info("Registered strategy: {}", strategy.getClass().getSimpleName());
}
public int getStrategyCount() {
return strategies.size();
}
}

94
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/WeatherStrategy.java

@ -0,0 +1,94 @@
package com.example.datacollect.strategy;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
public class WeatherStrategy implements CrawlStrategy {
private static final Logger logger = LoggerFactory.getLogger(WeatherStrategy.class);
@Override
public boolean supports(String url) {
boolean supported = url != null && (url.contains("weather") || url.contains("tianqi"));
logger.debug("WeatherStrategy supports URL {}: {}", url, supported);
return supported;
}
@Override
public List<Article> parse(String url, Document doc) throws ParseException {
try {
logger.debug("Parsing weather page: {}", url);
List<Article> articles = new ArrayList<>();
String today = LocalDate.now().format(DateTimeFormatter.ofPattern("yyyy年MM月dd日"));
Elements weatherCards = doc.select("div.today");
if (!weatherCards.isEmpty()) {
Element todayCard = weatherCards.first();
String city = "长沙";
Element tempEl = todayCard.selectFirst("span.temperature");
Element weatherEl = todayCard.selectFirst("span.weather");
Element windEl = todayCard.selectFirst("span.wind");
String temp = tempEl != null ? tempEl.text().trim() : "";
String weather = weatherEl != null ? weatherEl.text().trim() : "";
String wind = windEl != null ? windEl.text().trim() : "";
String title = today + " " + city + "天气";
String content = "温度: " + temp + ", 天气: " + weather + ", 风力: " + wind;
articles.add(new Article(title, url, content));
} else {
Elements temps = doc.select("div.temp");
Elements weathers = doc.select("div.weather");
if (!temps.isEmpty()) {
String city = "长沙";
String temp = temps.first().text().trim();
String weather = weathers.isEmpty() ? "" : weathers.first().text().trim();
String title = today + " " + city + "天气";
String content = "温度: " + temp + ", 天气状况: " + weather;
articles.add(new Article(title, url, content));
} else {
String city = "长沙";
String title = today + " " + city + "天气";
String content = "今日温度:待更新,天气状况:待更新";
articles.add(new Article(title, url, content));
}
}
Elements forecastItems = doc.select("div.forecast-item");
for (Element item : forecastItems) {
Element dateEl = item.selectFirst("span.date");
Element weatherEl = item.selectFirst("span.weather");
Element tempEl = item.selectFirst("span.temp");
if (dateEl != null) {
String date = dateEl.text().trim();
String weather = weatherEl != null ? weatherEl.text().trim() : "";
String temp = tempEl != null ? tempEl.text().trim() : "";
String title = date + " 天气预告";
String content = "天气: " + weather + ", 温度: " + temp;
articles.add(new Article(title, url, content));
}
}
logger.info("Parsed {} weather items", articles.size());
return articles;
} catch (Exception e) {
logger.error("Failed to parse weather page: {}", e.getMessage(), e);
throw new ParseException("Failed to parse weather page", e);
}
}
}

77
project/java1/java-cli/src/main/java/com/example/datacollect/strategy/WeiboHotStrategy.java

@ -0,0 +1,77 @@
package com.example.datacollect.strategy;
import com.example.datacollect.exception.ParseException;
import com.example.datacollect.model.Article;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
public class WeiboHotStrategy implements CrawlStrategy {
private static final Logger logger = LoggerFactory.getLogger(WeiboHotStrategy.class);
@Override
public boolean supports(String url) {
boolean supported = url != null && url.contains("weibo.com");
logger.debug("WeiboHotStrategy supports URL {}: {}", url, supported);
return supported;
}
@Override
public List<Article> parse(String url, Document doc) throws ParseException {
try {
logger.debug("Parsing Weibo hot page: {}", url);
List<Article> articles = new ArrayList<>();
Elements hotItems = doc.select("div[class*=HotTopic_item]");
logger.debug("Found {} hot items", hotItems.size());
for (Element item : hotItems) {
Element titleEl = item.selectFirst("a[class*=HotTopic_title]");
Element hotEl = item.selectFirst("span[class*=HotTopic_hot]");
if (titleEl != null) {
String title = titleEl.text().trim();
String hotUrl = titleEl.attr("href");
if (!hotUrl.startsWith("http")) {
hotUrl = "https://s.weibo.com" + hotUrl;
}
String hotValue = hotEl != null ? hotEl.text().trim() : "";
if (!title.isEmpty()) {
String content = "热度: " + hotValue;
articles.add(new Article(title, hotUrl, content));
}
}
}
if (articles.isEmpty()) {
Elements cards = doc.select("div.card-wrap");
for (Element card : cards) {
Element titleEl = card.selectFirst("a.title");
Element descEl = card.selectFirst("p.desc");
if (titleEl != null) {
String title = titleEl.text().trim();
String articleUrl = titleEl.attr("href");
if (!articleUrl.startsWith("http")) {
articleUrl = "https://s.weibo.com" + articleUrl;
}
String content = descEl != null ? descEl.text().trim() : "";
if (!title.isEmpty()) {
articles.add(new Article(title, articleUrl, content));
}
}
}
}
logger.info("Parsed {} hot topics from Weibo", articles.size());
return articles;
} catch (Exception e) {
logger.error("Failed to parse Weibo hot page: {}", e.getMessage(), e);
throw new ParseException("Failed to parse Weibo hot page", e);
}
}
}

52
project/java1/java-cli/src/main/java/com/example/datacollect/view/ConsoleView.java

@ -0,0 +1,52 @@
package com.example.datacollect.view;
import com.example.datacollect.model.Article;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.List;
import java.util.Scanner;
public class ConsoleView {
private static final Logger logger = LoggerFactory.getLogger(ConsoleView.class);
private static final String ANSI_RESET = "\u001B[0m";
private static final String ANSI_GREEN = "\u001B[32m";
private static final String ANSI_RED = "\u001B[31m";
private static final String ANSI_BLUE = "\u001B[34m";
private final Scanner scanner = new Scanner(System.in);
public String readLine() {
logger.debug("Reading input from console");
System.out.print("> ");
return scanner.nextLine();
}
public void printSuccess(String msg) {
logger.info("Success: {}", msg);
System.out.println(ANSI_GREEN + msg + ANSI_RESET);
}
public void printError(String msg) {
logger.error("Error: {}", msg);
System.out.println(ANSI_RED + msg + ANSI_RESET);
}
public void printInfo(String msg) {
logger.debug("Info: {}", msg);
System.out.println(ANSI_BLUE + msg + ANSI_RESET);
}
public void display(List<Article> articles) {
logger.debug("Displaying {} articles", articles.size());
if (articles.isEmpty()) {
printInfo("暂无文章,请先执行 crawl。");
return;
}
for (int i = 0; i < articles.size(); i++) {
Article a = articles.get(i);
System.out.println((i + 1) + ". " + a.getTitle() + " | " + a.getUrl());
}
}
}

27
project/java1/java-cli/src/main/resources/logback.xml

@ -0,0 +1,27 @@
<configuration>
<property name="LOG_PATH" value="./logs"/>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${LOG_PATH}/crawler.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_PATH}/crawler.%d{yyyy-MM-dd}.log</fileNamePattern>
<maxHistory>30</maxHistory>
</rollingPolicy>
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<logger name="com.example.datacollect" level="DEBUG"/>
<root level="INFO">
<appender-ref ref="CONSOLE"/>
<appender-ref ref="FILE"/>
</root>
</configuration>

27
project/java1/java-cli/target/classes/logback.xml

@ -0,0 +1,27 @@
<configuration>
<property name="LOG_PATH" value="./logs"/>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${LOG_PATH}/crawler.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<fileNamePattern>${LOG_PATH}/crawler.%d{yyyy-MM-dd}.log</fileNamePattern>
<maxHistory>30</maxHistory>
</rollingPolicy>
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<logger name="com.example.datacollect" level="DEBUG"/>
<root level="INFO">
<appender-ref ref="CONSOLE"/>
<appender-ref ref="FILE"/>
</root>
</configuration>

3
project/java1/java-cli/target/maven-archiver/pom.properties

@ -0,0 +1,3 @@
artifactId=datacollect-cli
groupId=com.example
version=0.1.0

22
project/java1/java-cli/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst

@ -0,0 +1,22 @@
com\example\datacollect\command\ListCommand.class
com\example\datacollect\strategy\WeiboHotStrategy.class
com\example\datacollect\command\CrawlCommand.class
com\example\datacollect\strategy\BlogStrategy.class
com\example\datacollect\repository\ArticleRepository.class
com\example\datacollect\Main.class
com\example\datacollect\view\ConsoleView.class
com\example\datacollect\command\ExitCommand.class
com\example\datacollect\command\HelpCommand.class
com\example\datacollect\strategy\CctvNewsStrategy.class
com\example\datacollect\Main$MockStrategy.class
com\example\datacollect\strategy\NewsStrategy.class
com\example\datacollect\strategy\WeatherStrategy.class
com\example\datacollect\command\Command.class
com\example\datacollect\controller\CrawlerController.class
com\example\datacollect\exception\CrawlerException.class
com\example\datacollect\exception\NetworkException.class
com\example\datacollect\strategy\StrategyFactory.class
com\example\datacollect\strategy\HnuNewsStrategy.class
com\example\datacollect\exception\ParseException.class
com\example\datacollect\strategy\CrawlStrategy.class
com\example\datacollect\model\Article.class

21
project/java1/java-cli/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst

@ -0,0 +1,21 @@
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\command\ListCommand.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\exception\CrawlerException.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\NewsStrategy.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\command\CrawlCommand.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\controller\CrawlerController.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\BlogStrategy.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\WeatherStrategy.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\CctvNewsStrategy.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\view\ConsoleView.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\command\HelpCommand.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\exception\NetworkException.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\repository\ArticleRepository.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\Main.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\CrawlStrategy.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\command\Command.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\command\ExitCommand.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\StrategyFactory.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\model\Article.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\exception\ParseException.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\HnuNewsStrategy.java
C:\Users\14003\Desktop\java1\java-cli\src\main\java\com\example\datacollect\strategy\WeiboHotStrategy.java

705
project/java1/java-cli/第10周——设计模式:灵活性与可扩展性.md

@ -0,0 +1,705 @@
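# Lesson Plan: Advanced Programming, Week 10 - Design Patterns: Flexibility and Extensibility
| Item | Details |
| -------- | ---------------------------------------------------------------------------- |
| **Course** | Advanced Programming |
| **Week** | Week 10 |
| **Topic** | Design patterns: flexibility and extensibility |
| **Duration** | 2 class periods (90 minutes) |
| **Audience** | Students who completed the Week 9 CLI + MVC architecture unit and know the basics of the Command pattern |
| **Environment** | JDK 17+, IntelliJ IDEA, Maven |
| **Recap** | W9 built the CLI skeleton (MVC layering + Command routing) but left two hazards: parsing logic coupled into the Command, and a shared List\<Article\> reference exposed to everyone |
---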
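## Note on the teaching adjustment: why does W10 put "armor" on the "skeleton"?
> **W9 result**: an extensible command-line skeleton → **W9 pain point**: the parser and the data store are still unprotected
| Dimension | W9 state | W10 goal |
|------|--------|---------|
| **Architecture** | Clean MVC layering | MVC + Strategy pattern + repository layer |
| **Command extension** | New commands require no Controller changes | New parsers leave existing code untouched, apart from one registration line in the factory |
| **Data safety** | List\<Article\> writable by everyone | Encapsulated in a Repository that exposes only safe methods |
| **Parsing logic** | Hard-coded inside CrawlCommand | Strategy pattern, matched automatically by URL |
| **Code size** | ~8 classes | ~12 classes, each smaller and more focused |
**Rationale**:
1. In W9 students already felt the benefit of the Command pattern: **extensibility through polymorphism**
2. The Strategy pattern is another hands-on application of polymorphism and **a deeper use of interface abstraction**
3. The repository layer puts encapsulation, a core OOP principle, into practice and closes the gap W9 left open
4. The parser factory shows students the power of **automatic matching**: supporting a new website takes one new class plus one registration line
**Deeper educational value**:
> W9 taught students how to split code apart; W10 teaches them how to put the pieces back together gracefully: **the interface is the contract, the factory is the automatic matcher, the repository is the data guard**. Those three phrases are the essence of this week.
---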
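## 1. Teaching Objectives
| Dimension | Description |
|----------|----------|
| **Knowledge** | Understand the definition of the Strategy pattern and its basis in polymorphism; know the two factory variants (factory method / simple factory) and when each applies; understand how the Repository pattern encapsulates data access. |
| **Engineering practice** | Use the Strategy pattern in the crawler project to encapsulate the parsing logic for different websites; implement a parser factory that matches a parsing strategy by URL; replace the bare List with a Repository that offers a safe data-access interface. |
| **Mindset shift** | Move from hard-coded logic to pluggable strategies; from manipulating collections directly to going through a repository; understand the open-closed principle ("open for extension, closed for modification"). |
| **Tooling** | Use AI to review whether the Strategy implementation is truly decoupled; have AI act as a "site structure analyst" to help write concrete parsing strategies; use AI to suggest a safe interface for the Repository. |
---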
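## 2. Key Points and Difficulties
| Item | Content | Approach |
|------|------|----------|
| **Key points** | The polymorphic nature of the Strategy pattern, the automatic matching done by the parser factory, and the Repository's encapsulation of data access | Start from the question "what has to change when a website is added?" to show how the Strategy pattern achieves the open-closed principle; "attack" the exposed List to show why a Repository is needed |
| **Difficulties** | The abstraction behind "the interface is the contract", implementing factory registration via reflection or a Map, and how the repository layer works together with the Strategy pattern | Use the socket-and-appliance analogy for the interface standard; demonstrate live the progression from hard-coding to a factory to reflection; use a sequence diagram to show the full call chain user → Command → Strategy → Repository |
---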
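## 3. Teaching Process (90 minutes)
| Stage | Time | Content | Activities | AI collaboration |
| -------------------------- | --- | ----------------------------------------------------------------- | -------------------------------------- | --------------------------- |
| **1. W9 review and pain points** | 8' | Review the W9 result (CLI skeleton) and expose two hazards: (1) parsing logic hard-coded in CrawlCommand; (2) List\<Article\> readable and writable by everyone | **Teacher demo**: show the W9 code and use "incident scenarios" to prompt discussion | - |
| **2. Strategy pattern: standardizing the parser "plug"** | 18' | Strategy pattern definition, interface design, polymorphic calls, comparison with the Command pattern | **Analogy**: sockets and appliances; **Teacher demo**: evolving from if-else to the Strategy pattern | Have AI generate a "Strategy pattern vs switch-case" comparison |
| **3. Parser factory: the magic of automatic matching** | 14' | Two forms of the factory (simple factory → Map-registration factory), parser factory implementation | **Teacher demo**: first use if-else on the host, then upgrade to a Map-registration factory | Have AI explain how the factory and Strategy patterns work together |
| **4. Repository pattern: guarding data access** | 12' | Repository definition, interface design, impact of replacing List\<Article\> | **Teacher demo**: replace the List with a Repository in the existing code and show the change points | Students use AI to audit whether the Repository interface is "minimal and complete" |
| **5. Tying the architecture together** | 8' | One sequence diagram linking user → CLI → Controller → Command → Strategy → Repository → Model | **Interaction**: students draw the call chain on the whiteboard | - |
| **6. Coding it up** | 20' | Implement the CrawlStrategy interface + two strategies + parser factory + ArticleRepository | **Teacher demo**: write the code step by step, deliberately including handling for "no matching strategy" | Check the Strategy implementation with AI afterwards |
| **7. Architecture reflection and W11 preview** | 5' | What hazards remain? (inconsistent exception handling, no logging) → preview of W11 robustness engineering | **Interaction**: what happens if the parser factory finds no matching strategy? | - |
| **8. Practice task** | 5' | Implement the Strategy pattern and the repository layer to complete this week's upgrade | Students code in class while the teacher circulates | - |
---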
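## 4. Core Teaching Script
### 4.1 W9 review and pain points (8 minutes)
**Teacher script**:
> "Last class we built a nice skeleton: CLI + MVC + the Command pattern. Let's give ourselves some credit first: adding a command only takes a new class, with zero changes to the Controller. But consider one question."
**Project the W9 CrawlCommand stub**: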
```java
public class CrawlCommand implements Command {
// ...
public void execute(String[] args, List<Article> articles) {
if (args.length < 2) {
view.printError("Usage: crawl <url>");
return;
}
view.printInfo("Stub: Would crawl " + args[1]);
}
}
```
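**Guiding questions**:
1. "It is time to fill in this stub. If we implement real crawling, where does the code go?"
2. "If I need to support two websites, say a tech blog and a news site, whose HTML structures are completely different, what does this `execute` method turn into?"
**Show the "nightmare version" of CrawlCommand**:
```java
public void execute(String[] args, List<Article> articles) {
    String url = args[1];
    // Fifty lines of if-else hell...
    if (url.contains("blog.example.com")) {
        // Parse the tech blog's HTML
        Document doc = Jsoup.connect(url).get();
        Elements titles = doc.select(".post-title");
        for (Element e : titles) {
            articles.add(new Article(e.text(), url, ""));
        }
    } else if (url.contains("news.example.com")) {
        // Parse the news site's HTML
        Document doc = Jsoup.connect(url).get();
        Elements items = doc.select(".article-headline");
        for (Element e : items) {
            articles.add(new Article(e.text(), url, ""));
        }
    } else {
        view.printError("Unsupported website!");
    }
}
```
**Pain points**:
> "See that? Every new website means another `else if` here. This is exactly the 'touch one thing, break everything' problem we criticized in W1, only the disaster has moved from `main` to `CrawlCommand`."
>
> "More importantly, after all the work we put into the Command pattern last class, should the parsing logic fall back into if-else hell? **That is the first problem W10 solves: how do we make the parsing logic pluggable too?**"
**The second hazard: shared state, revisited**:
> "One more thing we mentioned at the end of last class: `List<Article> articles` is shared across all Commands. Any Command can add to it, remove from it, or even clear it. That is the second problem W10 solves: **how do we put a 'security door' on the data?**"
---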
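### 4.2 Strategy pattern: standardizing the parser "plug" (18 minutes)
#### 4.2.1 Starting from an analogy
**Teacher script**:
> "Start with an everyday scene. There is a three-pin socket on your wall. You can plug in a TV, a computer, a phone charger: any appliance that meets the standard works. The socket does not care what the appliance is; it only recognizes the interface standard."
**Analogy mapping**:
| Everyday scene | Code counterpart |
|----------|----------|
| Three-pin socket | `CrawlStrategy` interface |
| TV / computer / charger | Concrete parsing strategies (BlogStrategy/NewsStrategy) |
| Electric current | Input: URL + Document; output: List\<Article\> |
| You (the user) | CrawlCommand |
| Socket panel | Parser factory |
> "The core idea of the Strategy pattern is: **define an interface for an algorithm so that concrete implementations can be swapped for one another without affecting the client that uses them.**"
#### 4.2.2 Strategy pattern definition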
```java
// src/main/java/com/crawler/strategy/CrawlStrategy.java
package com.crawler.strategy;
import com.crawler.model.Article;
import org.jsoup.nodes.Document;
import java.util.List;
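public interface CrawlStrategy {
    /**
     * Parses the list of articles from an already-fetched Document.
     * @param url the original request URL (used to populate each Article)
     * @param doc the Document produced by Jsoup
     * @return the list of parsed articles
     */
    List<Article> parse(String url, Document doc);

    /**
     * Tells whether this strategy handles the given URL.
     * @param url the URL to check
     * @return true if this strategy can handle the URL
     */
    boolean supports(String url);
}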
```
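**Teacher script**:
> "Note that the strategy interface has two methods. `parse` does the work; `supports` answers 'can I do this job?' What is that? **It is a contract!** Any website that wants to be supported by our crawler must sign it: tell me whether you are my customer (supports) and how to parse you (parse)."
#### 4.2.3 Concrete strategy examples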
```java
// BlogStrategy.java - parsing strategy for the tech blog
public class BlogStrategy implements CrawlStrategy {
@Override
public boolean supports(String url) {
return url.contains("blog.example.com");
}
@Override
public List<Article> parse(String url, Document doc) {
List<Article> articles = new ArrayList<>();
Elements titles = doc.select(".post-title");
for (Element e : titles) {
articles.add(new Article(e.text(), url, ""));
}
return articles;
}
}
// NewsStrategy.java - parsing strategy for the news site
public class NewsStrategy implements CrawlStrategy {
@Override
public boolean supports(String url) {
return url.contains("news.example.com");
}
@Override
public List<Article> parse(String url, Document doc) {
List<Article> articles = new ArrayList<>();
Elements items = doc.select(".article-headline");
for (Element e : items) {
articles.add(new Article(e.text(), url, ""));
}
return articles;
}
}
```
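**Comparison: Strategy pattern vs hard-coded if-else**
| Dimension | if-else mess | Strategy pattern |
|------|-------------|----------|
| Adding a website | Edit CrawlCommand and add an else if | Write a new class implementing CrawlStrategy |
| Changing parsing logic | Hunt for the matching else if inside CrawlCommand | Edit only the corresponding strategy class |
| Testing | The whole crawler has to be started | Each Strategy can be unit-tested on its own |
| Open-closed principle | ❌ Requires modification | ✅ Open for extension, closed for modification |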
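The testing row of the comparison above can be made concrete with a small sketch. It is not the project's code: `TextStrategy` and `ToyBlogStrategy` are hypothetical stand-ins that take raw HTML text instead of a Jsoup `Document`, and the regular expression is a toy substitute for a real HTML parser, chosen only so the sketch runs without dependencies. The point is that a strategy can be exercised with a hand-written input, without starting the crawler or touching the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for CrawlStrategy: works on raw HTML text, not a Jsoup Document
interface TextStrategy {
    boolean supports(String url);
    List<String> parseTitles(String html);
}

// Toy strategy: extracts the text of <h2 class="post-title">...</h2> elements
class ToyBlogStrategy implements TextStrategy {
    private static final Pattern TITLE =
            Pattern.compile("<h2 class=\"post-title\">(.*?)</h2>");

    @Override
    public boolean supports(String url) {
        return url != null && url.contains("blog.example.com");
    }

    @Override
    public List<String> parseTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher m = TITLE.matcher(html);
        while (m.find()) {
            titles.add(m.group(1).trim());
        }
        return titles;
    }
}

class ToyBlogStrategyCheck {
    public static void main(String[] args) {
        ToyBlogStrategy s = new ToyBlogStrategy();
        String html = "<h2 class=\"post-title\"> Hello </h2><h2 class=\"post-title\">World</h2>";
        System.out.println(s.supports("https://blog.example.com/page/1")); // true
        System.out.println(s.parseTitles(html)); // [Hello, World]
    }
}
```

With the real interface, the same idea would presumably use `Jsoup.parse(htmlString)` to build the `Document` passed to `parse`.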
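**Comparison with the Command pattern (to deepen understanding)**:
> "Last class, with the Command pattern, we defined one class per command; this class, with the Strategy pattern, we define one class per website's parsing algorithm. **Both rest on the same OOP idea: replace conditional branches with polymorphism.** The Command interface method is `execute()`; the Strategy interface method is `parse()`."
>
> "Write this down: **interfaces are the tool for eliminating if-else, and polymorphism is what makes interfaces work.**"
---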
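### 4.3 Parser factory: the magic of automatic matching (14 minutes)
#### 4.3.1 Raising the problem
**Teacher script**:
> "Now we have a strategy for site A and a strategy for site B. The question is: who picks the strategy? Who iterates over all strategies to find the one whose supports returns true?"
>
> "If that logic lives in CrawlCommand, the Strategy pattern was pointless, because CrawlCommand still has to 'know' which strategies exist. What we want is a black box: **throw a URL in and a suitable parser comes out.**"
#### 4.3.2 Implementing the parser factory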
```java
// src/main/java/com/crawler/strategy/StrategyFactory.java
package com.crawler.strategy;
import java.util.ArrayList;
import java.util.List;
public class StrategyFactory {
    private final List<CrawlStrategy> strategies = new ArrayList<>();

    // Register strategies: a new website only needs one more line here
    public StrategyFactory() {
        strategies.add(new BlogStrategy());
        strategies.add(new NewsStrategy());
        // To add a website later: strategies.add(new XxxStrategy());
    }

    /**
     * Picks the parsing strategy that matches the URL.
     * @param url the target URL
     * @return the first registered strategy that supports the URL, or null if none does
     */
    public CrawlStrategy getStrategy(String url) {
        for (CrawlStrategy s : strategies) {
            if (s.supports(url)) {
                return s;
            }
        }
        return null; // no matching strategy found
    }
}
```
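**Teacher script**:
> "This factory class is simple: a List holds all strategies and one method iterates to find the match. But simple does not mean weak."
>
> "**Key point**: to support a new website, you only need to:"
1. Write an `XxxStrategy` that implements `CrawlStrategy`
2. Add one line, `strategies.add(new XxxStrategy())`, to the factory constructor
>
> "CrawlCommand does not change by a single line. That is the open-closed principle at work."
#### 4.3.3 From a simple factory to more advanced registration (extension)
**Teacher script**:
> "Some of you may ask: we still add a line to the factory constructor; can we get to zero changes? Yes, with reflection or SPI."
**Concept demo (implementation not required)**:
```java
// Advanced idea: scan a given package for all CrawlStrategy implementations
// and register them automatically via reflection, so that adding a class is enough.
// This is one of the core ideas behind the Spring framework.
```
> "We do not require this technique for now, but I want you to know that every `new XxxStrategy()` you write today may later evolve into framework-level auto-wiring. **The habits of thought you build now determine how far you can go.**"
#### 4.3.4 The refactored CrawlCommand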
```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.util.List;

public class CrawlCommand implements Command {
    private ConsoleView view;
    private StrategyFactory strategyFactory;
    private ArticleRepository repository; // note: a Repository now, not a List

    public CrawlCommand(ConsoleView v, StrategyFactory f, ArticleRepository r) {
        this.view = v;
        this.strategyFactory = f;
        this.repository = r;
    }

    public String getName() { return "crawl"; }

    // The List parameter is no longer used here; section 4.4.3 replaces it with the repository
    public void execute(String[] args, List<Article> articles) {
        if (args.length < 2) {
            view.printError("Usage: crawl <url>");
            return;
        }
        String url = args[1];
        // 1. The factory picks the strategy
        CrawlStrategy strategy = strategyFactory.getStrategy(url);
        if (strategy == null) {
            view.printError("No strategy found for: " + url);
            return;
        }
        // 2. Fetch the page
        view.printInfo("Crawling: " + url);
        try {
            Document doc = Jsoup.connect(url).get();
            List<Article> parsed = strategy.parse(url, doc);
            // 3. Store through the repository (instead of touching the List directly)
            for (Article a : parsed) {
                repository.add(a);
            }
            view.printSuccess("Crawled " + parsed.size() + " articles.");
        } catch (IOException e) {
            view.printError("Failed to crawl: " + e.getMessage());
        }
    }
}
```
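The null check on the factory result above is one way to handle "no matching strategy". As a hedged alternative, the factory could return an `Optional`, so the caller cannot forget the empty case. The sketch below is an illustration under assumptions, not the project's implementation: `UrlStrategy`, `HostStrategy`, and `OptionalStrategyFactory` are hypothetical names, and `parse` is left out to keep the sketch free of Jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical stand-in for CrawlStrategy (parse omitted)
interface UrlStrategy {
    boolean supports(String url);
    String name();
}

// Hypothetical strategy that matches on a host substring
class HostStrategy implements UrlStrategy {
    private final String host;
    HostStrategy(String host) { this.host = host; }
    @Override public boolean supports(String url) { return url != null && url.contains(host); }
    @Override public String name() { return "HostStrategy(" + host + ")"; }
}

class OptionalStrategyFactory {
    private final List<UrlStrategy> strategies = new ArrayList<>();

    void register(UrlStrategy s) { strategies.add(s); }

    // Returns the first registered strategy that supports the URL, or Optional.empty()
    Optional<UrlStrategy> findStrategy(String url) {
        return strategies.stream().filter(s -> s.supports(url)).findFirst();
    }
}

class OptionalFactoryDemo {
    // The caller handles both outcomes in one expression instead of an explicit null check
    static String describe(OptionalStrategyFactory f, String url) {
        return f.findStrategy(url)
                .map(s -> "Using " + s.name())
                .orElse("No strategy found for: " + url);
    }

    public static void main(String[] args) {
        OptionalStrategyFactory f = new OptionalStrategyFactory();
        f.register(new HostStrategy("blog.example.com"));
        System.out.println(describe(f, "https://blog.example.com/post/1"));
        System.out.println(describe(f, "https://other.example.org"));
    }
}
```

Whether `Optional`, a default strategy, or a thrown exception fits best depends on how the command should react to an unsupported site. The sketch only shows the `Optional` shape.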
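**Teacher script**:
> "Look at what CrawlCommand is responsible for now: take the URL → let the factory pick a strategy → run the parse → hand the results to the repository. **What does it do itself? It coordinates!** This is the Controller's 'dispatching mindset' from last class, now extended into the Command."
---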
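### 4.4 Repository pattern: guarding data access (12 minutes)
#### 4.4.1 The problem again
**Teacher script**:
> "Back to the question from the end of last class: `List<Article>` is shared across all Commands. Any Command can do the following:"
```java
articles.clear();   // wipe all articles
articles.add(null); // insert a null
articles.remove(0); // delete arbitrarily
```
> "Suppose a new colleague takes over and does not know the unwritten rule 'do not touch this List'. They write one `articles.clear()` and your `list` command suddenly shows nothing. **Order maintained by convention will eventually be broken. We need real rules: constraints enforced in code.**"
#### 4.4.2 Defining ArticleRepository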
```java
// src/main/java/com/crawler/repository/ArticleRepository.java
package com.crawler.repository;
import com.crawler.model.Article;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ArticleRepository {
    private final List<Article> articles = new ArrayList<>();

    /**
     * Adds one article. Null is rejected: a rule enforced in code, not a verbal agreement.
     */
    public void add(Article article) {
        if (article == null) {
            throw new IllegalArgumentException("Article cannot be null");
        }
        articles.add(article);
    }

    /**
     * Returns a read-only view of all articles.
     * Callers cannot modify the internal data through the returned list.
     */
    public List<Article> getAll() {
        return Collections.unmodifiableList(articles);
    }

    /**
     * Returns the number of articles.
     */
    public int size() {
        return articles.size();
    }

    /**
     * Clears all articles. Intended for administrative use only;
     * nothing enforces that yet (access control is a later topic).
     */
    public void clear() {
        articles.clear();
    }
}
```
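The read-only view returned by `getAll()` can be checked with a small experiment. The sketch below uses plain strings instead of `Article` and a hypothetical class name, `ReadOnlyViewDemo`. It shows two things: writes through the view throw `UnsupportedOperationException`, and the view still reflects later changes made to the backing list. For contrast, `List.copyOf` produces an independent immutable snapshot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadOnlyViewDemo {
    // Returns true if an attempt to add through the given list is rejected
    static boolean viewRejectsWrites(List<String> view) {
        try {
            view.add("intruder");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<>();
        List<String> view = Collections.unmodifiableList(backing);

        System.out.println(viewRejectsWrites(view)); // true: writes through the view throw

        backing.add("article-1");                    // the owner can still write
        System.out.println(view.size());             // 1: the view follows the backing list

        List<String> snapshot = List.copyOf(backing); // independent immutable copy
        backing.add("article-2");
        System.out.println(snapshot.size());          // 1: the snapshot does not change
        System.out.println(view.size());              // 2
    }
}
```

For the repository this means `getAll()` is cheap (no copying), but a caller that holds the returned list will see articles added afterwards. If callers need a frozen result, returning a copy is the alternative.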
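**Teacher script**:
> "Three key design points:"
>
> - **add() rejects null**: the rule lives in the code, not in an email
> - **getAll() returns an unmodifiable view**: with `Collections.unmodifiableList()`, a caller that tries add/remove gets an **exception thrown immediately** rather than a silent bug
> - **ClearCommand needs to wipe the data? It calls `repository.clear()`** instead of touching the List directly
>
> "This is lesson one of object orientation: encapsulation. Hide the data and expose only safe methods. Moving from 'manipulating the collection directly' to 'going through the repository' is a dividing line in a programmer's maturity."
#### 4.4.3 Architectural changes after introducing the repository
**Adjusting the `execute` method of the Command interface**: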
```java
// Before (W9)
public interface Command {
String getName();
void execute(String[] args, List<Article> articles);
}
// After (W10)
public interface Command {
String getName();
void execute(String[] args, ArticleRepository repository);
}
```
**Teacher script**:
> "The change is small: swap `List<Article>` for `ArticleRepository`. But the meaning is entirely different: before it was 'here is the data, do what you like'; now it is 'here is a safe access channel'."
**All Commands are adjusted accordingly**:
```java
// ListCommand.java - after the change
public class ListCommand implements Command {
    private ConsoleView view;
    public ListCommand(ConsoleView v) { this.view = v; }
    public String getName() { return "list"; }
    public void execute(String[] args, ArticleRepository repository) {
        view.display(repository.getAll()); // read the data through the repository
    }
}
// ClearCommand.java (new example)
public class ClearCommand implements Command {
    private ConsoleView view;
    public ClearCommand(ConsoleView v) { this.view = v; }
    public String getName() { return "clear"; }
    public void execute(String[] args, ArticleRepository repository) {
        repository.clear();
        view.printSuccess("All articles cleared.");
    }
}
```
**Adjusting the Controller and main**:
```java
// App.java - after the change
public class App {
    public static void main(String[] args) {
        ConsoleView view = new ConsoleView();
        ArticleRepository repository = new ArticleRepository(); // replaces List<Article>
        StrategyFactory factory = new StrategyFactory();        // new
        CrawlerController controller = new CrawlerController(view, repository, factory);
        view.printSuccess("Welcome to CLI Crawler v2.0!");
        view.printInfo("Type 'help' for commands.");
        while (true) {
            controller.handle(view.readLine());
        }
    }
}
```
---
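### 4.5 Tying the architecture together (8 minutes)
**Teacher script**:
> "Now let's connect all the parts and follow the full path of a `crawl https://blog.example.com` command."
**Sequence diagram (narrated while drawing on the whiteboard)**: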
```
User types "crawl https://blog.example.com"
    ▼
ConsoleView.readLine()
    ▼
CrawlerController.handle("crawl https://blog.example.com")
    │  Map lookup "crawl" → CrawlCommand
    ▼
CrawlCommand.execute(args, repository)
    ├─► StrategyFactory.getStrategy(url)
    │       │  iterate over List<CrawlStrategy>
    │       │  BlogStrategy.supports(url) → true!
    │       ▼
    │   returns BlogStrategy
    ├─► Jsoup.connect(url).get() → Document
    ├─► BlogStrategy.parse(url, doc) → List<Article>
    └─► for each article: repository.add(article)
            ▼
        ArticleRepository.articles.add(article)
    ▼
Finally: ConsoleView.printSuccess("Crawled N articles.")
```
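The call chain in the diagram can be traced in a dependency-free miniature. Everything in the sketch below is a hypothetical stand-in rather than the project's code: `MiniArticle`, `MiniStrategy`, `LineStrategy`, `MiniFactory`, `MiniRepository`, and `MiniCrawl` mirror the roles of `Article`, `CrawlStrategy`, a concrete strategy, `StrategyFactory`, `ArticleRepository`, and `CrawlCommand`. The "page" is a plain string with one title per line, so no network access or Jsoup is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal article: title and URL only
final class MiniArticle {
    final String title;
    final String url;
    MiniArticle(String title, String url) { this.title = title; this.url = url; }
}

// Stand-in for CrawlStrategy: the page is plain text, one title per line
interface MiniStrategy {
    boolean supports(String url);
    List<MiniArticle> parse(String url, String page);
}

class LineStrategy implements MiniStrategy {
    private final String host;
    LineStrategy(String host) { this.host = host; }
    @Override public boolean supports(String url) { return url != null && url.contains(host); }
    @Override public List<MiniArticle> parse(String url, String page) {
        List<MiniArticle> out = new ArrayList<>();
        for (String line : page.split("\n")) {
            if (!line.isBlank()) out.add(new MiniArticle(line.trim(), url));
        }
        return out;
    }
}

// Stand-in for StrategyFactory: first registered strategy that supports the URL wins
class MiniFactory {
    private final List<MiniStrategy> strategies = new ArrayList<>();
    void register(MiniStrategy s) { strategies.add(s); }
    MiniStrategy getStrategy(String url) {
        for (MiniStrategy s : strategies) {
            if (s.supports(url)) return s;
        }
        return null;
    }
}

// Stand-in for ArticleRepository: rejects null, exposes a read-only view
class MiniRepository {
    private final List<MiniArticle> articles = new ArrayList<>();
    void add(MiniArticle a) {
        if (a == null) throw new IllegalArgumentException("Article cannot be null");
        articles.add(a);
    }
    List<MiniArticle> getAll() { return Collections.unmodifiableList(articles); }
}

// Stand-in for CrawlCommand: it only coordinates factory -> strategy -> repository
class MiniCrawl {
    // Returns the number of stored articles, or -1 if no strategy matches the URL
    static int crawl(String url, String page, MiniFactory factory, MiniRepository repo) {
        MiniStrategy strategy = factory.getStrategy(url);
        if (strategy == null) return -1;
        List<MiniArticle> parsed = strategy.parse(url, page);
        for (MiniArticle a : parsed) repo.add(a);
        return parsed.size();
    }

    public static void main(String[] args) {
        MiniFactory factory = new MiniFactory();
        factory.register(new LineStrategy("blog.example.com"));
        MiniRepository repo = new MiniRepository();
        System.out.println(crawl("https://blog.example.com", "First post\nSecond post\n", factory, repo)); // 2
        System.out.println(crawl("https://unknown.example.org", "x", factory, repo)); // -1
        System.out.println(repo.getAll().size()); // 2
    }
}
```

Note that `MiniCrawl.crawl` contains no site-specific logic. Supporting another site in this miniature would mean registering another `MiniStrategy`, which is the property the lesson is after.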
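**Teacher script**:
> "Seven calls, each with a clear responsibility: the View handles input and output, the Controller routes, the Command coordinates, the Factory matches, the Strategy parses, the Repository stores. **No class does two jobs, and no class is unsure what its job is.**"
>
> "That is engineering: not writing code fast, but writing it right."
---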
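### 4.6 Coding it up (20 minutes)
**Teacher preparation**: prepare a "W9 to W10 upgrade" change list before class and demonstrate the key changes live.
**Change list**:
1. Create the `strategy/` package and the `CrawlStrategy` interface
2. Create `strategy/BlogStrategy.java`
3. Create `strategy/NewsStrategy.java`
4. Create `strategy/StrategyFactory.java`
5. Create the `repository/` package and `ArticleRepository.java`
6. Change the `execute` signature in the `Command` interface
7. Modify `CrawlCommand` to use `StrategyFactory` and `ArticleRepository`
8. Update all other `Command` implementations
9. Update the `CrawlerController` constructor
10. Update `App.java`
**Key steps for the teacher demo** (focus here):
- `Collections.unmodifiableList()` in `ArticleRepository`
- The iterate-and-match logic in `StrategyFactory`
- The coordinating structure of the rewritten `CrawlCommand`
**Deliberately planted "spot the flaw" point**:
> "In `StrategyFactory.getStrategy()` I return `null` when no strategy matches, and `CrawlCommand` then checks for null. You could call this a prelude to the Null Object pattern: if I do not want the Command to check for null, how should I change the factory? Explore that question with AI."
---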
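### 4.7 Architecture reflection and W11 preview (5 minutes)
**Teacher script**:
> "Our architecture is now much sturdier than in W9: parsing logic is pluggable and data access is guarded. But some gaps remain."
**Point them out one by one**:
1. **Exception handling**: `CrawlCommand` only has a `catch (IOException e)`. What happens if parsing throws a different exception?
2. **Network timeouts**: if the target site does not respond within 3 seconds, does the current code just keep waiting?
3. **Missing logs**: every success/failure message goes only to the terminal. If the program runs overnight, there is no way to check the next day how much it crawled.
4. **Retries**: should a single failure be reported immediately, or should there be a chance to retry?
**W11 preview**:
> "Next week we do three things: **a custom exception hierarchy**, **a proper logging framework**, and **defensive programming with retries**. W9 built the skeleton, W10 adds the armor, and W11 makes the system **stand up to the real world**."
---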
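### 4.8 Practice task (5 minutes)
**Requirements**:
1. Starting from the W9 code, complete the W10 upgrade
2. Implement at least two `CrawlStrategy` classes (they may be simulated; real crawling is not required)
3. Implement `StrategyFactory` and `ArticleRepository`
4. Make sure every Command accesses data through the Repository
5. Run and test the full flow
**Acceptance criteria**:
- [x] Adding a strategy only takes a new file plus one registration line in the factory; no other code changes
- [x] `getAll()` on `ArticleRepository` returns an unmodifiable view
- [x] `CrawlCommand` contains no website-specific parsing logic
- [x] `StrategyFactory` picks the correct strategy for a URL
- [x] The `execute` signature of every Command has been updated to take `ArticleRepository`
- [x] No code manipulates `List<Article>` directly
---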
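## 5. Homework
### 5.1 Required
1. **Extend ArticleRepository**: add a bulk `addAll(List<Article>)` method, guarding against null
2. **★ AnalyzeCommand (capstone assignment)**:
   - Implement the `analyze <url>` command
   - Use `StrategyFactory` internally to match a strategy
   - After the strategy parses the articles, **do not store them in the Repository**; compute statistics instead:
     - Total number of articles
     - Average title length
     - Top 5 ranked by a rule of your choice
   - Output the results only; do not store them
   - **Hint**: this is strategy reuse. The same parsing strategy serves both `crawl` (store in the repository) and `analyze` (analysis only)
3. **AI architecture audit**: send the class diagram of the full code (or a list of class names and method signatures) to AI with this prompt:
> "As a Java architecture auditor, please check: (1) whether the Strategy pattern is properly decoupled (does CrawlCommand still contain website-specific logic?); (2) whether the Repository truly encapsulates data access (is there any path that bypasses the Repository and manipulates the List directly?); (3) whether the factory's matching logic has performance risks. Give concrete suggestions for improvement."
### 5.2 Optional
1. **Regex-based matching**: change the `supports()` check from `url.contains()` to a regular expression so that one strategy can match a whole class of URLs
2. **Default strategy (DefaultStrategy)**: when no strategy matches, fall back to a generic "title extraction" logic
3. **Strategy priority**: give each strategy a `priority` field and have the factory match by priority (rather than by registration order)
4. **Reflect and answer (roughly 200 characters)**
> "In the Strategy pattern, the `supports()` methods of two strategies may both return true. Which one should be chosen? How does the iteration order in `StrategyFactory` affect the result? What solution do you propose?"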
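### 5.3 Questions for Reflection
1. **What is the difference between a Repository and a List?** If the Repository is just a wrapper around a List, why use it at all?
2. **Evolution of the strategy factory**: if the number of sites grows to 100, is registering them one by one still appropriate? What solutions come to mind?
3. **What does `Collections.unmodifiableList()` return?** Is it really "unmodifiable"? If the original List is modified, what happens to this unmodifiable view?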
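The third question can be checked with a small experiment. The class below is a stripped-down stand-in that stores strings, not the project's `ArticleRepository`; the names `UnmodifiableViewDemo`, `getAllView`, and `getAllSnapshot` are hypothetical. It contrasts a read-only view over the live list with a defensive copy taken at call time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UnmodifiableViewDemo {
    private final List<String> items = new ArrayList<>();

    // Ignores null input instead of storing it.
    void add(String item) {
        if (item != null) {
            items.add(item);
        }
    }

    // Read-only view: callers cannot mutate it, but it still reflects later changes to `items`.
    List<String> getAllView() {
        return Collections.unmodifiableList(items);
    }

    // Read-only snapshot: a copy made now, unaffected by later changes to `items`.
    List<String> getAllSnapshot() {
        return Collections.unmodifiableList(new ArrayList<>(items));
    }
}
```

If a caller obtains both lists and the repository then adds another item, the view grows while the snapshot keeps its old size; calling `add` on either one throws `UnsupportedOperationException`. Which behavior a repository should expose is a design choice: the view is cheap, while the snapshot isolates callers from later writes.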
---
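## 6. AI Collaboration Upgrade
### Architecture Auditor Task (required)
**Steps for students**:
1. Draw the class dependency diagram of the current project (by hand or with a tool)
2. Send the class names and dependencies to an AI
3. Enter the prompt:
> "As a Java architecture auditor, please review the architecture of this crawler project. Focus on: ① whether the Strategy pattern truly achieves the open-closed principle (does adding a new site really require only a new class?); ② whether the Repository encapsulation is complete (is there any path that bypasses the Repository?); ③ whether there are circular dependencies. Point out each problem and suggest improvements."
**Expected AI output**:
- Identifies any remaining coupling where one change affects many places
- Judges whether the Repository API is complete
- Assesses how well the overall architecture meets the open-closed principle
### Advanced AI Exploration (optional)
> "Suppose I have a CrawlStrategy interface and 10 implementations. Instead of a factory, I store them directly in a Map<String, CrawlStrategy> keyed by strategy name. How does this fundamentally differ from the StrategyFactory design? What are the pros and cons of each?"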
---
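## 7. Teaching Reflections and Adjustment Log
| Date | Item | Adjustment |
|------|------|----------|
| 2026-05-01 | First draft | Built on the W9 skeleton; introduced Strategy pattern + factory + Repository |
| 2026-05-07 | Structure refinement | Reordered the Strategy pattern and factory explanations; strategy first, then factory, flows more naturally |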
---
## Appendix 1: W9 to W10 Change Comparison
| Change | W9 code | W10 code |
|--------|--------|---------|
| Data storage | `List<Article> articles` | `ArticleRepository repository` |
| Command interface | `execute(String[], List<Article>)` | `execute(String[], ArticleRepository)` |
| Location of parsing logic | Inside `CrawlCommand` | Each `CrawlStrategy` implementation |
| URL matching | None (hard-coded) | `StrategyFactory.getStrategy(url)` |
| Adding data | `articles.add(article)` | `repository.add(article)` |
| Reading data | Iterate over `articles` directly | `repository.getAll()` |
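## Appendix 2: FAQ Quick Reference
| Question | Answer |
|------|------|
| What is the difference between the Strategy pattern and the Command pattern? | Command encapsulates an "action" (what to do); Strategy encapsulates an "algorithm" (how to do it). In the crawler, crawl is the command (action) and how to parse is the strategy (algorithm). |
| Does a factory have to be called Factory? | No. But the name Factory signals the responsibility of "creating objects" and follows the naming convention for the pattern. |
| What is `Collections.unmodifiableList()` for? | It returns a read-only view; calling mutators such as add/remove throws `UnsupportedOperationException`. The view is backed by the original list, so later changes to that list show through. |
| What is the difference between Repository and DAO? | In our context they can be treated as synonyms. Strictly speaking, Repository is a Domain-Driven Design concept that leans toward "collection semantics", while DAO leans toward database operations. |
| What if a strategy's `supports()` returns true but parsing fails? | That is a bug in the strategy implementation, and the strategy should be fixed. The Factory is not responsible for verifying that strategies are correct. |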
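## Appendix 3: Teaching Logic
| Order | Content | Design rationale |
|------|------|----------|
| 1 | W9 review + exposing pain points | Bridges from the previous week; known problems lead into new material |
| 2 | Strategy pattern | Solves the coupling of parsing logic; deepens understanding of polymorphism |
| 3 | Parser factory | Solves strategy selection; introduces the Factory pattern |
| 4 | Repository pattern | Solves data safety; puts encapsulation into practice |
| 5 | Connecting the architecture | Unifies all the parts into a complete mental model |
| 6 | Hands-on coding | Verifies through practice, moving from "I understand" to "I can do it" |
| 7 | Architecture reflection + preview | Exposes new problems; sets up W11 robustness engineering |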
---
## Version Notes
- **v1 (this version)**: first draft, following the W9 lesson-plan format; fully introduces the Strategy, Factory, and Repository patterns

358
project/java1/logs/crawler.2026-05-19.log

@@ -0,0 +1,358 @@
2026-05-19 10:48:15.013 [main] INFO com.example.datacollect.Main - Starting CLI Crawler application
2026-05-19 10:48:15.022 [main] INFO c.e.d.strategy.StrategyFactory - StrategyFactory initialized with 3 strategies
2026-05-19 10:48:15.025 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: help
2026-05-19 10:48:15.027 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: list
2026-05-19 10:48:15.027 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: crawl
2026-05-19 10:48:15.028 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: exit
2026-05-19 10:48:15.029 [main] INFO c.e.d.controller.CrawlerController - CrawlerController initialized with 4 commands
2026-05-19 10:48:15.031 [main] INFO c.e.datacollect.view.ConsoleView - Success: Welcome to CLI Crawler (w10_3)! Type help for commands.
2026-05-19 10:48:15.032 [main] INFO com.example.datacollect.Main - Application ready, waiting for input
2026-05-19 10:48:15.033 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 10:49:49.548 [main] WARN c.e.d.controller.CrawlerController - Unknown command: crawl<https://news.hnu.edu.cn>
2026-05-19 10:49:49.548 [main] ERROR c.e.datacollect.view.ConsoleView - Error: Unknown command: crawl<https://news.hnu.edu.cn>
2026-05-19 10:49:49.559 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:04:21.863 [main] INFO com.example.datacollect.Main - Starting CLI Crawler application
2026-05-19 11:04:21.878 [main] INFO c.e.d.strategy.StrategyFactory - StrategyFactory initialized with 6 strategies
2026-05-19 11:04:21.883 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: help
2026-05-19 11:04:21.884 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: list
2026-05-19 11:04:21.885 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: crawl
2026-05-19 11:04:21.886 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: exit
2026-05-19 11:04:21.886 [main] INFO c.e.d.controller.CrawlerController - CrawlerController initialized with 4 commands
2026-05-19 11:04:21.886 [main] INFO c.e.datacollect.view.ConsoleView - Success: Welcome to CLI Crawler (w10_3)! Type help for commands.
2026-05-19 11:04:21.887 [main] INFO com.example.datacollect.Main - Application ready, waiting for input
2026-05-19 11:04:21.887 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:05:05.592 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-19 11:05:05.594 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: <https://s.weibo.com>
2026-05-19 11:05:05.596 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL <https://s.weibo.com>: false
2026-05-19 11:05:05.596 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL <https://s.weibo.com>: false
2026-05-19 11:05:05.597 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL <https://s.weibo.com>: false
2026-05-19 11:05:05.603 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL <https://s.weibo.com>: true
2026-05-19 11:05:05.604 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy WeiboHotStrategy for URL: <https://s.weibo.com>
2026-05-19 11:05:05.606 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: <https://s.weibo.com>
2026-05-19 11:05:05.621 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: <https://s.weibo.com>
2026-05-19 11:05:05.626 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: <https://s.weibo.com>
2026-05-19 11:05:05.667 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 1 failed for URL <https://s.weibo.com>: The supplied URL, '<https://s.weibo.com>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:05:05.669 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-19 11:05:06.672 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 2 to fetch URL: <https://s.weibo.com>
2026-05-19 11:05:06.675 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 2 failed for URL <https://s.weibo.com>: The supplied URL, '<https://s.weibo.com>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:05:06.677 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-19 11:05:07.689 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 3 to fetch URL: <https://s.weibo.com>
2026-05-19 11:05:07.690 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 3 failed for URL <https://s.weibo.com>: The supplied URL, '<https://s.weibo.com>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:05:07.692 [main] ERROR c.e.datacollect.command.CrawlCommand - Failed to fetch URL after 3 attempts: <https://s.weibo.com>
2026-05-19 11:05:07.692 [main] ERROR c.e.datacollect.view.ConsoleView - Error: Failed to fetch URL after 3 attempts: <https://s.weibo.com>
2026-05-19 11:05:07.694 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:06:22.788 [main] WARN c.e.d.controller.CrawlerController - Unknown command: ceawl
2026-05-19 11:06:22.788 [main] ERROR c.e.datacollect.view.ConsoleView - Error: Unknown command: ceawl
2026-05-19 11:06:22.791 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:06:50.556 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-19 11:06:50.557 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: <
2026-05-19 11:06:50.557 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL <: false
2026-05-19 11:06:50.558 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL <: false
2026-05-19 11:06:50.558 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL <: false
2026-05-19 11:06:50.558 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL <: false
2026-05-19 11:06:50.562 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL <: false
2026-05-19 11:06:50.563 [main] DEBUG c.e.d.strategy.WeatherStrategy - WeatherStrategy supports URL <: false
2026-05-19 11:06:50.567 [main] WARN c.e.d.strategy.StrategyFactory - No strategy found for URL: <
2026-05-19 11:06:50.574 [main] WARN c.e.datacollect.command.CrawlCommand - No strategy found for: <
2026-05-19 11:06:50.576 [main] ERROR c.e.datacollect.view.ConsoleView - Error: No strategy found for: <
2026-05-19 11:06:50.580 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:07:24.657 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-19 11:07:24.659 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: <https://tv.cctv.com>
2026-05-19 11:07:24.659 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL <https://tv.cctv.com>: false
2026-05-19 11:07:24.659 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL <https://tv.cctv.com>: false
2026-05-19 11:07:24.661 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL <https://tv.cctv.com>: false
2026-05-19 11:07:24.663 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL <https://tv.cctv.com>: false
2026-05-19 11:07:24.666 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL <https://tv.cctv.com>: true
2026-05-19 11:07:24.667 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy CctvNewsStrategy for URL: <https://tv.cctv.com>
2026-05-19 11:07:24.668 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: <https://tv.cctv.com>
2026-05-19 11:07:24.669 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: <https://tv.cctv.com>
2026-05-19 11:07:24.671 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: <https://tv.cctv.com>
2026-05-19 11:07:24.675 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 1 failed for URL <https://tv.cctv.com>: The supplied URL, '<https://tv.cctv.com>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:07:24.676 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-19 11:07:25.678 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 2 to fetch URL: <https://tv.cctv.com>
2026-05-19 11:07:25.681 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 2 failed for URL <https://tv.cctv.com>: The supplied URL, '<https://tv.cctv.com>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:07:25.682 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-19 11:07:26.696 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 3 to fetch URL: <https://tv.cctv.com>
2026-05-19 11:07:26.698 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 3 failed for URL <https://tv.cctv.com>: The supplied URL, '<https://tv.cctv.com>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:07:26.700 [main] ERROR c.e.datacollect.command.CrawlCommand - Failed to fetch URL after 3 attempts: <https://tv.cctv.com>
2026-05-19 11:07:26.701 [main] ERROR c.e.datacollect.view.ConsoleView - Error: Failed to fetch URL after 3 attempts: <https://tv.cctv.com>
2026-05-19 11:07:26.702 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:14:42.973 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-19 11:14:42.975 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: <https://www.tianqi.com/changsha>
2026-05-19 11:14:42.975 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL <https://www.tianqi.com/changsha>: false
2026-05-19 11:14:42.988 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL <https://www.tianqi.com/changsha>: false
2026-05-19 11:14:43.005 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL <https://www.tianqi.com/changsha>: false
2026-05-19 11:14:43.011 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL <https://www.tianqi.com/changsha>: false
2026-05-19 11:14:43.015 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL <https://www.tianqi.com/changsha>: false
2026-05-19 11:14:43.016 [main] DEBUG c.e.d.strategy.WeatherStrategy - WeatherStrategy supports URL <https://www.tianqi.com/changsha>: true
2026-05-19 11:14:43.017 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy WeatherStrategy for URL: <https://www.tianqi.com/changsha>
2026-05-19 11:14:43.019 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: <https://www.tianqi.com/changsha>
2026-05-19 11:14:43.030 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: <https://www.tianqi.com/changsha>
2026-05-19 11:14:43.038 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: <https://www.tianqi.com/changsha>
2026-05-19 11:14:43.039 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 1 failed for URL <https://www.tianqi.com/changsha>: The supplied URL, '<https://www.tianqi.com/changsha>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:14:43.048 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-19 11:14:44.062 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 2 to fetch URL: <https://www.tianqi.com/changsha>
2026-05-19 11:14:44.066 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 2 failed for URL <https://www.tianqi.com/changsha>: The supplied URL, '<https://www.tianqi.com/changsha>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:14:44.067 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-19 11:14:45.078 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 3 to fetch URL: <https://www.tianqi.com/changsha>
2026-05-19 11:14:45.079 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 3 failed for URL <https://www.tianqi.com/changsha>: The supplied URL, '<https://www.tianqi.com/changsha>', is malformed. Make sure it is an absolute URL, and starts with 'http://' or 'https://'. See https://jsoup.org/cookbook/extracting-data/working-with-urls
2026-05-19 11:14:45.082 [main] ERROR c.e.datacollect.command.CrawlCommand - Failed to fetch URL after 3 attempts: <https://www.tianqi.com/changsha>
2026-05-19 11:14:45.083 [main] ERROR c.e.datacollect.view.ConsoleView - Error: Failed to fetch URL after 3 attempts: <https://www.tianqi.com/changsha>
2026-05-19 11:14:45.090 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:17:17.250 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-19 11:17:17.251 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://www.tianqi.com/changsha
2026-05-19 11:17:17.263 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:17:17.266 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:17:17.267 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:17:17.267 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:17:17.267 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:17:17.267 [main] DEBUG c.e.d.strategy.WeatherStrategy - WeatherStrategy supports URL https://www.tianqi.com/changsha: true
2026-05-19 11:17:17.267 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy WeatherStrategy for URL: https://www.tianqi.com/changsha
2026-05-19 11:17:17.269 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: https://www.tianqi.com/changsha
2026-05-19 11:17:17.269 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: https://www.tianqi.com/changsha
2026-05-19 11:17:17.269 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: https://www.tianqi.com/changsha
2026-05-19 11:17:18.361 [main] DEBUG c.e.d.strategy.WeatherStrategy - Parsing weather page: https://www.tianqi.com/changsha
2026-05-19 11:17:18.388 [main] INFO c.e.d.strategy.WeatherStrategy - Parsed 1 weather items
2026-05-19 11:17:18.391 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 2026年05月19日 长沙天气
2026-05-19 11:17:18.395 [main] INFO c.e.datacollect.command.CrawlCommand - Successfully crawled 1 articles from https://www.tianqi.com/changsha
2026-05-19 11:17:18.395 [main] INFO c.e.datacollect.view.ConsoleView - Success: Crawled 1 articles.
2026-05-19 11:17:18.395 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:17:25.633 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: list
2026-05-19 11:17:25.634 [main] DEBUG c.e.datacollect.command.ListCommand - Listing articles
2026-05-19 11:17:25.635 [main] DEBUG c.e.d.repository.ArticleRepository - Returning 1 articles (unmodifiable)
2026-05-19 11:17:25.636 [main] DEBUG c.e.datacollect.view.ConsoleView - Displaying 1 articles
2026-05-19 11:17:25.660 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:18:00.938 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-19 11:18:00.939 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://tv.cctv.com
2026-05-19 11:18:00.939 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://tv.cctv.com: false
2026-05-19 11:18:00.942 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://tv.cctv.com: false
2026-05-19 11:18:00.942 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://tv.cctv.com: false
2026-05-19 11:18:00.948 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://tv.cctv.com: false
2026-05-19 11:18:00.950 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL https://tv.cctv.com: true
2026-05-19 11:18:00.950 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy CctvNewsStrategy for URL: https://tv.cctv.com
2026-05-19 11:18:00.951 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: https://tv.cctv.com
2026-05-19 11:18:00.951 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: https://tv.cctv.com
2026-05-19 11:18:00.952 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: https://tv.cctv.com
2026-05-19 11:18:01.315 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - Parsing CCTV news page: https://tv.cctv.com
2026-05-19 11:18:01.318 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - Found 0 news items
2026-05-19 11:18:01.402 [main] INFO c.e.d.strategy.CctvNewsStrategy - Parsed 189 news from CCTV
2026-05-19 11:18:01.403 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直 播
2026-05-19 11:18:01.403 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 节目单
2026-05-19 11:18:01.404 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 频道大全
2026-05-19 11:18:01.404 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 栏目大全
2026-05-19 11:18:01.404 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 主 持 人
2026-05-19 11:18:01.405 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 听音
2026-05-19 11:18:01.406 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 消费主张
2026-05-19 11:18:01.406 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 文化十分
2026-05-19 11:18:01.407 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军事科技
2026-05-19 11:18:01.407 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日关注
2026-05-19 11:18:01.407 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 天下足球
2026-05-19 11:18:01.408 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综合
2026-05-19 11:18:01.408 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻
2026-05-19 11:18:01.409 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 财经
2026-05-19 11:18:01.409 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综艺
2026-05-19 11:18:01.409 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 亚洲
2026-05-19 11:18:01.409 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育
2026-05-19 11:18:01.411 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电影
2026-05-19 11:18:01.411 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国防军事
2026-05-19 11:18:01.411 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电视剧
2026-05-19 11:18:01.412 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 纪录
2026-05-19 11:18:01.412 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 科教
2026-05-19 11:18:01.412 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 戏曲
2026-05-19 11:18:01.413 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 社会与法
2026-05-19 11:18:01.413 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 少儿
2026-05-19 11:18:01.413 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 音乐
2026-05-19 11:18:01.414 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 奥林匹克
2026-05-19 11:18:01.420 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 农业农村
2026-05-19 11:18:01.437 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 欧洲
2026-05-19 11:18:01.437 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 美洲
2026-05-19 11:18:01.439 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育赛事
2026-05-19 11:18:01.439 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微视频
2026-05-19 11:18:01.440 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 搜片库
2026-05-19 11:18:01.440 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 找栏目
2026-05-19 11:18:01.440 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国经济大讲堂
2026-05-19 11:18:01.441 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 健康中国
2026-05-19 11:18:01.441 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 听音
2026-05-19 11:18:01.441 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 科幻地带
2026-05-19 11:18:01.441 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 如何解读中美关系新定位?
2026-05-19 11:18:01.442 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 垃圾围村 曝多地违法倾倒乱象
2026-05-19 11:18:01.442 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 四川“大摆荡”坠亡事故调查
2026-05-19 11:18:01.443 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 孤寡老人留百万遗产 由谁继承
2026-05-19 11:18:01.443 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 带你破解肠道健康的隐秘真相
2026-05-19 11:18:01.444 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 坐着高铁看中国
2026-05-19 11:18:01.445 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: AI基建新潮涌
2026-05-19 11:18:01.451 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 一帘光影三代人
2026-05-19 11:18:01.451 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 如何把阳光“存进”大海里
2026-05-19 11:18:01.452 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 驾驭“蓝鲸” 潜航深海
2026-05-19 11:18:01.452 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 广告
2026-05-19 11:18:01.454 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 广告
2026-05-19 11:18:01.454 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 广告
2026-05-19 11:18:01.456 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 唐都生活指南(第二部)
2026-05-19 11:18:01.456 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 金石探文明
2026-05-19 11:18:01.457 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 赤壁之战
2026-05-19 11:18:01.457 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 先秦智慧
2026-05-19 11:18:01.458 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国史通鉴·两晋南北朝篇
2026-05-19 11:18:01.458 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《叶问》郑嘉颖乱世之中寻求武学真谛
2026-05-19 11:18:01.459 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《曾少年》张一山关晓彤爱情事业两不误
2026-05-19 11:18:01.461 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《炊事班的故事Ⅱ》密集承包你的笑点
2026-05-19 11:18:01.461 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《火蓝刀锋》海军陆战队亮刀锋展军魂
2026-05-19 11:18:01.461 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《问天》致敬中国航天数十年的峥嵘岁月
2026-05-19 11:18:01.461 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直播
2026-05-19 11:18:01.461 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 节目单
2026-05-19 11:18:01.462 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 频道大全
2026-05-19 11:18:01.462 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 栏目大全
2026-05-19 11:18:01.462 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综合
2026-05-19 11:18:01.463 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 财经
2026-05-19 11:18:01.463 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综艺
2026-05-19 11:18:01.463 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中文国际
2026-05-19 11:18:01.463 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育
2026-05-19 11:18:01.464 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电影
2026-05-19 11:18:01.464 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国防军事
2026-05-19 11:18:01.464 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电视剧
2026-05-19 11:18:01.464 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 纪录
2026-05-19 11:18:01.464 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 科教
2026-05-19 11:18:01.464 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 戏曲
2026-05-19 11:18:01.466 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 社会与法
2026-05-19 11:18:01.466 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻
2026-05-19 11:18:01.466 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 少儿
2026-05-19 11:18:01.467 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 音乐
2026-05-19 11:18:01.467 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育赛事
2026-05-19 11:18:01.467 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 农业农村
2026-05-19 11:18:01.467 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻联播
2026-05-19 11:18:01.468 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 挑战不可能
2026-05-19 11:18:01.468 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 开讲啦
2026-05-19 11:18:01.470 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 人与自然
2026-05-19 11:18:01.471 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 生活提示
2026-05-19 11:18:01.471 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国诗词大会
2026-05-19 11:18:01.472 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 对话
2026-05-19 11:18:01.472 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 第一时间
2026-05-19 11:18:01.473 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 一槌定音
2026-05-19 11:18:01.474 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 回家吃饭
2026-05-19 11:18:01.474 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 生活家
2026-05-19 11:18:01.475 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 金牌喜剧班
2026-05-19 11:18:01.475 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 舞蹈世界
2026-05-19 11:18:01.475 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 星光大道
2026-05-19 11:18:01.476 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 向幸福出发
2026-05-19 11:18:01.476 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 回声嘹亮
2026-05-19 11:18:01.476 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 健康中国
2026-05-19 11:18:01.477 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 深度国际
2026-05-19 11:18:01.478 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国文艺
2026-05-19 11:18:01.478 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国家记忆
2026-05-19 11:18:01.478 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 华人故事
2026-05-19 11:18:01.479 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 运动一起赢
2026-05-19 11:18:01.479 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体坛快讯
2026-05-19 11:18:01.479 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 足球之夜
2026-05-19 11:18:01.479 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日影评
2026-05-19 11:18:01.479 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 时光军史馆
2026-05-19 11:18:01.481 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军迷行天下
2026-05-19 11:18:01.482 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军事报道
2026-05-19 11:18:01.483 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军事纪实
2026-05-19 11:18:01.483 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军事纪录
2026-05-19 11:18:01.484 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国防故事
2026-05-19 11:18:01.484 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 剧说很好看
2026-05-19 11:18:01.484 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 寰宇视野
2026-05-19 11:18:01.485 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 特别呈现
2026-05-19 11:18:01.485 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 9视频
2026-05-19 11:18:01.485 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 百家讲坛
2026-05-19 11:18:01.486 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 健康之路
2026-05-19 11:18:01.486 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 科幻地带
2026-05-19 11:18:01.486 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 考古公开课
2026-05-19 11:18:01.486 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 解码科技史
2026-05-19 11:18:01.487 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 戏曲青年说
2026-05-19 11:18:01.487 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国京剧音配像精粹
2026-05-19 11:18:01.487 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 过把瘾
2026-05-19 11:18:01.488 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 宝贝亮相吧
2026-05-19 11:18:01.488 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 剧懂法
2026-05-19 11:18:01.489 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 一线
2026-05-19 11:18:01.489 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 普法剧场
2026-05-19 11:18:01.489 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 天网
2026-05-19 11:18:01.489 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 心理访谈
2026-05-19 11:18:01.490 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 律师来了
2026-05-19 11:18:01.490 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 焦点访谈
2026-05-19 11:18:01.490 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 每周质量报告
2026-05-19 11:18:01.491 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 午夜新闻
2026-05-19 11:18:01.491 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻调查
2026-05-19 11:18:01.491 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻周刊
2026-05-19 11:18:01.491 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 周末动画片
2026-05-19 11:18:01.491 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 音乐快递
2026-05-19 11:18:01.493 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻袋袋裤
2026-05-19 11:18:01.493 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 智慧树
2026-05-19 11:18:01.494 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 快乐童行
2026-05-19 11:18:01.494 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 乐享汇
2026-05-19 11:18:01.494 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: CCTV音乐厅
2026-05-19 11:18:01.494 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国音乐电视
2026-05-19 11:18:01.494 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 聆听时刻
2026-05-19 11:18:01.496 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 童声唱
2026-05-19 11:18:01.496 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 冰球冰球
2026-05-19 11:18:01.496 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 田野欢歌
2026-05-19 11:18:01.497 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 三农长短说
2026-05-19 11:18:01.498 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 三农群英汇
2026-05-19 11:18:01.498 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 大地讲堂
2026-05-19 11:18:01.498 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 片库
2026-05-19 11:18:01.499 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 热榜
2026-05-19 11:18:01.499 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 看点
2026-05-19 11:18:01.499 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微视频
2026-05-19 11:18:01.499 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: ai美食
2026-05-19 11:18:01.500 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 主持人
2026-05-19 11:18:01.500 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 全部
2026-05-19 11:18:01.500 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直播
2026-05-19 11:18:01.500 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 节目单
2026-05-19 11:18:01.501 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 热榜
2026-05-19 11:18:01.502 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 主持人
2026-05-19 11:18:01.502 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 听音
2026-05-19 11:18:01.502 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻联播
2026-05-19 11:18:01.502 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 焦点访谈
2026-05-19 11:18:01.503 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 等着我
2026-05-19 11:18:01.503 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日说法
2026-05-19 11:18:01.503 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 海峡两岸
2026-05-19 11:18:01.504 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日关注
2026-05-19 11:18:01.504 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日亚洲
2026-05-19 11:18:01.504 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 更多
2026-05-19 11:18:01.504 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电视剧
2026-05-19 11:18:01.504 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 动画片
2026-05-19 11:18:01.506 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 纪录片
2026-05-19 11:18:01.506 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 特别节目
2026-05-19 11:18:01.506 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 更多
2026-05-19 11:18:01.506 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微视频
2026-05-19 11:18:01.507 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 视频百科
2026-05-19 11:18:01.508 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微故事
2026-05-19 11:18:01.510 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: ai美食
2026-05-19 11:18:01.510 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日热门
2026-05-19 11:18:01.512 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直播导视
2026-05-19 11:18:01.512 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 精品
2026-05-19 11:18:01.514 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 片库
2026-05-19 11:18:01.515 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 看点
2026-05-19 11:18:01.516 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 央视大全
2026-05-19 11:18:01.518 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 手机访问 扫描下载央 视影音客户端 扫一扫 手机继续看
2026-05-19 11:18:01.521 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 返回顶部
2026-05-19 11:18:01.521 [main] INFO c.e.datacollect.command.CrawlCommand - Successfully crawled 189 articles from https://tv.cctv.com
2026-05-19 11:18:01.521 [main] INFO c.e.datacollect.view.ConsoleView - Success: Crawled 189 articles.
2026-05-19 11:18:01.522 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:18:12.244 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: list
2026-05-19 11:18:12.244 [main] DEBUG c.e.datacollect.command.ListCommand - Listing articles
2026-05-19 11:18:12.245 [main] DEBUG c.e.d.repository.ArticleRepository - Returning 190 articles (unmodifiable)
2026-05-19 11:18:12.246 [main] DEBUG c.e.datacollect.view.ConsoleView - Displaying 190 articles
2026-05-19 11:18:12.317 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:18:49.649 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-19 11:18:49.650 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://www.tianqi.com/changsha
2026-05-19 11:18:49.651 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:18:49.651 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:18:49.651 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:18:49.652 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:18:49.652 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL https://www.tianqi.com/changsha: false
2026-05-19 11:18:49.652 [main] DEBUG c.e.d.strategy.WeatherStrategy - WeatherStrategy supports URL https://www.tianqi.com/changsha: true
2026-05-19 11:18:49.663 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy WeatherStrategy for URL: https://www.tianqi.com/changsha
2026-05-19 11:18:49.666 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: https://www.tianqi.com/changsha
2026-05-19 11:18:49.668 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: https://www.tianqi.com/changsha
2026-05-19 11:18:49.669 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: https://www.tianqi.com/changsha
2026-05-19 11:18:49.912 [main] DEBUG c.e.d.strategy.WeatherStrategy - Parsing weather page: https://www.tianqi.com/changsha
2026-05-19 11:18:49.921 [main] INFO c.e.d.strategy.WeatherStrategy - Parsed 1 weather items
2026-05-19 11:18:49.923 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 2026年05月19日 长沙天气
2026-05-19 11:18:49.941 [main] INFO c.e.datacollect.command.CrawlCommand - Successfully crawled 1 articles from https://www.tianqi.com/changsha
2026-05-19 11:18:49.945 [main] INFO c.e.datacollect.view.ConsoleView - Success: Crawled 1 articles.
2026-05-19 11:18:49.948 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-19 11:18:54.406 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: list
2026-05-19 11:18:54.406 [main] DEBUG c.e.datacollect.command.ListCommand - Listing articles
2026-05-19 11:18:54.407 [main] DEBUG c.e.d.repository.ArticleRepository - Returning 191 articles (unmodifiable)
2026-05-19 11:18:54.407 [main] DEBUG c.e.datacollect.view.ConsoleView - Displaying 191 articles
2026-05-19 11:18:54.473 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console

10
project/java1/logs/crawler.2026-05-29.log

@@ -0,0 +1,10 @@
2026-05-29 21:12:33.490 [main] INFO com.example.datacollect.Main - Starting CLI Crawler application
2026-05-29 21:12:33.502 [main] INFO c.e.d.strategy.StrategyFactory - StrategyFactory initialized with 6 strategies
2026-05-29 21:12:33.504 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: help
2026-05-29 21:12:33.504 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: list
2026-05-29 21:12:33.505 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: crawl
2026-05-29 21:12:33.505 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: exit
2026-05-29 21:12:33.505 [main] INFO c.e.d.controller.CrawlerController - CrawlerController initialized with 4 commands
2026-05-29 21:12:33.508 [main] INFO c.e.datacollect.view.ConsoleView - Success: Welcome to CLI Crawler (w10_3)! Type help for commands.
2026-05-29 21:12:33.508 [main] INFO com.example.datacollect.Main - Application ready, waiting for input
2026-05-29 21:12:33.509 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console

295
project/java1/logs/crawler.log

@@ -0,0 +1,295 @@
2026-05-30 17:20:04.108 [main] INFO com.example.datacollect.Main - Starting CLI Crawler application
2026-05-30 17:20:04.119 [main] INFO c.e.d.strategy.StrategyFactory - StrategyFactory initialized with 6 strategies
2026-05-30 17:20:04.120 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: help
2026-05-30 17:20:04.121 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: list
2026-05-30 17:20:04.121 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: crawl
2026-05-30 17:20:04.121 [main] DEBUG c.e.d.controller.CrawlerController - Registered command: exit
2026-05-30 17:20:04.122 [main] INFO c.e.d.controller.CrawlerController - CrawlerController initialized with 4 commands
2026-05-30 17:20:04.126 [main] INFO c.e.datacollect.view.ConsoleView - Success: Welcome to CLI Crawler (w10_3)! Type help for commands.
2026-05-30 17:20:04.126 [main] INFO com.example.datacollect.Main - Application ready, waiting for input
2026-05-30 17:20:04.126 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:20:20.147 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: help
2026-05-30 17:20:20.147 [main] DEBUG c.e.datacollect.command.HelpCommand - Displaying help
2026-05-30 17:20:20.147 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Commands: crawl <url>, list, help, exit
2026-05-30 17:20:20.148 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:21:31.732 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-30 17:21:31.732 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://www.weibo.com
2026-05-30 17:21:31.732 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://www.weibo.com: false
2026-05-30 17:21:31.732 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://www.weibo.com: false
2026-05-30 17:21:31.732 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://www.weibo.com: false
2026-05-30 17:21:31.732 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://www.weibo.com: true
2026-05-30 17:21:31.732 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy WeiboHotStrategy for URL: https://www.weibo.com
2026-05-30 17:21:31.732 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: https://www.weibo.com
2026-05-30 17:21:31.733 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: https://www.weibo.com
2026-05-30 17:21:31.733 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: https://www.weibo.com
2026-05-30 17:21:32.506 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - Parsing Weibo hot page: https://www.weibo.com
2026-05-30 17:21:32.506 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - Found 0 hot items
2026-05-30 17:21:32.506 [main] INFO c.e.d.strategy.WeiboHotStrategy - Parsed 0 hot topics from Weibo
2026-05-30 17:21:32.507 [main] INFO c.e.datacollect.command.CrawlCommand - Successfully crawled 0 articles from https://www.weibo.com
2026-05-30 17:21:32.507 [main] INFO c.e.datacollect.view.ConsoleView - Success: Crawled 0 articles.
2026-05-30 17:21:32.507 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:21:37.322 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: list
2026-05-30 17:21:37.322 [main] DEBUG c.e.datacollect.command.ListCommand - Listing articles
2026-05-30 17:21:37.323 [main] DEBUG c.e.d.repository.ArticleRepository - Returning 0 articles (unmodifiable)
2026-05-30 17:21:37.323 [main] DEBUG c.e.datacollect.view.ConsoleView - Displaying 0 articles
2026-05-30 17:21:37.323 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: 暂无文章,请先执行 crawl。
2026-05-30 17:21:37.323 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:22:20.007 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-30 17:22:20.007 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://www.tv.cctv.com
2026-05-30 17:22:20.007 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://www.tv.cctv.com: false
2026-05-30 17:22:20.007 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://www.tv.cctv.com: false
2026-05-30 17:22:20.007 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://www.tv.cctv.com: false
2026-05-30 17:22:20.008 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://www.tv.cctv.com: false
2026-05-30 17:22:20.008 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL https://www.tv.cctv.com: true
2026-05-30 17:22:20.008 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy CctvNewsStrategy for URL: https://www.tv.cctv.com
2026-05-30 17:22:20.008 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: https://www.tv.cctv.com
2026-05-30 17:22:20.008 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: https://www.tv.cctv.com
2026-05-30 17:22:20.008 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: https://www.tv.cctv.com
2026-05-30 17:22:20.034 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 1 failed for URL https://www.tv.cctv.com: www.tv.cctv.com
2026-05-30 17:22:20.035 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-30 17:22:21.041 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 2 to fetch URL: https://www.tv.cctv.com
2026-05-30 17:22:21.043 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 2 failed for URL https://www.tv.cctv.com: www.tv.cctv.com
2026-05-30 17:22:21.043 [main] INFO c.e.datacollect.command.CrawlCommand - Retrying in 1000ms...
2026-05-30 17:22:22.051 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 3 to fetch URL: https://www.tv.cctv.com
2026-05-30 17:22:22.052 [main] WARN c.e.datacollect.command.CrawlCommand - Attempt 3 failed for URL https://www.tv.cctv.com: www.tv.cctv.com
2026-05-30 17:22:22.052 [main] ERROR c.e.datacollect.command.CrawlCommand - Failed to fetch URL after 3 attempts: https://www.tv.cctv.com
2026-05-30 17:22:22.052 [main] ERROR c.e.datacollect.view.ConsoleView - Error: Failed to fetch URL after 3 attempts: https://www.tv.cctv.com
2026-05-30 17:22:22.052 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:23:04.437 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-30 17:23:04.437 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://news.cctv.com
2026-05-30 17:23:04.437 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://news.cctv.com: false
2026-05-30 17:23:04.437 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://news.cctv.com: false
2026-05-30 17:23:04.437 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://news.cctv.com: false
2026-05-30 17:23:04.437 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://news.cctv.com: false
2026-05-30 17:23:04.438 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL https://news.cctv.com: false
2026-05-30 17:23:04.438 [main] DEBUG c.e.d.strategy.WeatherStrategy - WeatherStrategy supports URL https://news.cctv.com: false
2026-05-30 17:23:04.438 [main] WARN c.e.d.strategy.StrategyFactory - No strategy found for URL: https://news.cctv.com
2026-05-30 17:23:04.438 [main] WARN c.e.datacollect.command.CrawlCommand - No strategy found for: https://news.cctv.com
2026-05-30 17:23:04.438 [main] ERROR c.e.datacollect.view.ConsoleView - Error: No strategy found for: https://news.cctv.com
2026-05-30 17:23:04.438 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:26:09.460 [main] WARN c.e.d.controller.CrawlerController - Unknown command: ceawl
2026-05-30 17:26:09.460 [main] ERROR c.e.datacollect.view.ConsoleView - Error: Unknown command: ceawl
2026-05-30 17:26:09.462 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:27:17.037 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-30 17:27:17.037 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://www.news.cctv.com
2026-05-30 17:27:17.037 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://www.news.cctv.com: false
2026-05-30 17:27:17.037 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://www.news.cctv.com: false
2026-05-30 17:27:17.037 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://www.news.cctv.com: false
2026-05-30 17:27:17.037 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://www.news.cctv.com: false
2026-05-30 17:27:17.037 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL https://www.news.cctv.com: false
2026-05-30 17:27:17.037 [main] DEBUG c.e.d.strategy.WeatherStrategy - WeatherStrategy supports URL https://www.news.cctv.com: false
2026-05-30 17:27:17.037 [main] WARN c.e.d.strategy.StrategyFactory - No strategy found for URL: https://www.news.cctv.com
2026-05-30 17:27:17.037 [main] WARN c.e.datacollect.command.CrawlCommand - No strategy found for: https://www.news.cctv.com
2026-05-30 17:27:17.037 [main] ERROR c.e.datacollect.view.ConsoleView - Error: No strategy found for: https://www.news.cctv.com
2026-05-30 17:27:17.038 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:37:12.487 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: crawl
2026-05-30 17:37:12.487 [main] INFO c.e.datacollect.command.CrawlCommand - Crawl started for: https://tv.cctv.com
2026-05-30 17:37:12.487 [main] DEBUG c.e.d.strategy.HnuNewsStrategy - HnuNewsStrategy supports URL https://tv.cctv.com: false
2026-05-30 17:37:12.487 [main] DEBUG c.e.d.strategy.BlogStrategy - BlogStrategy supports URL https://tv.cctv.com: false
2026-05-30 17:37:12.487 [main] DEBUG c.e.d.strategy.NewsStrategy - NewsStrategy supports URL https://tv.cctv.com: false
2026-05-30 17:37:12.488 [main] DEBUG c.e.d.strategy.WeiboHotStrategy - WeiboHotStrategy supports URL https://tv.cctv.com: false
2026-05-30 17:37:12.488 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - CctvNewsStrategy supports URL https://tv.cctv.com: true
2026-05-30 17:37:12.488 [main] DEBUG c.e.d.strategy.StrategyFactory - Found strategy CctvNewsStrategy for URL: https://tv.cctv.com
2026-05-30 17:37:12.488 [main] INFO c.e.datacollect.command.CrawlCommand - Starting crawl for URL: https://tv.cctv.com
2026-05-30 17:37:12.488 [main] DEBUG c.e.datacollect.view.ConsoleView - Info: Crawling: https://tv.cctv.com
2026-05-30 17:37:12.488 [main] DEBUG c.e.datacollect.command.CrawlCommand - Attempt 1 to fetch URL: https://tv.cctv.com
2026-05-30 17:37:13.701 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - Parsing CCTV news page: https://tv.cctv.com
2026-05-30 17:37:13.707 [main] DEBUG c.e.d.strategy.CctvNewsStrategy - Found 0 news items
2026-05-30 17:37:13.748 [main] INFO c.e.d.strategy.CctvNewsStrategy - Parsed 189 news from CCTV
2026-05-30 17:37:13.748 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直 播
2026-05-30 17:37:13.750 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 节目单
2026-05-30 17:37:13.750 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 频道大全
2026-05-30 17:37:13.750 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 栏目大全
2026-05-30 17:37:13.750 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 主 持 人
2026-05-30 17:37:13.750 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 听音
2026-05-30 17:37:13.750 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 焦点访谈
2026-05-30 17:37:13.750 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 朝闻天下
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 星推荐
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 天下财经
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 高端访谈
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综合
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 财经
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综艺
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 亚洲
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电影
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国防军事
2026-05-30 17:37:13.751 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电视剧
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 纪录
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 科教
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 戏曲
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 社会与法
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 少儿
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 音乐
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 奥林匹克
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 农业农村
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 欧洲
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 美洲
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育赛事
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微视频
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 搜片库
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 找栏目
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 每周质量报告
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 开讲啦
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 听音
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 一槌定音
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国创新药叩开欧美大门
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 一旦点击 木马将自动入侵!
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 未成年人打赏47万 钱能退吗?
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 深度透视 伊朗导弹战力如何?
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 黄文秀:最美芳华绽放扶贫路
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 太空“换班”完成!准备回家
2026-05-30 17:37:13.753 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 第二艘国产大邮轮完成海上试航
2026-05-30 17:37:13.758 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 五角大楼和SpaceX吵起来了
2026-05-30 17:37:13.759 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 揭开“西北第一枪厂”神秘面纱
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 华为新路线:半导体新赛道
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 广告
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 广告
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 广告
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 帛书传奇
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 千年包公
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 清宫秘档
2026-05-30 17:37:13.760 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 大唐继位风云
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 花开中国(第二季)
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《太行谣》再现“太行奶娘”的感人故事
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《突击再突击》演绎山地战士的热血梦
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《医者仁心》守卫生命 感恩“医”路有你
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《大宅门》推开这扇门 看到家、国、天下
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 《湄公河大案》大案实录缉毒版“无间道”
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直播
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 节目单
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 频道大全
2026-05-30 17:37:13.761 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 栏目大全
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综合
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 财经
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 综艺
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中文国际
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电影
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国防军事
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电视剧
2026-05-30 17:37:13.762 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 纪录
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 科教
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 戏曲
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 社会与法
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 少儿
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 音乐
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体育赛事
2026-05-30 17:37:13.764 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 农业农村
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻联播
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 挑战不可能
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 开讲啦
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 人与自然
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 生活提示
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国诗词大会
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 对话
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 第一时间
2026-05-30 17:37:13.765 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 一槌定音
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 回家吃饭
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 生活家
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 金牌喜剧班
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 舞蹈世界
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 星光大道
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 向幸福出发
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 回声嘹亮
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 健康中国
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 深度国际
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国文艺
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国家记忆
2026-05-30 17:37:13.766 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 华人故事
2026-05-30 17:37:13.767 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 运动一起赢
2026-05-30 17:37:13.767 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 体坛快讯
2026-05-30 17:37:13.767 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 足球之夜
2026-05-30 17:37:13.767 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日影评
2026-05-30 17:37:13.767 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 时光军史馆
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军迷行天下
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军事报道
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军事纪实
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 军事纪录
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 国防故事
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 剧说很好看
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 寰宇视野
2026-05-30 17:37:13.768 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 特别呈现
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 9视频
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 百家讲坛
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 健康之路
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 科幻地带
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 考古公开课
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 解码科技史
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 戏曲青年说
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国京剧音配像精粹
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 过把瘾
2026-05-30 17:37:13.769 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 宝贝亮相吧
2026-05-30 17:37:13.770 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 剧懂法
2026-05-30 17:37:13.770 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 一线
2026-05-30 17:37:13.770 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 普法剧场
2026-05-30 17:37:13.770 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 天网
2026-05-30 17:37:13.771 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 心理访谈
2026-05-30 17:37:13.771 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 律师来了
2026-05-30 17:37:13.771 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 焦点访谈
2026-05-30 17:37:13.771 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 每周质量报告
2026-05-30 17:37:13.771 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 午夜新闻
2026-05-30 17:37:13.771 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻调查
2026-05-30 17:37:13.771 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻周刊
2026-05-30 17:37:13.772 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 周末动画片
2026-05-30 17:37:13.772 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 音乐快递
2026-05-30 17:37:13.772 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻袋袋裤
2026-05-30 17:37:13.772 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 智慧树
2026-05-30 17:37:13.772 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 快乐童行
2026-05-30 17:37:13.772 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 乐享汇
2026-05-30 17:37:13.772 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: CCTV音乐厅
2026-05-30 17:37:13.773 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 中国音乐电视
2026-05-30 17:37:13.773 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 聆听时刻
2026-05-30 17:37:13.773 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 童声唱
2026-05-30 17:37:13.773 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 冰球冰球
2026-05-30 17:37:13.773 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 田野欢歌
2026-05-30 17:37:13.773 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 三农长短说
2026-05-30 17:37:13.773 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 三农群英汇
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 大地讲堂
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 片库
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 热榜
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 看点
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微视频
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: ai美食
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 主持人
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 全部
2026-05-30 17:37:13.774 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直播
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 节目单
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 热播榜
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 主持人
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 听音
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 新闻联播
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 焦点访谈
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 等着我
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日说法
2026-05-30 17:37:13.775 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 海峡两岸
2026-05-30 17:37:13.776 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日关注
2026-05-30 17:37:13.776 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日亚洲
2026-05-30 17:37:13.776 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 更多
2026-05-30 17:37:13.776 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 电视剧
2026-05-30 17:37:13.777 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 动画片
2026-05-30 17:37:13.777 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 纪录片
2026-05-30 17:37:13.777 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 特别节目
2026-05-30 17:37:13.777 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 更多
2026-05-30 17:37:13.777 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微视频
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 视频百科
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 微故事
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: ai美食
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 今日热门
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 直播导视
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 精品
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 片库
2026-05-30 17:37:13.778 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 看点
2026-05-30 17:37:13.779 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 央视大全
2026-05-30 17:37:13.779 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 手机访问 扫描下载央 视影音客户端 扫一扫 手机继续看
2026-05-30 17:37:13.779 [main] DEBUG c.e.d.repository.ArticleRepository - Added article: 返回顶部
2026-05-30 17:37:13.779 [main] INFO c.e.datacollect.command.CrawlCommand - Successfully crawled 189 articles from https://tv.cctv.com
2026-05-30 17:37:13.779 [main] INFO c.e.datacollect.view.ConsoleView - Success: Crawled 189 articles.
2026-05-30 17:37:13.779 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console
2026-05-30 17:37:21.446 [main] DEBUG c.e.d.controller.CrawlerController - Executing command: list
2026-05-30 17:37:21.446 [main] DEBUG c.e.datacollect.command.ListCommand - Listing articles
2026-05-30 17:37:21.446 [main] DEBUG c.e.d.repository.ArticleRepository - Returning 189 articles (unmodifiable)
2026-05-30 17:37:21.446 [main] DEBUG c.e.datacollect.view.ConsoleView - Displaying 189 articles
2026-05-30 17:37:21.459 [main] DEBUG c.e.datacollect.view.ConsoleView - Reading input from console