Upload files to 'project'

main
Yuanruirui 3 weeks ago
parent
commit
9f7a211be2
  1. BIN
      project/202506050325-袁锐睿-期末实验报告.docx
  2. 1
      project/crawler.log
  3. 100
      project/crawler_20260527.log
  4. BIN
      project/data.zip
  5. BIN
      project/logs.zip
  6. BIN
      project/src.zip

BIN
project/202506050325-袁锐睿-期末实验报告.docx

Binary file not shown.

1
project/crawler.log

@@ -0,0 +1 @@
[2026-05-21 14:54:56] [CRAWLER_004] Unknown site: unknown, valid values: stats, cas, gov, all
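
The single entry in crawler.log suggests that the crawler checks its site argument against a fixed set (stats, cas, gov, all) and reports error code CRAWLER_004 when the value is not recognized. The following is a minimal sketch of such a check, not the code from src.zip (which is not shown in this diff); the class name `SiteArgument`, the method `validate`, the exception type, and the trimming and lower-casing are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: names and behavior are assumptions, not taken from src.zip.
final class SiteArgument {
    // Accepted values, as listed in the crawler.log message.
    static final Set<String> VALID_SITES = Set.of("stats", "cas", "gov", "all");

    // Returns the normalized site name, or throws with a CRAWLER_004-style message.
    static String validate(String site) {
        String normalized = site == null ? "" : site.trim().toLowerCase(Locale.ROOT);
        if (!VALID_SITES.contains(normalized)) {
            throw new IllegalArgumentException(
                "[CRAWLER_004] Unknown site: " + site
                    + ", valid values: stats, cas, gov, all");
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(validate("gov"));
        try {
            validate("unknown");
        } catch (IllegalArgumentException e) {
            // Mirrors the shape of the logged error, minus the timestamp.
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting the value before any HTTP client is created would explain why this log contains the error line and no request entries, though the diff alone does not confirm that ordering.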

100
project/crawler_20260527.log

@@ -0,0 +1,100 @@
[2026-05-27 14:58:22] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 14:58:22] [INFO] [com.crawler.site.GovNewsCrawler] ========== Start crawling: 中国政府网 ==========
[2026-05-27 14:58:22] [INFO] [com.crawler.site.GovNewsCrawler] Total pages to crawl: 1
[2026-05-27 14:58:22] [DEBUG] [com.crawler.site.GovNewsCrawler] Preparing to crawl page 1: https://www.gov.cn/
[2026-05-27 14:58:22] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 1573ms before request
[2026-05-27 14:58:24] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 14_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15
[2026-05-27 14:58:24] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.gov.cn/
[2026-05-27 14:58:24] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.gov.cn/ status: 200 duration: 404ms
[2026-05-27 14:58:24] [INFO] [com.crawler.site.GovNewsCrawler] Page 1 completed, got 3 items
[2026-05-27 14:58:24] [INFO] [com.crawler.site.GovNewsCrawler] Saving 3 items
[2026-05-27 14:58:24] [INFO] [com.crawler.site.GovNewsCrawler] ========== Crawling completed: 中国政府网, duration: 2081ms ==========
[2026-05-27 15:00:00] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:00:00] [INFO] [com.crawler.site.StatsGovCrawler] ========== Start crawling: 国家统计局-新闻发布 ==========
[2026-05-27 15:00:00] [INFO] [com.crawler.site.StatsGovCrawler] Total pages to crawl: 1
[2026-05-27 15:00:00] [DEBUG] [com.crawler.site.StatsGovCrawler] Preparing to crawl page 1: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:00:00] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 1244ms before request
[2026-05-27 15:00:01] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
[2026-05-27 15:00:01] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:00:02] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.stats.gov.cn/sj/sjjd/ status: 200 duration: 840ms
[2026-05-27 15:00:02] [INFO] [com.crawler.site.StatsGovCrawler] Page 1 completed, got 30 items
[2026-05-27 15:00:02] [INFO] [com.crawler.site.StatsGovCrawler] Saving 30 items
[2026-05-27 15:00:02] [INFO] [com.crawler.site.StatsGovCrawler] ========== Crawling completed: 国家统计局-新闻发布, duration: 2265ms ==========
[2026-05-27 15:00:02] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:00:02] [INFO] [com.crawler.site.CasResearchCrawler] ========== Start crawling: 中科院-科研动态 ==========
[2026-05-27 15:00:02] [INFO] [com.crawler.site.CasResearchCrawler] Total pages to crawl: 1
[2026-05-27 15:00:02] [DEBUG] [com.crawler.site.CasResearchCrawler] Preparing to crawl page 1: https://www.cas.cn/
[2026-05-27 15:00:02] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 2944ms before request
[2026-05-27 15:00:05] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
[2026-05-27 15:00:05] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.cas.cn/
[2026-05-27 15:00:05] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.cas.cn/ status: 200 duration: 242ms
[2026-05-27 15:00:05] [DEBUG] [com.crawler.http.JsoupHttpClient] Cookie updated
[2026-05-27 15:00:06] [INFO] [com.crawler.site.CasResearchCrawler] Page 1 completed, got 14 items
[2026-05-27 15:00:06] [INFO] [com.crawler.site.CasResearchCrawler] Saving 14 items
[2026-05-27 15:00:06] [INFO] [com.crawler.site.CasResearchCrawler] ========== Crawling completed: 中科院-科研动态, duration: 3254ms ==========
[2026-05-27 15:00:06] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:00:06] [INFO] [com.crawler.site.GovNewsCrawler] ========== Start crawling: 中国政府网 ==========
[2026-05-27 15:00:06] [INFO] [com.crawler.site.GovNewsCrawler] Total pages to crawl: 1
[2026-05-27 15:00:06] [DEBUG] [com.crawler.site.GovNewsCrawler] Preparing to crawl page 1: https://www.gov.cn/
[2026-05-27 15:00:06] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 1484ms before request
[2026-05-27 15:00:07] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 14_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15
[2026-05-27 15:00:07] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.gov.cn/
[2026-05-27 15:00:07] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.gov.cn/ status: 200 duration: 236ms
[2026-05-27 15:00:07] [INFO] [com.crawler.site.GovNewsCrawler] Page 1 completed, got 3 items
[2026-05-27 15:00:07] [INFO] [com.crawler.site.GovNewsCrawler] Saving 3 items
[2026-05-27 15:00:07] [INFO] [com.crawler.site.GovNewsCrawler] ========== Crawling completed: 中国政府网, duration: 1740ms ==========
[2026-05-27 15:02:06] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:02:06] [INFO] [com.crawler.site.StatsGovCrawler] ========== Start crawling: 国家统计局-新闻发布 ==========
[2026-05-27 15:02:06] [INFO] [com.crawler.site.StatsGovCrawler] Total pages to crawl: 1
[2026-05-27 15:02:06] [DEBUG] [com.crawler.site.StatsGovCrawler] Preparing to crawl page 1: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:02:06] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 2483ms before request
[2026-05-27 15:02:08] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
[2026-05-27 15:02:08] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:02:09] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.stats.gov.cn/sj/sjjd/ status: 200 duration: 642ms
[2026-05-27 15:02:09] [INFO] [com.crawler.site.StatsGovCrawler] Page 1 completed, got 30 items
[2026-05-27 15:02:09] [INFO] [com.crawler.site.StatsGovCrawler] Saving 30 items
[2026-05-27 15:02:09] [INFO] [com.crawler.site.StatsGovCrawler] ========== Crawling completed: 国家统计局-新闻发布, duration: 3231ms ==========
[2026-05-27 15:08:01] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:08:01] [INFO] [com.crawler.site.StatsGovCrawler] ========== Start crawling: 国家统计局-新闻发布 ==========
[2026-05-27 15:08:01] [INFO] [com.crawler.site.StatsGovCrawler] Total pages to crawl: 1
[2026-05-27 15:08:01] [DEBUG] [com.crawler.site.StatsGovCrawler] Preparing to crawl page 1: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:08:01] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 1781ms before request
[2026-05-27 15:08:03] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
[2026-05-27 15:08:03] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:08:03] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.stats.gov.cn/sj/sjjd/ status: 200 duration: 507ms
[2026-05-27 15:08:03] [INFO] [com.crawler.site.StatsGovCrawler] Page 1 completed, got 30 items
[2026-05-27 15:08:03] [INFO] [com.crawler.site.StatsGovCrawler] Saving 30 items
[2026-05-27 15:08:03] [INFO] [com.crawler.site.StatsGovCrawler] ========== Crawling completed: 国家统计局-新闻发布, duration: 2507ms ==========
[2026-05-27 15:39:23] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:39:23] [INFO] [com.crawler.site.StatsGovCrawler] ========== Start crawling: 国家统计局-新闻发布 ==========
[2026-05-27 15:39:23] [INFO] [com.crawler.site.StatsGovCrawler] Total pages to crawl: 1
[2026-05-27 15:39:23] [DEBUG] [com.crawler.site.StatsGovCrawler] Preparing to crawl page 1: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:39:23] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 1015ms before request
[2026-05-27 15:39:24] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
[2026-05-27 15:39:24] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:39:24] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.stats.gov.cn/sj/sjjd/ status: 200 duration: 647ms
[2026-05-27 15:39:24] [INFO] [com.crawler.site.StatsGovCrawler] Page 1 completed, got 30 items
[2026-05-27 15:39:24] [INFO] [com.crawler.site.StatsGovCrawler] Saving 30 items
[2026-05-27 15:39:24] [INFO] [com.crawler.site.StatsGovCrawler] ========== Crawling completed: 国家统计局-新闻发布, duration: 1795ms ==========
[2026-05-27 15:40:20] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:40:20] [INFO] [com.crawler.site.StatsGovCrawler] ========== Start crawling: 国家统计局-新闻发布 ==========
[2026-05-27 15:40:20] [INFO] [com.crawler.site.StatsGovCrawler] Total pages to crawl: 1
[2026-05-27 15:40:20] [DEBUG] [com.crawler.site.StatsGovCrawler] Preparing to crawl page 1: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:40:20] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 2219ms before request
[2026-05-27 15:40:22] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
[2026-05-27 15:40:22] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:40:23] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.stats.gov.cn/sj/sjjd/ status: 200 duration: 600ms
[2026-05-27 15:40:23] [INFO] [com.crawler.site.StatsGovCrawler] Page 1 completed, got 30 items
[2026-05-27 15:40:23] [INFO] [com.crawler.site.StatsGovCrawler] Saving 30 items
[2026-05-27 15:40:23] [INFO] [com.crawler.site.StatsGovCrawler] ========== Crawling completed: 国家统计局-新闻发布, duration: 2922ms ==========
[2026-05-27 15:42:23] [INFO] [com.crawler.http.JsoupHttpClient] JsoupHttpClient initialized, timeout: 15000ms
[2026-05-27 15:42:23] [INFO] [com.crawler.site.StatsGovCrawler] ========== Start crawling: 国家统计局-新闻发布 ==========
[2026-05-27 15:42:23] [INFO] [com.crawler.site.StatsGovCrawler] Total pages to crawl: 1
[2026-05-27 15:42:23] [DEBUG] [com.crawler.site.StatsGovCrawler] Preparing to crawl page 1: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:42:23] [DEBUG] [com.crawler.http.JsoupHttpClient] Waiting 2380ms before request
[2026-05-27 15:42:25] [DEBUG] [com.crawler.http.JsoupHttpClient] Using User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
[2026-05-27 15:42:25] [INFO] [com.crawler.http.JsoupHttpClient] Starting request: https://www.stats.gov.cn/sj/sjjd/
[2026-05-27 15:42:26] [INFO] [com.crawler.http.JsoupHttpClient] Request completed: https://www.stats.gov.cn/sj/sjjd/ status: 200 duration: 644ms
[2026-05-27 15:42:26] [INFO] [com.crawler.site.StatsGovCrawler] Page 1 completed, got 30 items
[2026-05-27 15:42:26] [INFO] [com.crawler.site.StatsGovCrawler] Saving 30 items
[2026-05-27 15:42:26] [INFO] [com.crawler.site.StatsGovCrawler] ========== Crawling completed: 国家统计局-新闻发布, duration: 3130ms ==========
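
The entries in crawler_20260527.log share a `[timestamp] [LEVEL] [logger] message` layout, so figures such as request durations can be extracted mechanically. The following is a minimal sketch of a parser for that layout, assuming it holds for every line in the file; `CrawlerLogLine` and its members are hypothetical names and are not part of the uploaded sources.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines shaped like those in crawler_20260527.log.
final class CrawlerLogLine {
    // [yyyy-MM-dd HH:mm:ss] [LEVEL] [logger.name] message
    private static final Pattern LINE = Pattern.compile(
        "^\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\] \\[(\\w+)\\] \\[([\\w.]+)\\] (.*)$");
    private static final Pattern DURATION = Pattern.compile("duration: (\\d+)ms");
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    final LocalDateTime timestamp;
    final String level;
    final String logger;
    final String message;

    private CrawlerLogLine(LocalDateTime timestamp, String level, String logger, String message) {
        this.timestamp = timestamp;
        this.level = level;
        this.logger = logger;
        this.message = message;
    }

    // Returns empty when the line does not match the three-bracket layout.
    static Optional<CrawlerLogLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new CrawlerLogLine(
            LocalDateTime.parse(m.group(1), TS), m.group(2), m.group(3), m.group(4)));
    }

    // Duration in milliseconds, if the message reports one ("duration: 404ms").
    Optional<Long> durationMillis() {
        Matcher m = DURATION.matcher(message);
        return m.find() ? Optional.of(Long.parseLong(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "[2026-05-27 14:58:24] [INFO] [com.crawler.http.JsoupHttpClient] "
            + "Request completed: https://www.gov.cn/ status: 200 duration: 404ms";
        CrawlerLogLine.parse(line).ifPresent(entry ->
            System.out.println(entry.level + " " + entry.logger + " " + entry.durationMillis()));
    }
}
```

The entry in crawler.log has only two bracketed fields (timestamp and error code), so this sketch returns an empty result for it instead of guessing at a logger name.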

BIN
project/data.zip

Binary file not shown.

BIN
project/logs.zip

Binary file not shown.

BIN
project/src.zip

Binary file not shown.