docs(phase-6): add research and 2 plans for quality and frontend
parent 6f9d4df3e2
commit c58c7ee5c2
92
.planning/phases/06-quality-frontend/06-01-PLAN.md
Normal file
@ -0,0 +1,92 @@
---
phase: 6
plan: 1
wave: 1
title: "Unit tests for the three platforms' data-parsing functions (QUAL-02)"
depends_on: []
files_modified:
  - tests/ingest/test_configs_boss.py     # NEW
  - tests/ingest/test_configs_qcwy.py     # NEW
  - tests/ingest/test_configs_zhilian.py  # NEW
autonomous: true
requirements:
  - QUAL-02
---

# Phase 6 Plan 01: Unit Tests for the Three Platforms' Parsing Functions (QUAL-02)

## Objective

Add unit tests for the three platforms' `_extract_*` and `_build_*_push` functions in `app/services/ingest/configs/`, covering both the normal-field and the missing-field scenarios.

The dedup logic tests (`dedup.py`) were completed in Phase 5 (6 tests); this plan only adds the parsing-function tests.

## Must Haves

- [ ] `tests/ingest/test_configs_boss.py`: 8 tests covering `_extract_job_id`, `_extract_company_name`, and `_build_boss_push`
- [ ] `tests/ingest/test_configs_qcwy.py`: 10 tests covering `_extract_job_id`, `_extract_update_dt`, `_extract_company_name`, and `_build_qcwy_push` (including the welfare-list scenario)
- [ ] `tests/ingest/test_configs_zhilian.py`: 9 tests covering `_extract_number`, `_extract_fpt`, `_extract_company_name`, and `_build_zhilian_push`
- [ ] `pipenv run python -m pytest tests/ingest/ -v --tb=short` is fully green (including the 6 existing dedup tests)
- [ ] `pipenv run python -m pytest tests/ -v` passes in full

---

## Wave 1

### Task 1.1: tests/ingest/test_configs_boss.py

**Test list:**
1. `test_extract_job_id_from_jobBaseInfoVO`: normal nested field
2. `test_extract_job_id_missing`: `jobBaseInfoVO` missing → None
3. `test_extract_company_name_from_name`: read directly from `data["name"]`
4. `test_extract_company_name_from_companyFullInfoVO`: nested field
5. `test_extract_company_name_missing` → None
6. `test_build_boss_push_full`: all fields present; verify `source_type="Boss直聘"` and that `url` contains `encryptJobId`
7. `test_build_boss_push_partial`: missing fields do not raise; sensible fallback values are returned
8. `test_build_boss_push_none_data`: empty dict; key fields are None

---

### Task 1.2: tests/ingest/test_configs_qcwy.py

**Test list:**
1. `test_extract_job_id_normal`
2. `test_extract_job_id_missing` → None
3. `test_extract_update_dt_normal`
4. `test_extract_update_dt_missing` → None
5. `test_extract_company_name_from_companyName`
6. `test_extract_company_name_from_company_name_fallback`
7. `test_extract_company_name_missing` → None
8. `test_build_qcwy_push_welfare_list`: welfare is a list of objects; extract `chineseTitle`
9. `test_build_qcwy_push_welfare_string`: welfare is a string
10. `test_build_qcwy_push_partial`: missing fields → sensible fallback values, `source_type="前程无忧"`

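Tests 8 and 9 exercise two input shapes for the welfare field. The sketch below shows one way such a normalization could be tested. `normalize_welfare` is a hypothetical helper, and returning a list of titles is an assumption; the plan does not say how `_build_qcwy_push` represents welfare in its output.

```python
from typing import Any


def normalize_welfare(welfare: Any) -> list[str]:
    """Hypothetical helper: flatten both welfare shapes into a list of titles."""
    if isinstance(welfare, str):
        return [welfare] if welfare else []
    if isinstance(welfare, list):
        # Keep only dict items that actually carry a chineseTitle.
        return [
            item["chineseTitle"]
            for item in welfare
            if isinstance(item, dict) and item.get("chineseTitle")
        ]
    return []  # None or an unexpected type degrades to "no welfare"


def test_welfare_list():
    raw = [{"chineseTitle": "五险一金"}, {"chineseTitle": "年终奖"}, {"code": "x"}]
    assert normalize_welfare(raw) == ["五险一金", "年终奖"]


def test_welfare_string():
    assert normalize_welfare("五险一金") == ["五险一金"]


def test_welfare_missing():
    assert normalize_welfare(None) == []
```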
---

### Task 1.3: tests/ingest/test_configs_zhilian.py

**Test list:**
1. `test_extract_number_normal`
2. `test_extract_number_missing` → None
3. `test_extract_fpt_normal`
4. `test_extract_fpt_missing` → None
5. `test_extract_company_name_from_companyName`
6. `test_extract_company_name_from_name_fallback`
7. `test_extract_company_name_missing` → None
8. `test_build_zhilian_push_skill_labels`: `skillLabel` is a list; extract `value`
9. `test_build_zhilian_push_partial`: missing fields fall back gracefully, `source_type="智联招聘"`

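Tests 8 and 9 could look roughly like the sketch below. `build_push_sketch` is a deliberately reduced, hypothetical stand-in for `_build_zhilian_push`. The output keys `job_id`, `company_name`, and `skills` are assumptions made for illustration; only `source_type="智联招聘"` and the input fields `number`, `companyName`, `name`, and `skillLabel` come from the plan and research notes.

```python
from typing import Any

SOURCE_TYPE = "智联招聘"  # value the plan expects in the push dict


def build_push_sketch(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical, reduced stand-in for _build_zhilian_push."""
    labels = data.get("skillLabel") or []
    return {
        "source_type": SOURCE_TYPE,
        "job_id": data.get("number"),  # assumed output key
        "company_name": data.get("companyName") or data.get("name"),
        "skills": [s["value"] for s in labels if isinstance(s, dict) and "value" in s],
    }


def test_build_zhilian_push_skill_labels():
    push = build_push_sketch({"skillLabel": [{"value": "Python"}, {"value": "SQL"}]})
    assert push["skills"] == ["Python", "SQL"]


def test_build_zhilian_push_partial():
    # An empty payload must not raise; missing values degrade to None / [].
    push = build_push_sketch({})
    assert push["source_type"] == "智联招聘"
    assert push["job_id"] is None
    assert push["company_name"] is None
    assert push["skills"] == []
```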
---

## Verification

```bash
# Run the new tests
pipenv run python -m pytest tests/ingest/ -v --tb=short

# Full regression
pipenv run python -m pytest tests/ -v --tb=short
```

**Expected:** all 27 new tests pass (33 in `tests/ingest/` counting the 6 existing dedup tests), and the full suite of ≥ 130 tests is green
191
.planning/phases/06-quality-frontend/06-02-PLAN.md
Normal file
@ -0,0 +1,191 @@
---
phase: 6
plan: 2
wave: 2
title: "Crawler ingest statistics API + frontend monitoring section (QUAL-06/07)"
depends_on:
  - "06-01-PLAN.md"
files_modified:
  - app/api/v1/job/job.py                # add GET /data/stats endpoint
  - web/src/views/cleaning/monitor.vue   # add crawler statistics section
  - web/src/api/index.js                 # add getIngestStats API
autonomous: true
requirements:
  - QUAL-06
  - QUAL-07
---

# Phase 6 Plan 02: Crawler Ingest Statistics API + Frontend Monitoring (QUAL-06/07)

## Objective

### QUAL-07 status check (already done)

`cleaning/monitor.vue` already includes:
- ✅ List of companies pending cleaning (queue table)
- ✅ Trigger cleaning
- ✅ View results

**QUAL-07 needs no further changes.**

### QUAL-06 gap

The existing monitor page only shows the company-cleaning queue status; **it lacks real-time statistics for crawled job ingestion**:
- Most recent crawl time per platform (the maximum `created_at` in ClickHouse)
- Volume trend (daily ingest counts for the last 7 days)
- Error status (failure and dedup statistics are not available from ClickHouse for now; this can be added later)

## Must Haves

- [ ] Backend: add a `GET /api/v1/job/data/stats` endpoint that accepts `platform` (optional) and `days` (default 7)
  - Returns, per platform: `total`, `today`, `last_ingest_at`, `daily_counts` (list)
- [ ] Frontend: `monitor.vue` gains a "crawler ingest" statistics section above the 4 existing metric-cards:
  - 3 platform cards, each showing the total, today's count, and the most recent crawl time
  - One volume-trend table (last 7 days, one row per day with boss/qcwy/zhilian columns)
- [ ] `web/src/api/index.js` adds a `getIngestStats` function
- [ ] The frontend loads correctly under `pnpm dev` (or an equivalent toolchain check)
- [ ] Full pytest regression `pipenv run python -m pytest tests/` has no failures

---

## Wave 2 (depends on Plan 01)

### Task 2.1: Backend: add the GET /job/data/stats endpoint

<read_first>
- `app/api/v1/job/job.py` (currently 123 lines)
- `app/core/clickhouse.py` (how to obtain the client)
</read_first>

<action>
Append the endpoint to `job.py`:

```python
@router.get("/data/stats", summary="各平台入库统计")  # "Per-platform ingest statistics"
async def get_ingest_stats(
    platform: Optional[PlatformType] = None,
    days: int = 7,
    service: IngestService = Depends(get_ingest_service),
) -> Dict[str, Any]:
    """
    Query per-platform ClickHouse ingest statistics: total, today's count,
    most recent ingest time, and the daily trend for the last N days.
    """
    from app.core.clickhouse import clickhouse_manager

    client = await clickhouse_manager.get_client()

    platforms = [platform.value] if platform else ["boss", "qcwy", "zhilian"]
    # Table names come from this fixed map, never from user input.
    table_map = {"boss": "boss_job", "qcwy": "qcwy_job", "zhilian": "zhilian_job"}

    result = {}
    for p in platforms:
        table = f"job_data.{table_map[p]}"
        try:
            # Total row count
            r_total = await client.query(f"SELECT count() FROM {table}")
            total = r_total.result_rows[0][0] if r_total.result_rows else 0

            # Rows ingested today
            r_today = await client.query(
                f"SELECT count() FROM {table} WHERE toDate(created_at) = today()"
            )
            today = r_today.result_rows[0][0] if r_today.result_rows else 0

            # Most recent ingest time. max() over an empty table yields the
            # column default (epoch) rather than NULL, so also gate on total.
            r_last = await client.query(
                f"SELECT max(created_at) FROM {table}"
            )
            last_at = (
                str(r_last.result_rows[0][0])
                if total and r_last.result_rows and r_last.result_rows[0][0]
                else None
            )

            # Daily trend. `days - 1` makes the window cover exactly `days`
            # calendar days including today (days is an int, validated by FastAPI).
            r_daily = await client.query(
                f"SELECT toDate(created_at) AS day, count() AS cnt "
                f"FROM {table} "
                f"WHERE created_at >= today() - {days - 1} "
                f"GROUP BY day ORDER BY day DESC"
            )
            daily_counts = [{"date": str(row[0]), "count": row[1]} for row in r_daily.result_rows]

            result[p] = {
                "total": total,
                "today": today,
                "last_ingest_at": last_at,
                "daily_counts": daily_counts,
            }
        except Exception as e:
            # One failing platform must not break the others.
            result[p] = {"error": str(e), "total": 0, "today": 0, "last_ingest_at": None, "daily_counts": []}

    return {"code": 200, "data": result}
```
</action>
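One detail worth noting about the trend query: `GROUP BY day` only returns days that have at least one row, so `daily_counts` can have gaps. If the trend table should show every day, the gaps could be zero-filled either in the endpoint or in the frontend. The sketch below is one possible approach; `fill_daily_counts` is a hypothetical helper, not part of the plan, and it assumes dates are ISO strings as produced by `str(row[0])`.

```python
from datetime import date, timedelta


def fill_daily_counts(rows: list[dict], days: int, today: date) -> list[dict]:
    """Return exactly `days` entries, newest first, with count=0 for absent dates."""
    by_date = {row["date"]: row["count"] for row in rows}
    filled = []
    for offset in range(days):
        day = (today - timedelta(days=offset)).isoformat()
        filled.append({"date": day, "count": by_date.get(day, 0)})
    return filled


# Sparse input: nothing was ingested on 2026-03-20.
sparse = [{"date": "2026-03-21", "count": 12}, {"date": "2026-03-19", "count": 5}]
dense = fill_daily_counts(sparse, days=3, today=date(2026, 3, 21))
```

Passing `today` explicitly keeps the helper deterministic in unit tests.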

---

### Task 2.2: Frontend: add the getIngestStats API

<action>
In `web/src/api/index.js`, locate the existing API functions and append:

```js
getIngestStats: (params) => request.get('/job/data/stats', { params }),
```
</action>

---

### Task 2.3: Frontend: add the crawler statistics section to monitor.vue

<action>
In `monitor.vue`, insert a new section **before** `<section class="metric-grid">`:

```html
<!-- Crawler ingest statistics -->
<section class="ingest-grid">
  <n-card
    v-for="p in ingestStats"
    :key="p.platform"
    :bordered="false"
    class="ingest-card"
  >
    <div class="ingest-platform-label">{{ p.label }}</div>
    <div class="ingest-total">{{ p.total.toLocaleString() }}</div>
    <div class="ingest-meta">
      <!-- UI text: "Today +N · Latest <time>" -->
      今日 +{{ p.today }} · 最近 {{ p.last_ingest_at || '--' }}
    </div>
  </n-card>
  <n-card :bordered="false" class="ingest-trend-card">
    <!-- UI text: "Ingest trend, last 7 days" -->
    <div class="ingest-trend-title">近 7 天入库趋势</div>
    <n-data-table
      size="small"
      :columns="trendColumns"
      :data="trendRows"
      :pagination="false"
    />
  </n-card>
</section>
```

Add the following to `<script setup>`:
- `const ingestStatsRaw = ref({})`
- `const fetchIngestStats = async ()` → calls `api.getIngestStats()`
- `const ingestStats = computed(...)` → formats the card data for the three platforms
- `const trendRows = computed(...)` → pivots the data into one row per date, with boss/qcwy/zhilian columns
- `const trendColumns` → date plus one column per platform
- Call `fetchIngestStats()` from `refreshAll()`
</action>
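The pivot behind `trendRows` is language-agnostic, so it is sketched here in Python to match the backend and test code; the Vue `computed` would perform the same steps in JavaScript. `pivot_trend_rows` is a hypothetical name, and the input shape is assumed to be the `data` object returned by the stats endpoint.

```python
PLATFORMS = ("boss", "qcwy", "zhilian")


def pivot_trend_rows(stats: dict) -> list[dict]:
    """Turn {platform: {"daily_counts": [...]}} into one row per date, newest first."""
    rows: dict[str, dict] = {}
    for platform in PLATFORMS:
        for entry in stats.get(platform, {}).get("daily_counts", []):
            # Create the row on first sight of a date, with every platform at 0.
            row = rows.setdefault(
                entry["date"], {"date": entry["date"], **{p: 0 for p in PLATFORMS}}
            )
            row[platform] = entry["count"]
    # ISO date strings sort chronologically, so a reverse string sort is newest first.
    return [rows[d] for d in sorted(rows, reverse=True)]


sample = {
    "boss": {"daily_counts": [{"date": "2026-03-21", "count": 3}]},
    "qcwy": {"daily_counts": [
        {"date": "2026-03-20", "count": 7},
        {"date": "2026-03-21", "count": 1},
    ]},
}
trend = pivot_trend_rows(sample)
```

Defaulting absent platforms to 0 keeps every row the same shape, which is convenient for a fixed set of table columns.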

---

## Verification

```bash
# Backend
pipenv run python -m pytest tests/ -v --tb=short

# Frontend (confirm pnpm is ready and the install completes without errors)
cd web && pnpm install 2>&1 | tail -3
```

Manual verification:
1. Start the frontend with `pnpm dev`
2. Open the "Cleaning Monitor" (清洗监控) page and confirm that the three platform ingest cards and the 7-day trend table appear at the top
3. Data loads without errors, and the most recent ingest time is shown as a formatted timestamp
102
.planning/phases/06-quality-frontend/06-RESEARCH.md
Normal file
@ -0,0 +1,102 @@
# Phase 6: Quality & Frontend Technical Research

**Research date:** 2026-03-21
**Phase goal:** Test coverage for data parsing (QUAL-02) + improvements to the frontend monitoring and cleaning pages (QUAL-06/07)

---

## 1. QUAL-02: Unit tests for data parsing and dedup logic

### Current state

| Test module | File | Status |
|---------|------|------|
| dedup 30-day window | `tests/ingest/test_dedup.py` | ✅ Exists (6 tests added in Phase 5) |
| boss parsing functions | none | ❌ Missing |
| qcwy parsing functions | none | ❌ Missing |
| zhilian parsing functions | none | ❌ Missing |

### Functions to test (`app/services/ingest/configs/`)

**boss.py:**
- `_extract_job_id(data)` → `data["jobBaseInfoVO"]["jobId"]`
- `_extract_company_name(data)` → `data["name"]` or `data["companyFullInfoVO"]["name"]`
- `_build_boss_push(data)` → full push dict

**qcwy.py:**
- `_extract_job_id(data)` → `data["jobId"]`
- `_extract_update_dt(data)` → `data["updateDateTime"]`
- `_extract_company_name(data)` → `data["companyName"]` or `data["company_name"]`
- `_build_qcwy_push(data)` → full push dict (including welfare-list handling)

**zhilian.py:**
- `_extract_number(data)` → `data["number"]`
- `_extract_fpt(data)` → `data["firstPublishTime"]`
- `_extract_company_name(data)` → `data["companyName"]` or `data["name"]`
- `_build_zhilian_push(data)` → full push dict

### Test strategy
- Each function: a normal-field scenario plus a missing-field scenario (returns None)
- `_build_*_push`: key field mapping plus fallback for None values
- Total: roughly 24-30 tests

---

## 2. QUAL-06: Frontend crawler monitoring page

### Current state

The existing page (`web/src/views/cleaning/monitor.vue`) shows **company cleaning monitoring** (MySQL queue status), **not** crawler job-scraping monitoring.

**Required by the success criteria (gaps):**
- ❌ Most recent crawl time per platform
- ❌ Volume trend (chart of historical ingest volume)
- ❌ Error status

### Data queryable from ClickHouse on the backend
- The `job_data.boss_job / qcwy_job / zhilian_job` tables have a `created_at` column
- Ingest counts can be aggregated per day and per platform for the last 7 days
- The `created_at` of the latest record = the most recent crawl time

### Options
The `recruitment/` module already has data-browsing pages for the three platforms (boss/qcwy/zhilian index.vue), so statistics cards could be added at the top of the existing `recruitment/components/PlatformData.vue`:
- Total count, today's ingest count, most recent crawl time

Alternatively, create a new **crawler data statistics API** (backend) and add a crawler ingest statistics section to `monitor.vue`.

**Decision (the smaller change):**
1. Backend: add a `GET /job/data/stats?platform=boss&days=7` endpoint that returns:
   - `total`: total count
   - `today`: added today
   - `last_ingest_at`: most recent ingest time
   - `daily_counts`: daily ingest counts for the last 7 days (for the trend display)
2. Add crawler ingest count cards at the top of `monitor.vue` (before the existing metric-grid)
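To make the per-platform payload concrete, its shape could be written down as typed dictionaries. This is only a sketch of the four fields listed in the decision; the class names `DailyCount` and `PlatformStats`, the helper `empty_stats`, and the ISO date format are assumptions, not part of the codebase.

```python
from typing import Optional, TypedDict


class DailyCount(TypedDict):
    date: str   # assumed ISO date string, e.g. "2026-03-21"
    count: int


class PlatformStats(TypedDict):
    total: int
    today: int
    last_ingest_at: Optional[str]  # None when nothing has been ingested yet
    daily_counts: list[DailyCount]


def empty_stats() -> PlatformStats:
    """Fallback payload for a platform with no data (or a failed query)."""
    return {"total": 0, "today": 0, "last_ingest_at": None, "daily_counts": []}
```

A shared "empty" payload lets the frontend render a platform card without special-casing missing fields.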

---

## 3. QUAL-07: Frontend data-cleaning management page

### Current state

`cleaning/monitor.vue` (1191 lines) already includes:
- ✅ List of companies pending cleaning (queue table)
- ✅ Trigger cleaning (batch execution by source)
- ✅ View results (JSON modal)

**Gap check (success criterion 4 is in fact already met):**
- ✅ View the list of companies pending cleaning → exists
- ✅ Trigger cleaning → exists
- ✅ View results → exists

**Conclusion:** QUAL-07 is essentially satisfied; what remains is mainly supplementary documentation or minor UI polish.

---

## 4. Plan breakdown

| Plan | Content | Requirement |
|------|------|------|
| **Plan 01** | Unit tests for the three platforms' `_extract_*` / `_build_*_push` functions (about 25) | QUAL-02 |
| **Plan 02** | Backend crawler statistics API + crawler ingest statistics section in the frontend `monitor.vue` | QUAL-06/07 |