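# Phase 4: Backend & External Script Integration (Technical Research)

**Research date:** 2026-03-21

**Phase goal:** Make the backend facade use `spiderJobs.platforms.*` (already built on crawler_core), delete the privately copied files, and mark the external script directory `jobs_spider/` as deprecated.

---
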
## 1. Current State Analysis

### 1.1 File structure of app/services/crawler/

```
app/services/crawler/
├── _base.py            ❌ Copied from spiderJobs/core/base.py (old ApiResult/BaseFetcher/BaseSearcher)
├── _http_client.py     ❌ Copied from spiderJobs/core/http_client.py
├── _boss_api.py        ❌ Copied from spiderJobs/platforms/boss/api.py
├── _boss_client.py     ❌ Copied from spiderJobs/platforms/boss/client.py (includes the batch() method)
├── _boss_sign.py       ❌ Copied from spiderJobs/platforms/boss/sign.py
├── _job51_api.py       ❌ Copied from spiderJobs/platforms/job51/api.py
├── _job51_client.py    ❌ Copied from spiderJobs/platforms/job51/client.py
├── _job51_sign.py      ❌ Copied from spiderJobs/platforms/job51/sign.py (presumed, not verified)
├── _zhilian_api.py     ❌ Copied from spiderJobs/platforms/zhilian/api.py
├── _zhilian_client.py  ❌ Copied from spiderJobs/platforms/zhilian/client.py
├── _zhilian_sign.py    ❌ Copied from spiderJobs/platforms/zhilian/sign.py
├── boss.py             ✅ BossService: well structured, uses _boss_* (imports need updating)
├── qcwy.py             ✅ QcwyService: well structured, uses _job51_* (imports need updating)
├── zhilian.py          ✅ ZhilianService: well structured, uses _zhilian_* (imports need updating)
└── __init__.py         ✅ Unchanged
```

### 1.2 External interface of the facade layer (callers)

| Caller | Services used |
|--------|---------------|
| `app/services/cleaning.py` | BossService, QcwyService, ZhilianService |
| `app/services/company_cleaner.py` | BossService, QcwyService, ZhilianService |
| `app/services/company_jobs_sync.py` | BossService, QcwyService, ZhilianService |
| `app/api/v1/...` | Calls through the Service layer |

**Important:** The interfaces these callers use (`set_proxy()`, `get_job_detail()`, `get_company_detail()`, `search_jobs()`, etc.) stay unchanged; only the facade's internal implementation is modified.

### 1.3 Before/after migration mapping

| Current private file | Replaced by |
|----------------------|-------------|
| `_boss_api.py` → `GetBrandDetail, GetJobDetail, SearchBrandJobs, SearchRecJobs` | `spiderJobs.platforms.boss.api` |
| `_boss_client.py` → `BossClient, create_client` | `spiderJobs.platforms.boss.client` |
| `_boss_sign.py` → `BossSign` | `spiderJobs.platforms.boss.sign` (a stub that forwards to crawler_core) |
| `_job51_api.py` → `GetCompanyInfo, GetJobDetail, SearchCompanyJobs, SearchRecommendJobs` | `spiderJobs.platforms.job51.api` |
| `_job51_client.py` → `Job51Client, create_client` | `spiderJobs.platforms.job51.client` |
| `_zhilian_api.py` → `GetCompanyDetail, GetPositionDetail, SearchCompanyPositions, SearchPositions` | `spiderJobs.platforms.zhilian.api` |
| `_zhilian_client.py` → `ZhilianClient, create_capi_client, create_cgate_client` | `spiderJobs.platforms.zhilian.client` |
| `_zhilian_sign.py` → `ZhilianSign` | `spiderJobs.platforms.zhilian.sign` (a stub that forwards to crawler_core) |

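The mapping in section 1.3 amounts to a mechanical rewrite of import lines. The following is a minimal sketch, assuming the facade files import the private modules with relative imports of the form `from ._boss_api import ...` (the actual import lines are not shown in this document). `PRIVATE_TO_PUBLIC` and `rewrite_import` are hypothetical names used only for illustration.

```python
import re

# Private module name -> replacement module, copied from the mapping table.
PRIVATE_TO_PUBLIC = {
    "_boss_api": "spiderJobs.platforms.boss.api",
    "_boss_client": "spiderJobs.platforms.boss.client",
    "_boss_sign": "spiderJobs.platforms.boss.sign",
    "_job51_api": "spiderJobs.platforms.job51.api",
    "_job51_client": "spiderJobs.platforms.job51.client",
    "_zhilian_api": "spiderJobs.platforms.zhilian.api",
    "_zhilian_client": "spiderJobs.platforms.zhilian.client",
    "_zhilian_sign": "spiderJobs.platforms.zhilian.sign",
}

# Assumption: private modules are imported as "from ._name import A, B".
_IMPORT_RE = re.compile(r"^(\s*from\s+)\.(_\w+)(\s+import\s+.+)$")


def rewrite_import(line: str) -> str:
    """Rewrite one import line (without trailing newline); return it unchanged if unmapped."""
    match = _IMPORT_RE.match(line)
    if match is None or match.group(2) not in PRIVATE_TO_PUBLIC:
        return line
    prefix, module, suffix = match.groups()
    return f"{prefix}{PRIVATE_TO_PUBLIC[module]}{suffix}"


def rewrite_source(source: str) -> str:
    """Apply rewrite_import to every line of a file's text."""
    return "\n".join(rewrite_import(line) for line in source.split("\n"))
```

Lines that import modules missing from the table (for example `_base`) pass through untouched, so anything the sketch leaves behind is easy to spot in a diff and handle by hand.
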
### 1.4 Key compatibility checks

**BossService depends on the `_boss_client.BossClient.batch()` method:**
- `_boss_client.py` has a `batch()` method at lines 79-85
- `spiderJobs/platforms/boss/client.py` also has a `batch()` method (it was not removed during the Phase 2 migration)
- ✅ Fully compatible

**BossService.set_login_data() sets `self._signer.mpt/wt2`:**
- In spiderJobs, `BossSign` is a stub over crawler_core (pure pass-through), and the `mpt`/`wt2` attributes exist on `crawler_core.boss.sign.BossSign`
- ✅ Fully compatible

**ZhilianService uses `create_cgate_client/create_capi_client`:**
- `spiderJobs/platforms/zhilian/client.py` exports both functions
- ✅ Fully compatible

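The checks in section 1.4 can also be automated so they are re-run after the imports change. This is a minimal sketch that uses stand-in classes; in a real check the stand-ins would be replaced by the classes imported from `spiderJobs.platforms.*`. `missing_attrs`, `check_all`, and the `Fake*` classes are hypothetical names, not part of either codebase.

```python
from typing import Iterable


def missing_attrs(obj: object, names: Iterable[str]) -> list[str]:
    """Return the names from `names` that `obj` does not expose."""
    return [name for name in names if not hasattr(obj, name)]


# Stand-ins for illustration only. If the real attributes are assigned in
# __init__, check an instance instead of the class.
class FakeBossClient:
    def batch(self, requests):
        return list(requests)


class FakeBossSign:
    mpt = ""
    wt2 = ""


# What the facade relies on, per the compatibility checks.
REQUIRED = {
    FakeBossClient: ["batch"],
    FakeBossSign: ["mpt", "wt2"],
}


def check_all() -> dict[str, list[str]]:
    """Map each class name to the list of required attributes it lacks."""
    return {cls.__name__: missing_attrs(cls, names) for cls, names in REQUIRED.items()}
```

An empty list for every class means the facade's expectations hold; any non-empty list names exactly what the replacement module is missing.
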
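### 1.5 asyncio.to_thread() bridging (ARCH-06)

The backend is built on FastAPI, an async framework, whereas crawler_core uses requests-go (synchronous).
Callers such as cleaning.py currently invoke the Services from a synchronous context; presumably they either already wrap the calls in asyncio.to_thread or are themselves synchronous services.

**Decision:** Add async methods (`async_get_job_detail()`, etc.) to the Service classes in the facade layer, bridging internally with `asyncio.to_thread(self.get_job_detail, ...)`. The existing synchronous methods are left unmodified so that old callers keep working.
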
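The bridging decision in section 1.5 can be sketched as follows. `DemoService` is a stand-in, and the `get_job_detail(job_id)` signature is an assumption; the real Service methods may take different arguments. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


class DemoService:
    """Stand-in for a facade Service; the real classes wrap crawler_core calls."""

    def get_job_detail(self, job_id: str) -> dict:
        # Simulates a blocking HTTP call made through a synchronous client.
        time.sleep(0.05)
        return {"job_id": job_id}

    async def async_get_job_detail(self, job_id: str) -> dict:
        # Run the blocking method in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.get_job_detail, job_id)


async def fetch_many(service: DemoService, job_ids: list[str]) -> list[dict]:
    # Each wrapped call runs in its own thread, so the calls can overlap.
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(
        *(service.async_get_job_detail(job_id) for job_id in job_ids)
    )


if __name__ == "__main__":
    print(asyncio.run(fetch_many(DemoService(), ["a", "b"])))
```

Because the wrapper only delegates, the synchronous method remains the single source of behavior. One point worth verifying before relying on concurrent calls is whether a Service instance (and any session or signer it holds) is safe to use from several threads at once.
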
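### 1.6 Deprecating jobs_spider/

`jobs_spider/boss/boos_api.py` (94KB) is the oldest, single-file implementation; the new implementations for all three platforms now live in `spiderJobs/platforms/` and `crawler_core/`.
It is enough to add a `## ⚠️ DEPRECATED` marker to `jobs_spider/CLAUDE.md` and to the file header; the files are not deleted, so their history is preserved.
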
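Adding the file-header marker can be done idempotently so that re-running the step does not stack banners. This is a minimal sketch operating on a file's text; the banner wording and the names `DEPRECATED_HEADER` and `add_deprecated_header` are assumptions, and only the word `DEPRECATED` comes from this document.

```python
# Assumed banner wording; only the DEPRECATED keyword is required by the plan.
DEPRECATED_HEADER = (
    "# DEPRECATED: superseded by spiderJobs/platforms/ and crawler_core/.\n"
    "# Kept for history only; do not add new code here.\n"
)


def _is_preamble(line: str) -> bool:
    # Shebang and encoding declarations must stay on the first two lines.
    return line.startswith("#!") or (line.startswith("#") and "coding" in line)


def add_deprecated_header(source: str) -> str:
    """Insert the banner near the top of `source` unless one is already there."""
    lines = source.splitlines(keepends=True)
    if any("DEPRECATED" in line for line in lines[:5]):
        return source  # already marked, so the operation stays idempotent
    keep = 0
    while keep < min(2, len(lines)) and _is_preamble(lines[keep]):
        keep += 1
    preamble = "".join(lines[:keep])
    if preamble and not preamble.endswith("\n"):
        preamble += "\n"
    return preamble + DEPRECATED_HEADER + "".join(lines[keep:])
```

A file marked this way also satisfies a plain `grep DEPRECATED` check on it.
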
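---

## 2. Workload Overview

| Dimension | Description |
|-----------|-------------|
| Facade-layer changes | 3 files (boss.py/qcwy.py/zhilian.py), 3-5 lines of import changes each |
| Private files | The 9 files can be **kept but marked deprecated** (or deleted outright; keeping them is the lower-risk option) |
| asyncio bridging | 4-5 async methods added to each of the 3 Service classes |
| jobs_spider deprecation | 1 DEPRECATED file-header comment |
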
---

## 3. Success Criteria

| Criterion | How to verify |
|-----------|---------------|
| ARCH-06: facade has an asyncio.to_thread bridge | `grep asyncio.to_thread app/services/crawler/*.py` |
| ARCH-07: facade imports spiderJobs.platforms.* | `grep "spiderJobs.platforms" app/services/crawler/boss.py` |
| ARCH-08: jobs_spider is marked deprecated | `grep DEPRECATED jobs_spider/boss/boos_api.py` |
| Import check | `python -c "from app.services.crawler.boss import BossService"` raises no ImportError |
| Regression tests | `pytest tests/ -v` passes in full (98+α tests) |

---

## 4. Phase 4 Plan Breakdown

- **Plan 01**: Migrate the three facade files (boss.py/qcwy.py/zhilian.py) to import from spiderJobs.platforms.*, and add asyncio.to_thread async wrapper methods
- **Plan 02**: Deprecate the 9 private copied files and the old jobs_spider/ framework (add a DEPRECATED file header)