- ARCH-04: job51 migrated to crawler_core (no old deps) - ARCH-05: zhilian migrated to crawler_core (no old deps) - 34 new mock tests (17 job51 + 17 zhilian) - Added _parse_zhilian_response custom parser for zhilian API format - Fixed POST Searcher _request() overrides for job51/zhilian - Full regression: 98 passed in 0.12s
237 lines
7.5 KiB
Markdown
237 lines
7.5 KiB
Markdown
# Testing Patterns
|
|
|
|
**Analysis Date:** 2026-03-21
|
|
|
|
## Test Framework
|
|
|
|
**Runner:**
|
|
- `unittest` (Python standard library) — the only test framework currently in use
|
|
- No pytest, no pytest-anyio, no httpx.AsyncClient test harness configured
|
|
- No test-specific dependencies in `Pipfile` `[dev-packages]` (section is empty)
|
|
|
|
**Assertion Library:**
|
|
- `unittest.TestCase` assertion methods: `assertEqual`, `assertIsNone`, etc.
|
|
|
|
**Run Commands:**
|
|
```bash
|
|
python -m unittest tests/test_company_storage.py # Run a specific test file
|
|
python -m unittest tests/test_company_jobs_sync.py # Run a specific test file
|
|
python -m unittest discover tests/ # Discover and run all tests
|
|
```
|
|
|
|
**Coverage:**
|
|
- No coverage tool configured (no `.coveragerc`, no `pytest-cov`, no coverage requirement enforced)
|
|
|
|
## Test File Organization
|
|
|
|
**Location:**
|
|
- Separate `tests/` directory at project root: `/Users/win/2025/AICoding/JobData/tests/`
|
|
- NOT co-located with source modules
|
|
|
|
**Naming:**
|
|
- Pattern: `test_{module_name}.py`
|
|
- Examples: `test_company_storage.py`, `test_company_jobs_sync.py`
|
|
|
|
**Structure:**
|
|
```
|
|
tests/
|
|
├── test_company_jobs_sync.py # Tests for app/services/company_jobs_sync.py
|
|
└── test_company_storage.py # Tests for app/services/company_storage.py
|
|
```
|
|
|
|
## Test Structure
|
|
|
|
**Suite Organization:**
|
|
```python
|
|
import unittest
|
|
|
|
from app.services.company_storage import extract_company_fields, normalize_company_id
|
|
|
|
|
|
class CompanyStorageTests(unittest.TestCase):
|
|
def test_normalize_qcwy_company_id(self):
|
|
self.assertEqual(normalize_company_id("qcwy", "co123"), "123")
|
|
self.assertEqual(normalize_company_id("qcwy", "123"), "123")
|
|
self.assertEqual(normalize_company_id("boss", "co123"), "co123")
|
|
|
|
def test_extract_boss_fields(self):
|
|
payload = { ... }
|
|
result = extract_company_fields("boss", payload, "boss-1")
|
|
self.assertEqual(result["source_company_id"], "boss-1")
|
|
|
|
|
|
if __name__ == "__main__":
|
|
unittest.main()
|
|
```
|
|
|
|
**Patterns:**
|
|
- One `TestCase` class per test file
|
|
- Test methods named `test_{what_is_being_tested}_{condition}` (e.g., `test_extract_boss_fields`, `test_normalize_qcwy_company_id`)
|
|
- No setUp/tearDown methods — tests are stateless and self-contained
|
|
- No fixtures or factories; inline dicts serve as test data
|
|
- Tests verify pure functions only (no database, no HTTP, no async)
|
|
|
|
## Mocking
|
|
|
|
**Framework:** None currently used (no `unittest.mock`, no `pytest-mock`)
|
|
|
|
**Current approach:** Tests only cover pure, synchronous functions that have no external dependencies. All existing tests call functions that:
|
|
- Accept plain dicts as input
|
|
- Return plain dicts or strings as output
|
|
- Have no database, network, or filesystem side effects
|
|
|
|
**What to Mock (when tests are added):**
|
|
- `clickhouse_connect.driver.AsyncClient` for repository tests
|
|
- `BossToken.filter(...)` ORM queries for service tests that check token loading
|
|
- `httpx` / `requests` for crawler service tests
|
|
- `asyncio.to_thread(...)` calls wrapping synchronous crawlers
|
|
|
|
**What NOT to Mock:**
|
|
- Pure data transformation functions (`extract_company_fields`, `normalize_company_id`, `_pick_first`, `_clean_text`)
|
|
- Enum definitions and schema validation logic
|
|
- URL pattern matching logic in `CleaningService`
|
|
|
|
## Fixtures and Factories
|
|
|
|
**Test Data:**
|
|
- Inline dicts constructed within each test method — no shared fixtures
|
|
- Platform-specific payloads mirror real API response shapes:
|
|
|
|
```python
|
|
# Boss payload structure (from test_company_storage.py)
|
|
payload = {
|
|
"zpData": {
|
|
"brandComInfoVO": {
|
|
"encryptBrandId": "boss-1",
|
|
"brandName": "Boss公司",
|
|
"industryName": "互联网",
|
|
"stageName": "B轮",
|
|
},
|
|
"companyFullInfoVO": {
|
|
"name": "Boss公司",
|
|
"typeName": "民营",
|
|
"cityName": "上海",
|
|
},
|
|
}
|
|
}
|
|
|
|
# Qcwy payload structure
|
|
payload = {
|
|
"coinfo": {
|
|
"coid": "123",
|
|
"coname": "前程公司",
|
|
"cotype": "民营",
|
|
"cosize": "500-999人",
|
|
},
|
|
"financingStage": {"name": "未融资"},
|
|
}
|
|
|
|
# Zhilian payload structure
|
|
payload = {
|
|
"data": {
|
|
"companyBase": {
|
|
"companyNumber": "zl-1",
|
|
"companyName": "智联公司",
|
|
"companyTypeName": "上市公司",
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
**Location:** Inline within test methods; no shared fixture file or factory module.
|
|
|
|
## Coverage
|
|
|
|
**Requirements:** None enforced — no coverage target, no CI gates
|
|
|
|
**Current coverage estimation:**
|
|
- `app/services/company_storage.py` — partially covered (extract functions and normalize_company_id)
|
|
- `app/services/company_jobs_sync.py` — partially covered (_extract_boss_jobs, _extract_qcwy_jobs, _extract_zhilian_jobs)
|
|
- All other modules — ZERO test coverage
|
|
|
|
**View Coverage (manual):**
|
|
```bash
|
|
pip install coverage
|
|
coverage run -m unittest discover tests/
|
|
coverage report
|
|
coverage html # generates htmlcov/index.html
|
|
```
|
|
|
|
## Test Types
|
|
|
|
**Unit Tests:**
|
|
- The only type present
|
|
- Scope: pure synchronous functions with no external dependencies
|
|
- Files: `tests/test_company_storage.py`, `tests/test_company_jobs_sync.py`
|
|
|
|
**Integration Tests:**
|
|
- Not present
|
|
- Needed for: API endpoint testing using `httpx.AsyncClient` with `TestClient` or `anyio`
|
|
- Critical gaps: `app/api/v1/job/job.py`, `app/api/v1/analytics.py`, `app/api/v1/keyword/keyword.py`
|
|
|
|
**E2E Tests:**
|
|
- Not present
|
|
- No E2E framework configured
|
|
|
|
## Critical Gaps
|
|
|
|
**Untested service logic (highest priority):**
|
|
|
|
1. `app/services/cleaning.py:CleaningService.clean_target_auto()` — URL platform detection logic (zhipin.com vs 51job.com vs zhaopin.com)
|
|
2. `app/services/cleaning.py:CleaningService.process_single_item()` — multi-branch dispatch based on `clean_type` and `platform`
|
|
3. `app/services/ingest/service.py:IngestService.store_batch()` — deduplication and batch insert flow
|
|
4. `app/services/ingest/dedup.py` — deduplication filter logic
|
|
5. `app/controllers/keyword.py:KeywordController.get_available()` — priority scheduling logic (partial > failed > fresh)
|
|
6. `app/repositories/clickhouse_repo.py:JobAnalyticsRepo` — query building and result mapping
|
|
|
|
**Untested URL parsing (medium priority):**
|
|
|
|
7. `app/services/cleaning.py:_process_boss_url()` — regex extraction of job/company ID from Boss URLs
|
|
8. `app/services/cleaning.py:_process_qcwy_url()` — regex extraction for qcwy URLs
|
|
9. `app/services/cleaning.py:_process_zhilian_url()` — regex extraction for zhilian URLs
|
|
|
|
**Untested API endpoints:**
|
|
|
|
10. All routes in `app/api/v1/` — no integration tests exist for any HTTP endpoints
|
|
|
|
## Recommended Test Setup (when adding tests)
|
|
|
|
**For async tests, add to `Pipfile [dev-packages]`:**
|
|
```toml
|
|
[dev-packages]
|
|
pytest = "*"
|
|
pytest-asyncio = "*"
|
|
anyio = {extras = ["trio"]}
|
|
httpx = "*" # already in packages, for AsyncClient
|
|
```
|
|
|
|
**Async test pattern (recommended):**
|
|
```python
|
|
import pytest
|
|
from httpx import AsyncClient
|
|
from app import app # FastAPI app instance
|
|
|
|
@pytest.mark.asyncio
|
|
async def test_ingest_endpoint():
|
|
async with AsyncClient(app=app, base_url="http://test") as client:
|
|
response = await client.post("/api/v1/ingest/data/store", json={...})
|
|
assert response.status_code == 200
|
|
```
|
|
|
|
**Synchronous unit test pattern (existing):**
|
|
```python
|
|
import unittest
|
|
from app.services.company_storage import normalize_company_id
|
|
|
|
class MyTests(unittest.TestCase):
|
|
def test_normalize(self):
|
|
self.assertEqual(normalize_company_id("qcwy", "co123"), "123")
|
|
|
|
if __name__ == "__main__":
|
|
unittest.main()
|
|
```
|
|
|
|
---
|
|
|
|
*Testing analysis: 2026-03-21*
|