# Testing Patterns **Analysis Date:** 2026-03-21 ## Test Framework **Runner:** - `unittest` (Python standard library) — the only test framework currently in use - No pytest, no pytest-anyio, no httpx.AsyncClient test harness configured - No test-specific dependencies in `Pipfile` `[dev-packages]` (section is empty) **Assertion Library:** - `unittest.TestCase` assertion methods: `assertEqual`, `assertIsNone`, etc. **Run Commands:** ```bash python -m unittest tests/test_company_storage.py # Run a specific test file python -m unittest tests/test_company_jobs_sync.py # Run a specific test file python -m unittest discover tests/ # Discover and run all tests ``` **Coverage:** - No coverage tool configured (no `.coveragerc`, no `pytest-cov`, no coverage requirement enforced) ## Test File Organization **Location:** - Separate `tests/` directory at project root: `/Users/win/2025/AICoding/JobData/tests/` - NOT co-located with source modules **Naming:** - Pattern: `test_{module_name}.py` - Examples: `test_company_storage.py`, `test_company_jobs_sync.py` **Structure:** ``` tests/ ├── test_company_jobs_sync.py # Tests for app/services/company_jobs_sync.py └── test_company_storage.py # Tests for app/services/company_storage.py ``` ## Test Structure **Suite Organization:** ```python import unittest from app.services.company_storage import extract_company_fields, normalize_company_id class CompanyStorageTests(unittest.TestCase): def test_normalize_qcwy_company_id(self): self.assertEqual(normalize_company_id("qcwy", "co123"), "123") self.assertEqual(normalize_company_id("qcwy", "123"), "123") self.assertEqual(normalize_company_id("boss", "co123"), "co123") def test_extract_boss_fields(self): payload = { ... } result = extract_company_fields("boss", payload, "boss-1") self.assertEqual(result["source_company_id"], "boss-1") if __name__ == "__main__": unittest.main() ``` **Patterns:** - One `TestCase` class per test file - Test methods named `test_{what_is_being_tested}_{condition}` (e.g., `test_extract_boss_fields`, `test_normalize_qcwy_company_id`) - No setUp/tearDown methods — tests are stateless and self-contained - No fixtures or factories; inline dicts serve as test data - Tests verify pure functions only (no database, no HTTP, no async) ## Mocking **Framework:** None currently used (no `unittest.mock`, no `pytest-mock`) **Current approach:** Tests only cover pure, synchronous functions that have no external dependencies. All existing tests call functions that: - Accept plain dicts as input - Return plain dicts or strings as output - Have no database, network, or filesystem side effects **What to Mock (when tests are added):** - `clickhouse_connect.driver.AsyncClient` for repository tests - `BossToken.filter(...)` ORM queries for service tests that check token loading - `httpx` / `requests` for crawler service tests - `asyncio.to_thread(...)` calls wrapping synchronous crawlers **What NOT to Mock:** - Pure data transformation functions (`extract_company_fields`, `normalize_company_id`, `_pick_first`, `_clean_text`) - Enum definitions and schema validation logic - URL pattern matching logic in `CleaningService` ## Fixtures and Factories **Test Data:** - Inline dicts constructed within each test method — no shared fixtures - Platform-specific payloads mirror real API response shapes: ```python # Boss payload structure (from test_company_storage.py) payload = { "zpData": { "brandComInfoVO": { "encryptBrandId": "boss-1", "brandName": "Boss公司", "industryName": "互联网", "stageName": "B轮", }, "companyFullInfoVO": { "name": "Boss公司", "typeName": "民营", "cityName": "上海", }, } } # Qcwy payload structure payload = { "coinfo": { "coid": "123", "coname": "前程公司", "cotype": "民营", "cosize": "500-999人", }, "financingStage": {"name": "未融资"}, } # Zhilian payload structure payload = { "data": { "companyBase": { "companyNumber": "zl-1", "companyName": "智联公司", "companyTypeName": "上市公司", } } } ``` **Location:** Inline within test methods; no shared fixture file or factory module. ## Coverage **Requirements:** None enforced — no coverage target, no CI gates **Current coverage estimation:** - `app/services/company_storage.py` — partially covered (extract functions and normalize_company_id) - `app/services/company_jobs_sync.py` — partially covered (_extract_boss_jobs, _extract_qcwy_jobs, _extract_zhilian_jobs) - All other modules — ZERO test coverage **View Coverage (manual):** ```bash pip install coverage coverage run -m unittest discover tests/ coverage report coverage html # generates htmlcov/index.html ``` ## Test Types **Unit Tests:** - The only type present - Scope: pure synchronous functions with no external dependencies - Files: `tests/test_company_storage.py`, `tests/test_company_jobs_sync.py` **Integration Tests:** - Not present - Needed for: API endpoint testing using `httpx.AsyncClient` with `TestClient` or `anyio` - Critical gaps: `app/api/v1/job/job.py`, `app/api/v1/analytics.py`, `app/api/v1/keyword/keyword.py` **E2E Tests:** - Not present - No E2E framework configured ## Critical Gaps **Untested service logic (highest priority):** 1. `app/services/cleaning.py:CleaningService.clean_target_auto()` — URL platform detection logic (zhipin.com vs 51job.com vs zhaopin.com) 2. `app/services/cleaning.py:CleaningService.process_single_item()` — multi-branch dispatch based on `clean_type` and `platform` 3. `app/services/ingest/service.py:IngestService.store_batch()` — deduplication and batch insert flow 4. `app/services/ingest/dedup.py` — deduplication filter logic 5. `app/controllers/keyword.py:KeywordController.get_available()` — priority scheduling logic (partial > failed > fresh) 6. `app/repositories/clickhouse_repo.py:JobAnalyticsRepo` — query building and result mapping **Untested URL parsing (medium priority):** 7. `app/services/cleaning.py:_process_boss_url()` — regex extraction of job/company ID from Boss URLs 8. `app/services/cleaning.py:_process_qcwy_url()` — regex extraction for qcwy URLs 9. `app/services/cleaning.py:_process_zhilian_url()` — regex extraction for zhilian URLs **Untested API endpoints:** 10. All routes in `app/api/v1/` — no integration tests exist for any HTTP endpoints ## Recommended Test Setup (when adding tests) **For async tests, add to `Pipfile [dev-packages]`:** ```toml [dev-packages] pytest = "*" pytest-asyncio = "*" anyio = {extras = ["trio"]} httpx = "*" # already in packages, for AsyncClient ``` **Async test pattern (recommended):** ```python import pytest from httpx import AsyncClient from app import app # FastAPI app instance @pytest.mark.asyncio async def test_ingest_endpoint(): async with AsyncClient(app=app, base_url="http://test") as client: response = await client.post("/api/v1/ingest/data/store", json={...}) assert response.status_code == 200 ``` **Synchronous unit test pattern (existing):** ```python import unittest from app.services.company_storage import normalize_company_id class MyTests(unittest.TestCase): def test_normalize(self): self.assertEqual(normalize_company_id("qcwy", "co123"), "123") if __name__ == "__main__": unittest.main() ``` --- *Testing analysis: 2026-03-21*