# Coding Conventions

**Analysis Date:** 2026-03-21

## Naming Patterns
**Files:**

- snake_case for all Python files: `company_storage.py`, `company_cleaner.py`, `clickhouse_repo.py`
- Private/internal modules are prefixed with an underscore: `_base.py`, `_boss_api.py`, `_boss_client.py`, `_boss_sign.py`, `_http_client.py`
- Platform-named service files: `boss.py`, `qcwy.py`, `zhilian.py` under `app/services/crawler/`
- Router files are named after their domain: `keyword.py`, `analytics.py`, `cleaning.py`

**Classes:**

- PascalCase throughout: `CleaningService`, `KeywordController`, `ClickHouseBaseRepo`, `JobAnalyticsRepo`
- Services: `{Domain}Service`, e.g. `BossService`, `QcwyService`, `ZhilianService`, `IngestService`, `AnalyticsService`
- Controllers: `{Domain}Controller`, e.g. `KeywordController`
- Repos: `{Domain}Repo` or `{Domain}BaseRepo`, e.g. `ClickHouseBaseRepo`, `JobAnalyticsRepo`
- Models (Tortoise ORM): `{Platform}{Entity}`, e.g. `BossKeyword`, `QcwyCompany`, `ZhilianCompany`
- Schemas (Pydantic): `{Entity}Base`, `{Entity}Create`, `{Entity}Update`, `{Entity}Out`; see `app/schemas/keyword.py`

**Functions and Methods:**

- snake_case for all functions and methods: `get_available`, `report_page_progress`, `store_batch`, `build_insert_row`
- Private helpers are prefixed with an underscore: `_apply_proxy`, `_ensure_boss_token_loaded`, `_pick_first`, `_nested_get`, `_clean_text`, `_model_for_source`
- Async dependency factories follow the pattern `get_{service|controller}()`: `get_ingest_service`, `get_analytics_service`, `get_keyword_controller`

**Variables:**

- snake_case: `data_list`, `platform_type`, `check_duplicate`, `page_size`
- Module-level constants: UPPER_SNAKE_CASE, e.g. `COMPANY_SOURCES`, `QUEUE_TERMINAL_STATUSES`
- Class-level constants: UPPER_SNAKE_CASE with a leading underscore, e.g. `_TOKEN_REFRESH_INTERVAL = 3600`

**Types and Enums:**

- Enums use a PascalCase class name and UPPER_SNAKE_CASE member names: `PlatformType.BOSS`, `ChannelType.MINI`, `DataType.JOB`
- Enum values are lowercase strings matching URL slugs: `"boss"`, `"mini"`, `"job"`; see `app/schemas/ingest.py`
- Enums inherit from `(str, Enum)` enabling direct string comparison
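
The following is a minimal sketch of these enum conventions. It assumes a `PlatformType` shaped like the one in `app/schemas/ingest.py`. Only `BOSS` / `"boss"` is confirmed above; the `QCWY` and `ZHILIAN` members are assumed from the platform names used elsewhere in this document. Because the class mixes in `str`, its members compare equal to plain strings, and `json.dumps` serializes them as bare slugs.

```python
import json
from enum import Enum


class PlatformType(str, Enum):
    """Hypothetical mirror of PlatformType: UPPER_SNAKE_CASE names, lowercase slug values."""

    BOSS = "boss"
    QCWY = "qcwy"  # assumed member
    ZHILIAN = "zhilian"  # assumed member


# The str mix-in lets a member compare equal to a raw slug such as a path parameter.
assert PlatformType.BOSS == "boss"

# Value lookup turns an incoming slug back into a member (unknown slugs raise ValueError).
platform = PlatformType("zhilian")

# json.dumps encodes the member as a str, so the payload carries the bare slug.
payload = json.dumps({"platform": platform})
```

Formatting such a member with `str()` or an f-string is version-dependent, because recent Python releases changed it to include the class name. Compare with `==`, or use `.value` when the bare slug is needed in a message.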
## Code Style

**Formatting:**

- Tool: `black` v24.10.0
- Line length: 120 characters (set under `[tool.black]` and `[tool.ruff]` in `pyproject.toml`)
- Target Python versions: 3.10 and 3.11 for black; 3.13 in the `Pipfile`

**Linting:**

- Tool: `ruff` v0.9.1 (configured in `pyproject.toml`)
- Ignored rules: `F403` (star import used), `F405` (name may be undefined or defined via a star import)
- Star imports from internal modules are allowed (used in `app/models/__init__.py`, `app/services/ingest/__init__.py`)

**Import Sorting:**

- Tool: `isort` v5.13.2
- No explicit isort config found; the default ordering applies
## Import Organization

**Order:**

1. Standard library (`from __future__`, `os`, `re`, `typing`, `datetime`, `json`)
2. Third-party (`fastapi`, `pydantic`, `tortoise`, `loguru`, `clickhouse_connect`)
3. Internal app imports (`from app.core.`, `from app.models.`, `from app.services.`, `from app.schemas.`)

**Example from `app/api/v1/analytics.py`:**

```python
from typing import Optional
from datetime import datetime, date, timezone
from zoneinfo import ZoneInfo

from fastapi import APIRouter, Depends, Query
from app.core.clickhouse import clickhouse_manager
from app.services.analytics_service import AnalyticsService
from app.schemas.analytics import JobStatisticsResponse
```

**Path Aliases:**

- None; all imports use full paths with the `app.` prefix
- `from app.log import logger` is the canonical loguru import path

**Star Imports:**

- Used only in `__init__.py` re-export files, e.g. `from .admin import *` in `app/models/__init__.py`
- `# noqa: F401, F403` comments suppress lint warnings for intentional star imports
## Error Handling

**Patterns:**

- Services return `Dict[str, Any]` result objects with `"success"`, `"code"`, and `"message"` fields instead of raising exceptions to callers
- Controllers return a dict with `"code"` (200, 400, or 404) and `"message"` for every outcome
- API route handlers do NOT use try/except; they rely on services returning structured results
- Service methods wrap low-level calls in `try/except Exception as e`, log the error, then return `False` or an error dict

**Service-level error handling example** (`app/services/cleaning.py`):

```python
try:
    ...  # per-item processing (elided in this excerpt)
except Exception as e:
    logger.error(f"Error processing item {target}: {e}")
    return {
        "success": False,
        "target": target,
        "error": str(e),
        "storage_status": "error",
        "remote_sent": False,
    }
```
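
The excerpt shows only the `except` branch. The sketch below assembles the whole convention under stated assumptions. `ExampleService`, `process_item`, `handler`, and the injected `store` callable are hypothetical names, not repository code. The stdlib `logging` module stands in for `app.log.logger` so the snippet runs without loguru. The low-level call may raise. The service catches the error, logs it, and returns a result dict. The route-level function then branches on `success` with no try/except of its own.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("example")  # stand-in for `from app.log import logger`


class ExampleService:
    """Hypothetical service that follows the structured-result convention."""

    def __init__(self, store: Callable[[str], None]) -> None:
        # `store` plays the role of a low-level repo call that is allowed to raise.
        self._store = store

    def process_item(self, target: str) -> Dict[str, Any]:
        try:
            self._store(target)
        except Exception as e:
            # Log, then convert the failure into a result instead of re-raising.
            logger.error(f"Error processing item {target}: {e}")
            return {"success": False, "target": target, "error": str(e)}
        return {"success": True, "target": target}


def handler(service: ExampleService, target: str) -> Dict[str, Any]:
    """Route-handler stand-in: no try/except, it only maps the result to a response dict."""
    result = service.process_item(target)
    if result["success"]:
        return {"code": 200, "data": result, "message": "ok"}
    return {"code": 400, "data": None, "message": result["error"]}


def failing_store(target: str) -> None:
    """Hypothetical low-level call that always fails."""
    raise ValueError(f"cannot store {target}")
```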
**Repository-level:** `ClickHouseBaseRepo` does not swallow exceptions; they propagate to the service layer.

**Auth exceptions:** `app/core/dependency.py` raises `HTTPException(status_code=401/403)` directly, which is the standard FastAPI pattern for auth failures.
## Logging

**Framework:** `loguru` v0.7.3

**Import:** `from app.log import logger` (centralized re-export) or `from loguru import logger` (direct)

**Patterns:**

- `logger.info(f"...")` for normal operation events
- `logger.warning(f"...")` for non-fatal, recoverable issues (e.g. token not found, API soft failures)
- `logger.error(f"...")` for caught exceptions and operation failures
- F-string interpolation is used consistently for message formatting
- No structured fields (no `logger.bind()` usage observed)

**Example:**

```python
logger.info(f"获取招聘详情: {job_id}")  # "Fetching job posting details: {job_id}"
logger.warning(f"Boss get_job_detail failed: {result.error}")
logger.error(f"批量插入失败: {e}")  # "Batch insert failed: {e}"
```
## API Response Format

**Two response styles coexist:**

**Style 1: direct dict return** (used by most routes in newer modules such as `app/api/v1/job/job.py` and `app/api/v1/analytics.py`):

```python
return {"code": 200, "data": result, "message": "ok"}
```

**Style 2: JSONResponse subclasses** (older RBAC routes; defined in `app/schemas/base.py`):

```python
Success(code=200, msg="OK", data=data)
Fail(code=400, msg="error message")
SuccessExtra(code=200, data=data, total=100, page=1, page_size=20)
```
**Paginated responses** include: `code`, `data` (list), `total`, `page`, `page_size`
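
If Style 1 envelopes were built by shared helpers, they might look like the minimal sketch below. `ok_response` and `paginated_response` are hypothetical names; this document does not say such helpers exist. The keys follow the Style 1 example and the paginated field list above.

```python
from typing import Any, Dict, List


def ok_response(data: Any, message: str = "ok") -> Dict[str, Any]:
    """Style 1 envelope: code, data, message."""
    return {"code": 200, "data": data, "message": message}


def paginated_response(items: List[Any], total: int, page: int, page_size: int) -> Dict[str, Any]:
    """Paginated envelope: code, data (list), total, page, page_size."""
    return {"code": 200, "data": items, "total": total, "page": page, "page_size": page_size}


# Example: the second page of 20-item pages out of 45 results.
page_two = paginated_response(items=list(range(20, 40)), total=45, page=2, page_size=20)
```

The point of the design is to keep the envelope keys in one place. The Style 2 classes in `app/schemas/base.py` already do this for the older routes.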
## Comments

**When to Comment:**

- Docstrings on public methods describe purpose, not implementation: `"""获取可用关键词,优先返回断点续爬和失败重试的关键词"""` ("Get available keywords, prioritizing those for checkpoint resumption and failed-retry")
- Inline comments for priority logic and algorithm steps: `# 优先级 1: 断点续爬 (partial)` ("Priority 1: resume from checkpoint")
- Module-level docstrings for context: `"""Boss直聘 Service — 基于新算法文件的封装"""` ("Boss Zhipin service: a wrapper built on the new algorithm files")
- `# noqa` comments for intentional lint suppressions

**JSDoc/TSDoc:**

- Not applicable (Python backend)
- Docstrings are brief single-line or short multi-line descriptions written in Chinese
## Function Design

**Size:** Functions tend to be 10-50 lines. Service methods such as `process_single_item` in `app/services/cleaning.py` grow to ~70 lines because of multi-platform dispatch.

**Parameters:**

- Keyword arguments with defaults are preferred for optional parameters
- Pydantic schemas are used for HTTP request bodies (never raw dicts from router params)
- Optional parameters are typed `Optional[str]` with a `= None` default

**Return Values:**

- Services return `Dict[str, Any]` with consistent keys (`code`, `message`, `data`)
- Private helpers return `Optional[T]` or primitive types
- Async functions return awaitable results (no mixing of sync/async)
## Module Design

**Exports:**

- `app/models/__init__.py` uses `from .{module} import *` to flatten model imports
- Router modules export a single named router variable: `router = APIRouter(...)` or `{domain}_router = APIRouter(...)`
- Service classes are imported directly by name

**Barrel Files (`__init__.py`):**

- `app/models/__init__.py` re-exports all model classes
- `app/services/ingest/__init__.py` re-exports `IngestService` and config registrations
- `app/api/v1/__init__.py` aggregates all routers into `v1_router`

**Dependency Injection:**

- FastAPI `Depends()` is used to instantiate services and controllers in route handlers
- Dependency factory functions are named `get_{service}()` and defined in the same file as the router
- Shared auth dependencies: `DependAuth`, `DependPermission` in `app/core/dependency.py`
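
The following is a minimal sketch of the `get_{service}()` factory pattern, written to run without FastAPI installed. `KeywordController` here is a simplified stand-in, and its constructor and `ping` method are invented for the example. `handler_without_fastapi` does by hand what `Depends(get_keyword_controller)` would do inside a real route. The `Depends` usage itself appears only in a docstring.

```python
import asyncio
from typing import Any, Dict


class KeywordController:
    """Simplified stand-in; the real controller's constructor and methods are not shown in this document."""

    def __init__(self, platform: str) -> None:
        self.platform = platform

    async def ping(self) -> Dict[str, Any]:
        # Hypothetical method used only to show the injected instance being called.
        return {"code": 200, "message": f"{self.platform} controller ready"}


async def get_keyword_controller() -> KeywordController:
    """Dependency factory, defined next to the router that uses it.

    In a real route it would be injected with FastAPI, e.g.
    ``controller: KeywordController = Depends(get_keyword_controller)``.
    """
    return KeywordController(platform="boss")


async def handler_without_fastapi() -> Dict[str, Any]:
    # Does by hand what Depends() does per request: await the factory, then use the result.
    controller = await get_keyword_controller()
    return await controller.ping()
```

Keeping the factory next to the router means the route signature alone shows what gets constructed for each request.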
## Tortoise ORM Model Conventions

**Base class:** All models inherit from `app/models/base.py:BaseModel`, which extends `tortoise.models.Model` with `id = BigIntField(pk=True)`.

**Timestamp mixin:** `TimestampMixin` adds `created_at` (`auto_now_add`) and `updated_at` (`auto_now`) and is applied via multiple inheritance.

**Abstract base models:** Platform variants use an abstract base plus concrete subclasses:

```python
from tortoise.models import Model

class BaseKeyword(Model):
    ...  # shared fields
    class Meta:
        abstract = True

class BossKeyword(BaseKeyword):
    class Meta:
        table = "boss_keyword"
```

**Field descriptions:** All fields include a `description=` parameter for documentation.
## Pydantic Schema Conventions

- All schemas inherit from `pydantic.BaseModel`
- All fields use `Field(...)` with `description=` for documentation
- Enums inherit from `(str, Enum)` for JSON serialization compatibility
- Output schemas include `class Config: from_attributes = True` to support ORM mode
- Validation patterns use `Field(..., pattern="^(boss|qcwy|zhilian)$")` for enum-like string fields
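
The standard `re` module can show which values the anchored `pattern` admits. This sketch assumes the same regex as above. It uses `re.fullmatch` as a stand-in for Pydantic's own validator, so it illustrates the pattern itself rather than Pydantic's exact behavior. Only the three lowercase slugs pass; near misses such as `"Boss"` or `"bossx"` are rejected.

```python
import re
from typing import Optional

# Same anchored alternation as the schema example above.
PLATFORM_PATTERN = re.compile(r"^(boss|qcwy|zhilian)$")


def check_platform(value: str) -> Optional[str]:
    """Return the value if it is an allowed slug, else None (a real validator would raise instead)."""
    return value if PLATFORM_PATTERN.fullmatch(value) else None


candidates = ["boss", "qcwy", "zhilian", "Boss", "bossx", "51job"]
accepted = [value for value in candidates if check_platform(value) is not None]
```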
---

*Convention analysis: 2026-03-21*