win 3d202c3486 feat(05): data pipeline optimization (DATA-01, DATA-04)
Plan 01 - DATA-01: 30-day window dedup fix:
- dedup.py: both single-field and double-field SQL queries now include
  AND created_at > now() - INTERVAL 30 DAY
- tests/ingest/test_dedup.py: 6 mock tests validating 30-day window
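
A minimal sketch of how the windowed dedup query might be built for both the single-field and double-field cases. Only the `AND created_at > now() - INTERVAL 30 DAY` clause comes from this commit; the function name `build_dedup_query`, the table name, and the field names are hypothetical and are not taken from dedup.py.

```python
from typing import Sequence

# Window length stated in the commit message.
DEDUP_WINDOW_DAYS = 30


def build_dedup_query(table: str, fields: Sequence[str]) -> str:
    """Build a parameterized existence check limited to the recent window.

    `table` and `fields` are interpolated as identifiers, so they must come
    from trusted configuration, never from user input. Values are bound
    separately through the %s placeholders.
    """
    if not fields:
        raise ValueError("at least one dedup field is required")
    conditions = " AND ".join(f"{name} = %s" for name in fields)
    return (
        f"SELECT 1 FROM {table} WHERE {conditions} "
        f"AND created_at > now() - INTERVAL {DEDUP_WINDOW_DAYS} DAY LIMIT 1"
    )


# Hypothetical single-field and double-field variants.
single = build_dedup_query("job", ["job_id"])
double = build_dedup_query("job", ["company_id", "title"])
```

Keeping the window in one constant means both query variants stay in sync, which is the property the six mock tests are described as checking.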

Plan 02 - DATA-04: company vs search job channel separation:
- schemas/ingest.py: ChannelType.COMPANY = 'company'
- configs/boss.py: register channel='company' config
- configs/qcwy.py: register channel='company' config
- configs/zhilian.py: register channel='company' config
- company_jobs_sync.py: store_batch(..., 'mini', ...) → (..., 'company', ...)
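
The channel separation above could look roughly like the sketch below. Only `ChannelType.COMPANY = 'company'`, the `'mini'` value, and the three platform names come from this commit; the `str`-based enum, the `register_config` and `get_config` helpers, and the config contents are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, Tuple


class ChannelType(str, Enum):
    MINI = "mini"        # value previously passed by company_jobs_sync.py
    COMPANY = "company"  # value added in this commit


# Hypothetical registry keyed by (platform, channel).
_CONFIGS: Dict[Tuple[str, ChannelType], dict] = {}


def register_config(platform: str, channel: ChannelType, config: dict) -> None:
    """Store the config for one platform and channel pair."""
    _CONFIGS[(platform, channel)] = config


def get_config(platform: str, channel: ChannelType) -> dict:
    """Return the registered config; raises KeyError if none was registered."""
    return _CONFIGS[(platform, channel)]


# Each platform registers its own 'company' channel config.
for platform in ("boss", "qcwy", "zhilian"):
    register_config(platform, ChannelType.COMPANY, {"channel": ChannelType.COMPANY.value})
```

With a registry like this, company jobs stored under `'company'` resolve to their own config instead of sharing the one used by the `'mini'` channel.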

DATA-02: confirmed already complete (job.py has /data/batch-async endpoint)
DATA-03: confirmed already complete (company_cleaner.py full pipeline)

Full regression: 112 passed (106 existing + 6 new)
2026-03-21 19:50:06 +08:00

export DOCKER_DEFAULT_PLATFORM=linux/amd64
docker build -t zfc931912343/admin-crawler:v2.1 .
docker push zfc931912343/admin-crawler:v2.1

docker build -t zfc931912343/boss-crawler:v1 .
docker push zfc931912343/boss-crawler:v1

sudo docker rm -f admin-crawler && sudo docker run -d \
  --restart=always \
  --name=admin-crawler \
  --log-driver=json-file \
  --log-opt max-size=10m \
  --log-opt max-file=7 \
  -p 9999:80 \
  nbg2akd8w5diy8.xuanyuan.run/zfc931912343/admin-crawler:v1.5

docker run -d \
  --name mysql-server \
  --restart always \
  -p 3306:3306 \
  -v /opt/mysql/data:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=jobdata123 \
  -e MYSQL_DATABASE=job_data \
  mysql:8.0 \
  --character-set-server=utf8mb4 \
  --collation-server=utf8mb4_unicode_ci
