# Pitfalls Research
**Domain:** Go/GORM/MySQL financial analytics — profit/loss aggregation functions
**Researched:** 2026-03-21
**Confidence:** HIGH (derived directly from existing codebase evidence and confirmed Go/MySQL behavior)

---
## Critical Pitfalls
### Pitfall 1: MySQL SUM with Division Returns Decimal, Not SIGNED Integer
**What goes wrong:**
When a `SUM()` expression includes any division operation (e.g., `SUM(amount * draw_count / total_count)`), MySQL returns the result as a `Decimal` type, not `BIGINT`. Scanning a Decimal into a Go `int64` field silently returns `0`. The dashboard code already hit this and left a comment documenting it.
Evidence from `dashboard_activity.go:174`:
```
// Note: when a MySQL SUM() involves division it returns a Decimal type, which must be scanned into float64
```
The fix used there: scan revenue stats into `float64`, then cast to `int64` in Go.
**Why it happens:**
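MySQL promotes arithmetic involving division to Decimal to preserve fractional precision. The Go side does not coerce that value back into an integer: a Decimal typically arrives as text such as `1250.0000`, which cannot be converted to `int64`, so the scan fails and the field keeps its zero value. When the scan error is never checked (see Pitfall 4), that zero is indistinguishable from real data.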
**How to avoid:**
Wrap any `SUM` that contains division with `CAST(... AS SIGNED)` in the SQL itself. This forces integer rounding at the database layer and lets you scan directly into `int64`. The existing cost query in `dashboard_activity.go:237` already uses this pattern:
```sql
CAST(SUM(...) AS SIGNED) as total_cost
```
Use `CAST(... AS SIGNED)` on every aggregated column that involves division. Never scan division-containing SUM results directly into `int64` without the cast.
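
The difference between the two result shapes can be illustrated without a database. The sketch below assumes the driver hands numeric results to Go as text (for example `1250` after `CAST(... AS SIGNED)` and `1250.0000` without it). `parseCents` is a hypothetical helper used only for illustration; it is not part of the codebase.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCents is a hypothetical helper that converts the textual form of a SQL
// numeric result into integer cents. Integer text parses directly; text with a
// decimal point is rejected instead of being silently truncated.
func parseCents(raw string) (int64, error) {
	cents, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("value %q is not an integer: %w", raw, err)
	}
	return cents, nil
}
```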
**Warning signs:**
- Aggregated monetary fields come back as `0` even when data exists
- Revenue stats are non-zero but cost stats are zero (or vice versa)
- Struct fields stay at their zero values after `Scan()`
**Phase to address:** Implementation phase; apply to every query that uses proportional allocation (e.g., distributing an order's revenue across multiple activities via `draw_count / total_count`).

---
### Pitfall 2: Double-Counting Revenue When One Order Spans Multiple Activities
**What goes wrong:**
A single order can result in draw logs across multiple activities (e.g., a user plays activity A and activity B in one checkout). If you `SUM(orders.actual_amount)` grouped by activity without proportional allocation, the full order amount is counted in every activity it touches. The existing dashboard already experienced this and added two-level subquery attribution.
Evidence from `dashboard_activity.go:197-212`: the fix was to compute `draw_count` per (order, activity) and `total_count` per order in two separate subqueries, then scale the order amount by the ratio `draw_count / total_count`.
**Why it happens:**
The aggregation joins `orders` to `activity_draw_logs`, which is a one-to-many relationship. Without explicit proration, the order amount fans out to every matching activity row.
**How to avoid:**
Always attribute revenue using the subquery pattern:
```sql
JOIN (
    SELECT order_id, activity_id, COUNT(*) as draw_count
    FROM activity_draw_logs JOIN activity_issues ON ...
    GROUP BY order_id, activity_id
) as order_activity_draws ON order_activity_draws.order_id = orders.id
JOIN (
    SELECT order_id, COUNT(*) as total_count
    FROM activity_draw_logs GROUP BY order_id
) as order_total_draws ON order_total_draws.order_id = orders.id
```
Then multiply: `orders.actual_amount * order_activity_draws.draw_count / order_total_draws.total_count`. For the user-dimension function, this pattern still applies if a user's order touches multiple issues.
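
The proration arithmetic can be checked in plain Go before it is encoded in SQL. The sketch below is an illustration under assumptions: `activityDraws` and `prorate` are hypothetical names, and assigning leftover cents to the last entry is a choice made here so the shares always add up to the order amount. The SQL expression above rounds each share independently, so its per-activity totals can differ from the order amount by a cent.

```go
package main

// activityDraws is a hypothetical pairing of an activity with the number of
// draws one order produced in it.
type activityDraws struct {
	ActivityID int64
	DrawCount  int64
}

// prorate splits amountCents across activities in proportion to draw counts.
// Integer division truncates, so the last entry receives whatever remains,
// which keeps the sum of all shares equal to amountCents.
func prorate(amountCents int64, draws []activityDraws) map[int64]int64 {
	var total int64
	for _, d := range draws {
		total += d.DrawCount
	}
	shares := make(map[int64]int64, len(draws))
	if total == 0 {
		return shares // no draws, nothing to attribute
	}
	var allocated int64
	for i, d := range draws {
		share := amountCents * d.DrawCount / total
		if i == len(draws)-1 {
			share = amountCents - allocated
		}
		shares[d.ActivityID] += share
		allocated += share
	}
	return shares
}
```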
**Warning signs:**
- Total revenue across all activities exceeds the sum of all actual order payments
- A user's computed spending is greater than what WeChat Pay received
- Profit rates are implausibly negative across many activities
**Phase to address:** Implementation phase; design the user-dimension and activity-dimension query structure before writing SQL.

---
### Pitfall 3: Mixing Game-Pass Orders into Cash Revenue (Inconsistent Calculation Basis)
**What goes wrong:**
Game-pass orders (次卡, prepaid play passes) have `actual_amount = 0` and `source_type = 4` (or `order_no LIKE 'GP%'` or a remark containing `use_game_pass`). Including them in `SUM(actual_amount + discount_amount)` makes their "revenue" appear as zero, understating total income. Including them in cost without crediting their imputed value makes every game-pass activity show a loss.
The codebase defines three detection conditions in `internal/service/finance/profit_metrics.go:IsGamePassOrder`. These must all be checked — any single condition is insufficient because historical data uses different conventions.
**Why it happens:**
Game-pass orders are structurally identical to regular orders but have zero monetary value. Treating all orders uniformly by summing `actual_amount` misses the imputed value of the pass the user has already paid for.
**How to avoid:**
Use strict mutual exclusion in SQL:
- If game-pass order: revenue = `draw_count * activity.price_draw`, discount = 0, cash = 0
- If cash/coupon order: revenue = `actual_amount + discount_amount`, game-pass value = 0
- Use `CASE WHEN (source_type=4 OR order_no LIKE 'GP%' OR (actual_amount=0 AND remark LIKE '%use_game_pass%')) THEN ... ELSE ...` in every SUM
Never add `actual_amount + discount_amount + game_pass_value` as if they are additive columns of the same thing. They are alternative values for the same economic event.
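
The mutual-exclusion rule can be expressed as a small pure function, which is also a convenient shape for unit tests. This is a sketch under assumptions: the `order` struct and both function names are hypothetical, and `looksLikeGamePass` only mirrors the three conditions listed above rather than reproducing the real `IsGamePassOrder` signature.

```go
package main

import "strings"

// order is a hypothetical, trimmed-down view of an orders row.
type order struct {
	SourceType     int
	OrderNo        string
	Remark         string
	ActualAmount   int64 // cents
	DiscountAmount int64 // cents
	DrawCount      int64
}

// looksLikeGamePass mirrors the three detection conditions described above.
func looksLikeGamePass(o order) bool {
	return o.SourceType == 4 ||
		strings.HasPrefix(o.OrderNo, "GP") ||
		(o.ActualAmount == 0 && strings.Contains(o.Remark, "use_game_pass"))
}

// attributeRevenue returns exactly one non-zero bucket per order:
// imputed game-pass value, or paid amount plus coupon discount.
func attributeRevenue(o order, priceDraw int64) (paidCoupon, gamePass int64) {
	if looksLikeGamePass(o) {
		return 0, o.DrawCount * priceDraw
	}
	return o.ActualAmount + o.DiscountAmount, 0
}
```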
**Warning signs:**
- Activities with many game-pass players show profit rates near -100%
- Total platform revenue is suspiciously lower than WeChat Pay reports
- `SpendingPaidCoupon` and `SpendingGamePass` are both non-zero for the same order
**Phase to address:** Implementation phase; encode the mutual-exclusion rule in query construction helpers before writing any aggregate SQL.

---
### Pitfall 4: Silently Ignoring Scan Errors on Aggregation Queries
**What goes wrong:**
Several existing dashboard queries call `db.Table(...).Select(...).Scan(&stats)` without checking the returned error. If the query fails (schema mismatch, column rename, database failover), `stats` remains an empty slice, downstream computations produce zero results, and no error is returned to the caller. The data looks correct (all zeros) rather than erroring.
Evidence from `dashboard_activity.go:146-158`: the `drawStats` scan has no `.Error` check. The pattern appears in multiple places throughout the dashboard handlers.
**Why it happens:**
GORM's method chaining makes it easy to forget error handling. The pattern `db.Table(...).Scan(&x)` is syntactically identical whether you check `.Error` or not. In exploratory handler code that was never tested, errors were skipped for brevity.
**How to avoid:**
The new `internal/service/finance/` package must check every query error:
```go
if err := db.Table(...).Scan(&result).Error; err != nil {
	return nil, fmt.Errorf("profit_loss query failed: %w", err)
}
```
Service functions should return `error` as second return value — not swallow errors internally. The existing `profit_metrics.go` pure functions have no DB access and are fine; the DB-querying functions must propagate errors.
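
A test for this convention does not need a live database. The sketch below abstracts the query behind a function value so a failing scan can be simulated; `activityStat`, `scanFunc`, `errNoSuchTable`, and the function names are all hypothetical and stand in for whatever the real service defines.

```go
package main

import (
	"errors"
	"fmt"
)

// activityStat is a hypothetical result row.
type activityStat struct {
	ActivityID   int64
	RevenueCents int64
}

// errNoSuchTable stands in for the error a renamed table would produce.
var errNoSuchTable = errors.New("no such table")

// scanFunc abstracts the database call so the sketch runs without a database.
type scanFunc func(dest *[]activityStat) error

// loadActivityStats wraps and returns any scan error instead of handing back
// an empty slice that looks like valid zero data.
func loadActivityStats(scan scanFunc) ([]activityStat, error) {
	var stats []activityStat
	if err := scan(&stats); err != nil {
		return nil, fmt.Errorf("profit_loss query failed: %w", err)
	}
	return stats, nil
}

// isSchemaError reports whether err wraps errNoSuchTable.
func isSchemaError(err error) bool {
	return errors.Is(err, errNoSuchTable)
}
```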
**Warning signs:**
- Function returns zero values with no error in tests against an empty SQLite db
- Aggregation results are uniformly zero across all parameters
- Schema changes (column renames, table renames) cause silent failures
**Phase to address:** Implementation phase (establish the error-check convention in the first function written) and testing phase (assert a non-nil error on deliberately broken queries).

---
### Pitfall 5: Omitting Refunded Orders from Cost Calculation
**What goes wrong:**
Inventory items (`user_inventory`) awarded from a subsequently refunded order should be excluded from cost. If you compute cost by summing `user_inventory.value_cents` grouped by `activity_id` without filtering on `orders.status`, you count the cost of prizes from refunded orders but don't count their revenue — making the platform appear to have given away prizes for free.
The existing code in `dashboard_activity.go:250-251` already had to special-case this:
```go
Where("(orders.status = ? OR user_inventory.order_id = 0 OR user_inventory.order_id IS NULL)", 2)
```
Note the legacy data escape hatch: some old inventory rows have `order_id = 0` or NULL and cannot be filtered by order status. This must be preserved.
**Why it happens:**
`user_inventory` records are created when prizes are awarded, which happens before the refund window closes. Refunds do not delete inventory rows — they update `orders.status` to 4. Naive aggregation on `user_inventory` ignores order status entirely.
**How to avoid:**
Always join `orders` to `user_inventory` via `order_id` and include the legacy escape hatch:
```sql
LEFT JOIN orders ON orders.id = user_inventory.order_id
WHERE (orders.status = 2 OR user_inventory.order_id = 0 OR user_inventory.order_id IS NULL)
AND COALESCE(user_inventory.remark, '') NOT LIKE '%void%'
```
The `void` remark filter is also required — manually voided inventory entries should never count as platform cost.
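
When writing the refund test case, it can help to keep an in-memory mirror of the WHERE clause to compute the expected cost for a fixture set. The sketch below is one such mirror under assumptions: `inventoryRow`, `countsTowardCost`, and `totalCost` are hypothetical names, and `OrderStatus` is taken to be `0` when the LEFT JOIN matches no order.

```go
package main

import "strings"

// orderStatusCounted is the order status the existing cost filter accepts.
const orderStatusCounted = 2

// inventoryRow is a hypothetical view of user_inventory joined to orders.
type inventoryRow struct {
	OrderID     *int64 // nil models SQL NULL
	OrderStatus int    // status of the joined order; 0 when no order matched
	Remark      string
	ValueCents  int64
}

// countsTowardCost mirrors the WHERE clause: keep rows whose order has the
// counted status or that are legacy rows (order_id 0 or NULL), and drop
// anything whose remark marks it as void.
func countsTowardCost(r inventoryRow) bool {
	legacy := r.OrderID == nil || *r.OrderID == 0
	if !legacy && r.OrderStatus != orderStatusCounted {
		return false
	}
	// strings.Contains is case-sensitive; MySQL LIKE may not be, depending on collation.
	return !strings.Contains(r.Remark, "void")
}

// totalCost sums the value of every row that passes the filter.
func totalCost(rows []inventoryRow) int64 {
	var sum int64
	for _, r := range rows {
		if countsTowardCost(r) {
			sum += r.ValueCents
		}
	}
	return sum
}
```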
**Warning signs:**
- Platform cost is higher than expected for activities with known refund activity
- Cost-side totals don't reconcile with accounting system data
- Test cases with a refunded order still show non-zero cost
**Phase to address:** Implementation phase; add a test case with a refunded order and verify cost = 0 for that order's prizes.

---
### Pitfall 6: Using Write DB (DbW) for Analytics Queries
**What goes wrong:**
The project has master-slave read-write splitting. Analytics queries that run on `GetDbW()` (master) instead of `GetDbR()` (replica) add latency to the write path, can block replication, and in the worst case cause master overload under concurrent analytics requests.
The CONCERNS.md already flags 113 direct `GetDbW()` calls in the handler layer. The pattern of bypassing the correct DB connection is established in the codebase and can propagate to new code.
**Why it happens:**
`GetDbR()` and `GetDbW()` look identical in usage. Developers copying from handler code that was written for writes will use `GetDbW()` by accident. The finance service package does not yet have established conventions.
**How to avoid:**
The new `internal/service/finance/` service must accept a `*gorm.DB` read-only handle at construction time (inject `repo.GetDbR()`), not a full repository. Document in the function signatures or struct fields that only the read replica is used:
```go
type ProfitLossService struct {
	dbR    *gorm.DB // read replica only; never use for writes
	logger *zap.Logger
}
```
Never call `repo.GetDbW()` inside finance analytics functions.
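
The mock-based check can be sketched with a counting test double. Everything here is hypothetical: `queryer` stands in for the small part of `*gorm.DB` the service would use, and the service and method names are illustrative rather than the project's actual API. The point is only that a service built from a single injected handle has no path to the write connection.

```go
package main

// queryer is a hypothetical stand-in for the subset of *gorm.DB the service needs.
type queryer interface {
	Query(sql string) error
}

// countingDB is a test double that records how many queries it receives.
type countingDB struct{ calls int }

func (c *countingDB) Query(sql string) error {
	c.calls++
	return nil
}

// profitLossService holds only the read handle, so it cannot reach the master.
type profitLossService struct {
	dbR queryer // read replica only
}

// newProfitLossService receives the read handle from the caller at construction time.
func newProfitLossService(dbR queryer) *profitLossService {
	return &profitLossService{dbR: dbR}
}

// activityTotals issues its query through the injected read handle.
func (s *profitLossService) activityTotals() error {
	return s.dbR.Query("SELECT activity_id, SUM(value_cents) FROM user_inventory GROUP BY activity_id")
}
```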
**Warning signs:**
- MySQL master replication lag increases when analytics endpoint is called
- Write latency spikes during dashboard loads
- `GetDbW()` appears in `internal/service/finance/` source files
**Phase to address:** Implementation phase (inject the read-only DB handle in the constructor) and testing phase (verify with a mock that only the read DB is called).

---
## Technical Debt Patterns
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Scan revenue into `float64` instead of fixing SQL with `CAST(AS SIGNED)` | Avoids SQL rewrite | Floating-point rounding on monetary values (e.g., 0.1 + 0.2 ≠ 0.3 in IEEE 754) | Never for monetary fields — always use `CAST(AS SIGNED)` |
| In-memory sort + full table fetch for custom sort order | Simpler than `ORDER BY` with computed columns | Loads unbounded rows into Go heap when activity count grows | Only acceptable if total row count is bounded by pagination elsewhere |
| Hardcoding game-pass detection conditions in each query | Avoids abstraction overhead | Three different detection conditions must stay in sync across multiple queries | Never — centralize detection in `IsGamePassOrder()` already defined in `finance` package |
| Skip error check on `Scan()` | Fewer lines of code | Silent wrong data; impossible to distinguish "query returned zero rows" from "query failed" | Never for financial data |
| Use `AVG(multiplier)` across draws as the cost multiplier | One query instead of per-row | Hides per-order multiplier variance; a 2x card on one draw inflates cost for all draws in the group | Acceptable for summary statistics; not for per-order breakdowns |
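
The first row's rounding cost is easy to reproduce. The following sketch uses two hypothetical helpers to contrast summing amounts held as `float64` currency units with summing the same amounts held as integer cents.

```go
package main

// sumFloat adds amounts stored as float64 currency units (0.10 for ten cents).
func sumFloat(amounts []float64) float64 {
	var total float64
	for _, a := range amounts {
		total += a
	}
	return total
}

// sumCents adds amounts stored as integer cents, which is exact.
func sumCents(amounts []int64) int64 {
	var total int64
	for _, a := range amounts {
		total += a
	}
	return total
}
```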
---
## Integration Gotchas
| Integration | Common Mistake | Correct Approach |
|-------------|----------------|-----------------|
| GORM `Scan` into anonymous struct | Forgetting to qualify column names in SELECT causes ambiguous column error when multiple tables have `id`, `created_at`, etc. | Always alias computed columns explicitly: `SELECT orders.user_id as user_id`, not `SELECT user_id` |
| GORM raw SQL with `Raw()` + `Scan()` | Parameterized values passed in wrong order cause SQL to silently use zero values | Verify query with `db.Statement.SQL.String()` during development; test with non-trivial input values |
| MySQL `COALESCE` with nullable int columns | `COALESCE(NULL, 0)` works, but `COALESCE(column, 0)` on a non-nullable column with value `0` just returns `0`, so the fallbacks are never reached; `NULLIF` is needed so that a stored `0` is treated as "not set" | Use `COALESCE(NULLIF(value_cents, 0), fallback_1, fallback_2, 0)` pattern already established in existing cost queries |
| Multiple ID lists in `WHERE IN (?)` with GORM | Passing an empty slice `[]int64{}` produces invalid SQL `WHERE id IN ()` in some GORM versions | Guard with `if len(ids) == 0 { return emptyResult, nil }` before building the query |
| Read replica lag | Querying replica immediately after a write (e.g., after seeding test data) can return stale results | In tests, use write DB handle or wait for sync; in production, this is acceptable for analytics |
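
The empty-slice row can be handled by building the filter conditionally. What an empty list should mean is a design decision; this sketch assumes it means "no filter", so the fragment is simply omitted. `userFilter` is a hypothetical helper, and the manual placeholder expansion is written in `database/sql` style for illustration (GORM accepts a slice argument directly, as in the `IN ?` form shown elsewhere in this document).

```go
package main

import "strings"

// userFilter builds an optional WHERE fragment for a list of user IDs.
// An empty list yields no fragment at all, so `IN ()` is never generated.
func userFilter(ids []int64) (clause string, args []any) {
	if len(ids) == 0 {
		return "", nil
	}
	// One "?" per ID; values travel separately as arguments.
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(ids)), ",")
	args = make([]any, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	return "orders.user_id IN (" + placeholders + ")", args
}
```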
---
## Performance Traps
| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| Fetching all activities before computing profit/loss (no predicate pushdown) | 100% CPU on `Find(&activities)`, slow response time | Apply all filters (status, name, date range) in the initial `query` before scanning, then pass `activityIDs` to subsequent queries | When activity count exceeds ~1,000 |
| Correlated subquery inside SUM for every row | Query time grows O(n²) with draw log volume | Pre-aggregate into a derived table subquery joined once, not per-row | When draw_logs table exceeds ~500K rows |
| No index on `activity_draw_logs.order_id` or `user_inventory.activity_id` | Sequential scan on every analytics query | Verify indexes exist with `SHOW INDEX FROM activity_draw_logs`; add composite index `(issue_id, order_id)` if missing | From day one on tables with writes |
| Loading all activities into memory for in-application sort | Memory spike on large result sets; no benefit if caller only wants top-10 | Accept this tradeoff only when total activities < 500; add a hard cap with an error if exceeded | When activity count exceeds ~500 |
| Querying `user_inventory` without `status IN (1, 3)` filter | Voided/cancelled inventory items inflate cost | Always filter: `WHERE user_inventory.status IN (1, 3)` | Immediately; even small void counts distort cost |
---
## Security Mistakes
| Mistake | Risk | Prevention |
|---------|------|------------|
| Interpolating user-supplied `user_id` or `activity_id` into raw SQL string instead of parameterized query | SQL injection: an attacker can exfiltrate all financial data | Always use parameterized queries: `.Where("user_id IN ?", ids)`, not `fmt.Sprintf("user_id IN (%s)", idsStr)` |
| Exposing raw profit/loss data without admin role check | Non-admin users can read platform margin data | The new service functions live in the service layer, so callers (API handlers) must apply `RequireAdminRole()` middleware; document this requirement in each function's GoDoc |
| Logging query parameters that contain user IDs | User ID lists in error logs can be correlated with financial data | Log query failure with a count, not the full ID list: `"profit_loss query failed for %d users: %v"` |
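
The logging row amounts to formatting the failure with `len(ids)` instead of the IDs themselves. A minimal sketch, with `queryFailure` and `errConnReset` as hypothetical names; it uses `%w` rather than the `%v` shown in the table so callers can still unwrap the cause.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnReset is a hypothetical underlying failure used for illustration.
var errConnReset = errors.New("connection reset")

// queryFailure reports how many users were involved without listing their IDs.
func queryFailure(userIDs []int64, cause error) error {
	return fmt.Errorf("profit_loss query failed for %d users: %w", len(userIDs), cause)
}
```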
---
## "Looks Done But Isn't" Checklist
- [ ] **Game-pass mutual exclusion:** Verify that `SpendingPaidCoupon` and `SpendingGamePass` are never both non-zero for the same order. Write a test case with a mixed-type order set.
- [ ] **Refunded order exclusion:** Add a test case where an order is refunded (status=4) and verify it contributes zero to both revenue and cost.
- [ ] **Legacy zero order_id:** Confirm inventory rows with `order_id = 0` are included in cost (not excluded by the orders JOIN). Add a test row with `order_id = 0` and verify it appears in cost.
- [ ] **Empty parameter handling:** Call both functions with nil/empty `userIDs` and nil/empty `activityID`; verify they return all-data aggregation, not empty results or SQL errors.
- [ ] **All five asset types covered:** Points, coupons, item cards, physical products, fragments. Verify all five appear in the breakdown output. Missing one silently understates cost.
- [ ] **CAST on division SUM:** Open every query with a `/` operator in a SUM and confirm `CAST(... AS SIGNED)` wraps the entire expression.
- [ ] **Read-only DB used:** Grep for `GetDbW` inside `internal/service/finance/`; the result must be empty.
- [ ] **Error propagation:** Every `Scan()` call inside finance functions must have its `.Error` checked and returned to the caller.
---
## Recovery Strategies
| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| Decimal-to-int64 silent zero | LOW | Add `CAST(AS SIGNED)` to affected SQL and rerun the query; no data migration needed |
| Revenue double-counting discovered post-launch | MEDIUM | Backfill correct totals by recomputing with fixed query over historical data; notify operators of corrected figures |
| Wrong DB handle (write instead of read) | LOW | Change constructor injection; no data impact |
| Missing refund exclusion | MEDIUM | Recompute affected period's profit/loss with corrected query; mark old reports as superseded |
| Silently swallowed errors causing wrong zeros | LOW-MEDIUM | Add error checks; add alerting on zero-result aggregations where data is expected; audit logs for the affected period |
---
## Pitfall-to-Phase Mapping
| Pitfall | Prevention Phase | Verification |
|---------|-----------------|--------------|
| Decimal/int64 scan mismatch | Implementation: SQL design | Integration test: query with division-containing SUM, assert non-zero int64 result |
| Revenue double-counting | Implementation: query structure design | Test: one order across two activities; assert sum of per-activity revenue equals order total |
| Game-pass mutual exclusion | Implementation: use `IsGamePassOrder()` helper | Unit test: game-pass order contributes to `SpendingGamePass` only, not `SpendingPaidCoupon` |
| Ignored Scan errors | Implementation: code review gate | Test: deliberately broken query (wrong table name); assert returned error is non-nil |
| Refunded order in cost | Implementation: WHERE clause | Test: refunded order inventory; assert cost contribution is zero |
| Write DB used | Implementation: constructor injection | Grep check in CI: `GetDbW` must not appear in `internal/service/finance/` |
| Missing LIMIT on supporting queries | Implementation: query design | Load test with 1000 activities; verify response time stays under 2s |
---
## Sources
- `internal/api/admin/dashboard_activity.go`: direct evidence of BUG FIX comments for Decimal/int64, double-counting, game-pass misclassification (lines 173-175, 274-275, 544-545)
- `internal/api/admin/dashboard_spending.go`: evidence of multi-join aggregation patterns and game-pass CASE expressions
- `internal/service/finance/profit_metrics.go`: `IsGamePassOrder()` three-condition detection; `ComputeProfit()` integer arithmetic; established pattern for cost multiplier
- `internal/service/finance/profit_metrics_test.go`: existing test coverage confirming pure-function behavior
- `.planning/codebase/CONCERNS.md`: flagged 113 `GetDbW()` calls in handler layer, silently swallowed errors in financial paths, missing error checks in `pay_refund_admin.go`
- Go `database/sql` specification: `Scan()` does not coerce types; MySQL 8.x documentation: SUM with division promotes to Decimal
---
*Pitfalls research for: Go/GORM/MySQL profit/loss analytics (Bindbox Game)*
*Researched: 2026-03-21*