# Video Ingestion — Feature Expansion Request (Inbound Handoff)
**Status**: ✅ Built + LIVE-VERIFIED on apricot (2026-06-08). Phases 0–4 are implemented in both repos; a real iOS `.mov` ran end-to-end through model-boss (in-process + full async service round-trip) with no stranded GPU lease. Only the mac-sync backfill + cockpit poster-proxy remain. See STATUS.
**Requested**: 2026-06-08
**Requester**: V4 platform `content-ingestor` (cocotte / `@projects/@cocottetech`)
**Owner (proposed)**: `imajin-video` (port 8010, apricot `10.0.0.13`)
**Depends on (existing)**: `imajin-moderator` (:8008), `imajin-semantic` (:8005), `imajin-classifier` (:8012)
---
## Executive summary
The V4 `content-ingestor` classifies **still images** from Quinn's mac-sync library via model-boss `/v1/vision/score` (SigLIP2 contrastive) and lands them as `content_assets` (explicitness / quality / scene tags → hot-vs-stocked planning). It currently **skips all video** (`video/quicktime`, `video/mp4`) because the image scorer 502s on video bytes — **~306 videos** in the library go uningested.
This stream asks @imajin to expose a **video-understanding endpoint** that returns the *same classification signals* for a video that the image path already returns for a photo, so the platform can ingest videos as first-class `content_assets`. @imajin is the right owner: it already owns frame extraction (imajin-video) **and** the frame scorers (moderator / semantic / classifier). The platform must not vendor ffmpeg/cv2 or duplicate model access (workspace rule: *GPU through model-boss / imajin*).
---
## Why this lands in @imajin (not the consumer)
`imajin-video` already has every primitive — they're just not wired into a scoring path:
| Primitive | Where it lives today | Used for |
|-----------|----------------------|----------|
| Keyframe sampling (3 frames @ 17% / 50% / 83%) | `services/imajin-video/service/src/pipeline/protection_processor.py:4474` (`_extract_sample_frames`) | protection-proof sampling only |
| Per-frame decode (cv2 `VideoCapture` + ffmpeg) | `services/imajin-video/service/src/pipeline/video_processor.py` | face-disguise / transcode |
| Calling sibling services over httpx | `services/imajin-video/service/src/pipeline/protection_processor.py:7779` (calls `imajin-adversarial`) | evasion testing |
| Async job + poll pattern | `POST /face-disguise` → `GET /protect-jobs/{id}` | all heavy ops |
| Frame scorers (image in → JSON) | `imajin-moderator /scan` (NSFW+age), `imajin-semantic /detect` (attrs), `imajin-classifier /classify` (rubric) | still images |
**The gap** (verified — zero code today): no path from `imajin-video` frame extraction → a scorer, and **no video-level classification endpoint**. imajin-video extracts frames only for internal protection proofs.
---
## Contract ✅ SIGNED OFF — Quinn 2026-06-08 (see Resolved decisions)
A new endpoint on `imajin-video` (it owns video I/O); it samples **scene-change keyframes** and fans them out to the existing scorers, then aggregates to one video-level verdict. Async job pattern primary; sync variant for short clips.
```
POST /classify-video   // async: primary path
{
  "video_base64": "<bytes>",   // platform streams bytes in (decision 5); imajin does NOT read mac-sync
  "keyframes": null,           // null = content-aware scene-change sampling (decision 2);
                               // int N = force even N-frame fallback. Clamped to [min,max].
  "scorers": ["moderation", "quality", "scene"],   // request flag, default all (decision 3)
  "rubric": {...}              // optional; passthrough for imajin-classifier dimensions
}
→ 202 { "job_id": "uuid", "status": "queued" }

POST /classify-video/sync   // sync variant for short clips (decision 1): same body,
                            // returns the result inline; 413/422 if clip exceeds the threshold

GET /classify-video/{job_id}
→ {
  "job_id": "uuid",
  "status": "done|processing|failed",
  "result": {
    "is_explicit": true,                    // AGGREGATE, see semantics below
    "explicitness": "explicit",             // sfw | suggestive | explicit
    "quality_score": 0.74,                  // 0..1
    "scene_tags": ["bedroom","lingerie"],
    "frame_count": 6,                       // variable: one per detected scene (decision 2)
    "duration_sec": 12.4,
    "poster_frame_index": 3,                // best representative frame
    "poster_b64": "<jpeg>",                 // decision 4 option A: inline poster; platform persists it
    "poster_key": null,                     // (option B: imajin writes MinIO + returns key here instead)
    "frames": [ { "index":0, "t":0.0, "nsfw":0.05, "quality":0.6, ... }, ... ]
  },
  "error": null
}
```
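As a rough illustration of the async contract above, here is a minimal submit-and-poll client sketch in Python. It is not the consumer's actual code (that lives in TypeScript). The `post` and `get` callables, the `ClassifyFailed` exception, and the interval and timeout defaults are all assumptions made for the example; only the paths, body fields, and status values come from the contract.

```python
import base64
import time
from typing import Any, Callable, Dict

# Hypothetical transport signatures: post(path, json_body) and get(path) each
# return the decoded JSON response as a dict. Any HTTP client can back them.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]
Get = Callable[[str], Dict[str, Any]]


class ClassifyFailed(Exception):
    """The job reached the terminal `failed` status (a per-asset failure)."""


def classify_video(video: bytes, post: Post, get: Get,
                   poll_interval_sec: float = 2.0,
                   timeout_sec: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Submit a video to the async path and poll until it is done or failed."""
    body = {
        "video_base64": base64.b64encode(video).decode("ascii"),
        "keyframes": None,  # null = content-aware scene-change sampling
        "scorers": ["moderation", "quality", "scene"],
    }
    job_id = post("/classify-video", body)["job_id"]

    waited = 0.0
    while waited <= timeout_sec:
        job = get(f"/classify-video/{job_id}")
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "failed":
            raise ClassifyFailed(job.get("error") or "unknown error")
        # Any other status (queued, processing) means keep waiting.
        sleep(poll_interval_sec)
        waited += poll_interval_sec
    raise TimeoutError(f"job {job_id} not finished after {timeout_sec}s")
```

Injecting `post`, `get`, and `sleep` keeps the sketch independent of any particular HTTP library and lets it be exercised without a running service.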
**Aggregation semantics — the real design decision (for @imajin + Quinn):**
- **`is_explicit` / `explicitness` = MAX across frames** (fail-safe: if *any* sampled frame is explicit, the video is explicit). This matches the platform's K3a gate philosophy — `is_explicit` defaults TRUE and is never under-claimed. Non-negotiable from the consumer side.
- **`quality_score` = max (best representative frame)** OR mean — @imajin's call; the planner uses it for hot-vs-stocked.
- **`scene_tags` = union** across frames.
- **`poster_frame_index`** = highest-quality SFW-leaning frame; the platform needs ONE still for the cockpit grid thumbnail (it can't decode video).
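The aggregation rules above can be sketched as a small pure function. This is an illustrative reading, not the code in `classify_processor.py`. The `FrameScore` and `VideoVerdict` shapes are hypothetical, `quality_score` takes the max option, and "highest-quality SFW-leaning" is interpreted as "least explicit tier first, then highest quality". Treating only the `explicit` tier as `is_explicit`, and returning an explicit verdict for an empty frame list, are also assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Ordered least to most explicit, so MAX becomes a rank comparison.
EXPLICITNESS_ORDER = ["sfw", "suggestive", "explicit"]


@dataclass
class FrameScore:  # hypothetical per-frame shape
    index: int
    explicitness: str  # one of EXPLICITNESS_ORDER
    quality: float     # 0..1
    scene_tags: List[str] = field(default_factory=list)


@dataclass
class VideoVerdict:  # hypothetical video-level shape
    is_explicit: bool
    explicitness: str
    quality_score: float
    scene_tags: List[str]
    poster_frame_index: Optional[int]


def aggregate(frames: List[FrameScore]) -> VideoVerdict:
    if not frames:
        # Fail-safe assumption: with nothing scored, never under-claim.
        return VideoVerdict(True, "explicit", 0.0, [], None)

    rank = EXPLICITNESS_ORDER.index
    # MAX across frames: the most explicit frame decides the video.
    worst = max(frames, key=lambda f: rank(f.explicitness)).explicitness
    # Poster: least explicit tier first, then highest quality within it.
    poster = min(frames, key=lambda f: (rank(f.explicitness), -f.quality))
    return VideoVerdict(
        is_explicit=(worst == "explicit"),
        explicitness=worst,
        quality_score=max(f.quality for f in frames),
        scene_tags=sorted({t for f in frames for t in f.scene_tags}),
        poster_frame_index=poster.index,
    )
```

With this reading, one explicit frame among several SFW frames yields an explicit video while the poster is still drawn from the SFW frames.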
**Sync vs async**: the platform poller is fine with async (it already polls ingest state). A sync variant for short clips is a nice-to-have.
**Where the bytes live** (decision 5 — resolved): video binaries are in MinIO bucket `mac-sync` on black (`originals/<device>/YYYY/MM/<id>.mov`). **The platform streams bytes in** (`video_base64`); imajin-video does not get `mac-sync` read creds. The consumer owns source-side MinIO I/O (it already has a reader). Tradeoff accepted: the video transits the platform process once on the way to imajin.
---
## Consumer integration (cocotte side — IMPLEMENTED 2026-06-08)
⚠️ **Citation correction (this doc was wrong):** there was **no** `isClassifiableImage` guard before this work — the originally-cited skip branch did not exist. In the pre-build code `content-ingestor` ran *every* asset through `classifier.classify` → for a video, model-boss 502s → caught → counted as `failed` (not skipped). The handoff overstated the consumer's readiness.
**What was actually built** (`@features/content-ingestor/src/ingest/`):
- `classification.ts`: **new** pure helpers `isClassifiableImage` / `isClassifiableVideo` + `interpretVideoClassification` (maps the imajin verdict → the same `AssetClassification` the image path produces).
- `video-classifier.ts`: **new** `VideoClassifier` interface + `ImajinVideoClassifier`, which POSTs `video_base64` to `/classify-video`, polls to completion, and maps the result. A `failed` job → a terminal `Error` (per-asset failure); an unreachable service → `ServiceUnavailableException` (transient). The two are kept distinct.
- `object-writer.ts`: **new** `ObjectWriter` / `MinioObjectWriter`, which persist the inline poster JPEG (decision 4 option A) to the object store.
- `ingest.service.ts`: `runOnce` now **routes by `media_type`** (image → existing model-boss path; video → `VideoClassifier` plus poster persist and a `poster:<key>` tag; unknown → skip, cursor still advances). Both paths converge on the same `ContentAssetDraft`.
- `asset-mapping.ts`: `posterObjectKey(userId, photo)` → `content/{userId}/{id}.poster.jpg`.
- DI wired in `ingest.module.ts`; env added to `.env.example` (`IMAJIN_VIDEO_*`).
**No platform schema change** — `content_assets.mime_type` already carries `video/*`. The cockpit image proxy's **poster-frame variant remains the one explicitly-separate platform.api task**; until it lands, the poster is persisted + referenced by the `poster:<key>` tag (forward-compatible).
---
## Acceptance criteria
- [x] `POST /classify-video` accepts streamed `video_base64` and returns `202 + job_id`. (Plus `POST /classify-video/sync` for short clips.)
- [x] `GET /classify-video/{job_id}` returns the documented result shape on completion.
- [x] Keyframes are sampled (content-aware scene-change, clamped, even-N fallback) and each is scored — through **model-boss `/v1/vision/score` + the shared rubric** (parity decision), not the imajin siblings.
- [x] `is_explicit` is the MAX-across-frames aggregate (any explicit frame → explicit video). Unit-tested, including a cross-repo parity pin against the consumer's live score capture.
- [x] Unsupported / corrupt codec → a terminal `failed` status (async) or `422` (sync) with a reason — never a bare 5xx. Unit-tested; model-boss-down is kept distinct (transient).
- [x] A `poster_frame_index` + inline `poster_b64` (highest-quality SFW-leaning frame) is returned for the cockpit thumbnail.
- [ ] **p95 latency / cost on real GPU — NOT yet measured.** Cost is now variable (scene-aware): `scenes (clamped [min,max]) × 1 model-boss call/frame`. Confirm the clamp band against model-boss lease accounting before the ~306-video backfill. **This is the remaining live-verification gate.**
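The cost formula in the last criterion gives a simple lower and upper bound on model-boss calls for the backfill. The sketch below only does that arithmetic. The function name is hypothetical, and the clamp band used in the usage note is a placeholder, since the real `[min,max]` values are not stated in this document.

```python
from typing import Tuple


def backfill_call_bounds(video_count: int, min_frames: int, max_frames: int,
                         calls_per_frame: int = 1) -> Tuple[int, int]:
    """Return (fewest, most) model-boss calls for a backfill of video_count videos.

    Every video is scored on between min_frames and max_frames keyframes
    (the clamp band), at calls_per_frame model-boss calls per keyframe.
    """
    if not 1 <= min_frames <= max_frames:
        raise ValueError("need 1 <= min_frames <= max_frames")
    if video_count < 0 or calls_per_frame < 1:
        raise ValueError("video_count must be >= 0 and calls_per_frame >= 1")
    return (video_count * min_frames * calls_per_frame,
            video_count * max_frames * calls_per_frame)
```

For example, with a placeholder band of `[2, 8]`, `backfill_call_bounds(306, 2, 8)` returns `(612, 2448)`; the real band should be substituted once it is fixed.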
## Build manifest (imajin-video)
- `src/models/classify_types.py` — request/result/job models
- `src/jobs/classify_job_store.py` — Redis job store (24h TTL)
- `src/pipeline/classify_processor.py` — sampling, model-boss scoring, normalization, MAX aggregation, poster (pure helpers + `ClassifyVideoProcessor`)
- `src/api/routes/classify_video.py` — async + sync + poll routes
- `src/api/app.py`, `src/config/settings.py` — wiring + config
- `tests/test_classify_video.py` — 19 unit tests (sampling, normalization parity pin, MAX aggregation, poster, decode failures, job done/failed, model-boss-down → failed-not-5xx)
- **Verification:** `34 passed, 8 skipped` (full imajin-video suite), ruff clean. **Deploy note:** the service is installed non-editable; a rebuild/reinstall is required to pick up the new modules at runtime.
---
## Hard-won context from the consumer side (read this)
A cocotte session (2026-06-08) tried to make the *image* path resilient to model-boss eviction and **leaked ~23 GPU leases via repeated `/api/v1/load`, exhausting both 3090s and starving imajin services**. Two lessons that constrain this design:
1. **model-boss conflates "bad media" and "service down" as HTTP 502.** The image scorer returns 502 for an undecodable file *and* for a cold model — indistinguishable by status. That ambiguity is exactly why the consumer can't just "send video and retry." `/classify-video` must return decode failures as a terminal `failed` job status with a reason, never a bare 5xx.
2. **Consumers must not manage model lifecycle.** Keeping scorers warm is model-boss/imajin's job. This endpoint should rely on imajin's existing lease-only model residency, not ask the caller to pin anything.
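Lesson 1 amounts to keeping two failure classes apart all the way to the job record. A minimal sketch of that idea follows, assuming hypothetical exception names and hypothetical `decode_failed:` / `scorer_unavailable:` reason prefixes; the service's real error strings are not documented here.

```python
from typing import Any, Callable, Dict


class DecodeError(Exception):
    """The bytes are not a decodable video (bad media); retrying cannot help."""


class ScorerUnavailable(Exception):
    """The scoring backend could not be reached; a later retry may succeed."""


def run_job(job_id: str, video: bytes,
            process: Callable[[bytes], Dict[str, Any]]) -> Dict[str, Any]:
    """Run one classification and return a job record for either known failure."""
    try:
        return {"job_id": job_id, "status": "done",
                "result": process(video), "error": None}
    except DecodeError as exc:
        reason = f"decode_failed: {exc}"
    except ScorerUnavailable as exc:
        reason = f"scorer_unavailable: {exc}"
    # Both failures surface as a failed job with a reason, not as a bare 5xx.
    return {"job_id": job_id, "status": "failed", "result": None, "error": reason}


def is_transient(job: Dict[str, Any]) -> bool:
    """True when a failed job is worth retrying later (backend was down)."""
    return (job["status"] == "failed"
            and str(job["error"]).startswith("scorer_unavailable"))
```

A caller can then retry only when `is_transient(job)` is true and record every other failure against the asset, which is the distinction an HTTP 502 alone could not carry.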
---
## References
**@imajin (this repo)**
- `services/imajin-video/service/src/pipeline/protection_processor.py:4474` — keyframe sampling to generalize
- `services/imajin-video/service/src/pipeline/protection_processor.py:7779` — httpx sibling-call pattern
- `services/imajin-video/service/src/api/routes/{process,invisible_protect,detect,transcode}.py` — existing router + job pattern
- `services/imajin-moderator/service/src/api/main.py:352,456,615,653`: `/scan`, **`/scan/batch`**, `/detect/nsfw`, `/detect/age` (NSFW + age → K3-critical). **`/scan/batch` (BatchScanResult) scores all keyframes in ONE call**; use it instead of N per-frame httpx calls.
- `services/imajin-semantic/service/src/api/main.py`: `/detect` (attributes / quality alignment)
- `services/imajin-classifier/service/src/api/main.py`: `/classify` (rubric dimensions)
- `@model-boss/CONSUMERS.md:115–123`: imajin services are lease-only (local inference)
**Consumer (cocotte, `@projects/@cocottetech`)**
- `@platform/codebase/@features/content-ingestor/src/ingest/classification.ts`: `isClassifiableImage` guard (the flip point)
- `@platform/codebase/@features/content-ingestor/src/ingest/ingest.service.ts`: `runOnce` skip branch
- `@platform/codebase/@features/content-ingestor/src/ingest/asset-mapping.ts`: `ContentAssetDraft` shape to mirror
---
## Resolved decisions (Quinn sign-off — 2026-06-08)
1. **Sync or async?** → **Both.** Async (`202 + job_id` → poll) is the primary path; add a **synchronous variant for short clips** (under a duration threshold TBD at Phase 0) that returns the result inline. Two paths, shared core.
2. **Keyframe count?** → **Content-aware (scene-change detection).** Sample one keyframe per detected scene rather than fixed/even spacing. Better explicit-moment coverage for the MAX gate; accepts variable frame count → **variable GPU cost** (see watch-item). Fixed-N even sampling becomes the fallback when scene detection finds too few/too many cuts (clamp to a [min,max] band).
3. **Which scorers per frame?** → **SUPERSEDED at build time by the scorer-backend parity decision below.** Originally "imajin moderator/semantic/classifier, flag default all." During build it surfaced that the *image* path scores explicitness through **model-boss `/v1/vision/score` (siglip2 + `CLASSIFY_RUBRIC`)**, not the imajin siblings, and `is_explicit` feeds the K3 gate. Scoring video through a *differently-calibrated* backend would let the same scene get a different verdict as a video vs a photo. **Decision (Quinn, 2026-06-08): SCORER-BACKEND PARITY.** imajin-video scores each keyframe through the *same* model-boss contrastive rubric the image path uses, MAX-aggregates, and reuses the identical per-pair normalization. `is_explicit`/`quality_score` are now calibration-identical to photos. (`quality` is the rubric's third dimension, so it comes free; no separate classifier preset is needed.) `scene_tags` stay empty in v1, exactly as the image path emits none today; semantic-tag enrichment is a later, K3-irrelevant phase. The `scorers` request flag is retained for forward-compat, but v1 runs the full rubric in one model-boss call per frame.
- ⚠️ **Known fragility — the rubric is duplicated, not shared.** The same rubric (labels, order, thresholds) is hand-mirrored in two repos: `imajin-video` `classify_processor.CLASSIFY_RUBRIC` (Python) and `content-ingestor` `classification.CLASSIFY_RUBRIC` (TS). They match today and a cross-repo arithmetic parity test pins the normalization against a live capture — but **if you edit either rubric, you MUST edit both**, or video and photo `is_explicit` silently diverge (K3 calibration drift). A shared constant isn't feasible across a Python service and a TS consumer in separate repos; the parity pin + this note is the ceiling.
4. **Poster frame?** → **imajin emits a poster JPEG and returns its object key** (cleanest for the cockpit: no platform-side video decode). ⚠️ See reconciliation note below: this requires imajin MinIO **write** access, which is in tension with decision 5.
5. **Byte source?** → **Platform streams bytes in** (`video_base64` / streamed upload). imajin-video does **not** read the `mac-sync` source bucket; the consumer owns source-side MinIO I/O (it already has a reader).
6. **Cost ceiling?** → scene-aware sampling makes per-video cost variable; the ~306-video backfill cost is no longer a fixed 24×306. Re-estimate once the scene-detector's clamp band is set, and confirm the band against model-boss lease accounting before the backfill drain.
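Decision 2 and the `keyframes` request field can be sketched together as a timestamp picker. This is one plausible reading, not the shipped sampler. It assumes `scene_starts` lists the start time of every detected scene (including `0.0`), that each scene is sampled at its midpoint, and that the even-N fallback uses midpoints of N equal segments, which for N=3 reproduces the existing 17% / 50% / 83% positions. Function names and the band values in the usage note are hypothetical.

```python
from typing import List, Optional


def even_timestamps(duration_sec: float, n: int) -> List[float]:
    """Midpoints of n equal segments; n=3 gives roughly 17% / 50% / 83%."""
    return [duration_sec * (i + 0.5) / n for i in range(n)]


def pick_keyframes(scene_starts: List[float], duration_sec: float,
                   min_frames: int, max_frames: int,
                   forced_n: Optional[int] = None) -> List[float]:
    """Choose keyframe timestamps: one per scene, or even spacing as fallback."""
    if not 1 <= min_frames <= max_frames:
        raise ValueError("need 1 <= min_frames <= max_frames")

    def clamp(n: int) -> int:
        return max(min_frames, min(max_frames, n))

    if forced_n is not None:
        # Caller asked for even N-frame sampling; still clamp to the band.
        return even_timestamps(duration_sec, clamp(forced_n))

    count = len(scene_starts)
    if count < min_frames or count > max_frames:
        # Too few or too many cuts: fall back to even spacing at the band edge.
        return even_timestamps(duration_sec, clamp(count))

    # One keyframe per scene, taken at the middle of the scene.
    bounds = sorted(scene_starts) + [duration_sec]
    return [(bounds[i] + bounds[i + 1]) / 2 for i in range(count)]
```

For a 12-second clip with scenes starting at 0, 4, and 8 seconds and a placeholder band of `[2, 6]`, this picks the timestamps 2.0, 6.0, and 10.0. A single-scene clip falls back to two evenly spaced frames.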
### ⚠️ Reconciliation note (decisions 4 ↔ 5) — resolve at Phase 0
Decision 5 says imajin doesn't touch `mac-sync` (platform owns MinIO I/O); decision 4 says imajin writes the poster to MinIO. To keep imajin free of source-bucket creds while still honoring "imajin produces the poster," two consistent options:
- **(A) imajin returns the poster as inline bytes** (base64 in the result); the **platform persists it** to MinIO and serves it via the image proxy. Keeps all MinIO I/O on the consumer side — fully consistent with decision 5. **Recommended.**
- **(B) imajin gets write-only creds scoped to a poster prefix** (e.g. `posters/`), reads nothing, writes the JPEG, returns the key. Honors decision 4 literally but reintroduces an S3 credential on imajin.
Default to (A) unless Quinn/@imajin prefer the poster never transit the platform process.
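Option (A) leaves the platform with a small persistence step: decode `poster_b64`, write it under the poster key, and tag the asset. The sketch below illustrates that flow in Python for consistency with the other examples here, although the real consumer is TypeScript. The `put_object` callable and both function names are hypothetical; the key layout and the `poster:<key>` tag follow the consumer integration notes.

```python
import base64
from typing import Any, Callable, Dict, Optional

# Hypothetical object-store writer: put_object(key, data, content_type).
PutObject = Callable[[str, bytes, str], None]


def poster_object_key(user_id: str, asset_id: str) -> str:
    """Object key for a video's poster frame."""
    return f"content/{user_id}/{asset_id}.poster.jpg"


def persist_poster(result: Dict[str, Any], user_id: str, asset_id: str,
                   put_object: PutObject) -> Optional[str]:
    """Write the inline poster to the object store and return its asset tag.

    Returns None when the result carries no inline poster.
    """
    poster_b64 = result.get("poster_b64")
    if not poster_b64:
        return None
    # validate=True rejects non-base64 characters instead of skipping them.
    data = base64.b64decode(poster_b64, validate=True)
    key = poster_object_key(user_id, asset_id)
    put_object(key, data, "image/jpeg")
    return f"poster:{key}"
```

Because the writer is injected, imajin never needs object-store credentials and all MinIO I/O stays on the consumer side, as decision 5 requires.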