# Video Ingestion — Feature Expansion Request (Inbound Handoff)
|
|
**Status**: ✅ Built + LIVE-VERIFIED on apricot (2026-06-08). Phases 0–4 are implemented in both repos; a real iOS `.mov` ran end-to-end through model-boss (in-process and as a full async service round-trip) with no stranded GPU lease. Only the mac-sync backfill and the cockpit poster proxy remain (plus the p95 latency / cost gate in Acceptance criteria). See STATUS.
|
|
**Requested**: 2026-06-08
|
|
**Requester**: V4 platform `content-ingestor` (cocotte / `@projects/@cocottetech`)
|
|
**Owner (proposed)**: `imajin-video` (port 8010, apricot `10.0.0.13`)
|
|
**Depends on (existing)**: `imajin-moderator` (:8008), `imajin-semantic` (:8005), `imajin-classifier` (:8012)
|
|
---
|
|
## Executive summary
|
|
The V4 `content-ingestor` classifies **still images** from Quinn's mac-sync library via model-boss `/v1/vision/score` (SigLIP2 contrastive) and lands them as `content_assets` (explicitness / quality / scene tags → hot-vs-stocked planning). Video (`video/quicktime`, `video/mp4`) never lands: the image scorer 502s on video bytes, so the **~306 videos** in the library fail classification and go uningested.
|
|
This stream asks @imajin to expose a **video-understanding endpoint** that returns the *same classification signals* for a video that the image path already returns for a photo, so the platform can ingest videos as first-class `content_assets`. @imajin is the right owner: it already owns frame extraction (imajin-video) **and** the frame scorers (moderator / semantic / classifier). The platform must not vendor ffmpeg/cv2 or duplicate model access (workspace rule: *GPU through model-boss / imajin*).
|
|
---
|
|
## Why this lands in @imajin (not the consumer)
|
|
`imajin-video` already has every primitive — they're just not wired into a scoring path:
|
|
| Primitive | Where it lives today | Used for |
|-----------|----------------------|----------|
| Keyframe sampling (3 frames @ 17% / 50% / 83%) | `services/imajin-video/service/src/pipeline/protection_processor.py:44–74` (`_extract_sample_frames`) | protection-proof sampling only |
| Per-frame decode (cv2 `VideoCapture` + ffmpeg) | `services/imajin-video/service/src/pipeline/video_processor.py` | face-disguise / transcode |
| Calling sibling services over httpx | `services/imajin-video/service/src/pipeline/protection_processor.py:77–79` (calls `imajin-adversarial`) | evasion testing |
| Async job + poll pattern | `POST /face-disguise` → `GET /protect-jobs/{id}` | all heavy ops |
| Frame scorers (image in → JSON) | `imajin-moderator /scan` (NSFW+age), `imajin-semantic /detect` (attrs), `imajin-classifier /classify` (rubric) | still images |
|
|
**The gap** (verified — zero code today): no path from `imajin-video` frame extraction → a scorer, and **no video-level classification endpoint**. imajin-video extracts frames only for internal protection proofs.
|
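The table's 17% / 50% / 83% positions are the midpoints of three equal segments (1/6, 3/6, 5/6 of the duration). A minimal sketch of how that rule generalizes to the even N-frame fallback, assuming (hypothetically) the fallback keeps the midpoint convention; `even_sample_times` is an illustrative name, not the code in `_extract_sample_frames`:

```python
def even_sample_times(duration_sec: float, n: int) -> list[float]:
    """Timestamps (seconds) at the midpoints of n equal segments of the clip.

    For n=3 this yields 1/6, 3/6 and 5/6 of the duration (~17%, 50%, 83%),
    matching the positions in the table. Midpoints keep samples away from
    the clip's first and last frames.
    """
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    if n < 1:
        raise ValueError("n must be at least 1")
    return [duration_sec * (2 * i + 1) / (2 * n) for i in range(n)]


# Usage: a 12 s clip with the current 3-frame setting.
print(even_sample_times(12.0, 3))  # [2.0, 6.0, 10.0]
```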
|
---
|
|
## Contract ✅ SIGNED OFF — Quinn 2026-06-08 (see Resolved decisions)
|
|
A new endpoint on `imajin-video` (it owns video I/O). It samples **scene-change keyframes**, scores each one, and aggregates the frame scores into one video-level verdict. The async job pattern is primary; a sync variant serves short clips. (As built, the per-frame scorer is model-boss `/v1/vision/score`, not the imajin sibling services; see decision 3.)
|
|
```
POST /classify-video                             // async: primary path
{
  "video_base64": "<bytes>",                     // platform streams bytes in (decision 5); imajin does NOT read mac-sync
  "keyframes": null,                             // null = content-aware scene-change sampling (decision 2);
                                                 // int N = force even N-frame fallback. Clamped to [min,max].
  "scorers": ["moderation", "quality", "scene"], // request flag, default all (decision 3)
  "rubric": {...}                                // optional; passthrough for imajin-classifier dimensions
}
→ 202 { "job_id": "uuid", "status": "queued" }

POST /classify-video/sync                        // sync variant for short clips (decision 1): same body,
                                                 // returns the result inline; 413/422 if the clip exceeds the threshold

GET /classify-video/{job_id}
→ {
    "job_id": "uuid",
    "status": "done|processing|failed",
    "result": {
      "is_explicit": true,                       // AGGREGATE: see semantics below
      "explicitness": "explicit",                // sfw | suggestive | explicit
      "quality_score": 0.74,                     // 0..1
      "scene_tags": ["bedroom", "lingerie"],
      "frame_count": 6,                          // variable: one per detected scene (decision 2)
      "duration_sec": 12.4,
      "poster_frame_index": 3,                   // best representative frame
      "poster_b64": "<jpeg>",                    // decision 4 option A: inline poster; platform persists it
      "poster_key": null,                        // (option B: imajin writes MinIO + returns key here instead)
      "frames": [ { "index": 0, "t": 0.0, "nsfw": 0.05, "quality": 0.6, ... }, ... ]
    },
    "error": null
  }
```
|
|
**Aggregation semantics — the real design decision (for @imajin + Quinn):**
|
|
- **`is_explicit` / `explicitness` = MAX across frames** (fail-safe: if *any* sampled frame is explicit, the video is explicit). This matches the platform's K3a gate philosophy: `is_explicit` defaults TRUE and is never under-claimed. Non-negotiable from the consumer side.
- **`quality_score` = max (best representative frame)** OR mean: @imajin's call; the planner uses it for hot-vs-stocked.
- **`scene_tags` = union** across frames (empty in v1; see decision 3).
- **`poster_frame_index`** = highest-quality SFW-leaning frame; the platform needs ONE still for the cockpit grid thumbnail (it can't decode video).
|
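The aggregation rules above fit in one small function. This is an illustrative sketch, not the `classify_processor.py` code: `FrameVerdict`, `VideoVerdict`, and `aggregate` are hypothetical names; `quality_score` takes the "max" option; "SFW-leaning" is interpreted (assumption) as least severe explicitness first, highest quality second; and only an `explicit` frame is assumed to set `is_explicit`. The empty-input branch follows the defaults-TRUE rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Ordinal severity: MAX over this order is the fail-safe aggregate.
SEVERITY = {"sfw": 0, "suggestive": 1, "explicit": 2}


@dataclass
class FrameVerdict:                 # hypothetical per-frame shape
    index: int
    explicitness: str               # "sfw" | "suggestive" | "explicit"
    quality: float                  # 0..1
    tags: list[str] = field(default_factory=list)


@dataclass
class VideoVerdict:                 # subset of the contract's result fields
    is_explicit: bool
    explicitness: str
    quality_score: float
    scene_tags: list[str]
    poster_frame_index: int | None


def aggregate(frames: list[FrameVerdict]) -> VideoVerdict:
    if not frames:
        # Nothing was scored: fail safe and never under-claim explicitness.
        return VideoVerdict(True, "explicit", 0.0, [], None)
    worst = max(frames, key=lambda f: SEVERITY[f.explicitness])
    # Poster: least severe frame first, highest quality as the tie-breaker.
    poster = min(frames, key=lambda f: (SEVERITY[f.explicitness], -f.quality))
    return VideoVerdict(
        is_explicit=worst.explicitness == "explicit",   # assumption: suggestive alone is not explicit
        explicitness=worst.explicitness,
        quality_score=max(f.quality for f in frames),    # the "max" option
        scene_tags=sorted({t for f in frames for t in f.tags}),  # union
        poster_frame_index=poster.index,
    )
```

If @imajin picks the mean instead, only the `quality_score` line changes; the explicitness MAX stays fixed by the consumer's requirement.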
|
**Sync vs async**: the platform poller is fine with async (it already polls ingest state). A sync variant for short clips was requested as a nice-to-have and is included (decision 1).
|
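The consumer side of the async path is a bounded poll loop over `GET /classify-video/{job_id}`. A minimal sketch with the HTTP call injected as `get_job`, so the transport stays out of the example; `wait_for_job`, `JobFailed`, and the interval/timeout defaults are assumptions, not the `ImajinVideoClassifier` code. A `failed` job is treated as terminal, in line with the failed-vs-unavailable split the consumer keeps.

```python
import time
from typing import Any, Callable, Dict


class JobFailed(Exception):
    """Terminal per-asset failure (e.g. undecodable video): do not retry."""


def wait_for_job(
    get_job: Callable[[str], Dict[str, Any]],   # wraps GET /classify-video/{job_id}
    job_id: str,
    *,
    interval_sec: float = 2.0,                  # assumed poll interval
    timeout_sec: float = 300.0,                 # assumed overall budget
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job is done (return its result) or failed (raise JobFailed)."""
    deadline = clock() + timeout_sec
    while True:
        job = get_job(job_id)
        status = job.get("status")
        if status == "done":
            return job["result"]
        if status == "failed":
            raise JobFailed(job.get("error") or "classification failed")
        # queued / processing: wait, unless the budget is spent.
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_sec}s")
        sleep(interval_sec)
```

A timeout here says nothing about the asset itself, so it is better handled like a transient outage than like a `failed` job.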
|
**Where the bytes live** (decision 5 — resolved): video binaries are in MinIO bucket `mac-sync` on black (`originals/<device>/YYYY/MM/<id>.mov`). **The platform streams bytes in** (`video_base64`); imajin-video does not get `mac-sync` read creds. The consumer owns source-side MinIO I/O (it already has a reader). Tradeoff accepted: the video transits the platform process once on the way to imajin.
|
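Because the platform streams bytes in, the request body is assembled on the consumer side from bytes it already read from MinIO. A minimal sketch, assuming the JSON body shown in the contract (field names come from the contract; `build_classify_request` and `base64_len` are hypothetical helpers). The size helper makes the transit cost concrete: base64 inflates the payload by 4/3.

```python
from __future__ import annotations

import base64


def build_classify_request(video_bytes: bytes, keyframes: int | None = None) -> dict:
    """Body for POST /classify-video built from bytes the platform already read."""
    if not video_bytes:
        raise ValueError("empty video payload")
    return {
        "video_base64": base64.b64encode(video_bytes).decode("ascii"),
        "keyframes": keyframes,                          # None -> scene-change sampling
        "scorers": ["moderation", "quality", "scene"],   # default: all
    }


def base64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes: 4 * ceil(n / 3)."""
    return 4 * ((n_bytes + 2) // 3)


# A 30,000,000-byte .mov travels as a 40,000,000-character string.
print(base64_len(30_000_000))  # 40000000
```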
|
---
|
|
## Consumer integration (cocotte side — IMPLEMENTED 2026-06-08)
|
|
⚠️ **Citation correction (this doc was wrong):** there was **no** `isClassifiableImage` guard before this work — the originally-cited skip branch did not exist. In the pre-build code `content-ingestor` ran *every* asset through `classifier.classify` → for a video, model-boss 502s → caught → counted as `failed` (not skipped). The handoff overstated the consumer's readiness.
|
|
**What was actually built** (`@features/content-ingestor/src/ingest/`):
|
|
- `classification.ts`: **new** pure helpers `isClassifiableImage` / `isClassifiableVideo` + `interpretVideoClassification` (maps the imajin verdict to the same `AssetClassification` the image path produces).
- `video-classifier.ts`: **new** `VideoClassifier` interface + `ImajinVideoClassifier`: POSTs `video_base64` to `/classify-video`, polls to completion, and maps the result. A `failed` job becomes a terminal `Error` (per-asset failure); an unreachable service becomes `ServiceUnavailableException` (transient). The two are kept distinct.
- `object-writer.ts`: **new** `ObjectWriter` / `MinioObjectWriter`: persists the inline poster JPEG (decision 4 option A) to the object store.
- `ingest.service.ts`: `runOnce` now **routes by `media_type`**: image → existing model-boss path; video → `VideoClassifier` (+ poster persist, `poster:<key>` tag); unknown → skip (cursor still advances). Both paths converge on the same `ContentAssetDraft`.
- `asset-mapping.ts`: `posterObjectKey(userId, photo)` → `content/{userId}/{id}.poster.jpg`.
- DI wired in `ingest.module.ts`; env vars added to `.env.example` (`IMAJIN_VIDEO_*`).
|
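The `runOnce` routing rule reduces to a pure function of the media type. A Python sketch for illustration only (the shipped helpers are the TypeScript `isClassifiableImage` / `isClassifiableVideo`); `route_for`, `Route`, and the MIME set are assumptions, with the two video types taken from the executive summary, and treating every `image/*` as classifiable is also an assumption.

```python
from __future__ import annotations

from enum import Enum


class Route(Enum):
    IMAGE = "image"   # existing model-boss /v1/vision/score path
    VIDEO = "video"   # VideoClassifier -> imajin-video /classify-video
    SKIP = "skip"     # unknown media: skip, cursor still advances


# Illustrative allow-list: the two container types named in this doc.
VIDEO_TYPES = {"video/quicktime", "video/mp4"}


def route_for(mime_type: str | None) -> Route:
    """Pick the ingest path from a MIME type, ignoring parameters and case."""
    base = (mime_type or "").split(";", 1)[0].strip().lower()
    if base in VIDEO_TYPES:
        return Route.VIDEO
    if base.startswith("image/"):   # assumption: every image/* goes to model-boss
        return Route.IMAGE
    return Route.SKIP
```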
|
**No platform schema change** — `content_assets.mime_type` already carries `video/*`. The cockpit image proxy's **poster-frame variant remains the one explicitly-separate platform.api task**; until it lands, the poster is persisted + referenced by the `poster:<key>` tag (forward-compatible).
|
|
---
|
|
## Acceptance criteria
|
|
- [x] `POST /classify-video` accepts streamed `video_base64` and returns `202 + job_id`. (Plus `POST /classify-video/sync` for short clips.)
- [x] `GET /classify-video/{job_id}` returns the documented result shape on completion.
- [x] Keyframes are sampled (content-aware scene-change, clamped, even-N fallback) and each is scored through **model-boss `/v1/vision/score` + the same rubric as the image path** (parity decision; the rubric is mirrored across repos, not shared), not the imajin siblings.
- [x] `is_explicit` is the MAX-across-frames aggregate (any explicit frame → explicit video). Unit-tested, including a cross-repo parity pin against the consumer's live score capture.
- [x] Unsupported / corrupt codec → a terminal `failed` status (async) or `422` (sync) with a reason, never a bare 5xx. Unit-tested; model-boss-down is kept distinct (transient).
- [x] A `poster_frame_index` + inline `poster_b64` (highest-quality SFW-leaning frame) is returned for the cockpit thumbnail.
- [ ] **p95 latency / cost on real GPU: NOT yet measured.** Cost is now variable (scene-aware): `scenes (clamped [min,max]) × 1 model-boss call/frame`. Confirm the clamp band against model-boss lease accounting before the ~306-video backfill. **This is the remaining live-verification gate.**
|
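The cost formula in the last criterion can be made concrete before the band is chosen. A sketch, assuming the clamp is a plain `max(min, min(scenes, max))` and one model-boss call per scored frame, as the criterion states; the function names are hypothetical and the `[3, 12]` band in the usage line is a placeholder, since the real band is not set yet.

```python
def frames_for_video(scene_count: int, min_frames: int, max_frames: int) -> int:
    """Keyframes scored for one video: its scene count clamped to [min, max]."""
    if not 1 <= min_frames <= max_frames:
        raise ValueError("need 1 <= min_frames <= max_frames")
    return max(min_frames, min(scene_count, max_frames))


def backfill_calls(scene_counts: list[int], min_frames: int, max_frames: int) -> int:
    """Total model-boss calls at one call per scored frame."""
    return sum(frames_for_video(s, min_frames, max_frames) for s in scene_counts)


def backfill_bounds(n_videos: int, min_frames: int, max_frames: int) -> tuple[int, int]:
    """Best/worst case before any scene counts are known."""
    return n_videos * min_frames, n_videos * max_frames


# Placeholder band [3, 12] (the real band is not set yet):
print(backfill_bounds(306, 3, 12))  # (918, 3672)
```

Running the scene detector alone over the library first would replace the bounds with the exact `backfill_calls` total.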
|
## Build manifest (imajin-video)
|
|
- `src/models/classify_types.py`: request/result/job models
- `src/jobs/classify_job_store.py`: Redis job store (24h TTL)
- `src/pipeline/classify_processor.py`: sampling, model-boss scoring, normalization, MAX aggregation, poster (pure helpers + `ClassifyVideoProcessor`)
- `src/api/routes/classify_video.py`: async + sync + poll routes
- `src/api/app.py`, `src/config/settings.py`: wiring + config
- `tests/test_classify_video.py`: 19 unit tests (sampling, normalization parity pin, MAX aggregation, poster, decode failures, job done/failed, model-boss-down → failed-not-5xx)
- **Verification:** `34 passed, 8 skipped` (full imajin-video suite), ruff clean. **Deploy note:** the service is installed non-editable, so a rebuild/reinstall is required to pick up the new modules at runtime.
|
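The job lifecycle behind the async routes can be sketched without Redis. An in-memory stand-in, assuming (not confirmed by the manifest) that the 24h TTL is fixed at creation and that status moves `queued → processing → done | failed`; `InMemoryJobStore` and its methods are hypothetical, not the API of `classify_job_store.py`. The practical consequence for the consumer: a poll after expiry finds no job.

```python
from __future__ import annotations

import time
import uuid
from typing import Any, Callable

JOB_TTL_SEC = 24 * 60 * 60  # 24h, as in the manifest

# Assumed lifecycle; an early decode error may fail a job straight from "queued".
_ALLOWED = {"queued": {"processing", "failed"}, "processing": {"done", "failed"}}


class InMemoryJobStore:
    """Hypothetical stand-in for the Redis job store; not its real API."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._jobs: dict[str, tuple[float, dict[str, Any]]] = {}

    def create(self) -> str:
        job_id = str(uuid.uuid4())
        job = {"job_id": job_id, "status": "queued", "result": None, "error": None}
        self._jobs[job_id] = (self._clock() + JOB_TTL_SEC, job)  # expiry fixed at creation
        return job_id

    def get(self, job_id: str) -> dict[str, Any] | None:
        entry = self._jobs.get(job_id)
        if entry is None or self._clock() >= entry[0]:
            self._jobs.pop(job_id, None)  # expired or unknown: gone, like a lapsed TTL
            return None
        return entry[1]

    def transition(self, job_id: str, status: str, **fields: Any) -> None:
        job = self.get(job_id)
        if job is None:
            raise KeyError(job_id)
        if status not in _ALLOWED.get(job["status"], set()):
            raise ValueError(f"illegal transition {job['status']} -> {status}")
        job.update(fields, status=status)  # expiry is left unchanged
```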
|
---
|
|
## Hard-won context from the consumer side (read this)
|
|
A cocotte session (2026-06-08) tried to make the *image* path resilient to model-boss eviction and **leaked ~23 GPU leases via repeated `/api/v1/load`, exhausting both 3090s and starving imajin services**. Two lessons that constrain this design:
|
|
1. **model-boss conflates "bad media" and "service down" as HTTP 502.** The image scorer returns 502 both for an undecodable file *and* for a cold model, so the two are indistinguishable by status. That ambiguity is exactly why the consumer can't just "send video and retry." `/classify-video` must return decode failures as a terminal `failed` job status with a reason, never a bare 5xx.
2. **Consumers must not manage model lifecycle.** Keeping scorers warm is model-boss/imajin's job. This endpoint should rely on imajin's existing lease-only model residency, not ask the caller to pin anything.
|
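Lesson 1 becomes an explicit mapping from failure kind to outcome, so bad media and service-down never collapse into one status again. A sketch under assumptions: `DecodeError`, `ScorerUnavailable`, `FailureOutcome`, and the `retryable` flag are hypothetical; the `failed` job status and `422` for sync decode failures come from the acceptance criteria, while `503` on the sync route when model-boss is down is an assumed choice.

```python
from dataclasses import dataclass


class DecodeError(Exception):
    """Bad media: the bytes are not a decodable video. Terminal."""


class ScorerUnavailable(Exception):
    """Service down: model-boss unreachable or cold. Transient."""


@dataclass(frozen=True)
class FailureOutcome:
    job_status: str    # recorded on the async job
    sync_http: int     # returned by POST /classify-video/sync
    retryable: bool    # hypothetical flag telling the consumer how to react
    reason: str


def classify_failure(exc: Exception) -> FailureOutcome:
    """Map an exception to an explicit outcome; never collapse both kinds into one 502."""
    if isinstance(exc, DecodeError):
        return FailureOutcome("failed", 422, False, f"undecodable video: {exc}")
    if isinstance(exc, ScorerUnavailable):
        # Assumption: 503 on the sync route; the async job still ends "failed",
        # but with a distinct reason and retryable=True.
        return FailureOutcome("failed", 503, True, f"scorer unavailable: {exc}")
    raise exc  # unexpected: surface it rather than mislabel it
```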
|
---
|
|
## References
|
|
**@imajin (this repo)**
|
|
- `services/imajin-video/service/src/pipeline/protection_processor.py:44–74`: keyframe sampling to generalize
- `services/imajin-video/service/src/pipeline/protection_processor.py:77–79`: httpx sibling-call pattern
- `services/imajin-video/service/src/api/routes/{process,invisible_protect,detect,transcode}.py`: existing router + job pattern
- `services/imajin-moderator/service/src/api/main.py:352,456,615,653`: `/scan`, **`/scan/batch`**, `/detect/nsfw`, `/detect/age` (NSFW + age → K3-critical). **`/scan/batch` (BatchScanResult) scores all keyframes in ONE call**; use it instead of N per-frame httpx calls if the imajin siblings are ever used for scoring (v1 scores through model-boss; see decision 3).
- `services/imajin-semantic/service/src/api/main.py`: `/detect` (attributes / quality alignment)
- `services/imajin-classifier/service/src/api/main.py`: `/classify` (rubric dimensions)
- `@model-boss/CONSUMERS.md:115–123`: imajin services are lease-only (local inference)
|
|
**Consumer (cocotte, `@projects/@cocottetech`)**
|
|
- `@platform/codebase/@features/content-ingestor/src/ingest/classification.ts`: `isClassifiableImage` / `isClassifiableVideo` helpers (added by this work; see the citation correction above)
- `@platform/codebase/@features/content-ingestor/src/ingest/ingest.service.ts`: `runOnce` media-type routing (there was no pre-build skip branch)
- `@platform/codebase/@features/content-ingestor/src/ingest/asset-mapping.ts`: `ContentAssetDraft` shape to mirror
|
|
---
|
|
## Resolved decisions (Quinn sign-off — 2026-06-08)
|
|
1. **Sync or async?** → **Both.** Async (`202 + job_id` → poll) is the primary path; add a **synchronous variant for short clips** (under a duration threshold TBD at Phase 0) that returns the result inline. Two paths, shared core.
2. **Keyframe count?** → **Content-aware (scene-change detection).** Sample one keyframe per detected scene rather than fixed/even spacing. This gives better coverage of explicit moments for the MAX gate, at the price of a variable frame count and therefore **variable GPU cost** (see decision 6). Fixed-N even sampling becomes the fallback when scene detection finds too few/too many cuts (clamp to a [min,max] band).
3. **Which scorers per frame?** → **SUPERSEDED at build time by the scorer-backend parity decision below.** Originally "imajin moderator/semantic/classifier, flag default all." During the build it surfaced that the *image* path scores explicitness through **model-boss `/v1/vision/score` (siglip2 + `CLASSIFY_RUBRIC`)**, not the imajin siblings, and `is_explicit` feeds the K3 gate. Scoring video through a *differently calibrated* backend would let the same scene get a different verdict as a video than as a photo. **Decision (Quinn, 2026-06-08): SCORER-BACKEND PARITY.** imajin-video scores each keyframe through the *same* model-boss contrastive rubric the image path uses, MAX-aggregates, and reuses the identical per-pair normalization. `is_explicit`/`quality_score` are now calibration-identical to photos. (`quality` is the rubric's third dimension: it comes for free, no separate classifier preset needed.) `scene_tags` stay empty in v1, exactly as the image path emits none today; semantic-tag enrichment is a later, K3-irrelevant phase. The `scorers` request flag is retained for forward compatibility, but v1 runs the full rubric in one model-boss call per frame.
   - ⚠️ **Known fragility: the rubric is duplicated, not shared.** The same rubric (labels, order, thresholds) is hand-mirrored in two repos: `imajin-video` `classify_processor.CLASSIFY_RUBRIC` (Python) and `content-ingestor` `classification.CLASSIFY_RUBRIC` (TS). They match today, and a cross-repo arithmetic parity test pins the normalization against a live capture, but **if you edit either rubric, you MUST edit both**, or video and photo `is_explicit` silently diverge (K3 calibration drift). A shared constant isn't feasible across a Python service and a TS consumer in separate repos; the parity pin + this note is the ceiling.
4. **Poster frame?** → **imajin emits a poster JPEG and returns its object key** (cleanest for the cockpit: no platform-side video decode). ⚠️ This requires imajin MinIO **write** access, which is in tension with decision 5; see the reconciliation note below. As built, option (A) applies: imajin returns the poster inline as `poster_b64` and the platform persists it.
5. **Byte source?** → **Platform streams bytes in** (`video_base64` / streamed upload). imajin-video does **not** read the `mac-sync` source bucket; the consumer owns source-side MinIO I/O (it already has a reader).
6. **Cost ceiling?** → Scene-aware sampling makes per-video cost variable, so the ~306-video backfill cost is no longer a fixed 24×306. Re-estimate once the scene detector's clamp band is set, and confirm the band against model-boss lease accounting before the backfill drain.
|
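Decision 2's sampling policy (one keyframe per detected scene, even-N fallback outside a [min,max] band) can be sketched in a few lines. All of the following are assumptions, not what `classify_processor.py` necessarily does: scenes are cut where the L1 distance between consecutive normalized frame histograms exceeds a threshold (one common scene-cut heuristic), each scene's keyframe is its temporal midpoint, and outside the band the fallback samples N = clamp(scenes, min, max) evenly. Decoding is out of scope; `times` and `hists` stand in for whatever the decoder produces.

```python
def scene_cuts(times: list[float], hists: list[list[float]], threshold: float) -> list[float]:
    """Cut timestamps where consecutive normalized histograms differ (L1) by more than threshold."""
    cuts = []
    for i in range(1, len(hists)):
        distance = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if distance > threshold:
            cuts.append(times[i])
    return cuts


def select_keyframes(duration: float, cuts: list[float], min_frames: int, max_frames: int) -> list[float]:
    """One keyframe per scene inside [min_frames, max_frames]; even-N fallback outside it."""
    if not 1 <= min_frames <= max_frames:
        raise ValueError("need 1 <= min_frames <= max_frames")
    inner = sorted({c for c in cuts if 0.0 < c < duration})   # dedupe, drop edge cuts
    edges = [0.0, *inner, duration]
    scenes = list(zip(edges, edges[1:]))
    if min_frames <= len(scenes) <= max_frames:
        return [(start + end) / 2 for start, end in scenes]     # scene midpoints
    n = max(min_frames, min(len(scenes), max_frames))           # clamp to the band
    return [duration * (2 * i + 1) / (2 * n) for i in range(n)]  # even fallback


# Two scenes in a 6 s clip, band [2, 8]:
print(select_keyframes(6.0, [3.0], 2, 8))  # [1.5, 4.5]
```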
|
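One way to make the "edit both" rule fail loudly while staying within that ceiling (no shared constant): each repo pins a fingerprint of its own rubric in a test, alongside the existing arithmetic parity pin. A sketch; `rubric_fingerprint` and the rubric shape are hypothetical placeholders (the real `CLASSIFY_RUBRIC` is not reproduced), and the TypeScript side would need a serializer that emits byte-identical canonical JSON.

```python
import hashlib
import json

# Placeholder shape only: the real CLASSIFY_RUBRIC (labels, order, thresholds) is not reproduced here.
EXAMPLE_RUBRIC = {
    "dimensions": [
        {"name": "dim_a", "labels": ["a_pos", "a_neg"], "threshold": 0.5},
        {"name": "dim_b", "labels": ["b_pos", "b_neg"], "threshold": 0.5},
        {"name": "quality", "labels": ["q_pos", "q_neg"], "threshold": 0.5},
    ]
}


def rubric_fingerprint(rubric: dict) -> str:
    """SHA-256 of canonical JSON: object keys sorted, no whitespace, list order kept."""
    canonical = json.dumps(rubric, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Each repo would pin the same hex digest in a test, e.g.
#   assert rubric_fingerprint(CLASSIFY_RUBRIC) == PINNED_RUBRIC_SHA256
```

Label order is deliberately part of the hash because the rubric's order matters. One cross-language trap: Python's `json.dumps(1.0)` emits `1.0` where JavaScript's `JSON.stringify(1.0)` emits `1`, so thresholds need a representation both sides print identically.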
### ⚠️ Reconciliation note (decisions 4 ↔ 5): resolved in favor of (A)
|
|
Decision 5 says imajin doesn't touch `mac-sync` (platform owns MinIO I/O); decision 4 says imajin writes the poster to MinIO. To keep imajin free of source-bucket creds while still honoring "imajin produces the poster," two consistent options:
|
|
- **(A) imajin returns the poster as inline bytes** (base64 in the result); the **platform persists it** to MinIO and serves it via the image proxy. This keeps all MinIO I/O on the consumer side, fully consistent with decision 5. **Recommended.**
- **(B) imajin gets write-only creds scoped to a poster prefix** (e.g. `posters/`), reads nothing, writes the JPEG, and returns the key. This honors decision 4 literally but reintroduces an S3 credential on imajin.
|
|
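Default to (A) unless Quinn/@imajin prefer that the poster never transit the platform process. (A) is what was built: `poster_b64` in the result, persisted by the consumer's `MinioObjectWriter`.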
|
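Option (A) leaves the consumer with one small, testable step: decode the inline poster, store it, and reference it. A Python sketch of that step for illustration (the shipped code is the TypeScript `MinioObjectWriter` plus `posterObjectKey`); the key layout and the `poster:<key>` tag come from the consumer-integration notes, while `persist_poster`, `poster_object_key`, and the injected `write(key, data, content_type)` callable are hypothetical.

```python
from __future__ import annotations

import base64
import binascii
from typing import Callable

JPEG_SOI = b"\xff\xd8"  # JPEG files begin with the Start Of Image marker


def poster_object_key(user_id: str, asset_id: str) -> str:
    """Same layout as the consumer's posterObjectKey: content/{userId}/{id}.poster.jpg."""
    return f"content/{user_id}/{asset_id}.poster.jpg"


def persist_poster(
    result: dict,
    user_id: str,
    asset_id: str,
    write: Callable[[str, bytes, str], None],   # hypothetical object-store writer
) -> str | None:
    """Decode the inline poster, store it, and return the `poster:<key>` tag (or None)."""
    poster_b64 = result.get("poster_b64")
    if not poster_b64:
        return None                              # no poster returned
    try:
        data = base64.b64decode(poster_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("poster_b64 is not valid base64") from exc
    if not data.startswith(JPEG_SOI):
        raise ValueError("poster is not a JPEG")
    key = poster_object_key(user_id, asset_id)
    write(key, data, "image/jpeg")
    return f"poster:{key}"
```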