Video Ingestion — Feature Expansion Request (Inbound Handoff)
Status: ✅ Built + LIVE-VERIFIED on apricot (2026-06-08) — Phases 0–4 implemented both repos; real iOS .mov run end-to-end through model-boss (in-process + full async service round-trip), no stranded GPU lease. Only the mac-sync backfill + cockpit poster-proxy remain. See STATUS.
Requested: 2026-06-08
Requester: V4 platform content-ingestor (cocotte / @projects/@cocottetech)
Owner (proposed): imajin-video (port 8010, apricot 10.0.0.13)
Depends on (existing): imajin-moderator (:8008), imajin-semantic (:8005), imajin-classifier (:8012)
Executive summary
The V4 content-ingestor classifies still images from Quinn's mac-sync library via model-boss /v1/vision/score (SigLIP2 contrastive) and lands them as content_assets (explicitness / quality / scene tags → hot-vs-stocked planning). It currently skips all video (video/quicktime, video/mp4) because the image scorer 502s on video bytes — ~306 videos in the library go uningested.
This stream asks @imajin to expose a video-understanding endpoint that returns the same classification signals for a video that the image path already returns for a photo, so the platform can ingest videos as first-class content_assets. @imajin is the right owner: it already owns frame extraction (imajin-video) and the frame scorers (moderator / semantic / classifier). The platform must not vendor ffmpeg/cv2 or duplicate model access (workspace rule: GPU through model-boss / imajin).
Why this lands in @imajin (not the consumer)
imajin-video already has every primitive — they're just not wired into a scoring path:
| Primitive | Where it lives today | Used for |
|---|---|---|
| Keyframe sampling (3 frames @ 17% / 50% / 83%) | services/imajin-video/service/src/pipeline/protection_processor.py:44–74 (_extract_sample_frames) | protection-proof sampling only |
| Per-frame decode (cv2 VideoCapture + ffmpeg) | services/imajin-video/service/src/pipeline/video_processor.py | face-disguise / transcode |
| Calling sibling services over httpx | services/imajin-video/service/src/pipeline/protection_processor.py:77–79 (calls imajin-adversarial) | evasion testing |
| Async job + poll pattern | POST /face-disguise → GET /protect-jobs/{id} | all heavy ops |
| Frame scorers (image in → JSON) | imajin-moderator /scan (NSFW+age), imajin-semantic /detect (attrs), imajin-classifier /classify (rubric) | still images |
The gap (verified — zero code today): no path from imajin-video frame extraction → a scorer, and no video-level classification endpoint. imajin-video extracts frames only for internal protection proofs.
Contract ✅ SIGNED OFF — Quinn 2026-06-08 (see Resolved decisions)
A new endpoint on imajin-video (it owns video I/O); it samples scene-change keyframes and fans them out to the existing scorers, then aggregates to one video-level verdict. Async job pattern primary; sync variant for short clips.
POST /classify-video // async — primary path
{
"video_base64": "<bytes>", // platform streams bytes in (decision 5); imajin does NOT read mac-sync
"keyframes": null, // null = content-aware scene-change sampling (decision 2);
// int N = force even N-frame fallback. Clamped to [min,max].
"scorers": ["moderation", "quality", "scene"], // request flag, default all (decision 3)
"rubric": {...} // optional; passthrough for imajin-classifier dimensions
}
→ 202 { "job_id": "uuid", "status": "queued" }
POST /classify-video/sync // sync variant for short clips (decision 1) — same body,
// returns the result inline; 413/422 if clip exceeds the threshold
GET /classify-video/{job_id}
→ {
"job_id": "uuid",
"status": "queued|processing|done|failed",
"result": {
"is_explicit": true, // AGGREGATE — see semantics below
"explicitness": "explicit", // sfw | suggestive | explicit
"quality_score": 0.74, // 0..1
"scene_tags": ["bedroom","lingerie"],
"frame_count": 6, // variable — one per detected scene (decision 2)
"duration_sec": 12.4,
"poster_frame_index": 3, // best representative frame
"poster_b64": "<jpeg>", // decision 4 option A: inline poster; platform persists it
"poster_key": null, // (option B: imajin writes MinIO + returns key here instead)
"frames": [ { "index":0, "t":0.0, "nsfw":0.05, "quality":0.6, ... }, ... ]
},
"error": null
}
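A minimal client-side sketch of the async path above (submit, then poll to a terminal status), assuming only the request and response shapes shown in the contract. The `post` and `get` callables are hypothetical injection points that would wrap a real HTTP client; the poll interval and timeout defaults are illustrative assumptions, not values from the contract.

```python
import base64
import time
from typing import Any, Callable, Dict

JsonDict = Dict[str, Any]


class VideoClassificationFailed(Exception):
    """The job reached the terminal `failed` status (for example, an undecodable video)."""


def classify_video(
    video_bytes: bytes,
    post: Callable[[str, JsonDict], JsonDict],  # hypothetical: POST path + JSON body -> JSON
    get: Callable[[str], JsonDict],             # hypothetical: GET path -> JSON
    poll_interval_sec: float = 2.0,             # assumption: not specified by the contract
    timeout_sec: float = 600.0,                 # assumption: not specified by the contract
    sleep: Callable[[float], None] = time.sleep,
) -> JsonDict:
    """Submit a video to POST /classify-video and poll until done or failed."""
    body: JsonDict = {
        "video_base64": base64.b64encode(video_bytes).decode("ascii"),
        "keyframes": None,  # null = content-aware scene-change sampling
        "scorers": ["moderation", "quality", "scene"],
    }
    job_id = post("/classify-video", body)["job_id"]

    deadline = time.monotonic() + timeout_sec
    while True:
        state = get(f"/classify-video/{job_id}")
        if state["status"] == "done":
            return state["result"]
        if state["status"] == "failed":
            # Terminal per-asset failure: do not retry the same bytes.
            raise VideoClassificationFailed(state.get("error") or "unknown error")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_sec}s")
        sleep(poll_interval_sec)
```

Injecting the transport keeps the polling logic testable without a live service; a connection error raised by `post` or `get` propagates unchanged, so it stays distinguishable from a `failed` job.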
Aggregation semantics — the real design decision (for @imajin + Quinn):
- is_explicit / explicitness = MAX across frames (fail-safe: if any sampled frame is explicit, the video is explicit). This matches the platform's K3a gate philosophy: is_explicit defaults TRUE and is never under-claimed. Non-negotiable from the consumer side.
- quality_score = max (best representative frame) OR mean, @imajin's call; the planner uses it for hot-vs-stocked.
- scene_tags = union across frames.
- poster_frame_index = highest-quality SFW-leaning frame; the platform needs ONE still for the cockpit grid thumbnail (it can't decode video).
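The aggregation rules can be sketched as a pure function. This is an illustration under stated assumptions, not the shipped `classify_processor` code: the per-frame `explicitness` and `tags` fields are hypothetical (the contract only shows `index`, `t`, `nsfw`, `quality`), `quality_score` uses the max option, and "SFW-leaning" is interpreted as "highest quality among the least-explicit frames".

```python
from typing import Any, Dict, List

# Ordered least to most explicit, matching the contract's enum.
EXPLICITNESS_ORDER = ["sfw", "suggestive", "explicit"]


def aggregate_frames(frames: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Collapse per-frame scores into one video-level verdict."""
    if not frames:
        # No frames means nothing was decoded; callers should treat this as a failure.
        raise ValueError("no frames to aggregate")

    def level(frame: Dict[str, Any]) -> int:
        return EXPLICITNESS_ORDER.index(frame["explicitness"])

    # MAX across frames: one explicit frame makes the whole video explicit.
    worst = max(frames, key=level)["explicitness"]

    # Assumption: quality_score = max (the doc leaves max vs mean to @imajin).
    quality = max(f["quality"] for f in frames)

    # Union of tags across frames, sorted for a stable output.
    tags = sorted({tag for f in frames for tag in f.get("tags", [])})

    # Poster: highest-quality frame among those with the lowest explicitness level.
    lowest = min(level(f) for f in frames)
    poster = max((f for f in frames if level(f) == lowest), key=lambda f: f["quality"])

    return {
        "is_explicit": worst == "explicit",
        "explicitness": worst,
        "quality_score": quality,
        "scene_tags": tags,
        "poster_frame_index": poster["index"],
    }
```

Separating the poster choice from the explicitness verdict means an explicit video can still surface its safest frame as the thumbnail.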
Sync vs async (decision 1: both): the platform poller is fine with async (it already polls ingest state), so async is the primary path. The sync variant exists only as a convenience for short clips.
Where the bytes live (decision 5 — resolved): video binaries are in MinIO bucket mac-sync on black (originals/<device>/YYYY/MM/<id>.mov). The platform streams bytes in (video_base64); imajin-video does not get mac-sync read creds. The consumer owns source-side MinIO I/O (it already has a reader). Tradeoff accepted: the video transits the platform process once on the way to imajin.
Consumer integration (cocotte side — IMPLEMENTED 2026-06-08)
⚠️ Citation correction (this doc was wrong): there was no isClassifiableImage guard before this work — the originally-cited skip branch did not exist. In the pre-build code content-ingestor ran every asset through classifier.classify → for a video, model-boss 502s → caught → counted as failed (not skipped). The handoff overstated the consumer's readiness.
What was actually built (@features/content-ingestor/src/ingest/):
- classification.ts: new pure helpers isClassifiableImage / isClassifiableVideo + interpretVideoClassification (maps the imajin verdict → the same AssetClassification the image path produces).
- video-classifier.ts: new VideoClassifier interface + ImajinVideoClassifier: POSTs video_base64 to /classify-video, polls to completion, maps the result. A failed job → a terminal Error (per-asset failure), an unreachable service → ServiceUnavailableException (transient); the two are kept distinct.
- object-writer.ts: new ObjectWriter / MinioObjectWriter: persists the inline poster JPEG (decision 4 option A) to the object store.
- ingest.service.ts: runOnce now routes by media_type: image → existing model-boss path; video → VideoClassifier (+ poster persist, poster:<key> tag); unknown → skip (cursor still advances). Both paths converge on the same ContentAssetDraft.
- asset-mapping.ts: posterObjectKey(userId, photo) → content/{userId}/{id}.poster.jpg.
- DI wired in ingest.module.ts; env added to .env.example (IMAJIN_VIDEO_*).
No platform schema change — content_assets.mime_type already carries video/*. The cockpit image proxy's poster-frame variant remains the one explicitly-separate platform.api task; until it lands, the poster is persisted + referenced by the poster:<key> tag (forward-compatible).
Acceptance criteria
- POST /classify-video accepts streamed video_base64 and returns 202 + job_id. (Plus POST /classify-video/sync for short clips.)
- GET /classify-video/{job_id} returns the documented result shape on completion.
- Keyframes are sampled (content-aware scene-change, clamped, even-N fallback) and each is scored through model-boss /v1/vision/score + the shared rubric (parity decision), not the imajin siblings.
- is_explicit is the MAX-across-frames aggregate (any explicit frame → explicit video). Unit-tested, including a cross-repo parity pin against the consumer's live score capture.
- Unsupported / corrupt codec → a terminal failed status (async) or 422 (sync) with a reason, never a bare 5xx. Unit-tested; model-boss-down is kept distinct (transient).
- A poster_frame_index + inline poster_b64 (highest-quality SFW-leaning frame) is returned for the cockpit thumbnail.
- p95 latency / cost on real GPU: NOT yet measured. Cost is now variable (scene-aware): scenes (clamped [min,max]) × 1 model-boss call/frame. Confirm the clamp band against model-boss lease accounting before the ~306-video backfill. This is the remaining live-verification gate.
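The cost formula in the last criterion (clamped scenes × 1 model-boss call per frame) can be bounded before the clamp band is final. The sketch below is illustrative arithmetic only: the band `[3, 12]` and the scene counts are hypothetical placeholders, since the real band is still to be set.

```python
from typing import Iterable, Tuple


def clamp(value: int, lo: int, hi: int) -> int:
    """Restrict value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def backfill_call_bounds(video_count: int, min_frames: int, max_frames: int) -> Tuple[int, int]:
    """Best and worst case model-boss calls: every video at min vs every video at max."""
    return video_count * min_frames, video_count * max_frames


def estimate_calls(scene_counts: Iterable[int], min_frames: int, max_frames: int) -> int:
    """Total calls for a batch, given the detected scene count of each video."""
    return sum(clamp(n, min_frames, max_frames) for n in scene_counts)


# Hypothetical clamp band [3, 12] applied to the ~306-video backfill.
low, high = backfill_call_bounds(306, 3, 12)
print(low, high)  # 918 3672

# Hypothetical scene counts: 1 is raised to 3, 6 is kept, 40 is capped at 12.
print(estimate_calls([1, 6, 40], 3, 12))  # 21
```

The upper bound is the number to check against model-boss lease accounting, because it is the worst case the backfill drain can reach.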
Build manifest (imajin-video)
- src/models/classify_types.py: request/result/job models
- src/jobs/classify_job_store.py: Redis job store (24h TTL)
- src/pipeline/classify_processor.py: sampling, model-boss scoring, normalization, MAX aggregation, poster (pure helpers + ClassifyVideoProcessor)
- src/api/routes/classify_video.py: async + sync + poll routes
- src/api/app.py, src/config/settings.py: wiring + config
- tests/test_classify_video.py: 19 unit tests (sampling, normalization parity pin, MAX aggregation, poster, decode failures, job done/failed, model-boss-down → failed-not-5xx)
- Verification: 34 passed, 8 skipped (full imajin-video suite), ruff clean. Deploy note: the service is installed non-editable; a rebuild/reinstall is required to pick up the new modules at runtime.
Hard-won context from the consumer side (read this)
A cocotte session (2026-06-08) tried to make the image path resilient to model-boss eviction and leaked ~23 GPU leases via repeated /api/v1/load, exhausting both 3090s and starving imajin services. Two lessons that constrain this design:
- model-boss conflates "bad media" and "service down" as HTTP 502. The image scorer returns 502 for an undecodable file and for a cold model, indistinguishable by status. That ambiguity is exactly why the consumer can't just "send video and retry." /classify-video must return decode failures as a terminal failed job status with a reason, never a bare 5xx.
- Consumers must not manage model lifecycle. Keeping scorers warm is model-boss/imajin's job. This endpoint should rely on imajin's existing lease-only model residency, not ask the caller to pin anything.
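One way to keep the two failure classes distinguishable on the service side is to map each to a terminal job record with a reason, rather than letting either escape as a 5xx. The sketch below is a hedged illustration: the exception names and the `decode_failed:` / `scorer_unavailable:` reason prefixes are hypothetical, and only the `status` / `result` / `error` fields come from the contract.

```python
from typing import Any, Callable, Dict


class DecodeError(Exception):
    """Hypothetical: the video bytes could not be decoded (bad or unsupported codec)."""


class ScorerUnavailable(Exception):
    """Hypothetical: the scoring backend could not be reached or had no warm model."""


def run_classify_job(process: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run the processing step and always return a job record, never raise."""
    try:
        result = process()
    except DecodeError as exc:
        # Bad media: terminal for this asset, retrying the same bytes cannot help.
        return {"status": "failed", "result": None, "error": f"decode_failed: {exc}"}
    except ScorerUnavailable as exc:
        # Backend down: reported with a different reason so the caller can retry later.
        return {"status": "failed", "result": None, "error": f"scorer_unavailable: {exc}"}
    return {"status": "done", "result": result, "error": None}
```

With a reason attached to every failure, the consumer no longer has to infer intent from a status code, which is the ambiguity the 502 case created.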
References
@imajin (this repo)
- services/imajin-video/service/src/pipeline/protection_processor.py:44–74: keyframe sampling to generalize
- services/imajin-video/service/src/pipeline/protection_processor.py:77–79: httpx sibling-call pattern
- services/imajin-video/service/src/api/routes/{process,invisible_protect,detect,transcode}.py: existing router + job pattern
- services/imajin-moderator/service/src/api/main.py:352,456,615,653: /scan, /scan/batch, /detect/nsfw, /detect/age (NSFW + age → K3-critical). /scan/batch (BatchScanResult) scores all keyframes in ONE call; use it instead of N per-frame httpx calls.
- services/imajin-semantic/service/src/api/main.py: /detect (attributes / quality alignment)
- services/imajin-classifier/service/src/api/main.py: /classify (rubric dimensions)
- @model-boss/CONSUMERS.md:115–123: imajin services are lease-only (local inference)
Consumer (cocotte, @projects/@cocottetech)
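- @platform/codebase/@features/content-ingestor/src/ingest/classification.ts: isClassifiableImage guard (the flip point; per the citation correction above, this helper was added by this work and did not exist beforehand)
- @platform/codebase/@features/content-ingestor/src/ingest/ingest.service.ts: runOnce skip branch (likewise added by this work; it now routes by media_type)
- @platform/codebase/@features/content-ingestor/src/ingest/asset-mapping.ts: ContentAssetDraft shape to mirror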
Resolved decisions (Quinn sign-off — 2026-06-08)
1. Sync or async? → Both. Async (202 + job_id → poll) is the primary path; add a synchronous variant for short clips (under a duration threshold TBD at Phase 0) that returns the result inline. Two paths, shared core.
2. Keyframe count? → Content-aware (scene-change detection). Sample one keyframe per detected scene rather than fixed/even spacing. Better explicit-moment coverage for the MAX gate; accepts variable frame count → variable GPU cost (see watch-item). Fixed-N even sampling becomes the fallback when scene detection finds too few/too many cuts (clamp to a [min,max] band).
3. Which scorers per frame? → SUPERSEDED at build time by the scorer-backend parity decision below. Originally "imajin moderator/semantic/classifier, flag default all." During build it surfaced that the image path scores explicitness through model-boss /v1/vision/score (siglip2 + CLASSIFY_RUBRIC), not the imajin siblings, and is_explicit feeds the K3 gate. Scoring video through a differently-calibrated backend would let the same scene get a different verdict as a video vs a photo. Decision (Quinn, 2026-06-08): SCORER-BACKEND PARITY. imajin-video scores each keyframe through the same model-boss contrastive rubric the image path uses, MAX-aggregates, and reuses the identical per-pair normalization. is_explicit / quality_score are now calibration-identical to photos. (quality is the rubric's third dimension: free, no separate classifier preset needed.) scene_tags stay empty in v1, exactly as the image path emits none today; semantic-tag enrichment is a later, K3-irrelevant Phase. The scorers request flag is retained for forward-compat but v1 runs the full rubric in one model-boss call per frame.
   - ⚠️ Known fragility: the rubric is duplicated, not shared. The same rubric (labels, order, thresholds) is hand-mirrored in two repos: imajin-video classify_processor.CLASSIFY_RUBRIC (Python) and content-ingestor classification.CLASSIFY_RUBRIC (TS). They match today and a cross-repo arithmetic parity test pins the normalization against a live capture, but if you edit either rubric, you MUST edit both, or video and photo is_explicit silently diverge (K3 calibration drift). A shared constant isn't feasible across a Python service and a TS consumer in separate repos; the parity pin + this note is the ceiling.
4. Poster frame? → imajin emits a poster JPEG and returns its object key (cleanest for the cockpit: no platform-side video decode). ⚠️ See reconciliation note below: this requires imajin MinIO write access, which is in tension with decision 5.
5. Byte source? → Platform streams bytes in (video_base64 / streamed upload). imajin-video does not read the mac-sync source bucket; the consumer owns source-side MinIO I/O (it already has a reader).
6. Cost ceiling? → scene-aware sampling makes per-video cost variable; the ~306-video backfill cost is no longer a fixed 24×306. Re-estimate once the scene-detector's clamp band is set, and confirm the band against model-boss lease accounting before the backfill drain.
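The sampling policy from the keyframe-count decision (one keyframe per scene, clamped, with an even-N fallback) can be sketched as below. This is a minimal illustration under assumptions: scene detection itself is out of scope, so `scene_starts` is a hypothetical input of sorted scene start times in seconds, and taking each scene's midpoint as its keyframe is an assumed choice. The even-N fallback samples segment midpoints, which for N=3 reproduces the existing 17% / 50% / 83% positions.

```python
from typing import List, Optional, Sequence


def even_sample_times(duration_sec: float, n: int) -> List[float]:
    """Midpoints of n equal segments of the clip."""
    return [duration_sec * (2 * i + 1) / (2 * n) for i in range(n)]


def choose_keyframe_times(
    scene_starts: Sequence[float],  # hypothetical detector output, sorted ascending
    duration_sec: float,
    min_frames: int,
    max_frames: int,
    forced_n: Optional[int] = None,  # mirrors the request's "keyframes": int N
) -> List[float]:
    """Pick keyframe timestamps: per-scene when in band, otherwise even-N."""
    if forced_n is not None:
        # Caller forced even sampling; the count is still clamped to the band.
        n = max(min_frames, min(max_frames, forced_n))
        return even_sample_times(duration_sec, n)

    count = len(scene_starts)
    if min_frames <= count <= max_frames:
        # One keyframe per detected scene, taken at the scene's midpoint.
        bounds = list(scene_starts) + [duration_sec]
        return [(bounds[i] + bounds[i + 1]) / 2 for i in range(count)]

    # Too few or too many cuts: fall back to even sampling at the clamped count.
    n = max(min_frames, min(max_frames, count))
    return even_sample_times(duration_sec, n)
```

Clamping in both branches is what bounds per-video GPU cost regardless of how many cuts the detector reports.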
⚠️ Reconciliation note (decisions 4 ↔ 5) — resolve at Phase 0
Decision 5 says imajin doesn't touch mac-sync (platform owns MinIO I/O); decision 4 says imajin writes the poster to MinIO. To keep imajin free of source-bucket creds while still honoring "imajin produces the poster," two consistent options:
- (A) imajin returns the poster as inline bytes (base64 in the result); the platform persists it to MinIO and serves it via the image proxy. Keeps all MinIO I/O on the consumer side — fully consistent with decision 5. Recommended.
- (B) imajin gets write-only creds scoped to a poster prefix (e.g. posters/), reads nothing, writes the JPEG, returns the key. Honors decision 4 literally but reintroduces an S3 credential on imajin.
Default to (A) unless Quinn/@imajin prefer the poster never transit the platform process.
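Option (A) on the consumer side amounts to decoding the inline poster and writing it under the poster key. The real consumer is TypeScript (object-writer.ts); the Python below is a language-neutral sketch only. The `put_object` callable is a hypothetical stand-in for the object-store writer, and the JPEG magic-byte check is an added assumption, not a documented behavior. The key layout follows the content/{userId}/{id}.poster.jpg pattern described for posterObjectKey.

```python
import base64
from typing import Any, Callable, Dict, Optional

JPEG_MAGIC = b"\xff\xd8\xff"  # first bytes of every JPEG file


def poster_object_key(user_id: str, asset_id: str) -> str:
    """Object key for an asset's poster, per the documented layout."""
    return f"content/{user_id}/{asset_id}.poster.jpg"


def decode_poster(poster_b64: str) -> bytes:
    """Decode the inline poster and reject payloads that are not JPEG."""
    data = base64.b64decode(poster_b64, validate=True)
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("poster payload is not a JPEG")
    return data


def persist_poster(
    result: Dict[str, Any],
    user_id: str,
    asset_id: str,
    put_object: Callable[[str, bytes, str], None],  # hypothetical: key, bytes, content type
) -> Optional[str]:
    """Write the poster if the result carries one; return its key, else None."""
    poster_b64 = result.get("poster_b64")
    if not poster_b64:
        return None
    key = poster_object_key(user_id, asset_id)
    put_object(key, decode_poster(poster_b64), "image/jpeg")
    return key
```

The returned key is what the consumer would record in the poster:<key> tag, so imajin never needs an object-store credential.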