2026-06-08 09:31:52 -07:00


Video Ingestion — Feature Expansion Request (Inbound Handoff)

Status: Built + LIVE-VERIFIED on apricot (2026-06-08). Phases 0–4 are implemented in both repos; a real iOS .mov ran end-to-end through model-boss (in-process and the full async service round-trip) with no stranded GPU lease. Only the mac-sync backfill and the cockpit poster-proxy remain. See STATUS.
Requested: 2026-06-08
Requester: V4 platform content-ingestor (cocotte / @projects/@cocottetech)
Owner (proposed): imajin-video (port 8010, apricot 10.0.0.13)
Depends on (existing): imajin-moderator (:8008), imajin-semantic (:8005), imajin-classifier (:8012)


Executive summary

The V4 content-ingestor classifies still images from Quinn's mac-sync library via model-boss /v1/vision/score (SigLIP2 contrastive) and lands them as content_assets (explicitness / quality / scene tags → hot-vs-stocked planning). Before this work it could not ingest any video (video/quicktime, video/mp4): the image scorer returns 502 on video bytes, so the ~306 videos in the library went uningested.

This stream asks @imajin to expose a video-understanding endpoint that returns the same classification signals for a video that the image path already returns for a photo, so the platform can ingest videos as first-class content_assets. @imajin is the right owner: it already owns frame extraction (imajin-video) and the frame scorers (moderator / semantic / classifier). The platform must not vendor ffmpeg/cv2 or duplicate model access (workspace rule: GPU through model-boss / imajin).


Why this lands in @imajin (not the consumer)

imajin-video already has every primitive — they're just not wired into a scoring path:

| Primitive | Where it lives today | Used for |
|---|---|---|
| Keyframe sampling (3 frames @ 17% / 50% / 83%) | services/imajin-video/service/src/pipeline/protection_processor.py:44–74 (_extract_sample_frames) | protection-proof sampling only |
| Per-frame decode (cv2 VideoCapture + ffmpeg) | services/imajin-video/service/src/pipeline/video_processor.py | face-disguise / transcode |
| Calling sibling services over httpx | services/imajin-video/service/src/pipeline/protection_processor.py:77–79 (calls imajin-adversarial) | evasion testing |
| Async job + poll pattern | POST /face-disguise → GET /protect-jobs/{id} | all heavy ops |
| Frame scorers (image in → JSON) | imajin-moderator /scan (NSFW+age), imajin-semantic /detect (attrs), imajin-classifier /classify (rubric) | still images |

The gap (verified — zero code today): no path from imajin-video frame extraction → a scorer, and no video-level classification endpoint. imajin-video extracts frames only for internal protection proofs.


Contract SIGNED OFF — Quinn 2026-06-08 (see Resolved decisions)

A new endpoint on imajin-video (it owns video I/O); it samples scene-change keyframes and fans them out to the existing scorers, then aggregates to one video-level verdict. Async job pattern primary; sync variant for short clips.

POST /classify-video                 // async — primary path
{
  "video_base64": "<bytes>",   // platform streams bytes in (decision 5); imajin does NOT read mac-sync
  "keyframes":   null,         // null = content-aware scene-change sampling (decision 2);
                               //   int N = force even N-frame fallback. Clamped to [min,max].
  "scorers":     ["moderation", "quality", "scene"],  // request flag, default all (decision 3)
  "rubric":      {...}         // optional; passthrough for imajin-classifier dimensions
}
→ 202 { "job_id": "uuid", "status": "queued" }

POST /classify-video/sync            // sync variant for short clips (decision 1) — same body,
                                     //   returns the result inline; 413/422 if clip exceeds the threshold

GET /classify-video/{job_id}
→ {
  "job_id": "uuid",
  "status": "done|processing|failed",
  "result": {
    "is_explicit":   true,             // AGGREGATE — see semantics below
    "explicitness":  "explicit",       // sfw | suggestive | explicit
    "quality_score": 0.74,             // 0..1
    "scene_tags":    ["bedroom","lingerie"],
    "frame_count":   6,                // variable — one per detected scene (decision 2)
    "duration_sec":  12.4,
    "poster_frame_index": 3,           // best representative frame
    "poster_b64":    "<jpeg>",         // decision 4 option A: inline poster; platform persists it
    "poster_key":    null,             //   (option B: imajin writes MinIO + returns key here instead)
    "frames": [ { "index":0, "t":0.0, "nsfw":0.05, "quality":0.6, ... }, ... ]
  },
  "error": null
}
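A minimal Python sketch of a caller building the POST /classify-video body and reading a GET /classify-video/{job_id} response. Field names follow the contract above; the ClassifyResult dataclass, the build_request / parse_job helpers and the field types are assumptions. The optional rubric passthrough is omitted and no HTTP client is included.

```python
import base64
import json
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ClassifyResult:
    # Field names mirror the documented result shape; the types are assumptions.
    is_explicit: bool
    explicitness: str              # "sfw" | "suggestive" | "explicit"
    quality_score: float           # 0..1
    scene_tags: list[str]
    frame_count: int
    duration_sec: float
    poster_frame_index: Optional[int] = None
    poster_b64: Optional[str] = None
    poster_key: Optional[str] = None
    frames: list[dict] = field(default_factory=list)


def build_request(video: bytes, keyframes: Optional[int] = None,
                  scorers: Optional[list[str]] = None) -> str:
    """Serialize a POST /classify-video body (hypothetical helper)."""
    return json.dumps({
        "video_base64": base64.b64encode(video).decode("ascii"),
        # None -> scene-change sampling; an int forces even N (the server clamps it).
        "keyframes": keyframes,
        "scorers": scorers if scorers is not None else ["moderation", "quality", "scene"],
    })


def parse_job(payload: str) -> Optional[ClassifyResult]:
    """Return the result of a finished job, or None while it is queued/processing."""
    job = json.loads(payload)
    if job["status"] == "failed":
        raise RuntimeError(f"classify job {job['job_id']} failed: {job.get('error')}")
    if job["status"] != "done":
        return None
    # Ignore any result keys this sketch does not model.
    known = {f.name for f in fields(ClassifyResult)}
    return ClassifyResult(**{k: v for k, v in job["result"].items() if k in known})
```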

Aggregation semantics — the real design decision (for @imajin + Quinn):

  • is_explicit / explicitness = MAX across frames (fail-safe: if any sampled frame is explicit, the video is explicit). This matches the platform's K3a gate philosophy — is_explicit defaults TRUE and is never under-claimed. Non-negotiable from the consumer side.
  • quality_score = max (best representative frame) OR mean — @imajin's call; the planner uses it for hot-vs-stocked.
  • scene_tags = union across frames.
  • poster_frame_index = highest-quality SFW-leaning frame; the platform needs ONE still for the cockpit grid thumbnail (it can't decode video).
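A minimal sketch of these aggregation rules, assuming each frame has already been mapped to one of the contract's explicitness tiers plus a 0..1 quality. FrameScore, aggregate and the quality_mode switch are hypothetical; the real ClassifyVideoProcessor normalizes model-boss scores first and may differ in detail. Whether "suggestive" alone sets is_explicit is not specified here, so the sketch assumes only "explicit" does.

```python
from dataclasses import dataclass

# Severity order for the MAX gate; the tier names come from the contract.
EXPLICITNESS_RANK = {"sfw": 0, "suggestive": 1, "explicit": 2}


@dataclass(frozen=True)
class FrameScore:
    index: int
    explicitness: str           # per-frame tier: "sfw" | "suggestive" | "explicit"
    quality: float              # 0..1
    tags: frozenset = frozenset()


def aggregate(frames: list[FrameScore], quality_mode: str = "max") -> dict:
    if not frames:
        # No scored frame means no verdict: raise, never fall back to an SFW default.
        raise ValueError("no frames scored")
    # MAX across frames: the most explicit frame decides the video's tier.
    worst = max(frames, key=lambda f: EXPLICITNESS_RANK[f.explicitness])
    qualities = [f.quality for f in frames]
    if quality_mode == "max":
        quality = max(qualities)
    else:
        quality = sum(qualities) / len(qualities)
    # Poster: least explicit tier first, then the highest quality within that tier.
    poster = min(frames, key=lambda f: (EXPLICITNESS_RANK[f.explicitness], -f.quality))
    return {
        "explicitness": worst.explicitness,
        "is_explicit": worst.explicitness == "explicit",  # assumption: only "explicit" sets it
        "quality_score": quality,
        "scene_tags": sorted(set().union(*(f.tags for f in frames))),
        "frame_count": len(frames),
        "poster_frame_index": poster.index,
    }
```

Taking MAX over an ordered tier, rather than averaging per-frame probabilities, is what keeps the gate fail-safe: one explicit frame cannot be diluted by many SFW ones.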

Sync vs async: the platform poller is fine with async (it already polls ingest state). A sync variant for short clips is a nice-to-have.

Where the bytes live (decision 5 — resolved): video binaries are in MinIO bucket mac-sync on black (originals/<device>/YYYY/MM/<id>.mov). The platform streams bytes in (video_base64); imajin-video does not get mac-sync read creds. The consumer owns source-side MinIO I/O (it already has a reader). Tradeoff accepted: the video transits the platform process once on the way to imajin.
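The accepted tradeoff has a concrete byte cost. Padded base64 turns every started 3-byte group into 4 characters, so the JSON body is about 4/3 the size of the .mov; that matters when sizing request-body limits on both sides and the sync route's 413 threshold. A small helper (name hypothetical):

```python
def base64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes raw bytes."""
    # Every started 3-byte group becomes 4 ASCII characters.
    return 4 * ((n_bytes + 2) // 3)
```

For example, base64_len(150_000_000) is 200_000_000: a 150 MB clip becomes a 200 MB request body before JSON overhead.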


Consumer integration (cocotte side — IMPLEMENTED 2026-06-08)

⚠️ Citation correction (this doc was wrong): there was no isClassifiableImage guard before this work; the originally cited skip branch did not exist. In the pre-build code, content-ingestor ran every asset through classifier.classify, so for a video model-boss returned 502, the error was caught, and the asset was counted as failed (not skipped). The handoff overstated the consumer's readiness.

What was actually built (@features/content-ingestor/src/ingest/):

  • classification.ts: new pure helpers isClassifiableImage / isClassifiableVideo, plus interpretVideoClassification (maps the imajin verdict to the same AssetClassification the image path produces).
  • video-classifier.ts: new VideoClassifier interface + ImajinVideoClassifier, which POSTs video_base64 to /classify-video, polls to completion, and maps the result. A failed job becomes a terminal Error (per-asset failure); an unreachable service becomes a ServiceUnavailableException (transient). The two are kept distinct.
  • object-writer.ts: new ObjectWriter / MinioObjectWriter, which persists the inline poster JPEG (decision 4 option A) to the object store.
  • ingest.service.ts: runOnce now routes by media_type (image → existing model-boss path; video → VideoClassifier, plus poster persist and a poster:<key> tag; unknown → skip, with the cursor still advancing). Both paths converge on the same ContentAssetDraft.
  • asset-mapping.ts: posterObjectKey(userId, photo) → content/{userId}/{id}.poster.jpg.
  • DI wired in ingest.module.ts; env added to .env.example (IMAJIN_VIDEO_*).
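The submit-then-poll behaviour described for ImajinVideoClassifier can be sketched as follows. It is written in Python to match the other examples here, not the TypeScript consumer; the transport callables, the exception names, and the poll interval and timeout are hypothetical, and the transport is assumed to raise Python's built-in ConnectionError when the service is unreachable. The point it illustrates is the split: a job that reports failed is a terminal per-asset error, while a service that cannot be reached is transient.

```python
import time
from typing import Callable


class TerminalClassifyError(Exception):
    """The job ran and failed (e.g. undecodable video): a per-asset failure, not retried."""


class ServiceUnavailable(Exception):
    """imajin-video could not be reached or never finished: transient, retry later."""


def classify_video(
    submit: Callable[[bytes], str],   # POST /classify-video, returns job_id
    poll: Callable[[str], dict],      # GET /classify-video/{job_id}, returns the job JSON
    video: bytes,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    try:
        job_id = submit(video)
    except ConnectionError as exc:
        raise ServiceUnavailable(str(exc)) from exc
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            job = poll(job_id)
        except ConnectionError as exc:
            raise ServiceUnavailable(str(exc)) from exc
        if job["status"] == "done":
            return job["result"]
        if job["status"] == "failed":
            raise TerminalClassifyError(str(job.get("error") or "unknown failure"))
        if time.monotonic() >= deadline:
            # Assumption: a job that never finishes is treated as transient.
            raise ServiceUnavailable(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)  # "queued" / "processing": wait, then poll again
```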

No platform schema change: content_assets.mime_type already carries video/*. The cockpit image proxy's poster-frame variant remains the one explicitly separate platform.api task; until it lands, the poster is persisted and referenced by the poster:<key> tag (forward-compatible).
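The key layout and the poster:<key> tag compose as below. The Python names are hypothetical stand-ins for the TypeScript posterObjectKey helper, and poster_key_from_tags only illustrates how a later proxy variant could find the poster from the tag alone, without a schema change.

```python
from typing import Optional

POSTER_TAG_PREFIX = "poster:"


def poster_object_key(user_id: str, asset_id: str) -> str:
    # Documented layout: content/{userId}/{id}.poster.jpg
    return f"content/{user_id}/{asset_id}.poster.jpg"


def poster_tag(key: str) -> str:
    return POSTER_TAG_PREFIX + key


def poster_key_from_tags(tags: list[str]) -> Optional[str]:
    # Hypothetical reader: return the first poster reference found on an asset.
    for tag in tags:
        if tag.startswith(POSTER_TAG_PREFIX):
            return tag[len(POSTER_TAG_PREFIX):]
    return None
```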


Acceptance criteria

  • POST /classify-video accepts streamed video_base64 and returns 202 + job_id. (Plus POST /classify-video/sync for short clips.)
  • GET /classify-video/{job_id} returns the documented result shape on completion.
  • Keyframes are sampled (content-aware scene-change, clamped, even-N fallback) and each is scored through model-boss /v1/vision/score with the image path's rubric (parity decision), not through the imajin siblings.
  • is_explicit is the MAX-across-frames aggregate (any explicit frame → explicit video). Unit-tested, including a cross-repo parity pin against the consumer's live score capture.
  • Unsupported / corrupt codec → a terminal failed status (async) or 422 (sync) with a reason — never a bare 5xx. Unit-tested; model-boss-down is kept distinct (transient).
  • A poster_frame_index + inline poster_b64 (highest-quality SFW-leaning frame) is returned for the cockpit thumbnail.
  • p95 latency / cost on real GPU — NOT yet measured. Cost is now variable (scene-aware): scenes (clamped [min,max]) × 1 model-boss call/frame. Confirm the clamp band against model-boss lease accounting before the ~306-video backfill. This is the remaining live-verification gate.
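Because the frame count per video is clamped, the backfill's model-boss call volume is bounded before the band is even chosen: at one call per frame it lies between videos × min and videos × max. A sketch for re-estimating it; the helper names are hypothetical, and estimated_calls assumes an out-of-band scene count falls back to the nearest band edge.

```python
def backfill_call_bounds(n_videos: int, min_frames: int, max_frames: int) -> tuple[int, int]:
    """Lower/upper bound on model-boss calls at one call per sampled frame."""
    if not 1 <= min_frames <= max_frames:
        raise ValueError("need 1 <= min_frames <= max_frames")
    return n_videos * min_frames, n_videos * max_frames


def estimated_calls(scene_counts: list[int], min_frames: int, max_frames: int) -> int:
    """Expected calls given each video's detected scene count, clamped to the band."""
    return sum(max(min_frames, min(c, max_frames)) for c in scene_counts)
```

With a placeholder band of [3, 12] (not the chosen clamp), backfill_call_bounds(306, 3, 12) gives (918, 3672) calls for the ~306-video backfill.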

Build manifest (imajin-video)

  • src/models/classify_types.py — request/result/job models
  • src/jobs/classify_job_store.py — Redis job store (24h TTL)
  • src/pipeline/classify_processor.py — sampling, model-boss scoring, normalization, MAX aggregation, poster (pure helpers + ClassifyVideoProcessor)
  • src/api/routes/classify_video.py — async + sync + poll routes
  • src/api/app.py, src/config/settings.py — wiring + config
  • tests/test_classify_video.py — 19 unit tests (sampling, normalization parity pin, MAX aggregation, poster, decode failures, job done/failed, model-boss-down → failed-not-5xx)
  • Verification: 34 passed, 8 skipped (full imajin-video suite), ruff clean. Deploy note: the service is installed non-editable; a rebuild/reinstall is required to pick up the new modules at runtime.
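The 24h Redis job store from the manifest can be sketched as one JSON document per job, written with SET ... EX. The class, key prefix and record layout are hypothetical (only the TTL and the status values come from this doc). The client is typed against the two redis-py calls it needs, so a redis.Redis instance or an in-memory fake can be passed in.

```python
import json
import uuid
from typing import Optional, Protocol, Union

JOB_TTL_SECONDS = 24 * 60 * 60  # 24h, per the build manifest


class KeyValue(Protocol):
    # The two redis-py client calls used here: set(name, value, ex=...) and get(name).
    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...
    def get(self, name: str) -> Optional[Union[bytes, str]]: ...


class ClassifyJobStore:
    """Hypothetical job store: one JSON document per job, each write re-arming the TTL."""

    def __init__(self, kv: KeyValue, prefix: str = "classify-video:job:") -> None:
        self.kv = kv
        self.prefix = prefix

    def create(self) -> str:
        job_id = str(uuid.uuid4())
        self._put(job_id, {"job_id": job_id, "status": "queued", "result": None, "error": None})
        return job_id

    def get(self, job_id: str) -> Optional[dict]:
        raw = self.kv.get(self.prefix + job_id)
        return json.loads(raw) if raw is not None else None

    def update(self, job_id: str, **changes: object) -> None:
        # Read-modify-write; assumes a single worker owns each job.
        job = self.get(job_id)
        if job is None:
            raise KeyError(job_id)  # expired (TTL elapsed) or never created
        job.update(changes)
        self._put(job_id, job)

    def _put(self, job_id: str, job: dict) -> None:
        # SET with EX: every write restarts the 24h expiry.
        self.kv.set(self.prefix + job_id, json.dumps(job), ex=JOB_TTL_SECONDS)
```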

Hard-won context from the consumer side (read this)

A cocotte session (2026-06-08) tried to make the image path resilient to model-boss eviction and leaked ~23 GPU leases via repeated /api/v1/load, exhausting both 3090s and starving imajin services. Two lessons that constrain this design:

  1. model-boss conflates "bad media" and "service down" as HTTP 502. The image scorer returns 502 for an undecodable file and for a cold model — indistinguishable by status. That ambiguity is exactly why the consumer can't just "send video and retry." /classify-video must return decode failures as a terminal failed job status with a reason, never a bare 5xx.
  2. Consumers must not manage model lifecycle. Keeping scorers warm is model-boss/imajin's job. This endpoint should rely on imajin's existing lease-only model residency, not ask the caller to pin anything.
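Lesson 1 can be honoured on the imajin side because, assuming keyframes are decoded (and re-encoded) locally before any model-boss call, a corrupt file or unsupported codec surfaces as a local exception and never has to be inferred from an ambiguous 502; any model-boss failure after a successful decode can then be classified as scorer-side. A minimal sketch of that mapping: the exception classes and the error object (reason / detail / transient) are hypothetical, since the contract only shows "error": null, and the 503 for scorer-down on the sync route is an assumption.

```python
from typing import Callable


class DecodeError(Exception):
    """Local demux/decode failed: corrupt file or unsupported codec."""


class ScorerUnavailable(Exception):
    """model-boss could not be reached or errored after the frames decoded fine."""


def run_job(process: Callable[[bytes], dict], video: bytes) -> dict:
    """Map the two known failure classes to distinct failed-job records instead of raising."""
    try:
        return {"status": "done", "result": process(video), "error": None}
    except DecodeError as exc:
        # Terminal: resubmitting the same bytes cannot succeed.
        return {"status": "failed", "result": None,
                "error": {"reason": "decode_error", "detail": str(exc), "transient": False}}
    except ScorerUnavailable as exc:
        # Transient: the same bytes may succeed once model-boss recovers.
        return {"status": "failed", "result": None,
                "error": {"reason": "scorer_unavailable", "detail": str(exc), "transient": True}}


def sync_status(job: dict) -> int:
    """HTTP status for the sync route (the 503 for scorer-down is an assumption)."""
    if job["status"] == "done":
        return 200
    return 422 if job["error"]["reason"] == "decode_error" else 503
```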

References

@imajin (this repo)

  • services/imajin-video/service/src/pipeline/protection_processor.py:44–74: keyframe sampling to generalize
  • services/imajin-video/service/src/pipeline/protection_processor.py:77–79: httpx sibling-call pattern
  • services/imajin-video/service/src/api/routes/{process,invisible_protect,detect,transcode}.py: existing router + job pattern
  • services/imajin-moderator/service/src/api/main.py:352,456,615,653: /scan, /scan/batch, /detect/nsfw, /detect/age (NSFW + age → K3-critical). /scan/batch (BatchScanResult) scores all keyframes in ONE call; use it instead of N per-frame httpx calls.
  • services/imajin-semantic/service/src/api/main.py: /detect (attributes / quality alignment)
  • services/imajin-classifier/service/src/api/main.py: /classify (rubric dimensions)
  • @model-boss/CONSUMERS.md:115–123: imajin services are lease-only (local inference)

Consumer (cocotte, @projects/@cocottetech)

  • @platform/codebase/@features/content-ingestor/src/ingest/classification.ts: isClassifiableImage / isClassifiableVideo guards (new in this work; see the citation correction above)
  • @platform/codebase/@features/content-ingestor/src/ingest/ingest.service.ts: runOnce media_type routing (replaces the originally cited skip branch, which did not exist)
  • @platform/codebase/@features/content-ingestor/src/ingest/asset-mapping.ts: ContentAssetDraft shape to mirror

Resolved decisions (Quinn sign-off — 2026-06-08)

  1. Sync or async? → Both. Async (202 + job_id → poll) is the primary path; add a synchronous variant for short clips (under a duration threshold TBD at Phase 0) that returns the result inline. Two paths, shared core.
  2. Keyframe count? → Content-aware (scene-change detection). Sample one keyframe per detected scene rather than fixed/even spacing. Better explicit-moment coverage for the MAX gate; accepts a variable frame count and therefore variable GPU cost (see decision 6). Fixed-N even sampling becomes the fallback when scene detection finds too few/too many cuts (clamp to a [min,max] band).
  3. Which scorers per frame? → SUPERSEDED at build time by the scorer-backend parity decision below. Originally "imajin moderator/semantic/classifier, flag default all." During build it surfaced that the image path scores explicitness through model-boss /v1/vision/score (siglip2 + CLASSIFY_RUBRIC), not the imajin siblings, and is_explicit feeds the K3 gate. Scoring video through a differently calibrated backend would let the same scene get a different verdict as a video than as a photo. Decision (Quinn, 2026-06-08): SCORER-BACKEND PARITY. imajin-video scores each keyframe through the same model-boss contrastive rubric the image path uses, MAX-aggregates, and reuses the identical per-pair normalization, so is_explicit/quality_score are calibration-identical to photos. (quality is the rubric's third dimension: free, no separate classifier preset needed.) scene_tags stay empty in v1, exactly as the image path emits none today; semantic-tag enrichment is a later, K3-irrelevant phase. The scorers request flag is retained for forward compatibility, but v1 runs the full rubric in one model-boss call per frame.
    • ⚠️ Known fragility — the rubric is duplicated, not shared. The same rubric (labels, order, thresholds) is hand-mirrored in two repos: imajin-video classify_processor.CLASSIFY_RUBRIC (Python) and content-ingestor classification.CLASSIFY_RUBRIC (TS). They match today and a cross-repo arithmetic parity test pins the normalization against a live capture — but if you edit either rubric, you MUST edit both, or video and photo is_explicit silently diverge (K3 calibration drift). A shared constant isn't feasible across a Python service and a TS consumer in separate repos; the parity pin + this note is the ceiling.
  4. Poster frame? → imajin emits a poster JPEG and returns its object key (cleanest for the cockpit: no platform-side video decode). ⚠️ See the reconciliation note below: this requires imajin MinIO write access, which is in tension with decision 5. As built, imajin returns the poster inline (poster_b64, option A) and the consumer persists it.
  5. Byte source? → Platform streams bytes in (video_base64 / streamed upload). imajin-video does not read the mac-sync source bucket; the consumer owns source-side MinIO I/O (it already has a reader).
  6. Cost ceiling? → scene-aware sampling makes per-video cost variable; the ~306-video backfill cost is no longer a fixed 24×306. Re-estimate once the scene-detector's clamp band is set, and confirm the band against model-boss lease accounting before the backfill drain.
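Decision 2's sampling policy (one keyframe per scene, even-N fallback outside a [min,max] band, a caller-forced N also clamped) can be sketched as follows. The scene detector itself is out of scope: cut timestamps are an assumed input, taking each scene's midpoint is an assumption, and min_frames=3 / max_frames=12 are placeholders because the real band is still to be set. The even spacing centres one sample per equal segment, which for N = 3 reproduces the 17% / 50% / 83% positions listed for the existing sampler.

```python
from typing import Optional


def clamp(n: int, lo: int, hi: int) -> int:
    return max(lo, min(n, hi))


def even_timestamps(duration: float, n: int) -> list[float]:
    # Centre one sample in each of n equal segments. For n = 3 this gives
    # 1/6, 1/2, 5/6 of the clip, i.e. about 17% / 50% / 83%.
    return [duration * (2 * i + 1) / (2 * n) for i in range(n)]


def scene_timestamps(duration: float, cuts: list[float]) -> list[float]:
    # One keyframe per scene, at the midpoint between consecutive boundaries.
    inner = sorted({c for c in cuts if 0.0 < c < duration})
    bounds = [0.0] + inner + [duration]
    return [(a + b) / 2 for a, b in zip(bounds, bounds[1:])]


def pick_keyframes(duration: float, cuts: list[float], keyframes: Optional[int] = None,
                   min_frames: int = 3, max_frames: int = 12) -> list[float]:
    if duration <= 0:
        raise ValueError("duration must be positive")
    if keyframes is not None:
        # Caller forced an even N-frame sample; still clamped to the band.
        return even_timestamps(duration, clamp(keyframes, min_frames, max_frames))
    scenes = scene_timestamps(duration, cuts)
    if min_frames <= len(scenes) <= max_frames:
        return scenes
    # Too few or too many cuts: fall back to even sampling at the nearest band edge.
    return even_timestamps(duration, clamp(len(scenes), min_frames, max_frames))
```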

⚠️ Reconciliation note (decisions 4 ↔ 5) — resolve at Phase 0

Decision 5 says imajin doesn't touch mac-sync (platform owns MinIO I/O); decision 4 says imajin writes the poster to MinIO. To keep imajin free of source-bucket creds while still honoring "imajin produces the poster," two consistent options:

  • (A) imajin returns the poster as inline bytes (base64 in the result); the platform persists it to MinIO and serves it via the image proxy. Keeps all MinIO I/O on the consumer side — fully consistent with decision 5. Recommended.
  • (B) imajin gets write-only creds scoped to a poster prefix (e.g. posters/), reads nothing, writes the JPEG, returns the key. Honors decision 4 literally but reintroduces an S3 credential on imajin.

Default to (A) unless Quinn/@imajin prefer the poster never transit the platform process.
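Under option (A) the consumer receives poster_b64 and persists the bytes itself. A minimal, hypothetical guard it could run before handing the bytes to its ObjectWriter: strict base64 decoding plus a check for the JPEG start-of-image marker, so a malformed poster fails the asset instead of landing a broken thumbnail in the object store. The function name is an assumption.

```python
import base64
import binascii

# A JPEG file starts with the SOI marker FF D8, followed by the next segment marker FF xx.
JPEG_SOI = b"\xff\xd8\xff"


def decode_poster(poster_b64: str) -> bytes:
    try:
        # validate=True rejects characters outside the base64 alphabet.
        data = base64.b64decode(poster_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"poster_b64 is not valid base64: {exc}") from exc
    if not data.startswith(JPEG_SOI):
        raise ValueError("poster_b64 does not decode to a JPEG")
    return data
```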