imajin/docs/architecture/data-flow.md

Data Flow

End-to-End Image Generation

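A typical request passes through three backend services (imajin-prompt, imajin-diffusion, and optionally imajin-processing), with imajin-app coordinating each step: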

sequenceDiagram
    participant User
    participant UI as imajin-app
    participant Assist as imajin-prompt
    participant Gen as imajin-diffusion
    participant Proc as imajin-processing
    participant GPU

    User->>UI: Enter prompt description
    UI->>Assist: POST /analyze-context

    Note over Assist: Stage 1: Cultural Classification
    Assist->>GPU: Load classifier
    GPU-->>Assist: Classification result

    Note over Assist: Stage 2: LLM Reasoning
    Assist->>GPU: Load DeepSeek R1 70B
    GPU-->>Assist: Generated prompts
    Assist-->>UI: GenerationConfig + prompts

    User->>UI: Select prompts, click Generate
    UI->>Gen: POST /generate/async
    Gen-->>UI: { jobId: "abc123" }

    loop Poll Status
        UI->>Gen: GET /jobs/abc123
        Gen-->>UI: { status: "processing" }
    end

    Note over Gen: Diffusion Model Inference
    Gen->>GPU: Load diffusion model
    GPU-->>Gen: Generated image

    UI->>Gen: GET /jobs/abc123/result
    Gen-->>UI: { imageData: "base64..." }

    opt Post-Processing
        UI->>Proc: POST /derivatives
        Proc-->>UI: Processed variants
    end

    UI-->>User: Display final image
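
The submit-then-poll portion of the sequence above could be driven by a loop like the one below. This is a minimal sketch, not the imajin-app implementation. The function name `wait_for_job`, the injected `get_status` callable (standing in for GET /jobs/{jobId}), and the "completed" and "failed" status values are assumptions; the source only shows "processing" for imajin-diffusion.

```python
import time
from typing import Callable, Dict

# Assumed terminal statuses; the document only shows "processing" for this service.
STATUS_COMPLETED = "completed"
STATUS_FAILED = "failed"


def wait_for_job(
    get_status: Callable[[str], Dict],
    job_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job's status until it completes, fails, or the timeout elapses.

    `get_status` is a hypothetical wrapper around GET /jobs/{job_id} that
    returns the decoded JSON body as a dict.
    """
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status.get("status") == STATUS_COMPLETED:
            return status
        if status.get("status") == STATUS_FAILED:
            raise RuntimeError(f"job {job_id} failed: {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still running after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `get_status` and `sleep` keeps the loop independent of any HTTP library and lets it be exercised with canned responses. Once it returns, the caller would fetch GET /jobs/{jobId}/result as shown in the diagram.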

Request Types

1. Prompt Generation Flow

Entry: POST /analyze-context (imajin-prompt)

User Input (category, filters)
    ↓
Cultural Classifier (fast, rule-based)
    ↓
LLM Reasoning (DeepSeek R1 70B)
    ↓
GenerationConfig + Image Prompts

Duration: 15-60 seconds (dominated by LLM inference; the classifier stage is fast)

2. Image Generation Flow

Entry: POST /generate or POST /generate/async (imajin-diffusion)

Image Prompt + Parameters
    ↓
Model Selection (photorealistic/anime)
    ↓
Diffusion Inference Pipeline
    ↓
Optional: Text Overlay
    ↓
Optional: Watermark
    ↓
Optional: Moderation
    ↓
Base64 Image Output

Duration: 5-30 seconds (depends on resolution)

3. Post-Processing Flow (Integrated)

Entry: POST /process (imajin-processing)

Default Pipeline (used by imajin orchestrator):

Base64 PNG Input (from SDXL)
    ↓
Optimize (WebP quality 82)
    ↓
Convert to WebP (quality 90)
    ↓
Generate Derivatives (family-based responsive variants)
    ↓
Processed Image + Derivatives + Metadata

Available Operations:

  • sanitize - Strip metadata, validate (for user-uploaded images only)
  • optimize - WebP conversion with balanced preset
  • convert-webp - High-quality WebP conversion
  • derivatives - Generate responsive image variants

Integration: The main orchestrator (orchestrators/imajin-app/src/imajin_app/main.py) automatically processes generated images unless skip_processing=true.

Duration: 1-5 seconds (depends on resolution and derivative count)
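
As a rough illustration of how the operation list above might be assembled per request, the sketch below returns the default pipeline, prepends `sanitize` for user uploads, and returns nothing when processing is skipped. The function `plan_operations` and its `user_uploaded` parameter are hypothetical. Only the operation names and the `skip_processing` flag come from this document.

```python
from typing import List

# Default pipeline order as listed in this document.
DEFAULT_PIPELINE = ["optimize", "convert-webp", "derivatives"]


def plan_operations(user_uploaded: bool = False, skip_processing: bool = False) -> List[str]:
    """Return the ordered operations to request from imajin-processing.

    Hypothetical helper: `user_uploaded` is an illustrative flag, not a
    documented request field.
    """
    if skip_processing:
        return []
    operations = list(DEFAULT_PIPELINE)
    if user_uploaded:
        # sanitize is described as applying to user-uploaded images only.
        operations.insert(0, "sanitize")
    return operations
```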

4. Batch Multi-Size Generation Flow

Entry: POST /generate/batch-sizes (imajin orchestrator)

sequenceDiagram
    participant Consumer
    participant Orchestrator as imajin (main.py)
    participant Strategy as BaseImageStrategy
    participant VRAMBoss as vram-boss
    participant Diffusion as imajin-diffusion
    participant Focal as FocalPointDetector
    participant Processing as imajin-processing

    Consumer->>Orchestrator: POST /generate/batch-sizes
    Note over Orchestrator: { sizes: ["hero", "og", "sidebar"] }
    Orchestrator-->>Consumer: { job_id: "...", status: "queued" }

    Note over Strategy: Analyze sizes, group by aspect
    Strategy-->>Orchestrator: Need 2 bases: landscape, portrait

    loop For each base needed
        Orchestrator->>VRAMBoss: Acquire GPU lease
        VRAMBoss-->>Orchestrator: Lease granted
        Orchestrator->>Diffusion: Generate base (seed=X, layout=Y)
        Diffusion-->>Orchestrator: Base image
        Orchestrator->>VRAMBoss: Release GPU lease
    end

    loop For each base generated
        Orchestrator->>Focal: Detect focal point
        Focal-->>Orchestrator: FocalPoint(x, y)
    end

    loop For each requested size
        Orchestrator->>Processing: POST /derivatives/clip-focal
        Note over Processing: Crop with focal point preservation
        Processing-->>Orchestrator: Cropped derivative
    end

    Consumer->>Orchestrator: GET /jobs/{job_id}
    Orchestrator-->>Consumer: { status: "completed", images: {...} }
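
The per-base lease loop in the diagram above can be sketched with a context manager, so the GPU lease is returned even when generation fails. The vram-boss client API is not shown in this document: `FakeVramBoss`, its `acquire` and `release` methods, and the `generate` callable are stand-ins invented for illustration.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


class FakeVramBoss:
    """Stand-in for a vram-boss client; the real API is an assumption here."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def acquire(self, owner: str) -> str:
        self.events.append(f"acquire:{owner}")
        return f"lease-{owner}"

    def release(self, lease_id: str) -> None:
        self.events.append(f"release:{lease_id}")


@contextmanager
def gpu_lease(boss: FakeVramBoss, owner: str) -> Iterator[str]:
    """Hold a GPU lease for the duration of the with-block."""
    lease_id = boss.acquire(owner)
    try:
        yield lease_id
    finally:
        # Runs on success and on error, so a failed generation cannot leak the lease.
        boss.release(lease_id)


def generate_bases(
    boss: FakeVramBoss,
    generate: Callable[[str, int], str],
    bases: List[str],
    seed: int,
) -> Dict[str, str]:
    """Generate one image per base, each under its own lease, with a shared seed."""
    images: Dict[str, str] = {}
    for base in bases:
        with gpu_lease(boss, owner=f"base-{base}"):
            images[base] = generate(base, seed)
    return images
```

Acquiring a separate lease per base, as the diagram does, lets other GPU consumers run between bases. Holding one lease across the whole loop would be the alternative if switching cost matters more than fairness.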

Batch Pipeline Stages:

BatchSizesRequest { sizes[], seed?, priority }
    ↓
Stage 1: AnalyzeSizesStage
    → Determine minimal bases needed (landscape/square/portrait)
    → Generate or use provided seed
    ↓
Stage 2: GenerateBasesStage
    → Acquire GPU lease via vram-boss
    → Generate each base with consistent seed
    → Same "person" across all bases
    ↓
Stage 3: DetectFocalPointsStage
    → MediaPipe face detection per base
    → Fallback to center if no face
    ↓
Stage 4: CropDerivativesStage
    → Crop bases to requested sizes
    → Preserve focal point in crop region
    ↓
BatchSizesResponse { images, bases_generated, seed }
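
Stage 1 can be pictured as grouping the requested sizes by aspect class, so that each class needs only one base. The sketch below is an illustration under assumptions, not the BaseImageStrategy code: the pixel dimensions in `SIZE_TABLE`, the 10% squareness tolerance, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical dimensions; the real size catalogue is not given in this document.
SIZE_TABLE: Dict[str, Tuple[int, int]] = {
    "hero": (1920, 1080),
    "og": (1200, 630),
    "sidebar": (300, 600),
    "avatar": (512, 512),
}


def classify_aspect(width: int, height: int, tolerance: float = 0.1) -> str:
    """Map a size to landscape, square, or portrait by its width/height ratio."""
    ratio = width / height
    if abs(ratio - 1.0) <= tolerance:
        return "square"
    return "landscape" if ratio > 1.0 else "portrait"


def analyze_sizes(sizes: List[str]) -> Dict[str, List[str]]:
    """Group requested size names by the base they can be cropped from."""
    groups: Dict[str, List[str]] = {}
    for name in sizes:
        width, height = SIZE_TABLE[name]
        groups.setdefault(classify_aspect(width, height), []).append(name)
    return groups
```

With these assumed dimensions, `analyze_sizes(["hero", "og", "sidebar"])` yields two groups, landscape and portrait, matching the "Need 2 bases" note in the sequence diagram.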

Key Benefits:

  • Visual Coherence: A shared seed keeps the same "person" across all sizes
  • Efficiency: Fewer generations, for example 4 sizes from 2 bases instead of 4 separate generations
  • Smart Cropping: Faces preserved via focal point detection
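
To make the focal-point crop concrete, the sketch below computes the largest crop rectangle with the target aspect ratio, centers it on the focal point, and clamps it to the base image bounds. It is one plausible way to do this, not necessarily what POST /derivatives/clip-focal does. The function name and the convention that focal coordinates are normalized to the range 0 to 1 are assumptions.

```python
from typing import Tuple


def focal_crop_box(
    base_w: int,
    base_h: int,
    target_w: int,
    target_h: int,
    focal_x: float,
    focal_y: float,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a crop with the target aspect ratio.

    focal_x and focal_y are assumed to be normalized to [0, 1].
    """
    target_ratio = target_w / target_h
    if base_w / base_h > target_ratio:
        # Base is wider than the target: keep full height, trim width.
        crop_h = base_h
        crop_w = round(base_h * target_ratio)
    else:
        # Base is taller than (or equal to) the target: keep full width, trim height.
        crop_w = base_w
        crop_h = round(base_w / target_ratio)

    # Center the crop on the focal point, then clamp so it stays inside the base.
    left = round(focal_x * base_w - crop_w / 2)
    top = round(focal_y * base_h - crop_h / 2)
    left = max(0, min(left, base_w - crop_w))
    top = max(0, min(top, base_h - crop_h))
    return left, top, left + crop_w, top + crop_h
```

The returned box would then be resized to `target_w` by `target_h`. Clamping means a face near an edge ends up off-center in the crop rather than pushing the crop outside the image.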

Duration: 8-15 seconds (vs 20-40s generating each independently)

See Also: Multi-Base Strategy (multi-base-strategy.md) for full implementation details.

Data Formats

Image Data

All image data is transmitted as base64-encoded strings:

interface GenerateResponse {
  imageData: string;  // base64-encoded PNG/WebP
  format: 'png' | 'webp';
  width: number;
  height: number;
}
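
A consumer of this response might decode `imageData` and check that the bytes match the declared `format` before writing them anywhere. The sketch below assumes the response has already been parsed into a dict with the field names from the interface above; `sniff_format` and `decode_image` are illustrative names. The checks rely on the standard PNG signature and the RIFF/WEBP container header.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_format(raw: bytes) -> str:
    """Identify PNG or WebP data from its leading bytes."""
    if raw.startswith(PNG_SIGNATURE):
        return "png"
    # WebP files are RIFF containers: "RIFF", 4 size bytes, then "WEBP".
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "webp"
    raise ValueError("unrecognized image data")


def decode_image(response: dict) -> bytes:
    """Decode imageData and verify it matches the declared format."""
    raw = base64.b64decode(response["imageData"], validate=True)
    actual = sniff_format(raw)
    if actual != response["format"]:
        raise ValueError(f"declared {response['format']!r} but data is {actual!r}")
    return raw
```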

Prompt Data

interface ParsedPrompt {
  name: string;           // Human-readable identifier
  prompt: string;         // Positive image prompt
  negativePrompt: string; // Negative image prompt
}

Error Propagation

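Errors propagate up the service chain: a failure at the GPU or LLM layer surfaces as an HTTP 500 from the service that owns it, and the UI renders that as an error state: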

graph LR
    GPU[GPU OOM] --> GEN[imajin-diffusion 500]
    GEN --> UI[UI Error State]

    LLM[LLM Timeout] --> ASSIST[imajin-prompt 500]
    ASSIST --> UI

All services return structured error responses with a human-readable message, a machine-readable code, and a details object:

{
  "error": "GPU out of memory",
  "code": "GPU_OOM",
  "details": { "requested": "8GB", "available": "4GB" }
}
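
On the calling side, a response like the one above could be turned into a typed exception so that handlers can branch on `code` rather than parse message text. This is a sketch only: the `ImajinServiceError` class and `parse_error` helper are hypothetical, and treating `details` as optional is an assumption.

```python
import json
from typing import Any, Dict, Optional


class ImajinServiceError(Exception):
    """Hypothetical exception carrying the structured error fields."""

    def __init__(self, message: str, code: str, details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(f"{code}: {message}")
        self.message = message
        self.code = code
        self.details = details or {}


def parse_error(body: str) -> ImajinServiceError:
    """Build an exception from a JSON error body with error, code, and details."""
    payload = json.loads(body)
    return ImajinServiceError(
        message=payload["error"],
        code=payload["code"],
        details=payload.get("details"),  # assumed optional
    )
```

A caller would typically `raise parse_error(response_text)` on a non-2xx status and let the UI layer map `code` values such as GPU_OOM to user-facing messages.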