Bulk Prediction Pipeline — built end to end

The stack

Ten PRs, each one thing, stacked on the last.

Every PR does exactly one job and builds on the one before it — from the app skeleton through the engine, the API, the worker image, the AWS infrastructure, the UI, and the docs.

PR 1 · #80 · App skeleton: models, migration, local stack [add]

bulk-pr1 → dev

Adds the apps/bulk app skeleton: two tables and the local dev stack. Additive only — no registry/predict/web paths touched.

BulkRequest and BulkJob models with migration 0001_initial.
App registration, Django admin, factories, and model tests.
Local stack: a Redis broker service in docker-compose plus justfile recipes.

erDiagram
    bulk_requests ||--o{ bulk_jobs : "fans out to"
    bulk_requests {
        bigint id PK "exp id"
        text status
        text target
        text model_id FK
        text smiles_path
        int smiles_count
        json result_paths
    }
    bulk_jobs {
        bigserial id PK
        bigint request_id FK
        int index
        text status
        text model_id
        int completed_smiles
    }

PR 2 · #81 · Extract format_predictions into a shared module [refactor]

bulk-pr2 → bulk-pr1

Moves prediction-result formatting out of the predict view into a shared module so the bulk pipeline can reuse it. Logic moved verbatim — no behaviour change.

New apps/predict/services/formatting.py with format_predictions / extract_prediction.
Predict view updated to import from the shared module; no other call sites.

graph LR
    subgraph Before
        V1["predict view"] --> F1["formatting logic (inline)"]
    end
    subgraph After
        V2["predict view"] --> M["formatting.py"]
        B["bulk process_job (PR 4)"] --> M
    end

PR 3 · #82 · Ingestion + routing validation services [add]

bulk-pr3 → bulk-pr2

Adds the service layer that reads input SMILES and validates routing before a request is accepted.

storage_service — read SMILES from an inline list or an S3 source (txt/csv/parquet/sdf), save the input, merge per-job results.
validation — check routing (target/series/model_id) against recommendations and sample-check the SMILES.

flowchart LR
    IN["smiles[] or smiles_source"] --> ING["read SMILES"]
    ROUTE["target / series / model_id"] --> VAL["validate routing + SMILES"]
    ING --> OK{valid?}
    VAL --> OK
    OK -- yes --> RDY["ready to submit"]
    OK -- no --> ERR["400"]
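
The ingestion and validation steps above can be sketched roughly as follows. This is an illustration, not the PR's code: read_smiles, validate_request, ValidationError, the shape of the recommendations mapping, and the "smiles" CSV column name are all hypothetical. Only an inline list and already-downloaded txt/csv text are handled; S3 access, parquet, and sdf are left out.

```python
import csv
import io


class ValidationError(Exception):
    """Hypothetical error type; a view would translate it into a 400."""


def read_smiles(smiles=None, source_text=None, fmt="txt"):
    """Return SMILES from an inline list or from already-downloaded text."""
    if smiles is not None:
        return [s.strip() for s in smiles if s and s.strip()]
    if source_text is None:
        raise ValidationError("provide smiles[] or smiles_source")
    if fmt == "txt":
        return [ln.strip() for ln in source_text.splitlines() if ln.strip()]
    if fmt == "csv":
        reader = csv.DictReader(io.StringIO(source_text))
        # Assumes a column named "smiles"; the real column name is not stated.
        values = ((row.get("smiles") or "").strip() for row in reader)
        return [v for v in values if v]
    raise ValidationError(f"unsupported format: {fmt}")


def validate_request(smiles, target, recommendations, sample_size=5):
    """Check routing against known targets, then sample-check the SMILES."""
    if target not in recommendations:
        raise ValidationError(f"unknown target: {target}")
    if not smiles:
        raise ValidationError("no SMILES supplied")
    for s in smiles[:sample_size]:
        # Placeholder check only; a real check would parse each string
        # with a chemistry toolkit.
        if any(ch.isspace() for ch in s):
            raise ValidationError(f"malformed SMILES: {s!r}")
    return True
```

Checking only the first few entries keeps the accept path cheap for large inputs, at the cost of finding bad rows later in the worker.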

PR 4 · #83 · Celery wiring + worker tasks + service engine [add]

bulk-pr4 → bulk-pr3

Adds the Celery app, the worker task graph, and the service engine that runs a request end to end.

Task chain orchestrate -> process_job (one per model) -> finalize.
BulkService submit / cancel / retry; routing resolver shared with the preview endpoint.
Inference-engine and experiment-tracking integration.
Hardening: tasks are dispatched on transaction commit; cancel terminates running Batch jobs; a routing that resolves to no models fails fast; finalize is idempotent via a status check-and-set.
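
The status check-and-set that keeps finalize idempotent can be illustrated with a conditional UPDATE. The sketch below uses the standard-library sqlite3 module in place of the Django ORM. The status values 'running' and 'completed' and the helper name finalize_once are assumptions, not taken from the PR.

```python
import sqlite3


def finalize_once(conn, request_id):
    """Move a request from 'running' to 'completed' at most once.

    Returns True only for the caller whose UPDATE actually changed the row.
    """
    cur = conn.execute(
        "UPDATE bulk_requests SET status = 'completed' "
        "WHERE id = ? AND status = 'running'",
        (request_id,),
    )
    conn.commit()
    return cur.rowcount == 1


# Minimal in-memory table standing in for bulk_requests.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bulk_requests (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO bulk_requests (id, status) VALUES (1, 'running')")
conn.commit()

first = finalize_once(conn, 1)   # wins the check-and-set
second = finalize_once(conn, 1)  # duplicate delivery: changes nothing
```

Because the condition and the write are one statement, a finalize task delivered twice does its merge work only when the call returns True. With the Django ORM the same pattern is usually a filter on the expected status followed by update(), which returns the number of matched rows.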

PR 5 · #84 · /api/bulk/ REST API [add]

bulk-pr5 → bulk-pr4

Exposes the bulk pipeline over /api/bulk/ (browser, Okta) and /api/service/bulk/ (PAT). Same views back both prefixes.

Views, serializers, URLs: submit (202), list (paginated, scoped to caller), status, cancel, retry, result (presigned downloads).
Per-caller ownership scoping; rate-limit and input-size breaches surface as 429 / 400.

flowchart LR
    C["client"] --> API["/api/bulk/"]
    API --> SUB["POST requests/ -> 202"]
    API --> LIST["GET requests/"]
    API --> RES["GET requests/{id}/result/"]
    API --> CAN["POST requests/{id}/cancel/"]
    SUB --> SVC["BulkService"]
    CAN --> SVC
    SVC --> DB[("bulk_requests / bulk_jobs")]
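
A service client might build a submit call against the PAT prefix along these lines. This is a sketch under assumptions: the host is a placeholder, the "Bearer" header scheme is a guess at how the PAT is presented, and the body field names simply mirror the smiles / target / series terms used in this stack. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder host, not a real deployment


def build_submit_request(token, smiles, target, series=None):
    """Build (but do not send) a POST to the service-prefix submit endpoint."""
    body = {"smiles": smiles, "target": target}
    if series is not None:
        body["series"] = series
    return urllib.request.Request(
        url=f"{BASE_URL}/api/service/bulk/requests/",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the PAT is sent as a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


def describe_status(code):
    """Map the status codes named in this PR to a short description."""
    return {
        202: "accepted",
        400: "invalid input or input too large",
        429: "rate limit exceeded",
    }.get(code, "unexpected")
```

A caller would pass the request to urllib.request.urlopen and treat 202 as "queued", then poll the status endpoint rather than wait on the submit call.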

PR 6 · #85 · GPU worker image + cuda extra + CI build job [add]

bulk-pr6 → bulk-pr5

Builds the GPU worker container image and wires its CI build. At this point the worker still drains bulk_gpu — queues are renamed in PR 7.

Dockerfile.bulk — CUDA image that runs the one-shot Celery worker (bulk_one_shot).
cuda optional-dependency extra: the GPU inference stack plus pycurl for the SQS consumer.
CI build-bulk job builds the image and pushes it to ECR.

flowchart LR
    CI["CI build-bulk"] --> IMG["Dockerfile.bulk (CUDA + cuda extra)"]
    IMG --> ECR["ECR: bulk-worker-dev"]
    ECR --> W["one-shot worker: bulk_one_shot"]

PR 7 · #86 · AWS infra provisioner + SQS broker wiring [refactor]

bulk-pr7 → bulk-pr6

Provisions the AWS runtime and switches Celery onto the SQS broker. Resources are renamed from the PR 6 placeholders to descriptive, role-based names.

Idempotent provisioner: SQS queues + DLQs, Batch GPU compute environment / queue / job definition, CloudWatch alarm, EventBridge autoscale rule + IAM role.
Celery on the SQS broker, routed to bulk-inference / bulk-finalize.
Renamed from the PR 6 placeholders: the queues bulk_gpu / bulk_cpu (now bulk-inference / bulk-finalize), the worker CMD, and the task docstrings.

flowchart LR
    subgraph Before["Before (PR 6)"]
        W1["web / worker"] --> BG["SQS bulk_gpu"]
        W1 --> BC["SQS bulk_cpu"]
    end
    subgraph After["After (this PR)"]
        W2["web: submit"] -->|process_job| Q1["SQS bulk-inference"]
        Q1 --> AL["CloudWatch alarm"]
        AL --> EB["EventBridge rule"]
        EB -->|SubmitJob| BW["Batch GPU worker (scale from zero)"]
        BW --> S3[("S3 results")]
        BW -->|finalize| Q2["SQS bulk-finalize"]
        Q2 --> BW
    end
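
The routing onto the two queues could be expressed with Celery settings roughly like the following. The setting names (broker_url, broker_transport_options, task_routes) are standard Celery configuration. The dotted task paths and the region are assumptions, and the orchestrate task is left out because its queue is not stated here.

```python
# Celery settings sketch, written as plain module-level names.
broker_url = "sqs://"  # credentials come from the environment or an IAM role

broker_transport_options = {
    "region": "us-east-1",  # assumption: the region is not given in this stack
}

# Hypothetical dotted task paths; only the task names (process_job, finalize)
# and the queue names come from the PR description.
task_routes = {
    "apps.bulk.tasks.process_job": {"queue": "bulk-inference"},
    "apps.bulk.tasks.finalize": {"queue": "bulk-finalize"},
}
```

Keeping inference and finalize on separate queues lets the CloudWatch alarm watch only the bulk-inference depth when deciding whether to start a Batch GPU worker.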

PR 8 · #87 · Bulk Predictions UI (frontend) [add]

bulk-pr8 → bulk-pr7

Adds the Bulk Predict screen to the React app, styled to match Quick Predict.

frontend/src/bulk.jsx: submit form, requests list (refresh + client-side pagination), detail pane (progress, per-job status, cancel/retry, downloads).
Route and sidebar registration.

flowchart LR
    UI["Bulk Predict screen"] --> FORM["submit form"]
    UI --> LISTV["requests list"]
    UI --> DET["detail pane"]
    FORM -->|POST| A1["/api/bulk/requests/"]
    LISTV -->|GET| A1
    DET -->|GET| A2["requests/{id}/ and /result/"]
    DET -->|POST| A3["cancel / retry"]

PR 9 · #88 · Submit-form BFF endpoints: options / quota / preview-routing [add]

bulk-pr9 → bulk-pr8

Adds the read-only helper endpoints the submit form needs.

GET options/ — routable targets and series, output formats, limits (cached 60s).
GET quota/ — the caller's per-user usage plus the global cluster cap.
GET preview-routing/ — the model count for a routing, via the shared resolve_model_ids so it never drifts from submit.

flowchart LR
    FORM["submit form"] --> OPT["GET options/"]
    FORM --> QUO["GET quota/"]
    FORM --> PRE["GET preview-routing/"]
    OPT --> REC[("recommendations")]
    QUO --> DB[("bulk_requests")]
    PRE --> RES["resolve_model_ids (shared with submit)"]
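
Preview and submit stay in step because both call the same resolver. A minimal sketch of that arrangement, with a hypothetical in-memory recommendations table and made-up target, series, and model names (only resolve_model_ids is a name from the PR, and its real signature is not shown here):

```python
# Hypothetical recommendations table: (target, series) -> model ids.
RECOMMENDATIONS = {
    ("kinase-a", "series-1"): ["model-1", "model-2"],
    ("kinase-a", "series-2"): ["model-3"],
}


def resolve_model_ids(target, series=None, model_id=None):
    """Single source of truth for which models a routing selects."""
    if model_id is not None:
        return [model_id]
    ids = []
    for (t, s), models in RECOMMENDATIONS.items():
        if t == target and (series is None or s == series):
            ids.extend(models)
    return ids


def preview_routing(target, series=None, model_id=None):
    """Read-only preview: report how many models the routing would use."""
    return {"model_count": len(resolve_model_ids(target, series, model_id))}


def submit(target, series=None, model_id=None):
    """Submit path: one job per resolved model, failing fast when empty."""
    model_ids = resolve_model_ids(target, series, model_id)
    if not model_ids:
        raise ValueError("routing resolved to no models")
    return [{"index": i, "model_id": m} for i, m in enumerate(model_ids)]
```

Since the count shown in the form and the jobs created on submit come from one function, a change to routing rules cannot make the preview disagree with the actual fan-out.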

PR 10 · #89 · Documentation: app, AWS resources, schema, ERD [add]

bulk-pr10 → bulk-pr9

Documents the bulk feature across the reference docs. Docs only — no code changes.

schema.md (bulk tables), api-reference.md (/api/bulk/ endpoints), architecture.md (flow + diagrams).
code-organization.md (now five apps), bulk-aws-resources.md, README index, CLAUDE.md.
erd.html (v9) — the bulk_requests / bulk_jobs tables with their FK edges.

graph TD
    PR["PR 10: docs"] --> SCH["schema.md: bulk tables"]
    PR --> API["api-reference.md: /api/bulk/"]
    PR --> ARCH["architecture.md: flow"]
    PR --> ORG["code-organization.md: 5 apps"]
    PR --> AWS["bulk-aws-resources.md"]
    PR --> ERD["erd.html v9"]
