System overview
Purpose
Give an operator- or integrator-level understanding of Srulik's lab's major subsystems, what each owns, and how data moves between them. Architecture serves operator judgment loops—ingestion and scoring exist to support scan → proof → decide → feedback, not to replace human operators.
After reading this you should be able to locate any behavior of the running system to a subsystem, and reason about the blast radius of a change before you make it.
Prerequisites
- Required: Familiarity with Node.js services and a React frontend.
- Useful: System dataflow for the conceptual pipeline.
Inputs
- Ingestion sources: NewsAPI.ai (news), Meta WhatsApp Cloud API (messages), OpenAI Whisper (audio → text), and user submissions via the UI.
- Configuration: server env (secrets, auth posture, SQLite path) and client build-time env (Firebase web config).
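Because configuration arrives through the server environment, a startup check that names every missing variable is one way to keep misconfiguration loud. The sketch below is illustrative only: requireEnv is a hypothetical helper, and treating SQLITE_PATH as mandatory is an assumption rather than documented behavior.

```javascript
// Hypothetical helper, not from the repo: collects required server env values
// and fails loudly, naming every missing variable in one error.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`missing required env: ${missing.join(', ')}`);
  }
  // Return only the requested keys so callers do not depend on the whole env.
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

A caller might run requireEnv(['SQLITE_PATH']) once at boot, before any route is registered, so a bad deploy stops immediately instead of failing on the first request.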
Outputs
- Persisted evidence in SQLite, with provenance for every row.
- Assessment artifacts on disk (reports/, signals/, source exports) that the API serves.
- UI surfaces: the Report tab, the four domain tabs (Submissions, Education, Municipalities, Naftali), the Docs panel, and the evidence submission bar.
Constraints
- Separation of concerns. Business logic lives in business_modules/. Cross-cutting utilities (persistence, budget/cost tracking, shared helpers) live in cross-cut-modules/. The UI lives in client/. The Fastify app shell in the repo root (app.js, server.js) wires these together; it shouldn't contain business logic.
- Operational safety. Long-running work must have timeouts and explicit failure signals. Silent partial completion is the worst outcome; prefer loud failure.
- Artifact-first pipelines. Stages communicate through files on disk, not in-memory handoffs. This is what makes the pipeline restartable and reviewable.
- One Node process for both API and SPA. Fastify serves client/dist for non-/api routes. This keeps deploys simple at the cost of not being able to scale the two independently, which is acceptable at the current scale.
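The operational-safety constraint (timeouts plus explicit failure signals) can be sketched with plain promises. This is a minimal illustration, not the repo's implementation: withTimeout and StageTimeoutError are hypothetical names, and it assumes the wrapped work is a function that returns a promise and may honor an AbortSignal.

```javascript
// Sketch only: withTimeout and StageTimeoutError are hypothetical names,
// not helpers from the repo.
class StageTimeoutError extends Error {
  constructor(label, ms) {
    super(`${label} exceeded ${ms} ms`);
    this.name = 'StageTimeoutError';
  }
}

// Runs `work` and rejects with StageTimeoutError if it has not settled within
// `ms` milliseconds. The AbortSignal lets cooperative work stop early.
function withTimeout(label, ms, work) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new StageTimeoutError(label, ms));
    }, ms);
  });
  // Promise.resolve().then(...) turns a synchronous throw in `work` into a rejection.
  const running = Promise.resolve().then(() => work(controller.signal));
  return Promise.race([running, timeout]).finally(() => clearTimeout(timer));
}
```

A top-level caller would typically catch the error, log it, and exit non-zero, so that a stalled stage shows up as a failed job rather than as a silently incomplete one.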
The subsystems
Ingestion (business_modules/{news-sites,whatsapp,audio,video,recording,radio}/) — one module per source, each with an input/ directory containing CLI entry points and an output/ or reports/ directory where dated markdown exports land. Ingestion modules normalize source-specific quirks into a shared markdown format that downstream stages can parse.
Storage (cross-cut-modules/ + SQLite) — a small SQLite database holds evidence, messages, drafts, submissions, and artifacts. The path is controlled by SQLITE_PATH. Only the server writes; stages that need persisted data go through the cross-cut helpers rather than opening the DB directly.
Analysis (business_modules/resilience/) — the scoring pipeline. extract-signals reads dated exports and emits typed signals; assess-signals applies the weight table and produces reports. Prompts and the taxonomy are kept in this module.
Cross-cut (cross-cut-modules/) — shared concerns: budget accounting, LLM clients, persistence helpers, and utilities. Anything that two or more business modules would otherwise duplicate belongs here.
Translation (business_modules/translation/) — on-demand translation of the report narrative when the UI language differs from the source language. Called per-request from the UI.
Domain dashboards — Education, Municipalities, and Naftali each have their own module under business_modules/ with their own data and ingestion paths.
HTTP surface — Fastify in app.js wires routes to the relevant modules. OpenAPI is the contract; see API reference. Swagger UI is mounted at /api/swagger.
Frontend (client/) — React + Vite SPA. Tabs for Report, Submissions, Education, Municipalities, Naftali. A Docs panel (in-app) reads /api/docs/index and /api/docs/page/:slug to render this same product documentation inside the app.
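The single-process arrangement, where the same server answers /api requests and serves client/dist for everything else, comes down to one routing rule. The sketch below shows that rule as a pure function. It is not Fastify's mechanism; resolveRequest is a hypothetical name, and the fallback to index.html for extensionless paths is an assumption about typical SPA hosting. The function expects a URL pathname with no query string.

```javascript
const path = require('node:path');

// Hypothetical helper: classifies a request pathname as an API call or a
// static file under the built client directory.
function resolveRequest(urlPath, distDir = 'client/dist') {
  // Normalize first so "..", "." and repeated slashes cannot escape distDir.
  const clean = path.posix.normalize('/' + urlPath);
  if (clean === '/api' || clean.startsWith('/api/')) {
    return { kind: 'api' };
  }
  // Paths with a file extension are treated as built assets; anything else
  // falls back to the SPA entry point (an assumption, see above).
  const isAsset = path.posix.extname(clean) !== '';
  const relative = isAsset ? clean.slice(1) : 'index.html';
  return { kind: 'static', file: path.posix.join(distDir, relative) };
}
```

Under these assumptions, /api/docs/index is handed to the API, /assets/app.js maps to a file in client/dist, and a client-side route such as /report resolves to index.html.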
Examples
Data flow, high level
source adapters ──▶ dated markdown exports ──▶ SQLite evidence ──▶
──▶ signals JSON ──▶ assessment report ──▶ API + UI
Each arrow crosses a file boundary. Each stage is restartable from the last artifact on disk.
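One way to picture a restartable, artifact-first stage is a function that skips work when its output artifact already exists and otherwise writes the result atomically. This is a sketch under stated assumptions: runStage is a hypothetical helper, and the skip-if-present rule and the temp-file-then-rename write are illustrative choices, not the pipeline's documented behavior.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: runs one stage only if its output artifact is missing.
// `transform` maps the input file's text to the output file's text.
function runStage(inputFile, outputFile, transform) {
  if (fs.existsSync(outputFile)) {
    return 'skipped'; // an earlier run already produced this artifact
  }
  if (!fs.existsSync(inputFile)) {
    // Loud failure: a missing upstream artifact is an error, not an empty result.
    throw new Error(`missing input artifact: ${inputFile}`);
  }
  const result = transform(fs.readFileSync(inputFile, 'utf8'));
  fs.mkdirSync(path.dirname(outputFile), { recursive: true });
  // Write to a temporary name and rename, so a crash mid-write never leaves
  // a half-written file under the final artifact name.
  const tmpFile = `${outputFile}.tmp`;
  fs.writeFileSync(tmpFile, result);
  fs.renameSync(tmpFile, outputFile);
  return 'written';
}
```

Rerunning the same stage after a crash either finds the finished artifact and skips, or finds nothing and redoes the work.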
Where a change usually belongs
- "I want to ingest a new news source." → add an adapter under business_modules/news-sites/ and teach the homefront extractor about it. Don't touch analysis code.
- "I want to add a new signal type." → update the extraction prompt, the validator, and the scoring weight table in business_modules/resilience/. See Signal taxonomy.
- "I want a new tab in the UI." → add a component under client/src/components/, wire it into MainApp.jsx, and expose the data it needs via a new API route. Don't duplicate analysis logic in the client.
- "I want to rotate an API key." → update the runtime env and restart the server. No rebuilds needed unless it was a VITE_* value (then rebuild client/dist).
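Adding a signal type touches the validator and the weight table together, so a small consistency check can catch a type that was added to one but not the other. The sketch below is illustrative only: findTaxonomyGaps is a hypothetical helper, and the data shapes (a list of accepted type names and an object mapping type name to weight) are assumptions, not the actual structures in business_modules/resilience/.

```javascript
// Hypothetical helper: compares the types a validator accepts with the types
// a weight table scores, and reports mismatches in both directions.
function findTaxonomyGaps(knownTypes, weights) {
  const hasWeight = (type) => Object.prototype.hasOwnProperty.call(weights, type);
  return {
    // Accepted by the validator but never scored.
    missingWeight: knownTypes.filter((type) => !hasWeight(type)),
    // Scored but rejected (or unknown) at validation time.
    unknownWeight: Object.keys(weights).filter((type) => !knownTypes.includes(type)),
  };
}
```

Run as a test or a startup assertion, a non-empty missingWeight or unknownWeight list would flag a half-finished taxonomy change before it reaches a report.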
Operational boundaries
- Stateless API. Every request is self-contained; the server doesn't hold per-user session state.
- Stateful pipeline. The pipeline writes dated artifacts to disk. That's the state.
- No shared writers on SQLite. One server per database file.
Troubleshooting
- Reports stop updating
  - Check: ingestion sources, background job exit codes, API keys, and budget caps.
  - Fix: verify environment variables and inspect server logs for upstream failures. Start with Observability.
- A change to the UI surfaces analysis differently but the numbers are the same
  - Check: is the code in client/ reshaping data, or is an analysis module changing values?
  - Fix: keep analysis in business_modules/resilience/; keep presentation in client/. Data mutations in the UI are a code smell.
- Docs panel doesn't show a page that exists on disk
  - Check: the page's frontmatter (especially intent and gated) against the in-app panel's user-guide filter.
  - Fix: fix frontmatter, or confirm the page appears on full docs.
See Module map for the "where do I add code?" view, and Storage model for persistence specifics.