# Floom unvalidated assumptions tracker
_Source of truth for every hypothesis the wireframe and roadmap depend on._
_Status: drafted 2026-04-14, updated as validation agents land._

## Purpose

Before writing any production code, every assumption baked into the wireframe + roadmap must be tested, validated, or knowingly deferred. Federico's explicit directive: "test and validate all assumptions and hypothesis. Everything defined and prepared."

Each assumption has a status:
- 🔴 **Unvalidated** — we don't know if it's true
- 🟡 **Being validated** — agent running right now
- 🟢 **Validated** — we have evidence
- ⚫️ **Deferred** — known unknown, we'll validate later
- ❌ **Invalidated** — turned out false, wireframe/roadmap needs to adapt

## The 20 assumptions

| # | Assumption | Source (where it's baked in) | Status | Validation method | Owner |
|---|---|---|---|---|---|
| 1 | **Floom's "spec-to-UI" renderer is achievable and polished** | v7 wireframe `/r/:slug` + v9 converter list | 🟢 **VALIDATED** | [research/spec-to-ui-reference.md](research/spec-to-ui-reference.md). Recommendation: **rjsf for inputs** (covers 11 of 13 locked converters), **TanStack Table for outputs** (Appsmith is gold standard), **Vercel AI SDK `parts` pattern** as the contract shape. **KEY FINDING: no existing tool auto-renders response schemas as rich output UI, which is Floom's wedge.** Phase 2 MVP: 8 input + 6 output converters; defer PDF viewer, audio, CSV preview to Phase 3. | Complete |
| 2 | **Floom apps work as MCP servers in Claude Desktop / Cursor / Cline today** | wireframe "works with every agent" + install sheet `/p/flyfast` | 🟢 **VALIDATED + P0 FIXED in v0.2.0** | [research/mcp-integration-validation.md](research/mcp-integration-validation.md) + Phase 2 sprint ([floom-monorepo#v0.2.0](https://github.com/floomhq/floom-monorepo/releases/tag/v0.2.0)). MCP handshake still returns `protocolVersion 2024-11-05`. **The per-user secrets P0 is fixed via the `_auth` meta-param extension**: every per-app MCP server that declares `secrets_needed` now advertises an optional `_auth: { SECRET_NAME: "..." }` object in the tool's `inputSchema`. Clients populate it per call; secrets flow through to the per-run env and are never persisted. Missing secrets return a structured `{error: "missing_secrets", required: [...], help: "..."}` response the LLM can use to prompt the user. Also fixed: an unknown slug now returns a JSON-RPC error envelope instead of a bare 404. `FLOOM_AUTH_TOKEN` adds a shared-token gate on `/mcp/*`. Aggregated `/mcp` endpoint deferred to v0.3. | Complete |
| 3 | **The v0.2.0 Docker image works on a clean machine** | Home "Self-host in 60 seconds" + /docs Quickstart + phase 2 roadmap | 🟢 **VALIDATED in v0.2.0** | [research/self-host-validation.md](research/self-host-validation.md) + Phase 2 sprint ([floom-monorepo#v0.2.0](https://github.com/floomhq/floom-monorepo/releases/tag/v0.2.0)). Published as `ghcr.io/floomhq/floom-monorepo:v0.2.0` with **both linux/amd64 AND linux/arm64 manifests** (Apple Silicon native). Smoke-tested on AX41: `docker run` with a 1-app apps.yaml (petstore, no base_url needed) produces a working hub in <10s; `POST /api/petstore/run` returns real pet data end-to-end. **All 3 P0 fixes landed**: (1) `FLOOM_SEED_APPS` defaults to off → empty hub instead of 15 apps crashing on docker.sock; (2) base_url path-stripping fixed (verified against live Petstore, 11-case unit test); (3) `FLOOM_AUTH_TOKEN` env var gates `/api/*`, `/mcp/*`, `/p/*` with constant-time bearer comparison (health stays open). Plus P1s: SELF_HOST.md rewritten with working example, `docker/apps.yaml.example` + `docker/.env.example` shipped, `/openapi.json` now returns a real spec instead of being swallowed by the SPA wildcard. | Complete |
| 4 | **Complex real-world OpenAPI specs can be ingested via the current Floom pipeline** | wireframe "bring any API" + /build "Paste OpenAPI spec" ramp + "deploy in 30 seconds" | 🟢 **VALIDATED in v0.2.0** | [research/openapi-ingest-stress-test.md](research/openapi-ingest-stress-test.md) + Phase 2 sprint ([floom-monorepo#v0.2.0](https://github.com/floomhq/floom-monorepo/releases/tag/v0.2.0)). Full rewrite landed. Stress test `node test/stress/test-ingest-stress.mjs` passes 4/4 against real specs: **Stripe 587 ops (3,778 refs resolved, 1,801 cyclic handled), GitHub 1,107 ops (9,642 refs → 0), Petstore 19 ops (spec-relative `/api/v3` URL resolved against fetch URL), Resend 83 ops (root base)**. Fixes landed: base_url path-stripping (11-case unit test, verified against live Petstore), `spec.servers[]` auto-resolve with variable substitution + spec-relative support, `$ref` via `@apidevtools/json-schema-ref-parser` with circular handling, `allOf` merge + `oneOf`/`anyOf` flatten + `discriminator` enum, header + cookie + multipart file upload support, `FLOOM_MAX_ACTIONS_PER_APP` env var (default 200, 0=unlimited), OAuth2 client_credentials + HTTP Basic, SSE + NDJSON streaming. **The "paste any OpenAPI, get a Floom app" promise is now real for OpenAPI 3.x specs with absolute or spec-relative server URLs and any auth mode, from none upward, except OAuth2 authorization_code.** | Complete |
| 5 | **MCP is the right primary client surface, not HTTP** | wireframe 4-surface lineup + home ASCII flow + product positioning | 🔴 | Needs market research: what % of AI apps shipped in 2026 are MCP-callable vs HTTP-only? Plus Federico's strategic bet. | Vercel research has landed; awaiting Federico's decision |
| 6 | **Better-Auth is the right auth library for OSS Docker core (vs Lucia, Auth.js v5, Supabase Auth, Clerk, WorkOS)** | roadmap phase 2 tech stack | 🟢 **VALIDATED** | [research/better-auth-comparison.md](research/better-auth-comparison.md). Verdict: **pick `better-auth@1.6.3` (MIT, 27.8k GitHub stars, 2.19M weekly npm downloads)** for both phase 2 OSS and phase 3 cloud. Only library on the shortlist that is MIT, runs inside the single-container Docker image with SQLite default, ships first-class Hono + Next.js App Router + Drizzle integrations, AND has plugins covering every production layer Floom needs across OSS and cloud (organizations, teams, SAML/OIDC/SSO, passkeys, magic link, 2FA, API keys, multi-session). **Auth.js merged into Better Auth** in 2026 (`authjs.dev` states "The Auth.js project is now part of Better Auth"); **Lucia explicitly deprecated** ("Lucia is now a learning resource on implementing auth from scratch"); **Clerk** hosted-only + SAML Enterprise-gated at $75/connection + no Docker; **WorkOS** enterprise-priced at $125/SAML connection + no Hono adapter + no self-host; **Supabase Auth** is a Go daemon that hard-requires Postgres, breaking the SQLite default (assumption #7). Phase 2 ships email+password + magic link + Google/GitHub OAuth; phase 3 adds `@better-auth/sso` for SAML without a rewrite. Pin the exact version, because the release cadence is fast (v1.6.0→v1.6.3 in 9 days). | Complete |
| 7 | **SQLite + BYO Postgres is the right OSS DB split** | roadmap + positioning "solo can run it, team can scale" | 🔴 | Research agent: how Supabase/Langfuse/Plausible/Umami handle this split, scaling realities | Queued for Phase 2 planning |
| 8 | **Composio is a viable partner for OAuth integrations (Gmail/Notion/Sheets/Slack/Stripe/etc)** | /build "Connect a tool" ramp + roadmap phase 4 | 🟢 **VALIDATED** | [research/composio-validation.md](research/composio-validation.md). Pick: **Composio** (982 toolkits / 11K tools, Gmail+Notion+Sheets+Slack+Stripe+HubSpot all confirmed, `user_id`-flat keying, automatic token refresh, MCP-native, usage-based pricing $0/$29/$229 NOT per-user). Nango (700+ APIs, Elastic License 2.0 NOT Apache, `connectionId+providerConfigKey`, $50/mo Starter + $1/connection overage) is the documented self-host fallback for Phase 5. **Pipedream Connect dismissed** on per-unique-external-user billing (landmine for two-sided marketplace). **Native OAuth dismissed** on 15-day build cost for 6 providers. **W2.3 dependency inversion resolved with option (a)**: ship with `device_id` fallback cookie (same pattern as W2.1), `user_connections(owner_kind, owner_id, ...)` schema, migrate to real `user_id` post-W3.1 Better Auth via a v0.3→v0.4 runbook in `docs/app-memory.md`. Cost: ~1.5 dev days on top of Phase 4's 2-week budget. W2.3 stays in Wave 2. | Complete |
| 9 | **Stripe partner app is the right creator monetization model (vs Floom-side revenue share)** | memory `project_floom_layers.md` | 🟢 **VALIDATED** | [research/stripe-connect-validation.md](research/stripe-connect-validation.md). Verdict: **Stripe Connect Express accounts** as default, **Standard** as opt-in for existing Stripe users. Direct charges with `application_fee_amount` set to 5% of each charge; creators are merchant of record, Floom stays out of tax/dispute liability. **Stripe Tax Basic (API) at €0.45/tx** handles VAT/GST globally, Floom never becomes MoR. **Paddle and LemonSqueezy rejected on topology**: both are single-seller MoR platforms, no multi-seller marketplace mode; LemonSqueezy also trending to 10-18% effective fees post-Stripe-acquisition. Gumroad rejected (it IS the marketplace). Raw Stripe-per-creator rejected (no platform fee mechanic). **Dev budget: 8 focused days** in Phase 3 Week 5; if slipping, ship flat subscriptions first and defer metered billing. Phase 4+ upgrade: Stripe Managed Payments as premium MoR tier. | Complete |
| 10 | **Multi-tenant install per Floom app with per-user state + per-user credentials is feasible in OSS Docker** | wireframe creator dashboard "App memory" + positioning "multi-tenant moat" | 🟢 **VALIDATED** | [research/multi-tenant-architecture.md](research/multi-tenant-architecture.md). Pattern: one DB, `workspace_id` on every row, app-code scoping via `scoped(db, ctx)` helper (NOT Postgres RLS), `app_memory` keyed by `(workspace_id, app_slug, user_id, key)` as JSONB, `user_secrets` AES-256-GCM envelope-encrypted with per-workspace DEK wrapped by `FLOOM_MASTER_KEY` KEK. Solo OSS = synthetic `workspace_id = 'local'`, same schema, zero feature flags, following Langfuse's "single-user as special case of multi-user" rule. SQLite holds ~100 concurrent users on 1 vCPU, ~500 on 4 vCPU; `FLOOM_DATABASE_URL=postgres://...` flips driver without schema change. Session re-key from anonymous `device_id` to `user_id` is one idempotent UPDATE per table, runs once on first login. Unblocks W2.1 (per-user session state) and W3.1 (workspaces), both ship v0.3.0 schema + v0.4.0 UI. | Complete |
| 11 | **"GitHub import" (paste a repo URL → auto-detect spec/Dockerfile → deploy) is a ~30-second flow** | v7 /build primary ramp + positioning "deploy in 30 seconds" | 🔴 | Research agent: how Vercel / Railway / Render do GitHub import, what Floom would need to build for MVP | Queued for Phase 2 |
| 12 | **Custom domain routing (yourname.floom.dev, myapp.com) is achievable with a standard reverse proxy + Let's Encrypt** | wireframe custom domain section + layers | 🟢 **VALIDATED** | [research/wildcard-ssl-at-scale.md](research/wildcard-ssl-at-scale.md). Recommendation: **Caddy 2 two-layer hybrid**. (a) `*.floom.dev` creator subdomains via Let's Encrypt DNS-01 against IONOS (lego provider `ionos`, supported since 4.2.0; one wildcard cert covers infinite creator subdomains, never hits LE rate limits). (b) BYO custom domains (`myapp.com`) via Caddy on-demand TLS with a mandatory `/internal/caddy-ask` endpoint backed by the `custom_domains` table. First-hit latency 2-5s, cached forever after. Caddy scales to "tens of thousands" per docs, proven at Vercel/Render/Railway scale. **Cloudflare SSL-for-SaaS** ($0.10/hostname/month, 100 free, 50k max per zone) ships dark as runtime-switchable fallback via `FLOOM_SSL_BACKEND` env var; flip triggers: >5 LE failures/day, >500 BYO domains, or any LE 429. **Current nginx-per-vhost + certbot pattern on AX41 does not scale** and must be replaced with Caddy for the Floom edge (other vhosts stay on nginx). W3.2 shippable in 3 days: Day 1 wildcard tier, Day 2 BYO layer, Day 3 monitoring + Cloudflare escape hatch. | Complete |
| 13 | **Per-app rate limiting is achievable in multi-tenant OSS with a shared Redis** | dashboard rate limit sliders | 🔴 | Architecture research: token bucket vs leaky bucket, Redis patterns, fallback when no Redis in self-host | Queued for Phase 2 |
| 14 | **The feedback inbox email delivery path works in OSS self-host without SMTP config** | wireframe feedback inbox | 🔴 | Decision: require SMTP setup (friction) or skip email delivery in OSS (features disabled)? | Queued for Phase 2 |
| 15 | **Full-text search for the /store works on 15-1000 apps without a dedicated search engine** | wireframe /store search | 🔴 | Research agent: Postgres tsvector vs Meilisearch vs Typesense, what's easiest in Docker | Queued for Phase 2 |
| 16 | **Secrets encryption in OSS SQLite is trustworthy with a user-provided master key** | wireframe secrets vault | 🔴 | Architecture research: age / libsodium / node-crypto patterns, key rotation, compromise scenarios | Queued for Phase 2 |
| 17 | **Each Floom app can declare its own identity-provider config (Google/GitHub/OIDC/SAML/Magic link)** | creator dashboard "Authentication" section | 🔴 | Architecture research: how to do multi-provider per-tenant auth without exploding complexity; are there libraries? | Queued for Phase 2/3 |
| 18 | **The Vercel Workflow SDK and Vercel open-agents are complements, not competitors, for Floom** | positioning memory + v7 copy reframe | 🟢 | Already validated by the Vercel research agent that ran earlier. Result: Floom is not a runtime, position as complement. | Complete |
| 19 | **Langfuse/n8n/Supabase's open-core split is replicable for Floom** | roadmap phase 2-3 | 🟢 | Already validated by competitive benchmark audit that ran earlier. Result: Floom should copy the split exactly. | Complete |
| 20 | **Biz users will install AI apps from a community app store instead of building their own** | whole product thesis | 🔴 | Market research: Notion template gallery adoption, Zapier template library, HF Spaces install counts, ChatGPT GPT Store drop-off | Deferred — market research, not agent-solvable |
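
For assumption #2, the `_auth` flow can be sketched roughly as follows. This is a minimal TypeScript illustration, not Floom's actual code: the `resolveSecrets` function, its types, and the placeholder help text are hypothetical, and treating `required` as the list of secrets still missing is an assumption. Only `_auth`, `secrets_needed`, and the `missing_secrets` response shape come from the row above.

```typescript
// Hypothetical sketch of the `_auth` meta-param flow from row 2.
// Everything except `_auth`, `secrets_needed`, and the error shape is assumed.

type ToolArgs = Record<string, unknown>;

interface MissingSecretsError {
  error: "missing_secrets";
  required: string[]; // assumption: the secret names that were not supplied
  help: string;
}

interface ResolvedCall {
  inputs: ToolArgs; // tool arguments with `_auth` stripped out
  env: Record<string, string>; // per-run env only; nothing is persisted here
}

function resolveSecrets(
  args: ToolArgs,
  secretsNeeded: string[],
): ResolvedCall | MissingSecretsError {
  // Separate the meta-param from the real tool inputs.
  const { _auth, ...inputs } = args;
  const provided: Record<string, unknown> =
    typeof _auth === "object" && _auth !== null
      ? (_auth as Record<string, unknown>)
      : {};

  const env: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of secretsNeeded) {
    const value = provided[name];
    if (typeof value === "string" && value.length > 0) {
      env[name] = value;
    } else {
      missing.push(name);
    }
  }

  if (missing.length > 0) {
    // Structured response the LLM can use to prompt the user.
    return {
      error: "missing_secrets",
      required: missing,
      help: "Pass the listed secrets in the _auth object of the tool call.",
    };
  }
  return { inputs, env };
}
```

A caller would pass the tool-call arguments plus the app's `secrets_needed` list, then either start the run with `env` or return the error object to the MCP client.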

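For assumption #10, the envelope scheme (a per-workspace DEK wrapped by a KEK) might look like the sketch below, using Node's built-in `node:crypto`. Function names and the `Sealed` shape are hypothetical. How the 32-byte KEK is derived from `FLOOM_MASTER_KEY`, how keys rotate, and how sealed values are stored are all out of scope, and assumption #16 still lists the trustworthiness of this approach as unvalidated.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// One AES-256-GCM sealed value: 12-byte IV, ciphertext, 16-byte auth tag.
interface Sealed {
  iv: Buffer;
  data: Buffer;
  tag: Buffer;
}

// Encrypt bytes under a 32-byte key with a fresh random IV.
function encryptGcm(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

// Decrypt; throws if the key is wrong or the data was tampered with.
function decryptGcm(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}

// Generate a per-workspace DEK and return it wrapped by the KEK.
// Assumption: `kek` is 32 bytes derived elsewhere from FLOOM_MASTER_KEY.
function createWrappedDek(kek: Buffer): Sealed {
  return encryptGcm(kek, randomBytes(32));
}

// Encrypt a user secret under the workspace DEK (unwrapped only in memory).
function encryptSecret(kek: Buffer, wrappedDek: Sealed, secret: string): Sealed {
  const dek = decryptGcm(kek, wrappedDek);
  return encryptGcm(dek, Buffer.from(secret, "utf8"));
}

// Reverse of encryptSecret.
function decryptSecret(kek: Buffer, wrappedDek: Sealed, sealed: Sealed): string {
  const dek = decryptGcm(kek, wrappedDek);
  return decryptGcm(dek, sealed).toString("utf8");
}
```

The point of the two layers is that only the small wrapped DEK would need re-encrypting if the master key changed; the per-secret ciphertexts stay as they are.
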
## Validation batches

### Batch 1 — Running now (4 agents in parallel)

- **Agent 1**: Spec-to-UI rendering reference (assumption #1)
- **Agent 2**: MCP integration end-to-end validation (assumption #2)
- **Agent 3**: Self-host docker validation (assumption #3)
- **Agent 4**: OpenAPI ingest stress test (assumption #4)

Expected completion was 30-60 min each; all four have since landed (see rows 1-4 in the table above).

### Batch 2 — Queued for Phase 2 planning (Docker core)

Assumptions #6, #7, #10, #11, #12, #13, #14, #15, #16, #17. Fire after Batch 1 lands.

### Batch 3 — Queued for Phase 3 planning (Cloud)

Assumptions #8, #9. Fire when Phase 3 kicks off.

### Unresolvable by agent

Assumptions #5 (surface primacy) and #20 (biz user install hypothesis) require Federico's strategic bet + real market data. Agents can inform but cannot decide.

## How to use this doc

When an agent lands:
1. Flip the assumption's status from 🟡 to 🟢 (validated) or ❌ (invalidated)
2. Link the agent's output file
3. If invalidated, note what needs to change in wireframe/roadmap
4. Update the date in the `Status` line at the top of this file

When Federico makes a strategic decision:
1. Flip status from 🔴 to 🟢 with the decision noted

When all 20 are 🟢 or ⚫️, we have "everything defined and prepared" and can confidently start building Phase 1.
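
The exit condition above can be checked mechanically. The sketch below is a hypothetical TypeScript helper (not part of the Floom repo) that tallies the status emoji at the start of the Status column in the assumptions table. It assumes no cell before the Status column contains a literal `|`.

```typescript
// Status markers from the legend. The black circle is matched by its base
// code point (U+26AB) so it works with or without a variation selector.
const STATUSES = ["🔴", "🟡", "🟢", "\u26AB", "❌"] as const;
type Status = (typeof STATUSES)[number];

function tallyStatuses(markdown: string): Record<Status, number> {
  const counts: Record<Status, number> = {
    "🔴": 0,
    "🟡": 0,
    "🟢": 0,
    "\u26AB": 0,
    "❌": 0,
  };
  for (const line of markdown.split("\n")) {
    // Data rows look like "| <number> | ...".
    if (!/^\|\s*\d+\s*\|/.test(line)) continue;
    const cells = line.split("|").map((cell) => cell.trim());
    // cells[0] is the empty string before the leading pipe;
    // cells[4] is the Status column (#, Assumption, Source, Status).
    const statusCell = cells[4] ?? "";
    const status = STATUSES.find((s) => statusCell.startsWith(s));
    if (status !== undefined) counts[status] += 1;
  }
  return counts;
}

// True only when every expected assumption is validated or deferred.
function readyToBuild(counts: Record<Status, number>, expected: number): boolean {
  return counts["🟢"] + counts["\u26AB"] === expected;
}
```

Running `readyToBuild(tallyStatuses(fileContents), 20)` against this file would return `false` for as long as any row is still 🔴, 🟡, or ❌.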