From 90-Second Timeouts to Sub-Second Pathways: How We Made Upsoma Lightning Fast
The Problem
Upsoma generates personalized learning pathways using an external AI service. When a user clicks "Generate," the system calls an AI endpoint, waits for a structured pathway response, saves it to a MySQL database, and then navigates the user to their new learning plan.
In production, this flow was failing. Users were hitting timeouts. Gateway proxies were killing connections. The experience was broken.
Here is the story of how we diagnosed the root causes, implemented fixes layer by layer, and ultimately built a two-speed architecture that serves recommended pathways in under 500 milliseconds.
Phase 1: The Database Layer Was Drowning
Diagnosis
The first investigation revealed the save-pathway API was doing something deeply inefficient. When a pathway came back from the AI generator, the code needed to persist a tree structure: one pathway, multiple modules, each with chapters, each with resource groups, each with resource sections, each with individual resources and books.
The original implementation used nested for loops with individual Prisma create() calls:
    for each module:
      await prisma.module.create(...)
      for each chapter:
        await prisma.chapter.create(...)
        for each resource type:
          await prisma.resourceSection.create(...)
          for each resource:
            await prisma.resource.create(...)

A typical pathway with 6 modules, 30 chapters, and 300+ resources generated 400+ individual SQL INSERT statements, all wrapped in a single Prisma interactive transaction. The transaction would regularly exceed its default timeout and fail.
On top of that, every API route file was instantiating its own PrismaClient:
    const prisma = new PrismaClient();

Each request created a new connection pool. And each handler called prisma.$disconnect() in a finally block, tearing it down immediately after. Under any concurrency, the database connection limit was exhausted.
The Fix: Bulk Inserts with Pre-Generated IDs
We rewrote the entire module tree persistence strategy. Instead of creating records one by one, we pre-generate all IDs upfront using crypto.randomUUID(), collect every row into flat arrays, and execute approximately six createMany() calls regardless of data size:
    const moduleRows: any[] = []
    const chapterRows: any[] = []
    const sectionRows: any[] = []
    const rgRows: any[] = []
    const resourceRows: any[] = []
    const bookRows: any[] = []
    for (const mod of modules) {
      const moduleId = crypto.randomUUID()
      moduleRows.push({ id: moduleId, pathwayId, ... })
      for (const chapter of mod.chapters) {
        const chapterId = crypto.randomUUID()
        chapterRows.push({ id: chapterId, moduleId, ... })
        // ... collect resources into flat arrays
      }
    }
    // ~6 queries total, regardless of data size
    await tx.lPModule.createMany({ data: moduleRows })
    await tx.lPChapter.createMany({ data: chapterRows })
    await tx.resourceSection.createMany({ data: sectionRows })
    await tx.lPResourceGroup.createMany({ data: rgRows })
    await tx.resource.createMany({ data: resourceRows })
    await tx.book.createMany({ data: bookRows })

This reduced 400+ queries to 6. The key insight: by pre-generating IDs, we can reference foreign keys before the parent rows exist in the database, because everything gets inserted in dependency order within the same transaction.
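To make the flattening step concrete, here is a minimal runnable sketch of the same idea reduced to two levels. The `ModuleInput`, `ChapterInput`, and row types, the `title` field, and the `flattenModules` name are all hypothetical simplifications; the real tree has more levels and more columns.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical, simplified input shape (the real pathway tree is deeper).
interface ChapterInput { title: string }
interface ModuleInput { title: string; chapters: ChapterInput[] }

interface ModuleRow { id: string; pathwayId: string; title: string }
interface ChapterRow { id: string; moduleId: string; title: string }

// Walk the tree once, assigning IDs up front so child rows can point at
// parents that have not been inserted yet.
function flattenModules(pathwayId: string, modules: ModuleInput[]) {
  const moduleRows: ModuleRow[] = [];
  const chapterRows: ChapterRow[] = [];
  for (const mod of modules) {
    const moduleId = randomUUID();
    moduleRows.push({ id: moduleId, pathwayId, title: mod.title });
    for (const chapter of mod.chapters) {
      chapterRows.push({ id: randomUUID(), moduleId, title: chapter.title });
    }
  }
  return { moduleRows, chapterRows };
}

const flat = flattenModules("pathway-1", [
  { title: "Basics", chapters: [{ title: "Intro" }, { title: "Setup" }] },
  { title: "Advanced", chapters: [{ title: "Scaling" }] },
]);
console.log(flat.moduleRows.length, flat.chapterRows.length); // 2 3
```

Each flat array then maps to one bulk insert, so the number of queries depends on the number of tables rather than the number of rows.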
We also added explicit transaction timeouts (maxWait: 10s, timeout: 30s), replaced all new PrismaClient() instances with a shared singleton, and moved the expensive deep-read query (7-table nested include) outside the write transaction so it does not hold locks.
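The shared-singleton part of that change can be sketched as follows. This is an illustration of the general pattern, not the project's actual file: `FakeClient` is a hypothetical stand-in for PrismaClient so the example runs without a database, and the `__sharedClient` key and `getClient` name are made up.

```typescript
// `FakeClient` is a hypothetical stand-in for PrismaClient; it only counts
// how many times it has been constructed.
class FakeClient {
  static instances = 0;
  constructor() {
    FakeClient.instances += 1;
  }
}

// Cache the instance on globalThis so that re-evaluating the module reuses
// the existing client instead of opening another connection pool.
const globalCache = globalThis as typeof globalThis & { __sharedClient?: FakeClient };

function getClient(): FakeClient {
  if (!globalCache.__sharedClient) globalCache.__sharedClient = new FakeClient();
  return globalCache.__sharedClient;
}

const first = getClient();
const second = getClient();
console.log(first === second, FakeClient.instances); // true 1
```

With the real client, the transaction limits mentioned above would go in the options argument of an interactive transaction, along the lines of `prisma.$transaction(fn, { maxWait: 10_000, timeout: 30_000 })`, where both values are in milliseconds.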
Orphan Cleanup
The Prisma schema uses relationMode = "prisma" (emulated cascades, not database-level foreign keys). This means deleting a module cascades to chapters and resource groups, but ResourceSection, Resource, and chapter-level Book records are not cascade-deleted. They become orphans.
We added an explicit cleanup function that walks the tree before re-inserting:
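    const cleanupExistingTree = async (tx, pathwayId) => {
      // findMany returns rows, not IDs: select the id column and map it out
      const modules = await tx.lPModule.findMany({ where: { pathwayId }, select: { id: true } });
      const moduleIds = modules.map((m) => m.id);
      const chapters = await tx.lPChapter.findMany({
        where: { moduleId: { in: moduleIds } }, select: { id: true }
      });
      const chapterIds = chapters.map((c) => c.id);
      const rgs = await tx.lPResourceGroup.findMany({
        where: { chapterId: { in: chapterIds } }, select: { id: true }
      });
      const rgIds = rgs.map((rg) => rg.id);
      // Assumption: sections reference their resource group via lpResourceGroupId, as Book does
      const sections = await tx.resourceSection.findMany({
        where: { lpResourceGroupId: { in: rgIds } }, select: { id: true }
      });
      const sectionIds = sections.map((s) => s.id);
      // Delete orphan-prone records explicitly
      await tx.resource.deleteMany({ where: { sectionId: { in: sectionIds } } });
      await tx.resourceSection.deleteMany({ where: { id: { in: sectionIds } } });
      await tx.book.deleteMany({ where: { lpResourceGroupId: { in: rgIds } } });
      // Now Prisma-emulated cascade handles the rest
      await tx.lPModule.deleteMany({ where: { pathwayId } });
    };

Phase 2: The Real Bottleneck Was Silence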
After the database optimizations, the bulk inserts were fast. But users were still timing out. The problem had shifted.
Diagnosis
The external AI generator takes 30 to 90 seconds to produce a pathway. During that time, the server held an open HTTP connection but sent zero bytes. Gateway proxies, CDNs, and load balancers have idle connection timeouts (typically 30-60 seconds). They would kill the connection before any response arrived.
The previous retry configuration made this worse: 280 seconds per attempt, up to 3 attempts. Worst case: 3 × 280 = 840 seconds of total budget, far exceeding the 300-second Vercel function limit.
The Fix: NDJSON Streaming with Heartbeats
We switched the single-pathway mode from a standard JSON response to an NDJSON (newline-delimited JSON) stream. The server sends bytes immediately and heartbeats every 15 seconds while the generator works:
    import type { NextApiResponse } from "next";

    function initStream(res: NextApiResponse) {
      res.writeHead(200, {
        "Content-Type": "application/x-ndjson",
        "Cache-Control": "no-cache, no-transform",
        "X-Accel-Buffering": "no" // ask nginx-style proxies not to buffer the stream
      });
      // Write one JSON object per line; no-op once the response has ended
      const send = (data: unknown) => {
        if (!res.writableEnded) res.write(JSON.stringify(data) + "\n");
      };
      const heartbeat = setInterval(() => send({ status: "heartbeat" }), 15_000);
      return { send, cleanup: () => clearInterval(heartbeat) };
    }

The client reads the stream progressively:
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = ""; // holds any partial line left over from the previous chunk
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() || ""; // last element is an incomplete line (or "")
      for (const line of lines) {
        if (!line.trim()) continue; // skip blank lines instead of failing in JSON.parse
        const event = JSON.parse(line);
        if (event.status === "complete") {
          pathwaySlugOrId = event.pathway?.slug;
        }
      }
    }

Events flow like this:
    {"status":"generating","message":"Generating pathway..."}
    {"status":"heartbeat"}
    {"status":"heartbeat"}
    {"status":"saving","message":"Saving pathway..."}
    {"status":"complete","pathway":{"id":"abc","slug":"backend-engg-pathway"}}

We also introduced upsertLpPathwayLite, a write-only variant that skips the expensive deep-read after saving. The streaming client only needs { id, slug } to navigate.
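The line-buffering logic in the client reader is the part that is easiest to get wrong, because a network chunk can end in the middle of a JSON object. The sketch below isolates that logic so it can be exercised without a network; `createNdjsonParser` is a hypothetical helper name and the `demo` slug is illustrative.

```typescript
// Hypothetical helper: buffers raw text chunks and emits one parsed object
// per complete line, holding back any trailing partial line.
function createNdjsonParser(onEvent: (event: unknown) => void) {
  let buffer = "";
  return {
    push(chunk: string): void {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // keep the trailing partial line for later
      for (const line of lines) {
        if (line.trim() !== "") onEvent(JSON.parse(line));
      }
    },
  };
}

const seen: unknown[] = [];
const parser = createNdjsonParser((e) => seen.push(e));
// Chunks deliberately split objects in the middle.
parser.push('{"status":"gener');
parser.push('ating"}\n{"status":"heartbeat"}\n{"status":"comp');
parser.push('lete","pathway":{"slug":"demo"}}\n');
console.log(seen.length); // 3
```

Nothing is emitted until a newline arrives, which is why the server appends "\n" to every event, including heartbeats.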
We tightened the retry budget: 90 seconds per attempt, 2 attempts maximum. Worst case: 90s + 1s backoff + 90s = 181 seconds, well within the 300-second function limit.
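The budget arithmetic is simple enough to keep as an executable check. This is a small sketch; the `worstCaseSeconds` and `FUNCTION_LIMIT_SEC` names are hypothetical, and the old configuration is computed with zero backoff to match the 840-second figure quoted earlier.

```typescript
// Worst-case wall time if every attempt runs all the way to its timeout.
function worstCaseSeconds(perAttemptSec: number, attempts: number, backoffSec: number): number {
  return perAttemptSec * attempts + backoffSec * Math.max(0, attempts - 1);
}

const FUNCTION_LIMIT_SEC = 300; // the function limit quoted in the text

const before = worstCaseSeconds(280, 3, 0); // 840, backoff ignored
const after = worstCaseSeconds(90, 2, 1);   // 181
console.log(before > FUNCTION_LIMIT_SEC, after < FUNCTION_LIMIT_SEC); // true true
```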
Phase 3: The Generator Response Was Wrapped
After deploying the streaming fix, a new error appeared: "Invalid pathway data received from generator (missing id)."
The static JSON files have the pathway data flat at the top level: { id: "backend-engg-pathway", title: "...", modules: [...] }. But the external AI generator wraps its response in a container: { pathway: { id: "...", ... } }.
We added an unwrapping function that handles both formats:
    type AnyRecord = Record<string, any>; // assumed definition; the original does not show it

    function unwrapGeneratorResponse(raw: unknown): AnyRecord {
      if (!raw || typeof raw !== "object") return {};
      const obj = raw as AnyRecord;
      // Flat format: the pathway fields sit at the top level
      if (obj.id) return obj;
      // Wrapped format: look inside the known container keys
      for (const key of ["pathway", "data", "result"]) {
        const inner = obj[key];
        if (inner && typeof inner === "object" && inner.id) {
          return inner;
        }
      }
      console.error(
        "[save-pathway] Unrecognised response. Top-level keys:",
        Object.keys(obj)
      );
      return obj;
    }

The diagnostic log at the end ensures that if the generator changes its response format in the future, we get a clear indication of what keys are present.
Phase 4: Making It Lightning Fast
With timeouts solved, we turned to speed. The generation flow still took 30-90 seconds for the AI call, plus roughly 2 seconds of sequential post-generation round trips:
- Verify the pathway exists (GET /api/get-pathway) ~500ms
- Open a usage session (POST /api/pathways/{slug}/open) ~500ms
- Fetch saved snapshot (GET /api/pathways/{slug}/saved) ~500ms
- Navigate to the stream page
For custom input that requires the AI generator, that 30-90 seconds is unavoidable. But for the 9 recommended topics displayed on the homepage? They already have pre-built JSON data files. Calling the AI generator for them was unnecessary.
The Two-Speed Architecture
Speed 1: Instant Path for Recommended Topics
We created a single endpoint that collapses the entire chain into one round trip:
    POST /api/pathways/{slug}/quick-start
    Body: { userId: "..." }
    Response: { pathwaySlug: "...", usageId: "...", created: true }

Internally, quick-start does three things:
1. Checks if the pathway exists in the database. If it does, skips to step 3.
2. If missing, and the slug is a known static pathway, reads the JSON file from the data/ directory and inserts it via bulk createMany() (~200ms).
3. Creates or reuses a PathwayUsage record with an embedded snapshot, then returns the slug and usage ID.
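The three steps can be sketched with in-memory maps standing in for the database and the data files. Everything here is hypothetical scaffolding: the `quickStart` function, the `pathways` and `usages` maps, and the usage ID format are made up, and the sketch assumes `created` means "a new usage record was created", which the original does not spell out.

```typescript
// In-memory stand-ins for the database tables (hypothetical).
const KNOWN = new Set(["backend-engg-pathway"]);
const pathways = new Map<string, { slug: string }>();
const usages = new Map<string, string>(); // `${userId}:${slug}` -> usageId
let nextUsage = 1;

function quickStart(slug: string, userId: string) {
  // Steps 1 and 2: seed the pathway only if it is missing and known to be static.
  if (!pathways.has(slug)) {
    if (!KNOWN.has(slug)) throw new Error(`Unknown pathway: ${slug}`);
    pathways.set(slug, { slug }); // the real code would bulk-insert the JSON file here
  }
  // Step 3: create or reuse the usage record for this user and pathway.
  const key = `${userId}:${slug}`;
  let usageId = usages.get(key);
  const created = usageId === undefined;
  if (usageId === undefined) {
    usageId = `usage-${nextUsage++}`;
    usages.set(key, usageId);
  }
  return { pathwaySlug: slug, usageId, created };
}

const firstCall = quickStart("backend-engg-pathway", "user-1");
const secondCall = quickStart("backend-engg-pathway", "user-1");
console.log(firstCall.created, secondCall.created, firstCall.usageId === secondCall.usageId); // true false true
```

Because the second call finds both the pathway and the usage record, it does no writes at all, which is what makes a repeated call cheap.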
The 9 known static slugs are defined in a shared constants file used by both the quick-start endpoint and the static seeding endpoint:
    export const KNOWN_STATIC_SLUGS = new Set([
      "backend-engg-pathway",
      "cybersecurity-pathway",
      "data-analysis-pathway",
      "ds-ml-interview-prep",
      "ds-ml-pathway",
      "frntend-eng-pathway",
      "fullstack-eng-pathway",
      "swe-interview-prep",
      "sys-eng-devops-pathway"
    ]);

On the client side, the composer detects when a recommended topic is selected and takes the fast path:
    if (activeTopicId && KNOWN_TOPIC_IDS.has(activeTopicId)) {
      const quickRes = await fetch(`/api/pathways/${activeTopicId}/quick-start`, {
        method: "POST",
        body: JSON.stringify({ userId })
      });
      const { pathwaySlug, usageId } = await quickRes.json();
      // Navigate immediately
      setPendingPathwayId(pathwaySlug);
      setPendingUsageId(usageId);
      return;
    }

Result: 30-90 seconds drops to under 500 milliseconds. The user clicks a recommended topic, and they are in their pathway almost instantly.
Speculative Prefetch
We went a step further. When a user clicks a recommended topic (before they even click "Generate"), we fire a background request to the quick-start endpoint:
    const handleTopicSelect = (topic) => {
      setInputText(topic.value);
      setActiveTopicId(topic.id);
      // Start prefetching in the background
      if (KNOWN_TOPIC_IDS.has(topic.id) && userId) {
        fetch(`/api/pathways/${topic.id}/quick-start`, {
          method: "POST",
          body: JSON.stringify({ userId })
        }).catch(() => {});
      }
    };

This ensures the pathway is seeded and the usage record exists before the user even clicks Generate. When they do click, the quick-start call finds an existing usage and returns immediately. The prefetch is fire-and-forget; failure is silently swallowed.
Speed 2: Optimized Path for Custom Input
For custom pathways that do require the AI generator, we made two targeted improvements:
Removed the verification step. After the NDJSON stream returns a complete event with the pathway slug, we no longer make a separate GET request to verify the pathway exists. We just saved it; we know it exists.
Server-side request deduplication. If two users submit identical payloads at the same time (e.g., both type "Backend Development"), the server makes only one call to the external generator:
    const inflight = new Map<string, Promise<any>>();
    async function callGenerator(payload, client) {
      const key = JSON.stringify({ p: payload, b: client.defaults.baseURL });
      const existing = inflight.get(key);
      if (existing) return existing;
      const promise = callGeneratorRaw(payload, client).finally(() => {
        inflight.delete(key);
      });
      inflight.set(key, promise);
      return promise;
    }

The second request piggybacks on the first. Both callers await the same promise. When it resolves, both proceed with the same result. The map entry is cleaned up on completion (success or failure), so there is no memory leak.
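A self-contained way to see the piggybacking behavior is to swap the external generator for a counter. In this sketch `fakeGenerator`, `dedupedGenerate`, and `demo` are hypothetical names, and the key is built from a single topic string rather than the full payload and base URL.

```typescript
const inflight = new Map<string, Promise<string>>();
let generatorCalls = 0;

// Stand-in for the slow external generator (hypothetical).
async function fakeGenerator(topic: string): Promise<string> {
  generatorCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 20));
  return `pathway for ${topic}`;
}

// Same shape as the dedup wrapper: reuse an in-flight promise if one exists.
function dedupedGenerate(topic: string): Promise<string> {
  const key = JSON.stringify({ topic });
  const existing = inflight.get(key);
  if (existing) return existing;
  const promise = fakeGenerator(topic).finally(() => inflight.delete(key));
  inflight.set(key, promise);
  return promise;
}

async function demo(): Promise<void> {
  const [a, b] = await Promise.all([
    dedupedGenerate("Backend Development"),
    dedupedGenerate("Backend Development"),
  ]);
  console.log(a === b, generatorCalls, inflight.size); // true 1 0
}
```

One caveat worth keeping in mind with JSON keys: JSON.stringify preserves property insertion order, so two payloads with the same fields in a different order would produce different keys and would not be deduplicated.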
Results
| Scenario | Before | After |
|---|---|---|
| Recommended topic generation | 30-90s + 2s post-processing | <500ms total |
| Custom pathway (AI generator) | 30-90s + 2s post-processing | 30-90s + 0s post-processing |
| Database writes per pathway | ~400 individual INSERT statements | ~6 bulk INSERT statements |
| Concurrent identical requests | N generator calls | 1 generator call (deduplicated) |
| Connection pool behavior | New pool per request, destroyed after | Shared singleton, persistent |
| Gateway timeout resilience | Zero bytes sent during generation | Heartbeat every 15s |
Lessons Learned
Start with the data layer. The flashiest optimization was the two-speed system, but the foundational fix was the database layer. Bulk inserts with pre-generated IDs is a pattern worth internalizing. When your ORM wants to do N queries, sometimes you need to take control and do 6.
Silence kills connections. HTTP connections that send zero bytes are fragile. Any proxy, CDN, or load balancer in the path can terminate them. Streaming with heartbeats is the robust pattern for long-running operations.
Pre-built data is the fastest data. If you have known, static content, serve it from the database. Do not call an external AI service to regenerate content you already have. This seems obvious in retrospect, but it is easy to build a single code path and route everything through it.
Speculative prefetch is free performance. The user clicked a topic. They are going to click Generate. Start working before they ask. The worst case is a wasted background request. The best case is the user perceives zero latency.
Deduplication is cheap insurance. A module-level Map with a JSON key is trivial to implement. Under load, it prevents redundant calls to expensive external services. The cleanup happens automatically via .finally().
This post documents the engineering work done on the Upsoma pathway generation system across four iterations of diagnosis and optimization.