Published June 1, 2026 in Inside Upsoma

From 90-Second Timeouts to Sub-Second Pathways: How We Made Upsoma Lightning Fast

Author: Upsoma Engineering

The Problem

Upsoma generates personalized learning pathways using an external AI service. When a user clicks "Generate," the system calls an AI endpoint, waits for a structured pathway response, saves it to a MySQL database, and then navigates the user to their new learning plan.

In production, this flow was failing. Users were hitting timeouts. Gateway proxies were killing connections. The experience was broken.

Here is the story of how we diagnosed the root causes, implemented fixes layer by layer, and ultimately built a two-speed architecture that serves recommended pathways in under 500 milliseconds.

Phase 1: The Database Layer Was Drowning

Diagnosis

The first investigation revealed the save-pathway API was doing something deeply inefficient. When a pathway came back from the AI generator, the code needed to persist a tree structure: one pathway, multiple modules, each with chapters, each with resource groups, each with resource sections, each with individual resources and books.

The original implementation used nested for loops with individual Prisma create() calls:

for each module:
  await prisma.module.create(...)
  for each chapter:
    await prisma.chapter.create(...)
    for each resource type:
      await prisma.resourceSection.create(...)
      for each resource:
        await prisma.resource.create(...)

A typical pathway with 6 modules, 30 chapters, and 300+ resources generated 400+ individual SQL INSERT statements, all wrapped in a single Prisma interactive transaction. The transaction would regularly exceed Prisma's default interactive-transaction timeout (5 seconds) and fail.

On top of that, every API route file was instantiating its own PrismaClient:

const prisma = new PrismaClient();

In effect, each request created a new connection pool, because each handler also called prisma.$disconnect() in a finally block and tore the pool down as soon as it finished. Under even modest concurrency, the database connection limit was exhausted.

The Fix: Bulk Inserts with Pre-Generated IDs

We rewrote the entire module tree persistence strategy. Instead of creating records one by one, we pre-generate all IDs upfront using crypto.randomUUID(), collect every row into flat arrays, and execute approximately six createMany() calls regardless of data size:

const moduleRows: any[] = []
const chapterRows: any[] = []
const sectionRows: any[] = []
const rgRows: any[] = []
const resourceRows: any[] = []
const bookRows: any[] = []

for (const mod of modules) {
  const moduleId = crypto.randomUUID()
  moduleRows.push({ id: moduleId, pathwayId, ... })

  for (const chapter of mod.chapters) {
    const chapterId = crypto.randomUUID()
    chapterRows.push({ id: chapterId, moduleId, ... })
    // ... collect resources into flat arrays
  }
}

// ~6 queries total, regardless of data size
await tx.lPModule.createMany({ data: moduleRows })
await tx.lPChapter.createMany({ data: chapterRows })
await tx.resourceSection.createMany({ data: sectionRows })
await tx.lPResourceGroup.createMany({ data: rgRows })
await tx.resource.createMany({ data: resourceRows })
await tx.book.createMany({ data: bookRows })

This reduced 400+ queries to 6. The key insight: by pre-generating IDs, we can reference foreign keys before the parent rows exist in the database, because everything gets inserted in dependency order within the same transaction.
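To make the flattening step concrete, here is a minimal, self-contained sketch. The input and row shapes are simplified assumptions (resources hang directly off chapters, and the field names are illustrative), and the ID generator is injected so the output is deterministic; in the real code it would be () => crypto.randomUUID().

```typescript
// Simplified, hypothetical input shapes; the real pathway tree has more levels.
interface ResourceInput { title: string; url: string }
interface ChapterInput { title: string; resources: ResourceInput[] }
interface ModuleInput { title: string; chapters: ChapterInput[] }

interface ModuleRow { id: string; pathwayId: string; title: string }
interface ChapterRow { id: string; moduleId: string; title: string }
interface ResourceRow { id: string; chapterId: string; title: string; url: string }

interface FlatRows {
  moduleRows: ModuleRow[];
  chapterRows: ChapterRow[];
  resourceRows: ResourceRow[];
}

// Walks the tree once and assigns every ID in application code, so child rows
// can point at parents that have not been inserted yet.
function flattenPathway(
  pathwayId: string,
  modules: ModuleInput[],
  newId: () => string, // e.g. () => crypto.randomUUID() in production
): FlatRows {
  const rows: FlatRows = { moduleRows: [], chapterRows: [], resourceRows: [] };

  for (const mod of modules) {
    const moduleId = newId();
    rows.moduleRows.push({ id: moduleId, pathwayId, title: mod.title });

    for (const chapter of mod.chapters) {
      const chapterId = newId();
      rows.chapterRows.push({ id: chapterId, moduleId, title: chapter.title });

      for (const resource of chapter.resources) {
        rows.resourceRows.push({
          id: newId(),
          chapterId,
          title: resource.title,
          url: resource.url,
        });
      }
    }
  }
  return rows;
}

// Usage with a counter-based ID generator so the result is easy to inspect.
let idCounter = 0;
const demoRows = flattenPathway(
  "pathway-1",
  [
    {
      title: "HTTP basics",
      chapters: [
        { title: "Requests", resources: [{ title: "MDN", url: "https://developer.mozilla.org" }] },
        { title: "Responses", resources: [] },
      ],
    },
  ],
  () => `id-${++idCounter}`,
);
console.log(demoRows.moduleRows.length, demoRows.chapterRows.length, demoRows.resourceRows.length);
```

Each array then maps to one createMany() call, issued parents first so every foreign key points at a row inserted earlier in the same transaction.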

We also added explicit transaction timeouts (maxWait: 10s, timeout: 30s), replaced all new PrismaClient() instances with a shared singleton, and moved the expensive deep-read query (7-table nested include) outside the write transaction so it does not hold locks.
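The shared-singleton idea can be sketched without a database. The helper name getSingleton and the FakeClient stand-in below are hypothetical, used only so the example runs on its own; the trailing comment shows how the real PrismaClient and the transaction limits mentioned above would slot in.

```typescript
// Generic process-wide singleton cache. The instance is parked on globalThis
// so that module re-evaluation (e.g. hot reload in development) reuses it.
const registry = globalThis as unknown as { __singletons?: Map<string, unknown> };

function getSingleton<T>(key: string, factory: () => T): T {
  const store = (registry.__singletons ??= new Map<string, unknown>());
  if (!store.has(key)) store.set(key, factory());
  return store.get(key) as T;
}

// Stand-in for PrismaClient so the sketch runs without a database.
class FakeClient {
  static constructed = 0;
  constructor() {
    FakeClient.constructed += 1;
  }
}

const clientA = getSingleton("prisma", () => new FakeClient());
const clientB = getSingleton("prisma", () => new FakeClient());
console.log(clientA === clientB, FakeClient.constructed); // prints: true 1

// With the real client, a shared module would export
//   getSingleton("prisma", () => new PrismaClient())
// and the write path would pass explicit limits (in milliseconds):
//   await prisma.$transaction(async (tx) => { /* createMany calls */ },
//     { maxWait: 10_000, timeout: 30_000 });
```

Route handlers then import the shared instance and never call $disconnect() per request, so the pool persists across requests.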

Orphan Cleanup

The Prisma schema uses relationMode = "prisma" (emulated cascades, not database-level foreign keys). This means deleting a module cascades to chapters and resource groups, but ResourceSection, Resource, and chapter-level Book records are not cascade-deleted. They become orphans.

We added an explicit cleanup function that walks the tree before re-inserting:

const cleanupExistingTree = async (tx, pathwayId) => {
  // findMany returns row objects, so select the id and map down to plain arrays
  const modules = await tx.lPModule.findMany({
    where: { pathwayId },
    select: { id: true }
  });
  const moduleIds = modules.map((m) => m.id);

  const chapters = await tx.lPChapter.findMany({
    where: { moduleId: { in: moduleIds } },
    select: { id: true }
  });
  const chapterIds = chapters.map((c) => c.id);

  const rgs = await tx.lPResourceGroup.findMany({
    where: { chapterId: { in: chapterIds } }
  });
  const rgIds = rgs.map((rg) => rg.id);
  // ... collect sectionIds from the resource groups

  // Delete orphan-prone records explicitly, children before parents
  await tx.resource.deleteMany({ where: { sectionId: { in: sectionIds } } });
  await tx.resourceSection.deleteMany({ where: { id: { in: sectionIds } } });
  await tx.book.deleteMany({ where: { lpResourceGroupId: { in: rgIds } } });

  // Now Prisma-emulated cascade handles the rest
  await tx.lPModule.deleteMany({ where: { pathwayId } });
};

Phase 2: The Real Bottleneck Was Silence

After the database optimizations, the bulk inserts were fast. But users were still timing out. The problem had shifted.

Diagnosis

The external AI generator takes 30 to 90 seconds to produce a pathway. During that time, the server held an open HTTP connection but sent zero bytes. Gateway proxies, CDNs, and load balancers have idle connection timeouts (typically 30-60 seconds). They would kill the connection before any response arrived.

The previous retry configuration made this worse: 280 seconds per attempt and up to 3 attempts. Worst case: 3 × 280 = 840 seconds of total budget, far exceeding the 300-second Vercel function limit.

The Fix: NDJSON Streaming with Heartbeats

We switched the single-pathway mode from a standard JSON response to an NDJSON (newline-delimited JSON) stream. The server sends bytes immediately and heartbeats every 15 seconds while the generator works:

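import type { NextApiResponse } from "next";

function initStream(res: NextApiResponse) {
  res.writeHead(200, {
    "Content-Type": "application/x-ndjson",
    "Cache-Control": "no-cache, no-transform",
    "X-Accel-Buffering": "no" // ask nginx-style proxies not to buffer the stream
  });

  // Write one JSON object per line; no-op once the response has ended
  const send = (data: Record<string, unknown>) => {
    if (!res.writableEnded) res.write(JSON.stringify(data) + "\n");
  };

  const heartbeat = setInterval(() => send({ status: "heartbeat" }), 15_000);

  // Callers must invoke cleanup() when the work finishes, or the timer keeps firing
  return { send, cleanup: () => clearInterval(heartbeat) };
}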
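How a handler might wrap the slow generator call with this pattern can be sketched as follows. The helper runWithHeartbeat is hypothetical, and the demo swaps the real response object for an in-memory array and the 15-second interval for 20 ms so it runs quickly. The point it illustrates is that the timer is cleared in a finally block, whether the work succeeds or throws.

```typescript
// Event shapes mirror the NDJSON examples in this post.
type StreamEvent =
  | { status: "generating" | "saving"; message: string }
  | { status: "heartbeat" }
  | { status: "complete"; pathway: { id: string; slug: string } };

// Hypothetical helper: emits a heartbeat every `intervalMs` while `work` runs,
// and clears the timer whether `work` resolves or rejects.
async function runWithHeartbeat<T>(
  send: (event: StreamEvent) => void,
  work: () => Promise<T>,
  intervalMs: number,
): Promise<T> {
  const timer = setInterval(() => send({ status: "heartbeat" }), intervalMs);
  try {
    return await work();
  } finally {
    clearInterval(timer);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Demo with a fake 60 ms "generator" and a 20 ms heartbeat instead of 15 s.
async function demoStream(): Promise<StreamEvent[]> {
  const sent: StreamEvent[] = [];
  const send = (event: StreamEvent): void => {
    sent.push(event);
  };

  send({ status: "generating", message: "Generating pathway..." });
  const pathway = await runWithHeartbeat(
    send,
    async () => {
      await sleep(60); // stand-in for the slow generator call
      return { id: "abc", slug: "backend-engg-pathway" };
    },
    20,
  );
  send({ status: "complete", pathway });
  return sent;
}

demoStream().then((events) => console.log(events.map((e) => e.status)));
```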

The client reads the stream progressively:

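if (!res.body) throw new Error("Response has no body");
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let pathwaySlugOrId: string | undefined;

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Append the decoded chunk; the last piece may be an incomplete line
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() || "";

  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines instead of failing in JSON.parse
    const event = JSON.parse(line);
    if (event.status === "complete") {
      pathwaySlugOrId = event.pathway?.slug;
    }
  }
}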
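The buffering logic is easiest to check in isolation. The sketch below pulls it into a small pure helper (createNdjsonParser is a hypothetical name, not part of the Upsoma codebase) and feeds it chunks that split a JSON line in the middle, which is what a network stream can do.

```typescript
interface NdjsonParser<T> {
  push(chunk: string): T[]; // returns the events completed by this chunk
  flush(): T[]; // returns whatever is left once the stream ends
}

function createNdjsonParser<T>(): NdjsonParser<T> {
  let buffer = "";

  const parseLines = (lines: string[]): T[] =>
    lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0) // tolerate blank lines
      .map((line) => JSON.parse(line) as T);

  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
      return parseLines(lines);
    },
    flush() {
      const rest = buffer;
      buffer = "";
      return parseLines([rest]);
    },
  };
}

type PathwayEvent = { status: string; pathway?: { id: string; slug: string } };

// Chunk boundaries deliberately fall inside JSON objects.
const ndjsonParser = createNdjsonParser<PathwayEvent>();
const parsedEvents: PathwayEvent[] = [
  ...ndjsonParser.push('{"status":"heart'),
  ...ndjsonParser.push('beat"}\n{"status":"complete","pathway":{"id":"abc",'),
  ...ndjsonParser.push('"slug":"backend-engg-pathway"}}\n'),
  ...ndjsonParser.flush(),
];
console.log(parsedEvents.map((e) => e.status)); // [ 'heartbeat', 'complete' ]
```

Keeping the parser separate from the reader loop also makes it straightforward to unit-test without a live stream.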

Events flow like this:

{"status":"generating","message":"Generating pathway..."}
{"status":"heartbeat"}
{"status":"heartbeat"}
{"status":"saving","message":"Saving pathway..."}
{"status":"complete","pathway":{"id":"abc","slug":"backend-engg-pathway"}}

We also introduced upsertLpPathwayLite, a write-only variant that skips the expensive deep-read after saving. The streaming client only needs { id, slug } to navigate.

We tightened the retry budget: 90 seconds per attempt, 2 attempts maximum. Worst case: 90s + 1s backoff + 90s = 181 seconds, well within the 300-second function limit.
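The budget arithmetic is simple enough to encode as a check. This is an illustrative sketch: the function name is made up, and the old configuration is computed with zero backoff to match the 840-second figure quoted earlier.

```typescript
// Worst-case wall-clock time if every attempt runs to its full timeout.
function worstCaseSeconds(perAttemptSec: number, attempts: number, backoffSec: number): number {
  return attempts * perAttemptSec + (attempts - 1) * backoffSec;
}

const FUNCTION_LIMIT_SEC = 300; // function limit quoted in this post

const oldBudgetSec = worstCaseSeconds(280, 3, 0); // 3 x 280
const newBudgetSec = worstCaseSeconds(90, 2, 1); // 90 + 1 + 90

console.log(oldBudgetSec, oldBudgetSec <= FUNCTION_LIMIT_SEC); // 840 false
console.log(newBudgetSec, newBudgetSec <= FUNCTION_LIMIT_SEC); // 181 true
```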

Phase 3: The Generator Response Was Wrapped

After deploying the streaming fix, a new error appeared: "Invalid pathway data received from generator (missing id)."

The static JSON files have the pathway data flat at the top level: { id: "backend-engg-pathway", title: "...", modules: [...] }. But the external AI generator wraps its response in a container: { pathway: { id: "...", ... } }.

We added an unwrapping function that handles both formats:

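// Assumed definition; the original declaration of AnyRecord is not shown here
type AnyRecord = Record<string, any>;

function unwrapGeneratorResponse(raw: unknown): AnyRecord {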
  if (!raw || typeof raw !== "object") return {};
  const obj = raw as AnyRecord;

  if (obj.id) return obj;

  for (const key of ["pathway", "data", "result"]) {
    const inner = obj[key];
    if (inner && typeof inner === "object" && inner.id) {
      return inner;
    }
  }

  console.error(
    "[save-pathway] Unrecognised response. Top-level keys:",
    Object.keys(obj)
  );
  return obj;
}

The diagnostic log at the end ensures that if the generator changes its response format in the future, we get a clear indication of what keys are present.

Phase 4: Making It Lightning Fast

With timeouts solved, we turned to speed. The generation flow still took 30-90 seconds for the AI call, plus roughly 2 seconds of sequential post-generation round trips:

1. Verify the pathway exists (GET /api/get-pathway): ~500ms
2. Open a usage session (POST /api/pathways/{slug}/open): ~500ms
3. Fetch saved snapshot (GET /api/pathways/{slug}/saved): ~500ms
4. Navigate to the stream page

For custom input that requires the AI generator, that 30-90 seconds is unavoidable. But for the 9 recommended topics displayed on the homepage? They already have pre-built JSON data files. Calling the AI generator for them was unnecessary.

The Two-Speed Architecture

Speed 1: Instant Path for Recommended Topics

We created a single endpoint that collapses the entire chain into one round trip:

POST /api/pathways/{slug}/quick-start
Body: { userId: "..." }
Response: { pathwaySlug: "...", usageId: "...", created: true }

Internally, quick-start does three things:

1. Checks if the pathway exists in the database. If it does, skip to step 3.
2. If missing, and the slug is a known static pathway, reads the JSON file from the data/ directory and inserts it via bulk createMany() (~200ms).
3. Creates or reuses a PathwayUsage record with an embedded snapshot, then returns the slug and usage ID.
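The three steps can be sketched with the storage calls injected, so the control flow is visible without Prisma. Everything below is a hedged illustration: the QuickStartDeps interface and InMemoryStore are hypothetical, throwing on an unknown slug is an assumption, and so is the reading of created as "a new usage record was created".

```typescript
interface QuickStartDeps {
  pathwayExists(slug: string): Promise<boolean>;
  seedFromStaticFile(slug: string): Promise<void>; // read the JSON file, bulk insert
  findUsage(userId: string, slug: string): Promise<string | null>;
  createUsage(userId: string, slug: string): Promise<string>;
}

interface QuickStartResult {
  pathwaySlug: string;
  usageId: string;
  created: boolean; // assumption: true when a new usage record was created
}

async function quickStart(
  slug: string,
  userId: string,
  knownSlugs: ReadonlySet<string>,
  deps: QuickStartDeps,
): Promise<QuickStartResult> {
  // Step 1: is the pathway already in the database?
  if (!(await deps.pathwayExists(slug))) {
    // Step 2: seed it, but only for known static slugs (assumed error otherwise)
    if (!knownSlugs.has(slug)) throw new Error(`Unknown pathway: ${slug}`);
    await deps.seedFromStaticFile(slug);
  }

  // Step 3: reuse an existing usage record or create one
  const existing = await deps.findUsage(userId, slug);
  if (existing) return { pathwaySlug: slug, usageId: existing, created: false };
  const usageId = await deps.createUsage(userId, slug);
  return { pathwaySlug: slug, usageId, created: true };
}

// In-memory stand-in for the database, for demonstration only.
class InMemoryStore implements QuickStartDeps {
  seedCalls = 0;
  private pathways = new Set<string>();
  private usages = new Map<string, string>();

  async pathwayExists(slug: string): Promise<boolean> {
    return this.pathways.has(slug);
  }
  async seedFromStaticFile(slug: string): Promise<void> {
    this.seedCalls += 1;
    this.pathways.add(slug);
  }
  async findUsage(userId: string, slug: string): Promise<string | null> {
    return this.usages.get(`${userId}:${slug}`) ?? null;
  }
  async createUsage(userId: string, slug: string): Promise<string> {
    const id = `usage-${this.usages.size + 1}`;
    this.usages.set(`${userId}:${slug}`, id);
    return id;
  }
}
```

Calling quickStart twice for the same user and slug seeds once and returns the same usage ID both times, which is the property the prefetch described later relies on.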

The 9 known static slugs are defined in a shared constants file used by both the quick-start endpoint and the static seeding endpoint:

export const KNOWN_STATIC_SLUGS = new Set([
  "backend-engg-pathway",
  "cybersecurity-pathway",
  "data-analysis-pathway",
  "ds-ml-interview-prep",
  "ds-ml-pathway",
  "frntend-eng-pathway",
  "fullstack-eng-pathway",
  "swe-interview-prep",
  "sys-eng-devops-pathway"
]);

On the client side, the composer detects when a recommended topic is selected and takes the fast path:

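if (activeTopicId && KNOWN_TOPIC_IDS.has(activeTopicId)) {
  const quickRes = await fetch(`/api/pathways/${activeTopicId}/quick-start`, {
    method: "POST",
    // Declare the body as JSON so the API route parses it into an object
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId })
  });
  if (!quickRes.ok) throw new Error(`quick-start failed: ${quickRes.status}`);
  const { pathwaySlug, usageId } = await quickRes.json();
  // Navigate immediately
  setPendingPathwayId(pathwaySlug);
  setPendingUsageId(usageId);
  return;
}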

Result: 30-90 seconds drops to under 500 milliseconds. The user clicks a recommended topic, and they are in their pathway almost instantly.

Speculative Prefetch

We went a step further. When a user clicks a recommended topic (before they even click "Generate"), we fire a background request to the quick-start endpoint:

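const handleTopicSelect = (topic) => {
  setInputText(topic.value);
  setActiveTopicId(topic.id);

  // Start prefetching in the background
  if (KNOWN_TOPIC_IDS.has(topic.id) && userId) {
    fetch(`/api/pathways/${topic.id}/quick-start`, {
      method: "POST",
      // Declare the body as JSON so the API route parses it into an object
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userId })
    }).catch(() => {}); // fire-and-forget: errors are intentionally ignored
  }
};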

In the common case, this means the pathway is seeded and the usage record exists before the user even clicks Generate. When they do click, the quick-start call finds an existing usage and returns immediately. The prefetch is fire-and-forget: a failure is silently swallowed, and the click-time quick-start call simply does the work itself.

Speed 2: Optimized Path for Custom Input

For custom pathways that do require the AI generator, we made two targeted improvements:

Removed the verification step. After the NDJSON stream returns a complete event with the pathway slug, we no longer make a separate GET request to verify the pathway exists. We just saved it; we know it exists.

Server-side request deduplication. If two users submit identical payloads at the same time (e.g., both type "Backend Development") and both requests land on the same server instance, the server makes only one call to the external generator:

const inflight = new Map<string, Promise<any>>();

async function callGenerator(payload, client) {
  const key = JSON.stringify({ p: payload, b: client.defaults.baseURL });

  const existing = inflight.get(key);
  if (existing) return existing;

  const promise = callGeneratorRaw(payload, client).finally(() => {
    inflight.delete(key);
  });
  inflight.set(key, promise);
  return promise;
}

The second request piggybacks on the first. Both callers await the same promise. When it resolves, both proceed with the same result. The map entry is cleaned up on completion (success or failure), so there is no memory leak.
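The piggybacking behavior can be verified with a fake generator. The sketch below is a generic variant of the same pattern (createDeduper and fakeGenerator are hypothetical names) that counts how many times the underlying work runs when two callers share a key.

```typescript
// Generic in-flight deduplication: concurrent callers with the same key
// share one promise, and the entry is removed once that promise settles.
function createDeduper<T>() {
  const inflight = new Map<string, Promise<T>>();
  return {
    size: (): number => inflight.size,
    run(key: string, work: () => Promise<T>): Promise<T> {
      const existing = inflight.get(key);
      if (existing) return existing;

      const promise = work().finally(() => {
        inflight.delete(key);
      });
      inflight.set(key, promise);
      return promise;
    },
  };
}

interface DedupDemoResult {
  results: string[];
  generatorCalls: number;
  stillInflight: number;
}

async function demoDedup(): Promise<DedupDemoResult> {
  let generatorCalls = 0;
  // Stand-in for the slow external generator.
  const fakeGenerator = async (topic: string): Promise<string> => {
    generatorCalls += 1;
    await new Promise<void>((resolve) => setTimeout(resolve, 20));
    return `pathway for ${topic}`;
  };

  const deduper = createDeduper<string>();
  const key = JSON.stringify({ topic: "Backend Development" });

  // Two "users" submit the same payload at the same time.
  const results = await Promise.all([
    deduper.run(key, () => fakeGenerator("Backend Development")),
    deduper.run(key, () => fakeGenerator("Backend Development")),
  ]);

  return { results, generatorCalls, stillInflight: deduper.size() };
}

demoDedup().then((out) => console.log(out));
```

Because the map lives in process memory, this only deduplicates requests handled by the same running instance.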

Results

Scenario	Before	After
Recommended topic generation	30-90s + 2s post-processing	<500ms total
Custom pathway (AI generator)	30-90s + 2s post-processing	30-90s + 0s post-processing
Database writes per pathway	~400 individual INSERT statements	~6 bulk INSERT statements
Concurrent identical requests	N generator calls	1 generator call (deduplicated)
Connection pool behavior	New pool per request, destroyed after	Shared singleton, persistent
Gateway timeout resilience	Zero bytes sent during generation	Heartbeat every 15s

Lessons Learned

Start with the data layer. The flashiest optimization was the two-speed system, but the foundational fix was the database layer. Bulk inserting with pre-generated IDs is a pattern worth internalizing. When your ORM wants to do N queries, sometimes you need to take control and do 6.

Silence kills connections. HTTP connections that send zero bytes are fragile. Any proxy, CDN, or load balancer in the path can terminate them. Streaming with heartbeats is the robust pattern for long-running operations.

Pre-built data is the fastest data. If you have known, static content, serve it from the database. Do not call an external AI service to regenerate content you already have. This seems obvious in retrospect, but it is easy to build a single code path and route everything through it.

Speculative prefetch is free performance. The user clicked a topic. They are going to click Generate. Start working before they ask. The worst case is a wasted background request. The best case is the user perceives zero latency.

Deduplication is cheap insurance. A module-level Map with a JSON key is trivial to implement. Under load, it prevents redundant calls to expensive external services. The cleanup happens automatically via .finally().

This post documents the engineering work done on the Upsoma pathway generation system across four iterations of diagnosis and optimization.