The Invisible Waterfall in Next.js 15 Suspense Streaming

Streaming sounds like magic. You drop a <Suspense> boundary, add a loading.tsx file, and Next.js promises to flush HTML to the browser as soon as each piece resolves. In practice, many App Router apps that look streamed are actually serialized. A 1.8-second time-to-first-byte on a backend that could deliver the first bytes in 600ms is depressingly common. The culprit is almost never the framework. It is the placement of await calls in the component tree.

This article walks through the diagnostic pattern, three concrete refactors, and a rule of thumb for which one applies. By the end you should be able to look at a slow App Router page and predict, before opening DevTools, whether the bottleneck is a waterfall.

What streaming actually does

Before talking about waterfalls, a quick recap of what Next.js 15 streaming buys you. With the App Router and React Server Components, the framework can return an HTTP response whose body is progressively flushed. Each Suspense boundary corresponds to a chunk: the boundary's fallback is shipped first, and the resolved content is patched in via out-of-order streaming once its promise settles. The mechanics live in React itself, with Next.js exposing them through file conventions like loading.tsx.

One detail bites people. Streaming is opt-in per boundary, not per component. A <Suspense> boundary only helps if there is a promise inside it that the boundary can wait on. If you await above the boundary, the boundary has nothing to do.

The diagnosis: three awaits, one round-trip-shaped trap

Picture a dashboard with this shape:

app/dashboard/layout.tsx awaits the current user
app/dashboard/page.tsx awaits the dashboard summary
<Chart /> inside the page awaits the chart series

Each await runs on the server, and each blocks the response from being flushed. Even with loading.tsx in place at the layout level, the actual HTTP body sits in the framework's writer queue until the layout's user fetch resolves. Then the page fetches the summary. Then the chart fetches its series. That is three round-trips to your API, executed serially on the server, with the real content delivered as one late chunk because nothing useful streamed.

A Chrome DevTools waterfall for this case looks like one fat block ending at the 1.8 second mark, with no progressive rendering. The framework did its job. The data layer was misshapen.

Reproducing the trap

Here is a minimal reproduction. The three fetches are independent: the user, the summary, and the chart series all hit different endpoints and do not depend on each other. The shape below serializes them anyway.

// app/dashboard/layout.tsx
export default async function DashboardLayout({
  children,
}: {
  children: React.ReactNode
}) {
  const user = await fetch('https://api.example.com/me', {
    cache: 'no-store',
  }).then((r) => r.json())

  return (
    <section>
      <header>Welcome, {user.name}</header>
      {children}
    </section>
  )
}

// app/dashboard/page.tsx
import Chart from './chart'

export default async function DashboardPage() {
  const summary = await fetch('https://api.example.com/summary', {
    cache: 'no-store',
  }).then((r) => r.json())

  return (
    <div>
      <p>Active sessions: {summary.activeSessions}</p>
      <Chart />
    </div>
  )
}

// app/dashboard/chart.tsx
export default async function Chart() {
  const series = await fetch('https://api.example.com/chart-series', {
    cache: 'no-store',
  }).then((r) => r.json())

  return <pre>{JSON.stringify(series, null, 2)}</pre>
}

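Three awaits, three sequential network calls on the server, one delivered chunk of real content. On a backend where each request costs roughly 600ms, the total is about 1.8 seconds. A loading.tsx above the dashboard layout shows the spinner for that entire window, then snaps to the full page. One placed beside the layout cannot appear until the user fetch resolves, because the boundary it creates sits inside the layout.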
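The arithmetic behind that 1.8 seconds can be sketched as a small model. This is an illustration only: the function names are hypothetical, and the 600ms costs are the assumed figures from the example, not measurements.

```typescript
// Assumed per-request costs in milliseconds, taken from the example above.
const costsMs = [600, 600, 600] // user, summary, chart series

// Serial awaits: each fetch starts only after the previous one finishes,
// so the finish times accumulate.
function serialFinishTimes(costs: number[]): number[] {
  let elapsed = 0
  return costs.map((cost) => (elapsed += cost))
}

// Concurrent fetches: all start at time zero, so each one finishes
// after its own cost and nothing accumulates.
function concurrentFinishTimes(costs: number[]): number[] {
  return [...costs]
}

const serial = serialFinishTimes(costsMs) // [600, 1200, 1800]
const concurrent = concurrentFinishTimes(costsMs) // [600, 600, 600]

console.log('serial total ms:', Math.max(...serial)) // 1800
console.log('concurrent total ms:', Math.max(...concurrent)) // 600
```

The model ignores rendering and network overhead. Its only point is that serial awaits add their costs, while fetches that start together are bounded by the slowest one.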

Refactor 1: move-await-down

The first refactor moves the await as close to the consumer as possible, then wraps it in <Suspense>. A user pill in the header does not need to block the chart. A chart does not need to block the summary. Each independent leaf becomes its own streamable unit.

// app/dashboard/layout.tsx
import { Suspense } from 'react'
import { UserHeader } from './user-header'

export default function DashboardLayout({
  children,
}: {
  children: React.ReactNode
}) {
  return (
    <section>
      <Suspense fallback={<header>Welcome…</header>}>
        <UserHeader />
      </Suspense>
      {children}
    </section>
  )
}

// app/dashboard/user-header.tsx
export async function UserHeader() {
  const user = await fetch('https://api.example.com/me', {
    cache: 'no-store',
  }).then((r) => r.json())
  return <header>Welcome, {user.name}</header>
}

The layout itself is no longer async. That await moved into UserHeader, which is wrapped in a Suspense boundary. The framework can flush the layout shell and the children slot immediately, then patch the header in once the user fetch resolves. The same pattern applies to the page-level summary fetch and to the chart series.

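This pattern alone is usually a 30 to 50 percent improvement on a three-level await chain, because the layout shell ships before any data layer work begins. In the reproduction above, it takes the 600ms user fetch off the critical path, so the last content arrives at roughly 1.2 seconds instead of 1.8.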

Refactor 2: parallelize-promises

Moving awaits down unblocks the shell, but two fetches inside the same boundary are still serialized if a single component awaits them one after the other. Suppose the page needs both the summary and a list of recent activity, both feeding into the same UI region:

// Before: serial inside one component
export default async function DashboardPage() {
  const summary = await fetch('https://api.example.com/summary').then((r) =>
    r.json(),
  )
  const recent = await fetch('https://api.example.com/recent').then((r) =>
    r.json(),
  )
  return <SummaryAndRecent summary={summary} recent={recent} />
}

That code costs summary_ms + recent_ms. With Promise.all the cost drops to max(summary_ms, recent_ms):

// After: parallel
export default async function DashboardPage() {
  const [summary, recent] = await Promise.all([
    fetch('https://api.example.com/summary').then((r) => r.json()),
    fetch('https://api.example.com/recent').then((r) => r.json()),
  ])
  return <SummaryAndRecent summary={summary} recent={recent} />
}

The rule is simple. If two fetches do not depend on each other's result, Promise.all them. If they do depend on each other (you need the user id to fetch their preferences), keep them sequential and consider whether the second fetch can move into a deeper Suspense boundary instead.
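When a dependent pair and an independent fetch share one component, one option is to keep the dependent chain sequential while running it as a single branch of Promise.all. The sketch below assumes a hypothetical Loaders interface; getUser, getPreferences, and getSummary are illustrative names, not part of any real API.

```typescript
type User = { id: string }
type Preferences = { theme: string }
type Summary = { activeSessions: number }

// Hypothetical data-access interface, used for illustration only.
interface Loaders {
  getUser(): Promise<User>
  getPreferences(userId: string): Promise<Preferences>
  getSummary(): Promise<Summary>
}

async function loadDashboard(api: Loaders) {
  const [summary, account] = await Promise.all([
    // Independent of the user, so it starts immediately.
    api.getSummary(),
    // Dependent chain: preferences need the user id, so these two stay
    // sequential, but only relative to each other.
    (async () => {
      const user = await api.getUser()
      const preferences = await api.getPreferences(user.id)
      return { user, preferences }
    })(),
  ])
  return { summary, user: account.user, preferences: account.preferences }
}
```

Under these assumptions the cost is max(summary_ms, user_ms + preferences_ms) rather than the sum of all three. Wrapping the chain in one Promise.all branch also means a failure in either branch surfaces through a single awaited promise.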

One subtle gotcha with Promise.all: it rejects as soon as any input rejects, so the caller loses the whole batch even though the other requests keep running and may succeed. For independent fetches where one failure should not block the others, prefer Promise.allSettled and handle each entry of the results array explicitly. React's reference for use at https://react.dev/reference/react/use covers how promises interact with Suspense in more depth.
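A minimal sketch of the Promise.allSettled approach follows. The Summary and Recent types and the loadRegion helper are assumptions made for illustration; the loaders are passed in so the example stays independent of any particular endpoint.

```typescript
// Hypothetical response shapes, for illustration only.
type Summary = { activeSessions: number }
type Recent = { items: string[] }

async function loadRegion(
  loadSummary: () => Promise<Summary>,
  loadRecent: () => Promise<Recent>,
) {
  // allSettled does not reject when an input rejects. It resolves with one
  // result record per input, in the same order as the inputs.
  const [summary, recent] = await Promise.allSettled([
    loadSummary(),
    loadRecent(),
  ])

  // Collect failures so the caller can log them or render an error state.
  const errors: unknown[] = []
  if (summary.status === 'rejected') errors.push(summary.reason)
  if (recent.status === 'rejected') errors.push(recent.reason)

  return {
    summary: summary.status === 'fulfilled' ? summary.value : null,
    recent: recent.status === 'fulfilled' ? recent.value : null,
    errors,
  }
}
```

A component using this helper could render the summary even when the recent-activity request fails, and show a small error placeholder for the missing half instead of failing the whole region.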

Refactor 3: split-suspense-boundaries

A third refactor matters when the page has visually distinct regions that resolve at very different speeds. A dashboard with a fast summary card and a slow chart should ship the summary first, not wait for the chart. Two sibling Suspense boundaries with their own fallbacks do exactly that.

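// app/dashboard/page.tsx
import { Suspense } from 'react'
import { Summary } from './summary'
// chart.tsx uses a default export in the reproduction above
import Chart from './chart'
// Hypothetical module: any placeholder components work as fallbacks
import { ChartSkeleton, SummarySkeleton } from './skeletons'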

export default function DashboardPage() {
  return (
    <div>
      <Suspense fallback={<SummarySkeleton />}>
        <Summary />
      </Suspense>
      <Suspense fallback={<ChartSkeleton />}>
        <Chart />
      </Suspense>
    </div>
  )
}

Both Summary and Chart start their fetches as soon as the page renders on the server, because React begins rendering Server Components eagerly and only suspends on the await. The summary chunk flushes when its fetch resolves; the chart chunk flushes independently. With the 600ms requests from the reproduction, both regions appear at roughly 600ms instead of sitting behind a single spinner until 1.8s. If the chart query is slower, say 1.2s, the summary still appears at roughly 600ms and the chart follows at 1.2s.

A sibling loading.tsx at the route level is appropriate when the entire route is slow and the layout shell is the only thing you can ship eagerly. Inline <Suspense> boundaries inside the page are appropriate when sub-regions can be staggered.

When the waterfall is invisible

A waterfall is hard to see in DevTools because the fetches run on the server. The browser's Network panel shows a single document request, and the API calls behind it never appear as separate rows. Three signs you are looking at a hidden waterfall:

A route has a loading.tsx but the spinner is always visible until everything is ready, never partial.
Adding a 1-second delay to one fetch increases TTFB by a full second, when a properly streamed route would flush its shell without waiting for that fetch.
Server-side logs show API calls finishing in strict sequence rather than overlapping.

That first sign is the most useful. A single non-progressive spinner is usually a sign that the await sits above your only Suspense boundary, defeating the streaming pipeline.
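The third sign is the easiest to check mechanically. The sketch below is one way to do it, not a prescribed method: the Span type and the timed and isStrictlySequential helpers are hypothetical names introduced for illustration.

```typescript
type Span = { label: string; startMs: number; endMs: number }

// Wrap any async call and record when it started and when it settled.
async function timed<T>(
  label: string,
  spans: Span[],
  work: () => Promise<T>,
): Promise<T> {
  const startMs = Date.now()
  try {
    return await work()
  } finally {
    // Runs after work() settles, whether it fulfilled or rejected.
    spans.push({ label, startMs, endMs: Date.now() })
  }
}

// True when no span starts before the previous one has ended, which is
// the signature of a waterfall. Only meaningful with two or more spans.
function isStrictlySequential(spans: Span[]): boolean {
  const sorted = [...spans].sort((a, b) => a.startMs - b.startMs)
  return sorted.every(
    (span, i) => i === 0 || span.startMs >= sorted[i - 1].endMs,
  )
}
```

In a Server Component, each fetch would be wrapped as timed('summary', spans, () => fetch(url).then((r) => r.json())), with the spans array created per request and logged once the route finishes. Overlapping spans mean the fetches ran in parallel; strictly sequential spans for independent fetches point at a waterfall.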

The placement rule of thumb

A short heuristic for where each refactor applies:

If the await is in a layout or page wrapper above any Suspense boundary, use move-await-down. Push the fetch into a leaf component and wrap that leaf in <Suspense>.
If two independent fetches sit inside one async function, use parallelize-promises with Promise.all.
If a route has visually independent regions that resolve at different speeds, use split-suspense-boundaries with one <Suspense> per region instead of a single loading.tsx.

Most slow App Router routes need more than one of these. A useful diagnostic order is: check for awaits above your top-most boundary first, then look for sequential awaits inside one async function, then ask whether a single boundary should become several.

A note on loading.tsx versus inline <Suspense>

A common confusion is whether to use the file-based loading.tsx convention or inline <Suspense> components. That file convention is sugar over an implicit Suspense boundary that wraps page.tsx. It is convenient when the entire route shares one loading state, but it puts the boundary at exactly one level: between layout and page. If you need finer granularity, inline <Suspense> is the right tool. They compose: a route can have loading.tsx for the page-level fallback and inline boundaries inside the page for sub-regions. Next.js documentation on streaming at https://nextjs.org/docs/app/building-your-application/routing/loading-ui-and-streaming covers the layering rules, and React's Suspense reference at https://react.dev/reference/react/Suspense is worth a careful read.

What streaming will not fix

Streaming reorders when things appear. It does not make a slow query fast. A 1.2 second chart fetch will still take 1.2 seconds on the wire. What streaming buys you is that the rest of the page is interactive while the chart resolves, which dramatically changes the perceived performance even when the raw numbers look similar in a synthetic benchmark.

If your bottleneck is a single slow upstream call with nothing else to render around it, the refactor you want is on the backend, not in the React tree. Streaming the wait for a single dominant fetch saves perhaps 50ms of HTML parsing time and not much else.

Takeaway

The mental model is worth keeping flat. Each <Suspense> boundary is a chunk in your HTTP response, and each await above a boundary delays the response. Streaming becomes real only when the awaits live inside the boundaries that wrap them, and only when independent fetches at the same level run in parallel. The three refactor patterns above cover the vast majority of "we have streaming but it does not feel like it" cases. Apply them in the diagnostic order, measure with the server-side request log rather than just the browser waterfall, and the 1.8-second case really does become a 600ms case on the same backend.
