How Modern Browsers Render the Web: A Deep Technical Journey

/software-computational-architecture

From HTML bytes to GPU pixels — the complete architecture of a modern browser rendering engine.


Introduction: The Browser as a Game Engine

There is a useful analogy for understanding how browsers work at a deep level: think of a browser as a real-time 3D game engine.

In a game engine:

  • The scene graph defines what objects exist and where they are in the world.
  • The camera defines what portion of the world is visible.
  • The renderer takes the visible scene and converts it into pixels on screen.
  • The compositor combines multiple render layers (HUD, skybox, geometry, post-processing effects) into the final frame.

A browser does the same thing — just with HTML, CSS, and JavaScript instead of meshes and shaders.

Game Engine Concept Browser Equivalent
Scene graph DOM + layout tree
World coordinates CSS pixel space
Camera / viewport Scroll + viewport clipping
Render layers Compositor layers
HUD (fixed overlay) position: fixed elements
Post-processing effects CSS filters, opacity, blend modes
Vertex transform to clip space GPU NDC normalization

With this mental model in place, let's walk through the full pipeline — step by step.


Part I: Global Architecture

The Multi-Process Model

Modern browsers like Chrome, Firefox, and Safari are multi-process applications. They don't run everything in one thread or process — they deliberately split responsibilities for security, stability, and performance.

Chrome's architecture (which Blink-based browsers share) involves:

  • Browser Process — manages the UI shell, tabs, navigation, and inter-process communication.
  • Renderer Processes — one (or more) per site origin. This is where the rendering pipeline lives.
  • GPU Process — manages all GPU communication. Receives draw commands from renderer processes and executes them.
  • Network Process — handles all network requests (HTTP, WebSocket, etc.).
  • Utility Processes — audio, storage, and other services.

This is a sandbox model: a compromised renderer process cannot access the file system or GPU directly. It must go through the GPU process and browser process via IPC (inter-process communication).

  
  
Browser Process
    ├── Tab 1 → Renderer Process (site A)
    ├── Tab 2 → Renderer Process (site B)
    ├── Tab 3 → Renderer Process (site B, same origin → may share)
    └── GPU Process ← all renderers send draw commands here
  
  

Inside the Renderer Process

The renderer process is where the rendering pipeline lives. It is single-threaded on the main thread for most work, with a dedicated compositor thread running in parallel.

  
  
Renderer Process
    ├── Main Thread
    │     ├── HTML Parser
    │     ├── CSS Parser
    │     ├── Script Engine (V8 / SpiderMonkey)
    │     ├── Style Resolution
    │     ├── Layout (Reflow)
    │     ├── Paint (Display List generation)
    │     └── Commit → send to compositor
    └── Compositor Thread
          ├── Property Tree management
          ├── Layer management
          ├── Scroll handling (independent of main thread)
          └── Tile rasterization scheduling
  
  

The critical insight: scrolling runs on the compositor thread. This is why smooth scrolling is possible even when JavaScript is blocking the main thread.


Part II: The Rendering Pipeline in Detail

Step 1 — Parsing: Building the DOM and CSSOM

The browser begins by parsing HTML bytes into a DOM tree (Document Object Model), and CSS into a CSSOM (CSS Object Model).

  
  
HTML bytes
    → Tokenizer
    → Tree builder
    → DOM Tree

CSS bytes
    → CSS Parser
    → CSSOM Tree
  
  

The DOM is a tree of Node objects. The CSSOM is a tree of style rules.

Important: JavaScript can block HTML parsing (unless async or defer is used). CSS blocks rendering (the browser won't paint until CSSOM is ready). This is the source of many performance problems.

Step 2 — Style Resolution: Computed Styles

The style engine traverses the DOM and for each element, determines the computed style — the final resolved values of every CSS property, after cascade, inheritance, and specificity rules are applied.

  
  
DOM Node + CSSOM Rules → Computed Style Struct
  
  

For each element, this produces values like:

  • display: flex
  • position: fixed
  • transform: matrix(1, 0, 0, 1, 0, 0)
  • opacity: 0.8
  • overflow: hidden

This computed style struct is the input to layout.

Step 3 — Layout (Reflow): Building the Layout Tree

Layout (historically called "reflow") computes the geometric position and size of every element in the document.

The Layout Tree is not the same as the DOM tree. Some DOM nodes generate no layout box (display: none). Others generate multiple boxes (::before, ::after pseudo-elements). The layout tree contains only the nodes that participate in layout.

  
  
DOM Tree + Computed Styles → Layout Tree
  
  

Each layout node has:

  • A LayoutRect (x, y, width, height)
  • A formatting context (block, inline, flex, grid, table...)
  • References to children in layout order

Layout is the most expensive operation in the rendering pipeline for large changes. It is inherently sequential and single-threaded. A change to a parent element's width can cascade down and force all children to recompute.

All coordinates produced by layout are in CSS pixels — not physical pixels, not GPU coordinates. This is a critical point we will return to.

The Formatting Context Model

Layout is organized around formatting contexts:

  • Block Formatting Context (BFC) — normal document flow. Block boxes stack vertically.
  • Inline Formatting Context (IFC) — text and inline elements flow horizontally, wrapping at line boxes.
  • Flex Formatting Context — Flexbox algorithm governs children.
  • Grid Formatting Context — Grid algorithm governs children.
  • Table Formatting Context — table layout rules.

Each formatting context is essentially an independent layout algorithm that runs on a subtree of the layout tree.

Step 4 — Paint: Generating the Display List

Paint does not mean pixels on screen. Paint means generating a display list — an ordered sequence of drawing commands.

Think of it like stage directions: "draw this rectangle in blue", "clip to this region", "draw this text at this position".

  
  
Layout Tree → Paint Walk → Display List (drawing commands)
  
  

A display list might look like:

  
  
DrawRect(x=0, y=0, w=1200, h=800, color=#ffffff)
SaveLayer(opacity=0.5)
DrawImage(x=100, y=200, src=logo.png)
RestoreLayer()
DrawText(x=50, y=300, text="Hello", font=16px Arial)
ClipRect(x=0, y=0, w=400, h=600)
DrawRect(x=10, y=10, w=380, h=580, color=#eeeeee)
  
  

The paint phase also determines stacking contexts — the CSS painter's model defines a specific ordering for who draws on top of whom. Elements with position, opacity, transform, z-index, filter, or isolation properties create new stacking contexts.

The display list is ordered by paint order, which is not the same as DOM order in all cases.

Step 5 — Commit: Handing Off to the Compositor

After the main thread finishes layout and paint, it commits the result to the compositor thread. This commit packages:

  • The display list
  • The property trees (transform tree, scroll tree, clip tree, effect tree — more on these shortly)
  • Layer information

From this point on, the compositor thread can operate independently. This is the key to 60fps scrolling: once the page is committed, scrolling is handled entirely on the compositor thread without touching the main thread.


Part III: The Coordinate System Journey

This is one of the most misunderstood parts of browser rendering. There is not one coordinate system — there are four distinct stages, each with its own space.

The Camera Analogy

In a 3D game:

  1. Objects are placed in world space (meters, game units)
  2. The camera transform converts world space to camera/view space
  3. Projection converts to clip space (NDC: -1 to 1)
  4. The rasterizer maps clip space to screen pixels

In a browser:

  1. Layout produces positions in CSS pixel space
  2. Scroll and transforms produce a transform matrix (still CSS pixels)
  3. Rasterization converts to physical device pixels
  4. The GPU converts to normalized device coordinates (NDC)

Stage 1 — CSS Pixel Space (Layout Output)

All layout math happens in CSS pixels. This is a logical unit, not a physical one.

If your device has a Device Pixel Ratio (DPR) of 3:

  • 1 CSS pixel = 3 physical pixels
  • A 100px wide div is actually 300 physical pixels wide on screen

At this stage, everything is still in logical screen space. No normalization, no device pixels.

  
  
Element at CSS position (100px, 200px)
→ stored as (100, 200) in layout coordinates
→ DPR is not applied yet
  
  

This logical abstraction exists so that the same CSS works consistently across devices with different pixel densities. A 16px font looks roughly the same physical size on a 1x monitor and a 3x Retina display.

Stage 2 — Viewport Clipping and Transform Application (Compositor)

The compositor thread:

  1. Takes the property trees
  2. Applies scroll transforms (as matrix translations)
  3. Applies element CSS transforms (scale, rotate, translate, matrix)
  4. Multiplies matrices down the transform tree
  5. Clips to the viewport rectangle

All of this is still in CSS pixel space. The compositor does not think in device pixels. It is doing matrix math in the logical coordinate system.

  
  
element_world_position = scroll_matrix × css_transform_matrix × local_position
viewport_clip = [0, 0, viewport_width_css, viewport_height_css]
visible_region = element_world_position ∩ viewport_clip
  
  

This is where position: fixed elements behave differently: they are detached from the scroll transform. Their transform chain does not include the scroll translation matrix, so they stay fixed relative to the viewport.

  
  
Layer Tree
 ├── Scroll Node (has scroll_offset transform)
 │     └── Normal Content (inherits scroll translation → moves with scroll)
 └── Fixed Node (no scroll transform → stays put)
  
  

Stage 3 — Rasterization: Physical Pixels

Now the compositor hands layers to the rasterizer. This is where the coordinate system finally crosses into the physical world.

  
  
physical_pixel_position = css_position × device_pixel_ratio
  
  

So a CSS position of (100, 200) on a DPR=3 screen becomes device pixel (300, 600).

The rasterizer executes the display list drawing commands at this physical resolution, producing bitmaps — tile-sized chunks of pixel data. Chrome uses a tiling system: the page is divided into tiles (typically 256×256 or 512×512 physical pixels), and tiles are rasterized independently, often in parallel on the GPU.

At this stage:

  • Coordinates are physical pixels (integers)
  • Not yet normalized
  • Not yet in GPU clip space

Stage 4 — GPU Pipeline: Normalized Device Coordinates

Only inside the GPU pipeline does normalization happen. This is handled entirely by GPU vertex shaders and is invisible to browser logic.

The GPU converts from device pixel coordinates to clip space (NDC: Normalized Device Coordinates), which range from -1 to +1 on all axes.

  
  
// OpenGL / WebGL style NDC conversion (in vertex shader)
float x_ndc = (2.0 * x_device / viewport_width) - 1.0;
float y_ndc = 1.0 - (2.0 * y_device / viewport_height);
  
  

This normalization is what the GPU hardware understands. It maps the viewport to a unit cube. Anything outside [-1, 1] is clipped by the GPU hardware automatically.

The browser never deals with NDC coordinates directly. This happens deep inside the GPU pipeline, after the browser has handed off its draw commands.

The Full Journey Summarized

  
  
Layout          → CSS coordinates         (logical, e.g. 100px)
Scroll          → adds translation matrix (still CSS pixels)
CSS transform   → adds matrix             (still CSS pixels)
Compositor      → multiplies matrices     (still CSS pixels)
Viewport clip   → intersect rect          (still CSS pixels)
Rasterizer      → converts to device px  (physical, e.g. 300px @ DPR=3)
GPU vertex      → normalizes to NDC      (-1 to +1)
Screen          → pixels emitted          (physical display pixels)
  
  

The compositor is not a GPU math system. Its job is:

  • Maintain transform trees
  • Apply translation matrices
  • Merge layers
  • Clip to viewport
  • Send draw commands

It does not care about [-1, 1] clip space. It works entirely in screen-space CSS coordinates.


Part IV: The Property Trees — Four Trees Instead of One

This is the deepest architectural insight in modern browser rendering.

Modern Blink (Chrome) and WebKit (Safari) rendering engines do not use a single scene graph. They split responsibilities into four independent property trees so that scrolling, transforms, clipping, and effects can be handled efficiently and independently.

The four trees are:

  1. Transform Tree
  2. Scroll Tree
  3. Clip Tree
  4. Effect Tree

Why Four Trees?

If everything were in one giant tree:

  • A scroll event would force layout invalidation
  • A CSS transform change would affect clip recomputation
  • An opacity change would recompute geometry

By separating concerns:

  • Scroll updates → only scroll tree + transform tree touched
  • Transform updates → only transform tree invalidated
  • Clip changes → only clip tree
  • Opacity changes → only effect tree

This separation is what allows the browser to hit 60fps on complex pages. Each tree can be updated independently, and most updates never touch layout at all.

Tree 1 — The Transform Tree

What it represents

The hierarchy of geometric transforms applied to elements:

  • CSS transform property (translate, scale, rotate, skew, matrix)
  • CSS perspective
  • Scroll offset (encoded as a translation — more on this below)
  • Layer promotion transforms

Each node in the transform tree stores a 4×4 transform matrix.

What it does

For any element, the compositor computes the final world-space transform by multiplying the chain of parent transforms:

  
  
final_transform = root_transform × ... × parent_transform × local_transform
  
  

This is pure matrix chaining — the same operation as a scene graph in a 3D engine.

Why it exists separately

Not every element creates a new transform context. Only elements with CSS transforms, scroll containers, or promoted layers have their own transform node. Most elements simply inherit their parent's transform implicitly.

Separating the transform tree allows the GPU to apply transforms directly to layer textures without re-rasterizing content.

Tree 2 — The Scroll Tree

What it represents

The hierarchy of scroll containers.

Every scrollable element — including the root document — creates a scroll node. Each node stores:

  • scroll_offset — the current (x, y) scroll position
  • scroll_bounds — the scrollable area dimensions
  • A reference to a corresponding transform node (because scroll becomes a transform)

What it does

Instead of mutating the coordinates of every element when you scroll, the engine:

  1. Updates scroll_offset on the scroll node
  2. Converts that offset into a translation transform matrix
  3. Attaches that transform to the transform tree

So scrolling is not a layout operation. It is a transform update. The entire document does not move — a single matrix changes, and the compositor re-composites the frame with the new scroll transform applied.

Why it exists separately

Scroll changes happen at up to 120Hz on modern devices. If scroll triggered layout, every frame would require a full reflow of the document. That's catastrophically expensive.

By making scroll a transform source in a separate tree:

  • Scroll runs entirely on the compositor thread
  • Main thread is never involved for smooth scrolling
  • Nested scroll containers are isolated from each other

Nested Scroll Containers

  
  
Scroll Tree
└── Root Scroll Node (document scroll)
      ├── Normal Content (inherits root scroll)
      └── Inner Scroll Node (e.g., overflow: scroll div)
            └── Inner Content (inherits inner scroll, then root scroll)
  
  

Each nested scroll container has its own node. Their transforms compose — a child scroll container's content moves with both its own scroll and any ancestor scroll.

Tree 3 — The Clip Tree

What it represents

Clipping regions — areas outside which pixels are not drawn.

Examples:

  • overflow: hidden — clips to the element's border box
  • overflow: scroll — clips to the scroll container viewport
  • border-radius with overflow: hidden — clips to rounded shape
  • The viewport itself — the root clip node
  • CSS clip-path — arbitrary shape clipping

Each node defines a clip rectangle (or shape) in the coordinate space of its transform node.

What it does

When compositing, the engine intersects clip regions going down the tree:

  
  
final_clip = grandparent_clip ∩ parent_clip ∩ local_clip
  
  

This gives the final visible region for an element. Pixels outside this region are simply not drawn.

Why separate from the transform tree?

Because clipping is not geometric transformation. An element can change its transform without affecting its clip region. An element's clip can change (due to scroll container resize) without affecting its transform.

Separating the clip tree means:

  • Clip invalidation doesn't trigger transform recomputation
  • Scroll container resize can update clips without layout
  • Complex clip shapes can be managed independently

Tree 4 — The Effect Tree

What it represents

Visual effects and compositing properties:

  • opacity
  • CSS filter (blur, brightness, contrast, drop-shadow...)
  • mix-blend-mode
  • isolation: isolate
  • backdrop-filter
  • Compositing groups

Each node stores the effect properties and whether it requires an offscreen render surface.

What it does

The compositor applies effect stacking rules:

  
  
final_opacity = parent_opacity × local_opacity
  
  

For effects that require isolation (like mix-blend-mode or backdrop-filter), the compositor creates an offscreen render surface — essentially an intermediate texture — renders the subtree into it, then composites that surface into the parent.

Why separate?

Effects can be expensive. Some require offscreen surfaces. None of them should trigger layout changes. By separating the effect tree:

  • Opacity animations never touch layout or the transform tree
  • Filter changes are isolated
  • Compositing groups can be managed independently
  • The renderer can decide whether GPU acceleration is worth it per-node

Part V: How the Four Trees Interact

Per-Element Dependency Struct

Every composited element (a layer or render surface in the compositor) stores references into the four property trees:

  
  
struct CompositorElement {
    TransformNode* transform_node;  // → its place in the transform tree
    ClipNode*      clip_node;       // → its clip ancestor
    EffectNode*    effect_node;     // → its effect ancestor
    // Scroll is encoded via transform_node (scroll → translation matrix)
    
    Quad           local_geometry;  // local bounding rect
};
  
  

This is not literally the struct in Chromium source, but conceptually this is what each element has. The key insight: elements do not walk the tree every frame. They store direct references to their specific nodes. Per-frame, the compositor just follows those references.

Strict Ordering: Transform → Clip → Effect

For each element, the compositor applies property trees in a strict, deterministic order.

1. Transform First

Transforms change geometry. Before anything else, the compositor must know where the element is and what shape it occupies in world space.

  
  
world_quad = final_transform_matrix × local_quad
  
  

Everything else depends on the transformed position.

2. Clip Second

Once the quad is positioned in world space, we apply clipping. The clip regions are expressed in the same coordinate space as the transformed geometry.

  
  
visible_region = world_quad ∩ final_clip
  
  

That's why clipping comes after transform: you can only clip against something that has been positioned.

3. Effect Third

Effects like opacity, filters, and blending often require:

  • Offscreen render surfaces
  • Alpha composition
  • Isolation groups

Only after geometry and clipping are resolved does the compositor determine how to blend the element into its parent surface.

  
  
composite into parent with:
    opacity = final_opacity
    filter  = final_filter
    blend   = blend_mode
  
  

Where Scroll Fits

Scroll is not a separate stage. Scroll is encoded as a translation transform node in the transform tree. So scroll is automatically applied during the transform stage.

  • Normal element: transform chain includes scroll translation → element moves with scroll
  • position: fixed element: transform chain excludes scroll translation → element stays put

This is the precise mechanism behind position: fixed. The element's transform node is attached to the root transform (or the viewport), not to the scroll node. So when scroll offset changes, the fixed element's transform chain is unaffected.

Per-Frame Algorithm (Conceptual)

For each composited element, the compositor runs something like:

  
  
# Build transform chain
matrix = Identity
node = element.transform_node
while node != None:
    matrix = node.local_matrix × matrix
    node = node.parent

# Build clip chain
clip = UniversalRect
node = element.clip_node
while node != None:
    clip = intersect(clip, transform_to_world(node.clip_rect, node.transform))
    node = node.parent

# Build effect chain
opacity = 1.0
node = element.effect_node
while node != None:
    opacity *= node.opacity
    # handle filters, blend modes, offscreen surfaces...
    node = node.parent

# Draw
draw(element.local_geometry, matrix, clip, opacity, ...)
  
  

Highly optimized in practice (cached, incremental, GPU-parallel), but this is the logical model.


Part VI: Layers and Compositing

What is a Compositor Layer?

A compositor layer is a GPU texture — a pre-rasterized bitmap of some part of the page. The compositor keeps multiple such textures and composites them together each frame, much like how a video editor composites multiple video tracks.

The advantage: if an element is on its own layer, and it animates (moves, scales, fades), the compositor just transforms the texture. No re-rasterization is needed. This is how CSS transform and opacity animations can be hardware-accelerated.

Layer Promotion

Not every element gets its own compositor layer. Creating a layer has a cost (GPU memory, upload time). Browsers use heuristics to decide when to promote elements to their own layers:

Automatic promotion triggers:

  • will-change: transform or will-change: opacity
  • CSS transform that is being animated (via CSS animations or Web Animations API)
  • CSS opacity being animated
  • position: fixed (usually, to handle scroll without re-compositing)
  • video, canvas, iframe elements
  • Elements with backdrop-filter
  • Overlapping elements that require correct paint order with a promoted sibling

The will-change property is a hint to the browser: "this element will animate soon, please promote it preemptively." This avoids a frame of jank when the animation starts (the promotion itself costs a frame).

Layer Explosion

A common performance pitfall: over-promoting elements. If every element has its own layer, GPU memory is exhausted and the compositor spends more time compositing than the savings from avoiding re-rasterization.

Tools like Chrome DevTools' Layers panel visualize the layer tree and show why each layer was created.

The Tiling System

Even within a single layer, browsers do not rasterize the entire layer at once. They divide layers into tiles (typically 256×256 or 512×512 physical pixels).

Tiles are rasterized on demand and prioritized by proximity to the viewport. Tiles near the current scroll position are rasterized first; tiles far off-screen may be rasterized at lower resolution or not at all.

This is why, when you scroll very fast on a complex page, you sometimes see checkerboard patterns — tiles that haven't been rasterized yet.


Part VII: The Paint Path to the GPU

From Display List to Draw Calls

Once layers are rasterized into tile textures, the compositor assembles the final frame. For each frame, the GPU process receives a sequence of draw calls — instructions like:

  
  
DrawTexturedQuad(
    texture: layer_texture_id,
    transform: final_transform_matrix,
    clip: final_clip_rect,
    opacity: 0.8,
    blend_mode: normal
)
  
  

The GPU process translates these into actual GPU API calls (OpenGL, Vulkan, Metal, D3D12, depending on platform).

The GPU Pipeline (Simplified)

Inside the GPU, each draw call goes through:

  1. Vertex Shader — transforms vertices from device pixel coordinates to NDC (normalized device coordinates, -1 to +1). This is where the NDC conversion happens.

  2. Primitive Assembly — assembles vertices into triangles (or quads).

  3. Rasterization — determines which screen pixels each triangle covers.

  4. Fragment Shader — for each covered pixel, samples the texture and applies color/alpha.

  5. Framebuffer Output — writes the final pixel color to the framebuffer.

The framebuffer is then presented (swapped) to the display.

VSync and the Frame Loop

Modern displays refresh at a fixed rate — 60Hz, 90Hz, 120Hz, or higher. The GPU process synchronizes frame presentation with the display's VSync signal (vertical synchronization). Presenting a frame outside of VSync causes tearing — a horizontal artifact where two different frames are visible simultaneously.

The browser's BeginFrame signal coordinates the entire rendering pipeline with VSync:

  
  
VSync signal
    → Browser Process sends BeginFrame to Renderer
    → Main thread runs rAF callbacks + style/layout/paint if needed
    → Compositor commits
    → Compositor thread composites
    → GPU Process renders
    → Frame presented to display
  
  

If any step takes too long, the frame is missed and the user sees jank (dropped frames).


Part VIII: The Viewport and Fixed Elements Deep Dive

What is the Viewport?

The viewport is the visible rectangle of the web page — the "camera" in our game engine analogy. It has:

  • A position in document space (determined by scroll offset)
  • A fixed size (the browser window's content area)

In CSS, the viewport defines the vw and vh units. 100vw is always the viewport width, regardless of scroll position.

The viewport is represented in the compositor as the root clip node (clips everything to the visible area) and the root scroll node (encodes the current scroll position as a transform).

The Visual Viewport vs. Layout Viewport

On mobile, browsers distinguish between:

  • Layout Viewport — the virtual canvas on which layout is performed. On mobile, this is typically 980px wide by default, allowing desktop sites to render at desktop widths.
  • Visual Viewport — the actual visible area on screen. On a 375px-wide phone with no zoom, the visual viewport is 375px wide.

Pinch-zoom changes the visual viewport without changing the layout viewport. The page does not reflow — the compositor simply scales the entire frame.

The viewport meta tag (<meta name="viewport" content="width=device-width">) tells the browser to set the layout viewport equal to the device width, eliminating the 980px default.

Fixed Positioning: The Scroll Tree Detachment

position: fixed elements remain fixed relative to the viewport regardless of scroll. The mechanism is precisely the scroll tree detachment described earlier.

  
  
Transform Tree (simplified):

Root Transform Node (identity)
└── Scroll Transform Node (translates by -scroll_offset)
      ├── Normal Elements (inherit scroll translation → move with page)
      └── (fixed elements are NOT here)
Fixed Elements (attached to Root Transform, not Scroll Transform)
  
  

When the user scrolls:

  1. The scroll node's scroll_offset updates
  2. The scroll translation matrix updates: translate(0, -scroll_y)
  3. Normal elements' world positions shift accordingly
  4. Fixed elements' world positions are unchanged (they're not in the scroll subtree)
  5. The compositor re-composites with the new matrices — no layout, no re-rasterization

This is why position: fixed elements are essentially "free" to scroll — they require no layout computation, just a matrix exclusion in the transform tree.

In Other terms:

For a normal element:

  • screen_y = layout_y - scroll_offset

For a fixed element:

  • screen_y = layout_y

Sticky Positioning

position: sticky is more complex. A sticky element behaves like position: relative until its scroll container scrolls past a threshold, then it behaves like position: fixed within that container.

In the compositor, sticky elements have a special sticky position constraint node. Per frame, the compositor computes whether the scroll position has crossed the sticky threshold, and if so, adjusts the element's transform to pin it at the threshold.

This is handled entirely in the compositor — no main thread involvement during scroll.

The position: fixed and transform Gotcha

A notorious browser behavior: if a position: fixed element's ancestor has a CSS transform applied, the fixed element is no longer fixed to the viewport. Instead, it is fixed to the transformed ancestor.

The reason: CSS transform creates a new containing block for fixed descendants. In transform tree terms, the fixed element is re-parented under the transformed ancestor's transform node rather than the root.

This surprises many developers, but it follows directly from the property tree model.


Part IX: Putting It All Together — A Concrete Example

Let's trace a progress bar with scaleX animation through the full pipeline.

  
  
.progress-bar {
    position: fixed;
    top: 0;
    left: 0;
    width: 100%;
    height: 4px;
    transform-origin: left;
    transform: scaleX(0.6);
    background: blue;
}
  
  

Layout Phase (Main Thread)

  • Element is laid out at CSS position (0, 0), width = viewport_width, height = 4px.
  • position: fixed noted: this element's containing block is the viewport.
  • Computed style: transform: scaleX(0.6), transform-origin: left center.

Property Tree Assignment (Main Thread → Compositor Commit)

The element gets references to:

  • Transform node: local matrix = scaleX(0.6) with origin at (0, 2) (left center of 4px bar). Parent: root transform node (NOT the scroll node).
  • Clip node: viewport clip node (clips to viewport bounds).
  • Effect node: default effect node (opacity=1, no filter).
  • Scroll: not in the scroll subtree — this is the position: fixed detachment.

Per-Frame Compositor Work

Every frame:

  
  
// Transform chain
matrix = root_transform (identity) × scaleX(0.6)

// Clip
clip = viewport_rect (e.g., [0, 0, 1200, 800] in CSS pixels)

// Effect
opacity = 1.0

// Geometry
world_quad = matrix × [0, 0, 1200, 4]
           = [0, 0, 720, 4]  (scaled by 0.6)

visible_region = world_quad ∩ clip = [0, 0, 720, 4]
  
  

When the user scrolls: nothing changes for this element. The scroll node updates, but the progress bar is not in the scroll subtree. Zero recomputation.

When the JavaScript updates transform: scaleX(0.9):

  1. Main thread updates the transform node's local matrix.
  2. Compositor thread sees the invalidation.
  3. Next frame: recomputes world_quad with new matrix.
  4. No layout, no re-rasterization (the layer texture is reused, just drawn at a different transform).

This is hardware-accelerated animation: matrix change → compositor recomposite → GPU draw.

Rasterization

The 720×4 CSS pixel visible region is rasterized to:

  • Physical pixels: 720 × DPR × 4 × DPR (e.g., 2160 × 12 at DPR=3)
  • Stored as a tile texture on the GPU

GPU Pipeline

The compositor issues a draw call:

  
  
DrawTexturedQuad(
    texture: progress_bar_texture,
    transform: [scale(0.6) at (0,0)],
    clip: viewport,
    opacity: 1.0
)
  
  

Inside the GPU vertex shader:

  
  
x_ndc = (2.0 * x_device / 3600.0) - 1.0;  // viewport 1200 CSS × DPR 3 = 3600 physical px
y_ndc = 1.0 - (2.0 * y_device / 2400.0);
  
  

The 2160×12 texture quad is rendered into the framebuffer. Frame presented. Done.


Part X: Performance Implications

Understanding this architecture directly explains why certain operations are "free" and others are expensive.

What Triggers What

Operation Layout Paint Composite Cost
Scroll (compositor path) No No Yes Very cheap
transform change (compositor) No No Yes Very cheap
opacity change (compositor) No No Yes Very cheap
color, background change No Yes Yes Cheap
width, height change Yes Yes Yes Expensive
font-size change Yes Yes Yes Expensive
Adding/removing elements Yes Yes Yes Expensive
position change Yes Yes Yes Expensive

The CSS Property Fast Path

CSS transform and opacity are special because they are handled entirely in the compositor via the effect tree and transform tree. Changes to these properties:

  1. Do not trigger layout
  2. Do not trigger paint
  3. Only update a property tree node
  4. Are composited at 60fps on the compositor thread

This is why transform: translateX() is preferred over left: for animations. Both visually move an element horizontally, but left triggers layout on every frame, while transform only updates a matrix.

will-change and Layer Promotion

  
  
.animated-element {
    will-change: transform;
}
  
  

This tells the browser: "promote this element to its own compositor layer now, before animation starts." The cost: GPU memory for the layer texture. The benefit: animation starts without a jank frame from the promotion itself.

Overusing will-change can cause layer explosion — too many layers, too much GPU memory, slower compositing.

Avoiding Layout Thrashing

Layout thrashing occurs when JavaScript reads layout properties (triggering a layout) and then writes layout properties (invalidating layout), repeatedly in a loop:

  
  
// Thrashing — forces layout N times
for (let i = 0; i < elements.length; i++) {
    const width = elements[i].offsetWidth;     // READ → forces layout
    elements[i].style.width = (width * 2) + 'px';  // WRITE → invalidates layout
}

// Fixed — batch reads then writes
const widths = elements.map(el => el.offsetWidth);  // all reads
elements.forEach((el, i) => el.style.width = (widths[i] * 2) + 'px');  // all writes
  
  

The browser coalesces layout invalidations and only runs layout once per frame — but forced synchronous layouts (reading a layout property after a write in the same frame) bypass this optimization.


Conclusion

The modern browser rendering pipeline is a sophisticated, multi-stage system that rivals a real-time 3D game engine in architectural complexity. Let's summarize the key insights:

The pipeline has six major stages:

  1. Parse HTML/CSS → DOM + CSSOM
  2. Style resolution → Computed styles
  3. Layout → CSS pixel geometry
  4. Paint → Display list (drawing commands)
  5. Composite → Layer assembly, property tree traversal
  6. GPU → Rasterize, normalize, present

The coordinate system has four distinct spaces:

  1. CSS pixels (layout output)
  2. CSS pixels + transform matrices (compositor)
  3. Physical device pixels (rasterization)
  4. Normalized Device Coordinates (-1 to +1, GPU only)

The property trees (four, not one) enable performance isolation:

  1. Transform tree — geometric transforms + scroll offsets
  2. Scroll tree — scroll containers and offsets (scroll = transform)
  3. Clip tree — overflow and viewport clipping
  4. Effect tree — opacity, filters, blend modes

The compositor thread is the performance hero:

  • Runs independently of the main thread
  • Handles scrolling without layout involvement
  • Composites layers using GPU textures
  • Applies transforms, clips, and effects per-frame

position: fixed is a scroll tree detachment:

  • Fixed elements are attached to the root transform, not the scroll transform
  • They are composited every frame without any main thread involvement
  • A CSS transform on an ancestor breaks this by creating a new containing block

Understanding these mechanisms transforms how you approach front-end performance. When you know that transform and opacity live in the compositor's property trees — never touching layout — you understand why they are the only two CSS properties that can reliably achieve 60fps animation. When you understand that position: fixed is a tree detachment, you understand both its power and its quirks.

The browser is, in the most technical sense, a game engine for documents. And like any game engine, its performance characteristics flow directly from its architecture.


Written for developers who want to go beyond "avoid layout reflow" and understand the actual machinery underneath.