Conversation

@nhelfman
Contributor

This pull request improves the clarity and completeness of the PerformanceScrollTiming explainer documentation. The main changes provide more precise definitions for key metrics, introduce detailed rules for when scroll timing entries are emitted, and update acknowledgements.

Key documentation improvements:

  • Expanded and clarified attribute definitions in the PerformanceScrollTiming interface, including more precise descriptions for duration, framesExpected, framesProduced, and checkerboardTime. These now specify calculation methods, default behaviors, and implementation guidance (a usage sketch follows the list below).
  • Added a new "Entry Emission Rules" section, which defines when PerformanceScrollTiming entries are created. This includes rules for scroll end detection, direction change segmentation, and input-type-specific entry boundaries.
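For context, consuming these entries would look roughly like the sketch below. The `"scroll"` entry type string and the exact interface shape are assumptions based on the current explainer text, not a finalized API:

```ts
// Sketch only: assumes the proposed PerformanceScrollTiming attributes and a
// "scroll" entry type; neither is finalized in the explainer.
interface PerformanceScrollTiming extends PerformanceEntry {
  framesExpected: number;   // rendering opportunities during the scroll
  framesProduced: number;   // frames actually rendered
  checkerboardTime: number; // ms during which unpainted areas were visible
}

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as PerformanceScrollTiming[]) {
    console.log(
      `scroll: ${entry.duration.toFixed(0)}ms,`,
      `${entry.framesProduced}/${entry.framesExpected} frames,`,
      `${entry.checkerboardTime.toFixed(0)}ms checkerboarding`
    );
  }
});

observer.observe({ type: "scroll", buffered: true });
```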

Acknowledgements update:

  • Added Hoch Hochkeppel to the acknowledgements list for contributions and guidance.

nhelfman changed the title from "Clarifications to some of the definitions used in Scroll Timing API proposal" to "Scroll Timing API proposal - clarifications to some definitions" on Jan 25, 2026
nhelfman marked this pull request as ready for review on January 25, 2026 at 08:47
| Attribute | Type | Description |
|-----------|------|-------------|
| `framesProduced` | unsigned long | Number of frames actually rendered during the scroll |
| `checkerboardTime` | DOMHighResTimeStamp | Total duration (ms) that unpainted areas were visible during scroll |
| `duration` | DOMHighResTimeStamp | Total scroll duration from `startTime` until scrolling stops. A scroll interaction is considered complete when no scroll position changes have occurred for at least 150 milliseconds, or when a scroll end event is explicitly signaled (e.g., `touchend`, `scrollend`). Includes momentum/inertia phases. |
| `framesExpected` | unsigned long | Number of frames that would be rendered at the display's refresh rate during the scroll duration. Implementations SHOULD use the actual display refresh rate when available, and MAY fall back to 60Hz as a default. Calculated as `ceil(duration / vsync_interval)`. |
Contributor

Looking at the definitions for framesExpected and framesProduced, it sounds like if I have a scroll that includes back-to-back identical frames, that would count against my perceived performance?

Some scenarios I'm thinking of (which might not all be possible, or might not all count):

  • An animated scroll going 50px down, over a period of time larger than 50 frames (e.g. 1 second at 60Hz).
    • Or if animated/smooth-scroll is not an option, then doing the same with a clicked-mouse-wheel to scroll down at a constant but very slow speed.
  • A touch-based scroll where I stop for 100ms in the middle of the scroll and then resume moving.
  • A keyboard-based scroll where each scroll is taking less than 100ms to complete, but my keypresses are coming every 140ms. So they get combined together based on the 150ms window, but have 40ms of non-scrolling time in between.

Contributor Author

Great question! You're right that the current design includes frames during stationary periods in framesExpected, which can result in lower smoothness scores for scenarios like the ones you mentioned.

The short answer: Yes, all four scenarios you described would count gap/pause frames against smoothness. This is intentional—the design provides raw measurements of "rendering opportunities vs. actual updates" rather than trying to filter out intentional low-velocity periods.

Addressing your scenarios:

  1. 50px animated scroll over 1 second (60 frames): ~50 frames produced, 83% smoothness

    • For sub-pixel-per-frame animations, implementations SHOULD still count a frame as "produced" if the browser attempted to update scroll position, even if pixel rounding results in identical rendered positions. So this might score closer to 100% in practice.
  2. Touch pause for 100ms mid-scroll: Those ~6 frames during the pause are included in framesExpected

    • Correct, this reflects that the rendering pipeline had opportunities to update, but input velocity was zero
  3. Keyboard with 40ms gaps: Frames during gaps count toward framesExpected

    • Yes, because the 150ms threshold combines these into one entry, treating it as continuous scrolling intent

Why this design?

The alternative—excluding "stationary" frames from framesExpected—would require defining velocity thresholds and would make it harder to distinguish "couldn't render" from "no motion." The current approach gives developers the raw data to make their own interpretations based on velocity and duration (a rough sketch follows the list below):

  • High velocity + low smoothness = performance issue
  • Low velocity + low smoothness = likely intentional
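To make that concrete, here's a rough sketch of what that interpretation could look like on the consumer side. The smoothness and velocity thresholds are purely illustrative, and velocity is something the page would derive itself (for example from scroll offsets it tracks), since a distance attribute isn't part of the table above:

```ts
// Sketch only: illustrative thresholds; framesExpected/framesProduced are the
// proposed attributes, while velocity is supplied by the page itself.
interface ScrollTimingLike {
  duration: number;        // ms
  framesExpected: number;  // rendering opportunities during the scroll
  framesProduced: number;  // frames actually rendered
}

function classifyScroll(entry: ScrollTimingLike, avgVelocityPxPerSec: number): string {
  const smoothness =
    entry.framesExpected > 0 ? entry.framesProduced / entry.framesExpected : 1;

  if (smoothness >= 0.9) return "smooth";
  // Low smoothness while moving fast points at a rendering problem;
  // low smoothness while barely moving is likely an intentional pause or slow input.
  return avgVelocityPxPerSec > 100 ? "performance issue" : "likely intentional";
}
```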

I've added a clarification to the explainer to address this.

Does this design make sense for your use case, or do you think we should reconsider the frame counting approach?

Member

My understanding is that during stationary periods, when no frame is expected, the Chromium compositor is not called. In fact, multiple optimizations are in place to avoid this extra cost. Did you have an opportunity to test stationary periods with your prototype implementation?

Contributor

The explanation makes sense and thanks for the write-up. I'm not familiar with what developer expectations would be in this respect, but this certainly sounds reasonable enough to get in the explainer and receive feedback.

I'll leave this comment unresolved for now for the sake of @ogerchikov's question

| `"touch"` | One entry per continuous gesture (`touchstart` → `touchend`), split on direction changes |
| `"wheel"` | One entry per scroll interaction; consecutive wheel events are combined into a single entry if they occur within 150ms of each other |
| `"keyboard"` | One entry per key repeat sequence (from `keydown` until key release + 150ms inactivity) |
| `"programmatic"` | One entry per programmatic scroll API call (e.g., `scrollTo()`, `scrollBy()`) |
Contributor

Should programmatic also be merged together based on the 150ms window? That way if I wanted to, for example, map a custom key to a scrollBy() call, it could achieve the same behavior?

Contributor Author

I didn't plan to address that, since in the programmatic case the developer has full control over the experience and can measure on their own.

How important do you think it is to address this use case?

Contributor

The specific cases that spring to mind for me are for gamepad support on various websites or web-powered technologies.

One approach used is detecting use of the alternate joystick and invoking `scrollBy()` on a cadence, effectively mapping what would otherwise be a keydown event (rough sketch below).
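Something along these lines (the axis index, dead zone, and step size here are only illustrative):

```ts
// Sketch: poll the gamepad each frame and turn the stick's vertical axis into
// scrollBy() calls, i.e. programmatic scrolling driven by continuous user input.
// The axis index (right stick on many layouts), dead zone, and step are illustrative.
function pollGamepadScroll(): void {
  const pad = navigator.getGamepads()[0];
  if (pad) {
    const y = pad.axes[3] ?? 0;
    if (Math.abs(y) > 0.2) {       // ignore small stick drift
      window.scrollBy(0, y * 40);  // each of these lands as a "programmatic" scroll
    }
  }
  requestAnimationFrame(pollGamepadScroll);
}

requestAnimationFrame(pollGamepadScroll);
```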

Another (more complicated to explain) approach is part of the overall directional focus model used by many of these UIs, where invoking 'down' (on a keyboard or the gamepad) sends focus to the next visual element lower on the screen. BUT this is only if the next visual element is already partially visible. If the current "focused" element is larger than the viewport, the request to go 'down' is instead turned into a request to scroll down by a distance, and this is repeated until the next focusable element is partially visible.

In cases like these the developer is trying to map user interaction into scrolling, but the scroll source is likely to still end up being "programmatic". On its surface, it feels like these should get treated just like any other scroll source - being grouped together for performance reporting.

Is there a reason to exclude programmatic scrolls from the grouping logic?

| Scroll Source | Entry Boundary |
|--------------|----------------|
| `"touch"` | One entry per continuous gesture (`touchstart` → `touchend`), split on direction changes |
| `"wheel"` | One entry per scroll interaction; consecutive wheel events are combined into a single entry if they occur within 150ms of each other |
Contributor

Potentially getting more detailed than is appropriate at this level, but if I were implementing this I'd like to better understand the 150ms rule. If a wheel event happens at time 0ms and the resulting scrolling continues until 80ms, is the combination rule:

  • another wheel event happening before time 150ms
  • another wheel event happening before time 230ms
  • another wheel event whose first frame happens before time 150ms
  • another wheel event whose first frame happens before time 230ms

(+similar breakdown for the other relevant scroll sources)

Contributor Author

The key principle is that the timeout measures inactivity (no position changes), not time since the last input event. This ensures that momentum scrolling and animation effects are included in the entry. In your example, that means a second wheel event would be combined if it results in a new scroll position change within 150ms of the first one's last position change (i.e. before ~230ms), regardless of when the input event itself fires.

I've added clarification with examples to the Scroll End Detection section to make this explicit for implementers.

Does this clarify the timing model?

Contributor

I think the intention is mostly clear now.

Given my understanding, I'd like to propose that the 150ms be used as the window for detecting inactivity, but not be included in the duration.

As the document stands right now, one could describe an entry as starting at "the first request to change a scroll position". This makes sense to me.

When that entry ends, however, feels inconsistent.

  • For the wheel, keyboard, autoscroll, and touch-that-lingers-for-too-long examples you have below, the entry ends 150ms after the last frame. If I wanted to analyze the results, I would know that for all these cases the last 150ms of the duration had nothing happening - no scroll changes, no new input.
  • The final touch examples, however, end the entry at the same time as the last frame. Not 150ms after it.
  • Similarly, if there is a direction change in one of the other input methods, the entry would end at the same time as the last frame. Not 150ms after it.

What about conceptually defining an entry to go from "the starting request to change a scroll position" until "the scroll in that direction comes to an end"? Then we could say it comes to an end immediately at various events (like scrollend or a change in scroll direction), or retroactively after 150ms of no new scroll positions or scroll requests in the initial direction.


#### Direction Change Segmentation

A new scroll timing entry MUST be emitted when the scroll direction reverses (i.e., `deltaX` or `deltaY` changes sign during the scroll). This means a single scroll gesture can produce multiple entries if the user reverses direction mid-scroll.
Contributor

I'm not familiar with what values get exposed, but does this mean an inertia-based overscroll would result in 2 scroll entries? (This is mostly me being curious)
