Since Async Pan and Zoom (APZ) has landed on Firefox OS, scrolling is now decoupled from the painting of the app we're scrolling. Because of the decoupling, when the CPU is under heavy load, we sometimes checkerboard. What do the past two sentences mean and how do we fix it? Stay with me here as we go down the rabbit hole.
What is Checkerboarding?
Checkerboarding occurs when Gecko is unable to paint the visible portion of a webpage while we're scrolling. Instead of the content, we just paint a solid color. Visually, this means we get something like this:
Why does this happen?
Async Pan and Zoom
Asynchronous Pan and Zoom (APZ) is a major new feature in Firefox OS that improves scrolling, panning, and zooming. Before APZ, scrolling was a synchronous affair. Every time a user touched the screen to scroll, an event would be sent to the browser essentially saying 'scroll by X amount'. Nothing else could happen while the CPU painted the newly scrolled-in region. If the CPU was busy doing something else, you either couldn't scroll until it caught up, or scrolling was janky.
APZ changes that by using a separate Compositor thread. Every time a user touches the screen, an event is fired off to the Compositor thread that says 'scroll by X amount'. The Compositor thread is then free to scroll by X amount without being blocked by other work. For this to work smoothly, the graphics system has to overpaint, that is, paint more than what the user is currently seeing. The idea is that the user can scroll around inside a displayport that is larger than the visible area (the viewport). When the user scrolls near the edge of the displayport, we repaint a new displayport around the current position so the user can keep scrolling. The painting of the displayport occurs on the main thread, but the scrolling occurs on the Compositor thread. Thus, scrolling can stay smooth even while the main thread is busy painting. It's all kind of difficult to explain in text, so let's check out this video:
What we see is how the graphics subsystem works when we're scrolling. The initial red box is the current viewport, or what the user is seeing. The darker brown box is the whole webpage. As we scroll, we see a yellow box, which is the requested displayport: a request to the graphics system to render and paint that portion of the webpage. Finally, the green box shows what we've actually painted. As we scroll, we're constantly requesting new displayports (yellow boxes), and the green box flashes as we paint in each new displayport. The displayports should always follow the viewport, and in theory the viewport always fits inside the green box. When it doesn't, we get sadface, or checkerboarding. This occurs around frame ~440 (the number in the top left), where the top of the red box is just slightly outside the green box. Not quite the best visual appearance ever, is it? Now we know what checkerboarding is and why it happens. How do we fix it?
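To make the boxes in the video concrete, here is a minimal Python sketch of the viewport/displayport relationship, assuming vertical scrolling only. The `Rect` type, the `margin` and `multiplier` parameters, and the "centered, three viewports tall" heuristic are all hypothetical illustrations, not Gecko's actual displayport logic. The last function measures how much of the viewport (red) falls outside the painted area (green), which is exactly the checkerboarded region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    @property
    def bottom(self) -> float:
        return self.y + self.h


def request_displayport(viewport: Rect, page_height: float, multiplier: float = 3.0) -> Rect:
    """Hypothetical heuristic: a displayport `multiplier` times taller than the
    viewport, centered on it and clamped to the page."""
    height = min(viewport.h * multiplier, page_height)
    top = viewport.y - (height - viewport.h) / 2
    top = max(0.0, min(top, page_height - height))
    return Rect(viewport.x, top, viewport.w, height)


def needs_new_displayport(viewport: Rect, displayport: Rect,
                          page_height: float, margin: float) -> bool:
    """True when the viewport is within `margin` px of a displayport edge
    that could still be extended (i.e. not already at the page edge)."""
    near_top = displayport.y > 0 and viewport.y - displayport.y < margin
    near_bottom = (displayport.bottom < page_height
                   and displayport.bottom - viewport.bottom < margin)
    return near_top or near_bottom


def checkerboard_area(viewport: Rect, painted: Rect) -> float:
    """Pixels of the viewport not covered by painted content (the 'sadface' area)."""
    overlap_w = max(0.0, min(viewport.x + viewport.w, painted.x + painted.w)
                    - max(viewport.x, painted.x))
    overlap_h = max(0.0, min(viewport.bottom, painted.bottom)
                    - max(viewport.y, painted.y))
    return viewport.w * viewport.h - overlap_w * overlap_h
```

For example, a 320x460 viewport scrolled to y=100 over a painted area that starts at y=104 leaves a 320x4 strip (1280 pixels) checkerboarded, much like frame ~440 in the video.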
The Graphics Pipeline
Figuring out how to fix something requires an understanding of what's actually happening. Your HTML page goes through the following steps:
1. The HTML is parsed into a Content Tree, which is mostly a 1:1 representation of your document.
2. The Content Tree is translated into rectangles, creating the Frame Tree.
3. Each node in the Frame Tree is assigned a Layer, creating a Layer Tree.
4. The Layer Tree is painted, and the content is sent to the Compositor on another thread.
5. The Compositor renders each layer to create the single image you see on the screen.
Steps 1 and 2 occur the first time the page is loaded. Steps 3-5 occur at every single frame. For smooth scrolling we paint 60 frames per second, which means we might have to complete steps 3-5 consistently within 16.6 ms. When we checkerboard, it means we're not finishing steps 3-5 within that 16.6 ms budget. Since steps 1 and 2 occur only once, if we want smooth scrolling, we have to optimize steps 3-5. As with compilers, optimizing early in the pipeline usually produces a much bigger win than optimizing at the end. For example, the Compositor's job is to go through each layer and draw pixels. If we can optimize a layer away, the Compositor doesn't have to do any work for it. The main way to reduce checkerboarding and get smooth scrolling is to optimize the layer tree. If we optimize a layer in the layer tree, we don't have to paint anything for it on the main thread, we send less data over IPC to the Compositor thread, and the Compositor has less work to do. Now, what is a Layer? What's a Layer Tree? And what is ???? before we Profit!
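The 16.6 ms figure is just 1000 ms divided by 60 frames. Here is a small Python sketch of that budget under the simplified model in the paragraph above, where steps 3-5 must all fit in one frame. The function names are hypothetical, and the per-step timings are made-up numbers for illustration, not measurements.

```python
TARGET_FPS = 60
FRAME_BUDGET_MS = 1000 / TARGET_FPS  # about 16.67 ms per frame


def frame_cost_ms(paint_ms: float, ipc_ms: float, composite_ms: float) -> float:
    """Simplified model: steps 3-5 (paint, send to the Compositor, composite)
    all have to finish within one frame."""
    return paint_ms + ipc_ms + composite_ms


def missed_frames(costs_ms: list[float]) -> int:
    """Count frames whose cost exceeded the budget (candidates for checkerboarding)."""
    return sum(1 for cost in costs_ms if cost > FRAME_BUDGET_MS)


# Hypothetical per-frame timings, for illustration only.
frames = [frame_cost_ms(6.0, 2.0, 5.0), frame_cost_ms(12.0, 3.0, 4.0)]
print(f"budget {FRAME_BUDGET_MS:.2f} ms, missed {missed_frames(frames)} of {len(frames)}")
# prints: budget 16.67 ms, missed 1 of 2
```

Removing a layer shrinks all three terms at once, which is why the layer tree is the lever worth pulling.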
Layerize All the Things!
Every node in the Frame Tree is assigned to a Layer. There are different layers for different types of content, such as images, videos, scrollable areas, etc. All the Layers together make up the Layer Tree. The idea of a layer comes from painting by hand, where oil painters used layers for different elements of their paintings. The decision about which layer content is assigned to is made in nsFrame::BuildDisplayList. At each frame, Gecko decides which parts of the Layer Tree to invalidate and assigns those elements to new layers. What does a Layer Tree look like?
We see that there are a ton of different types of Layers, but there are a few that stand out.
- The RefLayer - This is the layer that contains the content app, or child process. Everything below it belongs to the content process. Everything above it belongs to the parent process.
- ContainerLayers - These layers just contain other layers. Sometimes these are Scrollable layers, as identified by a ScrollID.
- ThebesLayers - Layers that have an allocated buffer that we can draw to. We mostly want to optimize these!
- ColorLayers - Blank layers made of a single color (usually specified with the CSS property background-color).
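To make the layer types above concrete, here is a hypothetical Python model of a Layer Tree. It is only a teaching aid: Gecko's real layers are C++ classes with far more state, and the `Layer` class and helper names below are made up. It walks the tree the way a text dump is indented and collects the ThebesLayers that live under the RefLayer, i.e. the ones owned by the content app that we mostly want to optimize.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Layer:
    kind: str  # e.g. "ContainerLayer", "ThebesLayer", "ColorLayer", "RefLayer"
    children: list[Layer] = field(default_factory=list)


def walk(layer: Layer, depth: int = 0) -> Iterator[tuple[int, Layer]]:
    """Depth-first traversal yielding (depth, layer), like an indented text dump."""
    yield depth, layer
    for child in layer.children:
        yield from walk(child, depth + 1)


def content_thebes_layers(layer: Layer, in_content: bool = False) -> list[Layer]:
    """ThebesLayers below a RefLayer belong to the content app (child process)."""
    in_content = in_content or layer.kind == "RefLayer"
    found = [layer] if in_content and layer.kind == "ThebesLayer" else []
    for child in layer.children:
        found.extend(content_thebes_layers(child, in_content))
    return found


# A tree shaped roughly like the parent/content split described above.
tree = Layer("ContainerLayer", [
    Layer("ThebesLayer"),  # parent process, e.g. the status bar
    Layer("ContainerLayer", [
        Layer("ColorLayer"),
        Layer("RefLayer", [
            Layer("ContainerLayer", [Layer("ThebesLayer"), Layer("ThebesLayer")]),
        ]),
    ]),
])

for depth, layer in walk(tree):
    print("  " * depth + layer.kind)
print(len(content_thebes_layers(tree)), "ThebesLayers in the content app")
```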
To see a text dump of a Layer Tree, enable the layers.dump preference, or turn on the corresponding option in Gaia's developer menu. To reduce checkerboarding, our job is to make sure we both (1) layerize correctly and (2) make as few changes to the Layer Tree as possible at each frame. How do we know whether we're doing (1) and (2)?
There are a few things to check to see whether an app is layerizing correctly. These are generally just warning signs that tell us whether we're doing ok. The first is to enable the FPS counter in the developer tools. The rightmost number on the display tells us how many times we're overdrawing each pixel, as a percentage. In an ideal world, we would draw each pixel only once, so the number would be 100. If the number is 200, we're drawing each pixel twice. If the number is ~300-400 for an app while you're scrolling, we're probably doing alright. Anything over ~500 should be an alarm that we're not layerizing correctly.
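The overdraw number is easier to reason about as arithmetic: total pixels drawn in a frame, divided by the pixels on screen, as a percentage. The sketch below assumes that definition and the rough thresholds from this post. It illustrates the arithmetic, not how Gecko's FPS counter actually measures overdraw, and the "watch" band between ~400 and ~500 is our own assumption.

```python
def overdraw_percent(pixels_drawn: int, screen_w: int, screen_h: int) -> float:
    """100 means every pixel was drawn exactly once; 200 means twice on average."""
    return 100 * pixels_drawn / (screen_w * screen_h)


def overdraw_verdict(percent: float) -> str:
    # Thresholds follow the rough guidance in the text: ~300-400 is alright,
    # anything over ~500 is an alarm. The middle "watch" band is an assumption.
    if percent <= 400:
        return "ok"
    if percent <= 500:
        return "watch"
    return "alarm: check layerization"


# Example: on a 320x480 screen, drawing 614,400 pixels in one frame is 400%.
print(overdraw_percent(614_400, 320, 480), overdraw_verdict(400))
```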
The other option is to enable Draw Layers Borders. If we have a lot of random layers all over the place while we're scrolling, or if the layers don't make any sense, we're building bad layers. Here's an old example from the Settings app where we were layerizing pretty badly. See the random green boxes around sound? Bad.
The next thing to check is whether the app is over-invalidating the layer tree. In those cases, we'll be painting something every frame when we don't have to. For example, if a piece of content is a static block of text that never changes, it's useless work to keep painting the same text. You can check whether your app is overpainting by enabling the 'Flash Repainted Area' option. Generally, unless something is changing, it shouldn't be flashing. With APZ, while we're scrolling, only newly scrolled-in content should be flashing, if anything. If everything is flashing all the time, check your app. Some good examples of over-invalidation are over here.
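The idea behind avoiding over-invalidation can be sketched as: remember what each piece of content looked like last frame, and only mark it for repainting when it actually changed. The `InvalidationTracker` class below is a hypothetical Python illustration, not a Gecko or Gaia API; real invalidation works on frames and regions, not strings.

```python
class InvalidationTracker:
    """Hypothetical helper: report only the elements whose content changed."""

    def __init__(self) -> None:
        self._last_seen: dict[str, str] = {}

    def invalidate(self, frame: dict[str, str]) -> list[str]:
        """Return the keys that need repainting this frame, in input order."""
        dirty = [key for key, content in frame.items()
                 if self._last_seen.get(key) != content]
        self._last_seen.update(frame)
        return dirty


tracker = InvalidationTracker()
print(tracker.invalidate({"title": "Settings", "clock": "12:00"}))  # first frame: both
print(tracker.invalidate({"title": "Settings", "clock": "12:01"}))  # only the clock
print(tracker.invalidate({"title": "Settings", "clock": "12:01"}))  # nothing
```

The static "Settings" title is painted once and never again, which is what 'Flash Repainted Area' should show for static content.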
You should check both that we're layerizing correctly and that the app isn't over-invalidating. If both look good, you might have to read a layer tree.
Reading a Layer Tree
Reading the Layer Tree is a very dense and time-consuming process, especially at the beginning. Skip this if you're busy and just want to fix your app quickly. Otherwise, grab some coffee and join me down the rabbit hole. Here is an example Layer Tree from the Settings App that helped with bug 976299.
Essentially, what we're looking for is a Layer that is either (a) unused, (b) not visible, or (c) not the right dimensions. The key things to look for are the visible regions, which are given in pixels. On this device, we're looking at a width=320px, height=480px screen. Thus, anything that isn't actually visible to your eyes shouldn't be in a layer if we're optimizing correctly. Let's take this layer tree line by line.
- Layer Manager - Every Layer tree is managed by a Layer manager. Can't do anything here.
- Because we haven't reached the RefLayer yet (line 14), we're still in the parent process. We have a ContainerLayer with dimensions width=320, height=480, starting at coordinate x=0, y=0 (the top left corner). X is the horizontal axis; Y is the vertical axis. We see this in the [visible=< (x=0, y=0, w=320, h=480); >] portion.
- We have a ThebesLayer, with x=0, y=0, w=0, h=0, and it is not visible. Ideally, we wouldn't have a Thebes Layer here, but since it has no dimensions and is not visible, we're good to go.
- Empty Space
- We have a ColorLayer, which is a single background color. In this case it is a black background, rgba=(0,0,0,1); the final 1 is the alpha (opacity), so it's opaque. It's also not visible, so we're good to go. Opaque is good because anything behind this layer isn't going to be visible, so we don't have to paint the items behind it.
- Another ContainerLayer that has the same dimensions. Since the layer in line (2) is just a non-visible rectangle, it's ok to have this one.
- We have a ThebesLayer that is width=320, height=20px located at (0, 0). Since this is in the parent process, this is the status bar that tells you how much battery you have, etc.
- Every ThebesLayer has a buffer attached to it to actually draw the data. The ContentHost tells us it has a buffer of size width=320, height=32, located at (0, 0). In an ideal world, the buffer height would be 20, but 32 is fine because of how the GPU allocates buffers.
- The ContentHost has a gralloced buffer that we're painting with OpenGL. Usually, for every ThebesLayer, you should see both a ContentHost (8) and a Gralloced Buffer (9).
- Another ContainerLayer! This time it's located at (0, 20), that is, 20 pixels down from the very top of the screen, just below the status bar, with size width=320, height=460. The height is 460 because the screen is 480 pixels tall and the status bar takes 20 of them.
- Another Thebes layer, not visible so we're ok.
- A ColorLayer that covers the whole ContainerLayer in (10). ColorLayers are cheap, so it's ok, though in an ideal world we'd get rid of this too. But the important part of this ContainerLayer is line 14!
- RefLayer - This marks the beginning of the Content App, or the Settings App here. We see it starts at coordinate (0, 20) and is 320x460.
- The ContainerLayer here starts at coordinate (0, 0), width=640, height=460. A few things to note: since this ContainerLayer is inside the RefLayer, (0, 0) is measured from the top left of the ContainerLayer in (10). That means that, from the point of view of the whole screen, it's actually at (0, 20), width=640, height=460. In addition, it has a DisplayPort of size 640x460, which is the displayport associated with Async Pan and Zoom. Since our screen is only 320 pixels wide, we're allocating twice as much as we need to. ALARMS SHOULD BE GOING OFF HERE!
- We have a ThebesLayer that starts at x=320, y=0 (again, from the point of view of the ContainerLayer in line 10), so it's actually at coordinate (320, 20), with width=320, height=460. Essentially, we have a layer that is shifted horizontally by 320 pixels. But our screen is only 320x480. We have a layer for something off screen! ALERT! See bug 976299.
- A Content Host for the Thebes Layer in 16, again at position (320, 0).
- The buffer for the Content Host and Thebes Layer.
- Another Container Layer! Yay! This time at (0, 0), width=320, height=460. This Layer is for the word 'Settings' and might be too big.
- Another ThebesLayer, at (0, 0), 320x460. Again, in reality it's at (0, 20): because it's inside the ContainerLayer in line 10, its coordinates are measured from a point shifted down 20 pixels.
- Another Content Host.
- Another buffer for the Thebes Layer in line 20.
- Another ContainerLayer. However, this is the layer we're actually scrolling. First, we see that it is at (0, 50) with width=320, height=1435. Why is it at y=50? What's at (0, 0) in the Settings app? Oh, the word 'Settings'! Thus the layer at line 19 contains the word 'Settings', and this layer at line 23 is the scrollable area of the Settings App. The height=1435 is because of APZ. Remember the displayport being bigger than the viewport? Here's how big the displayport is: 1435 pixels. We see that this height=1435 matches the displayport size [displayport=(x=0.000000, y=0.000000, w=320.000000, h=1435.000000)].
- A ThebesLayer for the whole displayport, hence the 320x1435 size. We also see again that the starting coordinate is (0, 0), which is relative to the ContainerLayer in line 23. So (0, 0) means the top left of the ContainerLayer in line 23, or (0, 70) in overall screen coordinates: 20 from the status bar plus 50 from the word 'Settings'.
- A ContentHost for the ThebesLayer in line 24. The same size.
- A buffer for the ContentHost for the ThebesLayer in 24.
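The coordinate bookkeeping in this walkthrough can be automated. Below is a Python sketch that (1) pulls rectangles out of a `[visible=< ... >]` annotation like the one on line 2, (2) adds up the offsets of enclosing ContainerLayers to get screen coordinates, and (3) reports how much of a layer actually lands on a 320x480 screen. The regex assumes the dump format shown in this post and may not match other Gecko versions; the helper names are hypothetical.

```python
import re

RECT_RE = re.compile(r"\(x=(-?[\d.]+), y=(-?[\d.]+), w=(-?[\d.]+), h=(-?[\d.]+)\)")
Rect = tuple[float, float, float, float]  # (x, y, w, h)


def visible_rects(dump_line: str) -> list[Rect]:
    """Extract the rectangles from a '[visible=< ... >]' annotation, if any."""
    match = re.search(r"\[visible=<(.*?)>\]", dump_line)
    if match is None:
        return []
    return [tuple(float(v) for v in groups) for groups in RECT_RE.findall(match.group(1))]


def to_screen(container_offsets: list[tuple[float, float]], rect: Rect) -> Rect:
    """Child coordinates are relative to enclosing containers, listed outermost first."""
    x, y, w, h = rect
    dx = sum(ox for ox, _ in container_offsets)
    dy = sum(oy for _, oy in container_offsets)
    return (x + dx, y + dy, w, h)


def on_screen_fraction(rect: Rect, screen_w: float = 320, screen_h: float = 480) -> float:
    """Fraction of the layer's area that lands on the screen (0.0 = fully off screen)."""
    x, y, w, h = rect
    if w <= 0 or h <= 0:
        return 0.0
    visible_w = max(0.0, min(x + w, screen_w) - max(x, 0.0))
    visible_h = max(0.0, min(y + h, screen_h) - max(y, 0.0))
    return (visible_w * visible_h) / (w * h)


# The off-screen ThebesLayer from line 16: (320, 0) inside the container at (0, 20).
offscreen = to_screen([(0, 20), (0, 0)], (320, 0, 320, 460))
print(offscreen, on_screen_fraction(offscreen))  # (320, 20, 320, 460) 0.0
```

A full-size layer that is 0% on screen is exactly the bug 976299 case. A displayport layer like the one in line 24 will legitimately be mostly off screen, so a low fraction is a prompt to look closer, not proof of a bug.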
Whew. So when we're optimizing the Layer Tree, we're looking for items that are allocated a layer when they shouldn't be, or layers with the wrong size. One thing to note is that every layer here is opaque. If a layer is transparent, we'll see [component-alpha] instead of [OpaqueContent]. Opaque content is great because we don't have to paint any items behind it. If you were wondering why we were adding a bunch of background-colors into Gaia, this is it. That's how to read a layer tree, line by line. Essentially, if you see a layer but don't see its content on the screen, we can optimize a bit more. Every time we eliminate a Layer, it helps the Compositor at every single frame, or every 16.6 ms.
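Why opaque layers help can be shown with a tiny occlusion check: walking layers from back to front, a layer doesn't need painting if some opaque layer in front of it covers it completely. This Python sketch is a simplification assumed for illustration: it only tests containment against a single opaque layer, while a real compositor would subtract the union of opaque regions. The function names are hypothetical.

```python
Rect = tuple[int, int, int, int]  # (x, y, w, h)


def contains(outer: Rect, inner: Rect) -> bool:
    """True when `inner` lies entirely within `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def layers_to_paint(layers: list[tuple[str, Rect, bool]]) -> list[str]:
    """`layers` is ordered back to front as (name, rect, is_opaque).
    Skip any layer fully covered by a single opaque layer in front of it."""
    result = []
    for i, (name, rect, _) in enumerate(layers):
        hidden = any(opaque and contains(front, rect)
                     for _, front, opaque in layers[i + 1:])
        if not hidden:
            result.append(name)
    return result


screen = (0, 0, 320, 480)
print(layers_to_paint([("wallpaper", screen, True), ("app", screen, True)]))   # ['app']
print(layers_to_paint([("wallpaper", screen, True), ("app", screen, False)]))  # ['wallpaper', 'app']
```

Giving the app an opaque background-color is what turns the second case into the first.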
Are We There Yet?
In theory, if we fix everything and we're not doing any other work while scrolling, we should not checkerboard. We've made a lot of progress in many of the apps. You can find the mother bug in bug 942750. Until then: