R·ex / Zeng


MUGer, hacker, developer, amateur UI designer, punster, Japanese learner.

A Ghost Bug Caused by Use Zoom For DSF

Every project produces bugs. As the saying goes, "the first step in solving a bug is to reproduce it". But what do you do with a ghost bug, one that cannot be reproduced reliably?

The Phenomenon of the Bug

We have a public-facing project, and one of its pages displays various statistical charts. While we were still celebrating shipping on time, QA came over and reported that the separator lines between the rows of a table on that page had disappeared:

The disappeared separator lines

After many rounds of debugging, a line I had been hearing for years finally came in handy:

Works on my machine... What the hell.

Yes, this is a ghost bug: on my colleague's computer it shows up only once or twice, and only if we refresh the page enough times. Unlike internal projects such as the operation platform, this is our team's second public-facing platform, so any problem that affects its look and feel must be fixed.

Step by Step Attempts

Attempt to Reproduce Stably

After a lot of digging we found nothing wrong with the CSS: the computed style of the cell has border-top, and there is no background or overflow: hidden in play, so the border is not being covered by another element.

Well, since it doesn't seem to be a code problem and it can't be stably reproduced, let's try to analyze and search for possible causes.

Attempt to Analyze

This table is located at the bottom of the page, and there is a Metrics module above it. In the few times when the phenomenon was accidentally reproduced, I found that the table lines were there at the moment the page was loaded, but when the data of the Metrics module above was loaded and the table was slightly pushed down, the separator lines disappeared.

The whole page

Since the problem only occurs when it is pushed down, could it be caused by physical pixels?

I remembered that a colleague had asked me a question before:

In Chrome, at 75% zoom, the plus icon in the Button component shows only its vertical line; the horizontal line disappears.

This one is relatively easy to explain. A screen has no fractional physical pixels, so Chrome converts every coordinate to an integer during rendering. Under certain zoom levels and positions, the top and bottom edges of the plus sign's horizontal line can round to the same Y coordinate. For example, if the top edge is computed at 80.5px and the bottom edge at 81.25px, the top edge rounds up and the bottom edge rounds down, both to 81px, and the horizontal line disappears.

But not every horizontal line disappears (otherwise the whole page would change when zoomed). For example, if the top edge is at 81.25px and the bottom edge is at 82px, the top edge rounds down to 81px and the bottom edge stays at 82px, so a 1px-thick horizontal line is drawn on the screen.
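The two cases above can be replayed in a few lines. This is only a minimal sketch: it assumes both edges are snapped with Math.round, which is a simplification of whatever Chrome really does, and snappedThickness is a hypothetical helper rather than a browser API.

```typescript
// Hypothetical helper: snap both edges of a horizontal line to whole
// pixels and return how many pixel rows the line still covers.
// Assumption: edges are snapped with Math.round (which rounds .5 up).
function snappedThickness(topEdge: number, bottomEdge: number): number {
  return Math.round(bottomEdge) - Math.round(topEdge);
}

// 80.5 -> 81 and 81.25 -> 81: both edges land on the same row.
const vanished = snappedThickness(80.5, 81.25); // 0

// 81.25 -> 81 and 82 -> 82: one row survives.
const visible = snappedThickness(81.25, 82); // 1
```

A result of 0 means the line has no rows left to paint, which matches the missing horizontal stroke of the plus icon.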

Note that "physical pixels" and "subpixel rendering" are different things. An example of subpixel rendering: on a low-resolution screen, an element whose width is an odd number of pixels (e.g. 251px) becomes blurry when translateX(50%) is applied to it. This is less likely to happen on high-resolution screens.

Following this line of thought, I looked around at everyone's devices. All of us ran Chrome on a Mac Retina screen, while the colleague who could reproduce the bug ran Chrome on a 1080p monitor. That all but confirmed that the bug was related to physical pixels.
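One plausible way to picture why the screen matters: the same CSS offset can land exactly on a device pixel at one device pixel ratio and between two pixels at another. The sketch below is an illustration only. devicePixelOffset is a made-up helper, and the offset 80.5 and the ratios 1 and 2 are assumptions standing in for a 1080p monitor and a Retina screen, not values measured on our page.

```typescript
// Hypothetical helper: how far (in device pixels) a CSS coordinate sits
// past the whole device pixel just before it.
function devicePixelOffset(cssCoordinate: number, devicePixelRatio: number): number {
  const devicePx = cssCoordinate * devicePixelRatio;
  return devicePx - Math.floor(devicePx);
}

// Assumed ratio of 2: 80.5 CSS px is 161 device px, exactly on the grid.
const onRetina = devicePixelOffset(80.5, 2); // 0

// Assumed ratio of 1: 80.5 CSS px is 80.5 device px, halfway between rows.
const on1080p = devicePixelOffset(80.5, 1); // 0.5
```

On a real page the inputs could come from element.getBoundingClientRect().top and window.devicePixelRatio.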

Attempt to Search

A Google search turned up a tweet from EvanYou about a recent Chrome bug: an element with position: sticky jitters when the page is scrolled, and occasionally a 1px gap appears at the top of the page:

EvanYou's tweet

In the replies, I saw a crbug link: Issue 1076036: 1px gap with sticky positioning and mouse-wheel scrolling, which is similar to the tweet, saying that there is an element on the page with position: fixed and top: 0, but occasionally there will be a 1px gap above the element when the mouse is scrolled.

The comments below said that Chromium has a flag called Use Zoom For DSF, which was always enabled in previous versions, but was disabled in recent versions due to some problems. If you want to re-enable it, you can add the --enable-use-zoom-for-dsf parameter when starting Chromium. I tried it and found that the problem was indeed gone.

But we can't let QA and users do this, right? This is definitely not a good solution.

What is Use Zoom For DSF

Here comes a new term. After searching around, I found that there was almost no Chinese information. However, in the English search results, I found a proposal from the Chromium team Using Zooming to implement DSF and the corresponding technical solution document Use Blink’s Zoom to implement device scale, so I forced myself to read these two documents, and if I encountered something I didn't understand, I would search for it.

According to the documents, DSF stands for device scale factor, which I think can be translated into "设备缩放因子" in Chinese. It sounds related to the familiar devicePixelRatio (DPR), and it is used in the rendering process.

Rendering Pipeline

Chromium's rendering process can be described as a pipeline with several steps. Each step receives the intermediate output of the previous step and produces new content for the next one. It is also a favorite interview topic.

Let's "review" this classic image first (from Life of a Pixel):

Rendering process of Chromium

The detailed operations of each step can be found in this article: The Life of a Pixel - Chromium Rendering Pipeline Step by Step - ¥ЯႭ1I0. Combined with the above image, here are some key parts excerpted from the article:

  • Main thread (main) is responsible for:

    • Layout phase: After generating the DOM tree and calculating the style, the visual geometric properties of the elements need to be processed.
    • Paint phase: Use a list to store the objects that need to be drawn. The object records the drawing operations that need to be performed to draw the element, such as which color to use and where to draw a rectangle.
  • The compositor thread (impl) is responsible for:

    • Rasterisation: The drawing operations recorded earlier are executed in the rasterisation phase. In the resulting bitmap, each cell stores an encoded color and transparency value for one pixel.
  • The process of layer composition (related to composite layers): Construct the composite layer in the main thread and submit it to the compositor thread for drawing.

Let's go back to the proposal and technical solution documents of the Chromium team. The original description of the cause of the problem is as follows:

The content painting commands are recorded at 1.0x scale factor, and then rastered at the target scale factor on another thread (impl side painting).

...

This can cause undesirable artifacts at fractional scale factor due to rounding.

The document also gives an example:

Rendering scene with problems

Because coordinates are rounded when the drawing operations are recorded, any fractional intermediate result can turn into a large error once it is scaled. The document also gives an example of what it should look like under normal circumstances:

Rendering scene under normal circumstances

Proposal and Implementation of Use Zoom For DSF

"Use Zoom" means "handle DSF with the same technique as page zoom". The proposal's author noticed that when the browser is zoomed (Ctrl +, Ctrl -), coordinates are scaled (and rounded) in the layout phase. If coordinates were also scaled by the DSF in the layout phase during normal rendering, the rounding error would shrink considerably and problems like this would not occur. The proposal then analyzes the feasibility of this approach, provides rendering screenshots showing that zoom-style scaling does not have these problems, and concludes that the difference between page zoom and the existing DSF-based scaling is small enough for a technique similar to the former to replace the latter.
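To make the difference concrete, here is a toy comparison of the two orders of operation. It is a sketch under stated assumptions, not Chromium's code: both functions are hypothetical, every snap is modeled as Math.round, and the 1px line at 10.4px with a DSF of 1.5 is an invented input.

```typescript
interface Edges {
  top: number;
  bottom: number;
}

// Old path (simplified): snap to whole pixels at 1.0x while recording,
// then scale by the DSF and snap again while rasterizing.
function recordThenScale(edges: Edges, dsf: number): Edges {
  const recorded = { top: Math.round(edges.top), bottom: Math.round(edges.bottom) };
  return {
    top: Math.round(recorded.top * dsf),
    bottom: Math.round(recorded.bottom * dsf),
  };
}

// Zoom-style path (simplified): scale during layout, then snap once.
function scaleThenRecord(edges: Edges, dsf: number): Edges {
  return {
    top: Math.round(edges.top * dsf),
    bottom: Math.round(edges.bottom * dsf),
  };
}

// Invented input: a 1px line at a fractional offset, DSF = 1.5.
const thinLine: Edges = { top: 10.4, bottom: 11.4 };
const oldPath = recordThenScale(thinLine, 1.5); // { top: 15, bottom: 17 }
const zoomPath = scaleThenRecord(thinLine, 1.5); // { top: 16, bottom: 17 }
```

The ideal top edge is 10.4 × 1.5 = 15.6 device pixels. In this toy model the old path misses it by 0.6 because it rounds twice, while the zoom-style path rounds once and misses by 0.4.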

The implementation given in the technical solution is basically aligned with the proposal:

When rendering on the main thread, if DSF is X, the drawing instructions are recorded according to Zoom = X00%, DSF = 1.

However, this introduces a new problem: the drawing commands on the main thread are now generated for a specific DSF, so if the window moves from a low-resolution screen to a high-resolution one and the DSF changes, won't the content become blurry? To solve this, the main thread attaches a parameter when it sends the composited frame to the compositor thread, meaning "I was composited when DSF = X". If the UI thread finds that this parameter differs from the DSF it obtained itself, it does some extra processing. For example, it multiplies all the coordinates in the drawing instructions generated at DSF = 1 by 2, which handles the move from a low-resolution screen to a high-resolution one.

In the implementation, the Chromium team added an extra float painted_device_scale_factor_ parameter to the compositor thread (named to distinguish it from the existing device_scale_factor_ parameter). This is the attached parameter mentioned above. In theory the old device_scale_factor_ parameter no longer serves a purpose and could be removed, but removing it would have too wide an impact, and the team felt it needed more careful consideration. As a result, Chromium's compositor thread still keeps two DSF parameters.
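The hand-off described above can be sketched roughly as follows. Everything here is hypothetical: PaintedFrame, DrawRect and adaptToScreen are invented names for illustration, and the real Chromium structures look different. Only the idea of carrying the painted DSF along with the frame comes from the design document.

```typescript
// Hypothetical shapes; the real Chromium structures are different.
interface DrawRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface PaintedFrame {
  // DSF that was in effect when the drawing commands were recorded.
  paintedDeviceScaleFactor: number;
  rects: DrawRect[];
}

// If the frame was painted for a different DSF than the screen it is now
// shown on, rescale every coordinate by the ratio of the two factors.
function adaptToScreen(painted: PaintedFrame, currentDsf: number): PaintedFrame {
  if (painted.paintedDeviceScaleFactor === currentDsf) {
    return painted;
  }
  const ratio = currentDsf / painted.paintedDeviceScaleFactor;
  return {
    paintedDeviceScaleFactor: currentDsf,
    rects: painted.rects.map((r) => ({
      x: r.x * ratio,
      y: r.y * ratio,
      width: r.width * ratio,
      height: r.height * ratio,
    })),
  };
}

// Painted at DSF = 1, then the window moves to a DSF = 2 screen.
const paintedAt1x: PaintedFrame = {
  paintedDeviceScaleFactor: 1,
  rects: [{ x: 8, y: 81, width: 200, height: 1 }],
};
const onHiDpi = adaptToScreen(paintedAt1x, 2);
// onHiDpi.rects[0] is { x: 16, y: 162, width: 400, height: 2 }
```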

This proposal was actually implemented a long time ago, which is why we had never run into this problem before. But around Chrome 93, someone apparently found problems with it and the feature was disabled, which caused the bug we hit.

Temporary Solution

Although Use Zoom For DSF could not be restored in the short term, the bug did affect us, so we needed a workaround. I remembered that while fixing a component library issue earlier, I had seen getBoundingClientRect return fractional values for some elements with transform applied, so I suspected this might also be related to composite layers. I opened DevTools, turned on "Show layer borders", and found that the elements in the table all had yellow borders, which means they were all composite layers.

I remember that I wrote an article before: Bugs Caused by Browser Optimization, which mentioned under what conditions Chromium will create a composite layer for an element:

  • CSS properties like 3D or perspective transforms (perspective, transform)
  • <video> elements with hardware accelerated video decoding
  • <canvas> elements with accelerated 2d context or 3d context (WebGL)
  • Plugins like Flash
  • Elements with CSS animations applied to opacity, or that are being animated with transform
  • Elements with CSS filters applied
  • Elements containing a descendant with a composited layer (in other words, an element with a child that is in its own layer)
  • Elements with a z-index lower than a sibling with a composited layer

I re-examined the code, but found no suspicious points. Considering that this article is already very old, these conditions may have changed, so I went to find the latest conditions - they are all written in the source code compositing_reasons.h! It seems that there are many more conditions now than I knew before!

#define FOR_EACH_COMPOSITING_REASON(V)                                        \
  /* Intrinsic reasons that can be known right away by the layer. */          \
  V(3DTransform)                                                              \
  V(Trivial3DTransform)                                                       \
  V(Video)                                                                    \
  V(Canvas)                                                                   \
  V(Plugin)                                                                   \
  V(IFrame)                                                                   \
  V(DocumentTransitionContentElement)                                         \
  /* This is used for pre-CompositAfterPaint + CompositeSVG only. */          \
  V(SVGRoot)                                                                  \
  V(BackfaceVisibilityHidden)                                                 \
  V(ActiveTransformAnimation)                                                 \
  V(ActiveOpacityAnimation)                                                   \
  V(ActiveFilterAnimation)                                                    \
  V(ActiveBackdropFilterAnimation)                                            \
  V(AffectedByOuterViewportBoundsDelta)                                       \
  V(FixedPosition)                                                            \
  V(StickyPosition)                                                           \
  V(OverflowScrolling)                                                        \
  V(OverflowScrollingParent)                                                  \
  V(OutOfFlowClipping)                                                        \
  V(VideoOverlay)                                                             \
  V(WillChangeTransform)                                                      \
  V(WillChangeOpacity)                                                        \
  V(WillChangeFilter)                                                         \
  V(WillChangeBackdropFilter)                                                 \
  /* Reasons that depend on ancestor properties */                            \
  V(BackfaceInvisibility3DAncestor)                                           \
  /* This flag is needed only when none of the explicit kWillChange* reasons  \
     are set. */                                                              \
  V(WillChangeOther)                                                          \
  V(BackdropFilter)                                                           \
  V(BackdropFilterMask)                                                       \
  V(RootScroller)                                                             \
  V(XrOverlay)                                                                \
  V(Viewport)                                                                 \
                                                                              \
  /* Overlap reasons that require knowing what's behind you in paint-order    \
     before knowing the answer. */                                            \
  V(AssumedOverlap)                                                           \
  V(Overlap)                                                                  \
  V(NegativeZIndexChildren)                                                   \
  V(SquashingDisallowed)                                                      \
                                                                              \
  /* Subtree reasons that require knowing what the status of your subtree is  \
     before knowing the answer. */                                            \
  V(OpacityWithCompositedDescendants)                                         \
  V(MaskWithCompositedDescendants)                                            \
  V(ReflectionWithCompositedDescendants)                                      \
  V(FilterWithCompositedDescendants)                                          \
  V(BlendingWithCompositedDescendants)                                        \
  V(PerspectiveWith3DDescendants)                                             \
  V(Preserve3DWith3DDescendants)                                              \
  V(IsolateCompositedDescendants)                                             \
  V(FullscreenVideoWithCompositedDescendants)                                 \
                                                                              \
  /* The root layer is a special case. It may be forced to be a layer, but it \
  also needs to be a layer if anything else in the subtree is composited. */  \
  V(Root)                                                                     \
                                                                              \
  /* CompositedLayerMapping internal hierarchy reasons. Some of them are also \
  used in CompositeAfterPaint. */                                             \
  V(LayerForHorizontalScrollbar)                                              \
  V(LayerForVerticalScrollbar)                                                \
  V(LayerForScrollCorner)                                                     \
  V(LayerForScrollingContents)                                                \
  V(LayerForSquashingContents)                                                \
  V(LayerForForeground)                                                       \
  V(LayerForMask)                                                             \
  /* Composited layer painted on top of all other layers as decoration. */    \
  V(LayerForDecoration)                                                       \
  /* Used in CompositeAfterPaint for link highlight, frame overlay, etc. */   \
  V(LayerForOther)                                                            \
  /* DocumentTransition shared element.                                       \
  See third_party/blink/renderer/core/document_transition/README.md. */       \
  V(DocumentTransitionSharedElement)

P.S. In future interviews I can say: I have read the Chromium source code, and these are the conditions for creating composite layers... (LOL)

Finally, I found that the <td> elements in the leftmost column of the table were set to position: sticky, because this column is a fixed column (fixed: 'left'). In addition, the z-index of a fixed column has to be larger than that of the other columns so that it always floats above them. Ant Design 4 implements fixed columns the same way.

According to the rules in the code block above, position: sticky first matches V(StickyPosition) and gets its own composite layer. Then, because the leftmost column has the larger z-index, the other columns match V(AssumedOverlap) and get their own composite layers as well. Composite layer squashing may happen afterwards, but it is unrelated to this problem, so I won't go into it.
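As a toy restatement of that reasoning (not Chromium's actual overlap test), the sketch below tags each cell in a row with a reason. The two constant names are borrowed from the list above, but the bit values, the Cell type and reasonsForRow are all invented for illustration.

```typescript
// Names borrowed from the list above; the bit values are illustrative only.
const StickyPosition = 1 << 0;
const AssumedOverlap = 1 << 1;

interface Cell {
  sticky: boolean;
}

// Toy model of the paragraph above, not Chromium's algorithm: a sticky
// cell gets StickyPosition, and every other cell in a row that contains
// a sticky cell is assumed to overlap it.
function reasonsForRow(row: Cell[]): number[] {
  const hasSticky = row.some((cell) => cell.sticky);
  return row.map((cell) => {
    if (cell.sticky) {
      return StickyPosition;
    }
    return hasSticky ? AssumedOverlap : 0;
  });
}

// With a fixed first column, every cell ends up with a reason.
const withFixedColumn = reasonsForRow([{ sticky: true }, { sticky: false }, { sticky: false }]);
// [StickyPosition, AssumedOverlap, AssumedOverlap]

// Without it, no cell needs its own layer in this model.
const withoutFixedColumn = reasonsForRow([{ sticky: false }, { sticky: false }, { sticky: false }]);
// [0, 0, 0]
```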

Considering that this column is not wide, it may not need to be fixed. I tried to modify the code to unfix this column, and the problem was solved.
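For reference, the change amounts to something like the following. The column definitions are hypothetical, since the real ones are not shown in this post, and ColumnConfig is a local type that only assumes a fixed option shaped like the one in Ant Design's Table columns.

```typescript
// Local type, assumed to resemble a table column config with a `fixed` option.
interface ColumnConfig {
  title: string;
  dataIndex: string;
  fixed?: "left" | "right";
}

// Hypothetical columns; the first one is pinned to the left edge.
const columnsBefore: ColumnConfig[] = [
  { title: "Name", dataIndex: "name", fixed: "left" },
  { title: "Requests", dataIndex: "requests" },
];

// Rebuild each column without `fixed`, so no cell is rendered
// with position: sticky any more.
const columnsAfter: ColumnConfig[] = columnsBefore.map((column) => ({
  title: column.title,
  dataIndex: column.dataIndex,
}));
```

In our code the equivalent edit was simply deleting the fixed: 'left' entry by hand.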

Follow-up

One day I received a push notification from Hacker News, and when I clicked on it, it linked to a comment on a crbug:

Comment on crbug

On February 9, 2022, --use-zoom-for-dsf was fully enabled again. This means the problem no longer occurs!

Takeaways

Although the final fix was just deleting one line, fixed: 'left', I learned some new things and refreshed my outdated knowledge along the way. As for how to approach this kind of problem, I have four suggestions:

  1. For Chrome bugs, it pays to check the crbug website. For Firefox bugs, go to Bugzilla. Browsers are software too, with far more edge cases than a typical business project, so bugs are to be expected.
  2. When designing, be sure to leave behind Technical Requirements Documents (TRD) and Technical Design Documents (TD) so that others can understand your ideas.
  3. Understand some underlying principles. Whether it is a common problem or a ghost bug like this one, it will come in handy.
  4. The root causes of some problems are similar, so remember to summarize your experience in solving problems.