R·ex / Zeng


MUGer, hacker, developer, amateur UI designer, punster, Japanese learner.

AWB Tool: Ultimate Optimisation for Chrome Headless Rendering

Background

Some of you may be familiar with the "AWB Tool" project, because I have written several articles related to it before. Its basic function is nothing more than drawing a shipping label: if you buy something from Shopee, the shipping label on the express delivery is drawn with this tool.

Perhaps because the tool is convenient to use, as the business expanded, some strange things that are not shipping labels also began to be drawn with it, such as contracts for the procurement system (more like a Word document), warehouse barcodes (printed on specially sized paper), and picking lists (which can be understood as take-out receipts). This puts extremely high demands on the scalability of the shipping label tool. However, that is not the focus of this article; I will write a separate article about it later.

The interface for drawing the picking list

I previously wrote an article about optimising the rendering efficiency of the editor (from 1 second to 100 ms for 500 elements), and I was already very satisfied with that result. But as more and more systems started using this tool, it hit another performance bottleneck: the server-side rendering is too slow...

Problem Analysis

What exactly happens between a shipping label on the editor's canvas and generating a PDF?

  1. The front end converts the things in the editor into a template language (we chose Django / pongo2);
  2. A third-party system sends a request through an API call, asking to use a certain template and providing all the necessary data (order number, product details, etc.);
  3. After the backend verifies the data, it uses the template engine to generate an HTML of a shipping label;
  4. The backend uses Chrome Headless to open the HTML that was just generated and print it as a PDF;
  5. The backend returns the PDF to the third-party system, which may display it on the screen, or call a printer to print it.

It can be seen that although the difficulty of implementing the editor lies entirely in the front end, it is the backend that actually makes this tool useful. As the business grows, the front end has to consider how to implement possible new requirements, while the backend has to consider how to cope with the steadily increasing QPS.

Backend Optimization

Since the backend colleagues cannot access the specific content of the shipping label (or, even if they do, they are powerless), they can only do some general optimisations:

  • Use a process pool instead of opening a new Chrome Headless every time;
  • Use the debugging protocol to reuse the previously opened tab each time;
  • Disable plugins, sandbox, etc.;
  • Add machines...

These points can all be found online. They may have done other optimisations, but these are the only ones I know for now. These optimisations are still very useful when the content of the shipping label is not complicated, until a business team proposes a strange requirement...

Business Requirements

One day, we received a strange request about layout rules:

  • First, a picking list is needed. It can be understood as a take-out receipt: it starts with a header area (company name, order number, order time, remarks, etc.), followed by a table (product, quantity, price), and ends with a footer area (payment information, delivery information). The picking list is printed on portrait A6 paper, and the table can have many rows. In principle, there should be a footer area at the end of each page of the table;
  • Then, a "group of" picking lists is needed: several picking lists are arranged on one sheet of A4 paper, and the layout rule between them can be understood as the column layout of Word.

For some reason this requirement could not be rejected, and even the layout rule could not be modified (the business team insisted on it, saying that local sellers wanted it and that one of our competitors already had this feature).

The table data is unpredictable, the column layout logic is involved, and the requirement that "there must be a footer area at the end of each page of the table" is beyond the layout ability of Chrome itself. After discussing it with the backend colleagues for half a day, we had to decide to do the layout with JavaScript on the server side: I inject the layout code into the template, and the backend executes it when opening the page in headless Chrome. I said that this might take 100 ms longer to generate than an ordinary shipping label, and the backend colleague said, "I'm going to cry. Where can I squeeze out 100 ms for you..."

However, no matter how much we complain, we still have to do it. After more than a week of development, I not only completed the layout code, but also refactored the editor so that it can be easily extended to other business scenarios.

Then the business team said that up to 50 picking lists might be arranged in one group and returned as a single PDF file. WTF...@%$!&^

What the hell...

Of course, as a professional engineer I can always meet business requirements, so I spent another two days getting it done. However, QA ran a performance test: a request with the data of 50 shipping labels took at least 100 seconds to return a result. After getting the HTML, I ran it on my Alienware and found that it still needed 16 seconds, which is too slow!

Frontend Optimisation

The Most Primitive Code

After getting the generated HTML file, I opened it in Chrome on my laptop. Since the server does not have a GPU, I disabled Chrome's hardware acceleration to be consistent with the server. The Performance panel looks like this:

The performance is unbearable

Oh my god, this is scary... I can't help but sigh that modern browsers really cannot cope without GPU acceleration.

In the file provided by QA, each shipping label has 30 "products" and each "product" has four columns, which means I have to lay out 50*30*4=6000 elements in real time.

Naive Optimisation

Looking carefully at the timeline: if everything after the 40-second mark is the real-time layout, are the first 40 seconds spent just rendering the initial HTML structure? And why does each Recalculate Style take longer and longer?

After some careful thought, this is not surprising. Because my layout code has not yet run, and because I use absolute positioning at first, all areas and tables are stacked at position (0, 0). Chrome renders while it parses the HTML, and it recalculates style every time it renders. This means that every time a new area or table is rendered, Chrome needs the data of all previous areas and tables (because the new one has to cover them), so the number of elements involved in each calculation keeps increasing. Those of you who have studied algorithms may realise that the time complexity of this is as high as O(N^2)!
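To make the quadratic growth concrete, here is a rough cost model. It is only a sketch under a simplifying assumption (each recalculation visits every block parsed so far, with one recalculation per new block), not a measurement of what Chrome actually does; the function names are made up for illustration.

```javascript
// Toy cost model, not a measurement: assume every style recalculation
// has to visit each block that has been parsed so far.
function incrementalRecalcCost(blockCount) {
    let visited = 0;
    for (let parsed = 1; parsed <= blockCount; parsed++) {
        visited += parsed; // one recalculation per newly parsed block
    }
    return visited;
}

// If the blocks stay hidden until parsing finishes, each one is visited once.
function singleRecalcCost(blockCount) {
    return blockCount;
}

const blocks = 6000; // 50 labels * 30 products * 4 columns, as in the QA file
console.log(incrementalRecalcCost(blocks)); // 18003000
console.log(singleRecalcCost(blocks));      // 6000
```

Under this model the incremental approach does N(N+1)/2 visits, which is where the O(N^2) comes from.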

Can we recalculate the style only once? Yes, we just need to set the outermost wrapper to display: none at the beginning, so that it can avoid rendering every time a block is loaded; then render all at once after all are loaded. The idea is like this:

/* CSS: don't use *, just hide the outermost element */
.template {
    display: none;
}

// JS: add this to the script at the end of the body.
// Note: for display, `initial` resolves to `inline`; use `block` instead
// if the wrapper should behave as a block-level element.
const newStyle = document.createElement('style');
newStyle.textContent = '.template{display:initial !important}';
document.body.appendChild(newStyle);

I also suddenly realised that the previous code used document.querySelectorAll('tr') to get elements, which is actually quite slow. Using firstElementChild and nextElementSibling is much faster, because the node tree in Chrome is stored in a rather magical way: each node has only five pointers:

The five pointers of the Node Tree

According to the README in the Chrome source code, that means:

  • Siblings are stored as a linked list. It takes O(N) to access a parent's n-th child.
  • Parent can't tell how many children it has in O(1).

So never mind node.querySelectorAll: even node.children has to walk the sibling list in O(N) time to produce an array-like collection, which is then likely to be converted into a real array through Array.from or the ... spread operator before we finally pick the required element out of it. All of this consumes a lot of time. Since I know where the table element is, it is faster to reach the following elements through firstElementChild and nextElementSibling.
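As a minimal sketch of the sibling-pointer walk: the helper below reads only firstElementChild and nextElementSibling, so it should work on real DOM elements as well. The names takeElementChildren and makeMockParent are hypothetical, and the mock nodes exist only so the example runs outside a browser.

```javascript
// Collect up to `limit` element children by following sibling pointers.
// Only firstElementChild and nextElementSibling are read, so no
// intermediate collection or array copy of all children is created.
function takeElementChildren(parent, limit = Infinity) {
    const result = [];
    let node = parent.firstElementChild;
    while (node !== null && result.length < limit) {
        result.push(node);
        node = node.nextElementSibling;
    }
    return result;
}

// Hypothetical stand-in for a <tbody> with rows, so the sketch runs in Node.
function makeMockParent(names) {
    const nodes = names.map(name => ({ name, nextElementSibling: null }));
    nodes.forEach((node, i) => {
        node.nextElementSibling = nodes[i + 1] ?? null;
    });
    return { firstElementChild: nodes[0] ?? null };
}

const tbody = makeMockParent(['row0', 'row1', 'row2']);
console.log(takeElementChildren(tbody).map(n => n.name).join(', '));    // row0, row1, row2
console.log(takeElementChildren(tbody, 2).map(n => n.name).join(', ')); // row0, row1
```

The walk stops as soon as it has what it needs, so reaching the first few rows costs a few pointer hops regardless of how many siblings follow.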

After making these changes, the performance looks like this:

Performance after optimisation

It can be seen that the initial rendering, which used to take 40 seconds, has been reduced to 3 seconds, and the layout code that runs afterwards is also about 8 seconds faster.

Can it be more powerful?

Ultimate Optimisation

I couldn't think of any further optimisation before, because I looked at the time corresponding to each line of code in Sources, and found that the most time-consuming part is here:

The code that consumes the most time now

Every time I read tr.offsetHeight, the browser has to perform a layout, and it is a forced reflow. But I must get the height of each row in the table, because the data in the table may wrap onto several lines, so the height of each row is uncertain.

But before writing this article, I suddenly realised: as long as every table sets widths on its <col> elements, the height of a row will not change no matter which table block it is placed in. That is to say, once the page has been rendered, the heights of all elements are already fully determined before my layout code runs! So I can save the offsetHeight of every necessary element in a Map at the beginning of the layout and simply look it up later.

Before the optimisation, the heights were read during the layout: every iteration modified the DOM structure, and the next iteration read the height of another element, so every read triggered a forced reflow. After the optimisation, the heights are read before the layout, and the layout only uses the cached heights without triggering any forced reflow, so there is only one layout.
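The difference between the two orderings can be illustrated with a toy model. This is an assumption-laden sketch, not Chrome's real engine: MockPage, readHeight and moveRow are invented names, a write simply marks the layout dirty, and a read while dirty counts as one forced reflow.

```javascript
// Toy model of layout invalidation (for illustration only): a write marks
// the layout dirty, and reading a height while dirty forces a layout first.
class MockPage {
    constructor(rowCount) {
        this.heights = new Array(rowCount).fill(20);
        this.dirty = true; // nothing has been laid out yet
        this.layoutCount = 0;
    }
    readHeight(index) {
        if (this.dirty) {
            this.layoutCount++; // forced reflow
            this.dirty = false;
        }
        return this.heights[index];
    }
    moveRow(index) {
        this.dirty = true; // any DOM write invalidates the layout
    }
}

// Before: read a height, then modify the DOM, inside the same loop.
function interleaved(page) {
    for (let i = 0; i < page.heights.length; i++) {
        page.readHeight(i);
        page.moveRow(i);
    }
    return page.layoutCount;
}

// After: read every height up front, then only write.
function cachedFirst(page) {
    const cache = page.heights.map((_, i) => page.readHeight(i));
    cache.forEach((_, i) => page.moveRow(i));
    return page.layoutCount;
}

console.log(interleaved(new MockPage(1500))); // 1500
console.log(cachedFirst(new MockPage(1500))); // 1
```

The model counts only the layouts forced by reads; the browser still performs one more layout after the writes to paint the final result.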

In the layout script, the idea is like this:

// Measure every element matching `selector` once and cache its height.
const cacheHeights = selector => {
    const heights = new Map();
    document.querySelectorAll(selector).forEach(el => {
        heights.set(el, el.offsetHeight);
    });
    return heights;
};

// All reads happen here, before the layout code modifies the DOM.
const trHeights = cacheHeights('tbody > tr');
const headAreaHeights = cacheHeights('.__area_head');
const tailAreaHeights = cacheHeights('.__area_tail');
const theadHeights = cacheHeights('thead');
const tbodyHeights = cacheHeights('tbody');

This code runs only once, so its impact on efficiency can be ignored. Next, every later read of offsetHeight is replaced with a lookup in the cache, like this:

trs.forEach((tr, index) => {
    // const rowHeight = tr.offsetHeight;
    const rowHeight = trHeights.get(tr);

    /* other code */
});
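Once every height comes from a cache, the layout pass itself becomes plain arithmetic. The following is a much simplified sketch, not the real layout code: paginateRows and its parameters are hypothetical, it assumes that only the first page carries the header area, and it ignores the multi-column grouping on A4.

```javascript
// Split rows into pages so that every page keeps room for a footer area.
// Heights are plain numbers in px, as they would come out of a cache
// such as trHeights. Returns an array of pages, each a list of row indexes.
function paginateRows(rowHeights, pageHeight, headerHeight, footerHeight) {
    const pages = [];
    let current = [];
    // Assumption: only the first page carries the header area.
    let used = headerHeight + footerHeight;
    rowHeights.forEach((height, index) => {
        // Start a new page when the row does not fit; a row taller than
        // a whole page still gets a page of its own.
        if (current.length > 0 && used + height > pageHeight) {
            pages.push(current);
            current = [];
            used = footerHeight;
        }
        current.push(index);
        used += height;
    });
    if (current.length > 0) pages.push(current);
    return pages;
}

const pages = paginateRows([30, 30, 50, 30, 40], 120, 20, 20);
console.log(JSON.stringify(pages)); // [[0,1],[2,3],[4]]
```

Because the function never touches the DOM, it cannot trigger a reflow; the actual DOM writes can then be applied in one batch afterwards.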

With this change, the Performance panel looks like this:

Performance after optimising Layout

This is brilliant! Let's analyse the meaning of this graph:

  • 0 ms ~ 400 ms: loading the page, which theoretically cannot be optimised any further;
  • 400 ms ~ 2300 ms: all elements are displayed at once, so Chrome performs a layout;
  • 2300 ms ~ 2500 ms: my layout code runs; because nothing reads styles during this time, the browser defers all style calculations to the next event loop and processes them together;
  • 2500 ms ~ 4900 ms: the deferred style calculations are processed together, which requires a layout.

Theoretically, the number of layouts cannot be reduced any further, and the time the two layouts take depends on the complexity of the shipping label. It may be possible to speed things up by optimising the CSS, but it is uncertain how effective that would be (after all, the average drawing time per element is now less than 0.5 ms). Therefore, I think this can be considered a very extreme optimisation.

