
Notice This article is published 2165 days ago, some contents may be deprecated.


The AWB platform I am responsible for keeps getting more complex requirements. Ever since I implemented a complex table layout requirement during the winter vacation, various business teams have started using table layout in all kinds of scenarios. After all, who doesn't like a table that paginates and lays itself out automatically while keeping a header and footer on each page?

Table Layout

Let's review the previous requirements first:

There is a special type of AWB called a "picking list", which sellers use to pick goods
The picking list has three parts: the header area (showing the logo, buyer's address, etc.), the table (showing the product information), and the footer area (showing remarks and page numbers)
The table may be very long and needs to be paginated for printing (three paper sizes are supported: A4, A5, and A6)
There must be a header area before the table on the first page
There must be a footer area after each page of the table, and it can be configured whether there is a header area before the table on non-first pages
The content of the table is not known in advance; the column width, font, and font size can be customized, and the row height is unpredictable

The general effect is similar to the following figure (A6, horizontal, showing a header area before the table on non-first pages):

A table that can be used as a picking list

Since this layout requirement has exceeded the capabilities of Chrome, I discussed it with the backend team and decided to inject some layout code into the generated HTML, and then run it on the server with headless Chrome to print the final result as a PDF.

This layout code was actually quite pleasant to write (no, it wasn't), because I had reached some agreements with the business team: the page layout is fixed, the header and footer areas can be designed freely (as long as they do not exceed the page height), and the table can only be generated from configuration rather than by dragging components. This way I can calculate heights row by row, and I even made some optimisations for laying out large amounts of data (see this article).

Cell Merging

Then came the new requirement. Merging cells in code has always been a very complex problem, whether in native HTML or in the various component libraries. At my current skill level, it is not realistic to write a table that supports arbitrary cell merging in a short time (especially with unknown data and hand-written pagination layout code). So I studied the requirements carefully and found that I could make some reasonable simplifications.

First of all, the usual way to merge cells in HTML is to use the <table> element. If you want to merge a cell A with the cell B above it, you set rowspan for B, and then delete A, like this:

<table>
    <thead>
        <tr>
            <th>1</th>
            <th>2</th>
            <th>3</th>
            <th>4</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="3">merge 1</td>
            <td>split 2</td>
            <td>split 3</td>
            <td rowspan="3">merge 4</td>
        </tr>
        <tr>
            <td>split 2</td>
            <td rowspan="2">small merge 3</td>
        </tr>
        <tr>
            <td>split 2</td>
        </tr>
    </tbody>
</table>

The effect is like this:

1	2	3	4
merge 1	split 2	split 3	merge 4
	split 2	small merge 3
	split 2	small merge 3

You can think of <tbody> as having three rows. The first row is a "complete row" because it has the same four columns as the header. The second row is an "incomplete row": it only has the middle two columns, because the first and last columns were removed and merged into the cells above. The third row has only one column (2); the other three are merged. Similarly, to merge a cell with the cell on its left, you set colspan on the left cell and delete the current cell.

Secondly, the "merged cells" described in the requirements actually look like this:

Group Name	Group Image	Item Name	Item Attribute
Group 1	[Image 1]	Item 1-1	High
		Item 1-2	Medium
		Item 1-3	Low
Group 2	[Image 2]	Item 2-1	High
		Item 2-2	Medium

The first two columns are "large cells", and the last two columns are "small cells", which look very neat and easy to handle at first glance! After communicating with the business team, we finally made the following agreement:

The first data row of the table must be a "complete row" (after all, the cells in the first row have nothing above them to merge into)
All "incomplete rows" are missing the same columns ("small cell" columns are never merged)
The content height of a cell never exceeds the height of one page minus the heights of the header and footer areas (so there is no need to handle content inside a cell being split)
No horizontal merging (this simplifies the rowspan calculation and removes the need to calculate colspan)

Under this agreement, the requirement of "merging cells" has been greatly simplified and is no longer so difficult to implement.

A Simple Algorithm for Calculating Rowspan

We made an agreement with the business team: we can set an option for the table component called "Allow empty cells merging up" (off by default, needs to be manually enabled). If enabled, when they pass the data, all empty strings ("") or missing fields corresponding to the cells will be merged upwards. For the neat table above, the business team should pass the data like this:

{
    "table_data": [
        // The first "complete row"
        {
            "group_name": "Group 1",
            "group_image": "[Image 1]",
            "item_name": "Item 1-1",
            "item_attr": "High"
        },
        {
            "item_name": "Item 1-2",
            "item_attr": "Medium"
        },
        {
            "item_name": "Item 1-3",
            "item_attr": "Low"
        },
        // The second "complete row"
        {
            "group_name": "Group 2",
            "group_image": "[Image 2]",
            "item_name": "Item 2-1",
            "item_attr": "High"
        },
        {
            "item_name": "Item 2-2",
            "item_attr": "Medium"
        }
    ]
}

Although the original JSON data is no longer available once the backend has rendered the HTML, we can achieve the same result by simply traversing the table:

// Assume we have already obtained the `table`, `tbody`, and other elements, and stored them in variables with the same name

// The table has only one row, no need to merge
if (tbody.childElementCount <= 1) {
    return;
}

// The first row (must be a "complete row") has no elements, no need to merge
const firstLine = tbody.firstElementChild;
if (firstLine.childElementCount <= 0) {
    return;
}

// `lastTds` is used to easily find the last element of each column, which is convenient for setting `rowspan`
const lastTds = [...firstLine.children];
// `lastRowSpans` is the `rowspan` value of these elements
const lastRowSpans = Array(firstLine.childElementCount).fill(1);

// For each row of the table after the first one
// (the loop condition is `tr` itself, so the last row is processed as well)
for (
    let tr = firstLine.nextElementSibling;
    tr;
    tr = tr.nextElementSibling
) {
    // Enumerate each column to see if there are cells that need to be merged upwards
    for (
        let td = tr.firstElementChild, i = 0, next = (td || {}).nextElementSibling;
        td;
        td = next, i++, next = (td || {}).nextElementSibling
    ) {
        if (
            // This cell is completely empty, obviously it needs to be merged
            !td.firstElementChild.innerHTML
            || (
                // Currently, each cell contains either pure text or an <img>, depending on the data format of the column
                // If it is an <img>, the business team should pass a URL or Base64 string
                // If the data format of the column is "image" and the data is empty, the backend will render an image with an empty src
                // So as long as this situation appears in the if statement, it can also be considered that this cell is empty and can be merged upwards
                td.firstElementChild.firstElementChild
                && td.firstElementChild.firstElementChild.tagName === 'IMG'
                && td.firstElementChild.firstElementChild.getAttribute('src') === ''
            )
        ) {
            // The rowspan of the last element of the current column (not necessarily the cell immediately adjacent to the current cell) += 1
            lastRowSpans[i]++;
            lastTds[i].setAttribute('rowspan', lastRowSpans[i]);
            // Remove the current cell
            tr.removeChild(td);
        } else {
            // If no merging is needed, the last element of the current column is the current cell, and rowspan = 1
            lastRowSpans[i] = 1;
            td.setAttribute('data-index', i);
            lastTds[i] = td;
        }
    }
}
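
The same bookkeeping can be tried without a DOM. The following is a minimal sketch rather than the production code: `computeRowspans` and its input shape (an array of row objects plus an ordered list of column keys) are hypothetical names chosen for illustration, and a cell is treated as empty when its field is missing or an empty string, as in the agreement above.

```javascript
// Hypothetical helper: mirrors the DOM traversal above on plain data.
// `rows` is an array of objects, `columns` the ordered list of field names.
function computeRowspans(rows, columns) {
    // For every column, remember the cell that empty cells below should merge into.
    const lastCells = new Array(columns.length).fill(null);
    const result = [];

    rows.forEach((row, rowIndex) => {
        const cells = [];
        columns.forEach((key, colIndex) => {
            const value = row[key];
            const isEmpty = value === undefined || value === null || value === '';
            if (isEmpty && rowIndex > 0 && lastCells[colIndex]) {
                // Merge upwards: grow the last cell of this column and emit nothing.
                lastCells[colIndex].rowspan++;
            } else {
                // A real cell: it becomes the new merge target for this column.
                const cell = { column: key, value: isEmpty ? '' : value, rowspan: 1 };
                lastCells[colIndex] = cell;
                cells.push(cell);
            }
        });
        result.push(cells);
    });
    return result;
}

const columns = ['group_name', 'group_image', 'item_name', 'item_attr'];
const merged = computeRowspans([
    { group_name: 'Group 1', group_image: '[Image 1]', item_name: 'Item 1-1', item_attr: 'High' },
    { item_name: 'Item 1-2', item_attr: 'Medium' },
    { item_name: 'Item 1-3', item_attr: 'Low' },
    { group_name: 'Group 2', group_image: '[Image 2]', item_name: 'Item 2-1', item_attr: 'High' },
    { item_name: 'Item 2-2', item_attr: 'Medium' },
], columns);

// Each inner array lists the cells that remain in that row, with their rowspan.
console.log(merged.map((cells) => cells.map((c) => `${c.value} (rowspan ${c.rowspan})`)));
```

Each entry of the result corresponds to one `<tr>`: complete rows keep four cells, incomplete rows keep only the "small cells".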

Modify the Previous Pagination Layout Algorithm

If there were no pagination, this requirement would be done by now, but things are not that simple...

I quickly found that the previous approach, which forces the next row onto the next page whenever there is not enough space so that cell content is never split, would inevitably waste a huge amount of space. A "large cell" may contain very little content yet correspond to many "small cells", so its height can easily exceed one page, and in this business that is absolutely unavoidable:

Group 1	Item 1	Group 2	Item 1
	Item 2		Item 2
	Item 3		Item 3
	Item 4		Item 4

	Item 5		Item 5
	Item 6		Item 6
	Item 7		Item 7
	Item 8		Item 8

However, one thing is worth noting: the content of a "large cell" in this business really is small, usually just a product name or an image, and the content itself never exceeds one page. If I can split such a cell into two cells, one with the content and one empty, the wasted space can be reduced to some extent.

The previous layout idea was like this:

Let the current height current = headAreaHeight + thHeight
Enumerate each row from the beginning
1. Get the height of the current row height = tr.offsetHeight
2. If current + height + tailAreaHeight is greater than the height of one page:
  1. Insert a tail area at the end of the current page
  2. Put the header and the current row on a new page, and update current
3. Otherwise, insert the current row at the end of the current page and update current
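
The loop above can be sketched on pre-measured heights. This is only an illustration under stated assumptions: `headAreaHeight`, `thHeight`, `tailAreaHeight`, and `current` come from the steps above, while `paginate`, `pageHeight`, and `headOnEveryPage` are hypothetical names, and the guard that always accepts a row on an empty page is my own addition so the sketch cannot loop forever.

```javascript
// Hypothetical sketch of the original pagination loop.
// `rowHeights[i]` stands in for `tr.offsetHeight` of the i-th row.
function paginate(rowHeights, options) {
    const { pageHeight, headAreaHeight, tailAreaHeight, thHeight, headOnEveryPage } = options;
    const pages = [[]]; // each page is a list of row indices
    let current = headAreaHeight + thHeight;

    rowHeights.forEach((height, index) => {
        const page = pages[pages.length - 1];
        const overflows = current + height + tailAreaHeight > pageHeight;
        if (overflows && page.length > 0) {
            // The tail area closes the current page; the table header and
            // the current row open a new one.
            pages.push([index]);
            current = (headOnEveryPage ? headAreaHeight : 0) + thHeight + height;
        } else {
            // The row fits (or the page is still empty): append it.
            page.push(index);
            current += height;
        }
    });
    return pages;
}

const pages = paginate([30, 30, 30, 30, 30], {
    pageHeight: 100,
    headAreaHeight: 10,
    tailAreaHeight: 10,
    thHeight: 10,
    headOnEveryPage: false,
});
console.log(pages);
```

The function only decides which rows land on which page; inserting the actual head and tail areas into the DOM is left out.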

There are two places that need to be modified.

Redefine "Current Row"

If we still read tr.offsetHeight directly as before, problems can arise when both of the following hold:

The current row contains a "large cell" whose height is between that of one row and two rows of "small cells"
The current page has room for the current row, but not for two rows of "small cells"

In this scenario, should I put one row of "small cells" on the current page, or force two rows onto it? With one row, the "small cell" is stretched vertically to match the height of the "large cell", which does not look good; forcing two rows may push the tail area onto the next page.

In the end I settled on this method: if the current row contains a "large cell", it must be a complete row. I first find the "large cell" with the tallest content; call that height x. Then I scan downwards from the current row, adding up row heights, and stop at the first row where the accumulated height from the current row reaches or exceeds x. Under the earlier agreement, that row is always found before the next complete row. If the accumulated height is greater than x, the "large cell" really does hold little content; if it equals x, the "large cell" holds a relatively large amount of content and may even be stretching the "small cells".

Calculation method of current row height

In the figure above, the "large cell" in the left table has very little content. We first take the height of the red box, then scan from top to bottom to see how many "small cells" it takes to reach that height. The answer is the blue box, so its height is used as the height of the current row. The "large cell" in the right table has a lot of content; with this method the blue box covers all three "small cells", which also matches expectations.

If the "current row height" calculated this way plus the height of the tail area does not fit on the current page, we follow the previous idea: first insert a tail area on the current page, then put the current row and the other rows covered by the blue box on the next page, and set current to the height of the blue box. Otherwise, we insert the current row and the other rows covered by the blue box on the current page, and add the height of the blue box to current.
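
As a rough sketch of how the "blue box" could be measured, assuming the row heights and the tallest "large cell" content height have already been read: `measureBlock`, `rowHeights`, `isComplete`, and `largeContentHeight` are hypothetical names, not identifiers from the real layout code.

```javascript
// Hypothetical helper: starting at a complete row, find the rows that must
// stay together (the "blue box") and their combined height.
// `largeContentHeight` is x, the tallest content among the row's large cells.
function measureBlock(rowHeights, isComplete, start, largeContentHeight) {
    let height = rowHeights[start];
    let end = start; // index of the last row inside the block

    // Absorb the following incomplete rows until the accumulated height
    // reaches x, never crossing into the next complete row.
    while (
        height < largeContentHeight
        && end + 1 < rowHeights.length
        && !isComplete[end + 1]
    ) {
        end++;
        height += rowHeights[end];
    }
    return { end, height };
}

// One complete row followed by two incomplete rows, then the next group.
const rowHeights = [20, 20, 20, 20];
const isComplete = [true, false, false, true];

console.log(measureBlock(rowHeights, isComplete, 0, 30));  // little content
console.log(measureBlock(rowHeights, isComplete, 0, 100)); // a lot of content
```

The returned `height` then takes the place of `tr.offsetHeight` in the overflow check, and rows `start` through `end` are moved as one unit.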

Processing of "Large Cells" After Crossing Pages

Since the table header must be shown on each page, the part of a split "large cell" that falls on the next page must be drawn explicitly as a cell of its own. The green box in the figure below is such a cell:

A cell that must be explicitly drawn

Fortunately, during layout we can record how many incomplete rows have been used since the last complete row. In the example in the figure above, the row containing "Very little content" (内容很少) and "Item 1" (元素 1) is the most recent complete row, and every row after it up to "Item 5" (元素 5) is incomplete (4 in total). From the merge step we know that the "Very little content" cell has rowspan = 8, so the green box must have rowspan = 8 - 4 - 1 = 3. As soon as we find that the row containing "Item 6" is an incomplete row, we insert an empty "large cell" with rowspan = 3 at each missing position, turning it into a complete row.
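
The arithmetic for the continuation cell is small enough to write down directly. This is a sketch with hypothetical names (`continuationRowspan`, `splitLargeCell`); it only computes the numbers and does not touch the DOM.

```javascript
// Hypothetical helper: rowspan of the empty "large cell" drawn on the next page.
// `totalRowspan` is the rowspan computed by the merge step;
// `incompleteRowsUsed` counts incomplete rows placed since the last complete row.
function continuationRowspan(totalRowspan, incompleteRowsUsed) {
    // The complete row itself (1) and the incomplete rows already placed
    // stay on the previous page.
    return totalRowspan - incompleteRowsUsed - 1;
}

// Hypothetical helper: both rowspans after the split.
function splitLargeCell(totalRowspan, incompleteRowsUsed) {
    const next = continuationRowspan(totalRowspan, incompleteRowsUsed);
    return { previousPage: totalRowspan - next, nextPage: next };
}

console.log(continuationRowspan(8, 4)); // 3, as in the example above
console.log(splitLargeCell(8, 4));
```

The original cell on the previous page would presumably also need its rowspan reduced to the `previousPage` value so that the two parts together still cover all eight rows; that step is my assumption and is not spelled out above.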

So, this requirement is finally completed.

Don't Break The Web

Earlier, I mentioned that I had optimised the layout code for performance. Will the new feature undermine those optimisations? The answer is no. Let's review what I optimised before:

We only need to set the outermost element to display: none at the beginning, which can avoid rendering each block; then render all blocks at once after all blocks are loaded.
As long as the width of <col> is set for each table, the height of this row is fixed no matter which table block it is placed in. That is to say, the height of all elements is completely determined after rendering and before layout! So I can save the offsetHeight of all necessary elements in a Map at the beginning of the layout, and then call it directly.

For the first optimisation, toggling display only affects the outermost element itself, not the elements inside it, so I just need to make sure that my cell-merging code and layout code run after everything has been rendered at once. For the second optimisation, splitting a cell across two pages does not change the overall height, and the content height of each "large cell" does not change either, so the row heights and content heights can still be cached in the same Map in advance for the layout code to read.
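
The height cache from the second optimisation might look roughly like this. It is a sketch under assumptions: `cacheHeights` is a hypothetical name, and the plain objects below stand in for real elements so the snippet runs outside a browser.

```javascript
// Hypothetical sketch: read every needed height once, then lay out from the cache.
// `elements` can be any iterable of objects exposing `offsetHeight`,
// for example the result of `tbody.querySelectorAll('tr')` in a browser.
function cacheHeights(elements) {
    const heights = new Map();
    for (const el of elements) {
        // One layout read per element; later lookups never touch the DOM.
        heights.set(el, el.offsetHeight);
    }
    return heights;
}

// Stand-ins for two rendered rows.
const rowA = { offsetHeight: 24 };
const rowB = { offsetHeight: 48 };
const heights = cacheHeights([rowA, rowB]);

console.log(heights.get(rowA) + heights.get(rowB));
```

Because elements are used as Map keys, moving a row into another page block does not invalidate its cached entry.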

R·e^x / Zeng

MUGer, hacker, developer, amateur UI designer, punster, Japanese learner.

When Table Layout Meets Cell Merging