Tencent America's Yixin Hu has told us about BroadLeaf, a real-time solution for rendering large-scale forests, explaining how it handles data compression and texture baking, discussing LOD transitions, and comparing the tech to UE5's Nanite.
Introduction
I'm Yixin Hu, a Senior Graphics Researcher at Pixel Lab, Tencent America. Before joining Tencent America, I was a Ph.D. student at New York University, and during my Ph.D. I did internships at Adobe, nTopology, and Meta.
BroadLeaf is the first project I did at Tencent. This work was done in collaboration with my colleagues Yuewei Shao and Jinzhao Zhan from START cloud gaming, Tencent.
Before that, during my Ph.D., my research focused on geometry processing and mesh generation. The methods I proposed, including TetWild, fTetWild, and TriWild, are state-of-the-art tetrahedral and triangular meshing algorithms published in ACM Transactions on Graphics.
Limitations of Existing Solutions
Realistic real-time rendering of large-scale forests is an important yet difficult problem due to the massive amount of leaf geometry. The problem we want to solve is rendering large numbers of trees in real time while keeping the rendered trees at high visual quality and, preferably, interactable.
There are existing methods for this problem, and most of those that keep a reasonably high visual quality are based on level of detail (LOD), so here we focus only on LOD-based methods. These methods have some limitations:
- LOD generation is not fully automatic and may need manual modification, which requires extra labor.
- LOD transitions on high-complexity tree models are not efficient enough for real-time applications. Even when they are fast enough, the transitions may not be smooth because the simplified LOD meshes are not faithful to the original model.
- They may not be able to render millions of high-quality trees in real time.
Here are some concrete examples that show the limitations of the existing methods.
Figure 1 compares the rendering results of our method and Nanite on a test scene consisting of 500 copies each of four kinds of tree models (2,000 trees in total).
As Figure 1, Image 1 shows, Nanite fails to render tree leaves correctly when the scene is seen from afar: the trees in front have some leaves missing, and the trees in the back have all their leaves missing. In contrast, our method renders the trees with all leaves well preserved, as shown in Figure 1, Image 2. Nanite's missing-leaf problem stems from its limitation in handling thin surfaces like plant leaves.
Figure 1:
Here is another example, which shows the rendering-efficiency issue. The screenshot of our scene in Figure 2 contains about 600 million rendered triangles.
Figure 2:
For such a complex scene, even if we use the Nanite foliage technique to accelerate tree rendering, the rasterization step is still not fast enough for real-time rendering. As shown in Figure 3, rasterization takes about 14 ms in total, roughly half of the total rendering time of 27 ms.
Figure 3:
Thin surfaces such as grass and tree leaves easily cause a large amount of overdraw, which lowers efficiency. Nanite works by reducing or removing the triangles of a mesh, but a small, thin surface like a leaf has almost no triangles to remove, so many overlapping triangles still get rendered.
There are three main reasons why the problem is hard to solve:
- The models of trees usually have complex geometries and textures.
- It requires large GPU memory for both geometry and texture, especially for high-quality tree models.
- Streaming the tree-model data can also be complex.
Introduction to BroadLeaf
The limitations above motivate the need for an improved solution. Having examined the problem and the existing solutions, we aim to shift the paradigm: our goal is to render, in real time, realistic large-scale forests made of interactable tree models with individual leaves, which is extremely useful and in high demand in game development.
We also want to develop an automatic level-of-detail strategy for tree leaves that transitions smoothly and efficiently while preserving the leaves' shape. LOD improves the rendering efficiency of leaves, and foliage is usually the most complex part of a plant model. Note that our method is mainly designed for applications running on powerful GPUs, e.g., cloud gaming.
The system we developed is called BroadLeaf. We focus on trees with explicit geometry represented by triangle meshes, the most universal tree model representation, given that tree modeling tools like SpeedTree and tree mesh datasets are easily accessible nowadays.
Our goal is to improve the rendering efficiency of foliage. Our input is a textured triangle mesh that represents the leaves of a plant model, and our method involves the following steps:
- We design a new leaf reference structure that compresses both the geometry and the texture of the input data.
- We propose a new way to automatically generate LODs and organize the levels of data in a hierarchical structure. The LOD data is precomputed and cached to reduce the rendering load.
- We accelerate LOD transitions with a GPU-driven rendering pipeline, reducing the rendering load while maintaining a smooth and natural visual effect.
- The input texture information is baked onto the LOD meshes at each level and preserved during LOD transitions.
- Our hierarchical LOD data structure lets us efficiently perform further operations, such as culling, to improve rendering efficiency.
Note that we handle foliage separately from the other parts of the plant. Usually, leaves and the other parts (like branches) are disconnected components; if not, they can be distinguished by analyzing the texture or color information. We also only consider trees with a static growth status, which means our method does not support animating the growth or withering of tree leaves.
Handling Data Compression and Texture Baking
In most tree models, each leaf is a separate connected component. Moreover, the leaves of the whole tree can usually be obtained by deforming a few kinds of leaves with 3D affine transformations, possibly with some curling effects added. We call these leaves base leaves.
We design a new, highly compressed data format for tree leaves. We construct a leaf reference structure in which a large number of leaves are indexed to a few base leaves. This avoids storing duplicated leaf geometry and texture, which significantly lowers memory usage, especially for input with super fine details.
The file format for storing the compressed leaves is as follows. For each base leaf:
- We first have a triangle surface mesh file that represents its geometry (vertices, faces, and UV coordinates).
- Then we use a transformation file to store the transformation information of the leaves indexed to the base leaf. The transformation information of each indexed leaf consists of 12 floating-point numbers, including 3 for translation, 1 for uniform scaling, 1 for the rotation angle, and 3 for the rotation axis.
- We also need the texture maps of the leaves.
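To illustrate how the leaf reference structure avoids duplicating geometry, here is a minimal Python sketch. The names `BaseLeaf`, `LeafInstance`, and `instance_vertices` are hypothetical; the sketch models only the translation, uniform scale, and axis-angle rotation parameters listed above (any other stored parameters are not modeled), and the scale-rotate-translate order is an assumption.

```python
import math
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class LeafInstance:
    """Hypothetical per-leaf record: one entry in a base leaf's transformation file."""
    translation: Vec3
    scale: float   # uniform scaling factor
    angle: float   # rotation angle in radians
    axis: Vec3     # rotation axis (normalized before use)


@dataclass
class BaseLeaf:
    """Geometry shared by every leaf indexed to this base leaf."""
    vertices: list[Vec3]
    faces: list[tuple[int, int, int]]
    uvs: list[tuple[float, float]]
    instances: list[LeafInstance] = field(default_factory=list)


def rotate(p: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate p about the (normalized) axis using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    k_dot_p = sum(ki * pi for ki, pi in zip(k, p))
    k_cross_p = (k[1] * p[2] - k[2] * p[1],
                 k[2] * p[0] - k[0] * p[2],
                 k[0] * p[1] - k[1] * p[0])
    return tuple(pi * c + xi * s + ki * k_dot_p * (1 - c)
                 for pi, xi, ki in zip(p, k_cross_p, k))


def instance_vertices(base: BaseLeaf, inst: LeafInstance) -> list[Vec3]:
    """Rebuild one leaf: scale, rotate, then translate (order is an assumption)."""
    out = []
    for v in base.vertices:
        scaled = tuple(inst.scale * x for x in v)
        rotated = rotate(scaled, inst.axis, inst.angle)
        out.append(tuple(r + t for r, t in zip(rotated, inst.translation)))
    return out
```

Only the handful of numbers in each `LeafInstance` is stored per leaf; the base-leaf mesh and its texture are shared by every instance.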
In some cases, we can extract the leaf reference information from tree modeling software like SpeedTree. However, this leaf-reference information is usually inaccessible or missing, so we need to construct or retrieve it according to the texture of the input mesh.
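One simple way to recover leaf reference information from the texture, sketched here under the assumption that all leaves derived from the same base leaf reuse the same UV coordinates (the same region of the leaf texture), is to group connected components by their UV sets. The function name and the rounding tolerance are illustrative choices, not part of BroadLeaf.

```python
from collections import defaultdict


def group_by_uv_signature(components, decimals=4):
    """Group leaf components that use the same set of UV coordinates.

    components: list of leaves, each a list of per-vertex (u, v) pairs.
    Sorting makes the signature independent of vertex order; rounding to
    `decimals` digits absorbs small floating-point noise.
    Returns a list of groups of component indices, in first-seen order.
    """
    groups = defaultdict(list)
    for idx, uvs in enumerate(components):
        key = tuple(sorted((round(u, decimals), round(v, decimals)) for u, v in uvs))
        groups[key].append(idx)
    return list(groups.values())
```

Each resulting group is a candidate base leaf with its instances; the per-instance transformation would still have to be estimated, for example by fitting a similarity transform between corresponding vertices.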
Since each leaf is an independent component, we first show how to simplify a base leaf to a quadrilateral, called a quad-leaf, which approximates the shape of the leaf and preserves its texture well when the input texture is baked onto it, as shown in Figure 4.
Figure 4:
So, given the input mesh of tree leaves, we simplify the base leaves to quad-leaves and then apply the transformations (from the base leaf data) to those quad-leaves to obtain quad-leaves for all the input leaves. This procedure outputs the Level 0 mesh of the LOD.
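The sketch below fits a quad-leaf as the smallest rectangle, aligned with a given frame, around a roughly planar leaf. The frame comes from the leaf normal and a tangent hint (for example, the 3D direction in which the texture coordinate u increases). This bounding-rectangle fit is a stand-in chosen for illustration, since the text does not specify how BroadLeaf fits its quads, and the helper names are hypothetical. The quad is returned as a center plus tangent and binormal half-extent vectors.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def fit_quad_leaf(vertices, normal, tangent_hint):
    """Fit the smallest frame-aligned rectangle around a roughly planar leaf.

    The frame is (t, b, n): n is the leaf normal, t is tangent_hint with its
    normal component removed, and b = n x t. Returns (center, tangent,
    binormal), where tangent and binormal are half-extent vectors, so the
    corners are center +/- tangent +/- binormal.
    """
    n = _normalize(normal)
    t = _normalize(_sub(tangent_hint, tuple(_dot(tangent_hint, n) * x for x in n)))
    b = _cross(n, t)
    origin = vertices[0]
    offsets = [_sub(v, origin) for v in vertices]
    ts = [_dot(d, t) for d in offsets]
    bs = [_dot(d, b) for d in offsets]
    # Mean offset along the normal places the quad in the middle of the leaf.
    mean_n = sum(_dot(d, n) for d in offsets) / len(offsets)
    mid_t, half_t = (min(ts) + max(ts)) / 2, (max(ts) - min(ts)) / 2
    mid_b, half_b = (min(bs) + max(bs)) / 2, (max(bs) - min(bs)) / 2
    center = tuple(o + mid_t * ti + mid_b * bi + mean_n * ni
                   for o, ti, bi, ni in zip(origin, t, b, n))
    return center, tuple(half_t * x for x in t), tuple(half_b * x for x in b)
```

Fitting once per base leaf and then applying each leaf's stored transformation to the resulting quad (transforming the center as a point and the tangent and binormal as vectors) yields the Level 0 quad-leaves without refitting every leaf.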
To generate the next, coarser levels, we merge quad-leaves, generating in the same way a new quadrilateral that approximates the triangles of the merged leaves. The quad-leaves are merged based on their positions and orientations.
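A greedy clustering pass gives one possible reading of merging "based on their positions and orientations". In this hypothetical sketch, two quad-leaves are grouped when their centers are within `max_dist` of each other and the dot product of their unit normals is at least `min_cos`; `max_children` caps the fan-out of each LOD tree node. The thresholds and the greedy order are assumptions.

```python
import math


def merge_groups(centers, normals, max_dist, min_cos, max_children=4):
    """Greedily group quad-leaves that are close and similarly oriented.

    centers: per-quad 3D centers; normals: per-quad unit normals.
    A quad joins the current group if its center lies within max_dist of the
    group's seed and the dot product of their normals is at least min_cos.
    Each returned group (list of quad indices) becomes one coarser quad-leaf.
    """
    assigned = [False] * len(centers)
    groups = []
    for i in range(len(centers)):
        if assigned[i]:
            continue
        assigned[i] = True
        group = [i]
        for j in range(i + 1, len(centers)):
            if len(group) == max_children:
                break
            if assigned[j]:
                continue
            close = math.dist(centers[i], centers[j]) <= max_dist
            aligned = sum(a * b for a, b in zip(normals[i], normals[j])) >= min_cos
            if close and aligned:
                assigned[j] = True
                group.append(j)
        groups.append(group)
    return groups
```

Each group would then be refit as one coarser quad-leaf, and repeating the pass on the new quads produces the next level.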
The merging step lets us correlate the quad-leaves of different simplification levels. We therefore design a hierarchical structure, which we name the LOD forest, to store and organize the quad-leaves of all levels by their indices.
The LOD forest consists of LOD trees (as shown in Figure 5) of the same depth. The root nodes of the LOD forest point to the quad-leaves of the most simplified LOD level. Each node points to the quad-leaf obtained by merging the quad-leaves that its child nodes point to. The LOD forest enables us to switch efficiently between different levels of leaves on a tree.
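To make the LOD forest concrete, the following sketch builds it bottom-up from the merge groups of each level. `groups_per_level` is a hypothetical input: `groups_per_level[k]` lists, for each quad-leaf at level k + 1, the level-k quad-leaves merged into it. As long as every quad-leaf belongs to exactly one group (singleton groups allowed), all trees in the forest have the same depth.

```python
from dataclasses import dataclass, field


@dataclass
class LodNode:
    level: int                # 0 = finest quad-leaves (Level 0 mesh)
    quad: int                 # index of the quad-leaf in that level's mesh
    children: list["LodNode"] = field(default_factory=list)


def build_lod_forest(groups_per_level):
    """Build the LOD forest bottom-up and return its root nodes.

    groups_per_level[k][p] lists the level-k quad-leaves merged into quad-leaf p
    of level k + 1. Every quad-leaf must appear in exactly one group (singleton
    groups are allowed), so all LOD trees end up with the same depth.
    """
    nodes = {q: LodNode(0, q) for group in groups_per_level[0] for q in group}
    for k, groups in enumerate(groups_per_level):
        nodes = {p: LodNode(k + 1, p, [nodes[c] for c in group])
                 for p, group in enumerate(groups)}
    return list(nodes.values())


def depth(node):
    """Number of levels from this node down to the finest quad-leaves."""
    return 1 + max((depth(c) for c in node.children), default=0)
```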
Figure 5:
Next, we generate new UV maps for the LOD meshes of quad-leaves. For a mesh with N quad-leaves (2N triangles), we evenly divide the UV domain into M grid cells, with M ≥ N.
We then assign the first N UV grid cells to the N quad-leaves, and each cell is subdivided into 2 UV triangles. This is the simplest way to generate the UV map; we could also generate a more sophisticated UV map so that, for example, the mapping between 2D and 3D triangles is more conformal.
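A direct implementation of this grid layout might look like the following, assuming a square k × k grid with k = ⌈√N⌉, so that M = k² ≥ N, filled row by row. The function name and the diagonal used to split each cell are illustrative choices.

```python
import math


def grid_uvs(num_quads):
    """Give each quad-leaf its own cell of a k x k grid over the UV square.

    k = ceil(sqrt(N)), so M = k * k >= N cells; cells are filled row by row
    and each cell is split along its diagonal into two UV triangles.
    Returns, for each quad-leaf, a pair of triangles of (u, v) corners.
    """
    k = max(1, math.ceil(math.sqrt(num_quads)))
    size = 1.0 / k
    result = []
    for i in range(num_quads):
        u0, v0 = (i % k) * size, (i // k) * size
        c00, c10 = (u0, v0), (u0 + size, v0)
        c01, c11 = (u0, v0 + size), (u0 + size, v0 + size)
        result.append(((c00, c10, c11), (c00, c11, c01)))
    return result
```

For example, `grid_uvs(3)` uses a 2 × 2 grid, so the third quad-leaf gets the cell starting at (0.0, 0.5) and one cell stays unused.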
To generate the texture maps for the quad-leaves, we bake the input texture onto them. We do the projection in 3D and then map it into the 2D UV domain. For a quad-leaf that a set of full-resolution input leaves refers to, we project the triangles of those leaves onto the quad-leaf. The projected triangles must lie inside the quad-leaf region, and we obtain the barycentric coordinates of the projected vertices on the quad-leaf.
From these barycentric coordinates, we get the UV coordinates of the projected vertices. The texture of each input triangle is then copied to the texture map of the quad-leaves, and the depth of the pixels is adjusted according to the position map.
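The projection step can be sketched for a single vertex as follows. The sketch assumes the quad-leaf is stored as a center with orthogonal tangent and binormal half-extent vectors and uses the rectangle's normalized (s, r) coordinates, which define the same affine map as barycentric coordinates on its two triangles, to place the point in the quad's UV cell. Returning the signed distance to the quad plane as a depth value is an assumption about what the depth adjustment could use; the text does not detail the position map.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_quad_uv(p, center, tangent, binormal, cell_origin, cell_size):
    """Project a 3D point onto a quad-leaf and map it into the quad's UV cell.

    tangent and binormal are orthogonal half-extent vectors, so the quad spans
    center +/- tangent +/- binormal. (s, r) in [0, 1]^2 means the projection
    falls inside the quad. Returns ((u, v), depth), where depth is the signed
    distance from p to the quad plane along tangent x binormal.
    """
    corner = tuple(c - t - b for c, t, b in zip(center, tangent, binormal))
    d = tuple(x - c for x, c in zip(p, corner))
    s = _dot(d, tangent) / (2 * _dot(tangent, tangent))
    r = _dot(d, binormal) / (2 * _dot(binormal, binormal))
    n = (tangent[1] * binormal[2] - tangent[2] * binormal[1],
         tangent[2] * binormal[0] - tangent[0] * binormal[2],
         tangent[0] * binormal[1] - tangent[1] * binormal[0])
    offset = tuple(x - c for x, c in zip(p, center))
    depth = _dot(offset, n) / math.sqrt(_dot(n, n))
    u0, v0 = cell_origin
    return (u0 + s * cell_size, v0 + r * cell_size), depth
```

Applying this to the three vertices of each input triangle gives a UV triangle inside the quad's cell, into which the input texture can then be copied.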
Figure 6:
Figure 6 shows an example of our generated albedo map for the quad-leaves of an LOD level 1 mesh. As the closeup of the texture map shows, each quad-leaf can have multiple input leaf textures projected onto it.
LOD Transition
Once we have the LOD levels of the leaves, we need to transition between them as the position of the leaves relative to the camera changes. Smooth and fast transitions between LOD levels are especially important in real-time interactive applications like video games.
However, existing methods either fail to preserve the shape of the tree models during the transition, resulting in sudden visual changes, or are very slow because more levels must be stored and involved to make the transition smooth. In contrast, our method achieves LOD transitions that are both visually seamless and computationally fast.
Here, we represent each quad-leaf by its 3D center, 3D tangent vector, 3D binormal vector, the 2D UV coordinates of its center, its UV tangent length, and its UV binormal length. The quad geometry is then reconstructed on the fly, which greatly reduces the bandwidth of geometry data access during rendering.
Some of the attributes are stored in half precision, because their value ranges are known to be small enough to be represented accurately with half-precision floats.
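The following sketch packs one quad-leaf into 32 bytes with Python's `struct` module (format character `e` is IEEE 754 half precision) and rebuilds its four corners on the fly. Which attributes BroadLeaf stores in half precision is not specified, so this layout (center as float32, all other attributes as float16) is a hypothetical choice, as is treating the tangent, binormal, and UV lengths as half-extents.

```python
import struct

# Hypothetical layout: center (3 x float32), tangent (3 x float16),
# binormal (3 x float16), UV center (2 x float16), UV half-lengths (2 x float16).
QUAD_FORMAT = "<3f3e3e2e2e"


def pack_quad(center, tangent, binormal, uv_center, uv_tangent_len, uv_binormal_len):
    return struct.pack(QUAD_FORMAT, *center, *tangent, *binormal,
                       *uv_center, uv_tangent_len, uv_binormal_len)


def quad_corners(blob):
    """Unpack a quad-leaf and rebuild its 4 corners as (position, uv) pairs."""
    v = struct.unpack(QUAD_FORMAT, blob)
    c, t, b = v[0:3], v[3:6], v[6:9]
    uc, ut, ub = v[9:11], v[11], v[12]
    corners = []
    for st, sb in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        pos = tuple(ci + st * ti + sb * bi for ci, ti, bi in zip(c, t, b))
        uv = (uc[0] + st * ut, uc[1] + sb * ub)
        corners.append((pos, uv))
    return corners
```

Under these assumptions, one quad-leaf takes 32 bytes, compared with 80 bytes for four vertices with float32 positions and UVs, before counting index data.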
To preserve the shape of the tree model during the LOD transition, we select LOD levels for quad-leaves independently, starting from the highest LOD level (the most simplified model). Using the LOD forest, we first compute the screen size of the quad-leaves of the root nodes, defined as the sum of the on-screen lengths of the tangent and binormal vectors, to decide whether to traverse to their child nodes, that is, whether to use finer elements.
As shown in the figure below, if the screen size of a quad-leaf is smaller than a preset threshold, we stop traversing to lower LOD levels. If the quad-leaf of a leaf node has a screen size larger than the threshold, we represent this leaf with the full-resolution input, which is a deformed base leaf.
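A minimal version of the screen-size test is sketched below, assuming camera-space coordinates, a pinhole camera looking down +z, and a focal length and resolution picked for illustration. It measures the projected lengths of the tangent and binormal vectors from the quad center and sums them, matching the definition above; the function names are hypothetical.

```python
import math


def project(p, focal_px, width, height):
    """Pinhole projection of a camera-space point; the camera looks down +z."""
    x, y, z = p
    return (width / 2 + focal_px * x / z, height / 2 - focal_px * y / z)


def screen_size(center, tangent, binormal, focal_px=1000.0, width=1920, height=1080):
    """Sum of the on-screen lengths (in pixels) of the tangent and binormal vectors."""
    c = project(center, focal_px, width, height)
    total = 0.0
    for axis in (tangent, binormal):
        tip = tuple(a + b for a, b in zip(center, axis))
        total += math.dist(c, project(tip, focal_px, width, height))
    return total


def refine(center, tangent, binormal, threshold_px):
    """True if the quad-leaf is too large on screen and its children should be checked."""
    return screen_size(center, tangent, binormal) > threshold_px
```

A unit-sized quad 10 units in front of this camera covers 200 pixels and would be refined with a 150-pixel threshold, while the same quad 100 units away covers 20 pixels and would be drawn as is.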
Although the computation seems light, since each quad-leaf consists of only two triangles and pruning the LOD trees reduces the amount of work, the workload can still be heavy in large-scale scenes, so we use mesh shaders to speed up the computation.
We compute the screen size of the quad-leaves level by level, starting from the root nodes of the LOD forest. Each level is processed by one mesh shader execution, in which one thread computes the screen size of one quad-leaf.
As shown in the figure below, after the mesh shader runs on the root level of the forest, the quad-leaves whose screen size is smaller than the threshold (the nodes marked in green) are passed directly to the rasterizer for rendering, and we stop traversing to their children. For the other nodes (marked in yellow), we need to check their child nodes, so we store the indices of these child nodes in a GPU buffer for the next mesh shader execution to check their screen size.
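The level-by-level traversal can be emulated on the CPU as below. Each iteration of the outer loop stands in for one mesh shader execution, the inner loop stands in for its threads, and `worklist` plays the role of the GPU buffer of child indices. The node ids, the dictionaries, and the precomputed screen sizes are simplifications for illustration; on the GPU, each thread would compute its quad-leaf's screen size itself.

```python
def select_lod(roots, children, screen_size, threshold):
    """CPU emulation of the level-by-level LOD traversal.

    roots: node ids of the coarsest level; children[n]: child ids of n
    (empty for finest-level nodes); screen_size[n]: size of n in pixels.
    Returns (quads_to_draw, full_res_leaves).
    """
    draw, full_res = [], []
    worklist = list(roots)  # stands in for the GPU buffer of node indices
    while worklist:         # one iteration = one mesh shader execution
        next_level = []
        for n in worklist:  # one "thread" per quad-leaf
            if screen_size[n] <= threshold:
                draw.append(n)                  # small enough: rasterize this quad-leaf
            elif children[n]:
                next_level.extend(children[n])  # too big: check children next pass
            else:
                full_res.append(n)              # finest quad still too big: use input leaf
        worklist = next_level
    return draw, full_res
```

Because each pass only touches the nodes that survived the previous one, pruned subtrees cost nothing, which is what keeps the traversal cheap on large forests.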
Here we use an example of a single tree to show the result of our LOD transition:
As we can see from the video, the transition is very smooth during the camera movement. Even when the viewpoint is far from the tree, the leaves are well-preserved.
Images 1 to 4 on the right show how the LOD levels of the leaves change during the transition as the viewpoint moves from near to far.
The Results
We prototyped and tested our algorithm on a machine running Windows 10 with 24 CPU cores, 128 GB of memory, and an NVIDIA GeForce RTX 3090 GPU. We render the scene using DirectX 12 as the graphics application programming interface (API).
Here we compare the efficiency of our method with Nanite foliage in rendering tree leaves. We use UE5 to render the scene for both Nanite and our method.
For the frames of our scene shown in Figure 7, Images 1-3, Nanite takes about 44 ms, 37 ms, and 15 ms, respectively, to render the foliage, while our method consistently takes 3 to 4 ms.
Figure 7:
For the scene in Figure 8, our method renders at 102 fps, using 630 MB of GPU memory and 2.5 ms per frame for the foliage, while Nanite renders at 76 fps, using 1146 MB and 5.7 ms per frame for the foliage. This shows that our method is more efficient while using far fewer computational resources.
Note that because we use UE5 to render the scene for both Nanite and our method, both include extra rendering costs for the whole scene, like global illumination and shadowing.
Figure 8:
We note that Nanite has an option to preserve the area of the leaves, which scales each leaf up when zooming out. This preserves small leaves to some degree, but not well.
Here is a comparison on a scene that contains 9 different kinds of trees placed in a row. We placed 5 such rows in the scene and moved the camera from near to far. The two videos show the results of Nanite with area preservation (top) and BroadLeaf (bottom).
As we can see in the videos, our method can preserve the shape of trees better, especially for the trees that originally have sparse foliage.
Conclusion
There are a few aspects of our method that could be improved or are worth investigating more thoroughly:
- The most straightforward one is improving LOD mesh generation, especially the low-poly meshes at higher LOD levels. We want to use fewer elements while keeping LOD transitions smooth so that rendering can be faster.
- BroadLeaf is based on rasterization, and it handles plants with leaf-reference structures well. However, it may not work efficiently on plants without a leaf-reference structure, such as grass, which is usually represented by implicit surfaces or impostors, or banana trees, whose leaves are large and rugged.
My collaborators Yuewei Shao and Jinzhao Zhan are currently working on a ray-tracing-based follow-up to BroadLeaf at Tencent. Solving the foliage rendering problem with ray tracing is an interesting direction that could give better visual quality with similar efficiency. It has a few advantages:
- The ray tracing-based method does not need to generate LODs.
- We don’t need to worry about the resolution or the edge softness of the shadow maps.
- Unlike with rasterization-based methods, the cost of ray tracing won't double when the complexity of the scene doubles.
We hope to see it in the near future.
As for myself, I joined another team at Tencent America after the BroadLeaf project. I’m still working on geometry processing research and have begun to explore geometry-based machine learning.