TL;DR: yes, you still have to care about draw calls even with Nanite (although that video you shared might be a bit overkill, the patterns it demonstrates point you in the right direction)
Why Reduce “Draw Calls”?
Before I get too deep into talking about “draw calls”, I want to clarify that in Nanite Land our goal is to reduce raster and shading bins. Nanite’s VisBuffer pass rasterizes all geometry that shares a raster bin (fixed function, or programmable for Masked materials, WPO, and PDO) and groups all pixels that share a shader (descriptor heap + PSO combo) into a single shading bin. What that means for content creators is that we’re no longer concerned with unique geometry, only unique shaders.
Rasterization
On the VS side of things, Nanite’s fixed function rasterization for non-deforming, opaque geometry is pretty darn fast. It can rasterize all of that completely static geometry in a single ExecuteIndirect call. To go a little deeper, there are separate fixed function paths for Hardware Rasterization (triangles larger than r.Nanite.MaxPixelsPerEdge) and Software Rasterization (triangles at or below r.Nanite.MaxPixelsPerEdge). Within the HW and SW paths, there are separate fixed function bins for base, Spline, TwoSided, and Skinned geometry (as well as some combinations of those). The SW path additionally has bins for tessellated geometry.
Shading
On the PS side of things, and for the purposes of content creators, “shader” effectively means “Material Instance”. This could be constant (i.e. a Material Instance Constant asset, or MIC) or dynamic (a Material Instance Dynamic, or MID).
Again, the goal for tech artists is to reduce the number of unique bins of both kinds. This matters especially on PC, because there the engine doesn’t yet have the capability to skip empty shading bins.
Empty shading bins? Well, unfortunately the engine has to allocate those bins before anything is rasterized. In order to do that, the engine queues up a shading bin for each unique shader in the world. If a bin ends up not rasterizing any pixels, the engine can’t un-allocate that bin. Empty bins still require context switching, so that’s bad! On Gen9 platforms, though, the engine does have a few ways to skip over shaders which don’t end up with any on-screen pixels. Graham Wihlidal talks about that in his GDC2024 presentation Nanite GPU-Driven Materials (Video | Slides).
Worst Practices
So keeping all that in mind, I think it’s important to highlight the absolute worst thing you can do for Nanite rendering. It looks something like this:
In order to improve artist workflows and reduce content browser clutter from Material Instance Constant assets, we expose certain parameters of a material to the content creators at the actor level. In the actor’s construction script, we create a Material Instance Dynamic for each material on every mesh component on the actor so that we can update the exposed parameters for the artists.
OH! And since we’re an isometric game, each of our base materials is masked. That way we can fade out the geometry when it’s between the character and the camera! And when you go into a building we animate the roofs popping off so you can see inside with World Position Offset.
Now, instead of at least being able to combine all those unique materials into a single shading bin, this studio has broken each single primitive out into its own raster and shading bin.
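For concreteness, that construction-script pattern looks roughly like this in C++. The actor class and the parameter name are hypothetical; the component and MID methods are the standard engine ones:

```cpp
// Hypothetical actor reproducing the anti-pattern: one MID per material slot
// on every mesh component, so no two components can ever share a shading bin.
void AIsometricPropActor::OnConstruction(const FTransform& Transform)
{
    Super::OnConstruction(Transform);

    TArray<UMeshComponent*> MeshComponents;
    GetComponents<UMeshComponent>(MeshComponents);

    for (UMeshComponent* Mesh : MeshComponents)
    {
        for (int32 SlotIndex = 0; SlotIndex < Mesh->GetNumMaterials(); ++SlotIndex)
        {
            // Each call mints a brand-new UMaterialInstanceDynamic,
            // i.e. a brand-new shading bin.
            UMaterialInstanceDynamic* MID = Mesh->CreateDynamicMaterialInstance(SlotIndex);
            if (MID)
            {
                MID->SetVectorParameterValue(TEXT("TintColor"), TintColor); // hypothetical parameter
            }
        }
    }
}
```

Multiply that inner loop by every actor in the level and you can see how the bin count explodes.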
A Tangent on Uber Materials
It’s vitally important to remember that a material graph is not one “shader”. It is an input into a system that first translates that graph to HLSL based on the permutations required. You do not save anything by putting your landscape material and your foliage material into the same graph. Even if you don’t hide functionality behind a bunch of static switch parameters, you’re still going to be checking multiple usage flags and setting Material Property Overrides on the MICs for both use cases, and all of those are going to create separate and unique shaders.
(This, of course, doesn’t even get into the permutation explosion that can happen with the existing Material Layers system)
How to Reduce Raster and Shading Bins
That video you shared is wild! It’s decidedly on the farthest end of the spectrum that runs from “Draw Calls Don’t Matter” to “You Should Only Have One Draw Call”, but from what I can tell the techniques they’re describing are more or less in line with what I was going to suggest anyway.
That said, I think there’s a balance between the effort to set up that level of infrastructure yourself and some reasonable construction techniques. From the texture side of things, I feel like the UDIM approach can be a bit cumbersome and restrictive. I know folks are excited about bindless resources as a potential avenue to be more expressive and less restrictive about content and packing textures into UDIMs. For our current purposes, I’m going to bless separate MICs when you need to change out textures.
For vector and scalar parameters, though, we’ve got three tools in our tool belt for reducing the number of unique MICs in our project:
- Custom Primitive Data - In your material, you can mark a parameter as Use Custom Primitive Data and give it an index into an array of up to 32 floats that are passed in from the component. There’s an inspector panel in the Material editor so you can see which indices are used throughout the material in order to avoid collisions. As of 5.0, the details panel of the component will show you the different parameters on the mesh and even offer up color pickers for vector parameters!
- Per-Instance Custom Data - When you’re using instanced mesh components, you can specify an array of up to 32 floats of per-instance custom data in addition to Custom Primitive Data (i.e. parameters passed to all instances). This is a separate material expression, though. (Apparently our documentation on this is pretty thin, so I’ve linked the example I gave at GDC2022.)
- Material Parameter Collections - You can have up to two unique MPCs referenced in a material. The benefit is that you set the values on the MPC itself, rather than in the material. All materials that reference that MPC will get the updated values immediately and automatically! A couple of common use cases here would be things like rain or wind, but there’s a lot of really interesting uses of MPCs once you put your mind to it.
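To make those three concrete, here’s a minimal UE C++ sketch. The actor, property names, parameter names, and index choices are all hypothetical; the component and library calls are the standard engine ones:

```cpp
// Hypothetical setup: a material parameter bound to Custom Primitive Data
// index 0, an ISM whose material reads Per-Instance Custom Data slot 0,
// and a "Wind" Material Parameter Collection.
void AExamplePropActor::ApplyLook()
{
    // Custom Primitive Data: one float written on the component, read by any
    // material parameter flagged "Use Custom Primitive Data" at index 0.
    Mesh->SetCustomPrimitiveDataFloat(/*DataIndex=*/0, /*Value=*/0.75f);

    // Per-Instance Custom Data: one float written for a single instance of an
    // Instanced Static Mesh component, read by the PerInstanceCustomData node.
    InstancedMesh->SetCustomDataValue(/*InstanceIndex=*/3, /*CustomDataIndex=*/0,
                                      /*CustomData=*/0.25f, /*bMarkRenderStateDirty=*/true);

    // Material Parameter Collection: set once, and every material referencing
    // the collection picks up the new value automatically.
    UKismetMaterialLibrary::SetScalarParameterValue(this, WindCollection,
                                                    TEXT("WindStrength"), 0.5f);
}
```

Note that none of these calls create a new MIC or MID, which is the whole point: the shading bin stays shared no matter how many actors or instances you vary.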
With Nanite, you don’t have the ability to vertex paint per-instance; storing XX million float3s per instance is downright prohibitive. Instead, as of UE5.5, you can use the new mesh painting mode to paint into what amounts to a UDIM. You can set up your materials with a base layer and a blend layer, then paint the blend layer on a per-instance basis without fear of bloating your memory or of creating separate MICs and bins for different explicit mask textures.
Conclusion
I can sit here all day and say “use CPD and PICD to reduce your MICs”, but I do acknowledge that for larger scale productions there might still be some infrastructure required to make managing and updating that data easier for your artists. There’s nothing built in for “Custom Primitive Data Set Instances” or anything like that. I’ll leave it as an exercise to the reader to determine what’s best for their project.
I also don’t want to sit here and say that everyone should go out and immediately start retooling their material pipeline to try and reduce draw calls. You should still be profiling in order to identify the specific issues your project is facing and focusing your optimization efforts there instead of tilting at windmills because Don Quixote said it was a good idea.
At the end of the day, though, I think it’s important to understand how to build content optimally (as opposed to how to optimize content). If you’re at the stage of setting up material pipelines and defining production practices for a title, now is a great time to consider how you can set yourself up to keep your bin counts low without inflicting a lot of pain and suffering on your artists.
Hope that helps, lemme know if you have any questions!