# Objective
Tried reading the custom_phase_item docs and it took me a solid 10s to
figure out what it was saying. I think the monospace-ness + no
whitespace + em dash just tripped me up.
## Solution
Just add whitespace around it.
## Testing
Eyeballs perceived the subpixel rendered font and gave more dopamine
with a lack of subpixels between characters.
# Objective
Resolves #21902.
## Solution
This PR adopts a relatively transparent approach to reduce the GPU
vertex buffer size. On the CPU side, a mesh can still use uncompressed
Float32 data, and users are not required to insert compressed vertex
formats. The vertex data is automatically converted into
lower-precision or octahedral-encoded data when it is uploaded to the
GPU.
To enable vertex attribute compression, just set the
`attribute_compression` field of Mesh, or set
`mesh_attribute_compression` of GltfLoaderSettings. If enabled, normals
and tangents are octahedral-encoded as Unorm16x2, while uv0, uv1, joint
weights and colors use the corresponding Unorm16 or Float16 formats. I
also provide Unorm8x4 for vertex colors if HDR isn't needed.
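
For readers unfamiliar with octahedral encoding, here is a minimal standalone sketch of the general technique: a unit normal is projected onto an octahedron, folded into a square, and stored as two Unorm16 values. This is not the code from this PR; `octahedral_encode`, `octahedral_decode` and `sign_not_zero` are hypothetical names used only for illustration.

```rust
/// Sign function that never returns zero, as needed for the octahedral fold.
fn sign_not_zero(v: f32) -> f32 {
    if v >= 0.0 { 1.0 } else { -1.0 }
}

/// Encode a unit-length normal as two Unorm16 values.
fn octahedral_encode(n: [f32; 3]) -> [u16; 2] {
    // Project onto the octahedron |x| + |y| + |z| = 1.
    let l1 = n[0].abs() + n[1].abs() + n[2].abs();
    let (mut x, mut y) = (n[0] / l1, n[1] / l1);
    // Fold the lower hemisphere over the diagonals of the square.
    if n[2] < 0.0 {
        let (ox, oy) = (x, y);
        x = (1.0 - oy.abs()) * sign_not_zero(ox);
        y = (1.0 - ox.abs()) * sign_not_zero(oy);
    }
    // Map [-1, 1] to [0, 65535].
    let to_unorm16 = |v: f32| ((v * 0.5 + 0.5).clamp(0.0, 1.0) * 65535.0).round() as u16;
    [to_unorm16(x), to_unorm16(y)]
}

/// Decode two Unorm16 values back into an (approximately) unit-length normal.
fn octahedral_decode(e: [u16; 2]) -> [f32; 3] {
    let from_unorm16 = |v: u16| (v as f32 / 65535.0) * 2.0 - 1.0;
    let (x, y) = (from_unorm16(e[0]), from_unorm16(e[1]));
    let z = 1.0 - x.abs() - y.abs();
    // Undo the fold for the lower hemisphere.
    let (x, y) = if z < 0.0 {
        (
            (1.0 - y.abs()) * sign_not_zero(x),
            (1.0 - x.abs()) * sign_not_zero(y),
        )
    } else {
        (x, y)
    };
    let len = (x * x + y * y + z * z).sqrt();
    [x / len, y / len, z / len]
}
```

The encoded normal takes 4 bytes instead of the 12 bytes of a Float32x3, at the cost of a small quantization error after decoding.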
Update 2026-2-16
Removed the previous approach that automatically compresses the vertex
buffer according to flags when uploading to the GPU. Instead, I added a
`compressed_mesh` method to Mesh to construct a compressed Mesh ahead of
time. GltfLoader can also opt in to mesh compression when loading. I
also added an option to convert indices to u16, though I believe the
Blender glTF exporter already uses u16 indices when possible.
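
As a rough illustration of the index conversion, the sketch below narrows a u32 index buffer to u16 only when every index fits. `indices_to_u16` is a hypothetical helper, not the API added in this PR.

```rust
/// Narrow a u32 index buffer to u16.
/// Returns `None` if any index does not fit in 16 bits, in which case the
/// caller should keep the original u32 indices.
fn indices_to_u16(indices: &[u32]) -> Option<Vec<u16>> {
    indices.iter().map(|&i| u16::try_from(i).ok()).collect()
}
```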
## Testing
Run `many_cubes`, `many_foxes`, `many_morph_targets` with
`--vertex-compression` to test 3d.
Run `bevymark` with `sprite_mesh` to test 2d, because `SpriteMesh` uses
compressed quad mesh now.
---------
Co-authored-by: Greeble <166992735+greeble-dev@users.noreply.github.com>
# Objective
In #22966 the semantics of the `add` method on `SortedRenderPhase` was
changed from the items being cleared at the end of the frame, to items
being retained until they are removed. A new `add_transient` method was
added with the old functionality, but this is a big migration hazard
because it's a major semantic change that doesn't give any compiler
errors or warnings.
I discovered this after updating `bevy_vector_shapes` to the RC and
seeing drawings not being cleared properly.
## Solution
Rename `add` to `add_retained` both for clarity and to make old uses not
compile, so the affected users will know to look in the migration guide.
A sentence about this should be added to the migration guide if/when
this is backported to the 0.19 release branch.
## Testing
It compiles (hopefully).
## Objective
Progress #19024 (removing UUID handles). Even if UUID handles end up
sticking around, there's an argument for deprecating
`AssetId::invalid()` in the name of simplicity and robustness.
## Solution
Remove the last remaining case of `AssetId::invalid()` in the
`custom_phase_item` example. The example only used it as a placeholder
value, so `AssetId::default()` is fine.
```diff
Opaque3dBinKey {
- asset_id: AssetId::<Mesh>::invalid().untyped(),
+ asset_id: AssetId::<Mesh>::default().untyped(),
}
```
## Testing
```sh
cargo run --example custom_phase_item
```
# Objective
Fixes #23627.
`MeshPipelineViewLayoutKey` uses too many bindings even if features like
ssr, environment map are unused.
## Solution
Don't pre-allocate every combination, since the number of combinations
grows exponentially with the number of view keys. Instead, create the
mesh view bind group layout on demand so that we can add more view keys
to reduce unused bindings.
`MeshPipelineViewLayouts::get_view_layout` will be slower, but I'm not
sure how slow it is. My feeling is that the overhead is not high
compared to cloning the layout as we did before.
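
The on-demand idea can be sketched as a small cache keyed by the view key. This is only an illustration of the pattern, not the actual implementation: `ViewLayoutKey`, `ViewLayout` and `OnDemandLayouts` are hypothetical stand-ins, and the "layout" here is just a binding count derived from the set bits of the key.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a bitflag view layout key.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ViewLayoutKey(u32);

/// Hypothetical stand-in for a bind group layout.
#[derive(Debug)]
struct ViewLayout {
    binding_count: u32,
}

#[derive(Default)]
struct OnDemandLayouts {
    cache: HashMap<ViewLayoutKey, Arc<ViewLayout>>,
    /// Number of layouts actually built, for illustration only.
    created: usize,
}

impl OnDemandLayouts {
    /// Return the cached layout for `key`, building it on first use.
    fn get_view_layout(&mut self, key: ViewLayoutKey) -> Arc<ViewLayout> {
        if let Some(layout) = self.cache.get(&key) {
            return Arc::clone(layout);
        }
        self.created += 1;
        // Assumption for the sketch: one base binding plus one per enabled feature bit.
        let layout = Arc::new(ViewLayout {
            binding_count: 1 + key.0.count_ones(),
        });
        self.cache.insert(key, Arc::clone(&layout));
        layout
    }
}
```

With N feature bits, pre-allocation needs 2^N layouts up front, while the cache only ever holds the combinations that views actually request.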
## Testing
```
WGPU_SETTINGS_PRIO=webgl2 cargo r --example 3d_scene
cargo r --example ssr --features bluenoise_texture
cargo r --example ssao
cargo r --example irradiance_volumes
```
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Kevin Chen <chen.kevin.f@gmail.com>
# Objective
Fixes #23781
When Hdr is on, the order of `tonemapping` and
`fullscreen_material_system` is undetermined, so the result of the main
pass can be cleared. `run_in`, `run_before` and `run_after` can only
specify a system set, not a specific system, and they hardcode
`Core3dSystems`.
## Solution
Replace `run_in`, `run_before` and `run_after` with `fn
schedule_configs(system: ScheduleConfigs<BoxedSystem>) ->
ScheduleConfigs<BoxedSystem>`.
## Testing
Run the `fullscreen_material` example with Hdr.
PR #22041 made us use the center of the mesh AABB as the pivot point for
transparent sorting. This is fine in and of itself, but the way it was
implemented involved adding a matrix multiplication to
`collect_meshes_for_gpu_building` for all meshes, not just transparent
meshes. It also added several fields to `RenderMeshInstance` and related
instances for each mesh, even opaque ones. Since most meshes are opaque,
and `collect_meshes_for_gpu_building` is a performance-critical system,
this doesn't strike me as the right tradeoff.
This PR moves the calculation of the mesh center to
`queue_material_meshes`, to take place only after a mesh has been deemed
to be transparent. Not only does this make
`collect_meshes_for_gpu_building` faster, but it also allows us to
remove the various `center` fields, which stored redundant information.
Note that this comes with two tradeoffs:
1. The transparent sorting no longer takes a custom `Aabb` component on
the mesh into account. I doubt anybody was relying on this behavior.
2. We do have to calculate the AABB for the mesh when importing it to
the render world for the first time.
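
To make the deferred calculation concrete, here is a minimal sketch of computing a sort pivot from the mesh AABB only for transparent meshes. It is not the code in this PR: `aabb_center` and `sort_pivot` are hypothetical helpers, and the transform is simplified to a translation so the example stays self-contained.

```rust
/// Center of the axis-aligned bounding box of a set of vertex positions.
/// Returns `None` for an empty mesh.
fn aabb_center(positions: &[[f32; 3]]) -> Option<[f32; 3]> {
    let first = *positions.first()?;
    let (mut min, mut max) = (first, first);
    for p in &positions[1..] {
        for axis in 0..3 {
            min[axis] = min[axis].min(p[axis]);
            max[axis] = max[axis].max(p[axis]);
        }
    }
    Some([
        (min[0] + max[0]) * 0.5,
        (min[1] + max[1]) * 0.5,
        (min[2] + max[2]) * 0.5,
    ])
}

/// Compute the world-space sort pivot, but only for transparent meshes.
/// Opaque meshes skip the work entirely. The transform is assumed to be a
/// pure translation here, which is a simplification for the sketch.
fn sort_pivot(
    is_transparent: bool,
    positions: &[[f32; 3]],
    translation: [f32; 3],
) -> Option<[f32; 3]> {
    if !is_transparent {
        return None;
    }
    let c = aabb_center(positions)?;
    Some([
        c[0] + translation[0],
        c[1] + translation[1],
        c[2] + translation[2],
    ])
}
```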
On `bevy_city --size 90 --no-cpu-culling`, this reduces the time spent
in `collect_meshes_for_gpu_building` after the loading screen from mean
84.85 ms, median 73.4 ms to mean 70.62 ms, median 72.5 ms.
Before this PR:
<img width="2756" height="1800" alt="Screenshot 2026-04-07 173043"
src="https://github.com/user-attachments/assets/3157b58c-b4f1-43db-8157-390e5c9c6ff0"
/>
After this PR:
<img width="2756" height="1800" alt="Screenshot 2026-04-07 172952"
src="https://github.com/user-attachments/assets/fda7100b-7695-4226-99e6-71b4c168f980"
/>
# Objective
According to #18900, rename `MeshPipelineSet` to `MeshPipelineSystems`
for a consistent naming convention.
`MeshPipelineSet` was introduced in this cycle, so no migration guide
required.
## Testing
```
cargo run --example specialized_mesh_pipeline
```
# Objective
- Clean up our texture format handling.
- Fix #23732
- Get us closer to #22563
- It makes no sense for views to talk about being hdr or not. They have
a texture format, that's it. What does HDR shadows even mean lol
- Same for compositing_space
## Solution
- Remove ExtractedView::hdr
- Add ExtractedView::texture_format
- Move ExtractedView::compositing_space to
ExtractedCamera::compositing_space
- Add texture_format to a bunch of specialization keys instead of hdr
bool
- Convert VolumetricFogPipelineKey to not use flags and just use bool
and texture format
- Remove BevyDefault TextureFormat
- Remove ViewTarget TEXTURE_FORMAT_HDR
## Testing
- Pretty extensively tested at this point
This has a migration guide.
---------
Co-authored-by: Willow Black <wmcblack@gmail.com>
Co-authored-by: Máté Homolya <mate.homolya@gmail.com>
Co-authored-by: Luo Zhihao <luo_zhihao@outlook.com>
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
# Objective
Fixes #22863. Fixes #22999. Fixes #23142.
## Solution
1. Re-add `Out` type to `ExtractComponent`. Rename `SyncComponent::Out`
to `SyncComponent::Target`, which is used to clean up components when
removing, so that the output of extraction can be different from
`SyncComponent::Target`.
2. Add `#[extract_component_sync_target]` attribute for
`#[derive(ExtractComponent)]` to specify `SyncComponent::Target`.
## Testing
Tested `ssr` and `order_independent_transparency` examples.
There may be other places where derived components need to be cleaned
up, but finding them is somewhat challenging.
wgpu update for v29.
I have tested on macos m1, m5, and windows. Linux testing is
appreciated.
- [x] before merge, naga_oil and dlss_wgpu need to be published, and the
patches referencing their respective PRs removed from the workspace
Cargo.toml
##### other PRs
- naga_oil: https://github.com/bevyengine/naga_oil/pull/134
- dlss_wgpu: https://github.com/bevyengine/dlss_wgpu/pull/27
##### Source of relevant changes
- `Dx12Compiler::DynamicDxc` no longer has `max_shader_model`
- https://github.com/gfx-rs/wgpu/pull/8607
- `Dx12BackendOptions::force_shader_model` comes from:
- https://github.com/gfx-rs/wgpu/pull/8984
- Allow optional `RawDisplayHandle` in `InstanceDescriptor`
- https://github.com/gfx-rs/wgpu/pull/8012
- Add `GlDebugFns` option to disable OpenGL debug functions
- https://github.com/gfx-rs/wgpu/pull/8931
- Add a DX12 backend option to force a certain shader model
- https://github.com/gfx-rs/wgpu/pull/8984
- Migrate validation from maxInterStageShaderComponents to
maxInterStageShaderVariables
- https://github.com/gfx-rs/wgpu/pull/8652
- gaps are now supported in bind group layouts
- https://github.com/gfx-rs/wgpu/pull/9034
- depth validation changed to option to match spec
- https://github.com/gfx-rs/wgpu/pull/8840
- SHADER_PRIMITIVE_INDEX is now PRIMITIVE_INDEX
- https://github.com/gfx-rs/wgpu/pull/9101
- Support for binding arrays of RT acceleration structures
- https://github.com/gfx-rs/wgpu/pull/8923
- Make HasDisplayHandle optional in WindowHandle
- https://github.com/gfx-rs/wgpu/pull/8782
- `QueueWriteBufferView` can no longer be dereferenced to `&mut [u8]`,
so use `WriteOnly`.
- https://github.com/gfx-rs/wgpu/pull/9042
- ~bevy_mesh currently has an added dependency on `wgpu`, can we move
`WriteOnly` to wgpu-types?~ (it is in wgpu-types now)
- Change max_*_buffer_binding_size type to match WebGPU spec (u32 ->
u64)
- https://github.com/gfx-rs/wgpu/pull/9146
- raw vulkan init `open_with_callback` takes Limits as argument now
- https://github.com/gfx-rs/wgpu/pull/8756
## Known Issues
There is currently one known issue with occlusion culling on macos. We
decided to disable it there by checking the limits we actually require,
so that if wgpu releases a patch fix, bevy 0.19 users will get occlusion
culling re-enabled automatically.
<details><summary>More details</summary>
On macos, the wgpu limits were changed to align with the spec and now
put the early and late GPU occlusion culling `StorageBuffer` limit at 8,
but we currently use 9. [Filed in wgpu
repo](https://github.com/gfx-rs/wgpu/issues/9287)
```
2026-03-19T01:37:10.771117Z ERROR bevy_render::error_handler: Caught rendering error: Validation Error
Caused by:
In Device::create_bind_group_layout, label = 'build mesh uniforms GPU late occlusion culling bind group layout'
Too many bindings of type StorageBuffers in Stage ShaderStages(COMPUTE), limit is 8, count was 9. Check the limit `max_storage_buffers_per_shader_stage` passed to `Adapter::request_device`
```
</details>
solari working on wgpu 29:
<img width="1282" height="752" alt="image"
src="https://github.com/user-attachments/assets/4744faec-65c0-4a72-93e1-34a721fc26d8"
/>
---------
Co-authored-by: atlv <email@atlasdostal.com>
# Objective
Improve the fullscreen material example by making the aberration
intensity oscillate, as well as making it work on WebGL2
## Solution
- Make the chromatic aberration intensity oscillate using the sin of the
elapsed time
- Add padding to the `FullscreenEffect` struct on WebGL2
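
A minimal sketch of both changes, under stated assumptions: `FullscreenEffectUniform` and `oscillating_intensity` are hypothetical names rather than the ones in the example, and the padding field is shown unconditionally instead of being gated on a WebGL2 feature. The padding exists because WebGL2 expects uniform buffer structs to be a multiple of 16 bytes in size.

```rust
/// Hypothetical uniform struct: one f32 of payload padded out to 16 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct FullscreenEffectUniform {
    intensity: f32,
    // Padding so the struct size is a multiple of 16 bytes.
    _webgl2_padding: [f32; 3],
}

/// Size of the uniform in bytes, computed at compile time.
const UNIFORM_SIZE_BYTES: usize = std::mem::size_of::<FullscreenEffectUniform>();

/// Map elapsed time to an intensity that oscillates between 0 and `max_intensity`.
fn oscillating_intensity(elapsed_secs: f32, max_intensity: f32) -> f32 {
    max_intensity * (0.5 + 0.5 * elapsed_secs.sin())
}
```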
## Testing
I tested the example locally on firefox (nightly) with WebGL2
## Showcase
A short gif of the example running on WebGL2

# Objective
Depends on #22187. Fixes #17794. ~For platform consistency I think it’s
reasonable to enable primitive restart by default.~ wgpu will force
primitive restart after https://github.com/gfx-rs/wgpu/pull/8850.
## Solution
Add index format to MeshPipelineKey, replace
`MeshPipelineKey::from_primitive_topology` with
`MeshPipelineKey::from_primitive_topology_and_index`, and enable
`strip_index_format` in render pipeline.
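
For context, primitive restart lets a single strip index buffer describe several disconnected strips: with u16 indices the value `0xFFFF` ends the current strip and starts a new one. The sketch below emulates that splitting on the CPU purely for illustration; `split_strips` and `line_segment_count` are hypothetical helpers, not part of this PR.

```rust
/// Primitive restart value for 16-bit index buffers.
const RESTART_U16: u16 = u16::MAX;

/// Split a strip index buffer into the individual strips the GPU would draw.
fn split_strips(indices: &[u16]) -> Vec<Vec<u16>> {
    indices
        .split(|&i| i == RESTART_U16)
        .filter(|strip| !strip.is_empty())
        .map(|strip| strip.to_vec())
        .collect()
}

/// Number of line segments drawn for a line strip index buffer:
/// a strip of n indices produces n - 1 segments.
fn line_segment_count(indices: &[u16]) -> usize {
    split_strips(indices)
        .iter()
        .map(|strip| strip.len().saturating_sub(1))
        .sum()
}
```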
## Testing
I modified the `lines` example to demonstrate primitive restart.
## Showcase
<details>
<summary>Click to view showcase</summary>
<img width="1550" height="852" alt="屏幕截图_20251218_210849"
src="https://github.com/user-attachments/assets/a7c41943-f22b-415a-8132-98455f21735d"
/>
</details>
Currently, Bevy handles meshes tagged with `NoCpuCulling` by simply
always adding them to `VisibleEntities`. This is, however, inefficient,
because `VisibleEntities` has to be repopulated with the entity IDs of
such meshes every frame and copied to the render world. When scaling to
millions of entities, this becomes a significant bottleneck.
This PR changes the visibility systems to ignore meshes with
`NoCpuCulling` entirely. Instead of being added to `VisibleEntities`,
the mesh extraction systems instead use standard ECS queries to iterate
over meshes with `NoCpuCulling` directly, in addition to any entities in
`VisibleEntities` that use CPU culling. For efficiency,
`RenderVisibleMeshEntities` now tracks mesh instances that are subject
to CPU culling and those that opted out of CPU culling in two separate
data structures. Note that this required changing the signatures of
`DirtySpecializations` methods to return a tuple of references instead
of a reference to a tuple. Although that change looks complicated, it's
actually just reshuffling to accommodate this slight type change, not a
change in logic.
On `bevy_city`, `check_visibility` takes a median of 1.19 ms, and
`check_dir_light_mesh_visibility` takes a median of 4.33 ms. With this
patch, these systems entirely disappear if `NoCpuCulling` is added to
every mesh.
<img width="2756" height="1800" alt="Screenshot 2026-02-22 020929"
src="https://github.com/user-attachments/assets/18048399-bcfd-4165-8491-8d126d73534e"
/>
# Objective
- Fix startup crash on 3d_shapes example and anything that uses
wireframes
- Make sure the `extra_buffer_usages` that were set aren't lost on recovery
## Solution
- Move extra_buffer_usages to MeshAllocatorSettings
## Testing
- 3d_example doesn't crash
# Objective
- Resolve most of the warnings in the Build Docs step of Deploy Docs,
you can see them here:
https://github.com/bevyengine/bevy/actions/runs/23021953246/job/66860356132
## Solution
- Resolve most of the warnings
- The `doc_cfg` feature should only be enabled with `docsrs`, not
`docsrs_dep` (I just followed this pattern from other crates tbh like
`bevy_math` and `bevy_material`)
- I unlinked example docs that reference non-public items within the
example itself
- I corrected some links
Note: I didn’t fix the warnings concerning the macros in bevy-reflect
for `tuple.rs` because I’m not macro savvy. If someone knows what to do
in those cases (should I just remove the `$(#[$meta])*` lines cause
they’re not in use?), just let me know and I can do it (or you can open
a pull!)
---------
Co-authored-by: François Mockers <francois.mockers@vleue.com>
Even though sorted render phases use multi-draw indirect and GPU
preprocessing, they only draw one mesh at a time. This patch fixes that
by introducing the notion of *batch set* comparisons, which allow
batches to be grouped into sets that can be multi-drawn together. This
mirrors the way that binned render phases use two-level bins, with outer
batch sets containing inner batches. Sorting sorts first on batch set
key, then on batch key. Then, as we generate batches, we track batch
sets in addition to batches and generate multi-draw commands as we go.
This PR accordingly modifies the definition of `GetBatchData` to support
separate comparison keys for batch sets and batch data and is therefore
technically a breaking change.
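
The grouping can be pictured with a small standalone sketch. It follows the description above (sort on batch set key, then batch key, then track both levels while walking the sorted list) but is not the code in this PR: `Item` and `build_batch_sets` are hypothetical, and keys are plain integers.

```rust
/// Hypothetical phase item with a two-level key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Item {
    batch_set_key: u32,
    batch_key: u32,
    entity: u32,
}

/// Group items into batch sets (outer) containing batches (inner).
/// Each inner `Vec<u32>` is one batch of entities; each outer entry is a
/// batch set that could be issued as a single multi-draw.
fn build_batch_sets(mut items: Vec<Item>) -> Vec<Vec<Vec<u32>>> {
    // Stable sort: first on batch set key, then on batch key.
    items.sort_by_key(|i| (i.batch_set_key, i.batch_key));

    let mut sets: Vec<Vec<Vec<u32>>> = Vec::new();
    let mut prev: Option<(u32, u32)> = None;
    for item in items {
        match prev {
            // Same batch set as the previous item.
            Some((set_key, batch_key)) if set_key == item.batch_set_key => {
                let set = sets.last_mut().expect("a set exists once prev is Some");
                if batch_key == item.batch_key {
                    // Same batch: extend it.
                    set.last_mut()
                        .expect("a set always has at least one batch")
                        .push(item.entity);
                } else {
                    // New batch inside the same batch set.
                    set.push(vec![item.entity]);
                }
            }
            // First item, or a new batch set.
            _ => sets.push(vec![vec![item.entity]]),
        }
        prev = Some((item.batch_set_key, item.batch_key));
    }
    sets
}
```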
Right now, Bevy can't batch meshes with morph targets together, because
the morph targets are packed into a morph texture, which is
non-bindless. To fix this, this PR adds support for batching morph
targets together on platforms with storage buffers. Morph displacements
are allocated using the mesh allocator, just like vertex and index
buffers are.
This PR also improves the API for supplying morph targets to a mesh.
Today, the application must create a `MorphTargetImage` explicitly to
store the morph targets, which is cumbersome. This patch changes the
`Mesh` API to instead take morph targets as a flat vector. Internally,
if the platform doesn't support storage buffers, the morph targets are
converted to a morph target image; if the platform does support storage
buffers, however, the morph targets are packed in the mesh allocator.
This patch is a prerequisite for skin caching, because skin caching also
applies to morph targets, and skin caching wants to skin many meshes at
a time. Using a morph target image would either require batch breaking
logic or bindless, neither of which is desirable for a feature that should be
simple and work on WebGPU, so I opted to make morph targets batchable
instead.
On the `many_morph_targets` example, I went from 5.55 ms/frame to 2.80
ms/frame, a 1.98x speedup.
---------
Co-authored-by: Greeble <166992735+greeble-dev@users.noreply.github.com>
# Objective
- #22443 broke wireframe and some examples
## Solution
- Fix them by having the systems run after `MeshPipelineSet`
- Also add a migration guide
## Testing
- run the examples modified or anything using wireframe
Extracting meshes to the render world is done in two phases: first, Bevy
does *extraction*, which pulls information from the main world ECS to
thread-local buffers in the render world; second, Bevy does
*collection*, which processes those buffers in parallel to update the
GPU buffers and other information. Unfortunately, the
`RenderMeshInstances` buffer that contains information that the CPU and
GPU need to know to draw the meshes properly currently isn't
thread-safe. Therefore, the parallel worker threads send completed data
through a channel to a single consumer thread, which updates the
`RenderMeshInstances` tables. This is a sequential bottleneck on large
scenes (especially since `RenderMeshInstance` is bigger than it ought to
be).
This PR tackles the problem directly by making `RenderMeshInstances`
partially thread-safe and allowing the worker threads to update mesh
instance data directly, via shared memory. Note the use of "partially":
the `RenderMeshInstances` buffer can still only grow to accommodate
*new* meshes on a single thread with this patch. However, *existing*
meshes can, with one exception (changing render layers), be updated
directly.
The thread safety is accomplished via a new trait, `AtomicPod`, and a
new buffer type, `AtomicBufferVec`. `AtomicPod` is a trait that
describes a type that can be bitcast onto an array of `AtomicU32`s,
known as the *blob* type. With only a single exception, this is
implemented entirely in safe code, via `bytemuck`. This patch introduces
a new helper macro, `impl_atomic_pod!`, that automates the
implementation of `AtomicPod` types and provides accessors and mutators
so that the blob type feels like the original POD type as much as
possible. The single use of unsafe code is to upload the blob of
`AtomicU32`s to the GPU, as no safe method can currently cast an
`AtomicU32` array to a `u8` array to pass to `write_buffer`. The actual
*conversion* between the POD type and the blob type is entirely safe
code that's automatically generated.
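
To illustrate the idea without pulling in the real types, here is a hand-written sketch of a POD struct and its blob of `AtomicU32`s, updated from several worker threads through a shared reference. It is an assumption-laden stand-in: `InstancePod`, `InstanceBlob` and `update_in_parallel` are hypothetical, the conversion uses `f32::to_bits` instead of `bytemuck`, and nothing here is generated by `impl_atomic_pod!`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical POD type: two 32-bit words.
#[derive(Clone, Copy, Debug, PartialEq)]
struct InstancePod {
    depth: f32,
    flags: u32,
}

/// The "blob" form of `InstancePod`: one `AtomicU32` per 32-bit word.
struct InstanceBlob([AtomicU32; 2]);

impl InstanceBlob {
    fn new(pod: InstancePod) -> Self {
        Self([
            AtomicU32::new(pod.depth.to_bits()),
            AtomicU32::new(pod.flags),
        ])
    }

    /// Store through `&self`. Each word is written atomically, but the
    /// struct as a whole is not updated as one atomic unit.
    fn store(&self, pod: InstancePod) {
        self.0[0].store(pod.depth.to_bits(), Ordering::Relaxed);
        self.0[1].store(pod.flags, Ordering::Relaxed);
    }

    fn load(&self) -> InstancePod {
        InstancePod {
            depth: f32::from_bits(self.0[0].load(Ordering::Relaxed)),
            flags: self.0[1].load(Ordering::Relaxed),
        }
    }
}

/// Update every existing element from worker threads, with no channel and
/// no single consumer thread. The buffer itself is not resized here.
fn update_in_parallel(blobs: &[InstanceBlob], workers: usize) {
    let workers = workers.max(1);
    let chunk = ((blobs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for (w, slice) in blobs.chunks(chunk).enumerate() {
            s.spawn(move || {
                for (i, blob) in slice.iter().enumerate() {
                    let index = (w * chunk + i) as u32;
                    blob.store(InstancePod {
                        depth: index as f32,
                        flags: index,
                    });
                }
            });
        }
    });
}
```

Because the buffer is only borrowed immutably, growing it for new meshes still has to happen outside the parallel section, which matches the "partially thread-safe" caveat above.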
Note that on x86-64 and AArch64, a relaxed load and store to an
`AtomicU32` location produces the exact same machine code as a regular
load and store to a memory location.
The downside of this PR, besides the minor inconvenience of accessing
`RenderMeshInstances` through helper methods, is that *new* mesh
instances can't be added to the `RenderMeshInstances` table while the
workers are running anymore, only afterward. This could regress the
performance in cases in which many new objects are queued. I believe
this is an easily worthwhile tradeoff. However, we could improve the
situation via heuristics (e.g. detecting the number of meshes that
became visible all at once) in the future if we wanted to.
On `bevy_city`, this PR increases the performance of
`collect_meshes_for_gpu_building` from 7.74 ms to 4.03 ms, a 1.92x
speedup. Most importantly, the sequential part of
`collect_meshes_for_gpu_building` is entirely eliminated.
Performance of a `bevy_city` frame with #22966 applied:
<img width="2756" height="1800" alt="Screenshot 2026-02-16 202420"
src="https://github.com/user-attachments/assets/a61b19c8-df98-4d8d-8cfc-4647ccf9990f"
/>
Notice the sequential mesh collection bottleneck.
Before this PR and after:
<img width="2756" height="1800" alt="Screenshot 2026-02-16 195459"
src="https://github.com/user-attachments/assets/91595254-3506-4e6e-9fb8-af1827b3c970"
/>
---------
Co-authored-by: charlotte <charlotte.c.mcelwain@gmail.com>
Right now, every frame, all specialization and queuing systems iterate
over all entities visible from a view and check to see whether they need
to be updated by consulting a set of change ticks and comparing them to
the current change ticks. To handle cases in which a mesh needs to be
removed from the bins, a separate final *sweep* pass then finds entities
that no longer exist and removes them manually from the bins. This
process is complex, error-prone, and slow, as it involves visiting all
visible entities multiple times every frame.
This PR changes the setup so that, instead of examining change ticks,
the visibility logic pushes the set of added and removed entities to
each view explicitly. The visibility system determines which meshes need
to be added and removed by first sorting the list of visible entities,
then performing an O(n) diff process on the last frame's visible
entities and this frame's visible entity list. The end result is that
the specialization and queuing systems only process the entities that
they need to every frame. If a mesh was visible last frame, remained
visible this frame, and didn't change its mesh or material, then it's
generally not examined at all. Not only is this significantly faster for
virtually all realistic scenes, but it's also much simpler.
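
The diff step can be sketched as a single merge-style walk over two sorted lists. This is a minimal illustration assuming entities are represented as plain `u32` IDs with no duplicates; `diff_sorted` is a hypothetical name, not the function in this PR.

```rust
use std::cmp::Ordering;

/// Given last frame's and this frame's visible entities, both sorted and
/// free of duplicates, return `(added, removed)` in O(n) time.
fn diff_sorted(prev: &[u32], curr: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let (mut added, mut removed) = (Vec::new(), Vec::new());
    let (mut i, mut j) = (0, 0);
    while i < prev.len() && j < curr.len() {
        match prev[i].cmp(&curr[j]) {
            // Only in last frame's list: the entity became invisible.
            Ordering::Less => {
                removed.push(prev[i]);
                i += 1;
            }
            // Only in this frame's list: the entity became visible.
            Ordering::Greater => {
                added.push(curr[j]);
                j += 1;
            }
            // In both lists: nothing to do for this entity.
            Ordering::Equal => {
                i += 1;
                j += 1;
            }
        }
    }
    // Whatever is left over on either side is entirely removed or added.
    removed.extend_from_slice(&prev[i..]);
    added.extend_from_slice(&curr[j..]);
    (added, removed)
}
```

Entities that appear in both lists are skipped without any further work, which is what lets unchanged meshes avoid specialization and queuing.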
In order to achieve the benefits of not examining every visible mesh
every frame, I made sorted render passes retained via an `IndexMap`.
This allows entities to be removed and added via random access while
still allowing the list to be sorted by distance. Note that I had to
remove the radix sort because `IndexMap` doesn't currently support that;
I believe the enormous speed benefits of this patch outweigh any minor
sorting regressions from this.
I tested this PR by running `scene_viewer` on a test scene with many
meshes and materials and implementing a material shuffler that randomly
switches the materials around. I tested the following cases:
* Moving the camera so that meshes become visible and invisible.
* Switching opaque materials on meshes.
* Moving meshes from opaque to alpha masked and vice versa.
* Moving meshes from binned render passes to sorted render passes (i.e.
transparent).
* All of the above while the meshes were off screen, then moving them on
screen to ensure that the changes took effect.
This PR brings the `specialize_shadows` time on the `bevy_city` demo
from 12.87 ms per frame to 0.1261 ms per frame, a 102x speedup. It
brings the `queue_shadows` time on the same demo from 12.34 ms per frame
to 0.1102 ms, a 111x speedup. Mean frame time goes from 50.16 ms to
23.26 ms, a 2.16x speedup.
`specialize_shadows` in `bevy_city` before and after:
<img width="2756" height="1800" alt="Screenshot 2026-02-14 180313"
src="https://github.com/user-attachments/assets/dbc3c68b-e0ec-424f-8085-87c0f5f41d3f"
/>
`queue_shadows` in `bevy_city` before and after:
<img width="2756" height="1800" alt="Screenshot 2026-02-14 180500"
src="https://github.com/user-attachments/assets/08f8e1bb-6ab4-47da-ae68-a80156d59caa"
/>
Frame graph of `bevy_city` before:
<img width="2756" height="1800" alt="Screenshot 2026-02-12 203324"
src="https://github.com/user-attachments/assets/d0807cee-23a2-4e14-be1a-7466b795ebfa"
/>
Frame graph of `bevy_city` after:
<img width="2756" height="1800" alt="Screenshot 2026-02-14 180506"
src="https://github.com/user-attachments/assets/b22acf0f-a6f9-432b-93d7-f8057c815b05"
/>
# Objective
Fixes #18722, and allows `ExtractComponent` to be used for foreign
types.
## Solution
* Split the `Out` type from `ExtractComponent` to a `SyncComponent`
trait. This allows types to use the synchronization logic without the
extraction logic, and allows `SyncComponentPlugin` to correctly identify
which components should be removed.
* Don't delete the entire entity but only the `Out` components in
`SyncComponentPlugin`/`SyncWorldPlugin`, fixing #18722.
* Add marker types to `ExtractComponent` and `SyncComponent`, allowing
them to be implemented for foreign types outside `bevy_render`.
(Example: `DirectionalLight` is defined in `bevy_light` which doesn't
depend on `bevy_render`, and used by `bevy_pbr`. Without the marker no
crate is allowed to implement the trait.)
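
The marker trick can be sketched in a single file, with modules standing in for crates. This is only an illustration of the pattern: the trait is reduced to one associated type plus a hypothetical `describe` function, and `PbrMarker`, `ExtractedDirectionalLight` and `describe_sync` are made-up names. In a real multi-crate setup, the local marker type in the trait's parameter list is what satisfies the orphan rule.

```rust
/// Stands in for the crate that defines the trait.
mod render {
    pub trait SyncComponent<Marker = ()> {
        type Out;
        /// Hypothetical method, only here so the sketch has observable output.
        fn describe() -> &'static str;
    }
}

/// Stands in for the crate that defines the component (no render dependency).
mod light {
    pub struct DirectionalLight;
}

/// Stands in for a third crate that owns neither the trait nor the type.
mod pbr {
    use super::light::DirectionalLight;
    use super::render::SyncComponent;

    /// Local marker type: it makes the impl below belong to this "crate".
    pub struct PbrMarker;
    pub struct ExtractedDirectionalLight;

    impl SyncComponent<PbrMarker> for DirectionalLight {
        type Out = ExtractedDirectionalLight;
        fn describe() -> &'static str {
            "DirectionalLight synced via PbrMarker"
        }
    }
}

/// Generic code names the marker alongside the component type.
fn describe_sync<M, C: render::SyncComponent<M>>() -> &'static str {
    C::describe()
}
```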
During some earlier render crate refactors by @atlv24, some uses of
`ExtractComponent` were converted to manual implementations. I have not
ported these back; that can be done in follow-up PRs.
As a follow up it might be interesting to make a derive macro for
`SyncComponent`, and/or update the `ExtractComponent` macro to be able
to customize the behavior around syncing.
## Testing
Ran a bunch of the examples. It would be good to test others, especially
ones that toggle components.
~~A test case is in #22758. If that one gets merged first this PR should
be updated to uncomment the relevant assert.~~ edit: the assert has been
added.
## Objective
- Creating a `MeshPipelineKey` is tricky and can lead to easily
avoidable errors: #21784
- We already cache the `MeshPipelineViewLayoutKey` part of
`MeshPipelineKey` for each view in `ViewKeyCache` with the correct
layout for any draw call that uses `SetMeshViewBindGroup`
- Instruct users to use `ViewKeyCache` to properly setup the pipeline
for any view features.
## Solution
- Reuse `ViewKeyCache` where possible in the engine and examples.
- Attempt at adding documentation for `ViewKeyCache`
---------
Co-authored-by: Levy A. <Levy A>
# render-graph-as-systems
> [!NOTE]
> Remember to check hide whitespace in diff view options when reviewing
this PR
This PR removes the `RenderGraph` in favor of using systems.
## Motivation
The `RenderGraph` API was originally created when the ECS was
significantly more immature. It was also created with the intention of
supporting an input/output based slot system for managing resources that
has never been used. While resource management is an important potential
use of a render graph, current rendering code doesn't make use of any
patterns relating to it.
Since the ECS has improved, the functionality of `Schedule` has
basically become co-extensive with what the `RenderGraph` API is doing,
i.e. ordering bits of system-like logic relative to one another and
executing them in a big chunk. Additionally, while there's still desire
for more advanced techniques like resource management in the graph, it's
desirable to implement those in ECS terms rather than creating more
`RenderGraph` specific abstraction.
In short, this sets us up to iterate on a more ECS based approach, while
deleting ~3k lines of mostly unused code.
## Implementation
At a high level: We use `Schedule` as our "sub-graph." Rather than
running the graph, we run a schedule. Systems can be ordered relative to
one another.
The render system uses a `RenderGraph` schedule to define the "root" of
the graph. `core_pipeline` adds a `camera_driver` system that runs the
per-camera schedules. This top level schedule provides an extension
point for apps that may want to do custom rendering, or non-camera
rendering.
### `CurrentView` / `ViewQuery`
When running schedules per-camera in the `camera_driver` system, we
insert a `CurrentView` resource that's used to mark the currently
iterating view. We also add a new param `ViewQuery` that internally uses
this resource to execute the query and skip the system if it doesn't
match as a convenience.
### `RenderContext`
The `RenderContext` is now a system param that wraps a `Deferred` for
tracking the state of the current command encoder and queued buffers.
### `SystemBuffer`
We use a system buffer impl to track command encoders in the render
context and rely on apply deferred in order to encode them all.
Currently, this encodes them in series. There are likely opportunities
here to make this more efficient.
## Benchmarks
### Bistro
<img width="1635" height="825" alt="Screenshot 2026-01-15 at 7 57 40 PM"
src="https://github.com/user-attachments/assets/8e55a959-89a3-4947-bfc5-c04780f82e7b"
/>
### Caldera
<img width="1631" height="828" alt="Screenshot 2026-01-15 at 8 13 06 PM"
src="https://github.com/user-attachments/assets/e7e8ae0d-41c3-430f-8b4d-9099b3d922a0"
/>
## Future steps
There are a number of exciting potential changes that could follow here:
- We can explore adding something like a read-only schedule to pick up
some more potential parallelism in graph execution.
- We can use more things like run conditions in order to prevent systems
from running at all in the first place.
- We can explore things like automating resource creation via system
params.
## TODO:
- [x] Make sure 100% of everything still works.
- [x] Benchmark to make sure we don't regress performance
- [x] Re-add docs
---------
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
# Objective
Adopts and closes #22665
## Solution
Delete bevy's `Affine3`, create an extension trait so that the methods
written for bevy's old `Affine3` can be used with glam's `Affine3`, and
register glam's `Affine3` for reflection.
## Testing
`cargo run -p ci`
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Upgrade to wgpu 28
> [!important]
> This can't merge until https://github.com/bevyengine/naga_oil/pull/132
does, and the dependency is updated from my fork to the release.
>
> Also requires wgpu 28 in dlss_wgpu:
https://github.com/bevyengine/dlss_wgpu/pull/17
> [!note]
> This does not enable mesh shaders, and neither does the naga_oil PR. I
chose to do an upgrade first, then go back and see about mesh shaders.
Here's a general list of changes and what I did. Commits are grouped by
feature except for the last one, which enabled solari when I ran the
solari example.
## MipmapFilterMode is split from FilterMode
- Split MipmapFilterMode from FilterMode #8314:
https://github.com/gfx-rs/wgpu/pull/8314
solution: implement From for `MipmapFilterMode`/`ImageFilterMode` since
the values are the same. The split was because the spec indicates they
are different types, even though they have the same values.
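
A minimal sketch of that conversion, using local stand-in enums so it compiles on its own. The three enums below only mirror the shape of the real types (two variants each); they are not the actual wgpu or Bevy definitions.

```rust
/// Stand-in for the engine-side filter mode setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ImageFilterMode {
    Nearest,
    Linear,
}

/// Stand-in for the sampler min/mag filter type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FilterMode {
    Nearest,
    Linear,
}

/// Stand-in for the new, separate mipmap filter type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MipmapFilterMode {
    Nearest,
    Linear,
}

impl From<ImageFilterMode> for FilterMode {
    fn from(value: ImageFilterMode) -> Self {
        match value {
            ImageFilterMode::Nearest => FilterMode::Nearest,
            ImageFilterMode::Linear => FilterMode::Linear,
        }
    }
}

// Same values, different type: the variants map one to one.
impl From<ImageFilterMode> for MipmapFilterMode {
    fn from(value: ImageFilterMode) -> Self {
        match value {
            ImageFilterMode::Nearest => MipmapFilterMode::Nearest,
            ImageFilterMode::Linear => MipmapFilterMode::Linear,
        }
    }
}
```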
## Push Constants are now Immediates
- https://github.com/gfx-rs/wgpu/pull/8724
immediate_size is [a
u32](https://docs.rs/wgpu/28.0.0/wgpu/struct.PipelineLayoutDescriptor.html#structfield.immediate_size)
so use that instead of `PushConstantRange`
## Capabilities name changes
- https://github.com/gfx-rs/wgpu/pull/8671
Got new list from
https://github.com/gfx-rs/wgpu/blob/trunk/wgpu-core/src/device/mod.rs#L449
and copied it in.
## subgroup_{min,max}_size moved from Limits to AdapterInfo
- https://github.com/gfx-rs/wgpu/pull/8609
Update the limits to have the fields they have now, mirror the logic
from the other limits calls.
## multiview_mask
- https://github.com/gfx-rs/wgpu/pull/8206
set to None because we don't use it currently. It's vaguely for VR.
## error scope is now a guard
- https://github.com/gfx-rs/wgpu/pull/8685
retain guard and then pop() it later
---
I made one mistake during the PR, thinking set_immediates was going to
be the size of the immediates and not the offset. I'd like reviewers to
take a look at immediates offset and size sites specifically just in
case I missed something.
Here's a bunch of examples running
<img width="1470" height="1040" alt="screenshot-2025-12-24-at-16 47
22@2x"
src="https://github.com/user-attachments/assets/83dcf4c8-69f5-480a-b724-86598530f25a"
/>
<img width="1470" height="1040" alt="screenshot-2025-12-24-at-16 48
44@2x"
src="https://github.com/user-attachments/assets/46d897fa-1ab2-44ef-8055-fe2fce740dbc"
/>
<img width="1470" height="1040" alt="screenshot-2025-12-24-at-16 49
10@2x"
src="https://github.com/user-attachments/assets/6ae7a9bf-0473-4800-8dfc-233a6a41d6df"
/>
<img width="1470" height="1040" alt="screenshot-2025-12-24-at-16 49
31@2x"
src="https://github.com/user-attachments/assets/89f84a26-cfbd-4196-bca8-111c3d20ba7b"
/>
## Known Issues
> [!NOTE]
>
> There are no current known issues. Everything previous has been
solved.
- [x] enable extensions can not be written in naga oil in a way that
allows them to be used in composed modules.
Update: this was fixed by introducing a flag in naga-oil to force
shaders to allow ray queries in composed modules when using bevy_solari.
This is temporary and will be different in WESL-future.
<details><summary>old explanation</summary>
This is blocking solari from working on nvidia (solari runs successfully
on macos m1) because it needs `enable wgpu_ray_query;`. Putting the
declaration before `#define_import_path` means it gets stripped out
(afaict), and putting it after results in
```
error: expected global declaration, but found a global directive
┌─ embedded://bevy_solari/scene/raytracing_scene_bindings.wgsl:3:1
│
3 │ enable wgpu_ray_query;
│ ^^^^^^ written after first global declaration
│
= expected global declaration, but found a global directive
```
</details>
- [x] dlss_wgpu mixes apis which [causes panics
now](https://github.com/bevyengine/dlss_wgpu/pull/17#issuecomment-3690847524).
<details><summary>Previous notes as dlss_wgpu was being fixed
here</summary>
The wgpu release notes don't mention which PR this was introduced in,
only saying:
> Using both the wgpu command encoding APIs and
CommandEncoder::as_hal_mut on the same encoder will now result in a
panic.
It was caused by https://github.com/gfx-rs/wgpu/pull/8373 which claimed
to not know of any use cases
> With record on finish, the actual ordering on the command buffer is
deeply counter intuitive (all as_hal would come first) and I think it
additionally was just flat out broken in some ways
> -
https://discord.com/channels/691052431525675048/743663924229963868/1453786307099758683
Possible path forward is using multiple command buffers:
https://discord.com/channels/691052431525675048/743663924229963868/1453795633503670415
</details>
---------
Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Ideally this should be fully defaultable, but is problematic until we
complete the 2d->3d rework where more items can be exported from
`bevy_material`. The temp fix is just to initialize a few more fields.
There are other hacky things we could do to keep the default
initialization working but I'd rather just fix it correctly when we
complete the refactor.
# Objective
People have been asking how to get a compute shader-built mesh into
bevy's "stuff".
Some people want to control the lifetime of the mesh via Handle, and
others don't know how to set data in bind groups.
## Solution
A new example that shows how to initialize a mesh handle with a
render_world usage mesh, and then put the output of the compute shader
into the mesh_allocator slab for the mesh.
The demo creates a scene with a camera, light, a circular base mesh, and
an empty "cube to be" mesh that is shared by cloning the handle across
two entities. The compute shader then fills in the data directly into
the mesh_allocator slabs for the vertex/index buffers.
If the compute shader failed, there would be no cube meshes showing as
the data would be empty.
## Testing
```
cargo run --example compute_mesh
```
---
## Showcase
<img width="3392" height="2106" alt="screenshot-2025-12-29-at-16 06
48@2x"
src="https://github.com/user-attachments/assets/88d8fed4-e3c1-418e-bb04-6f08d673403a"
/>
# Objective
- extract material infrastructure to be usable for scene description
without a renderer
- rework of #21543 on top of #22408, you can view a clean diff here:
https://github.com/tychedelia/bevy/compare/type-erase-more-materials...atlv24:ad/material2?expand=1
- this is the culmination of numerous crate splits and refactors leading
up to this point, and another step towards shared 2d and 3d
rendering infrastructure deduplication.
## Solution
- new crate bevy_material with MaterialProperties struct that lets one
define when a material draws, how it behaves, what shaders it uses,
specialization functions, and bind group layouts expected.
## Testing
---------
Co-authored-by: charlotte 🌸 <charlotte.c.mcelwain@gmail.com>
Co-authored-by: Daniel Skates <zeophlite@gmail.com>
# Objective
A user was confused about a crash when modifying the
custom_post_processing example.
They added an Hdr feature, Bloom, and were confused about the crash.
The issue is that when using Hdr cameras, the custom_post_processing
pipeline must use Hdr textures.
Fixes #21516
## Solution
Add note about using Hdr textures if Hdr features are enabled.
Note is added to two locations since most users will be looking at the
camera when adding hdr components; not looking at the colortargetstate.
# Objective
- Users often want to run a fullscreen shader but the current solution
involves copying the custom_post_processing example which is a 350 line
file with a lot of low level wgpu complexity. Users shouldn't have to
deal with that just to make a fullscreen shader
## Solution
- Introduce a new FullscreenMaterial trait and FullscreenMaterialPlugin
- This new material will run a fullscreen triangle with the specified
shader. It builds on top of the existing FullscreenShader infrastructure
- It lets users customize the node ordering. There are no defaults right
now because it's intended as a bit of a primitive plugin. Eventually we
could have some kind of default for custom post processing
## Testing
Made a new fullscreen_material example and made sure it works
## Follow up
Once this is merged there are various things that should be done to
improve it. Add the option to bind the depth texture, offer defaults for
post processing, use a full AsBindGroup, add a way to bind the gbuffer.
---------
Co-authored-by: JMS55 <47158642+JMS55@users.noreply.github.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
#20830 created the possibility that we may want to have render targets
that produce a number of outputs, e.g. depth and normals. This is a
first step towards something like that (e.g. a `RendersTo` relation) by
converting `RenderTarget` to be a component. This is also useful for
out-of-tree render targets that may want to do something like
`#[require(RenderTarget::Image)]` once BSN lands.
## Solution
Make it a component.
Transparent and transmissive phases previously used the instance
translation from GlobalTransform as the sort position. This breaks down
when mesh geometry is authored in "world-like" coordinates and the
instance transform is identity or near-identity (common in
building/CAD-style content). In such cases multiple transparent
instances end up with the same translation and produce incorrect draw
order.
This change introduces sorting based on the world-space center of the
mesh bounds instead of the raw translation. The local bounds center is
stored per mesh/instance and transformed by the instance’s world
transform when building sort keys. This adds a small amount of
per-mesh/instance data but produces much more correct transparent and
transmissive rendering in real-world scenes.
# Objective
Currently, transparent and transmissive render phases in Bevy sort
instances using the translation from GlobalTransform. This works only if
the mesh origin is a good proxy for the geometry position. In many
real-world cases (especially CAD/architecture-like content), the mesh
data is authored in "world-like" coordinates and the instance
`Transform` is identity. In such setups, sorting by translation produces
incorrect draw order for transparent/transmissive objects.
I propose switching the sorting key from `GlobalTransform.translation`
to the world-space center of the mesh bounds for each instance.
## Solution
Instead of using `GlobalTransform.translation` as the sort position for
transparent/transmissive phases, use the world-space center of the mesh
bounds:
1. Store the local-space bounds center for each render mesh (e.g. in
something like `RenderMeshInstanceShared` as `center: Vec3` derived from
the mesh `Aabb`).
2. For each instance, compute the world-space center by applying the
instance transform.
3. Use this world-space center as the position for distance / depth
computation in view space when building sort keys for transparent and
transmissive phases.
This way:
- Sorting respects the actual spatial position of the geometry
- Instances with baked-in “world-like” coordinates inside the mesh are
handled correctly
- Draw order for transparent objects becomes much more stable and
visually correct in real scenes
The main trade-offs:
- Adding a Vec3 center in `RenderMeshInstanceShared` (typically +12 or
+16 bytes depending on alignment),
- For each instance, we need to transform the local bounds center into
world space to compute the sort key.
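The three solution steps can be sketched without any engine types. Everything below is a hypothetical stand-in (`Affine`, `aabb_center`, `sort_depth`), not the PR's actual code, and it measures depth as distance along an assumed camera forward vector rather than through Bevy's own view-space machinery:

```rust
type Vec3 = [f32; 3];

/// Hypothetical stand-in for an instance's world transform: three basis
/// columns (rotation and scale) plus a translation.
struct Affine {
    x_axis: Vec3,
    y_axis: Vec3,
    z_axis: Vec3,
    translation: Vec3,
}

impl Affine {
    const IDENTITY: Affine = Affine {
        x_axis: [1.0, 0.0, 0.0],
        y_axis: [0.0, 1.0, 0.0],
        z_axis: [0.0, 0.0, 1.0],
        translation: [0.0, 0.0, 0.0],
    };

    /// Applies the linear part, then adds the translation.
    fn transform_point(&self, p: Vec3) -> Vec3 {
        let mut out = self.translation;
        for i in 0..3 {
            out[i] += self.x_axis[i] * p[0] + self.y_axis[i] * p[1] + self.z_axis[i] * p[2];
        }
        out
    }
}

/// Step 1: local-space center of an AABB given by its min and max corners.
fn aabb_center(min: Vec3, max: Vec3) -> Vec3 {
    [
        (min[0] + max[0]) * 0.5,
        (min[1] + max[1]) * 0.5,
        (min[2] + max[2]) * 0.5,
    ]
}

/// Steps 2 and 3: move the local center into world space, then measure how
/// far it lies along the camera's forward direction.
fn sort_depth(
    world_from_local: &Affine,
    local_center: Vec3,
    camera_pos: Vec3,
    camera_forward: Vec3,
) -> f32 {
    let c = world_from_local.transform_point(local_center);
    let d = [c[0] - camera_pos[0], c[1] - camera_pos[1], c[2] - camera_pos[2]];
    d[0] * camera_forward[0] + d[1] * camera_forward[1] + d[2] * camera_forward[2]
}
```

Two meshes with identity transforms but different baked-in vertex positions get different depths here, whereas sorting by translation alone would give both the same value.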
### Alternative approach and its drawbacks
In theory, this could be fixed by **baking** meshes so that:
- The mesh is recentered around its local bounding box center, and
- The instance `Transform` is adjusted to move it back into place.
However, this has several drawbacks:
- Requires modifying vertex data for each mesh (expensive and
error-prone)
- Requires either duplicating meshes or introducing one-off edits, which
is bad for instancing and memory
- Complicates asset workflows (tools, exporters, pipelines)
- Still does not address dynamic or procedurally generated content
In practice, this is not a scalable or convenient solution.
### Secondary issue: unstable ordering when depth is equal
There is another related problem with the current sorting: when two
transparent/transmissive instances end up with the same view-space depth
(for example, their centers project onto the same depth plane), the
resulting draw order becomes unstable. This leads to visible flickering,
because the internal order of `RenderEntity` items is not guaranteed to be
stable between frames.
In practice this happens quite easily, especially when multiple
transparent instances share the same or very similar sort depth, and
their relative order in the extracted render list can change frame to
frame.
To address this, I suggest extending the sort key with a deterministic
tie-breaker, for example the entity's main index. Conceptually, the sort
key would become:
- primary: view-space depth (or distance),
- secondary: stable per-entity index
This ensures that instances with the same depth keep a consistent draw
order across frames, removing flickering while preserving the intended
depth-based sorting behavior.
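A minimal sketch of such a two-part key, assuming a hypothetical `SortKey` struct and an ascending sort (the real phase item types and sort direction may differ):

```rust
use std::cmp::Ordering;

/// Hypothetical sort key: depth first, then a stable per-entity index.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SortKey {
    depth: f32,
    entity_index: u32,
}

/// Compares by depth, falling back to the entity index only when the
/// depths are equal. `total_cmp` gives f32 a total order, so the
/// comparison is well defined even for NaN.
fn compare(a: &SortKey, b: &SortKey) -> Ordering {
    a.depth
        .total_cmp(&b.depth)
        .then(a.entity_index.cmp(&b.entity_index))
}

/// Sorts in place. An unstable sort is fine here because the tie-breaker
/// leaves no two distinct entities comparing as equal.
fn sort_items(items: &mut [SortKey]) {
    items.sort_unstable_by(compare);
}
```

Because the key fully determines the order, the result no longer depends on the order in which items were extracted that frame.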
## Testing
- Did you test these changes? If so, how?
```sh
cargo run -p ci -- test
cargo run -p ci -- doc
cargo run -p ci -- compile
```
- Are there any parts that need more testing? Not sure
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
Run this "example"
```rust
use bevy::{
    camera_controller::free_camera::{FreeCamera, FreeCameraPlugin},
    prelude::*,
};

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_plugins(FreeCameraPlugin)
        .add_systems(Startup, setup)
        .add_systems(Update, view_orient)
        .run();
}

fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
) {
    let material = materials.add(StandardMaterial {
        base_color: Color::srgb_u8(150, 250, 150).with_alpha(0.7),
        alpha_mode: AlphaMode::Blend,
        ..default()
    });
    let mesh = Cuboid::new(3., 3., 1.)
        .mesh()
        .build()
        .translated_by(Vec3::new(1.5, 1.5, 0.5));

    // Cuboids grids
    for k in -1..=0 {
        let z_offset = k as f32 * 3.;
        for i in 0..3 {
            let x_offset = i as f32 * 3.25;
            for j in 0..3 {
                let y_offset = j as f32 * 3.25;
                commands.spawn((
                    Mesh3d(
                        meshes.add(
                            mesh.clone()
                                .translated_by(Vec3::new(x_offset, y_offset, z_offset)),
                        ),
                    ),
                    MeshMaterial3d(material.clone()),
                ));
            }
        }
    }

    // Cuboids at the center share the same position and are equidistant from the camera
    {
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(material.clone()),
        ));
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::srgb_u8(150, 150, 250).with_alpha(0.6),
                alpha_mode: AlphaMode::Blend,
                ..default()
            })),
        ));
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::srgb_u8(250, 150, 150).with_alpha(0.5),
                alpha_mode: AlphaMode::Blend,
                ..default()
            })),
        ));
    }

    commands.spawn((PointLight::default(), Transform::from_xyz(-3., 10., 4.5)));
    commands.spawn((
        Camera3d::default(),
        Transform::from_xyz(-3., 12., 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y),
        FreeCamera::default(),
    ));

    // On-screen key hints for the view presets handled in `view_orient`
    commands.spawn((
        Node {
            position_type: PositionType::Absolute,
            padding: UiRect::all(px(10)),
            ..default()
        },
        GlobalZIndex(i32::MAX),
        children![(
            Text::default(),
            children![
                (TextSpan::new("1 - 3D view\n")),
                (TextSpan::new("2 - Front view\n")),
                (TextSpan::new("3 - Top view\n")),
                (TextSpan::new("4 - Right view\n")),
            ]
        )],
    ));
}

fn view_orient(
    input: Res<ButtonInput<KeyCode>>,
    mut camera_xform: Single<&mut Transform, With<Camera>>,
) {
    let xform = if input.just_pressed(KeyCode::Digit1) {
        Some(Transform::from_xyz(-3., 12., 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y))
    } else if input.just_pressed(KeyCode::Digit2) {
        Some(Transform::from_xyz(4.75, 4.75, 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y))
    } else if input.just_pressed(KeyCode::Digit3) {
        Some(Transform::from_xyz(4.75, 18., -1.).looking_at(Vec3::new(4.75, 0., -1.), Vec3::NEG_Z))
    } else if input.just_pressed(KeyCode::Digit4) {
        Some(Transform::from_xyz(-15., 4.75, -1.).looking_at(Vec3::new(0., 4.75, -1.), Vec3::Y))
    } else {
        None
    };
    if let Some(xform) = xform {
        camera_xform.set_if_neq(xform);
    }
}
```
- If relevant, what platforms did you test these changes on, and are
there any important ones you can't test? MacOS
---
## Showcase
In my tests with building models (windows, glass, etc.), switching from
translation-based sorting to bounds-center-based sorting noticeably
improves the visual result. Transparent surfaces that were previously
fighting or blending incorrectly now render in a much more expected
order.
### Current:
https://youtu.be/WjDjPAoKK6w
### Sort by aabb center:
https://youtu.be/-Sl4GOXp_vQ
### Sort by aabb center + tie breaker:
https://youtu.be/0aQhkSKxECo
---------
Co-authored-by: Volodymyr Enhelhardt <volodymyr.enhelhardt@ambr.net>
# Objective
#19667 introduced a type-erased material system that effectively puts
all material instances through a single cache: during asset preparation
(`ErasedRenderAssetPlugin`), materials are processed into a set of
`MaterialProperties` that contains all the data needed to render them,
and these are all cached and deduplicated as needed.
This allows for maximal flexibility (every single material instance
could have a different "type") but complicates the logic and makes
cache keys really big. So, one goal for the material revamp I have (and
I think @tychedelia is on board) is to cache materials at two levels:
material "types" and material "instances", where material types roughly
map to the rust types that currently implement `Material`. Without
getting into implementation details, storing mostly static data separate
from instance data would let us simplify a lot of the logic, while only
requiring a little more work for fully-dynamic use cases.
This PR is a first step in that direction, which stores *all* the
available draw functions in `MaterialProperties`, and pushes the
decision for "what draw function should I use for this material?" to
queue time, where before it was split between there and asset
preparation. This makes the list of available draw functions
instance-independent, and will later allow us to store it with other
"static" material data.
## Solution
- Make draw function labels 1:1 with render phases, and include all of
them in the list in `MaterialProperties`
## Testing
- Ran `3d_scene`
- Ran `manual_material`
# Objective
- This example is intended for advanced users
## Solution
- Move it to the shader_advanced category
## Testing
- I ran the example and it worked
# Objective
- Defer creating `BindGroupLayout` by using a
`BindGroupLayoutDescriptor` and cache the results
- Unblocks `bevy_material` (render-less material definitions)
- Blocked by https://github.com/bevyengine/bevy/pull/21533
## Solution
- Reviewers, look at first commit for mechanism, and following for usage
## Testing
- CI
---------
Co-authored-by: atlas <email@atlasdostal.com>
# Objective
- `offset` argument is misleading as the argument is not actually passed
to wgpu (used only for memoization and logging). `BufferSlice` already
contains an offset.
- wgpu's `set_index_buffer` [sets the offset according to
BufferSlice::offset](https://github.com/gfx-rs/wgpu/blob/e990388af98e4b4dff9f7fcc09a4eb5d2f71d227/wgpu/src/api/render_pass.rs#L98-L105)
- `TrackedRenderPass::set_vertex_buffer` was made aware of slice size
(#14916) but missed `set_index_buffer` counterpart
## Solution
- Removed `offset` argument from `TrackedRenderPass::set_index_buffer`
- Apply fix from #14916 to `TrackedRenderPass::is_index_buffer_set`
- ~~Cleanup code by using the newly added `BufferSlice` getters~~ split
out to https://github.com/bevyengine/bevy/pull/21289
## Testing
- Ran a few examples
It can occasionally be useful to have cameras that *only* render
prepasses such as depth. Other game engines such as Unity support this
feature by allowing a depth-only render target to be assigned to a
camera. Bevy, however, has no easy mechanism for this. (Creating a
`ShadowView` in the render app doesn't work, because various places in
rendering assume that shadow views are associated with lights.)
This patch fixes the problem by introducing a new type of
`RenderTarget`, `RenderTarget::None`. Cameras with no render target will
skip the main opaque and transparent render passes, but any prepasses on
such cameras will still occur. Adding a `DepthPrepass` to such a camera
enables depth-only cameras, with maximum efficiency as the fragment
shader won't exist and no color buffer will be bound.
Note that, when no render target is specified, the physical size of the
viewport must be explicitly specified, as Bevy has no other mechanism to
determine it.
A new example, `render_depth_to_texture`, has been added, containing a
rotating cube and a depth-only camera orbiting it. The depth texture
that the camera produces is rendered onto a plane using a custom shader.
(NB: In such scenarios, the depth texture must be copied from the camera
to a custom image due to (a) the `wgpu` limitation that a depth texture
can't be both a render target and bindable as a texture and (b) the fact
that Bevy depth textures are managed by Bevy itself and exposed only to
the render world. The example uses a custom render node to perform the
copy.) The depth-only camera can be moved using the WASD keys.
<img width="2564" height="1500" alt="Screenshot 2025-09-02 080508"
src="https://github.com/user-attachments/assets/415e7f4d-393d-4be3-b569-829c06901078"
/>
# Objective
- prepare to remove bevy_mesh re-export from bevy_render. This will be
done in 0.18, but we might as well prepare for it now.
## Solution
- Add a prelude and use bevy_mesh directly. After this pr and #20471, we
will be ready.
## Testing
- cargo check --examples
# Objective
- Adds some util methods to remove some boilerplate from specializers.
More will probably be added later but `set_target` and `set_layout` will
be the most used I think.
- Note: Specializers can't rely on their input descriptor having a
certain shape, so instead of just `push`ing to each vec, the methods pad
the length of the vec if necessary and set the value directly.
- After migrating a few engine `Specializer`s, `GetBaseDescriptor` &
`SpecializedCache: Resource` both seem like anti-patterns, especially
with dynamic materials on the horizon
- Also removes `user_specializer`s. If anyone needs that functionality
they can easily make a wrapper for it.
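The pad-then-set behavior described in the note above can be sketched as follows. `set_at` is a hypothetical helper for illustration, not the name or signature of the methods added in this PR:

```rust
/// Hypothetical helper: writes `value` at `index`, first growing the vec
/// with `None` placeholders if it is too short. Unlike `push`, the result
/// does not depend on how many entries the descriptor already had.
fn set_at<T>(slots: &mut Vec<Option<T>>, index: usize, value: T) {
    if slots.len() <= index {
        slots.resize_with(index + 1, || None);
    }
    slots[index] = Some(value);
}
```

Calling `set_at(&mut targets, 2, x)` on an empty vec leaves slots 0 and 1 as `None` and writes slot 2, while calling it on a vec that already has three entries simply overwrites slot 2.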
## Solution
- Add the things
- Nuke the stuff
- update the migration guide
# Objective
- The example doesn't work on webgl2
- It tries to reimplement logic for batching instead of using existing
abstractions which breaks on webgl2
- It only uses batching
## Solution
- Use existing abstractions and remove code related to batching
- This fixes the webgl2 issue
- It also makes it use multi draw indirect instead of just batching
## Testing
- Tested the example with the bevy cli for webgl and webgpu and also ran
the example locally