
Fusing LiDAR and camera data in Bird's-Eye-View (BEV) space has become a standard paradigm for 3D object detection in autonomous driving, but it still suffers from two fundamental challenges: (1) Modality-Occupancy Imbalance (MOI), where dense image features overwhelm sparse LiDAR points, and (2) Semantic-Occupancy Imbalance (SOI), where vast background regions dilute the signals of foreground objects. Existing fusion frameworks lack a unified mechanism to balance both modalities and semantics, resulting in persistent feature imbalance. We propose DB-Fusion, a Dual-Balanced multimodal fusion framework that jointly mitigates both imbalances through a carefully designed pipeline. To address MOI, we first introduce Modality-Balanced Feature Sampling, which aligns cross-modal signal density by augmenting LiDAR-guided image sampling with depth-aware pseudo points, and then apply Geometric-based Feature Fusion, which adaptively fuses the balanced features using geometry-aware reliability weights. To tackle SOI, Semantic-Balanced Feature Enhancing partitions the fused BEV map into semantic scene classes and performs class-consistent context and instance-scene interactions to suppress background dominance. Extensive experiments on the nuScenes benchmark demonstrate that DB-Fusion establishes a new state of the art in mAP, surpassing existing multimodal fusion methods.
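To make the geometry-aware reliability weighting concrete, below is a minimal PyTorch sketch of per-cell adaptive BEV fusion. Everything here is an illustrative assumption rather than the paper's released code: the module name `GeometricReliabilityFusion`, the choice of LiDAR point density as the geometric cue, and the convex per-cell blend are all hypothetical.

```python
# Hypothetical sketch: geometry-aware reliability-weighted BEV fusion.
# Not the DB-Fusion implementation; names and the weighting scheme are assumed.
import torch
import torch.nn as nn


class GeometricReliabilityFusion(nn.Module):
    """Fuse LiDAR and camera BEV features with per-cell reliability weights.

    Each BEV cell's weight is predicted from both feature maps plus a
    geometric cue (log LiDAR point density), so cells backed by many LiDAR
    returns lean on LiDAR features while sparse cells lean on camera features.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Maps concatenated [lidar, camera, density] evidence to a
        # per-cell weight in (0, 1).
        self.weight_head = nn.Sequential(
            nn.Conv2d(2 * channels + 1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, lidar_bev, camera_bev, point_density):
        # lidar_bev, camera_bev: (B, C, H, W) BEV feature maps.
        # point_density: (B, 1, H, W) LiDAR points per BEV cell.
        density_cue = torch.log1p(point_density)        # compress dynamic range
        evidence = torch.cat([lidar_bev, camera_bev, density_cue], dim=1)
        w = self.weight_head(evidence)                  # (B, 1, H, W)
        fused = w * lidar_bev + (1.0 - w) * camera_bev  # convex per-cell blend
        return self.out_conv(fused)


if __name__ == "__main__":
    fusion = GeometricReliabilityFusion(channels=64)
    lidar = torch.randn(2, 64, 128, 128)
    camera = torch.randn(2, 64, 128, 128)
    density = torch.randint(0, 32, (2, 1, 128, 128)).float()
    print(fusion(lidar, camera, density).shape)  # torch.Size([2, 64, 128, 128])
```

The convex blend keeps each fused cell inside the span of the two modality features, which is one simple way to let a geometric reliability estimate arbitrate between dense camera evidence and sparse LiDAR evidence per cell.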
The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026) • 2026