| Main Authors: | Lu, Qiang, Xiu, Waikit, Li, Xiying, Hu, Shenyu, Sun, Shengbo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.23331 |
Similar Items
Traffic-MLLM: Curiosity-Regularized Supervised Learning for Traffic Scenario Case-Based Reasoning
by: Xiu, Waikit, et al.
Published: (2025)
Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment
by: Tang, Jinzhou, et al.
Published: (2025)
LASAR: Towards Spatio-temporal Reasoning with Latent Cognitive Map
by: Tang, Jinzhou, et al.
Published: (2026)
SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning
by: Chen, Hao, et al.
Published: (2024)
Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception
by: He, Lei, et al.
Published: (2024)
AWM-Fuse: Multi-Modality Image Fusion for Adverse Weather via Global and Local Text Perception
by: Li, Xilai, et al.
Published: (2025)
SignBind-LLM: Multi-Stage Modality Fusion for Sign Language Translation
by: Thomas, Marshall, et al.
Published: (2025)
Cross-Modal Consistency Learning for Sign Language Recognition
by: Wu, Kepeng, et al.
Published: (2025)
LuSeg: Efficient Negative and Positive Obstacles Segmentation via Contrast-Driven Multi-Modal Feature Fusion on the Lunar
by: Jiao, Shuaifeng, et al.
Published: (2025)
DIFF-MF: A Difference-Driven Channel-Spatial State Space Model for Multi-Modal Image Fusion
by: Sun, Yiming, et al.
Published: (2026)
SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception
by: Li, Yiheng, et al.
Published: (2024)
Generative Sign-description Prompts with Multi-positive Contrastive Learning for Sign Language Recognition
by: Liang, Siyu, et al.
Published: (2025)
Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
by: Shang, Tianyi, et al.
Published: (2025)
Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion
by: Sun, Hui, et al.
Published: (2025)
Revolutionizing Traffic Sign Recognition: Unveiling the Potential of Vision Transformers
by: Mingwin, Susano, et al.
Published: (2024)
Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving
by: Dai, Yuqi, et al.
Published: (2024)
Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion
by: Cho, Minkyoung, et al.
Published: (2024)
SignMouth: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion
by: Wu, Wenfang, et al.
Published: (2025)
Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation
by: Lai, Delun, et al.
Published: (2025)
Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion
by: Zhu, Yixin, et al.
Published: (2026)
FusionFM: All-in-One Multi-Modal Image Fusion with Flow Matching
by: Zhu, Huayi, et al.
Published: (2025)
Text-Guided Channel Perturbation and Pretrained Knowledge Integration for Unified Multi-Modality Image Fusion
by: Li, Xilai, et al.
Published: (2025)
InterFusion: Text-Driven Generation of 3D Human-Object Interaction
by: Dai, Sisi, et al.
Published: (2024)
Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents
by: Lin, Yiqi, et al.
Published: (2025)
Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
by: Gu, Yi, et al.
Published: (2025)
Cross-Modal Prototype Allocation: Unsupervised Slide Representation Learning via Patch-Text Contrast in Computational Pathology
by: Chen, Yuxuan, et al.
Published: (2025)
Ordinal Scale Traffic Congestion Classification with Multi-Modal Vision-Language and Motion Analysis
by: Lin, Yu-Hsuan
Published: (2025)
FusionSAM: Visual Multi-Modal Learning with Segment Anything
by: Li, Daixun, et al.
Published: (2024)
$β$-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment
by: Zohra, Fatimah, et al.
Published: (2025)
EMDFNet: Efficient Multi-scale and Diverse Feature Network for Traffic Sign Detection
by: Li, Pengyu, et al.
Published: (2024)
Text-Driven Diffusion Model for Sign Language Production
by: He, Jiayi, et al.
Published: (2025)
Multi-granular body modeling with Redundancy-Free Spatiotemporal Fusion for Text-Driven Motion Generation
by: Zhan, Xingzu, et al.
Published: (2025)
Contrast-X: A Multi-Modal Contrast Image Synthesis Benchmark and Universal Modality Flow Matching
by: Chen, Yifan, et al.
Published: (2026)
Multi-View Fusion Neural Network for Traffic Demand Prediction
by: Zhang, Dongran, et al.
Published: (2024)
UTA-Sign: Unsupervised Thermal Video Augmentation via Event-Assisted Traffic Signage Sketching
by: Han, Yuqi, et al.
Published: (2025)
MGHFT: Multi-Granularity Hierarchical Fusion Transformer for Cross-Modal Sticker Emotion Recognition
by: Chen, Jian, et al.
Published: (2025)
CalFuse: Multi-Modal Continual Learning via Feature Calibration and Parameter Fusion
by: Guo, Juncen, et al.
Published: (2025)
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
by: Nair, Nithin Gopalakrishnan, et al.
Published: (2024)
Invisible Reflections: Leveraging Infrared Laser Reflections to Target Traffic Sign Perception
by: Sato, Takami, et al.
Published: (2024)
MDDFNet: Mamba-based Dynamic Dual Fusion Network for Traffic Sign Detection
by: Yu, TianYi
Published: (2025)