| Main Authors: | Jia, Ding; Guo, Jianyuan; Han, Kai; Wu, Han; Zhang, Chao; Xu, Chang; Chen, Xinghao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.01210 |
Similar Items
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
by: Hao, Zhiwei, et al.
Published: (2024)
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
by: Guo, Jianyuan, et al.
Published: (2024)
Data-efficient Large Vision Models through Sequential Autoregression
by: Guo, Jianyuan, et al.
Published: (2024)
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
by: Han, Xiaofeng, et al.
Published: (2025)
FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain
by: Wang, YuAn, et al.
Published: (2025)
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
by: Jie, Shibo, et al.
Published: (2024)
MM-DETR: An Efficient Multimodal Detection Transformer with Mamba-Driven Dual-Granularity Fusion and Frequency-Aware Modality Adapters
by: Han, Jianhong, et al.
Published: (2025)
PPT: Token Pruning and Pooling for Efficient Vision Transformers
by: Wu, Xinjian, et al.
Published: (2023)
Enhanced Multimodal Hate Video Detection via Channel-wise and Modality-wise Fusion
by: Zhang, Yinghui, et al.
Published: (2025)
Block-based Symmetric Pruning and Fusion for Efficient Vision Transformers
by: Hsieh, Yi-Kuan, et al.
Published: (2025)
ParameterNet: Parameters Are All You Need
by: Han, Kai, et al.
Published: (2023)
YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion
by: Guo, Hanqing, et al.
Published: (2025)
Beyond Simple Fusion: Adaptive Gated Fusion for Robust Multimodal Sentiment Analysis
by: Wu, Han, et al.
Published: (2025)
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
by: Shi, Liangtao, et al.
Published: (2025)
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
by: Pu, Yifan, et al.
Published: (2024)
TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model
by: Ding, Zhaoyuan, et al.
Published: (2026)
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
by: Chen, Haoran, et al.
Published: (2023)
A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product
by: Xiang, Ao, et al.
Published: (2024)
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs
by: Han, Kai, et al.
Published: (2024)
MMMamba: A Versatile Cross-Modal In Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement
by: Wang, Yingying, et al.
Published: (2025)
Multispectral Detection Transformer with Infrared-Centric Feature Fusion
by: Hwang, Seongmin, et al.
Published: (2025)
Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation
by: Shen, Zhengwen, et al.
Published: (2025)
Self-supervised Multiplex Consensus Mamba for General Image Fusion
by: Wang, Yingying, et al.
Published: (2025)
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
by: Huang, Mouxiao, et al.
Published: (2025)
EfficienT-HDR: An Efficient Transformer-Based Framework via Multi-Exposure Fusion for HDR Reconstruction
by: Huang, Yu-Shen, et al.
Published: (2025)
Degradation-Robust Fusion: An Efficient Degradation-Aware Diffusion Framework for Multimodal Image Fusion in Arbitrary Degradation Scenarios
by: Shi, Yu, et al.
Published: (2026)
Unified Multimodal Coherent Field: Synchronous Semantic-Spatial-Vision Fusion for Brain Tumor Segmentation
by: Zhang, Mingda, et al.
Published: (2025)
PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention
by: Mei, Hefei, et al.
Published: (2026)
Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping
by: Ding, Ning, et al.
Published: (2025)
LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks
by: Liu, Hui, et al.
Published: (2025)
Unbiased Dynamic Multimodal Fusion
by: Wei, Shicai, et al.
Published: (2026)
ExFusion: Efficient Transformer Training via Multi-Experts Fusion
by: Ruan, Jiacheng, et al.
Published: (2026)
Fusion of regional and sparse attention in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)
Bridging Pixels and Words: Mask-Aware Local Semantic Fusion for Multimodal Media Verification
by: Chen, Zizhao, et al.
Published: (2026)
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
by: Li, Muyang, et al.
Published: (2024)
ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration
by: Huang, Ning-Chi, et al.
Published: (2024)
SSA-Seg: Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation
by: Ma, Xiaowen, et al.
Published: (2024)
Efficient Hybrid Zoom using Camera Fusion on Mobile Phones
by: Wu, Xiaotong, et al.
Published: (2024)
SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis
by: Xiang, Haozhe, et al.
Published: (2025)
CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion
by: Mai, Sijie, et al.
Published: (2026)