| Main Authors: | Gao, Yu, Huang, Jiancheng, Sun, Xiaopeng, Jie, Zequn, Zhong, Yujie, Ma, Lin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.03025 |
Similar Items
MRStyle: A Unified Framework for Color Style Transfer with Multi-Modality Reference
by: Huang, Jiancheng, et al.
Published: (2024)
M4V: Multi-Modal Mamba for Text-to-Video Generation
by: Huang, Jiancheng, et al.
Published: (2025)
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
by: Feng, Chengjian, et al.
Published: (2024)
RFSR: Improving ISR Diffusion Models via Reward Feedback Learning
by: Sun, Xiaopeng, et al.
Published: (2024)
MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis
by: Chen, Lei, et al.
Published: (2024)
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
by: Lan, Xiaohan, et al.
Published: (2024)
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
by: Chen, Shaoxiang, et al.
Published: (2024)
Cross-Modal Attention Calibration for LVLM Hallucination Mitigation
by: Li, Jiaming, et al.
Published: (2025)
TASR: Timestep-Aware Diffusion Model for Image Super-Resolution
by: Lin, Qinwei, et al.
Published: (2024)
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
by: Huang, Zhijian, et al.
Published: (2024)
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
by: Chen, Shimin, et al.
Published: (2024)
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
by: Jiao, Siyu, et al.
Published: (2025)
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
by: Chen, Shimin, et al.
Published: (2024)
SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding
by: Li, Wenrui, et al.
Published: (2024)
AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline
by: Wang, Lei, et al.
Published: (2025)
DisTime: Distribution-based Time Representation for Video Large Language Models
by: Zeng, Yingsen, et al.
Published: (2025)
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
by: Huang, Duojun, et al.
Published: (2024)
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
by: Han, Haonan, et al.
Published: (2026)
STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning
by: Qin, Jie, et al.
Published: (2025)
MagicFight: Personalized Martial Arts Combat Video Generation
by: Huang, Jiancheng, et al.
Published: (2026)
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
by: Jiao, Yang, et al.
Published: (2024)
Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba
by: Ren, Hongwei, et al.
Published: (2024)
DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation
by: Hu, Jie, et al.
Published: (2026)
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
by: Huang, Yizhou, et al.
Published: (2025)
Vivim: a Video Vision Mamba for Medical Video Segmentation
by: Yang, Yijun, et al.
Published: (2024)
Delving Deeper: Hierarchical Visual Perception for Robust Video-Text Retrieval
by: Xie, Zequn, et al.
Published: (2026)
Making Large Language Models Better Planners with Reasoning-Decision Alignment
by: Huang, Zhijian, et al.
Published: (2024)
Demystify Mamba in Vision: A Linear Attention Perspective
by: Han, Dongchen, et al.
Published: (2024)
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization
by: Li, Jinlong, et al.
Published: (2024)
Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation
by: Li, Jinlong, et al.
Published: (2025)
MultiCrafter: High-Fidelity Multi-Subject Generation via Disentangled Attention and Identity-Aware Preference Alignment
by: Wu, Tao, et al.
Published: (2025)
SEMA: a Scalable and Efficient Mamba like Attention via Token Localization and Averaging
by: Tran, Nhat Thanh, et al.
Published: (2025)
CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search
by: Xie, Zequn
Published: (2026)
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
by: Xia, Yifei, et al.
Published: (2025)
FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing
by: Cai, Mingshu, et al.
Published: (2025)
Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration
by: Chen, Yujie, et al.
Published: (2025)
Training-free Token Reduction for Vision Mamba
by: Ma, Qiankun, et al.
Published: (2025)
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
by: Chen, Haoxing, et al.
Published: (2024)
LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba
by: Fu, Yunxiang, et al.
Published: (2024)