:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Ji, Zhang, Zifeng, Lu, Mingjie, Wei, Hongyang, Li, Dong, Xie, Yile, Peng, Jinzhang, Tian, Lu, Sirasao, Ashish, Barsoum, Emad
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.07821

Similar Items

Fast Occupancy Network
by: Lu, Mingjie, et al.
Published: (2024)

EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene
by: Huo, Yixiong, et al.
Published: (2024)

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization
by: Zhu, Haowei, et al.
Published: (2024)

UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
by: Liu, Ji, et al.
Published: (2024)

Partial Convolution Meets Visual Attention
by: Huang, Haiduo, et al.
Published: (2025)

MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM
by: Li, Renwu, et al.
Published: (2025)

LADDER: An Efficient Framework for Video Frame Interpolation
by: Shen, Tong, et al.
Published: (2024)

Towards Scale-Aware Full Surround Monodepth with Transformers
by: Yang, Yuchen, et al.
Published: (2024)

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
by: Zhu, Haowei, et al.
Published: (2026)

DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
by: Ke, Wenjin, et al.
Published: (2025)

E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources
by: Shen, Tong, et al.
Published: (2025)

AMD-Hummingbird: Towards an Efficient Text-to-Video Model
by: Isobe, Takashi, et al.
Published: (2025)

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
by: Cui, Qinpeng, et al.
Published: (2024)

Edit as You See: Image-guided Video Editing via Masked Motion Modeling
by: Huang, Zhi-Lin, et al.
Published: (2025)

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)

ReNeg: Learning Negative Embedding with Reward Guidance
by: Li, Xiaomin, et al.
Published: (2024)

Ego-InBetween: Generating Object State Transitions in Ego-Centric Videos
by: Ge, Mengmeng, et al.
Published: (2026)

Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
by: Li, Guanchen, et al.
Published: (2024)

SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)

Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion
by: Singh, Shivam, et al.
Published: (2026)

DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation
by: Jiao, Jiajun, et al.
Published: (2026)

CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)

Sparse-Up: Learnable Sparse Upsampling for 3D Generation with High-Fidelity Textures
by: Xiao, Lu, et al.
Published: (2025)

DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)

SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception
by: Li, Yiheng, et al.
Published: (2024)

TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection
by: Xie, Hongyang, et al.
Published: (2026)

DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)

ACMo: Attribute Controllable Motion Generation
by: Wei, Mingjie, et al.
Published: (2025)

Teaching CORnet Human fMRI Representations for Enhanced Model-Brain Alignment
by: Lu, Zitong, et al.
Published: (2024)

Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking
by: Zheng, Zirui, et al.
Published: (2025)

Reason-Then-Retrieve for CoVR-R with Structured Edit Prompts and Dense-Sparse Fusion
by: Liu, DongQing, et al.
Published: (2026)

Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)

Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
by: Wang, Ze, et al.
Published: (2025)

ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
by: Guo, Yuxiang, et al.
Published: (2025)

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)

Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
by: Yang, Senqiao, et al.
Published: (2023)

Fully Sparse 3D Occupancy Prediction
by: Liu, Haisong, et al.
Published: (2023)

SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
by: Xie, Yujie, et al.
Published: (2025)

MOVi: Training-free Text-conditioned Multi-Object Video Generation
by: Rahman, Aimon, et al.
Published: (2025)

Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation
by: Wei, Mingjie, et al.
Published: (2025)