| Main Authors: | Liu, Ji, Zhang, Zifeng, Lu, Mingjie, Wei, Hongyang, Li, Dong, Xie, Yile, Peng, Jinzhang, Tian, Lu, Sirasao, Ashish, Barsoum, Emad |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.07821 |
Similar Items
Fast Occupancy Network
by: Lu, Mingjie, et al.
Published: (2024)
EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene
by: Huo, Yixiong, et al.
Published: (2024)
DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization
by: Zhu, Haowei, et al.
Published: (2024)
UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
by: Liu, Ji, et al.
Published: (2024)
Partial Convolution Meets Visual Attention
by: Huang, Haiduo, et al.
Published: (2025)
MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM
by: Li, Renwu, et al.
Published: (2025)
LADDER: An Efficient Framework for Video Frame Interpolation
by: Shen, Tong, et al.
Published: (2024)
Towards Scale-Aware Full Surround Monodepth with Transformers
by: Yang, Yuchen, et al.
Published: (2024)
DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
by: Zhu, Haowei, et al.
Published: (2026)
DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
by: Ke, Wenjin, et al.
Published: (2025)
E-MMDiT: Revisiting Multimodal Diffusion Transformer Design for Fast Image Synthesis under Limited Resources
by: Shen, Tong, et al.
Published: (2025)
AMD-Hummingbird: Towards an Efficient Text-to-Video Model
by: Isobe, Takashi, et al.
Published: (2025)
Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
by: Cui, Qinpeng, et al.
Published: (2024)
Edit as You See: Image-guided Video Editing via Masked Motion Modeling
by: Huang, Zhi-Lin, et al.
Published: (2025)
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)
ReNeg: Learning Negative Embedding with Reward Guidance
by: Li, Xiaomin, et al.
Published: (2024)
Ego-InBetween: Generating Object State Transitions in Ego-Centric Videos
by: Ge, Mengmeng, et al.
Published: (2026)
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
by: Li, Guanchen, et al.
Published: (2024)
SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)
Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion
by: Singh, Shivam, et al.
Published: (2026)
DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation
by: Jiao, Jiajun, et al.
Published: (2026)
CaptionQA: Is Your Caption as Useful as the Image Itself?
by: Yang, Shijia, et al.
Published: (2025)
Sparse-Up: Learnable Sparse Upsampling for 3D Generation with High-Fidelity Textures
by: Xiao, Lu, et al.
Published: (2025)
DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)
SparseFusion: Efficient Sparse Multi-Modal Fusion Framework for Long-Range 3D Perception
by: Li, Yiheng, et al.
Published: (2024)
TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection
by: Xie, Hongyang, et al.
Published: (2026)
DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)
ACMo: Attribute Controllable Motion Generation
by: Wei, Mingjie, et al.
Published: (2025)
Teaching CORnet Human fMRI Representations for Enhanced Model-Brain Alignment
by: Lu, Zitong, et al.
Published: (2024)
Layout-Conditioned Autoregressive Text-to-Image Generation via Structured Masking
by: Zheng, Zirui, et al.
Published: (2025)
Reason-Then-Retrieve for CoVR-R with Structured Edit Prompts and Dense-Sparse Fusion
by: Liu, DongQing, et al.
Published: (2026)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
by: Wang, Ze, et al.
Published: (2025)
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
by: Guo, Yuxiang, et al.
Published: (2025)
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)
Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
by: Yang, Senqiao, et al.
Published: (2023)
Fully Sparse 3D Occupancy Prediction
by: Liu, Haisong, et al.
Published: (2023)
SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
by: Xie, Yujie, et al.
Published: (2025)
MOVi: Training-free Text-conditioned Multi-Object Video Generation
by: Rahman, Aimon, et al.
Published: (2025)
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation
by: Wei, Mingjie, et al.
Published: (2025)