| Main Authors: | Ye, Hancheng, Yu, Chong, Ye, Peng, Xia, Renqiu, Tang, Yansong, Lu, Jiwen, Chen, Tao, Zhang, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.15835 |
Similar Items
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
by: Cao, Jianjian, et al.
Published: (2024)
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding
by: Xia, Renqiu, et al.
Published: (2023)
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
by: Ye, Hancheng, et al.
Published: (2024)
Enhanced Sparsification via Stimulative Training
by: Tang, Shengji, et al.
Published: (2024)
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
by: Xia, Renqiu, et al.
Published: (2024)
VoCo-LLaMA: Towards Vision Compression with Large Language Models
by: Ye, Xubing, et al.
Published: (2024)
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
by: Xia, Renqiu, et al.
Published: (2024)
Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)
Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models
by: Tan, Xudong, et al.
Published: (2025)
SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations
by: Yan, Xiangchao, et al.
Published: (2023)
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation
by: Zhang, Chubin, et al.
Published: (2024)
BridgeNet: Comprehensive and Effective Feature Interactions via Bridge Feature for Multi-task Dense Predictions
by: Zhang, Jingdong, et al.
Published: (2023)
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
by: Liao, Ning, et al.
Published: (2023)
S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning
by: Lin, Weihao, et al.
Published: (2024)
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
by: Lu, Guanxing, et al.
Published: (2024)
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
by: Wang, Jiahui, et al.
Published: (2025)
CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos
by: Zhang, Chubin, et al.
Published: (2026)
Towards Accurate Post-training Quantization for Diffusion Models
by: Wang, Changyuan, et al.
Published: (2023)
FlowIE: Efficient Image Enhancement via Rectified Flow
by: Zhu, Yixuan, et al.
Published: (2024)
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
by: Zhu, Yixuan, et al.
Published: (2024)
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking
by: Zhu, Deyi, et al.
Published: (2026)
JPEG Compliant Compression for Both Human and Machine, A Report
by: Ye, Linfeng
Published: (2025)
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
by: Ye, Xubing, et al.
Published: (2024)
GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
by: Dong, Jiajun, et al.
Published: (2025)
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
by: Zhang, Shiyi, et al.
Published: (2024)
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
by: Bai, Sule, et al.
Published: (2024)
Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution
by: Li, Zhiheng, et al.
Published: (2024)
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
by: Zhang, Shiyi, et al.
Published: (2024)
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
by: Wang, Bin, et al.
Published: (2024)
JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search
by: Zou, Dongyun, et al.
Published: (2026)
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
by: Fang, Zhengyao, et al.
Published: (2026)
DriveVGGT: Calibration-Constrained Visual Geometry Transformers for Multi-Camera Autonomous Driving
by: Jia, Xiaosong, et al.
Published: (2025)
Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration
by: Zeng, Fanhu, et al.
Published: (2025)
VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer
by: Tang, Xikai, et al.
Published: (2025)
Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline
by: Zhao, Linqing, et al.
Published: (2025)
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments
by: Zhang, Chubin, et al.
Published: (2023)
Segment and Caption Anything
by: Huang, Xiaoke, et al.
Published: (2023)
Test-time Sparsity for Extreme Fast Action Diffusion
by: Ji, Kangye, et al.
Published: (2026)
XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression
by: Su, Zunhai, et al.
Published: (2026)