Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yuguang; Fan, Qihang; Huang, Huaibo
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2405.13335

Similar Items

Advancing Vision Transformer with Enhanced Spatial Priors
by: Fan, Qihang, et al.
Published: (2026)

Lightweight Vision Transformer with Bidirectional Interaction
by: Fan, Qihang, et al.
Published: (2023)

RMT: Retentive Networks Meet Vision Transformers
by: Fan, Qihang, et al.
Published: (2023)

Random Wins All: Rethinking Grouping Strategies for Vision Tokens
by: Fan, Qihang, et al.
Published: (2026)

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
by: Fan, Qihang, et al.
Published: (2024)

Breaking the Low-Rank Dilemma of Linear Attention
by: Fan, Qihang, et al.
Published: (2024)

ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)

Rectifying Magnitude Neglect in Linear Attention
by: Fan, Qihang, et al.
Published: (2025)

Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention
by: Ai, Yuang, et al.
Published: (2025)

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
by: Ai, Yuang, et al.
Published: (2025)

Vision Transformer with Super Token Sampling
by: Huang, Huaibo, et al.
Published: (2022)

Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
by: Ge, Shiran, et al.
Published: (2025)

Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
by: Chen, Mingrui, et al.
Published: (2025)

Straighter Flow Matching via a Diffusion-Based Coupling Prior
by: Xing, Siyu, et al.
Published: (2023)

Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
by: Ai, Yuang, et al.
Published: (2023)

DeVAn: Dense Video Annotation for Video-Language Models
by: Liu, Tingkai, et al.
Published: (2023)

LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2024)

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
by: Peng, Qihang, et al.
Published: (2025)

PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
by: Zhang, Tianhao, et al.
Published: (2024)

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
by: Han, Xiaotian, et al.
Published: (2024)

ZePo: Zero-Shot Portrait Stylization with Faster Sampling
by: Liu, Jin, et al.
Published: (2024)

NOFT: Test-Time Noise Finetune via Information Bottleneck for Highly Correlated Asset Creation
by: Li, Jia, et al.
Published: (2025)

Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers
by: Zhang, Siyu
Published: (2026)

SparseLaneSTP: Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
by: Pittner, Maximilian, et al.
Published: (2026)

Learning Priors of Human Motion With Vision Transformers
by: Falqueto, Placido, et al.
Published: (2025)

AutoMoT: A Unified Vision-Language-Action Model with Asynchronous Mixture-of-Transformers for End-to-End Autonomous Driving
by: Huang, Wenhui, et al.
Published: (2026)

Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
by: Liu, Ting, et al.
Published: (2024)

Jailbreaks on Vision Language Model via Multimodal Reasoning
by: Noheria, Aarush, et al.
Published: (2026)

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
by: Ai, Yuang, et al.
Published: (2023)

Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
by: Sun, Jiayang, et al.
Published: (2025)

Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
by: Chen, Mingrui, et al.
Published: (2026)

InfoBFR: Real-World Blind Face Restoration via Information Bottleneck
by: Gao, Nan, et al.
Published: (2025)

Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification
by: Wang, Zi, et al.
Published: (2022)

On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery
by: Sheffield, BW, et al.
Published: (2024)

Advancing Structured Priors for Sparse-Voxel Surface Reconstruction
by: Chi, Ting-Hsun, et al.
Published: (2026)

Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints
by: Zhu, Minghan, et al.
Published: (2025)

ViTamin: Designing Scalable Vision Models in the Vision-Language Era
by: Chen, Jieneng, et al.
Published: (2024)

Learning Photometric Feature Transform for Free-form Object Scan
by: Feng, Xiang, et al.
Published: (2023)

Evaluate Geometry of Radiance Fields with Low-frequency Color Prior
by: Fang, Qihang, et al.
Published: (2023)

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
by: Zhang, Yuan, et al.
Published: (2024)