Library Catalog

Bibliographic Details
Main Authors:	Gutiérrez, Juan, Gutiérrez-García, Victor, Blanco-Murillo, José Luis
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.06912

Similar Items

AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
by: Zhang, Junyang, et al.
Published: (2025)

AnchorDiff: Training-Free Concept Grounding for MM-DiTs via Anchor-Based Graph Propagation
by: Zhang, Jian, et al.
Published: (2026)

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
by: Salgado, Alberto G. Rodríguez
Published: (2026)

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
by: Wang, Hanyu, et al.
Published: (2024)

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
by: Kuang, Zhengfei, et al.
Published: (2024)

What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
by: Kang, Inha, et al.
Published: (2025)

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
by: Xu, Xuwei, et al.
Published: (2023)

Multi-level Matching Network for Multimodal Entity Linking
by: Hu, Zhiwei, et al.
Published: (2024)

PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning
by: Xi, Yingjie, et al.
Published: (2025)

Frequency-Aware Token Reduction for Efficient Vision Transformer
by: Lee, Dong-Jae, et al.
Published: (2025)

LaneDiffusion: Improving Centerline Graph Learning via Prior Injected BEV Feature Generation
by: Wang, Zijie, et al.
Published: (2025)

Auto-Regressive Surface Cutting
by: Li, Yang, et al.
Published: (2025)

VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference
by: Jiang, Pengfei, et al.
Published: (2025)

EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors
by: Zhang, Shikun, et al.
Published: (2025)

Token Pruning using a Lightweight Background Aware Vision Transformer
by: Sah, Sudhakar, et al.
Published: (2024)

Bi-Anchor Interpolation Solver for Accelerating Generative Modeling
by: Chen, Hongxu, et al.
Published: (2026)

Cut to the Chase: Training-free Multimodal Summarization via Chain-of-Events
by: You, Xiaoxing, et al.
Published: (2026)

Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior
by: Li, Yulin, et al.
Published: (2025)

Cut2Next: Generating Next Shot via In-Context Tuning
by: He, Jingwen, et al.
Published: (2025)

Integrating Prior Observations for Incremental 3D Scene Graph Prediction
by: Renz, Marian, et al.
Published: (2025)

MoCA-Video: Motion-Aware Concept Alignment for Consistent Video Editing
by: Zhang, Tong, et al.
Published: (2025)

QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
by: Wang, Xinhao, et al.
Published: (2026)

Panoramic Distortion-Aware Tokenization for Person Detection and Localization in Overhead Fisheye Images
by: Wakai, Nobuhiko, et al.
Published: (2025)

ZSPAPrune: Zero-Shot Prompt-Aware Token Pruning for Vision-Language Models
by: Zhang, Pu, et al.
Published: (2025)

PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models
by: Liu, Yingen, et al.
Published: (2024)

SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization
by: Tan, Zhentao, et al.
Published: (2024)

VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models
by: Ji, Huawei, et al.
Published: (2026)

Points-to-3D: Structure-Aware 3D Generation with Point Cloud Priors
by: Xia, Jiatong, et al.
Published: (2026)

PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
by: Lee, Seunggwan, et al.
Published: (2025)

TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation
by: Li, Ruineng, et al.
Published: (2025)

Monocular Normal Estimation via Shading Sequence Estimation
by: Li, Zongrui, et al.
Published: (2026)

Controllable Video Object Insertion via Multiview Priors
by: Qi, Xia, et al.
Published: (2026)

AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs
by: Zhang, Xinliang, et al.
Published: (2025)

Beyond Fixed Anchors: Precisely Erasing Concepts with Sibling Exclusive Counterparts
by: Zhang, Tong, et al.
Published: (2025)

See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation
by: Li, Yuejia, et al.
Published: (2026)

UnfoldLDM: Degradation-Aware Unfolding with Iterative Latent Diffusion Priors for Blind Image Restoration
by: He, Chunming, et al.
Published: (2025)

DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-light Enhancement and Deblurring
by: Wang, Ling, et al.
Published: (2024)

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
by: Zhang, Yue, et al.
Published: (2024)

Augmented Structure Preserving Neural Networks for cell biomechanics
by: Olalla-Pombo, Juan, et al.
Published: (2025)

VDInstruct: Zero-Shot Key Information Extraction via Content-Aware Vision Tokenization
by: Nguyen, Son, et al.
Published: (2025)