| Main Authors: | Yang, Panqi, Jing, Haodong, Chao, Jiahao, Xiang, Tingyan, Lin, Li, Hu, Yao, Luo, Yang, Ma, Yongqiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.05646 |
Similar Items
UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space
by: Yang, Panqi, et al.
Published: (2025)
Rethinking Visual Token Reduction in LVLMs Under Cross-Modal Misalignment
by: Xu, Rui, et al.
Published: (2025)
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning
by: Zhang, Xiao, et al.
Published: (2025)
See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI
by: Liu, Yulong, et al.
Published: (2024)
FlexMUSE: Multimodal Unification and Semantics Enhancement Framework with Flexible interaction for Creative Writing
by: Chen, Jiahao, et al.
Published: (2025)
Adversarial Examples are Misaligned in Diffusion Model Manifolds
by: Lorenz, Peter, et al.
Published: (2024)
VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling
by: Yang, Sicheng, et al.
Published: (2025)
ViRED: Prediction of Visual Relations in Engineering Drawings
by: Gu, Chao, et al.
Published: (2024)
Resolving Primitive-Sharing Ambiguity in Long-Tailed Industrial Point Cloud Segmentation via Spatial Context Constraints
by: Yin, Chao, et al.
Published: (2026)
Misalignment-Robust Frequency Distribution Loss for Image Transformation
by: Ni, Zhangkai, et al.
Published: (2024)
MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification
by: Yang, Zijiang, et al.
Published: (2025)
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
by: Li, Duo, et al.
Published: (2025)
IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
by: Sun, Zhichao, et al.
Published: (2026)
ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations
by: Wu, Zhiyuan, et al.
Published: (2025)
Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning
by: Yang, Yiming, et al.
Published: (2025)
TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression
by: Zeng, Sen, et al.
Published: (2026)
Collaborative Hybrid Propagator for Temporal Misalignment in Audio-Visual Segmentation
by: Li, Kexin, et al.
Published: (2024)
MUSE: Manipulating Unified Framework for Synthesizing Emotions in Images via Test-Time Optimization
by: Xia, Yingjie, et al.
Published: (2025)
MUSE: Harnessing Precise and Diverse Semantics for Few-Shot Whole Slide Image Classification
by: Xu, Jiahao, et al.
Published: (2026)
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
by: Ma, Tianren, et al.
Published: (2024)
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
by: Xie, Yin, et al.
Published: (2024)
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
by: Wang, Chao, et al.
Published: (2025)
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
by: Ma, Fan, et al.
Published: (2023)
QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA
by: Li, Shuai, et al.
Published: (2025)
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
by: Tan, Xudong, et al.
Published: (2025)
Subjective Visual Quality Assessment for High-Fidelity Learning-Based Image Compression
by: Jenadeleh, Mohsen, et al.
Published: (2025)
TCFormer: Visual Recognition via Token Clustering Transformer
by: Zeng, Wang, et al.
Published: (2024)
TopoGaussian: Inferring Internal Topology Structures from Visual Clues
by: Xiong, Xiaoyu, et al.
Published: (2025)
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation
by: Zhang, Hao, et al.
Published: (2024)
Dissecting Representation Misalignment in Contrastive Learning via Influence Function
by: Hu, Lijie, et al.
Published: (2024)
TokBench: Evaluating Your Visual Tokenizer before Visual Generation
by: Wu, Junfeng, et al.
Published: (2025)
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
by: Wang, Yuqing, et al.
Published: (2025)
Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification
by: Jin, Xin, et al.
Published: (2026)
Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation
by: Zhou, Jinxing, et al.
Published: (2025)
No Cache Left Idle: Accelerating diffusion model via Extreme-slimming Caching
by: Wen, Tingyan, et al.
Published: (2025)
Topology-Aware Skeleton Detection via Lighthouse-Guided Structured Inference
by: Fu, Daoyong, et al.
Published: (2026)
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
by: Yin, Yuanyang, et al.
Published: (2024)
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
by: Jiao, Yang, et al.
Published: (2025)
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
by: Wang, Yuqing, et al.
Published: (2026)
WinTok: A Win-Win Hybrid Tokenizer via Decomposing Visual Understanding and Generation with Transferable Tokens
by: Guo, Yiwei, et al.
Published: (2026)