| Main Authors: | Ding, Xi, Wang, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.13845 |
Similar Items
Learning Time in Static Classifiers
by: Ding, Xi, et al.
Published: (2025)
Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight
by: Ding, Xi, et al.
Published: (2024)
Subspace Kernel Learning on Tensor Sequences
by: Wang, Lei, et al.
Published: (2026)
Graph Your Own Prompt
by: Ding, Xi, et al.
Published: (2025)
Trust-Aware Joint Feature-Prediction Discrepancy for Robust Domain Adaptation
by: Ding, Xi, et al.
Published: (2026)
Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition
by: Wang, Shuoyuan, et al.
Published: (2023)
Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models
by: Rao, Abinav, et al.
Published: (2026)
Composition Vision-Language Understanding via Segment and Depth Anything Model
by: Huo, Mingxiao, et al.
Published: (2024)
PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
by: Li, Xiaolong, et al.
Published: (2025)
Video Understanding by Design: How Datasets Shape Architectures and Insights
by: Wang, Lei, et al.
Published: (2025)
Language-Image Models with 3D Understanding
by: Cho, Jang Hyun, et al.
Published: (2024)
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
by: Nguyen, Le Thien Phuc, et al.
Published: (2025)
Effortless Active Labeling for Long-Term Test-Time Adaptation
by: Wang, Guowei, et al.
Published: (2025)
Do Transformers Understand Ancient Roman Coin Motifs Better than CNNs?
by: Reid, David, et al.
Published: (2026)
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
by: Ming, Yifei, et al.
Published: (2024)
HyDRA: Hierarchical and Dynamic Rank Adaptation for Mobile Vision Language Model
by: Xi, Yuanhao, et al.
Published: (2025)
The Underappreciated Power of Vision Models for Graph Structural Understanding
by: Zhao, Xinjian, et al.
Published: (2025)
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
by: Zou, Bocheng, et al.
Published: (2024)
Adaptive Keyframe Sampling for Long Video Understanding
by: Tang, Xi, et al.
Published: (2025)
Doubly Debiased Test-Time Prompt Tuning for Vision-Language Models
by: Song, Fei, et al.
Published: (2025)
ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement
by: Rao, Zhefan, et al.
Published: (2024)
Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
by: Yang, Jingcheng, et al.
Published: (2026)
Tree of Attributes Prompt Learning for Vision-Language Models
by: Ding, Tong, et al.
Published: (2024)
About Time: Advances, Challenges, and Outlooks of Action Understanding
by: Stergiou, Alexandros, et al.
Published: (2024)
Towards Generalisable Time Series Understanding Across Domains
by: Turgut, Özgün, et al.
Published: (2024)
AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models
by: Zhang, Yabin, et al.
Published: (2024)
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
by: Sui, Elaine, et al.
Published: (2024)
Understanding the Effects of Distractors on Reasoning Vision-Language Models
by: Bae, Jiyun, et al.
Published: (2025)
OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding
by: Engelmann, Francis, et al.
Published: (2024)
Harnessing Vision-Language Models for Time Series Anomaly Detection
by: He, Zelin, et al.
Published: (2025)
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
by: Moshtaghi, Mehdi, et al.
Published: (2025)
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
by: Cai, Mu, et al.
Published: (2023)
Feedback-based Modal Mutual Search for Attacking Vision-Language Pre-training Models
by: Ding, Renhua, et al.
Published: (2024)
HourVideo: 1-Hour Video-Language Understanding
by: Chandrasegaran, Keshigeyan, et al.
Published: (2024)
Understanding the Fine-Grained Knowledge Capabilities of Vision-Language Models
by: Ghosh, Dhruba, et al.
Published: (2026)
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
by: Yao, Yang, et al.
Published: (2025)
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
by: Wang, Beichen, et al.
Published: (2024)
CaTS-Bench: Can Language Models Describe Time Series?
by: Zhou, Luca, et al.
Published: (2025)
How Do Vision-Language Models Process Conflicting Information Across Modalities?
by: Hua, Tianze, et al.
Published: (2025)
Can Large Language Models Understand Symbolic Graphics Programs?
by: Qiu, Zeju, et al.
Published: (2024)