| Main Authors: | Khor, Yin-Loon, Wong, Yi-Jie, Hum, Yan Chai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.03172 |
Similar Items
EffiPerception: an Efficient Framework for Various Perception Tasks
by: Xiang, Xinhao, et al.
Published: (2024)
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
by: Wang, Zekun, et al.
Published: (2025)
REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation
by: Xue, Xizhe, et al.
Published: (2024)
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
by: Huang, Brandon, et al.
Published: (2025)
Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
by: Tan, Jing Jie, et al.
Published: (2025)
GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
by: Cho, Seunghyuk, et al.
Published: (2025)
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
by: Zhang, Boqiang, et al.
Published: (2026)
Bodhi VLM: Privacy-Alignment Modeling for Hierarchical Visual Representations in Vision Backbones and VLM Encoders via Bottom-Up and Top-Down Feature Search
by: Ma, Bo, et al.
Published: (2026)
EffiVED: Efficient Video Editing via Text-instruction Diffusion Models
by: Zhang, Zhenghao, et al.
Published: (2024)
EffiComm: Bandwidth Efficient Multi Agent Communication
by: Yazgan, Melih, et al.
Published: (2025)
Dual Associated Encoder for Face Restoration
by: Tsai, Yu-Ju, et al.
Published: (2023)
MoiréNet: A Compact Dual-Domain Network for Image Demoiréing
by: Guo, Shuwei, et al.
Published: (2025)
Vehicle Detection Performance in Nordic Region
by: Mokayed, Hamam, et al.
Published: (2024)
Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework
by: Zhang, Tianpei, et al.
Published: (2025)
Towards Comprehensive Interactive Change Understanding in Remote Sensing: A Large-scale Dataset and Dual-granularity Enhanced VLM
by: Xue, Junxiao, et al.
Published: (2025)
DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM
by: Han, ByungOk, et al.
Published: (2024)
Is Micro-expression Ethnic Leaning?
by: Khor, Huai-Qian, et al.
Published: (2025)
Infused Suppression Of Magnification Artefacts For Micro-AU Detection
by: Khor, Huai-Qian, et al.
Published: (2025)
CogVLM: Visual Expert for Pretrained Language Models
by: Wang, Weihan, et al.
Published: (2023)
AGE-Net: Spectral-Spatial Fusion and Anatomical Graph Reasoning with Evidential Ordinal Regression for Knee Osteoarthritis Grading
by: Li, Xiaoyang, et al.
Published: (2026)
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
by: Han, Feng, et al.
Published: (2025)
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
by: Jiang, Longtao, et al.
Published: (2024)
OmniSAT: Compact Action Token, Faster Auto Regression
by: Lyu, Huaihai, et al.
Published: (2025)
From Representational Complementarity to Dual Systems: Synergizing VLM and Vision-Only Backbones for End-to-End Driving
by: Ang, Sining, et al.
Published: (2026)
Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
by: Shou, Yuntao, et al.
Published: (2024)
Benchmarking and Enhancing VLM for Compressed Image Understanding
by: Zhang, Zifu, et al.
Published: (2025)
Precision Synthesis of Multi-Tracer PET via VLM-Modulated Rectified Flow for Stratifying Mild Cognitive Impairment
by: Liu, Tuo, et al.
Published: (2026)
TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions
by: Chen, Ce, et al.
Published: (2026)
MiniMax-Remover: Taming Bad Noise Helps Video Object Removal
by: Zi, Bojia, et al.
Published: (2025)
CogVLM2: Visual Language Models for Image and Video Understanding
by: Hong, Wenyi, et al.
Published: (2024)
An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis
by: Wei, Yingchen, et al.
Published: (2024)
ThyroidEffi 1.0: A Cost-Effective System for High-Performance Multi-Class Thyroid Carcinoma Classification
by: Pham-Ngoc, Hai, et al.
Published: (2025)
SiMiC: Context-Aware Silicon Microstructure Characterization Using Attention-Based Convolutional Neural Networks for Field-Emission Tip Analysis
by: Tan, Jing Jie, et al.
Published: (2026)
Dual-Prompt CLIP with Hybrid Visual Encoders for Occluded Person Re-Identification
by: Ji, Zhangjian, et al.
Published: (2026)
FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder
by: Cheng, Zheng, et al.
Published: (2024)
Contrastive Pretraining with Dual Visual Encoders for Gloss-Free Sign Language Translation
by: Sincan, Ozge Mercanoglu, et al.
Published: (2025)
Language-Image Alignment with Fixed Text Encoders
by: Yang, Jingfeng, et al.
Published: (2025)
Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)
Slot-VLM: SlowFast Slots for Video-Language Modeling
by: Xu, Jiaqi, et al.
Published: (2024)