Library Catalog

Bibliographic Details
Main Authors:	Khor, Yin-Loon, Wong, Yi-Jie, Hum, Yan Chai
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.03172

Similar Items

EffiPerception: an Efficient Framework for Various Perception Tasks
by: Xiang, Xinhao, et al.
Published: (2024)

EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
by: Wang, Zekun, et al.
Published: (2025)

REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation
by: Xue, Xizhe, et al.
Published: (2024)

DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
by: Huang, Brandon, et al.
Published: (2025)

Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
by: Tan, Jing Jie, et al.
Published: (2025)

GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
by: Cho, Seunghyuk, et al.
Published: (2025)

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
by: Zhang, Boqiang, et al.
Published: (2026)

Bodhi VLM: Privacy-Alignment Modeling for Hierarchical Visual Representations in Vision Backbones and VLM Encoders via Bottom-Up and Top-Down Feature Search
by: Ma, Bo, et al.
Published: (2026)

EffiVED:Efficient Video Editing via Text-instruction Diffusion Models
by: Zhang, Zhenghao, et al.
Published: (2024)

EffiComm: Bandwidth Efficient Multi Agent Communication
by: Yazgan, Melih, et al.
Published: (2025)

Dual Associated Encoder for Face Restoration
by: Tsai, Yu-Ju, et al.
Published: (2023)

MoiréNet: A Compact Dual-Domain Network for Image Demoiréing
by: Guo, Shuwei, et al.
Published: (2025)

Vehicle Detection Performance in Nordic Region
by: Mokayed, Hamam, et al.
Published: (2024)

Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework
by: Zhang, Tianpei, et al.
Published: (2025)

Towards Comprehensive Interactive Change Understanding in Remote Sensing: A Large-scale Dataset and Dual-granularity Enhanced VLM
by: Xue, Junxiao, et al.
Published: (2025)

DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
by: Singh, Aditya Kumar, et al.
Published: (2026)

A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM
by: Han, ByungOk, et al.
Published: (2024)

Is Micro-expression Ethnic Leaning?
by: Khor, Huai-Qian, et al.
Published: (2025)

Infused Suppression Of Magnification Artefacts For Micro-AU Detection
by: Khor, Huai-Qian, et al.
Published: (2025)

CogVLM: Visual Expert for Pretrained Language Models
by: Wang, Weihan, et al.
Published: (2023)

AGE-Net: Spectral–Spatial Fusion and Anatomical Graph Reasoning with Evidential Ordinal Regression for Knee Osteoarthritis Grading
by: Li, Xiaoyang, et al.
Published: (2026)

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
by: Han, Feng, et al.
Published: (2025)

SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
by: Jiang, Longtao, et al.
Published: (2024)

OmniSAT: Compact Action Token, Faster Auto Regression
by: Lyu, Huaihai, et al.
Published: (2025)

From Representational Complementarity to Dual Systems: Synergizing VLM and Vision-Only Backbones for End-to-End Driving
by: Ang, Sining, et al.
Published: (2026)

Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
by: Shou, Yuntao, et al.
Published: (2024)

Benchmarking and Enhancing VLM for Compressed Image Understanding
by: Zhang, Zifu, et al.
Published: (2025)

Precision Synthesis of Multi-Tracer PET via VLM-Modulated Rectified Flow for Stratifying Mild Cognitive Impairment
by: Liu, Tuo, et al.
Published: (2026)

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions
by: Chen, Ce, et al.
Published: (2026)

MiniMax-Remover: Taming Bad Noise Helps Video Object Removal
by: Zi, Bojia, et al.
Published: (2025)

CogVLM2: Visual Language Models for Image and Video Understanding
by: Hong, Wenyi, et al.
Published: (2024)

An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis
by: Wei, Yingchen, et al.
Published: (2024)

ThyroidEffi 1.0: A Cost-Effective System for High-Performance Multi-Class Thyroid Carcinoma Classification
by: Pham-Ngoc, Hai, et al.
Published: (2025)

SiMiC: Context-Aware Silicon Microstructure Characterization Using Attention-Based Convolutional Neural Networks for Field-Emission Tip Analysis
by: Tan, Jing Jie, et al.
Published: (2026)

Dual-Prompt CLIP with Hybrid Visual Encoders for Occluded Person Re-Identification
by: Ji, Zhangjian, et al.
Published: (2026)

FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder
by: Cheng, Zheng, et al.
Published: (2024)

Contrastive Pretraining with Dual Visual Encoders for Gloss-Free Sign Language Translation
by: Sincan, Ozge Mercanoglu, et al.
Published: (2025)

Language-Image Alignment with Fixed Text Encoders
by: Yang, Jingfeng, et al.
Published: (2025)

Q-VLM: Post-training Quantization for Large Vision-Language Models
by: Wang, Changyuan, et al.
Published: (2024)

Slot-VLM: SlowFast Slots for Video-Language Modeling
by: Xu, Jiaqi, et al.
Published: (2024)