:: Library Catalog

Bibliographic Details
Main Author:	Liang, Jiajia
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; I.2.10
Online Access:	https://arxiv.org/abs/2501.01864

Similar Items

U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)

FoR-Net: Learning to Focus on Hard Regions for Efficient Semantic Segmentation
by: Chan, Sheng-Wei, et al.
Published: (2026)

ViG-LRGC: Vision Graph Neural Networks with Learnable Reparameterized Graph Construction
by: Elsharkawi, Ismael, et al.
Published: (2025)

MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
by: Wang, Chao, et al.
Published: (2025)

Watch Your Step: Information Injection in Diffusion Models via Shadow Timestep Embedding
by: Huang, An, et al.
Published: (2026)

BlindSight: Harnessing Sparsity for Efficient Vision-Language Models
by: Srikrishnan, Tharun Adithya, et al.
Published: (2025)

YotoR-You Only Transform One Representation
by: Villa, José Ignacio Díaz, et al.
Published: (2024)

Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset
by: Ranjan, Rahm, et al.
Published: (2024)

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)

Illumination and Shadows in Head Rotation: experiments with Denoising Diffusion Models
by: Asperti, Andrea, et al.
Published: (2023)

A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
by: Terven, Juan, et al.
Published: (2023)

GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)

ERNet: Efficient Non-Rigid Registration Network for Point Sequences
by: He, Guangzhao, et al.
Published: (2025)

Hierarchical Feature-level Reverse Propagation for Post-Training Neural Networks
by: Ding, Ni, et al.
Published: (2025)

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
by: Han, Yudong, et al.
Published: (2024)

NAC-TCN: Temporal Convolutional Networks with Causal Dilated Neighborhood Attention for Emotion Understanding
by: Mehta, Alexander, et al.
Published: (2023)

CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)

PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)

More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage
by: He, Wei
Published: (2026)

Boundary-Protection W8A8 HiFloat8 Quantization for Large-Scale Text-to-Video Diffusion Transformers
by: Zhao, Yiming
Published: (2026)

Combined Hyperbolic and Euclidean Soft Triple Loss Beyond the Single Space Deep Metric Learning
by: Saeki, Shozo, et al.
Published: (2025)

A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)

Learning Association via Track-Detection Matching for Multi-Object Tracking
by: Adžemović, Momir
Published: (2025)

Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models
by: Dubois, Léa, et al.
Published: (2025)

H-FCBFormer Hierarchical Fully Convolutional Branch Transformer for Occlusal Contact Segmentation with Articulating Paper
by: Banks, Ryan, et al.
Published: (2024)

Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos
by: Li, Xiaoyang, et al.
Published: (2025)

A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)

OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)

VIAFormer: Voxel-Image Alignment Transformer for High-Fidelity Voxel Refinement
by: Fang, Tiancheng, et al.
Published: (2026)

Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)

NumeriKontrol: Adding Numeric Control to Diffusion Transformers for Instruction-based Image Editing
by: Xu, Zhenyu, et al.
Published: (2025)

IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)

Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models
by: Freitas, Diogo, et al.
Published: (2025)

M3CAD: Towards Generic Cooperative Autonomous Driving Benchmark
by: Zhu, Morui, et al.
Published: (2025)

On the Limitations of Vision-Language Models in Understanding Image Transforms
by: Anis, Ahmad Mustafa, et al.
Published: (2025)

An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
by: Castrillón-Santana, Modesto, et al.
Published: (2025)

Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation
by: Liu, Jianming, et al.
Published: (2025)

Escaping The Big Data Paradigm in Self-Supervised Representation Learning
by: García, Carlos Vélez, et al.
Published: (2025)

A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction
by: Costea, Dragos, et al.
Published: (2025)