| Main Author: | Liang, Jiajia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.01864 |
Similar Items
U-Net-Like Spiking Neural Networks for Single Image Dehazing
by: Li, Huibin, et al.
Published: (2025)
FoR-Net: Learning to Focus on Hard Regions for Efficient Semantic Segmentation
by: Chan, Sheng-Wei, et al.
Published: (2026)
ViG-LRGC: Vision Graph Neural Networks with Learnable Reparameterized Graph Construction
by: Elsharkawi, Ismael, et al.
Published: (2025)
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
by: Wang, Chao, et al.
Published: (2025)
Watch Your Step: Information Injection in Diffusion Models via Shadow Timestep Embedding
by: Huang, An, et al.
Published: (2026)
BlindSight: Harnessing Sparsity for Efficient Vision-Language Models
by: Srikrishnan, Tharun Adithya, et al.
Published: (2025)
YotoR-You Only Transform One Representation
by: Villa, José Ignacio Díaz, et al.
Published: (2024)
Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset
by: Ranjan, Rahm, et al.
Published: (2024)
SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
by: Liu, Sicheng, et al.
Published: (2024)
Illumination and Shadows in Head Rotation: experiments with Denoising Diffusion Models
by: Asperti, Andrea, et al.
Published: (2023)
A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS
by: Terven, Juan, et al.
Published: (2023)
GeoHeight-Bench: Towards Height-Aware Multimodal Reasoning in Remote Sensing
by: Hu, Xuran, et al.
Published: (2026)
ERNet: Efficient Non-Rigid Registration Network for Point Sequences
by: He, Guangzhao, et al.
Published: (2025)
Hierarchical Feature-level Reverse Propagation for Post-Training Neural Networks
by: Ding, Ni, et al.
Published: (2025)
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
by: Han, Yudong, et al.
Published: (2024)
NAC-TCN: Temporal Convolutional Networks with Causal Dilated Neighborhood Attention for Emotion Understanding
by: Mehta, Alexander, et al.
Published: (2023)
CLIP-Joint-Detect: End-to-End Joint Training of Object Detectors with Contrastive Vision-Language Supervision
by: Raoufi, Behnam, et al.
Published: (2025)
PhysVideoGenerator: Towards Physically Aware Video Generation via Latent Physics Guidance
by: Satish, Siddarth Nilol Kundur, et al.
Published: (2026)
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)
More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage
by: He, Wei
Published: (2026)
Boundary-Protection W8A8 HiFloat8 Quantization for Large-Scale Text-to-Video Diffusion Transformers
by: Zhao, Yiming
Published: (2026)
Combined Hyperbolic and Euclidean Soft Triple Loss Beyond the Single Space Deep Metric Learning
by: Saeki, Shozo, et al.
Published: (2025)
A Vision-Language Model for Focal Liver Lesion Classification
by: Jian, Song, et al.
Published: (2025)
Learning Association via Track-Detection Matching for Multi-Object Tracking
by: Adžemović, Momir
Published: (2025)
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models
by: Dubois, Léa, et al.
Published: (2025)
H-FCBFormer Hierarchical Fully Convolutional Branch Transformer for Occlusal Contact Segmentation with Articulating Paper
by: Banks, Ryan, et al.
Published: (2024)
Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos
by: Li, Xiaoyang, et al.
Published: (2025)
A Recipe for Geometry-Aware 3D Mesh Transformers
by: Farazi, Mohammad, et al.
Published: (2024)
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
by: Wang, Chaoyi, et al.
Published: (2025)
VIAFormer: Voxel-Image Alignment Transformer for High-Fidelity Voxel Refinement
by: Fang, Tiancheng, et al.
Published: (2026)
Towards a Generalizable Fusion Architecture for Multimodal Object Detection
by: Berjawi, Jad, et al.
Published: (2025)
NumeriKontrol: Adding Numeric Control to Diffusion Transformers for Instruction-based Image Editing
by: Xu, Zhenyu, et al.
Published: (2025)
IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning
by: González, Abiam Remache, et al.
Published: (2025)
Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models
by: Freitas, Diogo, et al.
Published: (2025)
M3CAD: Towards Generic Cooperative Autonomous Driving Benchmark
by: Zhu, Morui, et al.
Published: (2025)
On the Limitations of Vision-Language Models in Understanding Image Transforms
by: Anis, Ahmad Mustafa, et al.
Published: (2025)
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
by: Castrillón-Santana, Modesto, et al.
Published: (2025)
Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation
by: Liu, Jianming, et al.
Published: (2025)
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
by: García, Carlos Vélez, et al.
Published: (2025)
A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction
by: Costea, Dragos, et al.
Published: (2025)