:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Zhihao; Qiu, Xi; Ma, Yukuo; Zhou, Yifu; Chen, Junjie; Zhang, Hongyuan; Zhang, Chi; Li, Xuelong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; 68T07; I.2.10; I.2.6
Online Access:	https://arxiv.org/abs/2503.07076

Similar Items

Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
by: Tu, Songjun, et al.
Published: (2025)

Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
by: Moore, Alexander, et al.
Published: (2025)

CaLoRAify: Calorie Estimation with Visual-Text Pairing and LoRA-Driven Visual Language Models
by: Yao, Dongyu, et al.
Published: (2024)

Unpacking Hateful Memes: Presupposed Context and False Claims
by: Cai, Weibin, et al.
Published: (2025)

GLL: A Differentiable Graph Learning Layer for Neural Networks
by: Brown, Jason, et al.
Published: (2024)

Ultrahigh-Q chiral resonances empowered by multi-head attention deep learning
by: Zhang, Cong, et al.
Published: (2025)

Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF
by: Pourmandi, Massoud
Published: (2025)

Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)

When Does Global Attention Help? A Unified Empirical Study on Atomistic Graph Learning
by: Chowdhury, Arindam, et al.
Published: (2025)

Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
by: Jung, Seoik, et al.
Published: (2025)

To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills
by: Mazzola, Carlo, et al.
Published: (2023)

An Analysis of Layer-Freezing Strategies for Enhanced Transfer Learning in YOLO Architectures
by: Dobrzycki, Andrzej D., et al.
Published: (2025)

Think, Act, Learn: A Framework for Autonomous Robotic Agents using Closed-Loop Large Language Models
by: Menon, Anjali R., et al.
Published: (2025)

Balanced conic rectified flow
by: Kim, Shin Seong, et al.
Published: (2025)

Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention
by: Roffo, Giorgio, et al.
Published: (2026)

IDOL: Instant Photorealistic 3D Human Creation from a Single Image
by: Zhuang, Yiyu, et al.
Published: (2024)

MRI Brain Tumor Detection with Computer Vision
by: Krolik, Jack, et al.
Published: (2025)

VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
by: Liao, Xinyao, et al.
Published: (2025)

Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations
by: Kharyuk, Pavel, et al.
Published: (2025)

Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry
by: Marcos-Manchón, Pablo, et al.
Published: (2026)

Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning
by: Sharma, Aditya, et al.
Published: (2025)

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
by: Kim, Soyeon, et al.
Published: (2026)

Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
by: Kim, Soyeon, et al.
Published: (2026)

Application of Sensitivity Analysis Methods for Studying Neural Network Models
by: Miao, Jiaxuan, et al.
Published: (2025)

Method of UAV Inspection of Photovoltaic Modules Using Thermal and RGB Data Fusion
by: Lysyi, Andrii, et al.
Published: (2025)

Multimodal Generative AI for Story Point Estimation in Software Development
by: Islam, Mohammad Rubyet, et al.
Published: (2025)

Rethinking Visual Intelligence: Insights from Video Pretraining
by: Acuaviva, Pablo, et al.
Published: (2025)

A Survey on Vision-Language-Action Models for Embodied AI
by: Ma, Yueen, et al.
Published: (2024)

Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning
by: Tran, Viet Anh Khoa, et al.
Published: (2025)

Tricks and Plug-ins for Gradient Boosting in Image Classification
by: Fang, Biyi, et al.
Published: (2025)

ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
by: Syed, Shahram Najam, et al.
Published: (2025)

Multi-Scale Graph Learning for Anti-Sparse Downscaling
by: Fan, Yingda, et al.
Published: (2025)

Low Dose CT for Stroke Diagnosis: A Dual Pipeline Deep Learning Framework for Portable Neuroimaging
by: Ghosal, Rhea, et al.
Published: (2026)

Sat-JEPA-Diff: Bridging Self-Supervised Learning and Generative Diffusion for Remote Sensing
by: Komurcu, Kursat, et al.
Published: (2026)

OR-VSKC: Resolving Visual-Semantic Knowledge Conflicts in Operating Rooms with Synthetic Data-Guided Alignment
by: Zhao, Weiyi, et al.
Published: (2025)

Akasha 2: Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architecture
by: Meziani, Yani
Published: (2026)

Task-Aligned Self-Supervised Learning for Medical Image Analysis: A Systematic Review and Practical Design Guidelines
by: Wimalasiri, Chathura
Published: (2026)

Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs
by: Brothers, Greyson
Published: (2025)

Enhancing Low-Altitude Airspace Security: MLLM-Enabled UAV Intent Recognition
by: Lei, Guangyu, et al.
Published: (2025)

APT: Adaptive Personalized Training for Diffusion Models with Limited Data
by: Chae, JungWoo, et al.
Published: (2025)