| Main Authors: | Huang, Zhihao, Qiu, Xi, Ma, Yukuo, Zhou, Yifu, Chen, Junjie, Zhang, Hongyuan, Zhang, Chi, Li, Xuelong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.07076 |
Similar Items
Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
by: Tu, Songjun, et al.
Published: (2025)
Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
by: Moore, Alexander, et al.
Published: (2025)
CaLoRAify: Calorie Estimation with Visual-Text Pairing and LoRA-Driven Visual Language Models
by: Yao, Dongyu, et al.
Published: (2024)
Unpacking Hateful Memes: Presupposed Context and False Claims
by: Cai, Weibin, et al.
Published: (2025)
GLL: A Differentiable Graph Learning Layer for Neural Networks
by: Brown, Jason, et al.
Published: (2024)
Ultrahigh-Q chiral resonances empowered by multi-head attention deep learning
by: Zhang, Cong, et al.
Published: (2025)
Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF
by: Pourmandi, Massoud
Published: (2025)
Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)
When Does Global Attention Help? A Unified Empirical Study on Atomistic Graph Learning
by: Chowdhury, Arindam, et al.
Published: (2025)
Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
by: Jung, Seoik, et al.
Published: (2025)
To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills
by: Mazzola, Carlo, et al.
Published: (2023)
An Analysis of Layer-Freezing Strategies for Enhanced Transfer Learning in YOLO Architectures
by: Dobrzycki, Andrzej D., et al.
Published: (2025)
Think, Act, Learn: A Framework for Autonomous Robotic Agents using Closed-Loop Large Language Models
by: Menon, Anjali R., et al.
Published: (2025)
Balanced conic rectified flow
by: Kim, Shin Seong, et al.
Published: (2025)
Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention
by: Roffo, Giorgio, et al.
Published: (2026)
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
by: Zhuang, Yiyu, et al.
Published: (2024)
MRI Brain Tumor Detection with Computer Vision
by: Krolik, Jack, et al.
Published: (2025)
VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
by: Liao, Xinyao, et al.
Published: (2025)
Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations
by: Kharyuk, Pavel, et al.
Published: (2025)
Platonic Representations in the Human Brain: Unsupervised Recovery of Universal Geometry
by: Marcos-Manchón, Pablo, et al.
Published: (2026)
Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning
by: Sharma, Aditya, et al.
Published: (2025)
Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
by: Kim, Soyeon, et al.
Published: (2026)
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
by: Kim, Soyeon, et al.
Published: (2026)
Application of Sensitivity Analysis Methods for Studying Neural Network Models
by: Miao, Jiaxuan, et al.
Published: (2025)
Method of UAV Inspection of Photovoltaic Modules Using Thermal and RGB Data Fusion
by: Lysyi, Andrii, et al.
Published: (2025)
Multimodal Generative AI for Story Point Estimation in Software Development
by: Islam, Mohammad Rubyet, et al.
Published: (2025)
Rethinking Visual Intelligence: Insights from Video Pretraining
by: Acuaviva, Pablo, et al.
Published: (2025)
A Survey on Vision-Language-Action Models for Embodied AI
by: Ma, Yueen, et al.
Published: (2024)
Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning
by: Tran, Viet Anh Khoa, et al.
Published: (2025)
Tricks and Plug-ins for Gradient Boosting in Image Classification
by: Fang, Biyi, et al.
Published: (2025)
ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
by: Syed, Shahram Najam, et al.
Published: (2025)
Multi-Scale Graph Learning for Anti-Sparse Downscaling
by: Fan, Yingda, et al.
Published: (2025)
Low Dose CT for Stroke Diagnosis: A Dual Pipeline Deep Learning Framework for Portable Neuroimaging
by: Ghosal, Rhea, et al.
Published: (2026)
Sat-JEPA-Diff: Bridging Self-Supervised Learning and Generative Diffusion for Remote Sensing
by: Komurcu, Kursat, et al.
Published: (2026)
OR-VSKC: Resolving Visual-Semantic Knowledge Conflicts in Operating Rooms with Synthetic Data-Guided Alignment
by: Zhao, Weiyi, et al.
Published: (2025)
Akasha 2: Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architectur
by: Meziani, Yani
Published: (2026)
Task-Aligned Self-Supervised Learning for Medical Image Analysis: A Systematic Review and Practical Design Guidelines
by: Wimalasiri, Chathura
Published: (2026)
Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs
by: Brothers, Greyson
Published: (2025)
Enhancing Low-Altitude Airspace Security: MLLM-Enabled UAV Intent Recognition
by: Lei, Guangyu, et al.
Published: (2025)
APT: Adaptive Personalized Training for Diffusion Models with Limited Data
by: Chae, JungWoo, et al.
Published: (2025)