:: Library Catalog

Bibliographic Details
Main Authors:	Winterbottom, Thomas; Hudson, G. Thomas; Kluvanec, Daniel; Slack, Dean; Sterling, Jamie; Shentu, Junjie; Xiao, Chenghao; Zhou, Zheming; Moubayed, Noura Al
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning; 68T45; I.2.6; I.2.10
Online Access:	https://arxiv.org/abs/2405.17450

Similar Items

Tricks and Plug-ins for Gradient Boosting in Image Classification
by: Fang, Biyi, et al.
Published: (2025)

FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models
by: Chen, Kewei, et al.
Published: (2025)

ATAAT: Adaptive Threat-Aware Adversarial Tuning Framework against Backdoor Attacks on Vision-Language-Action Models
by: Chen, Kewei, et al.
Published: (2026)

VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
by: Liao, Xinyao, et al.
Published: (2025)

Unpacking Hateful Memes: Presupposed Context and False Claims
by: Cai, Weibin, et al.
Published: (2025)

CulinaryCut-VLAP: A Vision-Language-Action-Physics Framework for Food Cutting via a Force-Aware Material Point Method
by: Koh, Hyunseo, et al.
Published: (2026)

Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization
by: Tu, Songjun, et al.
Published: (2025)

Training a Student Expert via Semi-Supervised Foundation Model Distillation
by: Taghavi, Pardis, et al.
Published: (2026)

SemanticFeels: Semantic Labeling during In-Hand Manipulation
by: Khalil, Anas Al Shikh, et al.
Published: (2026)

Physics-informed Variational Autoencoders for Improved Robustness to Environmental Factors of Variation
by: Thoreau, Romain, et al.
Published: (2022)

Predictive Modeling of Maritime Radar Data Using Transformer Architecture
by: Qesaraku, Bjorna, et al.
Published: (2025)

Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
by: Marian, Vasile, et al.
Published: (2026)

Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
by: Jung, Seoik, et al.
Published: (2025)

Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
by: Moore, Alexander, et al.
Published: (2025)

Cooperative Perception: A Resource-Efficient Framework for Multi-Drone 3D Scene Reconstruction Using Federated Diffusion and NeRF
by: Pourmandi, Massoud
Published: (2025)

Visible and Hyperspectral Imaging for Quality Assessment of Milk: Property Characterisation and Identification
by: Martinelli, Massimo, et al.
Published: (2026)

Balanced conic rectified flow
by: Kim, Shin Seong, et al.
Published: (2025)

AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models
by: Zarei, Mohammad, et al.
Published: (2025)

Method of UAV Inspection of Photovoltaic Modules Using Thermal and RGB Data Fusion
by: Lysyi, Andrii, et al.
Published: (2025)

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
by: Kim, Soyeon, et al.
Published: (2026)

Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
by: Kim, Soyeon, et al.
Published: (2026)

Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair
by: Rajput, Vishal
Published: (2026)

Rethinking Visual Intelligence: Insights from Video Pretraining
by: Acuaviva, Pablo, et al.
Published: (2025)

Multimodal Generative AI for Story Point Estimation in Software Development
by: Islam, Mohammad Rubyet, et al.
Published: (2025)

Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning
by: Tran, Viet Anh Khoa, et al.
Published: (2025)

IDOL: Instant Photorealistic 3D Human Creation from a Single Image
by: Zhuang, Yiyu, et al.
Published: (2024)

Sat-JEPA-Diff: Bridging Self-Supervised Learning and Generative Diffusion for Remote Sensing
by: Komurcu, Kursat, et al.
Published: (2026)

A Survey on Vision-Language-Action Models for Embodied AI
by: Ma, Yueen, et al.
Published: (2024)

ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
by: Nauen, Tobias Christian, et al.
Published: (2025)

Akasha 2: Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architectur
by: Meziani, Yani
Published: (2026)

Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
by: Gkountouras, John, et al.
Published: (2025)

Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests
by: Sáez, Arnau Igualde, et al.
Published: (2025)

Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs
by: Brothers, Greyson
Published: (2025)

Complex Facial Expression Recognition Using Deep Knowledge Distillation of Basic Features
by: Maiden, Angus, et al.
Published: (2023)

ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
by: Syed, Shahram Najam, et al.
Published: (2025)

TextTeacher: What Can Language Teach About Images?
by: Nauen, Tobias Christian, et al.
Published: (2026)

TACIT: Transformation-Aware Capturing of Implicit Thought
by: Nobrega, Daniel
Published: (2026)

Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models
by: Popov, Alexander, et al.
Published: (2024)

Salient Concept-Aware Generative Data Augmentation
by: Zhao, Tianchen, et al.
Published: (2025)

Hateful Meme Detection through Context-Sensitive Prompting and Fine-Grained Labeling
by: Ouyang, Rongxin, et al.
Published: (2024)