:: Library Catalog

Bibliographic Details
Main Authors:	A., Eshwar R.; Pal, Debnath
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.14631

Similar Items

Force Matching with Relativistic Constraints: A Physics-Inspired Approach to Stable and Efficient Generative Modeling
by: Cao, Yang, et al.
Published: (2025)

ActionParty: Multi-Subject Action Binding in Generative Video Games
by: Pondaven, Alexander, et al.
Published: (2026)

Generative Image as Action Models
by: Shridhar, Mohit, et al.
Published: (2024)

Zero-Shot Action Generalization with Limited Observations
by: Alchihabi, Abdullah, et al.
Published: (2025)

Task-conditioned Ensemble of Expert Models for Continuous Learning
by: Sharma, Renu, et al.
Published: (2025)

Towards Generalizing Temporal Action Segmentation to Unseen Views
by: Bahrami, Emad, et al.
Published: (2025)

Enhancing Generalization in Vision-Language-Action Models by Preserving Pretrained Representations
by: Grover, Shresth, et al.
Published: (2025)

FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning
by: Yan, Hongwei, et al.
Published: (2026)

Generative AI in Depth: A Survey of Recent Advances, Model Variants, and Real-World Applications
by: Yazdani, Shamim, et al.
Published: (2025)

PhiNet v2: A Mask-Free Brain-Inspired Vision Foundation Model from Video
by: Yamada, Makoto, et al.
Published: (2025)

Communication-Inspired Tokenization for Structured Image Representations
by: Davtyan, Aram, et al.
Published: (2026)

MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models
by: Chakrabarty, Sayak, et al.
Published: (2024)

Vision-Language Models Unlock Task-Centric Latent Actions
by: Nikulin, Alexander, et al.
Published: (2026)

Olaf-World: Orienting Latent Actions for Video World Modeling
by: Jiang, Yuxin, et al.
Published: (2026)

Action-Agnostic Point-Level Supervision for Temporal Action Detection
by: Yoshida, Shuhei M., et al.
Published: (2024)

Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
by: Luo, Ge Ya, et al.
Published: (2024)

One-Frame Calibration with Siamese Network in Facial Action Unit Recognition
by: Feng, Shuangquan, et al.
Published: (2024)

Video Action Differencing
by: Burgess, James, et al.
Published: (2025)

From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
by: Zhang, Zhengshen, et al.
Published: (2025)

NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows
by: Tarasov, Denis, et al.
Published: (2025)

Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics
by: Bhattacharyya, Swarnava, et al.
Published: (2025)

Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI
by: Rutar, Danaja, et al.
Published: (2025)

Semantically Guided Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)

RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model
by: Alomar, Khaled, et al.
Published: (2024)

StarFlow: Generating Structured Workflow Outputs From Sketch Images
by: Bechard, Patrice, et al.
Published: (2025)

Hybrid Training for Vision-Language-Action Models
by: Mazzaglia, Pietro, et al.
Published: (2025)

SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders
by: Martinel, Niki, et al.
Published: (2024)

Skeleton-based Action Recognition with Non-linear Dependency Modeling and Hilbert-Schmidt Independence Criterion
by: Yang, Yuheng
Published: (2024)

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
by: Jiang, Jianping, et al.
Published: (2024)

A Survey on Efficient Vision-Language-Action Models
by: Yu, Zhaoshu, et al.
Published: (2025)

Interactive Post-Training for Vision-Language-Action Models
by: Tan, Shuhan, et al.
Published: (2025)

Compositional Entailment Learning for Hyperbolic Vision-Language Models
by: Pal, Avik, et al.
Published: (2024)

Feature Hallucination for Self-supervised Action Recognition
by: Wang, Lei, et al.
Published: (2025)

Evolving Skeletons: Motion Dynamics in Action Recognition
by: Qiu, Jushang, et al.
Published: (2025)

Semantically Guided Representation Learning For Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)

Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning
by: Zou, Heming, et al.
Published: (2025)

AdaWorld: Learning Adaptable World Models with Latent Actions
by: Gao, Shenyuan, et al.
Published: (2025)

Grounding Video Models to Actions through Goal Conditioned Exploration
by: Luo, Yunhao, et al.
Published: (2024)

Latent Action Learning Requires Supervision in the Presence of Distractors
by: Nikulin, Alexander, et al.
Published: (2025)

VISAGE: Video Synthesis using Action Graphs for Surgery
by: Yeganeh, Yousef, et al.
Published: (2024)