Library Catalog

Bibliographic Details
Main Authors:	Guruprasad, Pranav; Chowdhury, Sudipta; Sikka, Harsh; Sharma, Mridul; Lu, Helen; Rivera, Sean; Khurana, Aryan; Ren, Hangliang; Wang, Yangyue
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2512.11315

Similar Items

Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
by: Guruprasad, Pranav, et al.
Published: (2025)

An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
by: Guruprasad, Pranav, et al.
Published: (2025)

Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks
by: Guruprasad, Pranav, et al.
Published: (2024)

GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models
by: Wang, Yangyue, et al.
Published: (2026)

Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators
by: Sawhney, Medha, et al.
Published: (2025)

MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
by: Sharma, Akshat, et al.
Published: (2024)

Development of Pre-Trained Transformer-based Models for the Nepali Language
by: Thapa, Prajwal, et al.
Published: (2024)

TextAge: A Curated and Diverse Text Dataset for Age Classification
by: Cheekati, Shravan, et al.
Published: (2024)

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)

TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
by: Monsefi, Amin Karimi, et al.
Published: (2025)

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
by: Chen, Yangyi, et al.
Published: (2023)

KULCQ: An Unsupervised Keyword-based Utterance Level Clustering Quality Metric
by: Guruprasad, Pranav, et al.
Published: (2024)

Grounding Multimodal Large Language Models in Actions
by: Szot, Andrew, et al.
Published: (2024)

Uncertainty Quantification of Large Language Models using Approximate Bayesian Computation
by: Sharma, Mridul, et al.
Published: (2025)

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
by: Driess, Danny, et al.
Published: (2025)

FAST: Efficient Action Tokenization for Vision-Language-Action Models
by: Pertsch, Karl, et al.
Published: (2025)

RE-RFME: Real-Estate RFME Model for customer segmentation
by: Pandey, Anurag Kumar, et al.
Published: (2024)

Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
by: Hancock, Asher J., et al.
Published: (2024)

Causal Reflection with Language Models
by: Aryan, Abi, et al.
Published: (2025)

$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
by: Physical Intelligence, et al.
Published: (2025)

Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models
by: Zhang, Zhilong, et al.
Published: (2026)

Confidence Calibration in Vision-Language-Action Models
by: Zollo, Thomas P, et al.
Published: (2025)

Jill Watson: A Virtual Teaching Assistant powered by ChatGPT
by: Taneja, Karan, et al.
Published: (2024)

Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks
by: Shandilya, Utkarsh, et al.
Published: (2025)

H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models
by: Dawes, Cutter, et al.
Published: (2026)

Surveying Facial Recognition Models for Diverse Indian Demographics: A Comparative Analysis on LFW and Custom Dataset
by: Pant, Pranav, et al.
Published: (2024)

MEM: Multi-Scale Embodied Memory for Vision Language Action Models
by: Torne, Marcel, et al.
Published: (2026)

Learning POMDP World Models from Observations with Language-Model Priors
by: Six, Valentin, et al.
Published: (2026)

APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model
by: Lu, Yuanjie, et al.
Published: (2026)

DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models
by: Tiwari, Utkarsh, et al.
Published: (2025)

Characterizing Paraphrase-Induced Failures in Lean 4 Autoformalization
by: Feng, William, et al.
Published: (2026)

World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems
by: Li, Runze, et al.
Published: (2026)

$π_0$: A Vision-Language-Action Flow Model for General Robot Control
by: Black, Kevin, et al.
Published: (2024)

Tactile-VLA: Unlocking Vision-Language-Action Model's Physical Knowledge for Tactile Generalization
by: Huang, Jialei, et al.
Published: (2025)

Curriculum-Guided Reinforcement Learning for Synthesizing Gas-Efficient Financial Derivatives Contracts
by: Mridul, Maruf Ahmed, et al.
Published: (2025)

A Hierarchical Language Model For Interpretable Graph Reasoning
by: Khurana, Sambhav, et al.
Published: (2024)

villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
by: Chen, Xiaoyu, et al.
Published: (2025)

PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction
by: Shrimal, Anubhav, et al.
Published: (2025)

The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models
by: Jeong, Daniel P., et al.
Published: (2024)

Generative Kaleidoscopic Networks
by: Shrivastava, Harsh
Published: (2024)