| Main Authors: | Guruprasad, Pranav, Chowdhury, Sudipta, Sikka, Harsh, Sharma, Mridul, Lu, Helen, Rivera, Sean, Khurana, Aryan, Ren, Hangliang, Wang, Yangyue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.11315 |
Similar Items
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
by: Guruprasad, Pranav, et al.
Published: (2025)
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
by: Guruprasad, Pranav, et al.
Published: (2025)
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks
by: Guruprasad, Pranav, et al.
Published: (2024)
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models
by: Wang, Yangyue, et al.
Published: (2026)
Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators
by: Sawhney, Medha, et al.
Published: (2025)
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
by: Sharma, Akshat, et al.
Published: (2024)
Development of Pre-Trained Transformer-based Models for the Nepali Language
by: Thapa, Prajwal, et al.
Published: (2024)
TextAge: A Curated and Diverse Text Dataset for Age Classification
by: Cheekati, Shravan, et al.
Published: (2024)
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
by: Monsefi, Amin Karimi, et al.
Published: (2025)
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
by: Chen, Yangyi, et al.
Published: (2023)
KULCQ: An Unsupervised Keyword-based Utterance Level Clustering Quality Metric
by: Guruprasad, Pranav, et al.
Published: (2024)
Grounding Multimodal Large Language Models in Actions
by: Szot, Andrew, et al.
Published: (2024)
Uncertainty Quantification of Large Language Models using Approximate Bayesian Computation
by: Sharma, Mridul, et al.
Published: (2025)
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
by: Driess, Danny, et al.
Published: (2025)
FAST: Efficient Action Tokenization for Vision-Language-Action Models
by: Pertsch, Karl, et al.
Published: (2025)
RE-RFME: Real-Estate RFME Model for customer segmentation
by: Pandey, Anurag Kumar, et al.
Published: (2024)
Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
by: Hancock, Asher J., et al.
Published: (2024)
Causal Reflection with Language Models
by: Aryan, Abi, et al.
Published: (2025)
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
by: Physical Intelligence, et al.
Published: (2025)
Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models
by: Zhang, Zhilong, et al.
Published: (2026)
Confidence Calibration in Vision-Language-Action Models
by: Zollo, Thomas P, et al.
Published: (2025)
Jill Watson: A Virtual Teaching Assistant powered by ChatGPT
by: Taneja, Karan, et al.
Published: (2024)
Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks
by: Shandilya, Utkarsh, et al.
Published: (2025)
H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models
by: Dawes, Cutter, et al.
Published: (2026)
Surveying Facial Recognition Models for Diverse Indian Demographics: A Comparative Analysis on LFW and Custom Dataset
by: Pant, Pranav, et al.
Published: (2024)
MEM: Multi-Scale Embodied Memory for Vision Language Action Models
by: Torne, Marcel, et al.
Published: (2026)
Learning POMDP World Models from Observations with Language-Model Priors
by: Six, Valentin, et al.
Published: (2026)
APPLV: Adaptive Planner Parameter Learning from Vision-Language-Action Model
by: Lu, Yuanjie, et al.
Published: (2026)
DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models
by: Tiwari, Utkarsh, et al.
Published: (2025)
Characterizing Paraphrase-Induced Failures in Lean 4 Autoformalization
by: Feng, William, et al.
Published: (2026)
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems
by: Li, Runze, et al.
Published: (2026)
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
by: Black, Kevin, et al.
Published: (2024)
Tactile-VLA: Unlocking Vision-Language-Action Model's Physical Knowledge for Tactile Generalization
by: Huang, Jialei, et al.
Published: (2025)
Curriculum-Guided Reinforcement Learning for Synthesizing Gas-Efficient Financial Derivatives Contracts
by: Mridul, Maruf Ahmed, et al.
Published: (2025)
A Hierarchical Language Model For Interpretable Graph Reasoning
by: Khurana, Sambhav, et al.
Published: (2024)
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
by: Chen, Xiaoyu, et al.
Published: (2025)
PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction
by: Shrimal, Anubhav, et al.
Published: (2025)
The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models
by: Jeong, Daniel P., et al.
Published: (2024)
Generative Kaleidoscopic Networks
by: Shrivastava, Harsh
Published: (2024)