| Main Authors: | Yuvraj, Pritish; Devarakonda, Siva |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.18400 |
Similar Items
Benchmarking Harmonized Tariff Schedule Classification Models
by: Judy, Bryce
Published: (2024)
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
by: Bandarkar, Lucas, et al.
Published: (2024)
REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
by: Agrawal, Yuvraj
Published: (2026)
Benchmarking and Adapting On-Device LLMs for Clinical Decision Support
by: Munim, Alif, et al.
Published: (2025)
DistShap: Scalable GNN Explanations with Distributed Shapley Values
by: Akkas, Selahattin, et al.
Published: (2025)
A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions
by: Zhang, Yu, et al.
Published: (2026)
AdaptEval: A Benchmark for Evaluating Large Language Models on Code Snippet Adaptation
by: Zhang, Tanghaoran, et al.
Published: (2026)
RAIL in the Wild: Operationalizing Responsible AI Evaluation Using Anthropic's Value Dataset
by: Verma, Sumit, et al.
Published: (2025)
Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations
by: Rani, Pooja, et al.
Published: (2025)
Classification-Based Automatic HDL Code Generation Using LLMs
by: Sun, Wenhao, et al.
Published: (2024)
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code
by: Hua, Tianyu, et al.
Published: (2025)
Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges
by: Lee, Yunseo, et al.
Published: (2025)
Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages
by: Wu, Fan, et al.
Published: (2026)
PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs
by: Yadav, Ankit, et al.
Published: (2024)
Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters
by: Gurgurov, Daniil, et al.
Published: (2024)
Multimodal Approach for Harmonized System Code Prediction
by: Amel, Otmane, et al.
Published: (2024)
When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents
by: Kozak, Matous, et al.
Published: (2025)
ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training
by: Saxena, Shreya, et al.
Published: (2025)
Adapting LLMs to Time Series Forecasting via Temporal Heterogeneity Modeling and Semantic Alignment
by: Sun, Yanru, et al.
Published: (2025)
Benchmarking LLMs for Fine-Grained Code Review with Enriched Context in Practice
by: Hu, Ruida, et al.
Published: (2025)
Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering
by: Alebachew, Yoseph Berhanu, et al.
Published: (2026)
Harmonic LLMs are Trustworthy
by: Kersting, Nicholas S., et al.
Published: (2024)
EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
by: Naeem, Numaan, et al.
Published: (2025)
ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination
by: Papadakis, Charidimos, et al.
Published: (2025)
Adapting LLMs for Minimal-edit Grammatical Error Correction
by: Staruch, Ryszard, et al.
Published: (2025)
Are LLMs Ready for TOON? Benchmarking Structural Correctness-Sustainability Trade-offs in Novel Structured Output Formats
by: Masciari, Elio, et al.
Published: (2026)
Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation
by: Fang, Sen, et al.
Published: (2025)
Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs
by: Patel, Harsh, et al.
Published: (2024)
Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
by: Galimzyanov, Timur, et al.
Published: (2024)
SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing
by: Quan, Pengrui, et al.
Published: (2024)
Humans and LLMs Diverge on Probabilistic Inferences
by: Kamath, Gaurav, et al.
Published: (2026)
Feature Selection Empowered BERT for Detection of Hate Speech with Vocabulary Augmentation
by: Desai, Pritish N., et al.
Published: (2025)
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
by: Zala, Abhay, et al.
Published: (2024)
Rectifier: Code Translation with Corrector via LLMs
by: Yin, Xin, et al.
Published: (2024)
HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation
by: Chen, Qirui, et al.
Published: (2026)
Validate Your Authority: Benchmarking LLMs on Multi-Label Precedent Treatment Classification
by: Demir, M. Mikail, et al.
Published: (2026)
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective
by: Mai, Huu Tan, et al.
Published: (2024)
Quantum Artificial Intelligence for Mission-Critical Systems: Foundations, Architectural Elements, and Future Directions
by: Sai, Siva, et al.
Published: (2025)
Unsupervised Learning of Harmonic Analysis Based on Neural HSMM with Code Quality Templates
by: Uehara, Yui
Published: (2024)
SayCoNav: Utilizing Large Language Models for Adaptive Collaboration in Decentralized Multi-Robot Navigation
by: Rajvanshi, Abhinav, et al.
Published: (2025)