| Main Authors: | Bhusal, Jatin, Mahatha, Nancy, Acharya, Aayush, Regmi, Raunak |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.26607 |
Similar Items
TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
by: Jana, Prithwish, et al.
Published: (2026)
Automated Bug Triaging using Instruction-Tuned Large Language Models
by: Kiashemshaki, Kiana, et al.
Published: (2025)
LLMs as Architects and Critics for Multi-Source Opinion Summarization
by: Attri, Anuj, et al.
Published: (2025)
PARNESS: A Paper Harness for End-to-End Automated Scientific Research with Dynamic Workflows, Full-Text Indexing, and Cross-Run Knowledge Accumulation
by: Wang, Yuchen, et al.
Published: (2026)
CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research
by: Savenkov, Vladislav
Published: (2026)
ContractBench: Can LLM Agents Preserve Observation Contracts?
by: Wang, Jicheng, et al.
Published: (2026)
IntelliCode: A Multi-Agent LLM Tutoring System with Centralized Learner Modeling
by: David, Jones, et al.
Published: (2025)
Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce
by: Attri, Arnav, et al.
Published: (2025)
UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop
by: Shafique, Muhammad Ali, et al.
Published: (2026)
LLMCup: Ranking-Enhanced Comment Updating with LLMs
by: Ge, Hua, et al.
Published: (2025)
Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent
by: Xia, Bowei, et al.
Published: (2026)
Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark
by: Bayram, M. Ali, et al.
Published: (2025)
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
by: Tanjim, Md Mehrab, et al.
Published: (2026)
Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach
by: Nguyen, Quang-Dung, et al.
Published: (2025)
The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking
by: Palacios, Diego Cabezas
Published: (2026)
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)
EyeLayer: Integrating Human Attention Patterns into LLM-Based Code Summarization
by: Zhang, Jiahao, et al.
Published: (2026)
Predictive Analytics for Collaborators Answers, Code Quality, and Dropout on Stack Overflow
by: Zolduoarrati, Elijah, et al.
Published: (2025)
The Syntactic Acceptability Dataset (Preview): A Resource for Machine Learning and Linguistic Analysis of English
by: Juzek, Tom S
Published: (2025)
Tokens with Meaning: A Hybrid Tokenization Approach for Turkish
by: Bayram, M. Ali, et al.
Published: (2025)
Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment
by: Burleigh, Tyler
Published: (2026)
SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation
by: Chen, Mu-Chi, et al.
Published: (2026)
Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures
by: Oskooei, Amirkia Rafiei, et al.
Published: (2025)
5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage
by: Ari, Ugur
Published: (2025)
Generative AI and the Transformation of Software Development Practices
by: Acharya, Vivek
Published: (2025)
Thinking Machines: Mathematical Reasoning in the Age of LLMs
by: Asperti, Andrea, et al.
Published: (2025)
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
by: Mazaheri, Parsa, et al.
Published: (2026)
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
by: Agrawal, Lakshya A, et al.
Published: (2025)
EdgeJury: Cross-Reviewed Small-Model Ensembles for Truthful Question Answering on Serverless Edge Inference
by: Kumar, Aayush
Published: (2025)
TRACE: A taxonomy-grounded synthetic dataset for teaching-program generation and session interpretation in Applied Behavior Analysis
by: Kahunla, Festus
Published: (2026)
Benchmarking Educational LLMs with Analytics: A Case Study on Gender Bias in Feedback
by: Du, Yishan, et al.
Published: (2025)
VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs
by: Daneshvar, Seyed Shayan, et al.
Published: (2024)
Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance
by: Kubica, Dominick, et al.
Published: (2025)
Automated Circuit Interpretation via Probe Prompting
by: Birardi, Giuseppe
Published: (2025)
IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text
by: Pall, Rajveer Singh
Published: (2026)
Low-Resource English-Tigrinya MT: Leveraging Multilingual Models, Custom Tokenizers, and Clean Evaluation Benchmarks
by: Teklehaymanot, Hailay Kidu, et al.
Published: (2025)
From Scientific Texts to Verifiable Code: Automating the Process with Transformers
by: Wang, Changjie, et al.
Published: (2025)
Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
by: Fadli, Samih
Published: (2025)
Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
by: Rath, Plawan Kumar, et al.
Published: (2026)
Fact Grounded Attention: Eliminating Hallucination in Large Language Models Through Attention Level Knowledge Integration
by: Gupta, Aayush
Published: (2025)