:: Library Catalog

Bibliographic Details
Main Authors:	Bhusal, Jatin; Mahatha, Nancy; Acharya, Aayush; Regmi, Raunak
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computers and Society; Software Engineering; K.3.1; I.2.7; I.2.6
Online Access:	https://arxiv.org/abs/2604.26607

Similar Items

TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
by: Jana, Prithwish, et al.
Published: (2026)

Automated Bug Triaging using Instruction-Tuned Large Language Models
by: Kiashemshaki, Kiana, et al.
Published: (2025)

LLMs as Architects and Critics for Multi-Source Opinion Summarization
by: Attri, Anuj, et al.
Published: (2025)

PARNESS: A Paper Harness for End-to-End Automated Scientific Research with Dynamic Workflows, Full-Text Indexing, and Cross-Run Knowledge Accumulation
by: Wang, Yuchen, et al.
Published: (2026)

CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research
by: Savenkov, Vladislav
Published: (2026)

ContractBench: Can LLM Agents Preserve Observation Contracts?
by: Wang, Jicheng, et al.
Published: (2026)

IntelliCode: A Multi-Agent LLM Tutoring System with Centralized Learner Modeling
by: David, Jones, et al.
Published: (2025)

Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce
by: Attri, Arnav, et al.
Published: (2025)

UrduBench: An Urdu Reasoning Benchmark using Contextually Ensembled Translations with Human-in-the-Loop
by: Shafique, Muhammad Ali, et al.
Published: (2026)

LLMCup: Ranking-Enhanced Comment Updating with LLMs
by: Ge, Hua, et al.
Published: (2025)

Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent
by: Xia, Bowei, et al.
Published: (2026)

Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark
by: Bayram, M. Ali, et al.
Published: (2025)

MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
by: Tanjim, Md Mehrab, et al.
Published: (2026)

Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach
by: Nguyen, Quang-Dung, et al.
Published: (2025)

The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking
by: Palacios, Diego Cabezas
Published: (2026)

AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)

EyeLayer: Integrating Human Attention Patterns into LLM-Based Code Summarization
by: Zhang, Jiahao, et al.
Published: (2026)

Predictive Analytics for Collaborators Answers, Code Quality, and Dropout on Stack Overflow
by: Zolduoarrati, Elijah, et al.
Published: (2025)

The Syntactic Acceptability Dataset (Preview): A Resource for Machine Learning and Linguistic Analysis of English
by: Juzek, Tom S
Published: (2025)

Tokens with Meaning: A Hybrid Tokenization Approach for Turkish
by: Bayram, M. Ali, et al.
Published: (2025)

Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment
by: Burleigh, Tyler
Published: (2026)

SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation
by: Chen, Mu-Chi, et al.
Published: (2026)

Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures
by: Oskooei, Amirkia Rafiei, et al.
Published: (2025)

5C Prompt Contracts: A Minimalist, Creative-Friendly, Token-Efficient Design Framework for Individual and SME LLM Usage
by: Ari, Ugur
Published: (2025)

Generative AI and the Transformation of Software Development Practices
by: Acharya, Vivek
Published: (2025)

Thinking Machines: Mathematical Reasoning in the Age of LLMs
by: Asperti, Andrea, et al.
Published: (2025)

AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
by: Mazaheri, Parsa, et al.
Published: (2026)

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
by: Agrawal, Lakshya A, et al.
Published: (2025)

EdgeJury: Cross-Reviewed Small-Model Ensembles for Truthful Question Answering on Serverless Edge Inference
by: Kumar, Aayush
Published: (2025)

TRACE: A taxonomy-grounded synthetic dataset for teaching-program generation and session interpretation in Applied Behavior Analysis
by: Kahunla, Festus
Published: (2026)

Benchmarking Educational LLMs with Analytics: A Case Study on Gender Bias in Feedback
by: Du, Yishan, et al.
Published: (2025)

VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs
by: Daneshvar, Seyed Shayan, et al.
Published: (2024)

Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance
by: Kubica, Dominick, et al.
Published: (2025)

Automated Circuit Interpretation via Probe Prompting
by: Birardi, Giuseppe
Published: (2025)

IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text
by: Pall, Rajveer Singh
Published: (2026)

Low-Resource English-Tigrinya MT: Leveraging Multilingual Models, Custom Tokenizers, and Clean Evaluation Benchmarks
by: Teklehaymanot, Hailay Kidu, et al.
Published: (2025)

From Scientific Texts to Verifiable Code: Automating the Process with Transformers
by: Wang, Changjie, et al.
Published: (2025)

Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
by: Fadli, Samih
Published: (2025)

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
by: Rath, Plawan Kumar, et al.
Published: (2026)

Fact Grounded Attention: Eliminating Hallucination in Large Language Models Through Attention Level Knowledge Integration
by: Gupta, Aayush
Published: (2025)