| Main Authors: | Mu, Wenchuan, Lim, Kwan Hui |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.16457 |
Similar Items
Towards Reliable Evaluation of Neural Program Repair with Natural Robustness Testing
by: Le-Cong, Thanh, et al.
Published: (2024)
Towards Better Correctness and Efficiency in Code Generation
by: Feng, Yunlong, et al.
Published: (2025)
Label-Free Topic-Focused Summarization Using Query Augmentation
by: Mu, Wenchuan, et al.
Published: (2024)
OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering
by: Imran, Mia Mohammad, et al.
Published: (2025)
Benchmarking Harmonized Tariff Schedule Classification Models
by: Judy, Bryce
Published: (2024)
Model Provenance via Model DNA
by: Mu, Xin, et al.
Published: (2023)
Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning
by: Zhang, Lingzhe, et al.
Published: (2026)
Human-Machine Co-Boosted Bug Report Identification with Mutualistic Neural Active Learning
by: Long, Guoming, et al.
Published: (2026)
Towards a Classification of Open-Source ML Models and Datasets for Software Engineering
by: González, Alexandra, et al.
Published: (2024)
Beyond Retrieval: A Multitask Benchmark and Model for Code Search
by: Xue, Siqiao, et al.
Published: (2026)
Conventional Commit Classification using Large Language Models and Prompt Engineering
by: Quadir, H. M. Sazzad, et al.
Published: (2026)
CodeFort: Robust Training for Code Generation Models
by: Zhang, Yuhao, et al.
Published: (2024)
Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification
by: Nguyen, Linh, et al.
Published: (2025)
DRAGON: Robust Classification for Very Large Collections of Software Repositories
by: Balla, Stefano, et al.
Published: (2026)
Towards a General Framework for HTN Modeling with LLMs
by: Puerta-Merino, Israel, et al.
Published: (2025)
Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code
by: Carissimi, Michele, et al.
Published: (2025)
Unveiling Project-Specific Bias in Neural Code Models
by: Li, Zhiming, et al.
Published: (2022)
Precision in Practice: Knowledge Guided Code Summarizing Grounded in Industrial Expectations
by: Li, Jintai, et al.
Published: (2026)
Towards a Neural Debugger for Python
by: Beck, Maximilian, et al.
Published: (2026)
Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization
by: Wang, Huanting, et al.
Published: (2024)
Post-Incorporating Code Structural Knowledge into Pretrained Models via ICL for Code Translation
by: Du, Yali, et al.
Published: (2025)
Towards a Domain-Specific Modelling Environment for Reinforcement Learning
by: Sinani, Natalie, et al.
Published: (2024)
CodeSSM: Towards State Space Models for Code Understanding
by: Verma, Shweta, et al.
Published: (2025)
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?
by: Orvalho, Pedro, et al.
Published: (2025)
Towards Better Code Understanding in Decoder-Only Models with Contrastive Learning
by: Lin, Jiayi, et al.
Published: (2024)
Towards a Digital Twin Modeling Method for Container Terminal Port
by: Hakimi, Faouzi, et al.
Published: (2025)
An Experience Report on Regression-Free Repair of Deep Neural Network Model
by: Nakagawa, Takao, et al.
Published: (2025)
Robustness and Reasoning Fidelity of Large Language Models in Long-Context Code Question Answering
by: Maharaj, Kishan, et al.
Published: (2026)
A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback
by: Tabassum, Anika, et al.
Published: (2026)
Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation
by: Jiang, Yongqing, et al.
Published: (2026)
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
by: Lei, Xinping, et al.
Published: (2026)
Towards Advancing Code Generation with Large Language Models: A Research Roadmap
by: Jin, Haolin, et al.
Published: (2025)
Toward a Theory of Causation for Interpreting Neural Code Models
by: Palacio, David N., et al.
Published: (2023)
PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code
by: Dreyfuss, Itay, et al.
Published: (2025)
Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
by: Lange, Robert Tjarko, et al.
Published: (2025)
Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy
by: Taherkhani, Hamed, et al.
Published: (2024)
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
by: Xu, Jingxuan, et al.
Published: (2025)
RobuNFR: Evaluating the Robustness of Large Language Models on Non-Functional Requirements Aware Code Generation
by: Lin, Feng, et al.
Published: (2025)
When Prompts Go Wrong: Evaluating Code Model Robustness to Ambiguous, Contradictory, and Incomplete Task Descriptions
by: Larbi, Maya, et al.
Published: (2025)
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management
by: Lindenbauer, Tobias, et al.
Published: (2025)