| Main Authors: | Rawal, Ruchit, Chiang, Jeffrey Yang Fan, Shen, Chihao, Tian, Jeffery Siyuan, Mahajan, Aastha, Goldstein, Tom, Chen, Yizheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.13859 |
Similar Items
Hints Help Finding and Fixing Bugs Differently in Python and Text-based Program Representations
by: Rawal, Ruchit, et al.
Published: (2024)
Locus: Agentic Predicate Synthesis for Directed Fuzzing
by: Zhu, Jie, et al.
Published: (2025)
Constrained Decoding for Secure Code Generation
by: Fu, Yanjun, et al.
Published: (2024)
A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback
by: Duan, Guoliang, et al.
Published: (2025)
Codexity: Secure AI-assisted Code Generation
by: Kim, Sung Yong, et al.
Published: (2024)
Decoding Human-LLM Collaboration in Coding: An Empirical Study of Multi-Turn Conversations in the Wild
by: Zhang, Binquan, et al.
Published: (2025)
An Empirical Study of Interaction Smells in Multi-Turn Human-LLM Collaborative Code Generation
by: Zhang, Binquan, et al.
Published: (2026)
CodeMirage: A Multi-Lingual Benchmark for Detecting AI-Generated and Paraphrased Source Code from Production-Level LLMs
by: Guo, Hanxi, et al.
Published: (2025)
Secure-Instruct: An Automated Pipeline for Synthesizing Instruction-Tuning Datasets Using LLMs for Secure Code Generation
by: Li, Junjie, et al.
Published: (2025)
CFCEval: Evaluating Security Aspects in Code Generated by Large Language Models
by: Cheng, Cheng, et al.
Published: (2025)
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models
by: Zheng, Jiasheng, et al.
Published: (2024)
DUALGUAGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation
by: Pathak, Abhijeet, et al.
Published: (2025)
Benchmarking and Revisiting Code Generation Assessment: A Mutation-Based Approach
by: Wang, Longtian, et al.
Published: (2025)
Show and Tell: Prompt Strategies for Style Control in Multi-Turn LLM Code Generation
by: Bohr, Jeremiah
Published: (2025)
COFFE: A Code Efficiency Benchmark for Code Generation
by: Peng, Yun, et al.
Published: (2025)
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
by: Lian, Keke, et al.
Published: (2025)
Turning the Tide: Repository-based Code Reflection
by: Zhang, Wei, et al.
Published: (2025)
Raising AI Ethics Awareness through an AI Ethics Quiz for Software Practitioners
by: Pant, Aastha, et al.
Published: (2024)
Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages
by: Wu, Fan, et al.
Published: (2026)
Requirements Development and Formalization for Reliable Code Generation: A Multi-Agent Vision
by: Lu, Xu, et al.
Published: (2025)
Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation
by: Li, Tian, et al.
Published: (2025)
CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
by: Yu, Hao, et al.
Published: (2023)
Detect Repair Verify for Securing LLM Generated Code: A Multi-Language Empirical Study
by: Cheng, Cheng
Published: (2026)
CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings
by: Varkey, Anthony, et al.
Published: (2024)
An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation
by: Li, Junjie, et al.
Published: (2024)
RedCoder: Automated Multi-Turn Red Teaming for Code LLMs
by: Mo, Wenjie Jacky, et al.
Published: (2025)
Ethics in AI through the Practitioner's View: A Grounded Theory Literature Review
by: Pant, Aastha, et al.
Published: (2022)
What do AI/ML practitioners think about AI/ML bias?
by: Pant, Aastha, et al.
Published: (2024)
Automating the Correctness Assessment of AI-generated Code for Security Contexts
by: Cotroneo, Domenico, et al.
Published: (2023)
Towards Better Correctness and Efficiency in Code Generation
by: Feng, Yunlong, et al.
Published: (2025)
Changes in Coding Behavior and Performance Since the Introduction of LLMs
by: Zhang, Yufan, et al.
Published: (2026)
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
by: Zhu, Qiming, et al.
Published: (2024)
Towards Secure Logging: Characterizing and Benchmarking Logging Code Security Issues with LLMs
by: Yuan, He Yang, et al.
Published: (2026)
RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories
by: Wang, Yanlin, et al.
Published: (2026)
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
by: Wang, Sizhe, et al.
Published: (2025)
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs
by: Manh, Dung Nguyen, et al.
Published: (2024)
Structured Safety Auditing for Balancing Code Correctness and Content Safety in LLM-Generated Code
by: Tan, Honghao, et al.
Published: (2026)
An Empirical Security Evaluation of LLM-Generated Cryptographic Rust Code
by: Elsayed, Mohamed, et al.
Published: (2026)
Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models
by: Bruni, Marc, et al.
Published: (2025)
SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization
by: Reddy, Revanth Gangi, et al.
Published: (2025)