| Main Authors: | Yang, Jian, Guo, Shawn, Jing, Lin, Zhang, Wei, Liu, Aishan, Hao, Chuan, Li, Zhoujun, Zhao, Wayne Xin, Liu, Xianglong, Lv, Weifeng, Dai, Bryan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.13472 |
Similar Items
CodeSimpleQA: Scaling Factuality in Code Large Language Models
by: Yang, Jian, et al.
Published: (2025)
M2G-Eval: Enhancing and Evaluating Multi-granularity Multilingual Code Generation
by: Xu, Fanglin, et al.
Published: (2025)
InCoder-32B: Code Foundation Model for Industrial Scenarios
by: Yang, Jian, et al.
Published: (2026)
InCoder-32B-Thinking: Industrial Code World Model for Thinking
by: Yang, Jian, et al.
Published: (2026)
UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models
by: Wu, Jiajun, et al.
Published: (2025)
CodeChemist: Functional Knowledge Transfer for Low-Resource Code Generation via Test-Time Scaling
by: Wang, Kaixin, et al.
Published: (2025)
SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment
by: Wang, Jiacheng, et al.
Published: (2025)
Fairness Mediator: Neutralize Stereotype Associations to Mitigate Bias in Large Language Models
by: Xiao, Yisong, et al.
Published: (2025)
UniCoder: Scaling Code Large Language Model via Universal Code
by: Sun, Tao, et al.
Published: (2024)
Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks
by: Ying, Zonghao, et al.
Published: (2024)
Latent Imitator: Generating Natural Individual Discriminatory Instances for Black-Box Fairness Testing
by: Xiao, Yisong, et al.
Published: (2023)
From Context to Intent: Reasoning-Guided Function-Level Code Completion
by: Li, Yanzhou, et al.
Published: (2025)
Detoxifying Large Language Models via Autoregressive Reward Guided Representation Editing
by: Xiao, Yisong, et al.
Published: (2025)
Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models
by: Ying, Zonghao, et al.
Published: (2026)
Towards Robust Physical-world Backdoor Attacks on Lane Detection
by: Zhang, Xinwei, et al.
Published: (2024)
Uncovering Strategic Egoism Behaviors in Large Language Models
by: Zhang, Yaoyuan, et al.
Published: (2025)
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models
by: Ying, Zonghao, et al.
Published: (2024)
Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents
by: Wang, Kaixin, et al.
Published: (2025)
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
by: Ying, Zonghao, et al.
Published: (2024)
Context as a Tool: Context Management for Long-Horizon SWE-Agents
by: Liu, Shukai, et al.
Published: (2025)
CogMorph: Cognitive Morphing Attacks for Text-to-Image Models
by: Jing, Zonglei, et al.
Published: (2025)
PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation
by: Jing, Zonglei, et al.
Published: (2025)
Evolving Deception: When Agents Evolve, Deception Wins
by: Ying, Zonghao, et al.
Published: (2026)
Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
by: Zhang, Zhe, et al.
Published: (2025)
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
by: Yang, Ge, et al.
Published: (2024)
AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization
by: Ying, Zonghao, et al.
Published: (2026)
Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization
by: Bai, Fei, et al.
Published: (2026)
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
by: Xiao, Yisong, et al.
Published: (2024)
Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game
by: Li, Simin, et al.
Published: (2023)
Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving
by: Wang, Lu, et al.
Published: (2025)
Investigating Training Data Detection in AI Coders
by: Li, Tianlin, et al.
Published: (2025)
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models
by: Wang, Zining, et al.
Published: (2024)
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
by: Ying, Zonghao, et al.
Published: (2025)
Adversarial Generation and Collaborative Evolution of Safety-Critical Scenarios for Autonomous Vehicles
by: Liu, Jiangfan, et al.
Published: (2025)
How Emotion Shapes the Behavior of LLMs and Agents: A Mechanistic Study
by: Sun, Moran, et al.
Published: (2026)
Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction
by: Hu, Jin, et al.
Published: (2025)
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
by: Yan, Kaiwen, et al.
Published: (2025)
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
by: Zhou, Fan, et al.
Published: (2024)
SPARK: Jailbreaking T2V Models by Synergistically Prompting Auditory and Recontextualized Knowledge
by: Ying, Zonghao, et al.
Published: (2025)
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
by: Li, Simin, et al.
Published: (2025)