:: Library Catalog

Bibliographic Details
Main Authors:	Zalmanovici, Marcel; Raz, Orna; Farchi, Eitan; Freund, Iftach
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.19772

Similar Items

PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code
by: Dreyfuss, Itay, et al.
Published: (2025)

Using Combinatorial Optimization to Design a High quality LLM Solution
by: Ackerman, Samuel, et al.
Published: (2024)

Evaluating perturbation robustness of generative systems that use COBOL code inputs
by: Ackerman, Samuel, et al.
Published: (2025)

How Safe is Your Safety Metric? Automatic Concatenation Tests for Metric Reliability
by: Fandina, Ora Nova, et al.
Published: (2024)

Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks
by: Farchi, Eitan, et al.
Published: (2024)

Exploring Straightforward Conversational Red-Teaming
by: Kour, George, et al.
Published: (2024)

Automated Validation of LLM-based Evaluators for Software Engineering Artifacts
by: Fandina, Ora Nova, et al.
Published: (2025)

Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes
by: Fandina, Ora Nova, et al.
Published: (2025)

Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding
by: Nevo, Ziv, et al.
Published: (2025)

Statistical multi-metric evaluation and visualization of LLM system predictive performance
by: Ackerman, Samuel, et al.
Published: (2025)

Enhancing Formal Software Specification with Artificial Intelligence
by: Nassar, Antonio Abu, et al.
Published: (2026)

An Agent-Based Framework for the Automatic Validation of Mathematical Optimization Models
by: Zadorojniy, Alexander, et al.
Published: (2025)

Technique to Baseline QE Artefact Generation Aligned to Quality Metrics
by: Farchi, Eitan, et al.
Published: (2025)

A Practical Approach to Combinatorial Test Design
by: Farchi, Eitan, et al.
Published: (2024)

Beyond Blind Spots: Analytic Hints for Mitigating LLM-Based Evaluation Pitfalls
by: Fandina, Ora Nova, et al.
Published: (2025)

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
by: Achintalwar, Swapnaja, et al.
Published: (2024)

LaajMeter: A Framework for LaaJ Evaluation
by: Ackerman, Samuel, et al.
Published: (2025)

TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text
by: Arbel, Iftach, et al.
Published: (2024)

Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar
by: Zhang, Yuanliang, et al.
Published: (2024)

Effective LLM-Driven Code Generation with Pythoness
by: Levin, Kyla H., et al.
Published: (2025)

CONTESTS: a Framework for Consistency Testing of Span Probabilities in Language Models
by: Wagner, Eitan, et al.
Published: (2024)

To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning
by: Lazić, Nevena, et al.
Published: (2026)

Effective Technical Reviews
by: Ballentine, Scott, et al.
Published: (2024)

Quality Engineering for Agile and DevOps on the Cloud and Edge
by: Farchi, Eitan, et al.
Published: (2023)

Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning
by: Fu, Jia, et al.
Published: (2025)

Instruction Diversity Drives Generalization To Unseen Tasks
by: Zhang, Dylan, et al.
Published: (2024)

Test-Driven Development for Code Generation
by: Mathews, Noble Saji, et al.
Published: (2024)

S*: Test Time Scaling for Code Generation
by: Li, Dacheng, et al.
Published: (2025)

Generalized Coverage Criteria for Combinatorial Sequence Testing
by: Elyasaf, Achiya, et al.
Published: (2022)

Budget Allocation Policies for Real-Time Multi-Agent Path Finding
by: Beck, Raz, et al.
Published: (2025)

Toward Generalizing Visual Brain Decoding to Unseen Subjects
by: Kong, Xiangtao, et al.
Published: (2024)

Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks
by: Shelat, Shlok, et al.
Published: (2026)

Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning
by: Bar-Lev, Daniella, et al.
Published: (2021)

Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties
by: Kim, Jinju, et al.
Published: (2026)

Backdoors in Conditional Diffusion: Threats to Responsible Synthetic Data Pipelines
by: Lapid, Raz, et al.
Published: (2025)

A Deep Inverse-Mapping Model for a Flapping Robotic Wing
by: Sharvit, Hadar, et al.
Published: (2025)

Revisit Self-Debugging with Self-Generated Tests for Code Generation
by: Chen, Xiancai, et al.
Published: (2025)

Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs
by: Wagner, Eitan, et al.
Published: (2025)

Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation
by: Cui, Yi
Published: (2025)

Bias Testing and Mitigation in LLM-based Code Generation
by: Huang, Dong, et al.
Published: (2023)