| Main Authors: | Berger, Uri; Baumel, Tal; Stanovsky, Gabriel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.13274 |
Similar Items
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games
by: Eckhaus, Niv, et al.
Published: (2025)
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
by: Berger, Uri, et al.
Published: (2024)
Improving Image Captioning by Mimicking Human Reformulation Feedback at Inference-time
by: Berger, Uri, et al.
Published: (2025)
SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction
by: Neuberger, Shlomo, et al.
Published: (2024)
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
by: Itzhak, Itay, et al.
Published: (2025)
The State and Fate of Summarization Datasets: A Survey
by: Dahan, Noam, et al.
Published: (2024)
Comparing Humans and Models on a Similar Scale: Towards Cognitive Gender Bias Evaluation in Coreference Resolution
by: Lior, Gili, et al.
Published: (2023)
Do Zombies Understand? A Choose-Your-Own-Adventure Exploration of Machine Cognition
by: Goldstein, Ariel, et al.
Published: (2024)
Looking Beyond The Top-1: Transformers Determine Top Tokens In Order
by: Lioubashevski, Daria, et al.
Published: (2024)
Multilingual Large Language Models and Curse of Multilinguality
by: Gurgurov, Daniil, et al.
Published: (2024)
Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction
by: Lior, Gili, et al.
Published: (2024)
Leveraging Digitized Newspapers to Collect Summarization Data in Low-Resource Languages
by: Dahan, Noam, et al.
Published: (2025)
Beyond Memorization: Distinguishing between Reductive and Epistemic Reasoning in LLMs using Classic Logic Puzzles
by: Gabay, Adi, et al.
Published: (2026)
Controllable Synthetic Clinical Note Generation with Privacy Guarantees
by: Baumel, Tal, et al.
Published: (2024)
Comparing the Framing Effect in Humans and LLMs on Naturally Occurring Texts
by: Lior, Gili, et al.
Published: (2025)
PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation
by: Habba, Eliya, et al.
Published: (2025)
Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
by: Iluz, Bar, et al.
Published: (2024)
PRISM: PRIor from corpus Statistics for topic Modeling
by: Ishon, Tal, et al.
Published: (2026)
Cross-Lingual and Cross-Cultural Variation in Image Descriptions
by: Berger, Uri, et al.
Published: (2024)
ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments
by: Lior, Gili, et al.
Published: (2025)
Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs
by: Park, Jungsoo, et al.
Published: (2025)
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG
by: Levy, Shahar, et al.
Published: (2025)
Anticipatory Evaluation of Language Models
by: Park, Jungsoo, et al.
Published: (2025)
Beyond Benchmarks: On The False Promise of AI Regulation
by: Stanovsky, Gabriel, et al.
Published: (2025)
From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
by: Itzhak, Itay, et al.
Published: (2026)
SEAM: A Stochastic Benchmark for Multi-Document Tasks
by: Lior, Gili, et al.
Published: (2024)
State of What Art? A Call for Multi-Prompt LLM Evaluation
by: Mizrahi, Moran, et al.
Published: (2023)
In-Context Learning with Long-Context Models: An In-Depth Exploration
by: Bertsch, Amanda, et al.
Published: (2024)
A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns
by: Yehudai, Asaf, et al.
Published: (2024)
Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection
by: Mai, Chuhong, et al.
Published: (2024)
ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery
by: Levy, Shahar, et al.
Published: (2026)
Schema-Driven Information Extraction from Heterogeneous Tables
by: Bai, Fan, et al.
Published: (2023)
Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
by: Simhi, Adi, et al.
Published: (2025)
Emotion Classification In-Context in Spanish
by: Thapa, Bipul, et al.
Published: (2025)
Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
by: Mizrahi, Moran, et al.
Published: (2025)
Token-Budget-Aware LLM Reasoning
by: Han, Tingxu, et al.
Published: (2024)
Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction
by: Petty, Jackson, et al.
Published: (2026)
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
by: Mueller, Aaron, et al.
Published: (2023)
Language Models Struggle to Use Representations Learned In-Context
by: Lepori, Michael A., et al.
Published: (2026)
K-QA: A Real-World Medical Q&A Benchmark
by: Manes, Itay, et al.
Published: (2024)