| Main Authors: | Jacovi, Alon, Bitton, Yonatan, Bohnet, Bernd, Herzig, Jonathan, Honovich, Or, Tseng, Michael, Collins, Michael, Aharoni, Roee, Geva, Mor |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.00559 |
Similar Items
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
by: Yona, Gal, et al.
Published: (2024)
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
by: Yona, Gal, et al.
Published: (2024)
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
by: Gekhman, Zorik, et al.
Published: (2026)
Keep Guessing? When Considering Inference Scaling, Mind the Baselines
by: Yona, Gal, et al.
Published: (2024)
DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs
by: Cattan, Arie, et al.
Published: (2025)
CoverBench: A Challenging Benchmark for Complex Claim Verification
by: Jacovi, Alon, et al.
Published: (2024)
Marketing the Librarian: The Weakest Link in the Chain.
by: Kies, Cosette
Published: (1989)
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models
by: Krishna, Arjun, et al.
Published: (2025)
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
by: Xie, Zhuohan, et al.
Published: (2025)
DoubleDipper: Improving Long-Context LLMs via Context Recycling
by: Cattan, Arie, et al.
Published: (2024)
Verifying Chain-of-Thought Reasoning via Its Computational Graph
by: Zhao, Zheng, et al.
Published: (2025)
On Learning Verifiers and Implications to Chain-of-Thought Reasoning
by: Balcan, Maria-Florina, et al.
Published: (2025)
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
by: Tutek, Martin, et al.
Published: (2025)
Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability
by: Aggarwal, Shashank, et al.
Published: (2026)
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)
Accelerating the Global Aggregation of Local Explanations
by: Mor, Alon, et al.
Published: (2023)
MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification
by: Sun, Linzhuang, et al.
Published: (2025)
NL-Eye: Abductive NLI for Images
by: Ventura, Mor, et al.
Published: (2024)
Universal Jailbreak Suffixes Are Strong Attention Hijackers
by: Ben-Tov, Matan, et al.
Published: (2025)
Multilingual Instruction Tuning With Just a Pinch of Multilinguality
by: Shaham, Uri, et al.
Published: (2024)
mFACE: Multilingual Summarization with Factual Consistency Evaluation
by: Aharoni, Roee, et al.
Published: (2022)
Representation Surgery: Theory and Practice of Affine Steering
by: Singh, Shashwat, et al.
Published: (2024)
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs
by: Dai, Yang, et al.
Published: (2026)
Typed Chain-of-Thought: A Curry-Howard Framework for Verifying LLM Reasoning
by: Perrier, Elija
Published: (2025)
Latent Reasoning with Supervised Thinking States
by: Amos, Ido, et al.
Published: (2026)
Editor's Choice: Evaluating Abstract Intent in Image Editing through Atomic Entity Analysis
by: Ventura, Mor, et al.
Published: (2026)
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools
by: Caciularu, Avi, et al.
Published: (2024)
Compositional Chain-of-Thought Prompting for Large Multimodal Models
by: Mitra, Chancharik, et al.
Published: (2023)
How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?
by: Yang, Sohee, et al.
Published: (2025)
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
by: Yosef, Ron, et al.
Published: (2025)
Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model
by: Eisenstein, Jacob, et al.
Published: (2022)
CoRGI: Verified Chain-of-Thought Reasoning with Post-hoc Visual Grounding
by: Yi, Shixin, et al.
Published: (2025)
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
by: Gekhman, Zorik, et al.
Published: (2024)
Estimating Knowledge in Large Language Models Without Generating a Single Token
by: Gottesman, Daniela, et al.
Published: (2024)
Inferring Functionality of Attention Heads from their Parameters
by: Elhelo, Amit, et al.
Published: (2024)
The Weakest Link: Library Catalogs.
by: Young, Terrence E., Jr.
Published: (2002)
Generating Verifiable Chain of Thoughts from Execution-Traces
by: Thakur, Shailja, et al.
Published: (2025)
Fractured Chain-of-Thought Reasoning
by: Liao, Baohao, et al.
Published: (2025)
GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
by: Yerramilli, Sahiti, et al.
Published: (2025)
LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
by: Motwani, Sumeet Ramesh, et al.
Published: (2026)