| Main Authors: | Khalifa, Muhammad, Wadden, David, Strubell, Emma, Lee, Honglak, Wang, Lu, Beltagy, Iz, Peng, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.01019 |
Similar Items
GRACE: Discriminator-Guided Chain-of-Thought Reasoning
by: Khalifa, Muhammad, et al.
Published: (2023)
Process Reward Models That Think
by: Khalifa, Muhammad, et al.
Published: (2025)
Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation
by: Khalifa, Muhammad, et al.
Published: (2026)
Just CHOP: Embarrassingly Simple LLM Compression
by: Jha, Ananya Harsh, et al.
Published: (2023)
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs
by: Khalifa, Muhammad, et al.
Published: (2024)
Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR
by: Khalifa, Muhammad, et al.
Published: (2026)
A Survey of Large Language Models for Arabic Language and its Dialects
by: Mashaabi, Malak, et al.
Published: (2024)
Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
by: Zhang, Yunxiang, et al.
Published: (2025)
HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection
by: Abdallah, Mohamed A., et al.
Published: (2025)
Exploring Retrieval Augmented Generation in Arabic
by: El-Beltagy, Samhaa R., et al.
Published: (2024)
DAIQ: Auditing Demographic Attribute Inference from Question in LLMs
by: Panda, Srikant, et al.
Published: (2025)
The Qiyas Benchmark: Measuring ChatGPT Mathematical and Language Understanding in Arabic
by: Al-Khalifa, Shahad, et al.
Published: (2024)
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
by: Ravichander, Abhilasha, et al.
Published: (2025)
Paloma: A Benchmark for Evaluating Language Model Fit
by: Magnusson, Ian, et al.
Published: (2023)
Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
by: Yang, Nakyeong, et al.
Published: (2023)
Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents
by: Kim, Jaekyeom, et al.
Published: (2024)
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
by: Zhang, Yunxiang, et al.
Published: (2025)
Pre-Trained Language Models for Keyphrase Prediction: A Review
by: Umair, Muhammad, et al.
Published: (2024)
Small Language Models Need Strong Verifiers to Self-Correct Reasoning
by: Zhang, Yunxiang, et al.
Published: (2024)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
by: Chen, Mengzhao, et al.
Published: (2024)
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
TinyLlama: An Open-Source Small Language Model
by: Zhang, Peiyuan, et al.
Published: (2024)
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
by: Huang, Yukun, et al.
Published: (2025)
Reviewing Clinical Knowledge in Medical Large Language Models: Training and Beyond
by: Li, Qiyuan, et al.
Published: (2025)
Noise-Aware Training of Layout-Aware Language Models
by: Sarkhel, Ritesh, et al.
Published: (2024)
Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach
by: Cao, Hongyu, et al.
Published: (2026)
Knowledge Boundary Discovery for Large Language Models
by: Wang, Ziquan, et al.
Published: (2026)
Learning to Reason via Program Generation, Emulation, and Search
by: Weir, Nathaniel, et al.
Published: (2024)
CiteEval: Principle-Driven Citation Evaluation for Source Attribution
by: Xu, Yumo, et al.
Published: (2025)
KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision
by: Wu, Rong, et al.
Published: (2025)
Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution
by: Jin, Qiao, et al.
Published: (2026)
Learning Fine-Grained Grounded Citations for Attributed Large Language Models
by: Huang, Lei, et al.
Published: (2024)
Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models
by: Li, Zhuoqun, et al.
Published: (2024)
LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs
by: Chen, Xuan, et al.
Published: (2024)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
by: Wadden, David, et al.
Published: (2024)
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
by: Alwajih, Fakhraddin, et al.
Published: (2024)
StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models
by: Bi, Baolong, et al.
Published: (2024)
Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings
by: Jang, Suhyung, et al.
Published: (2026)
Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
by: Zhuo, Xingrui, et al.
Published: (2025)
Token-Level Uncertainty-Aware Objective for Language Model Post-Training
by: Liu, Tingkai, et al.
Published: (2025)