:: Library Catalog

Bibliographic Details
Main Authors:	Gema, Aryo Pradipta; Leang, Joshua Ong Jun; Hong, Giwon; Devoto, Alessio; Mancino, Alberto Carlo Maria; Saxena, Rohit; He, Xuanli; Zhao, Yu; Du, Xiaotang; Madani, Mohammad Reza Ghasemi; Barale, Claire; McHardy, Robert; Harris, Joshua; Kaddour, Jean; van Krieken, Emile; Minervini, Pasquale
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.04127

Similar Items

Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs
by: Saxena, Rohit, et al.
Published: (2025)

Self-Training Large Language Models for Tool-Use Without Demonstrations
by: Luo, Ne, et al.
Published: (2025)

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks
by: Kwan, Wai-Chung, et al.
Published: (2026)

Analysing the Residual Stream of Language Models Under Knowledge Conflicts
by: Zhao, Yu, et al.
Published: (2024)

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
by: Zhao, Yu, et al.
Published: (2024)

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
by: Hong, Giwon, et al.
Published: (2024)

GRADA: Graph-based Reranking against Adversarial Documents Attack
by: Zheng, Jingjie, et al.
Published: (2025)

CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning
by: Leang, Joshua Ong Jun, et al.
Published: (2024)

Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4
by: Gema, Aryo Pradipta, et al.
Published: (2024)

Noiser: Bounded Input Perturbations for Attributing Large Language Models
by: Madani, Mohammad Reza Ghasemi, et al.
Published: (2025)

PiCSAR: Probabilistic Confidence Selection And Ranking for Reasoning Chains
by: Leang, Joshua Ong Jun, et al.
Published: (2025)

Mixtures of In-Context Learners
by: Hong, Giwon, et al.
Published: (2024)

Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain
by: Gema, Aryo Pradipta, et al.
Published: (2023)

Analyzing LLM Instruction Optimization for Tabular Fact Verification
by: Du, Xiaotang, et al.
Published: (2026)

Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints
by: Gema, Aryo Pradipta, et al.
Published: (2024)

Neurosymbolic Diffusion Models
by: van Krieken, Emile, et al.
Published: (2025)

Neurosymbolic Reasoning Shortcuts under the Independence Assumption
by: van Krieken, Emile, et al.
Published: (2025)

Enhancing Long Document Long Form Summarisation with Self-Planning
by: Du, Xiaotang, et al.
Published: (2025)

Same Answer, Different Representations: Hidden instability in VLMs
by: Wani, Farooq Ahmad, et al.
Published: (2026)

On the Independence Assumption in Neurosymbolic Learning
by: van Krieken, Emile, et al.
Published: (2024)

Theorem Prover as a Judge for Synthetic Data Generation
by: Leang, Joshua Ong Jun, et al.
Published: (2025)

VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
by: Saxena, Rohit, et al.
Published: (2026)

PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
by: Saxena, Rohit, et al.
Published: (2025)

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations
by: Gema, Aryo Pradipta, et al.
Published: (2024)

An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering
by: Murphy, Alexander, et al.
Published: (2025)

Do Composed Image Retrieval Benchmarks Require Multimodal Composition?
by: Attimonelli, Matteo, et al.
Published: (2026)

OpenSIR: Open-Ended Self-Improving Reasoner
by: Kwan, Wai-Chung, et al.
Published: (2025)

Optimisation in Neurosymbolic Learning Systems
by: van Krieken, Emile
Published: (2024)

Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
by: Tyukin, Georgy, et al.
Published: (2024)

A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression
by: Devoto, Alessio, et al.
Published: (2024)

An Auditing Test To Detect Behavioral Shift in Language Models
by: Richter, Leo, et al.
Published: (2024)

Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them
by: Rajani, Neel, et al.
Published: (2025)

Intellectual Property Rights: A Comparative Perspective on Asia, the EU, and North America
by: David McHardy Reid
Published: (2012)

NAFTA, Mexico and the China factor / David McHardy Reid, Alethia Jimenez and Peter Rahmer
by: McHardy Reid, David

Intellectual property rights : a comparative perspective on Asia, the EU, and North America / David McHardy Reid
by: McHardy Reid, David
Published: (2012)

Gradient-Based Optimization on Gödel Logic as Discrete Local Search
by: Daniele, Alessandro, et al.
Published: (2025)

Using Natural Language Explanations to Improve Robustness of In-context Learning
by: He, Xuanli, et al.
Published: (2023)

Dot Product is All You Need: Bridging the Gap Between Item Recommendation and Link Prediction
by: Malitesta, Daniele, et al.
Published: (2024)

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
by: Hägele, Alexander, et al.
Published: (2026)

Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
by: Wójcik, Bartosz, et al.
Published: (2023)