:: Library Catalog

Bibliographic Details
Main Authors:	Kumar, Vanya Bannihatti; Goyal, Divyanshu; Eppa, Akhil; Bhandari, Neel
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2510.05135

Similar Items

DistortBench: Benchmarking Vision Language Models on Image Distortion Identification
by: Goyal, Divyanshu, et al.
Published: (2026)

Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language
by: Kumar, Vinayshekhar Bannihatti, et al.
Published: (2026)

Preference Leakage: A Contamination Problem in LLM-as-a-judge
by: Li, Dawei, et al.
Published: (2025)

Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)
by: Dobariya, Om, et al.
Published: (2025)

Transcoders Find Interpretable LLM Feature Circuits
by: Dunefsky, Jacob, et al.
Published: (2024)

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
by: Li, Dawei, et al.
Published: (2024)

Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries
by: Ceraolo, Roberto, et al.
Published: (2024)

Improving Self Consistency in LLMs through Probabilistic Tokenization
by: Sathe, Ashutosh, et al.
Published: (2024)

What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
by: Javaji, Shashidhar Reddy, et al.
Published: (2024)

AtP*: An efficient and scalable method for localizing LLM behaviour to components
by: Kramár, János, et al.
Published: (2024)

Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity
by: Engländer, Leon, et al.
Published: (2026)

Bridging Human and LLM Judgments: Understanding and Narrowing the Gap
by: Polo, Felipe Maia, et al.
Published: (2025)

Learning to Translate from Soft to Hard LLM Prompts
by: Kongsomjit, Pitipat, et al.
Published: (2026)

Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
by: Mizrahi, Moran, et al.
Published: (2025)

Addressing LLM Diversity by Infusing Random Concepts
by: Agrawal, Pulin, et al.
Published: (2026)

Curiosity-driven Red-teaming for Large Language Models
by: Hong, Zhang-Wei, et al.
Published: (2024)

Geometry of Decision Making in Language Models
by: Joshi, Abhinav, et al.
Published: (2025)

Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
by: Kim, Dongyoung, et al.
Published: (2024)

MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
by: Sharma, Akshat, et al.
Published: (2024)

Benchmarking LLMs' Judgments with No Gold Standard
by: Xu, Shengwei, et al.
Published: (2024)

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
by: Kaur, Simran, et al.
Published: (2024)

Thought Branches: Interpreting LLM Reasoning Requires Resampling
by: Macar, Uzay, et al.
Published: (2025)

Thought Anchors: Which LLM Reasoning Steps Matter?
by: Bogdan, Paul C., et al.
Published: (2025)

LimTopic: LLM-based Topic Modeling and Text Summarization for Analyzing Scientific Articles limitations
by: Azhar, Ibrahim Al, et al.
Published: (2025)

Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models
by: Bhandari, Pranav, et al.
Published: (2026)

Joint Detection of Fraud and Concept Drift in Online Conversations with LLM-Assisted Judgment
by: Senol, Ali, et al.
Published: (2025)

Synthesizing Behaviorally-Grounded Reasoning Chains: A Data-Generation Framework for Personal Finance LLMs
by: Theerthala, Akhil
Published: (2025)

LLM vs. Lawyers: Identifying a Subset of Summary Judgments in a Large UK Case Law Dataset
by: Izzidien, Ahmed, et al.
Published: (2024)

LLMGuard: Guarding Against Unsafe LLM Behavior
by: Goyal, Shubh, et al.
Published: (2024)

Compositional Instruction Following with Language Models and Reinforcement Learning
by: Cohen, Vanya, et al.
Published: (2025)

Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
by: Chughtai, Bilal, et al.
Published: (2024)

CosmicFish-HRM: Adaptive Reasoning via Hierarchical Recurrent Mechanisms in Compact Language Models
by: Lakkapragada, Venkat Akhil
Published: (2026)

A Korean Legal Judgment Prediction Dataset for Insurance Disputes
by: Kwak, Alice Saebom, et al.
Published: (2024)

Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models
by: Patel, Laksh, et al.
Published: (2025)

Explorations of Self-Repair in Language Models
by: Rushing, Cody, et al.
Published: (2024)

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
by: Zhang, Fred, et al.
Published: (2023)

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
by: Potamitis, Nearchos, et al.
Published: (2025)

Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts
by: Nigam, Shubham Kumar, et al.
Published: (2024)

On the Way to LLM Personalization: Learning to Remember User Conversations
by: Magister, Lucie Charlotte, et al.
Published: (2024)