| Main Authors: | Java, Abhinav; Shahid, Simra; Agarwal, Chirag |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.08506 |
Similar Items
Towards Understanding the Robustness of Sparse Autoencoders
by: Saiyed, Ahson, et al.
Published: (2026)
Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric
by: Cheng, Jiali, et al.
Published: (2026)
In-Context Explainers: Harnessing LLMs for Explaining Black Box Models
by: Kroeger, Nicholas, et al.
Published: (2023)
Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
by: Cheng, Jiali, et al.
Published: (2025)
Towards Quantifying Commonsense Reasoning with Mechanistic Insights
by: Joshi, Abhinav, et al.
Published: (2025)
Towards Robust Evaluation of Unlearning in LLMs via Data Transformations
by: Joshi, Abhinav, et al.
Published: (2024)
Agnostic Language Identification and Generation
by: Høgsgaard, Mikael Møller, et al.
Published: (2026)
Meursault as a Data Point
by: Pratap, Abhinav
Published: (2025)
Certifying LLM Safety against Adversarial Prompting
by: Kumar, Aounon, et al.
Published: (2023)
Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
by: Agarwal, Ishika, et al.
Published: (2025)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
by: Majumder, Bodhisattwa Prasad, et al.
Published: (2024)
Thinking Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models
by: Furniturewala, Shaz, et al.
Published: (2024)
Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations
by: Chopra, Harshita, et al.
Published: (2025)
How Reliable are Causal Probing Interventions?
by: Canby, Marc, et al.
Published: (2024)
LEAST: "Local" text-conditioned image style transfer
by: Singh, Silky, et al.
Published: (2024)
Rethinking Explainability in the Era of Multimodal AI
by: Agarwal, Chirag
Published: (2025)
The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning
by: He, Bingxiang, et al.
Published: (2024)
COLD: Causal reasOning in cLosed Daily activities
by: Joshi, Abhinav, et al.
Published: (2024)
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
by: Ahmad, Areeb, et al.
Published: (2025)
Geometry of Decision Making in Language Models
by: Joshi, Abhinav, et al.
Published: (2025)
Calibration Across Layers: Understanding Calibration Evolution in LLMs
by: Joshi, Abhinav, et al.
Published: (2025)
Exploring Facets of Language Generation in the Limit
by: Charikar, Moses, et al.
Published: (2024)
Pareto-optimal Non-uniform Language Generation
by: Charikar, Moses, et al.
Published: (2025)
Data-driven Discovery with Large Generative Models
by: Majumder, Bodhisattwa Prasad, et al.
Published: (2024)
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
by: Liu, Terrance, et al.
Published: (2025)
Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers
by: Oesterling, Alex, et al.
Published: (2024)
AutoEval Done Right: Using Synthetic Data for Model Evaluation
by: Boyeau, Pierre, et al.
Published: (2024)
Languages are Modalities: Cross-Lingual Alignment via Encoder Injection
by: Agarwal, Rajan, et al.
Published: (2025)
TrICy: Trigger-guided Data-to-text Generation with Intent aware Attention-Copy
by: Agarwal, Vibhav, et al.
Published: (2024)
Operationalizing AI: Empirical Evidence on MLOps Practices, User Satisfaction, and Organizational Context
by: Pasch, Stefan
Published: (2025)
Towards Compute-Optimal Many-Shot In-Context Learning
by: Golchin, Shahriar, et al.
Published: (2025)
Representation Learning of Structured Data for Medical Foundation Models
by: Dwivedi, Vijay Prakash, et al.
Published: (2024)
AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions
by: Agarwal, Ishika, et al.
Published: (2026)
RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models
by: Jain, Abhinav, et al.
Published: (2024)
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition
by: Ye, Guanghao, et al.
Published: (2025)
SAEs Are Good for Steering -- If You Select the Right Features
by: Arad, Dana, et al.
Published: (2025)
Perplexity Cannot Always Tell Right from Wrong
by: Veličković, Petar, et al.
Published: (2026)
Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence
by: Agarwal, Bhavik, et al.
Published: (2025)
G-Loss: Graph-Guided Fine-Tuning of Language Models
by: Sharma, Aditya, et al.
Published: (2026)
A Characterization of List Language Identification in the Limit
by: Charikar, Moses, et al.
Published: (2025)