Library Catalog

Bibliographic Details
Main Authors:	Wu, Tianyu, Yao, Yu, Qi, Zhenting, Zheng, Han, Wang, Zhuohan, Ma, Haoran, Liao, Lawrence, Lakkaraju, Himabindu, Li, Ju, Du, Yilun
Format:	Preprint
Published:	2026
Subjects:	Machine Learning, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.18810

Similar Items

Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
by: Xiong, Zidi, et al.
Published: (2025)

Self-Improving Language Models with Bidirectional Evolutionary Search
by: Xu, Guowei, et al.
Published: (2026)

EvoLM: In Search of Lost Language Model Training Dynamics
by: Qi, Zhenting, et al.
Published: (2025)

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
by: Pawelczyk, Martin, et al.
Published: (2024)

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
by: Qi, Zhenting, et al.
Published: (2024)

SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
by: Shen, Yuhao, et al.
Published: (2025)

Learning Recourse Costs from Pairwise Feature Comparisons
by: Rawal, Kaivalya, et al.
Published: (2024)

Manipulating Large Language Models to Increase Product Visibility
by: Kumar, Aounon, et al.
Published: (2024)

PEARL: Parallel Speculative Decoding with Adaptive Draft Length
by: Liu, Tianyu, et al.
Published: (2024)

Characterizing Data Point Vulnerability via Average-Case Robustness
by: Han, Tessa, et al.
Published: (2023)

Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems
by: Zhang, Shichang, et al.
Published: (2025)

Quantifying Generalization Complexity for Large Language Models
by: Qi, Zhenting, et al.
Published: (2024)

In-Context Unlearning: Language Models as Few Shot Unlearners
by: Pawelczyk, Martin, et al.
Published: (2023)

Understanding the Effects of Iterative Prompting on Truthfulness
by: Krishna, Satyapriya, et al.
Published: (2024)

Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning
by: Xiong, Zidi, et al.
Published: (2026)

On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
by: Lobo, Elita, et al.
Published: (2024)

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability
by: Bhalla, Usha, et al.
Published: (2023)

On the Trade-offs between Adversarial Robustness and Actionable Explanations
by: Krishna, Satyapriya, et al.
Published: (2023)

Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference
by: Huang, Catherine, et al.
Published: (2024)

Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability
by: Zhang, Shichang, et al.
Published: (2025)

MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
by: Han, Tessa, et al.
Published: (2024)

Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications
by: Liu, Yanchen, et al.
Published: (2023)

Draft-OPD: On-Policy Distillation for Speculative Draft Models
by: Lei, Haodi, et al.
Published: (2026)

LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)

More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
by: Li, Aaron J., et al.
Published: (2024)

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
by: Agarwal, Chirag, et al.
Published: (2024)

Towards Uncovering How Large Language Model Works: An Explainability Perspective
by: Zhao, Haiyan, et al.
Published: (2024)

MineDraft: A Framework for Batch Parallel Speculative Decoding
by: Tang, Zhenwei, et al.
Published: (2026)

Towards Unifying Interpretability and Control: Evaluation via Intervention
by: Bhalla, Usha, et al.
Published: (2024)

Interpretability Needs a New Paradigm
by: Madsen, Andreas, et al.
Published: (2024)

Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers
by: Oesterling, Alex, et al.
Published: (2024)

POSS: Position Specialist Generates Better Draft for Speculative Decoding
by: Huang, Langlin, et al.
Published: (2025)

Cost-Aware Diffusion Draft Trees for Speculative Decoding
by: Zhang, Shuai, et al.
Published: (2026)

DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding
by: Hu, Yunhai, et al.
Published: (2025)

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models
by: Tanneru, Sree Harsha, et al.
Published: (2024)

Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
by: Li, Aaron J., et al.
Published: (2025)

Detecting LLM-Generated Peer Reviews
by: Rao, Vishisht, et al.
Published: (2025)

Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding
by: Bhansali, Shrenik, et al.
Published: (2025)

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
by: Krishna, Satyapriya, et al.
Published: (2022)

Soft Best-of-n Sampling for Model Alignment
by: Verdun, Claudio Mayrink, et al.
Published: (2025)