:: Library Catalog

Bibliographic Details
Main authors:	Desai, Dev Arpan; Huang, Shaoyi; Zhu, Zining
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online access:	https://arxiv.org/abs/2604.06483

Similar Items

Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts
by: Bhargav, Samaksh, et al.
Published: (2025)

Interpretable Physics Reasoning and Performance Taxonomy in Vision-Language Models
by: Pawar, Pranav, et al.
Published: (2025)

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
by: Wang, Yan, et al.
Published: (2026)

Using Large Language Models for Hyperparameter Optimization
by: Zhang, Michael R., et al.
Published: (2023)

What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
by: Javaji, Shashidhar Reddy, et al.
Published: (2024)

Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation
by: Ajwani, Rohan Deepak, et al.
Published: (2024)

Interpretable Robot Control via Structured Behavior Trees and Large Language Models
by: Chekam, Ingrid Maéva, et al.
Published: (2025)

ToMA: Token Merge with Attention for Diffusion Models
by: Lu, Wenbo, et al.
Published: (2025)

An Interpretable and Scalable Framework for Evaluating Large Language Models
by: Qu, Xinhao, et al.
Published: (2026)

Medical Interpretability and Knowledge Maps of Large Language Models
by: Marinescu, Razvan, et al.
Published: (2025)

Model-Distributed Inference for Large Language Models at the Edge
by: Macario, Davide, et al.
Published: (2025)

InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
by: Luo, Yifan, et al.
Published: (2025)

Rethinking Interpretability in the Era of Large Language Models
by: Singh, Chandan, et al.
Published: (2024)

A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms
by: Gong, Ruihao, et al.
Published: (2024)

PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases
by: Vuddanti, Sri Vatsa, et al.
Published: (2025)

GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph
by: Luo, Yuebo, et al.
Published: (2026)

REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
by: Li, Bo, et al.
Published: (2025)

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
by: Winninger, Thomas, et al.
Published: (2025)

Evidence-based Distributional Alignment for Large Language Models
by: Pham, Viet-Thanh, et al.
Published: (2026)

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
by: Li, Bingbing, et al.
Published: (2024)

Large Language Model Predicts Above Normal All India Summer Monsoon Rainfall in 2024
by: Sharma, Ujjawal, et al.
Published: (2024)

Inverse Reinforcement Learning With Constraint Recovery
by: Das, Nirjhar, et al.
Published: (2023)

Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)

RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs
by: Zhou, Runlong, et al.
Published: (2025)

Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing
by: Bick, Aviv, et al.
Published: (2025)

Larimar: Large Language Models with Episodic Memory Control
by: Das, Payel, et al.
Published: (2024)

SelfIE: Self-Interpretation of Large Language Model Embeddings
by: Chen, Haozhe, et al.
Published: (2024)

TracrBench: Generating Interpretability Testbeds with Large Language Models
by: Thurnherr, Hannes, et al.
Published: (2024)

Fine-Grained Interpretation of Political Opinions in Large Language Models
by: Hu, Jingyu, et al.
Published: (2025)

Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
by: Plecko, Drago, et al.
Published: (2025)

Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models
by: Sun, Shaoning, et al.
Published: (2026)

Robust Multi-Objective Controlled Decoding of Large Language Models
by: Son, Seongho, et al.
Published: (2025)

Unlocking Emergent Modularity in Large Language Models
by: Qiu, Zihan, et al.
Published: (2023)

Interpretable Steering of Large Language Models with Feature Guided Activation Additions
by: Soo, Samuel, et al.
Published: (2025)

Tequila: Trapping-free Ternary Quantization for Large Language Models
by: Huang, Hong, et al.
Published: (2025)

Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
by: Dev, Arundhathi, et al.
Published: (2026)

Foundations of Large Language Models
by: Xiao, Tong, et al.
Published: (2025)

From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model
by: Chen, Boyou, et al.
Published: (2025)

Doubly Robust Alignment for Large Language Models
by: Xu, Erhan, et al.
Published: (2025)

Are Large Language Models In-Context Graph Learners?
by: Li, Jintang, et al.
Published: (2025)