| Main authors: | Desai, Dev Arpan, Huang, Shaoyi, Zhu, Zining |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2604.06483 |
Similar items
Feature-Guided SAE Steering for Refusal-Rate Control using Contrasting Prompts
by: Bhargav, Samaksh, et al.
Published: (2025)
Interpretable Physics Reasoning and Performance Taxonomy in Vision-Language Models
by: Pawar, Pranav, et al.
Published: (2025)
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
by: Wang, Yan, et al.
Published: (2026)
Using Large Language Models for Hyperparameter Optimization
by: Zhang, Michael R., et al.
Published: (2023)
What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
by: Javaji, Shashidhar Reddy, et al.
Published: (2024)
Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation
by: Ajwani, Rohan Deepak, et al.
Published: (2024)
Interpretable Robot Control via Structured Behavior Trees and Large Language Models
by: Chekam, Ingrid Maéva, et al.
Published: (2025)
ToMA: Token Merge with Attention for Diffusion Models
by: Lu, Wenbo, et al.
Published: (2025)
An Interpretable and Scalable Framework for Evaluating Large Language Models
by: Qu, Xinhao, et al.
Published: (2026)
Medical Interpretability and Knowledge Maps of Large Language Models
by: Marinescu, Razvan, et al.
Published: (2025)
Model-Distributed Inference for Large Language Models at the Edge
by: Macario, Davide, et al.
Published: (2025)
InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
by: Luo, Yifan, et al.
Published: (2025)
Rethinking Interpretability in the Era of Large Language Models
by: Singh, Chandan, et al.
Published: (2024)
A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms
by: Gong, Ruihao, et al.
Published: (2024)
PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases
by: Vuddanti, Sri Vatsa, et al.
Published: (2025)
GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph
by: Luo, Yuebo, et al.
Published: (2026)
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
by: Li, Bo, et al.
Published: (2025)
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
by: Winninger, Thomas, et al.
Published: (2025)
Evidence-based Distributional Alignment for Large Language Models
by: Pham, Viet-Thanh, et al.
Published: (2026)
Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
by: Li, Bingbing, et al.
Published: (2024)
Large Language Model Predicts Above Normal All India Summer Monsoon Rainfall in 2024
by: Sharma, Ujjawal, et al.
Published: (2024)
Inverse Reinforcement Learning With Constraint Recovery
by: Das, Nirjhar, et al.
Published: (2023)
Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)
RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs
by: Zhou, Runlong, et al.
Published: (2025)
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing
by: Bick, Aviv, et al.
Published: (2025)
Larimar: Large Language Models with Episodic Memory Control
by: Das, Payel, et al.
Published: (2024)
SelfIE: Self-Interpretation of Large Language Model Embeddings
by: Chen, Haozhe, et al.
Published: (2024)
TracrBench: Generating Interpretability Testbeds with Large Language Models
by: Thurnherr, Hannes, et al.
Published: (2024)
Fine-Grained Interpretation of Political Opinions in Large Language Models
by: Hu, Jingyu, et al.
Published: (2025)
Epidemiology of Large Language Models: A Benchmark for Observational Distribution Knowledge
by: Plecko, Drago, et al.
Published: (2025)
Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models
by: Sun, Shaoning, et al.
Published: (2026)
Robust Multi-Objective Controlled Decoding of Large Language Models
by: Son, Seongho, et al.
Published: (2025)
Unlocking Emergent Modularity in Large Language Models
by: Qiu, Zihan, et al.
Published: (2023)
Interpretable Steering of Large Language Models with Feature Guided Activation Additions
by: Soo, Samuel, et al.
Published: (2025)
Tequila: Trapping-free Ternary Quantization for Large Language Models
by: Huang, Hong, et al.
Published: (2025)
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration
by: Dev, Arundhathi, et al.
Published: (2026)
Foundations of Large Language Models
by: Xiao, Tong, et al.
Published: (2025)
From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model
by: Chen, Boyou, et al.
Published: (2025)
Doubly Robust Alignment for Large Language Models
by: Xu, Erhan, et al.
Published: (2025)
Are Large Language Models In-Context Graph Learners?
by: Li, Jintang, et al.
Published: (2025)