| Main authors: | Ding, Jianrong; Chen, Muxi; Zhao, Chenchen; Xu, Qiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2509.22015 |
Similar items
FocaLogic: Logic-Based Interpretation of Visual Model Decisions
by: Zhao, Chenchen, et al.
Published: (2026)
HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging
by: Chen, Muxi, et al.
Published: (2025)
AlignSAE: Concept-Aligned Sparse Autoencoders
by: Yang, Minglai, et al.
Published: (2025)
FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution
by: Ding, Jianrong, et al.
Published: (2026)
Behavioral Steering in a 35B MoE Language Model via SAE-Decoded Probe Vectors: One Agency Axis, Not Five Traits
by: Yap, Jia Qing
Published: (2026)
Structural Disentanglement of Causal and Correlated Concepts
by: Zhao, Qilong, et al.
Published: (2024)
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
by: Shen, Shufan, et al.
Published: (2025)
Learning Latent and Hierarchical Structures in Cognitive Diagnosis Models
by: Ma, Chenchen, et al.
Published: (2021)
Causal-INSIGHT: Probing Temporal Models to Extract Causal Structure
by: Redden, Benjamin, et al.
Published: (2026)
Tokenized SAEs: Disentangling SAE Reconstructions
by: Dooms, Thomas, et al.
Published: (2025)
Evaluating SAE interpretability without explanations
by: Paulo, Gonçalo, et al.
Published: (2025)
MCCE: Missingness-aware Causal Concept Explainer
by: Gao, Jifan, et al.
Published: (2024)
Causally Reliable Concept Bottleneck Models
by: De Felice, Giovanni, et al.
Published: (2025)
Interpretable Reward Modeling with Active Concept Bottlenecks
by: Laguna, Sonia, et al.
Published: (2025)
Evolution of SAE Features Across Layers in LLMs
by: Balcells, Daniel, et al.
Published: (2024)
SAE: Single Architecture Ensemble Neural Networks
by: Ferianc, Martin, et al.
Published: (2024)
Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting
by: Liu, Zhanyu, et al.
Published: (2024)
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
by: Dominici, Gabriele, et al.
Published: (2024)
Unveiling and Causalizing CoT: A Causal Pespective
by: Fu, Jiarun, et al.
Published: (2025)
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
by: Lou, Hantao, et al.
Published: (2025)
Structural Causality-based Generalizable Concept Discovery Models
by: Sinha, Sanchit, et al.
Published: (2024)
SCALAR: Benchmarking SAE Interaction Sparsity in Toy LLMs
by: Fillingham, Sean P., et al.
Published: (2025)
ActiveCQ: Active Estimation of Causal Quantities
by: Gao, Erdun, et al.
Published: (2025)
CircuitProbe: Tracing Visual Temporal Evidence Flow in Video Language Models
by: Zhang, Yiming, et al.
Published: (2025)
TimeSAE: Sparse Decoding for Faithful Explanations of Black-Box Time Series Models
by: Oublal, Khalid, et al.
Published: (2026)
SAE-FD: Sparse Autoencoder Feature Distillation for Continual Learning of Large Language Models
by: Zhang, Mingxu, et al.
Published: (2026)
Dense SAE Latents Are Features, Not Bugs
by: Sun, Xiaoqing, et al.
Published: (2025)
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
by: Korznikov, Anton, et al.
Published: (2025)
Evaluating Synthetic Activations composed of SAE Latents in GPT-2
by: Giglemiani, Giorgi, et al.
Published: (2024)
Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders
by: Cao, Tue M., et al.
Published: (2026)
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
by: Lin, Haohong, et al.
Published: (2024)
Mind Dreamer: Untethering Imagination via Active Causal Intervention on Latent Manifolds
by: Xu, Shaojun, et al.
Published: (2026)
Testing for Causal Fairness
by: Fu, Jiarun, et al.
Published: (2025)
PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding
by: Koromilas, Panagiotis, et al.
Published: (2026)
A Survey of Deep Causal Models and Their Industrial Applications
by: Li, Zongyu, et al.
Published: (2022)
Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning
by: Qian, Yilue, et al.
Published: (2023)
Integrating Active Learning in Causal Inference with Interference: A Novel Approach in Online Experiments
by: Zhu, Hongtao, et al.
Published: (2024)
Active and Passive Causal Inference Learning
by: Im, Daniel Jiwoong, et al.
Published: (2023)
Causal Evidence that Language Models use Confidence to Drive Behavior
by: Kumaran, Dharshan, et al.
Published: (2026)
HH-SAE: Discovering and Steering Hierarchical Knowledge of Complex Manifolds
by: Wu, Honghan, et al.
Published: (2026)