:: Library Catalog

Bibliographic Details
Main Authors:	Deng, Boyi; Wan, Yu; Yang, Baosong; Huang, Fei; Wang, Wenjie; Feng, Fuli
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.14894

Similar Items

Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
by: Deng, Boyi, et al.
Published: (2025)

Controllable LLM Reasoning via Sparse Autoencoder-Based Steering
by: Fang, Yi, et al.
Published: (2026)

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
by: Deng, Boyi, et al.
Published: (2024)

P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
by: Zhang, Yidan, et al.
Published: (2024)

DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders
by: Wang, Xu, et al.
Published: (2026)

Towards Cross-lingual Values Judgment: A Consensus-Pluralism Perspective
by: Chen, Yukun, et al.
Published: (2026)

Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
by: Deng, Boyi, et al.
Published: (2026)

Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs
by: Fang, Yi, et al.
Published: (2024)

Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge
by: Liu, Zhuo, et al.
Published: (2025)

PART: Progressive Alignment Representation Training for Multilingual Speech-To-Text with LLMs
by: Zhang, Pei, et al.
Published: (2025)

CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis
by: Zhang, Xinyu, et al.
Published: (2025)

Constrain Alignment with Sparse Autoencoders
by: Yin, Qingyu, et al.
Published: (2024)

SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs
by: Abdaljalil, Samir, et al.
Published: (2025)

Understanding the Effects of Domain Finetuning on LLMs
by: Tanwar, Eshaan, et al.
Published: (2025)

Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment
by: Li, Moxin, et al.
Published: (2025)

Improving Task Diversity in Label Efficient Supervised Finetuning of LLMs
by: Arabelly, Abhinav, et al.
Published: (2025)

Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
by: Goyal, Agam, et al.
Published: (2025)

Sparse Autoencoders are Capable LLM Jailbreak Mitigators
by: Assogba, Yannick, et al.
Published: (2026)

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment
by: Cai, Hongru, et al.
Published: (2026)

CultureForest: Understanding and Evaluating Cultural Norm Grounded Reasoning in LLMs
by: Ye, Yangfan, et al.
Published: (2026)

Uncovering Cross-Linguistic Disparities in LLMs using Sparse Autoencoders
by: Xuan, Richmond Sin Jing, et al.
Published: (2025)

Locking Down the Finetuned LLMs Safety
by: Zhu, Minjun, et al.
Published: (2024)

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
by: Wang, Yiming, et al.
Published: (2025)

Improving Sparse Memory Finetuning
by: Goyal, Satyam, et al.
Published: (2026)

Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training
by: Wang, Zhijun, et al.
Published: (2025)

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
by: Härle, Ruben, et al.
Published: (2024)

Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation
by: Hua, Zhenglin, et al.
Published: (2025)

Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs
by: Joshi, Kunj, et al.
Published: (2025)

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)

Right Is Not Enough: The Pitfalls of Outcome Supervision in Training LLMs for Math Reasoning
by: Guo, Jiaxing, et al.
Published: (2025)

Mitigating Large Language Model Hallucination with Faithful Finetuning
by: Hu, Minda, et al.
Published: (2024)

Sparse Autoencoder Insights on Voice Embeddings
by: Pluth, Daniel, et al.
Published: (2025)

Sparse Upcycling: Inference Inefficient Finetuning
by: Doubov, Sasha, et al.
Published: (2024)

Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders
by: Wu, Xuansheng, et al.
Published: (2025)

Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
by: Wang, Chengbing, et al.
Published: (2025)

EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction
by: Yang, Li, et al.
Published: (2024)

Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
by: Li, Moxin, et al.
Published: (2024)

Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching
by: Kim, Seoyeon, et al.
Published: (2024)

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
by: Hu, Zichao, et al.
Published: (2024)

Enhancing LLM Language Adaption through Cross-lingual In-Context Pre-training
by: Wu, Linjuan, et al.
Published: (2025)