:: Library Catalog

Bibliographic Details
Main Authors:	Qiu, Zihan; Huang, Zeyu; Huang, Youcheng; Fu, Jie
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.12233

Similar Items

A Closer Look into Mixture-of-Experts in Large Language Models
by: Lo, Ka Man, et al.
Published: (2024)

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
by: Du, Wenyu, et al.
Published: (2024)

Layerwise Recurrent Router for Mixture-of-Experts
by: Qiu, Zihan, et al.
Published: (2024)

Post-hoc Reward Calibration: A Case Study on Length Bias
by: Huang, Zeyu, et al.
Published: (2024)

Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
by: Chen, Lei, et al.
Published: (2024)

F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation
by: Wu, Junhong, et al.
Published: (2024)

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)

Unlocking Continual Learning Abilities in Language Models
by: Du, Wenyu, et al.
Published: (2024)

A Controllable Examination for Long-Context Language Models
by: Yang, Yijun, et al.
Published: (2025)

Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts
by: Huang, Youcheng, et al.
Published: (2025)

Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory
by: Goldstein, Daniel, et al.
Published: (2026)

Trellis: Learning to Compress Key-Value Memory in Attention Models
by: Karami, Mahdi, et al.
Published: (2025)

See the Unseen: Better Context-Consistent Knowledge-Editing by Noises
by: Huang, Youcheng, et al.
Published: (2024)

Steering Information Utility in Key-Value Memory for Language Model Post-Training
by: Deng, Chunyuan, et al.
Published: (2025)

Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage
by: Hu, Jinwei, et al.
Published: (2026)

Evidence Absence Is Not Evidence Insufficiency: Diagnosing NEI Construction Artifacts in Fact Verification
by: Qiu, Jingxi, et al.
Published: (2026)

Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
by: Feng, Duanyu, et al.
Published: (2024)

Assessing Adversarial Robustness of Large Language Models: An Empirical Study
by: Yang, Zeyu, et al.
Published: (2024)

Unlocking Emergent Modularity in Large Language Models
by: Qiu, Zihan, et al.
Published: (2023)

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
by: Huang, Zeyu, et al.
Published: (2025)

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
by: Cai, Deng, et al.
Published: (2024)

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)

SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
by: Jie, Shibo, et al.
Published: (2025)

Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information
by: Huang, Youcheng, et al.
Published: (2025)

Responsible Agentic AI Requires Explicit Provenance
by: Hu, Jinwei, et al.
Published: (2026)

CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
by: Liu, Zeyu Leo, et al.
Published: (2024)

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
by: Qiu, Zihan, et al.
Published: (2025)

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
by: Ding, Ning, et al.
Published: (2024)

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
by: Bozic, Vukasin, et al.
Published: (2023)

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
by: Jiang, Ting, et al.
Published: (2024)

Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs
by: de Langis, Karin, et al.
Published: (2025)

A Unified View of Attention and Residual Sinks: Outlier-Driven Rescaling is Essential for Transformer Training
by: Qiu, Zihan, et al.
Published: (2026)

Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor
by: Zhao, Yang, et al.
Published: (2026)

Think Before You Act: Decision Transformers with Working Memory
by: Kang, Jikun, et al.
Published: (2023)

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics
by: Guo, YiQiu, et al.
Published: (2024)

What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA
by: He, Xinjie, et al.
Published: (2026)

Working Memory Capacity of ChatGPT: An Empirical Study
by: Gong, Dongyu, et al.
Published: (2023)

Signatures of human-like processing in Transformer forward passes
by: Hu, Jennifer, et al.
Published: (2025)

ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval
by: Yang, David H., et al.
Published: (2026)

"You Are Rejected!": An Empirical Study of Large Language Models Taking Hiring Evaluations
by: Fu, Dingjie, et al.
Published: (2025)