:: Library Catalog

Bibliographic Details
Main authors:	Yang, Hongru; Kailkhura, Bhavya; Wang, Zhangyang; Liang, Yingbin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online access:	https://arxiv.org/abs/2410.09605

Similar Items

How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
by: Huang, Ruiquan, et al.
Published: (2025)

Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK
by: Yang, Hongru, et al.
Published: (2023)

In-Context Learning with Representations: Contextual Generalization of Trained Transformers
by: Yang, Tong, et al.
Published: (2024)

LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks
by: Pal, Soumyadeep, et al.
Published: (2025)

Constrained Discrete Diffusion
by: Cardei, Michael, et al.
Published: (2025)

Low-rank finetuning for LLMs: A fairness perspective
by: Das, Saswat, et al.
Published: (2024)

UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making
by: Duan, Jinhao, et al.
Published: (2025)

Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence
by: Nava, Andres, et al.
Published: (2026)

Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
by: Christopher, Jacob K, et al.
Published: (2024)

SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
by: Jia, Jinghan, et al.
Published: (2024)

Constraint-Rectified Training for Efficient Chain-of-Thought
by: Wu, Qinhang, et al.
Published: (2026)

LoCoCo: Dropping In Convolutions for Long Context Compression
by: Cai, Ruisi, et al.
Published: (2024)

Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
by: Duan, Jinhao, et al.
Published: (2023)

Meta ControlNet: Enhancing Task Adaptation via Meta Learning
by: Yang, Junjie, et al.
Published: (2023)

GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
by: Duan, Jinhao, et al.
Published: (2024)

Training Neural Networks as Recognizers of Formal Languages
by: Butoi, Alexandra, et al.
Published: (2024)

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
by: Geiping, Jonas, et al.
Published: (2025)

Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning
by: Tian, Yunshuo, et al.
Published: (2026)

World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings
by: Barenholtz, Elan
Published: (2026)

Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)

In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly
by: Deora, Puneesh, et al.
Published: (2025)

Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
by: Bartoldson, Brian R., et al.
Published: (2024)

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
by: Cai, Ruisi, et al.
Published: (2024)

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
by: McLeish, Sean, et al.
Published: (2025)

ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering?
by: Meshram, Pragati Shuddhodhan, et al.
Published: (2024)

HIPO: Instruction Hierarchy via Constrained Reinforcement Learning
by: Chen, Keru, et al.
Published: (2026)

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)

Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
by: Yang, Andy, et al.
Published: (2023)

Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
by: Karkada, Dhruva, et al.
Published: (2025)

Certifiably-Robust Federated Adversarial Learning via Randomized Smoothing
by: Chen, Cheng, et al.
Published: (2021)

Transformers Learn Low Sensitivity Functions: Investigations and Implications
by: Vasudeva, Bhavya, et al.
Published: (2024)

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
by: Zhao, Siyan, et al.
Published: (2025)

Advanced Multimodal Deep Learning Architecture for Image-Text Matching
by: Wang, Jinyin, et al.
Published: (2024)

Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)

Compressing LLMs: The Truth is Rarely Pure and Never Simple
by: Jaiswal, Ajay, et al.
Published: (2023)

Non-asymptotic Convergence of Training Transformers for Next-token Prediction
by: Huang, Ruiquan, et al.
Published: (2024)

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
by: Li, Hongkang, et al.
Published: (2024)

Synthetic Text Generation for Training Large Language Models via Gradient Matching
by: Nguyen, Dang, et al.
Published: (2025)

Latent Concept Disentanglement in Transformer-based Language Models
by: Hong, Guan Zhe, et al.
Published: (2025)

On the Duality between Gradient Transformations and Adapters
by: Torroba-Hennigen, Lucas, et al.
Published: (2025)