| Main Authors: | Watanabe, Yusuke, Kobashi, Yohei, Kojima, Takeshi, Iwasawa, Yusuke, Okuno, Yasushi, Matsuo, Yutaka |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.22771 |
Similar Items
Inconsistent Tokenizations Cause Language Models to be Perplexed by Japanese Grammar
by: Gambardella, Andrew, et al.
Published: (2025)
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
by: Matsutani, Kohsei, et al.
Published: (2025)
Dynamic Injection of Entity Knowledge into Dense Retrievers
by: Yamada, Ikuya, et al.
Published: (2025)
Automated Refinement of Essay Scoring Rubrics for Language Models via Reflect-and-Revise
by: Harada, Keno, et al.
Published: (2025)
On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons
by: Kojima, Takeshi, et al.
Published: (2024)
Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models
by: Cao, Qi, et al.
Published: (2026)
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
by: Minegishi, Gouki, et al.
Published: (2025)
Understanding Emergent Misalignment via Feature Superposition Geometry
by: Minegishi, Gouki, et al.
Published: (2026)
$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
by: Takashiro, Shota, et al.
Published: (2026)
Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training
by: Matsutani, Kohsei, et al.
Published: (2026)
Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
by: Takashiro, Shota, et al.
Published: (2024)
Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?
by: Uchiyama, Fumiya, et al.
Published: (2024)
Safe Transformer: An Explicit Safety Bit For Interpretable And Controllable Alignment
by: Feng, Jingyuan, et al.
Published: (2026)
Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks
by: Gambardella, Andrew, et al.
Published: (2024)
Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
by: Minegishi, Gouki, et al.
Published: (2023)
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
by: Minegishi, Gouki, et al.
Published: (2025)
When Instructions Multiply: Measuring and Estimating LLM Capabilities of Multiple Instructions Following
by: Harada, Keno, et al.
Published: (2025)
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence
by: Minegishi, Gouki, et al.
Published: (2025)
Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection
by: Yang, Bo, et al.
Published: (2025)
AQA-TTRL: Self-Adaptation in Audio Question Answering with Test-Time Reinforcement Learning
by: Zhang, Haoyu, et al.
Published: (2025)
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
by: Furuta, Hiroki, et al.
Published: (2024)
QuadNorm: Resolution-Robust Normalization for Neural Operators
by: Kim, Bum Jun, et al.
Published: (2026)
Unlocking Noise-Resistant Vision: Key Architectural Secrets for Robust Models
by: Kim, Bum Jun, et al.
Published: (2025)
How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments
by: Ide, Yusuke, et al.
Published: (2024)
EC-Bench: Enumeration and Counting Benchmark for Ultra-Long Videos
by: Tsuchiya, Fumihiko, et al.
Published: (2026)
Realtime Data-Efficient Portrait Stylization Based On Geometric Alignment
by: Wang, Xinrui, et al.
Published: (2022)
WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
by: Oshima, Yuta, et al.
Published: (2025)
Leave No Observation Behind: Real-time Correction for VLA Action Chunks
by: Sendai, Kohei, et al.
Published: (2025)
Retrieve-Augmented Generation for Speeding up Diffusion Policy without Additional Training
by: Odonchimed, Sodtavilan, et al.
Published: (2025)
SPARK: Graph-Based Online Semantic Integration System for Robot Task Planning
by: Shirasaka, Mimo, et al.
Published: (2025)
Reliable Text-to-SQL with Adaptive Abstention
by: Chen, Kaiwen, et al.
Published: (2025)
Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
by: Kim, Bum Jun, et al.
Published: (2026)
ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities
by: Yan, Dingkun, et al.
Published: (2025)
Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling
by: Takashiro, Shota, et al.
Published: (2026)
C-voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
by: Kubo, Kenji, et al.
Published: (2026)
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
by: Wang, Ru, et al.
Published: (2025)
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
by: Wang, Ru, et al.
Published: (2025)
Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search
by: Oguri, Yutaro, et al.
Published: (2024)
A Comprehensive Survey on Physical Risk Control in the Era of Foundation Model-enabled Robotics
by: Kojima, Takeshi, et al.
Published: (2025)
Mathematical Foundations of Poisoning Attacks on Linear Regression over Cumulative Distribution Functions
by: Sato, Atsuki, et al.
Published: (2026)