| Main Authors: | Pătrăucean, Viorica, He, Xu Owen, Heyward, Joseph, Zhang, Chuhan, Sajjadi, Mehdi S. M., Muraru, George-Cristian, Zholus, Artem, Karami, Mahdi, Goroshin, Ross, Chen, Yutian, Osindero, Simon, Carreira, João, Pascanu, Razvan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.14294 |
Similar Items
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
by: Zholus, Artem, et al.
Published: (2025)
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
by: Heyward, Joseph, et al.
Published: (2024)
Learning from Streaming Video with Orthogonal Gradients
by: Han, Tengda, et al.
Published: (2025)
How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models
by: Kumaran, Dharshan, et al.
Published: (2025)
Lattice: Learning to Efficiently Compress the Memory
by: Karami, Mahdi, et al.
Published: (2025)
Perception Test 2025: Challenge Summary and a Unified VQA Extension
by: Heyward, Joseph, et al.
Published: (2026)
How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals
by: Kumaran, Dharshan, et al.
Published: (2026)
Causal Evidence that Language Models use Confidence to Drive Behavior
by: Kumaran, Dharshan, et al.
Published: (2026)
How do LLMs Compute Verbal Confidence
by: Kumaran, Dharshan, et al.
Published: (2026)
MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling
by: Karami, Mahdi, et al.
Published: (2025)
Perplexity Cannot Always Tell Right from Wrong
by: Veličković, Petar, et al.
Published: (2026)
Learning from One Continuous Video Stream
by: Carreira, João, et al.
Published: (2023)
BootsTAP: Bootstrapped Training for Tracking-Any-Point
by: Doersch, Carl, et al.
Published: (2024)
TAPNext++: What's Next for Tracking Any Point (TAP)?
by: Jung, Sebastian, et al.
Published: (2026)
Continuous Histogram Loss: Beyond Neural Similarity
by: Zholus, Artem, et al.
Published: (2020)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
by: De, Soham, et al.
Published: (2024)
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
by: Papalampidi, Pinelopi, et al.
Published: (2023)
Dynamic Reflections: Probing Video Representations with Text Alignment
by: Zhu, Tyler, et al.
Published: (2025)
Scaling 4D Representations
by: Carreira, João, et al.
Published: (2024)
Unique Lives, Shared World: Learning from Single-Life Videos
by: Han, Tengda, et al.
Published: (2025)
Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
by: Orvieto, Antonio, et al.
Published: (2023)
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models
by: Nilaksh, et al.
Published: (2026)
Deep Grokking: Would Deep Neural Networks Generalize Better?
by: Fan, Simin, et al.
Published: (2024)
Attention as a Hypernetwork
by: Schug, Simon, et al.
Published: (2024)
Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists
by: Lewis, Owen, et al.
Published: (2025)
HiGen: Hierarchical Graph Generative Networks
by: Karami, Mahdi
Published: (2023)
NoProp: Training Neural Networks without Full Back-propagation or Full Forward-propagation
by: Li, Qinyu, et al.
Published: (2025)
Meta-learning how to Share Credit among Macro-Actions
by: Hosu, Ionel-Alexandru, et al.
Published: (2025)
Latent Space Representations of Neural Algorithmic Reasoners
by: Mirjanić, Vladimir V., et al.
Published: (2023)
Revisiting Adam for Streaming Reinforcement Learning
by: Gogianu, Florin, et al.
Published: (2026)
State Soup: In-Context Skill Learning, Retrieval and Mixing
by: Pióro, Maciej, et al.
Published: (2024)
Mastering Memory Tasks with World Models
by: Samsami, Mohammad Reza, et al.
Published: (2024)
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
by: Wei, Xiuying, et al.
Published: (2024)
What Can Grokking Teach Us About Learning Under Nonstationarity?
by: Lyle, Clare, et al.
Published: (2025)
Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis
by: Wei, Xiuying, et al.
Published: (2024)
Softmax is not Enough (for Sharp Size Generalisation)
by: Veličković, Petar, et al.
Published: (2024)
RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling
by: Wei, Xiuying, et al.
Published: (2025)
Fine-Tuned In-Context Learners for Efficient Adaptation
by: Bornschein, Jorg, et al.
Published: (2025)
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
by: Erdogan, Goker, et al.
Published: (2025)
Effect of stocking density on growth Gracilariopsis persica in Persian Gulf
by: Karami, Esmaeil, et al.
Published: (2013)