| Main Authors: | Ma, Xuezhe, Yang, Xiaomeng, Xiong, Wenhan, Chen, Beidi, Yu, Lili, Zhang, Hao, May, Jonathan, Zettlemoyer, Luke, Levy, Omer, Zhou, Chunting |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.08801 |
Similar Items
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
by: Zhou, Chunting, et al.
Published: (2024)
LMFusion: Adapting Pretrained Language Models for Multimodal Generation
by: Shi, Weijia, et al.
Published: (2024)
Self-Alignment with Instruction Backtranslation
by: Li, Xian, et al.
Published: (2023)
Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths
by: Ma, Xuezhe, et al.
Published: (2026)
CAT: Content-Adaptive Image Tokenization
by: Shen, Junhong, et al.
Published: (2025)
ALMA: Alignment with Minimal Annotation
by: Yasunaga, Michihiro, et al.
Published: (2024)
In-context Pretraining: Language Modeling Beyond Document Boundaries
by: Shi, Weijia, et al.
Published: (2023)
LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems
by: Xu, Nan, et al.
Published: (2024)
Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models
by: Jin, Linghao, et al.
Published: (2024)
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
by: Svirschevski, Ruslan, et al.
Published: (2024)
Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities
by: Wen, Nuan, et al.
Published: (2026)
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
by: Liang, Weixin, et al.
Published: (2024)
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
by: Zhou, Yang, et al.
Published: (2025)
Efficient Pretraining Length Scaling
by: Wu, Bohong, et al.
Published: (2025)
Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
by: Deng, Haoran, et al.
Published: (2025)
Craw4LLM: Efficient Web Crawling for LLM Pretraining
by: Yu, Shi, et al.
Published: (2025)
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
by: Liang, Weixin, et al.
Published: (2025)
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
by: Dong, Harry, et al.
Published: (2024)
DecoPrompt : Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises
by: Xu, Nan, et al.
Published: (2024)
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)
Byte Latent Transformer: Patches Scale Better Than Tokens
by: Pagnoni, Artidoro, et al.
Published: (2024)
Megalodon, mako shark and planktonic foraminifera from the continental shelf off Portugal and their age
by: M.T. ANTUNES
Published: (2015)
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
by: Ren, Liliang, et al.
Published: (2024)
Squeezed Attention: Accelerating Long Context Length LLM Inference
by: Hooper, Coleman, et al.
Published: (2024)
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models
by: Han, Chi, et al.
Published: (2023)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
by: Sun, Hanshi, et al.
Published: (2024)
Art Unlimited?
by: Schultheis, Franz, et al.
Published: (2016)
Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
by: Luo, Cheng, et al.
Published: (2025)
Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap
by: Yang, Wenhan, et al.
Published: (2025)
Keep Guessing? When Considering Inference Scaling, Mind the Baselines
by: Yona, Gal, et al.
Published: (2024)
MALI: Unlimited Mandate
Published: (2025)
Learning Center Unlimited.
by: Vivrette, Lyndon
Published: (1974)
Computational Tradeoffs in Image Synthesis: Diffusion, Masked-Token, and Next-Token Prediction
by: Kilian, Maciej, et al.
Published: (2024)
Comparing Hallucination Detection Metrics for Multilingual Generation
by: Kang, Haoqiang, et al.
Published: (2024)
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
by: Yasunaga, Michihiro, et al.
Published: (2025)
(Mis)Fitting: A Survey of Scaling Laws
by: Li, Margaret, et al.
Published: (2025)
PatentEdits: Framing Patent Novelty as Textual Entailment
by: Lee, Ryan, et al.
Published: (2024)
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
by: Yang, Xinyu, et al.
Published: (2025)
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
by: Qin, Zhen, et al.
Published: (2024)