| Main Authors: | Zuo, Jingwei, Velikanov, Maksim, Rhaiem, Dhia Eddine, Chahed, Ilyas, Belkada, Younes, Kunsch, Guillaume, Hacid, Hakim |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.05355 |
Similar Items
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
by: Velikanov, Maksim, et al.
Published: (2026)
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
by: Zuo, Jingwei, et al.
Published: (2025)
Re-thinking Human Activity Recognition with Hierarchy-aware Label Relationship Modeling
by: Zuo, Jingwei, et al.
Published: (2024)
Falcon2-11B Technical Report
by: Malartic, Quentin, et al.
Published: (2024)
Competitive Audio-Language Models with Data-Efficient Single-Stage Training on Public Data
by: Kumar, Gokul Karthik, et al.
Published: (2025)
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models
by: Yagoubi, Mouadh, et al.
Published: (2025)
MAGNETO: Edge AI for Human Activity Recognition -- Privacy and Personalization
by: Zuo, Jingwei, et al.
Published: (2024)
PORT: Preference Optimization on Reasoning Traces
by: Lahlou, Salem, et al.
Published: (2024)
WavLink: Compact Audio-Text Embeddings with a Global Whisper Token
by: Kumar, Gokul Karthik, et al.
Published: (2026)
Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions
by: Velikanov, Maksim, et al.
Published: (2022)
SGD with memory: fundamental properties and stochastic acceleration
by: Yarotsky, Dmitry, et al.
Published: (2024)
RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
by: Yang, Ruiyi, et al.
Published: (2025)
Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation
by: AlQadi, Leen, et al.
Published: (2026)
ViSpeR: Multilingual Audio-Visual Speech Recognition
by: Narayan, Sanath, et al.
Published: (2024)
Falcon Perception
by: Bevli, Aviraj, et al.
Published: (2026)
ALRM: Agentic LLM for Robotic Manipulation
by: Santos, Vitor Gaboardi dos, et al.
Published: (2026)
Evaluating Arabic Large Language Models: A Survey of Benchmarks, Methods, and Gaps
by: Alzubaidi, Ahmed, et al.
Published: (2025)
Generalization error of spectral algorithms
by: Velikanov, Maksim, et al.
Published: (2024)
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
by: Younsi, Adam, et al.
Published: (2025)
Analyzing Multi-Head Attention on Trojan BERT Models
by: Wang, Jingwei
Published: (2024)
3LM: Bridging Arabic, STEM, and Code through Benchmarking
by: Boussaha, Basma El Amel, et al.
Published: (2025)
Constrained Online Convex Optimization with Memory and Predictions
by: Abdullah, Mohammed, et al.
Published: (2026)
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
by: Xiao, Guangxuan, et al.
Published: (2024)
Brevity Constraints Reverse Performance Hierarchies in Language Models
by: Hakim, MD Azizul
Published: (2026)
Hydro‐Climatic Modelling for Water Resources: From Processes to Adaptive Management and Governance
by: Jamel Chahed
Published: (2025)
DATASHI: A Parallel English-Tashlhiyt Corpus for Orthography Normalization and Low-Resource Language Processing
by: Monir, Nasser-Eddine, et al.
Published: (2026)
Falcon 7b for Software Mention Detection in Scholarly Documents
by: Khan, AmeerAli, et al.
Published: (2024)
Exploring Attention Mechanisms in Integration of Multi-Modal Information for Sign Language Recognition and Translation
by: Hakim, Zaber Ibn Abdul, et al.
Published: (2023)
ChatGPT v.s. Media Bias: A Comparative Study of GPT-3.5 and Fine-tuned Language Models
by: Wen, Zehao, et al.
Published: (2024)
Dream-Coder 7B: An Open Diffusion Language Model for Code
by: Xie, Zhihui, et al.
Published: (2025)
Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets
by: Daouadi, Kheir Eddine, et al.
Published: (2024)
Tiny Recursive Reasoning with Mamba-2 Attention Hybrid
by: Wang, Wenlong, et al.
Published: (2026)
MambaByte: Token-free Selective State Space Model
by: Wang, Junxiong, et al.
Published: (2024)
Attention to Mamba: A Recipe for Cross-Architecture Distillation
by: Moudgil, Abhinav, et al.
Published: (2026)
MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking
by: Younes, Mohamed T., et al.
Published: (2025)
Multilingual Nonce Dependency Treebanks: Understanding how Language Models represent and process syntactic structure
by: Arps, David, et al.
Published: (2023)
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
by: Tamura, Takuya, et al.
Published: (2025)
Are LLMs Enough for Hyperpartisan, Fake, Polarized and Harmful Content Detection? Evaluating In-Context Learning vs. Fine-Tuning
by: Maggini, Michele Joshua, et al.
Published: (2025)
Parallax: Parameterized Local Linear Attention for Language Modeling
by: Zuo, Yifei, et al.
Published: (2026)
Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair
by: Borisov, Maksim, et al.
Published: (2025)