:: Library Catalog

Bibliographic details
Main authors:	Ma, Ming; Zheng, Bowen; Lin, Zhongqiao; Yang, Tianming
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Performance
Online access:	https://arxiv.org/abs/2507.17618

Similar items

Label Words as Local Task Vectors in In-Context Learning
by: Zheng, Bowen, et al.
Published: (2024)

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)

Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States
by: Dong, Ximing, et al.
Published: (2026)

LFED: A Literary Fiction Evaluation Dataset for Large Language Models
by: Yu, Linhao, et al.
Published: (2024)

Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity
by: Song, Zichen, et al.
Published: (2024)

Model Compression and Efficient Inference for Large Language Models: A Survey
by: Wang, Wenxiao, et al.
Published: (2024)

Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training
by: Liu, Vivian, et al.
Published: (2024)

Priority Sampling of Large Language Models for Compilers
by: Grubisic, Dejan, et al.
Published: (2024)

SuperCoder: Assembly Program Superoptimization with Large Language Models
by: Wei, Anjiang, et al.
Published: (2025)

LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models
by: Zhi, Yijie, et al.
Published: (2025)

R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
by: Fu, Tianyu, et al.
Published: (2025)

DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention
by: Lee, Younjoo, et al.
Published: (2026)

It's Not Easy Being Green: On the Energy Efficiency of Programming Languages
by: van Kempen, Nicolas, et al.
Published: (2024)

Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations
by: Tyukin, Georgy
Published: (2024)

Systematic Evaluation of Optimization Techniques for Long-Context Language Models
by: Ahmed, Ammar, et al.
Published: (2025)

SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts--Extended Version
by: Pham, Nghiem Thanh, et al.
Published: (2025)

LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
by: Zhou, Zikai, et al.
Published: (2025)

EPIC: Efficient Position-Independent Caching for Serving Large Language Models
by: Hu, Junhao, et al.
Published: (2024)

LoPace: A Lossless Optimized Prompt Accurate Compression Engine for Large Language Model Applications
by: Ulla, Aman
Published: (2026)

KG-EDAS: A Meta-Metric Framework for Evaluating Knowledge Graph Completion Models
by: Gul, Haji, et al.
Published: (2025)

AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms
by: Chen, Feiyang, et al.
Published: (2025)

Data Efficacy for Language Model Training
by: Dai, Yalun, et al.
Published: (2025)

Morpheme Boundary Detection & Grammatical Feature Prediction for Gujarati : Dataset & Model
by: Baxi, Jatayu, et al.
Published: (2021)

Investigating Execution-Aware Language Models for Code Optimization
by: Di Menna, Federico, et al.
Published: (2025)

Accurate Performance Modeling And Uncertainty Analysis of Lossy Compression in Scientific Applications
by: Liu, Youyuan, et al.
Published: (2024)

LOOPerSet: A Large-Scale Dataset for Data-Driven Polyhedral Compiler Optimization
by: Merouani, Massinissa, et al.
Published: (2025)

Profiling Large Language Model Inference on Apple Silicon: A Quantization Perspective
by: Benazir, Afsara, et al.
Published: (2025)

SCALE-Sim TPU: Validating and Extending SCALE-Sim for TPUs
by: Dang, Jingtian, et al.
Published: (2026)

Iterative Layer Pruning for Efficient Translation Inference
by: Moslem, Yasmin, et al.
Published: (2025)

Optimizing Layout of Recursive Datatypes with Marmoset
by: Singhal, Vidush, et al.
Published: (2024)

Evaluating Compiler Optimization Impacts on zkVM Performance
by: Gassmann, Thomas, et al.
Published: (2025)

L1RA: Dynamic Rank Assignment in LoRA Fine-Tuning
by: Singh, Raul, et al.
Published: (2025)

Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
by: Won, Jaeyeon, et al.
Published: (2025)

Rule-Based Graph Programs Matching the Time Complexity of Imperative Algorithms
by: Alaoui, Ziad Ismaili, et al.
Published: (2025)

Runtime Verification on Abstract Finite State Models
by: Jevitha, KP, et al.
Published: (2024)

Stencil-Lifting: Hierarchical Recursive Lifting System for Extracting Summary of Stencil Kernel in Legacy Codes
by: Li, Mingyi, et al.
Published: (2025)

Regression Language Models for Code
by: Akhauri, Yash, et al.
Published: (2025)

PM2Lat: Highly Accurate and Generalized Prediction of DNN Execution Latency on GPUs
by: Le, Truong-Thanh, et al.
Published: (2026)

Flex Attention: A Programming Model for Generating Optimized Attention Kernels
by: Dong, Juechu, et al.
Published: (2024)

Repr Types: One Abstraction to Rule Them All
by: Palmkvist, Viktor, et al.
Published: (2024)