:: Library Catalog

Bibliographic Details
Main Author:	Dasanaike, Noah
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.21138

Similar Items

Large Language Models Naively Recover Ethnicity from Individual Records
by: Dasanaike, Noah
Published: (2026)

Using Embedding Models to Improve Probabilistic Race Prediction
by: Dasanaike, Noah, et al.
Published: (2026)

LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models
by: Arora, Abhishek, et al.
Published: (2023)

Leveraging Large Language Models for Generating Labeled Mineral Site Record Linkage Data
by: Pyo, Jiyoon, et al.
Published: (2024)

Safe Training with Sensitive In-domain Data: Leveraging Data Fragmentation To Mitigate Linkage Attacks
by: Ignashina, Mariia, et al.
Published: (2024)

Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data
by: Toporkov, Olia, et al.
Published: (2025)

Distilling an End-to-End Voice Assistant Without Instruction Training Data
by: Held, William, et al.
Published: (2024)

Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
by: Matsubara, Yoshitomo, et al.
Published: (2022)

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
by: Hayase, Jonathan, et al.
Published: (2024)

Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)
by: Sarker, Shouvon, et al.
Published: (2025)

Is Child-Directed Speech Effective Training Data for Language Models?
by: Feng, Steven Y., et al.
Published: (2024)

Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
by: Zhang, Yunxiang, et al.
Published: (2025)

Ensemble Self-Training for Unsupervised Machine Translation
by: Aharon, Ido, et al.
Published: (2026)

Self-Training Large Language Models for Tool-Use Without Demonstrations
by: Luo, Ne, et al.
Published: (2025)

Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
by: Ahuja, Kabir, et al.
Published: (2024)

Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA
by: Ou, Justice, et al.
Published: (2025)

PRISM: A Unified Framework for Post-Training LLMs Without Verifiable Rewards
by: Ghimire, Mukesh, et al.
Published: (2026)

SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data
by: Alrashed, Sultan, et al.
Published: (2025)

DORA Explorer: Improving the Exploration Ability of LLMs Without Training
by: Gurjar, Priya, et al.
Published: (2026)

Training a Huggingface Model on AWS Sagemaker (Without Tears)
by: Tan, Liling
Published: (2025)

Dual-objective Language Models: Training Efficiency Without Overfitting
by: Samuel, David, et al.
Published: (2025)

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations
by: Wang, Yubo, et al.
Published: (2025)

Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs
by: Samuel, Vinay, et al.
Published: (2026)

FreePRM: Training Process Reward Models Without Ground Truth Process Labels
by: Sun, Lin, et al.
Published: (2025)

Fast Bayesian Record Linkage for Streaming Data Contexts
by: Taylor, Ian, et al.
Published: (2023)

Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision
by: Ma, Yingbo, et al.
Published: (2024)

Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree
by: Jaggi, Harbani, et al.
Published: (2024)

Self-Ensembling Vision-Language Models for Chart Data Extraction
by: Berkane, Thomas, et al.
Published: (2026)

Labeling Free-text Data using Language Model Ensembles
by: Qiu, Jiaxing, et al.
Published: (2025)

Sparsity Induction for Accurate Post-Training Pruning of Large Language Models
by: Jiang, Minhao, et al.
Published: (2026)

FUSE: Ensembling Verifiers with Zero Labeled Data
by: Lee, Joonhyuk, et al.
Published: (2026)

Commentary Generation from Data Records of Multiplayer Strategy Esports Game
by: Wang, Zihan, et al.
Published: (2022)

Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data
by: Lertpetchpun, Thanathai, et al.
Published: (2026)

INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning
by: Romero, Pablo, et al.
Published: (2024)

Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
by: Wang, Pei-Shuo, et al.
Published: (2025)

LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
by: Zhang, Sikui, et al.
Published: (2025)

AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
by: Tan, Xiaoyu, et al.
Published: (2025)

Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)

Protecting De-identified Documents from Search-based Linkage Attacks
by: Lison, Pierre, et al.
Published: (2025)