| Main Author: | Dasanaike, Noah |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21138 |
Similar Items
Large Language Models Naively Recover Ethnicity from Individual Records
by: Dasanaike, Noah
Published: (2026)
Using Embedding Models to Improve Probabilistic Race Prediction
by: Dasanaike, Noah, et al.
Published: (2026)
LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models
by: Arora, Abhishek, et al.
Published: (2023)
Leveraging Large Language Models for Generating Labeled Mineral Site Record Linkage Data
by: Pyo, Jiyoon, et al.
Published: (2024)
Safe Training with Sensitive In-domain Data: Leveraging Data Fragmentation To Mitigate Linkage Attacks
by: Ignashina, Mariia, et al.
Published: (2024)
Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data
by: Toporkov, Olia, et al.
Published: (2025)
Distilling an End-to-End Voice Assistant Without Instruction Training Data
by: Held, William, et al.
Published: (2024)
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
by: Matsubara, Yoshitomo, et al.
Published: (2022)
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
by: Hayase, Jonathan, et al.
Published: (2024)
Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)
by: Sarker, Shouvon, et al.
Published: (2025)
Is Child-Directed Speech Effective Training Data for Language Models?
by: Feng, Steven Y., et al.
Published: (2024)
Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
by: Zhang, Yunxiang, et al.
Published: (2025)
Ensemble Self-Training for Unsupervised Machine Translation
by: Aharon, Ido, et al.
Published: (2026)
Self-Training Large Language Models for Tool-Use Without Demonstrations
by: Luo, Ne, et al.
Published: (2025)
Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
by: Ahuja, Kabir, et al.
Published: (2024)
Experience Retrieval-Augmentation with Electronic Health Records Enables Accurate Discharge QA
by: Ou, Justice, et al.
Published: (2025)
PRISM: A Unified Framework for Post-Training LLMs Without Verifiable Rewards
by: Ghimire, Mukesh, et al.
Published: (2026)
SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data
by: Alrashed, Sultan, et al.
Published: (2025)
DORA Explorer: Improving the Exploration Ability of LLMs Without Training
by: Gurjar, Priya, et al.
Published: (2026)
Training a Huggingface Model on AWS Sagemaker (Without Tears)
by: Tan, Liling
Published: (2025)
Dual-objective Language Models: Training Efficiency Without Overfitting
by: Samuel, David, et al.
Published: (2025)
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations
by: Wang, Yubo, et al.
Published: (2025)
Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs
by: Samuel, Vinay, et al.
Published: (2026)
FreePRM: Training Process Reward Models Without Ground Truth Process Labels
by: Sun, Lin, et al.
Published: (2025)
Fast Bayesian Record Linkage for Streaming Data Contexts
by: Taylor, Ian, et al.
Published: (2023)
Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision
by: Ma, Yingbo, et al.
Published: (2024)
Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree
by: Jaggi, Harbani, et al.
Published: (2024)
Self-Ensembling Vision-Language Models for Chart Data Extraction
by: Berkane, Thomas, et al.
Published: (2026)
Labeling Free-text Data using Language Model Ensembles
by: Qiu, Jiaxing, et al.
Published: (2025)
Sparsity Induction for Accurate Post-Training Pruning of Large Language Models
by: Jiang, Minhao, et al.
Published: (2026)
FUSE: Ensembling Verifiers with Zero Labeled Data
by: Lee, Joonhyuk, et al.
Published: (2026)
Commentary Generation from Data Records of Multiplayer Strategy Esports Game
by: Wang, Zihan, et al.
Published: (2022)
Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data
by: Lertpetchpun, Thanathai, et al.
Published: (2026)
INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning
by: Romero, Pablo, et al.
Published: (2024)
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
by: Wang, Pei-Shuo, et al.
Published: (2025)
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
by: Zhang, Sikui, et al.
Published: (2025)
AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
by: Tan, Xiaoyu, et al.
Published: (2025)
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)
Protecting De-identified Documents from Search-based Linkage Attacks
by: Lison, Pierre, et al.
Published: (2025)