| Main Authors: | Krylov, Aleksei S., Somov, Oleg D. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.13739 |
Similar Items
Confidence Estimation for Error Detection in Text-to-SQL Systems
by: Somov, Oleg, et al.
Published: (2025)
Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers
by: Gong, Linyuan, et al.
Published: (2023)
The benefits of query-based KGQA systems for complex and temporal questions in LLM era
by: Alekseev, Artem, et al.
Published: (2025)
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
by: Seleznyov, Mikhail, et al.
Published: (2025)
Evolutionary Search for Automated Design of Uncertainty Quantification Methods
by: Seleznyov, Mikhail, et al.
Published: (2026)
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
by: Zhang, Meishan, et al.
Published: (2025)
IT5: Text-to-text Pretraining for Italian Language Understanding and Generation
by: Sarti, Gabriele, et al.
Published: (2022)
Private Synthetic Text Generation with Diffusion Models
by: Ochs, Sebastian, et al.
Published: (2024)
Rethinking the Role of Text Complexity in Language Model Pretraining
by: Velasco, Dan John, et al.
Published: (2025)
HAMSA: Hijacking Aligned Compact Models via Stealthy Automation
by: Krylov, Alexey, et al.
Published: (2025)
HRM-Text: Efficient Pretraining Beyond Scaling
by: Wang, Guan, et al.
Published: (2026)
Challenges in Explaining Pretrained Clinical Text Classifiers
by: Miok, Kristian, et al.
Published: (2026)
Improving Estonian Text Simplification through Pretrained Language Models and Custom Datasets
by: Barbu, Eduard, et al.
Published: (2025)
MedSyn: LLM-based Synthetic Medical Text Generation Framework
by: Kumichev, Gleb, et al.
Published: (2024)
GlossLM: A Massively Multilingual Corpus and Pretrained Model for Interlinear Glossed Text
by: Ginn, Michael, et al.
Published: (2024)
Energy-Based Diffusion Language Models for Text Generation
by: Xu, Minkai, et al.
Published: (2024)
Differences in Text Generated by Diffusion and Autoregressive Language Models
by: Zhang, Zeyang, et al.
Published: (2026)
A Reparameterized Discrete Diffusion Model for Text Generation
by: Zheng, Lin, et al.
Published: (2023)
Prune or Retrain: Optimizing the Vocabulary of Multilingual Models for Estonian
by: Dorkin, Aleksei, et al.
Published: (2025)
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models
by: Nguyen, Minh, et al.
Published: (2024)
Jailbreaking Large Language Diffusion Models: Revealing Hidden Safety Flaws in Diffusion-Based Text Generation
by: Zhang, Yuanhe, et al.
Published: (2025)
Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings
by: Gao, Lingyu
Published: (2024)
Transfer Learning for Text Diffusion Models
by: Han, Kehang, et al.
Published: (2024)
Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation
by: Dorkin, Aleksei, et al.
Published: (2024)
TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages
by: Dorkin, Aleksei, et al.
Published: (2024)
Cross-lingual paraphrase identification
by: Fedorova, Inessa, et al.
Published: (2024)
TartuNLP @ AXOLOTL-24: Leveraging Classifier Output for New Sense Detection in Lexical Semantics
by: Dorkin, Aleksei, et al.
Published: (2024)
Comparison of Current Approaches to Lemmatization: A Case Study in Estonian
by: Dorkin, Aleksei, et al.
Published: (2024)
TartuNLP at EvaLatin 2024: Emotion Polarity Detection
by: Dorkin, Aleksei, et al.
Published: (2024)
GliLem: Leveraging GliNER for Contextualized Lemmatization in Estonian
by: Dorkin, Aleksei, et al.
Published: (2024)
TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval
by: Dorkin, Aleksei, et al.
Published: (2025)
Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts
by: Somayajula, Sai Ashish, et al.
Published: (2024)
Comparison of End-to-end Speech Assessment Models for the NOCASA 2025 Challenge
by: Žavoronkov, Aleksei, et al.
Published: (2025)
Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
by: Shabalin, Alexander, et al.
Published: (2025)
Empowering Diffusion Models on the Embedding Space for Text Generation
by: Gao, Zhujin, et al.
Published: (2022)
Diffusion-Pretrained Dense and Contextual Embeddings
by: Eslami, Sedigheh, et al.
Published: (2026)
ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings
by: Brannon, William, et al.
Published: (2023)
EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training
by: Dorkin, Aleksei, et al.
Published: (2026)
NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text
by: Kailas, Prajwal, et al.
Published: (2024)
TextOmics-Guided Diffusion for Hit-like Molecular Generation
by: Yuan, Hang, et al.
Published: (2025)