:: Library Catalog

Bibliographic Details
Main authors:	Pulipaka, Sidharth; Jain, Sparsh; Sankar, Ashwin; Dabre, Raj
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online access:	https://arxiv.org/abs/2506.03793

Similar Items

Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
by: Sankar, Ashwin, et al.
Published: (2024)

PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation
by: Pulipaka, Srikar Kashyap
Published: (2026)

The Reasoning Lingua Franca: A Double-Edged Sword for Multilingual AI
by: Saji, Alan, et al.
Published: (2025)

Top-b: Entropic Regulation of Relative Probability Bands in Autoregressive Language Processes
by: Halder, Deepon, et al.
Published: (2026)

Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages
by: Ghosh, Poulami, et al.
Published: (2024)

Mark My Words: Analyzing and Evaluating Language Model Watermarks
by: Piet, Julien, et al.
Published: (2023)

Pretraining Language Models Using Translationese
by: Doshi, Meet, et al.
Published: (2024)

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
by: Khan, Mohammed Safi Ur Rahman, et al.
Published: (2024)

When Alignment Hurts: Decoupling Representational Spaces in Multilingual Models
by: Elshabrawy, Ahmed, et al.
Published: (2025)

PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
by: Pulipaka, Srikar Kashyap
Published: (2026)

Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP
by: Jayakumar, Thanmay, et al.
Published: (2026)

Punctuation Prediction for Polish Texts using Transformers
by: Pokrywka, Jakub
Published: (2024)

Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
by: Doddapaneni, Sumanth, et al.
Published: (2024)

CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation
by: Halder, Deepon, et al.
Published: (2025)

An Empirical Study of In-context Learning in LLMs for Machine Translation
by: Chitale, Pranjal A., et al.
Published: (2024)

Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration
by: Zhu, Xiliang, et al.
Published: (2024)

A Morphology-Based Investigation of Positional Encodings
by: Ghosh, Poulami, et al.
Published: (2024)

Spontaneous Informal Speech Dataset for Punctuation Restoration
by: Liu, Xing Yi, et al.
Published: (2024)

How effective is Multi-source pivoting for Translation of Low Resource Indian Languages?
by: Gaikwad, Pranav, et al.
Published: (2024)

Punctuation and Predicates in Language Models
by: Chauhan, Sonakshi, et al.
Published: (2025)

RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs
by: Saji, Alan, et al.
Published: (2025)

Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models
by: Atwany, Hanin, et al.
Published: (2025)

Does Dependency Locality Predict Non-canonical Word Order in Hindi?
by: Ranjan, Sidharth, et al.
Published: (2024)

Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation
by: Shejole, Kaustubh Shivshankar, et al.
Published: (2025)

Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model
by: Sankar, Sanjana, et al.
Published: (2025)

RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations
by: Sankar, Ashwin, et al.
Published: (2025)

IndicRAGSuite: Large-Scale Datasets and a Benchmark for Indian Language RAG Systems
by: Prasanjith, Pasunuti, et al.
Published: (2025)

The Art of Breaking Words: Rethinking Multilingual Tokenizer Design
by: Thakur, Aamod, et al.
Published: (2025)

Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
by: Gómez-Zaragozá, Lucía, et al.
Published: (2023)

Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification
by: Jeong, Jinhong, et al.
Published: (2026)

PrahokBART: A Pre-trained Sequence-to-Sequence Model for Khmer Natural Language Generation
by: Kaing, Hour, et al.
Published: (2025)

FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment
by: Koshkin, Roman, et al.
Published: (2026)

Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
by: Sharma, Roshan, et al.
Published: (2024)

Predicting Punctuation in Ancient Chinese Texts: A Multi-Layered LSTM and Attention-Based Approach
by: Cai, Tracy, et al.
Published: (2024)

Punctuation-aware treebank tree binarization
by: Klinger, Eitan, et al.
Published: (2025)

RiddleBench: A New Generative Reasoning Benchmark for LLMs
by: Halder, Deepon, et al.
Published: (2025)

Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit
by: Nareddy, Kartheek Kumar Reddy, et al.
Published: (2025)

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
by: Belouadi, Jonas, et al.
Published: (2025)

Algorithms For Automatic Accentuation And Transcription Of Russian Texts In Speech Recognition Systems
by: Iakovenko, Olga, et al.
Published: (2024)