Library Catalog

Bibliographic details
Main authors:	Wang, Jingyuan; Xu, Shengdong; Yang, Yang
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online access:	https://arxiv.org/abs/2403.19713

Similar items

LegalLens Shared Task 2024: Legal Violation Identification in Unstructured Text
by: Hagag, Ben, et al.
Published: (2024)

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
by: Qi, Jingyuan, et al.
Published: (2023)

Investigating and Alleviating Harm Amplification in LLM Interactions
by: Guo, Ruohao, et al.
Published: (2026)

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
by: Chehbouni, Khaoula, et al.
Published: (2024)

Data Contamination Report from the 2024 CONDA Shared Task
by: Sainz, Oscar, et al.
Published: (2024)

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
by: Andriushchenko, Maksym, et al.
Published: (2024)

Offline Reinforcement Learning for LLM Multi-Step Reasoning
by: Wang, Huaijie, et al.
Published: (2024)

Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content
by: Stepanov, Ihor, et al.
Published: (2026)

Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers
by: Yan, Hao, et al.
Published: (2026)

SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
by: Pandey, Punya Syon, et al.
Published: (2025)

Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
by: Hong, Yuzhong, et al.
Published: (2024)

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents
by: Zhu, Jialiang, et al.
Published: (2026)

The Solution for The PST-KDD-2024 OAG-Challenge
by: Zhong, Shupeng, et al.
Published: (2024)

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2024)

Disentangling Task Conflicts in Multi-Task LoRA via Orthogonal Gradient Projection
by: Yang, Ziyu, et al.
Published: (2026)

Token Buncher: Shielding LLMs from Harmful Reinforcement Learning Fine-Tuning
by: Feng, Weitao, et al.
Published: (2025)

Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders
by: Wang, Shun, et al.
Published: (2025)

When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks
by: Lo, Chung-Hsiang, et al.
Published: (2026)

Representation Noising: A Defence Mechanism Against Harmful Finetuning
by: Rosati, Domenic, et al.
Published: (2024)

OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
by: Wu, Junda, et al.
Published: (2024)

NeuroLoRA: Context-Aware Neuromodulation for Parameter-Efficient Multi-Task Adaptation
by: Yang, Yuxin, et al.
Published: (2026)

DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
by: Wang, Shaobo, et al.
Published: (2026)

A Baseline for Self-state Identification and Classification in Mental Health Data: CLPsych 2025 Task
by: Kim, Laerdon
Published: (2025)

Harder Tasks Need More Experts: Dynamic Routing in MoE Models
by: Huang, Quzhe, et al.
Published: (2024)

Learning Task Representations from In-Context Learning
by: Saglam, Baturay, et al.
Published: (2025)

PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning
by: Lu, Yao, et al.
Published: (2026)

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
by: Lee, Seanie, et al.
Published: (2024)

Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection
by: Thorat, Shantanu, et al.
Published: (2024)

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
by: Mukherjee, Subhojyoti, et al.
Published: (2025)

Offline Preference Optimization via Maximum Marginal Likelihood Estimation
by: Najafi, Saeed, et al.
Published: (2025)

Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
by: Mendu, Sai Krishna, et al.
Published: (2025)

Frustratingly Easy Task-aware Pruning for Large Language Models
by: Tian, Yuanhe, et al.
Published: (2025)

End-to-end Planner Training for Language Modeling
by: Cornille, Nathan, et al.
Published: (2024)

SIG: Speaker Identification in Literature via Prompt-Based Generation
by: Su, Zhenlin, et al.
Published: (2023)

PIE: Performance Interval Estimation for Free-Form Generation Tasks
by: Hsu, Chi-Yang, et al.
Published: (2025)

MALTO at SemEval-2024 Task 6: Leveraging Synthetic Data for LLM Hallucination Detection
by: Borra, Federico, et al.
Published: (2024)

A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models
by: Pfohl, Stephen R., et al.
Published: (2024)

Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack
by: Huang, Tiansheng, et al.
Published: (2024)

FLARE: Task-agnostic embedding model evaluation through a normalization process
by: Jiang, Jingzhou, et al.
Published: (2026)

HarmPot: An Annotation Framework for Evaluating Offline Harm Potential of Social Media Text
by: Kumar, Ritesh, et al.
Published: (2024)