:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Ruibo; Wei, Jerry; Liu, Fangyu; Si, Chenglei; Zhang, Yanzhe; Rao, Jinmeng; Zheng, Steven; Peng, Daiyi; Yang, Diyi; Zhou, Denny; Dai, Andrew M.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2404.07503

Similar Items

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
by: Si, Chenglei, et al.
Published: (2024)

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
by: Si, Chenglei, et al.
Published: (2024)

The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
by: Si, Chenglei, et al.
Published: (2025)

Higher Layers Need More LoRA Experts
by: Gao, Chongyang, et al.
Published: (2024)

Searching for Privacy Risks in LLM Agents via Simulation
by: Zhang, Yanzhe, et al.
Published: (2025)

Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
by: Chen, Qi, et al.
Published: (2025)

A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
by: Liu, Zijun, et al.
Published: (2023)

Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing
by: Yang, Diji, et al.
Published: (2025)

Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping
by: Li, Ryan, et al.
Published: (2024)

Attacking Vision-Language Computer Agents via Pop-ups
by: Zhang, Yanzhe, et al.
Published: (2024)

Towards Execution-Grounded Automated AI Research
by: Si, Chenglei, et al.
Published: (2026)

Auditing Gender Presentation Differences in Text-to-Image Models
by: Zhang, Yanzhe, et al.
Published: (2023)

Long-form factuality in large language models
by: Wei, Jerry, et al.
Published: (2024)

Distilling an End-to-End Voice Assistant Without Instruction Training Data
by: Held, William, et al.
Published: (2024)

Highly Efficient and Stable Narrow Band Green Emitting Phosphor of Sb³⁺/Ce³⁺ Sensitized Cs₂NaTbCl₆ for WLED
by: Changheng Chen, et al.
Published: (2024)

Contextual Experience Replay for Self-Improvement of Language Agents
by: Liu, Yitao, et al.
Published: (2025)

Generative Interfaces for Language Models
by: Chen, Jiaqi, et al.
Published: (2025)

Real-Time Reasoning Agents in Evolving Environments
by: Wen, Yule, et al.
Published: (2025)

Type-Compliant Adaptation Cascades: Adapting Programmatic LM Workflows to Data
by: Lin, Chu-Cheng, et al.
Published: (2025)

Borderline content and platformised speech governance: Mapping TikTok's moderation controversies in South and Southeast Asia
by: Diyi Liu
Published: (2024)

SPHERE: An Evaluation Card for Human-AI Systems
by: Ma, Qianou, et al.
Published: (2025)

Defect‐Engineered Zero‐Dimensional Perovskite Cs₃LuCl₆:Tb³⁺ Scintillator with Exceptional Thermal Stability for Flexible High‐Temperature X‐Ray Imaging
by: Ruibo Gao, et al.
Published: (2026)

Achieving Single‐Phased Full Visible Spectrum Broadband White Emission in Ag⁺, Bi³⁺, and Sb³⁺ Tri‐Doped Cs₂NaLuCl₆ Double Perovskite Phosphor
by: Changheng Chen, et al.
Published: (2025)

The Best Instruction-Tuning Data are Those That Fit
by: Zhang, Dylan, et al.
Published: (2025)

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
by: Zhang, Yanzhe, et al.
Published: (2023)

Contextualized Privacy Defense for LLM Agents
by: Wen, Yule, et al.
Published: (2026)

Security and Innovation in ERP Systems: Best Practices for AI, OIC, and Automation Integration
by: Sreenivasa Rao Sola
Published: (2023)

Simple synthetic data reduces sycophancy in large language models
by: Wei, Jerry, et al.
Published: (2023)

Challenges and Best Practices in Corporate AI Governance: Lessons from the Biopharmaceutical Industry
by: Mökander, Jakob, et al.
Published: (2024)

When to Showcase Automated Production Processes? Disclosing Production Processes Increases Evaluation of Low‐End but Decreases Evaluation of High‐End Products
by: Diyi Liu, et al.
Published: (2025)

Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices
by: Laskar, Md Tahmid Rahman, et al.
Published: (2025)

Selecting the Best Optimizing System
by: Si, Nian, et al.
Published: (2022)

Robust Output Regulation of Uncertain Linear Time-Varying Systems
by: Zha, Jinmeng, et al.
Published: (2026)

AutoMetrics: Approximate Human Judgements with Automatically Generated Evaluators
by: Ryan, Michael J., et al.
Published: (2025)

S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models
by: Lei, Fangyu, et al.
Published: (2023)

Evaluation on Aggregate Particle Spalling of Induction Heating‐Based Functional Ultra‐Thin Friction Layer Using Image Processing Based on MATLAB
by: Zhengmengyuan Rao, et al.
Published: (2026)

IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues
by: Yang, Diji, et al.
Published: (2024)

Tweedie Regression for Video Recommendation System
by: Zheng, Yan, et al.
Published: (2025)

Relic abundance of dark matter with coannihilation in non-standard cosmological scenarios
by: Liu, Fangyu, et al.
Published: (2023)

Constraints on Asymmetric Dark Matter Self Annihilation Cross Sections in Non-standard Cosmological Scenarios
by: Liu, Fangyu, et al.
Published: (2023)