:: Library Catalog

Bibliographic Details
Main authors:	Zhong, Chunlin; Hou, Qiuxia; Zhou, Zhangjun; Hao, Shuang; Lu, Haonan; Zhang, Yanhao; Tang, He; Bai, Xiang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2508.18634

Similar items

Cap2Sum: Learning to Summarize Videos by Generating Captions
by: Zhao, Cairong, et al.
Published: (2024)

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
by: Chai, Wenhao, et al.
Published: (2024)

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
by: Cheng, Kanzhi, et al.
Published: (2025)

UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks
by: Wu, Peiran, et al.
Published: (2025)

SnapCap: Efficient Snapshot Compressive Video Captioning
by: Sun, Jianqiao, et al.
Published: (2024)

Synthetic Data in AI: Challenges, Applications, and Ethical Implications
by: Hao, Shuang, et al.
Published: (2024)

Video ReCap: Recursive Captioning of Hour-Long Videos
by: Islam, Md Mohaiminul, et al.
Published: (2024)

Describe Anything: Detailed Localized Image and Video Captioning
by: Lian, Long, et al.
Published: (2025)

FingerCap: Fine-grained Finger-level Hand Motion Captioning
by: Shen, Xin, et al.
Published: (2025)

IF-VidCap: Can Video Caption Models Follow Instructions?
by: Li, Shihao, et al.
Published: (2025)

Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
by: Fan, Tiehan, et al.
Published: (2024)

VoCap: Video Object Captioning and Segmentation from Any Prompt
by: Uijlings, Jasper, et al.
Published: (2025)

ControlCap: Controllable Region-level Captioning
by: Zhao, Yuzhong, et al.
Published: (2024)

Dense Motion Captioning
by: Xu, Shiyao, et al.
Published: (2025)

Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
by: Qin, Luozheng, et al.
Published: (2025)

SynPO: Synergizing Descriptiveness and Preference Optimization for Video Detailed Captioning
by: Dang, Jisheng, et al.
Published: (2025)

CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
by: Tang, Zhijiang, et al.
Published: (2026)

VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
by: Meng, Desen, et al.
Published: (2025)

VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning
by: Lu, Xingyu, et al.
Published: (2026)

ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
by: Yeshwanth, Chandan, et al.
Published: (2025)

Towards Fine-Grained Human Motion Video Captioning
by: Song, Guorui, et al.
Published: (2025)

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization
by: Tang, Changli, et al.
Published: (2024)

ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)

Retrieval-Augmented Egocentric Video Captioning
by: Xu, Jilan, et al.
Published: (2024)

Panoptic Captioning: An Equivalence Bridge for Image and Text
by: Lin, Kun-Yu, et al.
Published: (2025)

VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
by: Chen, Xinlong, et al.
Published: (2025)

AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models
by: Bai, Jisheng, et al.
Published: (2024)

FinCap: Topic-Aligned Captions for Short-Form Financial YouTube Videos
by: Sukhani, Siddhant, et al.
Published: (2025)

GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration
by: Xu, Wan, et al.
Published: (2025)

CodecCap: High-Fidelity Codec-Inspired Residual Modeling for Dense Video Captioning
by: Lin, Zihan, et al.
Published: (2026)

ReflectCAP: Detailed Image Captioning with Reflective Memory
by: Min, Kyungmin, et al.
Published: (2026)

Generating Accurate and Detailed Captions for High-Resolution Images
by: Lee, Hankyeol, et al.
Published: (2025)

The Devil is in the EOS: Sequence Training for Detailed Image Captioning
by: Mohamed, Abdelrahman, et al.
Published: (2025)

Live Video Captioning
by: Blanco-Fernández, Eduardo, et al.
Published: (2024)

DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
by: Xu, Dongsheng, et al.
Published: (2023)

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
by: Lu, Yifan, et al.
Published: (2023)

GroundCap: A Visually Grounded Image Captioning Dataset
by: Oliveira, Daniel A. P., et al.
Published: (2025)

AlignCap: Aligning Speech Emotion Captioning to Human Preferences
by: Liang, Ziqi, et al.
Published: (2024)

XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)