| Main Authors: | Zhong, Chunlin, Hou, Qiuxia, Zhou, Zhangjun, Hao, Shuang, Lu, Haonan, Zhang, Yanhao, Tang, He, Bai, Xiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.18634 |
Similar Items
Cap2Sum: Learning to Summarize Videos by Generating Captions
by: Zhao, Cairong, et al.
Published: (2024)
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
by: Chai, Wenhao, et al.
Published: (2024)
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
by: Cheng, Kanzhi, et al.
Published: (2025)
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks
by: Wu, Peiran, et al.
Published: (2025)
SnapCap: Efficient Snapshot Compressive Video Captioning
by: Sun, Jianqiao, et al.
Published: (2024)
Synthetic Data in AI: Challenges, Applications, and Ethical Implications
by: Hao, Shuang, et al.
Published: (2024)
Video ReCap: Recursive Captioning of Hour-Long Videos
by: Islam, Md Mohaiminul, et al.
Published: (2024)
Describe Anything: Detailed Localized Image and Video Captioning
by: Lian, Long, et al.
Published: (2025)
FingerCap: Fine-grained Finger-level Hand Motion Captioning
by: Shen, Xin, et al.
Published: (2025)
IF-VidCap: Can Video Caption Models Follow Instructions?
by: Li, Shihao, et al.
Published: (2025)
Benchmarking and Improving Detail Image Caption
by: Dong, Hongyuan, et al.
Published: (2024)
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
by: Fan, Tiehan, et al.
Published: (2024)
VoCap: Video Object Captioning and Segmentation from Any Prompt
by: Uijlings, Jasper, et al.
Published: (2025)
ControlCap: Controllable Region-level Captioning
by: Zhao, Yuzhong, et al.
Published: (2024)
Dense Motion Captioning
by: Xu, Shiyao, et al.
Published: (2025)
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
by: Qin, Luozheng, et al.
Published: (2025)
SynPO: Synergizing Descriptiveness and Preference Optimization for Video Detailed Captioning
by: Dang, Jisheng, et al.
Published: (2025)
CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning
by: Tang, Zhijiang, et al.
Published: (2026)
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
by: Meng, Desen, et al.
Published: (2025)
VCap: Hypergeometric Rewards for Weak-to-Strong Visual Captioning
by: Lu, Xingyu, et al.
Published: (2026)
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
by: Yeshwanth, Chandan, et al.
Published: (2025)
Towards Fine-Grained Human Motion Video Captioning
by: Song, Guorui, et al.
Published: (2025)
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization
by: Tang, Changli, et al.
Published: (2024)
ChartCap: Mitigating Hallucination of Dense Chart Captioning
by: Lim, Junyoung, et al.
Published: (2025)
Retrieval-Augmented Egocentric Video Captioning
by: Xu, Jilan, et al.
Published: (2024)
Panoptic Captioning: An Equivalence Bridge for Image and Text
by: Lin, Kun-Yu, et al.
Published: (2025)
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
by: Chen, Xinlong, et al.
Published: (2025)
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models
by: Bai, Jisheng, et al.
Published: (2024)
FinCap: Topic-Aligned Captions for Short-Form Financial YouTube Videos
by: Sukhani, Siddhant, et al.
Published: (2025)
GLaVE-Cap: Global-Local Aligned Video Captioning with Vision Expert Integration
by: Xu, Wan, et al.
Published: (2025)
CodecCap: High-Fidelity Codec-Inspired Residual Modeling for Dense Video Captioning
by: Lin, Zihan, et al.
Published: (2026)
ReflectCAP: Detailed Image Captioning with Reflective Memory
by: Min, Kyungmin, et al.
Published: (2026)
Generating Accurate and Detailed Captions for High-Resolution Images
by: Lee, Hankyeol, et al.
Published: (2025)
The Devil is in the EOS: Sequence Training for Detailed Image Captioning
by: Mohamed, Abdelrahman, et al.
Published: (2025)
Live Video Captioning
by: Blanco-Fernández, Eduardo, et al.
Published: (2024)
DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
by: Xu, Dongsheng, et al.
Published: (2023)
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
by: Lu, Yifan, et al.
Published: (2023)
GroundCap: A Visually Grounded Image Captioning Dataset
by: Oliveira, Daniel A. P., et al.
Published: (2025)
AlignCap: Aligning Speech Emotion Captioning to Human Preferences
by: Liang, Ziqi, et al.
Published: (2024)
XMeCap: Meme Caption Generation with Sub-Image Adaptability
by: Chen, Yuyan, et al.
Published: (2024)