Zhang, D., You, W., Li, J., Lin, W., Shi, W., Zhao, X., . . . Sun, L. (2025). Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound.
Chicago Style (17th ed.) Citation: Zhang, Dengming, Weitao You, Jingxiong Li, Weishen Lin, Wenda Shi, Xue Zhao, Heda Zuo, Junxian Wu, and Lingyun Sun. Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound. 2025.
MLA (9th ed.) Citation: Zhang, Dengming, et al. Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound. 2025.