Bibliographic Details
Main Authors: Zhang, Haotian; Gao, Mingfei; Gan, Zhe; Dufter, Philipp; Wenzel, Nina; Huang, Forrest; Shah, Dhruti; Du, Xianzhi; Zhang, Bowen; Li, Yanghao; Dodge, Sam; You, Keen; Yang, Zhen; Timofeev, Aleksei; Xu, Mingze; Chen, Hong-You; Fauconnier, Jean-Philippe; Lai, Zhengfeng; You, Haoxuan; Wang, Zirui; Dehghan, Afshin; Grasch, Peter; Yang, Yinfei
Format: Preprint
Published: 2024
Subjects:
Computer Vision and Pattern Recognition
Computation and Language
Machine Learning
Links: https://arxiv.org/abs/2409.20566

Similar Items

  • MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
    By: Daxberger, Erik, et al.
    Published: (2025)
  • SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
    By: Xu, Mingze, et al.
    Published: (2024)
  • Understanding Alignment in Multimodal LLMs: A Comprehensive Study
    By: Amirloo, Elmira, et al.
    Published: (2024)
  • SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
    By: Xu, Mingze, et al.
    Published: (2025)
  • MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
    By: Ye, Hanrong, et al.
    Published: (2024)
