| Main Authors: | Yoshida, Takero, Ito, Yuikazu, Fujiwara, Yoshihiro, Tsuchida, Shinji, Sugiyama, Daisuke, Matsuoka, Daisuke |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.15574 |
Similar Items
Time-varying rPPG signal separation via block-sparse signal model
by: Kurihara, Kosuke, et al.
Published: (2026)
JAMMEval: A Refined Collection of Japanese Benchmarks for Reliable VLM Evaluation
by: Sugiura, Issa, et al.
Published: (2026)
Structure from Motion-based Motion Estimation and 3D Reconstruction of Unknown Shaped Space Debris
by: Uno, Kentaro, et al.
Published: (2024)
CQVPR: Landmark-aware Contextual Queries for Visual Place Recognition
by: Li, Dongyue, et al.
Published: (2025)
Pathology Foundation Models
by: Ochi, Mieko, et al.
Published: (2024)
Evaluating Multimodal Large Language Models on Vertically Written Japanese Text
by: Sasagawa, Keito, et al.
Published: (2025)
Objective, Absolute and Hue-aware Metrics for Intrinsic Image Decomposition on Real-World Scenes: A Proof of Concept
by: Sato, Shogo, et al.
Published: (2025)
Breaking the Scalability Limit of Multi-Projector Calibration with Embedded Cameras
by: Kawano, Takumi, et al.
Published: (2026)
UHD-IQA Benchmark Database: Pushing the Boundaries of Blind Photo Quality Assessment
by: Hosu, Vlad, et al.
Published: (2024)
TimeLogic: A Temporal Logic Benchmark for Video QA
by: Swetha, Sirnam, et al.
Published: (2025)
Fusion of regional and sparse attention in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)
Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation
by: Chen, Jialei, et al.
Published: (2024)
Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution
by: Hosu, Vlad, et al.
Published: (2025)
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)
3D-Plotting Algorithm for Insects using YOLOv5
by: Mori, Daisuke, et al.
Published: (2024)
PLaMo 2.1-VL Technical Report
by: Kerola, Tommi, et al.
Published: (2026)
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
by: Zhou, Xingcheng, et al.
Published: (2026)
AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces
by: Gowaikar, Shreeyash, et al.
Published: (2024)
Unbiased Regression Loss for DETRs
by: Edric, et al.
Published: (2024)
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
by: Wang, Hao, et al.
Published: (2025)
Split Matching for Inductive Zero-shot Semantic Segmentation
by: Chen, Jialei, et al.
Published: (2025)
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
by: Rasheed, Hanoona, et al.
Published: (2025)
ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
by: Salehi, Mohammadreza, et al.
Published: (2024)
Can multimodal representation learning by alignment preserve modality-specific information?
by: Thoreau, Romain, et al.
Published: (2025)
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
by: Gamrian, Shani, et al.
Published: (2025)
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation
by: Reinders, Christoph, et al.
Published: (2024)
Event Interval Modulation: A Novel Scheme for Event-based Optical Camera Communication
by: Sumino, Miu, et al.
Published: (2025)
PAGen: Phase-guided Amplitude Generation for Domain-adaptive Object Detection
by: Du, Shuchen, et al.
Published: (2025)
CLIP Is Also a Good Teacher: A New Learning Framework for Inductive Zero-shot Semantic Segmentation
by: Chen, Jialei, et al.
Published: (2023)
Experimental Demonstration of Event-based Optical Camera Communication in Long-Range Outdoor Environment
by: Sumino, Miu, et al.
Published: (2025)
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
by: Wang, Jiarui, et al.
Published: (2025)
MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark
by: Shaar, Shaden, et al.
Published: (2026)
mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning
by: Wei, Jingxuan, et al.
Published: (2024)
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
by: Lompo, Boammani Aser, et al.
Published: (2025)
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems
by: Zhu, Zifeng, et al.
Published: (2024)
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
by: Li, Haopeng, et al.
Published: (2024)
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
by: Zhou, Xingcheng, et al.
Published: (2025)
RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models
by: Liu, Yuqi, et al.
Published: (2025)
Simple Visual Artifact Detection in Sora-Generated Videos
by: Sugiyama, Misora, et al.
Published: (2025)
GS-QA: Comprehensive Quality Assessment Benchmark for Gaussian Splatting View Synthesis
by: Martin, Pedro, et al.
Published: (2025)