| Main Authors: | Chen, Huilin, Sun, Qiyu, Li, Fangfei, Tang, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.06513 |
Similar Items
Applying Deep Neural Networks to automate visual verification of manual bracket installations in aerospace
by: Oyekan, John, et al.
Published: (2024)
Thinker: A vision-language foundation model for embodied intelligence
by: Pan, Baiyu, et al.
Published: (2026)
Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding
by: Li, Jiaheng, et al.
Published: (2025)
ProFound: A moderate-sized vision foundation model for multi-task prostate imaging
by: Wang, Yipei, et al.
Published: (2026)
Adversarial Examples in Environment Perception for Automated Driving (Review)
by: Yan, Jun, et al.
Published: (2025)
UniPINN: A Unified PINN Framework for Multi-task Learning of Diverse Navier-Stokes Equations
by: Sun, Dengdi, et al.
Published: (2026)
HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation
by: Jiang, Chengjie, et al.
Published: (2024)
Utilizing the Mean Teacher with Supcontrast Loss for Wafer Pattern Recognition
by: Wei, Qiyu, et al.
Published: (2024)
Computer vision-based estimation of invertebrate biomass
by: Impiö, Mikko, et al.
Published: (2026)
ViSTa Dataset: Do vision-language models understand sequential tasks?
by: Wybitul, Evžen, et al.
Published: (2024)
Representation geometry shapes task performance in vision-language modeling for CT enterography
by: Minoccheri, Cristian, et al.
Published: (2026)
Adaptive Dual-Constrained Line Aggregation for Robust Generic and Wireframe Line Segment Detection
by: Liu, Chenguang, et al.
Published: (2025)
A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
by: Chen, Qiyu, et al.
Published: (2024)
RoMa: Robust Dense Feature Matching
by: Edstedt, Johan, et al.
Published: (2023)
Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)
Openfly: A comprehensive platform for aerial vision-language navigation
by: Gao, Yunpeng, et al.
Published: (2025)
Fine-tuning vision foundation model for crack segmentation in civil infrastructures
by: Ge, Kang, et al.
Published: (2023)
Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios
by: Liu, Zhanwen, et al.
Published: (2025)
Learning Physical Dynamics for Object-centric Visual Prediction
by: Xu, Huilin, et al.
Published: (2024)
Bias-constrained multimodal intelligence for equitable and reliable clinical AI
by: Li, Cheng, et al.
Published: (2026)
VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection
by: Deng, Huilin, et al.
Published: (2024)
TABLET: Table Structure Recognition using Encoder-only Transformers
by: Hou, Qiyu, et al.
Published: (2025)
MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
by: Xu, Chaoran, et al.
Published: (2026)
POPCat: Propagation of particles for complex annotation tasks
by: Yang, Adam Srebrnjak, et al.
Published: (2024)
OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control
by: Wang, Yukun, et al.
Published: (2026)
Concurrent validity of computer-vision artificial intelligence player tracking software using broadcast footage
by: Crang, Zachary L., et al.
Published: (2025)
Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation
by: Gao, Junyu, et al.
Published: (2024)
SOTA: Self-adaptive Optimal Transport for Zero-Shot Classification with Multiple Foundation Models
by: Hu, Zhanxuan, et al.
Published: (2025)
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
by: Luo, Jingzhou, et al.
Published: (2025)
4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
by: Duan, Yuanxing, et al.
Published: (2024)
MT-Depth: Multi-task Instance feature analysis for the Depth Completion
by: Nizamani, Abdul Haseeb, et al.
Published: (2025)
Efficient RGB-D Scene Understanding via Multi-task Adaptive Learning and Cross-dimensional Feature Guidance
by: Sun, Guodong, et al.
Published: (2026)
Flatten: Video Action Recognition is an Image Classification task
by: Chen, Junlin, et al.
Published: (2024)
WS-DETR: Robust Water Surface Object Detection through Vision-Radar Fusion with Detection Transformer
by: Yin, Huilin, et al.
Published: (2025)
LiMT: A Multi-task Liver Image Benchmark Dataset
by: Liu, Zhe, et al.
Published: (2025)
SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation
by: Liao, Qiyu, et al.
Published: (2024)
Co-Training Vision Language Models for Remote Sensing Multi-task Learning
by: Li, Qingyun, et al.
Published: (2025)
Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
by: Huang, Lang, et al.
Published: (2025)
RefracGS: Novel View Synthesis Through Refractive Water Surfaces with 3D Gaussian Ray Tracing
by: Shao, Yiming, et al.
Published: (2026)
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
by: Li, Bohan, et al.
Published: (2024)