:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Huilin; Sun, Qiyu; Li, Fangfei; Tang, Yang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.06513

Similar Items

Applying Deep Neural Networks to automate visual verification of manual bracket installations in aerospace
by: Oyekan, John, et al.
Published: (2024)

Thinker: A vision-language foundation model for embodied intelligence
by: Pan, Baiyu, et al.
Published: (2026)

Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding
by: Li, Jiaheng, et al.
Published: (2025)

ProFound: A moderate-sized vision foundation model for multi-task prostate imaging
by: Wang, Yipei, et al.
Published: (2026)

Adversarial Examples in Environment Perception for Automated Driving (Review)
by: Yan, Jun, et al.
Published: (2025)

UniPINN: A Unified PINN Framework for Multi-task Learning of Diverse Navier-Stokes Equations
by: Sun, Dengdi, et al.
Published: (2026)

HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation
by: Jiang, Chengjie, et al.
Published: (2024)

Utilizing the Mean Teacher with Supcontrast Loss for Wafer Pattern Recognition
by: Wei, Qiyu, et al.
Published: (2024)

Computer vision-based estimation of invertebrate biomass
by: Impiö, Mikko, et al.
Published: (2026)

ViSTa Dataset: Do vision-language models understand sequential tasks?
by: Wybitul, Evžen, et al.
Published: (2024)

Representation geometry shapes task performance in vision-language modeling for CT enterography
by: Minoccheri, Cristian, et al.
Published: (2026)

Adaptive Dual-Constrained Line Aggregation for Robust Generic and Wireframe Line Segment Detection
by: Liu, Chenguang, et al.
Published: (2025)

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
by: Chen, Qiyu, et al.
Published: (2024)

RoMa: Robust Dense Feature Matching
by: Edstedt, Johan, et al.
Published: (2023)

Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)

Openfly: A comprehensive platform for aerial vision-language navigation
by: Gao, Yunpeng, et al.
Published: (2025)

Fine-tuning vision foundation model for crack segmentation in civil infrastructures
by: Ge, Kang, et al.
Published: (2023)

Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios
by: Liu, Zhanwen, et al.
Published: (2025)

Learning Physical Dynamics for Object-centric Visual Prediction
by: Xu, Huilin, et al.
Published: (2024)

Bias-constrained multimodal intelligence for equitable and reliable clinical AI
by: Li, Cheng, et al.
Published: (2026)

VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection
by: Deng, Huilin, et al.
Published: (2024)

TABLET: Table Structure Recognition using Encoder-only Transformers
by: Hou, Qiyu, et al.
Published: (2025)

MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
by: Xu, Chaoran, et al.
Published: (2026)

POPCat: Propagation of particles for complex annotation tasks
by: Yang, Adam Srebrnjak, et al.
Published: (2024)

OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control
by: Wang, Yukun, et al.
Published: (2026)

Concurrent validity of computer-vision artificial intelligence player tracking software using broadcast footage
by: Crang, Zachary L., et al.
Published: (2025)

Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation
by: Gao, Junyu, et al.
Published: (2024)

SOTA: Self-adaptive Optimal Transport for Zero-Shot Classification with Multiple Foundation Models
by: Hu, Zhanxuan, et al.
Published: (2025)

DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
by: Luo, Jingzhou, et al.
Published: (2025)

4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes
by: Duan, Yuanxing, et al.
Published: (2024)

MT-Depth: Multi-task Instance feature analysis for the Depth Completion
by: Nizamani, Abdul Haseeb, et al.
Published: (2025)

Efficient RGB-D Scene Understanding via Multi-task Adaptive Learning and Cross-dimensional Feature Guidance
by: Sun, Guodong, et al.
Published: (2026)

Flatten: Video Action Recognition is an Image Classification task
by: Chen, Junlin, et al.
Published: (2024)

WS-DETR: Robust Water Surface Object Detection through Vision-Radar Fusion with Detection Transformer
by: Yin, Huilin, et al.
Published: (2025)

LiMT: A Multi-task Liver Image Benchmark Dataset
by: Liu, Zhe, et al.
Published: (2025)

SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation
by: Liao, Qiyu, et al.
Published: (2024)

Co-Training Vision Language Models for Remote Sensing Multi-task Learning
by: Li, Qingyun, et al.
Published: (2025)

Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
by: Huang, Lang, et al.
Published: (2025)

RefracGS: Novel View Synthesis Through Refractive Water Surfaces with 3D Gaussian Ray Tracing
by: Shao, Yiming, et al.
Published: (2026)

OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
by: Li, Bohan, et al.
Published: (2024)