Bibliographic Details
Main Authors: Lian, Junbo Jacob, Xiong, Feng, Sun, Yujun, Ouyang, Kaichen, Ke, Zong, Yu, Mingyang, Fu, Shengwei, Rui, Zhong, Yujun, Zhang, Chen, Huiling
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.07262
author Lian, Junbo Jacob
Xiong, Feng
Sun, Yujun
Ouyang, Kaichen
Ke, Zong
Yu, Mingyang
Fu, Shengwei
Rui, Zhong
Yujun, Zhang
Chen, Huiling
contents Second-order feature statistics are central to texture recognition, yet existing mechanisms exhibit a structural tension: bilinear pooling and Gram matrices capture global channel correlations but discard spatial structure, whereas self-attention models capture cross-position relations through weighted sums rather than explicit pairwise products. We propose TwistNet-2D, a lightweight module that computes local pairwise channel products under directional spatial displacement, jointly encoding where features co-occur and how they interact. The core component, Spiral-Twisted Channel Interaction (STCI), shifts one feature map along a prescribed direction before L2-normalized channel multiplication, capturing cross-position co-occurrence patterns that characterize structured and periodic textures. Four directional heads are aggregated through content-adaptive channel reweighting, and the result is injected via a sigmoid-gated residual path with near-zero initialization. TwistNet-2D adds only about 3.5% more parameters and about 2% more FLOPs than ResNet-18. To isolate the contribution of architectural inductive bias from that of transfer learning, all models in this study are trained from scratch without ImageNet pretraining. Under this protocol, TwistNet-2D consistently surpasses parameter-matched baselines and substantially larger ConvNeXt and Swin Transformer backbones across four texture and fine-grained recognition benchmarks, while the multi-head structure produces interpretable, orientation-selective representations that align with classical texture analysis.
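The abstract describes the STCI mechanism only at a high level. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it shifts one channel-reduced copy of the feature map, L2-normalizes both copies per position, forms local pairwise channel products, fuses four directional heads, and adds the result through a sigmoid-gated residual. The projection matrices `w_a`, `w_b`, `w_out`, the zero-filled `shift`, the displacement set `DIRECTIONS`, the softmax-over-heads reweighting, and the scalar `gate_logit` are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-6):
    """L2-normalize the channel vector at every spatial position; x has shape (C, H, W)."""
    return x / (np.linalg.norm(x, axis=0, keepdims=True) + eps)

def shift(x, dy, dx):
    """Displace x by (dy, dx) pixels, filling vacated positions with zeros (assumed padding)."""
    _, h, w = x.shape
    out = np.zeros_like(x)
    out[:, max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        x[:, max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    return out

def stci_head(x, w_a, w_b, w_out, dy, dx):
    """One directional head: local pairwise channel products between x and a shifted copy."""
    a = l2_normalize(np.einsum("rc,chw->rhw", w_a, x))                 # (r, H, W)
    b = l2_normalize(np.einsum("rc,chw->rhw", w_b, shift(x, dy, dx)))  # (r, H, W)
    pairs = np.einsum("ihw,jhw->ijhw", a, b)                           # (r, r, H, W)
    r, _, h, w = pairs.shape
    return np.einsum("ck,khw->chw", w_out, pairs.reshape(r * r, h, w))  # back to (C, H, W)

# Hypothetical displacement set: right, down, and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def twist_block(x, params, gate_logit):
    """Four directional heads, content-adaptive fusion, and a sigmoid-gated residual."""
    heads = np.stack([stci_head(x, *params[k], dy, dx)
                      for k, (dy, dx) in enumerate(DIRECTIONS)])       # (4, C, H, W)
    # Assumed reweighting: per-channel softmax over heads of each head's mean response.
    pooled = heads.mean(axis=(2, 3))                                   # (4, C)
    weights = np.exp(pooled - pooled.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    fused = np.einsum("kc,kchw->chw", weights, heads)                  # (C, H, W)
    gate = 1.0 / (1.0 + np.exp(-gate_logit))                           # sigmoid gate
    return x + gate * fused

# Usage with small random weights (illustrative only).
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
x = rng.standard_normal((C, H, W))
params = [(0.1 * rng.standard_normal((r, C)),
           0.1 * rng.standard_normal((r, C)),
           0.1 * rng.standard_normal((C, r * r))) for _ in DIRECTIONS]
y = twist_block(x, params, gate_logit=-6.0)  # sigmoid(-6) is about 0.0025
print(y.shape)  # (16, 8, 8)
```

Here "near-zero initialization" is read as the gated contribution starting close to zero, so the gate logit starts at a large negative value and the block begins as an approximate identity on top of the backbone; the paper may instead initialize a different parameter. Zero-filling the shift (rather than a circular roll) avoids pairing features across opposite image borders.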
format Preprint
id arxiv_https___arxiv_org_abs_2602_07262
institution arXiv
publishDate 2026
record_format arxiv
title TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.07262