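| Main Authors: | Lian, Junbo Jacob; Xiong, Feng; Sun, Yujun; Ouyang, Kaichen; Ke, Zong; Yu, Mingyang; Fu, Shengwei; Rui, Zhong; Yujun, Zhang; Chen, Huiling |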
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2602.07262 |

| _version_ | 1866915974887768064 |
|---|---|
| author | Lian, Junbo Jacob; Xiong, Feng; Sun, Yujun; Ouyang, Kaichen; Ke, Zong; Yu, Mingyang; Fu, Shengwei; Rui, Zhong; Yujun, Zhang; Chen, Huiling |
| author_facet | Lian, Junbo Jacob; Xiong, Feng; Sun, Yujun; Ouyang, Kaichen; Ke, Zong; Yu, Mingyang; Fu, Shengwei; Rui, Zhong; Yujun, Zhang; Chen, Huiling |
| contents | Second-order feature statistics are central to texture recognition, yet existing mechanisms exhibit a structural tension: bilinear pooling and Gram matrices capture global channel correlations but discard spatial structure, whereas self-attention models capture cross-position relations through weighted sums rather than explicit pairwise products. We propose TwistNet-2D, a lightweight module that computes local pairwise channel products under directional spatial displacement, jointly encoding where features co-occur and how they interact. The core component, Spiral-Twisted Channel Interaction (STCI), shifts one feature map along a prescribed direction before L2-normalized channel multiplication, capturing cross-position co-occurrence patterns that characterize structured and periodic textures. Four directional heads are aggregated through content-adaptive channel reweighting, and the result is injected via a sigmoid-gated residual path with near-zero initialization. TwistNet-2D adds only approximately 3.5% parameters and approximately 2% FLOPs over ResNet-18. To isolate the contribution of architectural inductive bias from that of transfer learning, all models in this study are trained from scratch without ImageNet pretraining. Under this protocol, TwistNet-2D consistently surpasses parameter-matched baselines and substantially larger ConvNeXt and Swin Transformer backbones across four texture and fine-grained recognition benchmarks, while the multi-head structure produces interpretable, orientation-selective representations that align with classical texture analysis. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_07262 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2602.07262 |
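
The abstract describes STCI as shifting one feature map along a prescribed direction before L2-normalized channel multiplication. A minimal pure-Python sketch of that idea follows; it is not the authors' implementation. The function names, the zero-padded shift, the choice to normalize the channel vector at each spatial position, and the four axis-aligned directions in `DIRECTIONS` are all assumptions made for illustration. The content-adaptive reweighting and the sigmoid-gated residual path mentioned in the abstract are left out.

```python
import math

# Hypothetical direction set as (dy, dx): right, down, left, up.
# The abstract only says "four directional heads"; these offsets are assumed.
DIRECTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def l2_normalize(x, eps=1e-12):
    """Scale the channel vector at every pixel of x ([C][H][W]) to unit L2 norm."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            norm = math.sqrt(sum(x[c][i][j] ** 2 for c in range(C))) + eps
            for c in range(C):
                out[c][i][j] = x[c][i][j] / norm
    return out


def shift(x, dy, dx):
    """Zero-padded shift: out[c][i][j] = x[c][i + dy][j + dx] when in bounds."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                si, sj = i + dy, j + dx
                if 0 <= si < H and 0 <= sj < W:
                    out[c][i][j] = x[c][si][sj]
    return out


def twisted_products(x, dy, dx):
    """Pairwise products p[a][b][i][j] between channel a at (i, j)
    and channel b at the displaced position (i + dy, j + dx)."""
    xn = l2_normalize(x)
    xs = shift(xn, dy, dx)
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[[xn[a][i][j] * xs[b][i][j] for j in range(W)]
              for i in range(H)]
             for b in range(C)]
            for a in range(C)]


def four_heads(x):
    """One product tensor per assumed direction."""
    return [twisted_products(x, dy, dx) for dy, dx in DIRECTIONS]
```

Each head returns C x C product maps, so a real module would need some learned projection back to the backbone's channel count before the residual connection; how the paper does this is not stated in the abstract.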