| Main Authors: | Xue, Jintang; Wang, Yun-Cheng; Wei, Chengwei; Kuo, C.-C. Jay |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2407.12342 |
| _version_ | 1866913570124464128 |
|---|---|
| author | Xue, Jintang; Wang, Yun-Cheng; Wei, Chengwei; Kuo, C.-C. Jay |
| author_facet | Xue, Jintang; Wang, Yun-Cheng; Wei, Chengwei; Kuo, C.-C. Jay |
| contents | As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which can lead to a vast model size. Storing and processing word vectors is resource-demanding, especially for mobile edge-device applications. This paper explores word embedding dimension reduction. To balance computational cost and performance, we propose an efficient and effective weakly-supervised feature selection method named WordFS. It has two variants, each utilizing novel criteria for feature selection. Experiments on various tasks (e.g., word and sentence similarity, and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational cost. We have released the code for reproducibility along with the paper. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2407_12342 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2407.12342 |