# Hanzi Chaizi | 汉字拆字

Chinese character decomposition library for NLP and deep learning applications.

Character decomposition (拆字) breaks a single Chinese character into multiple characters according to its basic components, such as strokes and structural elements. Characters with similar structures yield similar decomposition results, which makes the decomposition useful as a glyph feature for deep learning models.

- Covers 20,000+ Chinese characters
- Zero third-party dependencies
- Supports Python 3.10+

## Installation

```bash
pip install hanzi_chaizi
```

## Usage

```python
from hanzi_chaizi import HanziChaizi

# Constructor call assumed; the diff hunk elides this line.
hc = HanziChaizi()
result = hc.query('名')
print(result)
```

Output:

```text
['夕', '口']
```
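
The claim that structurally similar characters decompose into similar components can be illustrated with a small sketch. The table below is hand-written and stands in for `query` results: only the entry for 名 comes from the output above. The other entries, and the helper names `build_vocab`, `multi_hot`, and `jaccard`, are assumptions made for illustration and are not part of the library.

```python
from itertools import combinations

# Hand-written stand-in for decomposition results. Only the entry for '名'
# comes from the README output; the rest are illustrative assumptions,
# not verified output of hanzi_chaizi.
DECOMPOSITIONS = {
    "名": ["夕", "口"],
    "吗": ["口", "马"],
    "妈": ["女", "马"],
    "好": ["女", "子"],
}


def build_vocab(table):
    """Map every component seen in the table to a column index."""
    components = sorted({part for parts in table.values() for part in parts})
    return {part: index for index, part in enumerate(components)}


def multi_hot(parts, vocab):
    """Encode a component list as a 0/1 vector over the vocabulary."""
    vector = [0] * len(vocab)
    for part in parts:
        vector[vocab[part]] = 1
    return vector


def jaccard(a, b):
    """Jaccard similarity between two component lists."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


VOCAB = build_vocab(DECOMPOSITIONS)
FEATURES = {char: multi_hot(parts, VOCAB) for char, parts in DECOMPOSITIONS.items()}

# Print the pairwise component overlap for every pair of characters.
for left, right in combinations(DECOMPOSITIONS, 2):
    score = jaccard(DECOMPOSITIONS[left], DECOMPOSITIONS[right])
    print(left, right, round(score, 2))
```

Characters that share a component, such as 吗 and 妈, get a non-zero overlap score, while 名 and 好 share nothing. In a real pipeline the table would presumably be filled from `hc.query` results, and the multi-hot vectors could be fed to a model alongside other character features.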

## Notes

Some characters (e.g. 农, 表, 衣, 囊) contain `\uf7ee` in their decomposition results. This is a Unicode Private Use Area code point that stands for the bottom part of 衣 (the downward strokes), which has no standard Unicode code point.
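
Downstream code may want to detect this placeholder before rendering or tokenizing results, since fonts generally cannot display private-use code points. The sketch below uses the standard `unicodedata` module, where private-use characters have the general category `Co`. The helper names and the sample list are hypothetical; the sample is hand-written, not actual library output.

```python
import unicodedata

# Placeholder mentioned in the note above: a Private Use Area code point.
PUA_PLACEHOLDER = "\uf7ee"


def is_private_use(char):
    """Return True if a single character has Unicode category Co (private use)."""
    return unicodedata.category(char) == "Co"


def split_components(parts):
    """Separate standard components from those containing private-use characters."""
    standard = [p for p in parts if not any(is_private_use(c) for c in p)]
    private = [p for p in parts if any(is_private_use(c) for c in p)]
    return standard, private


# Hypothetical decomposition result, hand-written for illustration only.
example = ["亠", PUA_PLACEHOLDER]
standard, private = split_components(example)

# Show private-use components as code points, since they have no visible glyph.
print(standard, [f"U+{ord(c):04X}" for p in private for c in p])
```

Whether to drop, keep, or remap such components depends on the application; keeping them preserves the structural information even though the glyph itself is not displayable.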

## Development

See [develop.md](develop.md) for the development guide.

## Credits

Data from [漢語拆字字典](https://github.com/kfcd/chaizi) (CC BY 3.0).

## Citation

```
@misc{kong2018hanzichaizi,
  ...
}
```

If you cite this package in books, seminars, or academic research papers, or use it in company products, you are welcome (but not required) to email me about it. I am glad to see the package being used and proving valuable to everyone.