Commit 6098b8a

docs: update readme; add non_decomposable.txt; update classifiers in pyproject.toml
1 parent 02a5b21 commit 6098b8a

3 files changed: 64 additions, 15 deletions


README.md

Lines changed: 35 additions & 15 deletions
@@ -1,21 +1,29 @@
-# Hanzi decomposition (Chinese character decomposition) | 汉字拆字
+# Hanzi Chaizi | 汉字拆字

-> 拆字是指將一文字,以筆畫、字形等基本組成單位分解成多個文字。
-> The decomposition of characters refers to breaking down a single character into multiple characters based on its basic components, such as strokes and structural elements.
+[![PyPI version](https://img.shields.io/pypi/v/hanzi_chaizi)](https://pypi.org/project/hanzi_chaizi/)
+[![Python](https://img.shields.io/pypi/pyversions/hanzi_chaizi)](https://pypi.org/project/hanzi_chaizi/)
+[![CI](https://github.com/howl-anderson/hanzi_chaizi/actions/workflows/python-app.yml/badge.svg)](https://github.com/howl-anderson/hanzi_chaizi/actions/workflows/python-app.yml)
+[![License](https://img.shields.io/github/license/howl-anderson/hanzi_chaizi)](https://github.com/howl-anderson/hanzi_chaizi/blob/master/LICENSE)

-> 汉字拆字让字型相似的字具有相似的拆解结果。
-> Hanzi decomposition yields similar decomposition results for characters with similar structures.
+Chinese character decomposition library for NLP and deep learning applications.

-> 这种特性可以被深度学习模型用来作为字的特征之一:字形的特征。
-> This feature can be used by deep learning models as one of the features of characters: the structural feature.
+Decompose Chinese characters into basic structural components. Characters with similar structures yield similar decomposition results, making this useful as a glyph feature for deep learning models.

-## Installation
+汉字拆字:将汉字拆解为基础部件。字形相似的字会得到相似的拆解结果,可用于深度学习模型的字形特征提取。
+
+## Features | 特性
+
+- Covers 20,000+ Chinese characters | 覆盖 20,000+ 汉字
+- Zero third-party dependencies | 零第三方依赖
+- Python 3.10+ support | 支持 Python 3.10+
+
+## Installation | 安装

 ```bash
 pip install hanzi_chaizi
 ```

-## Usage
+## Usage | 使用

 ```python
 from hanzi_chaizi import HanziChaizi
@@ -26,21 +34,31 @@ result = hc.query('名')
 print(result)
 ```

-Output:
+Output | 输出:

 ```text
 ['夕', '口']
 ```

-## Development
+## Notes | 说明
+
+Some characters (e.g. 农, 表, 衣, 囊) contain `\uf7ee` in their decomposition results. This is a Unicode Private Use Area character representing the bottom part of 衣 (the downward strokes), which has no standard Unicode codepoint.
+
+部分汉字(如 农、表、衣、囊)的拆解结果中包含 `\uf7ee`。这是一个 Unicode 私有区域字符,用于表示"衣"的下半部分(撇捺结构),该部件在标准 Unicode 中没有独立编码。

-See [develop.md](develop.md) for development guide.
+Some characters cannot be decomposed. See [non_decomposable.txt](non_decomposable.txt) for the list.

-## Credits
+部分汉字无法被拆解,详见 [non_decomposable.txt](non_decomposable.txt)

-Data from [漢語拆字字典](https://github.com/kfcd/chaizi) (CC BY 3.0)
+## Development | 开发

-## Citation
+See [develop.md](develop.md) for development guide. | 参见 [develop.md](develop.md) 开发指南。
+
+## Credits | 致谢
+
+Data from [漢語拆字字典](https://github.com/kfcd/chaizi) (CC BY 3.0) | 数据来自 [漢語拆字字典](https://github.com/kfcd/chaizi) (CC BY 3.0)
+
+## Citation | 引用

 ```
 @misc{kong2018hanzichaizi,
@@ -52,3 +70,5 @@ Data from [漢語拆字字典](https://github.com/kfcd/chaizi) (CC BY 3.0)
 ```

 If the package is cited in books, seminars, and academic research papers, or used in company products, you are welcome (but not required) to email me about this. I'm glad to see the package being used and valuable to everyone.
+
+如果本项目被书籍、学术论文引用,或被公司产品使用,欢迎(但不强求)告知我。很高兴看到这个项目对大家有价值。
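
The new Notes section says some decomposition results contain `\uf7ee`, a Private Use Area character. Downstream code such as font rendering or vocabulary building may want to spot such components. The following is a minimal sketch using only the standard library. The helper names `is_private_use` and `split_components` are hypothetical and not part of `hanzi_chaizi`, and the sample list is made up for illustration; it is not the library's actual output for any character.

```python
import unicodedata


def is_private_use(text: str) -> bool:
    """Return True if any code point in text has general category Co (Private Use)."""
    return any(unicodedata.category(ch) == "Co" for ch in text)


def split_components(components):
    """Split a list of components into (standard, private_use) lists, preserving order."""
    standard = [c for c in components if not is_private_use(c)]
    private = [c for c in components if is_private_use(c)]
    return standard, private


# Illustrative input only: a made-up component list containing U+F7EE.
# The real result of hc.query() for a given character may differ.
sample = ["亠", "\uf7ee"]
standard, private = split_components(sample)
print(standard, [f"U+{ord(c):04X}" for c in private])
```

Checking the general category rather than a hard-coded range also catches the supplementary Private Use planes, should the data ever use them.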

non_decomposable.txt

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
[23 added lines; their contents are not preserved in this capture]

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -13,6 +13,12 @@ authors = [
 ]
 keywords = ["chinese", "hanzi", "nlp", "character-decomposition", "deep-learning"]
 requires-python = ">=3.10"
+classifiers = [
+"Programming Language :: Python :: 3.10",
+"Programming Language :: Python :: 3.11",
+"Programming Language :: Python :: 3.12",
+"Programming Language :: Python :: 3.13",
+]

 [project.urls]
 Homepage = "https://github.com/howl-anderson/hanzi_chaizi"
