
PP-OCRv5_server_rec: after converting the OCR model to ONNX format, recognition returns no text #1625

@akbaji


Problem description
`paddle2onnx --model_dir ./models/PP-OCRv5_server_rec/ --model_filename inference.json --params_filename inference.pdiparams --save_file OCRv5_server_rec.onnx`
After the command above converted the model to ONNX successfully, I tested it with the following code:
```python
import os

import cv2
import numpy as np
import onnxruntime as ort


def test_onnx_model():
    # Model path
    model_path = "OCRv5_server_rec.onnx"

    # Check that the model file exists
    if not os.path.exists(model_path):
        print(f"Error: model file not found - {model_path}")
        return

    # Create the inference session
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    # Get input information
    input_name = session.get_inputs()[0].name
    input_shape = session.get_inputs()[0].shape
    print(f"Model input shape: {input_shape}")

    # From the error message: height is 48 and width is dynamic, so use a common OCR width
    height = 48  # taken from the error message
    width = 320  # common OCR model width; adjust as needed

    # Load and preprocess the test image
    image_path = "./corrected_output.jpg"
    if not os.path.exists(image_path):
        print(f"Error: image file not found - {image_path}")
        return

    image = cv2.imread(image_path)
    if image is None:
        print(f"Error: cannot read image - {image_path}")
        return

    # Resize the image to the required height of 48 (fixed width, aspect ratio not kept)
    resized = cv2.resize(image, (width, height))
    rgb_image = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    normalized = rgb_image.astype(np.float32) / 255.0  # scales pixels to [0, 1]
    input_tensor = np.transpose(normalized, (2, 0, 1))   # HWC -> CHW
    input_tensor = np.expand_dims(input_tensor, axis=0)  # add batch dim -> (1, 3, 48, 320)

    # Run inference
    outputs = session.run(None, {input_name: input_tensor})

    # Print the results
    print("Inference succeeded!")
    print(f"Number of outputs: {list(outputs)}")


if __name__ == "__main__":
    test_onnx_model()
```
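One gap worth checking in the script above is preprocessing. It stretches every crop to a fixed 48×320, converts it to RGB, and scales pixels only to [0, 1]. PaddleOCR's own recognition predictor (the `resize_norm_img` method in `tools/infer/predict_rec.py`, as far as I can tell) instead keeps the aspect ratio at height 48, normalizes with `(x / 255 - 0.5) / 0.5` to [-1, 1], and right-pads with zeros. The sketch below approximates that under stated assumptions: the helper names `target_widths` and `normalize_and_pad` and the constants `REC_H` and `BASE_W` are hypothetical, the 320-pixel base width is assumed from the default `rec_image_shape` of `3,48,320`, and it leaves the channel order to the caller (PaddleOCR's Python predictor feeds OpenCV's BGR image without converting to RGB, which is assumed here to match what the exported model expects).

```python
import math

import numpy as np

REC_H = 48    # model input height (the ONNX input is [N, 3, 48, W])
BASE_W = 320  # assumed minimum padded width, as in rec_image_shape 3,48,320


def target_widths(h, w, rec_h=REC_H, base_w=BASE_W):
    """Return (resized_w, padded_w) for an h x w crop, keeping its aspect ratio."""
    ratio = w / float(h)
    # padded canvas is at least base_w wide, wider for long text lines
    padded_w = max(base_w, int(rec_h * ratio))
    # width after an aspect-preserving resize to rec_h, never wider than the canvas
    resized_w = min(padded_w, int(math.ceil(rec_h * ratio)))
    return resized_w, padded_w


def normalize_and_pad(resized_hwc, padded_w):
    """Scale a resized HWC uint8 crop to [-1, 1], convert to CHW, right-pad with zeros."""
    chw = resized_hwc.astype(np.float32).transpose(2, 0, 1) / 255.0
    chw = (chw - 0.5) / 0.5
    c, h, w = chw.shape
    padded = np.zeros((c, h, padded_w), dtype=np.float32)
    padded[:, :, :w] = chw
    return padded[np.newaxis]  # add batch dimension -> (1, 3, 48, padded_w)
```

The resize itself can still use OpenCV: `resized_w, padded_w = target_widths(*image.shape[:2])`, then `resized = cv2.resize(image, (resized_w, 48))`, and `normalize_and_pad(resized, padded_w)` becomes the ONNX input. Padding with 0 after normalization fills the unused width with mid-gray rather than black.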

Output
Model input shape: ['DynamicDimension.0', 3, 48, 'DynamicDimension.1']

Inference succeeded!

Number of outputs: [array([[[7.6080370e-01, 7.9060410e-06, 3.1949130e-03, ..., 1.3064701e-07, 1.3064701e-07, 1.6966098e-03], [8.2255471e-01, 4.1061026e-06, 8.4294996e-04, ..., 6.7163661e-08, 6.7163541e-08, 2.6663253e-03], [8.3662903e-01, 3.1375664e-06, 6.8985601e-04, ..., 1.4328862e-07, 1.4328862e-07, 7.7438941e-03], ..., [8.2054043e-01, 8.6402821e-07, 4.8858193e-03, ..., 1.5247034e-07, 1.5247019e-07, 4.2911787e-03], [8.1573290e-01, 2.2904621e-06, 1.1912972e-03, ..., 2.1580020e-07, 2.1580020e-07, 1.1523661e-02], [8.5430664e-01, 1.8579595e-06, 4.7853659e-04, ..., 1.2333194e-07, 1.2333194e-07, 1.3619601e-03]]], shape=(1, 40, 18385), dtype=float32)]
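The output above is most likely not a failure. Its shape `(1, 40, 18385)` reads as batch × 40 time steps × one score per character class, and each row looks like probabilities that are already softmax-normalized. The exported graph stops there; turning those scores into text takes CTC decoding, which the script never performs. Assuming, as in PaddleOCR's `CTCLabelDecode`, that class index 0 is the CTC blank, a greedy decoder takes the argmax at each step, collapses consecutive repeats, and drops blanks. In the rows printed, column 0 sits around 0.76 to 0.85, so under that assumption those steps decode to nothing; if almost every step is blank, the preprocessing is a likely suspect. The function name `ctc_greedy_decode` and its `blank_index` parameter are illustrative, not part of any library.

```python
import numpy as np


def ctc_greedy_decode(probs, characters, blank_index=0):
    """Greedy CTC decoding of a (batch, time_steps, num_classes) score array.

    `characters[i]` is the character for class i; class `blank_index` is the CTC blank.
    Returns one (text, mean_confidence) tuple per batch item.
    """
    best_idx = probs.argmax(axis=-1)  # (batch, T): most likely class per time step
    best_prob = probs.max(axis=-1)    # (batch, T): score of that class
    results = []
    for idx_seq, prob_seq in zip(best_idx, best_prob):
        chars, confs = [], []
        prev = None
        for idx, p in zip(idx_seq.tolist(), prob_seq.tolist()):
            # keep a step only if it is not blank and not a repeat of the previous step;
            # a blank between two equal classes lets a doubled letter survive
            if idx != blank_index and idx != prev:
                chars.append(characters[idx])
                confs.append(p)
            prev = idx
        score = float(np.mean(confs)) if confs else 0.0
        results.append(("".join(chars), score))
    return results
```

In the script above this would be called as `ctc_greedy_decode(outputs[0], characters)`, where `characters` is the model's index-to-character table with the blank at index 0. Because argmax is unaffected by softmax, the decoded text would be the same even if the model emitted raw logits; only the confidence values would change.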

Additional information
I want to output the recognized OCR text. Could you help me figure out what is wrong with my logic?
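Printing text also requires the index-to-character table the model was trained with. Assuming the PP-OCRv5 dictionary is a plain UTF-8 file with one character per line (the name `ppocrv5_dict.txt` used below is hypothetical; use whichever dictionary belongs to your model, which in PaddleX-exported models may instead be embedded in `inference.yml`, also unverified here), the table can be rebuilt following PaddleOCR's `CTCLabelDecode` convention: a `blank` entry at index 0 and, if the model was trained with `use_space_char`, a space appended at the end. Comparing its length with the last output dimension (18385 in the output above) catches a mismatched dictionary before it produces garbled text. The helpers `load_ctc_characters` and `check_num_classes` are illustrative names.

```python
def load_ctc_characters(dict_path, use_space_char=True):
    """Build the class-index -> character table for CTC decoding.

    Index 0 is the CTC blank; a space is optionally appended after the
    dictionary characters (assumed to mirror PaddleOCR's CTCLabelDecode).
    """
    with open(dict_path, encoding="utf-8") as f:
        # strip only line endings so that a literal space entry survives
        chars = [line.rstrip("\r\n") for line in f]
    if use_space_char:
        chars.append(" ")
    return ["blank"] + chars


def check_num_classes(characters, num_classes):
    """Raise if the table size differs from the model's last output dimension."""
    if len(characters) != num_classes:
        raise ValueError(
            f"dictionary gives {len(characters)} classes, model outputs {num_classes}; "
            "wrong dictionary or use_space_char setting?"
        )
```

Usage, right after `session.run` in the script above: `characters = load_ctc_characters("ppocrv5_dict.txt")` followed by `check_num_classes(characters, outputs[0].shape[-1])`. If the check fails by exactly one, the `use_space_char` setting is the first thing to flip.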
