Text Recognition
Application Introduction
This project uses the SizectorS_SDK to build a character recognition application for Windows. The application reads point cloud data from offline mpdat files.
Application Process
Loading Offline Data
Load the data with the Load function in MPSizectorS_Utils, then convert it to an MPSizectorS_DataFrameFloat3DStruct floating-point matrix for subsequent processing.
MPSizectorS_DataFrameUndefinedStruct data;
bool dev = MPSizectorS_Utils::Load(&data, "E:/SizectorS_SDK_DigitRec/test/SizectorS_DataExport4.mpdat");
MPSizectorS_DataFrameFloat3DStruct float3DData = MPSizectorS_Utils::ConvertToFloat3DFrame(data);
Retrieving Image Information
Read the image resolution from the frame information, then call MPSizectorS_Utils::GetAutoColorDepthRange_Float3D() to fill an MPSizectorS_ColorDepthRangeStruct with the depth range used in later steps.
MPSizectorS_ColorDepthRangeStruct colorDepth;
int xPixResolution = data.FrameInfo.DataInfo.XPixResolution;
int yPixResolution = data.FrameInfo.DataInfo.YPixResolution;
MPSizectorS_Utils::GetAutoColorDepthRange_Float3D(&colorDepth, data.FrameInfo, float3DData.Data);
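The SDK's range computation is not shown in this document. As a rough illustration of the general idea only, the sketch below finds the minimum and maximum of the finite depth values and widens the range by a margin, similar to the 0.1 padding applied when the color image is generated. DepthRange, computeDepthRange, and padDepthRange are hypothetical names, and treating non-finite values as invalid is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type; not the SDK's MPSizectorS_ColorDepthRangeStruct.
struct DepthRange {
    float min;
    float max;
    bool valid;  // false when no finite depth value was found
};

// Scans the depth values and returns the smallest and largest finite ones.
// Assumption: invalid measurements are stored as NaN or infinity.
DepthRange computeDepthRange(const std::vector<float>& depths) {
    DepthRange range{0.0f, 0.0f, false};
    for (float depth : depths) {
        if (!std::isfinite(depth)) {
            continue;  // skip invalid measurements
        }
        if (!range.valid) {
            range.min = depth;
            range.max = depth;
            range.valid = true;
        } else {
            if (depth < range.min) range.min = depth;
            if (depth > range.max) range.max = depth;
        }
    }
    return range;
}

// Widens the range on both sides so extreme values do not sit exactly
// on the ends of the color scale.
DepthRange padDepthRange(const DepthRange& range, float margin) {
    assert(margin >= 0.0f);
    return DepthRange{range.min - margin, range.max + margin, range.valid};
}
```

A caller would check the valid flag before using the range, since a frame with no valid points yields no meaningful minimum or maximum.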
Displaying Images
Convert the floating-point matrix to a grayscale image with the GetGrayBitmap function and wrap the result in a CV_8UC1 Mat object, inputImg, for display. Likewise, convert the matrix to a color image with the GetColorDepthBitmap function and assemble the three channel buffers into a CV_8UC3 Mat object, inputRGBImg, for display.
unsigned char* bitmapBuffer = new unsigned char[xPixResolution * yPixResolution];
unsigned char* redBuffer = new unsigned char[xPixResolution * yPixResolution];
unsigned char* greenBuffer = new unsigned char[xPixResolution * yPixResolution];
unsigned char* blueBuffer = new unsigned char[xPixResolution * yPixResolution];
MPSizectorS_Utils::GetGrayBitmap(&data, bitmapBuffer);
cv::Mat inputImg(yPixResolution, xPixResolution, CV_8UC1, bitmapBuffer);
MPSizectorS_Utils::GetColorDepthBitmap(&data, redBuffer, greenBuffer, blueBuffer, colorDepth.Max + 0.1, colorDepth.Min - 0.1);
// Create a color image using the three arrays
cv::Mat inputRGBImg(yPixResolution, xPixResolution, CV_8UC3);
for (int i = 0; i < yPixResolution; ++i) {
    for (int j = 0; j < xPixResolution; ++j) {
        int index = i * xPixResolution + j;
        inputRGBImg.at<cv::Vec3b>(i, j) = cv::Vec3b(
            blueBuffer[index], greenBuffer[index], redBuffer[index]);
    }
}
cv::namedWindow("inputImg", cv::WINDOW_NORMAL);
cv::imshow("inputImg", inputImg);
cv::resizeWindow("inputImg", 800, 600);
cv::namedWindow("inputRGBImg", cv::WINDOW_NORMAL);
cv::imshow("inputRGBImg", inputRGBImg);
cv::resizeWindow("inputRGBImg", 800, 600);
cv::waitKey();
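The nested loop above packs three separate channel buffers into OpenCV's interleaved blue, green, red pixel order. The sketch below shows the same packing step without any OpenCV dependency, which can make the index arithmetic easier to check. interleaveToBGR is a hypothetical helper, not part of the SDK or OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs three planar 8-bit channel buffers into one
// interleaved buffer in B, G, R order, the order OpenCV uses for CV_8UC3.
std::vector<std::uint8_t> interleaveToBGR(const std::vector<std::uint8_t>& red,
                                          const std::vector<std::uint8_t>& green,
                                          const std::vector<std::uint8_t>& blue,
                                          std::size_t width,
                                          std::size_t height) {
    const std::size_t pixelCount = width * height;
    assert(red.size() == pixelCount);
    assert(green.size() == pixelCount);
    assert(blue.size() == pixelCount);

    std::vector<std::uint8_t> bgr(pixelCount * 3);
    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            const std::size_t index = row * width + col;  // row-major pixel index
            bgr[3 * index + 0] = blue[index];
            bgr[3 * index + 1] = green[index];
            bgr[3 * index + 2] = red[index];
        }
    }
    return bgr;
}
```

A buffer packed this way could be wrapped without copying as cv::Mat(height, width, CV_8UC3, bgr.data()), provided the vector outlives the Mat.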
Establishing a Reference Plane
Select three coordinate points (xPoint1, yPoint1), (xPoint2, yPoint2), (xPoint3, yPoint3) on the input image and establish a plane using these three points.
unsigned int xPoint1 = 126;
unsigned int yPoint1 = 439;
unsigned int xPoint2 = 117;
unsigned int yPoint2 = 579;
unsigned int xPoint3 = 818;
unsigned int yPoint3 = 514;
MPSizectorS_DataPointFloat3DStruct point1 = {};
MPSizectorS_DataPointFloat3DStruct point2 = {};
MPSizectorS_DataPointFloat3DStruct point3 = {};
traverseNineGrid(&point1, xPoint1, yPoint1, float3DData);
traverseNineGrid(&point2, xPoint2, yPoint2, float3DData);
traverseNineGrid(&point3, xPoint3, yPoint3, float3DData);
Point a = { point1.X, point1.Y, point1.Z };
Point b = { point2.X, point2.Y, point2.Z };
Point c = { point3.X, point3.Y, point3.Z };
Point normal = plane(a, b, c);
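The Point type and the plane and distance_to_plane functions used in this project are not shown here. As a minimal sketch of the underlying geometry, assuming a plain three-component vector type, the normal of the plane through three points is the cross product of two edge vectors, and the signed distance of a point p from the plane is the dot product of the unit normal with p - a. Vec3, planeUnitNormal, and signedDistanceToPlane are hypothetical names and may differ from the project's own helpers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 3D vector type; the project's own Point type is not shown.
struct Vec3 {
    double x, y, z;
};

Vec3 subtract(const Vec3& p, const Vec3& q) {
    return {p.x - q.x, p.y - q.y, p.z - q.z};
}

Vec3 cross(const Vec3& u, const Vec3& v) {
    return {u.y * v.z - u.z * v.y,
            u.z * v.x - u.x * v.z,
            u.x * v.y - u.y * v.x};
}

double dot(const Vec3& u, const Vec3& v) {
    return u.x * v.x + u.y * v.y + u.z * v.z;
}

// Unit normal of the plane through a, b, and c.
// The three points must not be collinear, otherwise no plane is defined.
Vec3 planeUnitNormal(const Vec3& a, const Vec3& b, const Vec3& c) {
    const Vec3 n = cross(subtract(b, a), subtract(c, a));
    const double length = std::sqrt(dot(n, n));
    assert(length > 0.0);
    return {n.x / length, n.y / length, n.z / length};
}

// Signed distance from p to the plane with the given unit normal that
// passes through pointOnPlane. Positive on the side the normal points to.
double signedDistanceToPlane(const Vec3& unitNormal,
                             const Vec3& pointOnPlane,
                             const Vec3& p) {
    return dot(unitNormal, subtract(p, pointOnPlane));
}
```

The sign depends on the order of the three points, so swapping two of them flips which side of the plane counts as positive. Taking the absolute value gives an unsigned distance if the side does not matter.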
Image Binarization
Iterate over the points of the input image, use each point's distance to the reference plane as its brightness value, and compare that value with a set threshold. Points at or above the threshold are set to white; all others, including points with no valid data, are set to black. Then save the binarized image to disk.
// Create an image of the same size as the display area
cv::Mat image(creatImageHeight, creatImageWidth, CV_8UC1);
// Set each pixel: white if its distance to the plane is at or above the threshold, black otherwise
for (int i = 0; i < creatImageHeight; i++)
{
    for (int j = 0; j < creatImageWidth; j++)
    {
        point4 = {};
        flag = traverseNineGrid(&point4, j + xStart, i + yStart, float3DData);
        if (flag) {
            Point p = { point4.X, point4.Y, point4.Z };
            float distance = distance_to_plane(normal, p, a, b, c);
            if (distance >= binThreshold) {
                image.at<uchar>(i, j) = 255;
                continue;
            }
        }
        image.at<uchar>(i, j) = 0;
    }
}
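The loop above depends on traverseNineGrid, whose implementation is not shown. One plausible reading of the name is a search of the 3x3 neighborhood around a pixel for a usable 3D point. The sketch below illustrates that idea together with the threshold rule. GridPoint, findValidInNeighborhood, and binarize are hypothetical, and treating a point as valid when its Z value is finite is an assumption about how missing data is encoded.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical point type standing in for the SDK's 3D point struct.
struct GridPoint {
    float x, y, z;
};

// Looks at the pixel (col, row) and then its eight neighbors, and copies the
// first point with a finite Z value into *out. Returns false if none is found.
bool findValidInNeighborhood(const std::vector<GridPoint>& grid,
                             int width, int height,
                             int col, int row,
                             GridPoint* out) {
    assert(out != nullptr);
    assert(width > 0 && height > 0);
    assert(grid.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Center first, so an already valid pixel is returned unchanged.
    static const int offsets[9][2] = {
        {0, 0}, {-1, -1}, {0, -1}, {1, -1}, {-1, 0}, {1, 0}, {-1, 1}, {0, 1}, {1, 1}};
    for (const auto& offset : offsets) {
        const int c = col + offset[0];
        const int r = row + offset[1];
        if (c < 0 || c >= width || r < 0 || r >= height) {
            continue;  // neighbor lies outside the image
        }
        const GridPoint& candidate =
            grid[static_cast<std::size_t>(r) * static_cast<std::size_t>(width) + static_cast<std::size_t>(c)];
        if (std::isfinite(candidate.z)) {
            *out = candidate;
            return true;
        }
    }
    return false;
}

// Threshold rule from the loop above: white (255) only when a point was
// found and its distance is at or above the threshold, black (0) otherwise.
std::uint8_t binarize(bool hasPoint, float distance, float threshold) {
    return (hasPoint && distance >= threshold) ? 255 : 0;
}
```

Falling back to a neighbor fills small holes in the point cloud, at the cost of slightly blurring character edges where valid and invalid pixels meet.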
Saving and Displaying Images
cv::imwrite("ResultImg.png", image);
cv::imshow("ResultImg", image);
cv::waitKey();
OCR Recognition
Run OCR with the Tesseract library, using the binarized image as the input image. Store the recognition result in the recognizedText variable and print it.
// Call recognition API
tesseract::TessBaseAPI* api = new tesseract::TessBaseAPI();
if (api->Init(NULL, "eng")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    exit(1);
}
Pix* img = pixRead("ResultImg.png");
api->SetImage(img);
char* recognizedText = api->GetUTF8Text();
std::string text(recognizedText);
std::cout << "Recognition result: " << text << std::endl;
Terminating the Call
Shut down the Tesseract API and free the allocated memory.
api->End();
delete api;
delete[] bitmapBuffer;
delete[] redBuffer;
delete[] greenBuffer;
delete[] blueBuffer;
delete[] recognizedText;
pixDestroy(&img);
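The manual delete[] calls above are easy to miss on an early return or an exception. As a hedged alternative, the sketch below shows how standard owning types could take over that cleanup: std::vector for the pixel buffers and std::unique_ptr<char[]> for a C string that must be released with delete[], as is the case for the text returned by Tesseract's GetUTF8Text. allocateText, takeText, and makeChannelBuffer are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an API that returns a heap-allocated C string
// which the caller must release with delete[].
char* allocateText(const char* source) {
    assert(source != nullptr);
    const std::size_t length = std::strlen(source);
    char* copy = new char[length + 1];
    std::memcpy(copy, source, length + 1);  // includes the terminating '\0'
    return copy;
}

// Takes ownership of a delete[]-allocated string and returns a std::string
// copy. The raw buffer is freed automatically when owner leaves scope.
std::string takeText(char* raw) {
    std::unique_ptr<char[]> owner(raw);
    return owner ? std::string(owner.get()) : std::string();
}

// Allocates a zero-initialized single-channel buffer that frees itself.
std::vector<unsigned char> makeChannelBuffer(int width, int height) {
    assert(width >= 0 && height >= 0);
    return std::vector<unsigned char>(
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
}
```

With this approach a buffer is passed to a C-style function through its data() member, for example makeChannelBuffer(w, h).data(), and no explicit delete[] is needed at the end of the program.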
Conclusion
- This application primarily utilizes MPSizectorS_API and MPSizectorS_Utils libraries for data processing and conversion, the OpenCV library for image processing and visualization, and the Tesseract library for character recognition.
- MPSizectorS_API: Provides support for the MPSizectorS range finder application interface.
- MPSizectorS_Utils: Provides support for SizectorS_SDK_DigitRec, handling data conversion and preprocessing.
- OpenCV: Provides image processing capabilities, including Mat objects, imread and imwrite functions, imshow function, etc.
- Tesseract: Provides OCR recognition capabilities.