The OCR API is a library API that provides recognition functions for processing images and converting them to text. Images, whether captured by the phone camera or loaded from existing files, are passed to these functions through their handles from the font and bitmap server. Currently, only 24-bit color and 8-bit grayscale images are supported, and they must be in bitmap format.
The OcrSrv library implements two sets of recognition interfaces. The first is for applications that require document recognition from the whole image. The procedure starts with layout analysis: a layout engine analyzes the image and, where possible, divides it into blocks that contain the title, subtitles, paragraphs and other text blocks of the document. A recognition engine then processes each block and converts it to a Unicode string. The second set of functions recognizes only part of an image. Here the API user must tell the recognition engine the exact position and extent of the area to convert. In addition, the API user can designate special content types (such as phone numbers, web addresses and e-mail addresses) to make the region recognition result more accurate.
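As an illustration of the position-and-extent requirement for region recognition, the following is a minimal sketch in standard C++, not Symbian C++. The names Region, ContentHint, RegionRequest and regionFitsImage are hypothetical and are not part of the OCR API; the sketch only shows the kind of bounds check a caller might perform before handing a region and a content-type hint to a recognition engine.

```cpp
#include <cstdint>

// Hypothetical content-type hints, modelled on the examples in the text
// (phone numbers, web addresses, e-mail addresses). Not the real API enum.
enum class ContentHint { Any, PhoneNumber, WebAddress, Email };

// Position and extent of the area to recognize, in pixels.
struct Region {
    std::int32_t x;       // left edge
    std::int32_t y;       // top edge
    std::int32_t width;   // horizontal extent
    std::int32_t height;  // vertical extent
};

// Hypothetical bundle of what the caller tells the engine.
struct RegionRequest {
    Region region;
    ContentHint hint;
};

// Returns true when the region is non-empty and lies entirely inside an
// image of imageWidth x imageHeight pixels.
bool regionFitsImage(std::int32_t imageWidth, std::int32_t imageHeight,
                     const Region& r) {
    if (imageWidth <= 0 || imageHeight <= 0) return false;
    if (r.width <= 0 || r.height <= 0) return false;
    if (r.x < 0 || r.y < 0) return false;
    // Compare against the remaining space so that x + width cannot overflow.
    if (r.x > imageWidth - r.width) return false;
    if (r.y > imageHeight - r.height) return false;
    return true;
}
```

A caller would build a RegionRequest only after regionFitsImage returns true, for example with a PhoneNumber hint for the area of a business card that holds the telephone line.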
To carry out recognition, the OCR API needs a database for each supported language. The supported languages are English, Japanese, Simplified Chinese and Traditional Chinese. Note that these databases are not always shipped with the phone; some are delivered in other ways, for example on external memory cards. The OCR API provides methods that report exactly which language databases are available on your device.
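Because a language database may be absent from a given device, an application will typically compare the languages it wants with the ones reported as installed. The following is a minimal sketch of that comparison in standard C++. The Language enum and the missingLanguages function are hypothetical helpers for illustration; the list of installed languages is assumed to come from the API's own query methods, which are not shown here.

```cpp
#include <algorithm>
#include <vector>

// The four languages named in the text. The enum itself is hypothetical.
enum class Language { English, Japanese, SimplifiedChinese, TraditionalChinese };

// Returns the requested languages whose databases are not installed,
// preserving the order in which they were requested.
std::vector<Language> missingLanguages(const std::vector<Language>& installed,
                                       const std::vector<Language>& requested) {
    std::vector<Language> missing;
    for (Language lang : requested) {
        const bool found =
            std::find(installed.begin(), installed.end(), lang) != installed.end();
        if (!found) {
            missing.push_back(lang);
        }
    }
    return missing;
}
```

If the returned list is empty, every requested language can be used; otherwise the application can fall back to the available languages or ask the user to install the missing databases.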
In short, the API provides automatic layout analysis and recognition on images. A typical use of the API would be, for example, to input text from the phone camera or to recognize and save personal information from business card images.
Use cases of the OCR API are illustrated in the following figure.
Figure 1: The use cases of the OCR API
There are five use cases here:
OCR API initialization
Recognition with the layout analysis
Region recognition
Cancel operation
Release the OCR API
The API interface class structure consists of six interfaces. The static class OCREngineFactory creates the OCR engine instance, and the client application must inherit from MOCREngineObserver to receive the layout analysis and recognition results asynchronously.
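The factory-plus-observer arrangement described above can be sketched in standard C++ as follows. This is a toy model, not the Symbian implementation: EngineObserver, Engine, EngineFactory and RecordingClient are hypothetical stand-ins, their method names are assumptions, and the results are fixed dummy values. Asynchronous completion is imitated with a queue that the caller pumps explicitly.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Stand-in for the observer role. Method names are hypothetical.
class EngineObserver {
public:
    virtual ~EngineObserver() = default;
    virtual void onLayoutComplete(int blockCount) = 0;
    virtual void onRecognizeComplete(const std::string& text) = 0;
};

// Toy engine: requests are queued and results are delivered later,
// when the caller pumps the queue, to imitate asynchronous completion.
class Engine {
public:
    explicit Engine(EngineObserver& observer) : observer_(observer) {}

    void requestLayoutAnalysis() {
        pending_.push([this] { observer_.onLayoutComplete(2); });  // dummy result
    }
    void requestRecognition() {
        pending_.push([this] { observer_.onRecognizeComplete("dummy text"); });
    }
    // Drops every request that has not been delivered yet.
    void cancel() {
        std::queue<std::function<void()>> empty;
        pending_.swap(empty);
    }
    // Delivers queued results to the observer; returns how many were delivered.
    int runPending() {
        int delivered = 0;
        while (!pending_.empty()) {
            std::function<void()> job = std::move(pending_.front());
            pending_.pop();
            job();
            ++delivered;
        }
        return delivered;
    }

private:
    EngineObserver& observer_;
    std::queue<std::function<void()>> pending_;
};

// Stand-in for the static factory: the only way clients obtain an engine.
// The engine is released when the returned pointer goes out of scope.
struct EngineFactory {
    static std::unique_ptr<Engine> create(EngineObserver& observer) {
        return std::unique_ptr<Engine>(new Engine(observer));
    }
};

// Client application: inherits from the observer to receive results.
class RecordingClient : public EngineObserver {
public:
    int blocks = -1;
    std::string text;
    void onLayoutComplete(int blockCount) override { blocks = blockCount; }
    void onRecognizeComplete(const std::string& t) override { text = t; }
};
```

In this model the client creates the engine through the factory, issues requests that return immediately, and sees the results only once the pending work has run. A cancel call before that point discards the outstanding requests, which mirrors the initialization, recognition, cancel and release use cases listed earlier.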
Figure 2: API Class Structure and Interaction
The interfaces MOCREngineRecognizeBlock and MOCREngineLayoutRecognize provide the two sets of recognition APIs, and MOCREngineBase offers the features that are common to both recognition types.
The detailed functions of these interfaces are shown in the following figure.
Figure 3: Interface Class Functions