ATAPY Software - OCR, Document Imaging, Document Management, Data Capture, Data Conversion
Services and Solutions for Document Management



Solution: the basics

The idea behind iOCR is to give up the elusive goal of recognizing freeforms automatically, to accept that the human operator is a principal participant in freeform input, and to concentrate on the human aspects of the process. Because many forms mix fields that can be located automatically with fields that cannot, iOCR takes every advantage of flexiform technologies to lighten the operator's workload. But whenever the flexiform engine is uncertain, iOCR prefers to deliver the field to the operator rather than risk a misdetection. The primary goal of iOCR is to provide highly efficient tools for interactive input, hence the i in iOCR.
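
The routing rule described above (accept automatically where the engine is confident, hand everything else to the operator) can be sketched as follows. This is a minimal Python illustration, not iOCR's actual code; the `FieldResult` type, the single per-field confidence score in [0, 1], and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldResult:
    """Hypothetical result of the automatic pass for one template field."""
    name: str
    value: Optional[str]  # None if the field could not be located
    confidence: float     # assumed combined location/recognition score in [0, 1]


def route_fields(
    results: List[FieldResult], threshold: float = 0.9
) -> Tuple[List[FieldResult], List[FieldResult]]:
    """Split fields into auto-accepted ones and ones left for the operator.

    A field is accepted only if it was located and its confidence reaches the
    threshold; anything uncertain goes to the operator instead of risking a
    misdetection.
    """
    accepted, for_operator = [], []
    for r in results:
        if r.value is not None and r.confidence >= threshold:
            accepted.append(r)
        else:
            for_operator.append(r)
    return accepted, for_operator


results = [
    FieldResult("Waybill", "77-30418", 0.97),
    FieldResult("Consignee", None, 0.0),
    FieldResult("Date", "12.03.2001", 0.6),
]
accepted, manual = route_fields(results)
print([r.name for r in accepted])  # ['Waybill']
print([r.name for r in manual])    # ['Consignee', 'Date']
```

A high threshold errs on the side of the operator, which matches the stated preference for avoiding misdetections over maximizing automation.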

Certain elements of this approach exist in modern OCR packages. For example, in ABBYY FineReader you can correct the borders of a block and then re-read it. However, there this is only a minor secondary feature, whereas in iOCR it is the primary processing scenario.

The freeform input process suggested by iOCR is as follows:

  1. Just as with ‘fixed forms’ and flexiforms, we must first describe our freeforms to the system. This includes telling the system which fields we expect to find and what their parameters are. See cartoons 1 and 2.

    The current version is not yet coupled with any flexiform engine. Once it is, form definition will become more complex, because it will also include building the flexiform template.

  2. The main user interface is a window which contains the list of forms in the current batch, the image of the current form, and a set of input boxes which need to be filled out from the current form. See the cartoon...

  3. First, the images in the current batch are processed automatically. In the future, this will be done using the high-end flexiform technology of ABBYY Software House. The present version of iOCR performs a very simple ‘flexiform-like’ analysis: it recognizes large portions of the image as ‘plain text’ and locates fields by substring identification. For example, if all of your freeforms contain the word ‘Waybill’ followed by the waybill number, iOCR will locate and read the number automatically. Interestingly, for many real-life forms this primitive technique already captures a large percentage of the fields, and flexiform technology cannot improve that percentage dramatically. See the cartoon...

  4. On a command from the operator, the next image is loaded into the main interface. Fields that have been located and recognized with sufficient confidence are already filled out. The cursor is placed at the first low-confidence character of the first recognized field, and the image is auto-scrolled and highlighted accordingly. The operator corrects the character(s) and presses Enter.

    If there are no low-confidence characters before the first unfilled field, the cursor is placed in the first unfilled field and the image is auto-scrolled to display the region where that field is expected to be. (For example, in 90% of resumes Education is located in the upper half of the page, in 5% in the lower half, and in 5% it is absent, so it makes sense to auto-scroll the image to the upper half of the page.) The region to auto-scroll to is defined by the form template.

    The key feature of iOCR is what we call ‘mouse sweep recognition’ (‘mousweep’? :-). The idea is to minimize the number of mouse and keyboard actions the operator must perform before a field that was not filled automatically is correctly entered into the system. Ideally, this takes only one quick sweep of the mouse. It can be a rubberband sweep (the operator points out the top left and bottom right corners of the field), a one-line sweep (which makes it easy to enter small one-line fields), or a page-wide sweep (good for entering large portions of text that span the entire page width, such as the message areas on fax cover pages).

    No extra action is needed to launch recognition or to place the cursor at the first low-confidence character; this happens automatically as soon as the mouse button is depressed. Jumping to the next field can also happen automatically (which works well for documents with high print quality) or when the right mouse button is clicked (so if only a few characters need to be corrected, the operator practically never touches the keyboard).

  5. When all fields are finished, the system automatically opens the next image. Jumping to the next image can also be initiated manually. If not all mandatory fields are completed, or if the system has some other reason for an alert, a warning dialog appears.

  6. Adding pages to and deleting pages from the batch, navigating through pages, and exporting recognition results are straightforward and conceptually not much different from ABBYY FormReader.
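
The substring-based field location in step 3 can be illustrated with a short Python sketch. It assumes the page has already been recognized as plain text; the function `locate_after_keyword` and its punctuation rule (an optional `No.`, `#` or `:` after the keyword) are hypothetical, not iOCR's actual matching logic.

```python
import re
from typing import Optional


def locate_after_keyword(page_text: str, keyword: str) -> Optional[str]:
    """Return the token that follows `keyword` in recognized plain text.

    The keyword is matched as a whole word, case-insensitively, optionally
    followed by 'No.', '#' or ':'; the next run of non-space characters is
    taken as the field value. Returns None if the keyword is absent.
    """
    pattern = re.compile(
        r"\b" + re.escape(keyword) + r"\b\s*(?:No\.|#|:)?\s*(\S+)",
        re.IGNORECASE,
    )
    match = pattern.search(page_text)
    return match.group(1) if match else None


page = "ACME Freight Ltd.\nWaybill No. 77-30418\nShipper: Jones & Sons"
print(locate_after_keyword(page, "Waybill"))  # 77-30418
print(locate_after_keyword(page, "Invoice"))  # None
```

A field that returns None here would simply be left unfilled and handed to the operator.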
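
The cursor-placement rule in step 4 can be sketched as a single scan over the template fields in order. Everything here is an assumption for illustration: the `FormField` type, per-character confidence scores, regions expressed as vertical fractions of the page, and the 0.8 threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical: a vertical span (top, bottom) as fractions of the page height.
Region = Tuple[float, float]


@dataclass
class FormField:
    name: str
    expected_region: Region  # from the form template; used when the field is unfilled
    value: str = ""          # empty if the automatic pass did not fill the field
    char_conf: List[float] = field(default_factory=list)  # one score per character
    found_region: Optional[Region] = None  # where the field was actually located


def initial_focus(
    fields: List[FormField], threshold: float = 0.8
) -> Optional[Tuple[str, int, Region]]:
    """Return (field name, cursor index, region to scroll to), or None.

    Fields are scanned in template order, so a low-confidence character only
    wins if it comes before the first unfilled field, and vice versa.
    """
    for f in fields:
        if not f.value:
            # Unfilled field: cursor at its start, scroll to the expected region.
            return f.name, 0, f.expected_region
        for i, conf in enumerate(f.char_conf):
            if conf < threshold:
                # Low-confidence character: scroll to where the field was found.
                return f.name, i, f.found_region or f.expected_region
    return None  # everything filled and confident


resume = [
    FormField("Name", (0.0, 0.2), "John Smith", [0.99] * 10, (0.05, 0.08)),
    FormField("Education", (0.0, 0.5)),  # usually in the upper half of a resume
    FormField("Phone", (0.0, 0.2), "555-0l23", [0.95] * 6 + [0.4, 0.95], (0.1, 0.12)),
]
print(initial_focus(resume))  # ('Education', 0, (0.0, 0.5))
```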
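
The three sweep kinds described in step 4 amount to different ways of turning a mouse press and release into a rectangle on the image. The sketch below is one hypothetical interpretation; the mode names, the `Rect` type, and the fixed `line_height` used for one-line sweeps are assumptions, and a real implementation would more likely derive the line height from the image itself.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[int, int]  # (x, y) in image pixels


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int


def sweep_to_rect(mode: str, press: Point, release: Point,
                  page_width: int, line_height: int = 30) -> Rect:
    """Turn one mouse sweep into the image rectangle to recognize.

    Hypothetical interpretation of the three sweep kinds:
      'rubberband': press and release are opposite corners of the field;
      'line':       one text line, i.e. the horizontal span of the drag with a
                    band of line_height centered on the starting y;
      'page':       the full page width between the two y positions.
    """
    (x0, y0), (x1, y1) = press, release
    if mode == "rubberband":
        return Rect(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
    if mode == "line":
        half = line_height // 2
        return Rect(min(x0, x1), max(0, y0 - half), max(x0, x1), y0 + half)
    if mode == "page":
        return Rect(0, min(y0, y1), page_width, max(y0, y1))
    raise ValueError(f"unknown sweep mode: {mode!r}")


print(sweep_to_rect("line", (100, 400), (600, 410), page_width=2480))
# Rect(left=100, top=385, right=600, bottom=415)
```

In this sketch the returned rectangle would be handed straight to the recognizer, so the sweep itself is the only action the operator performs.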
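
The end-of-form behavior in step 5 might be sketched as follows. Only the mandatory-field check is modeled (the "other reasons for an alert" are not), and the function name `advance` and its return values are hypothetical.

```python
from typing import Dict, List, Tuple


def advance(values: Dict[str, str], all_fields: List[str],
            mandatory: List[str], manual: bool = False) -> Tuple[str, List[str]]:
    """Decide what happens after an edit or an explicit 'next image' command.

    Returns ("next", []) when the next image should open, ("stay", []) while
    fields remain and the operator has not asked to move on, and
    ("warn", messages) when moving on would leave mandatory fields empty.
    """
    def filled(name: str) -> bool:
        return bool(values.get(name, "").strip())

    if not (manual or all(filled(n) for n in all_fields)):
        return "stay", []
    warnings = [f"mandatory field '{n}' is empty" for n in mandatory if not filled(n)]
    return ("warn", warnings) if warnings else ("next", [])


print(advance({"Waybill": "77-30418"}, ["Waybill", "Date"], ["Waybill"]))
# ('stay', [])
print(advance({"Date": "12.03.2001"}, ["Waybill", "Date"], ["Waybill"], manual=True))
# ('warn', ["mandatory field 'Waybill' is empty"])
```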