Understanding Evernote's recognition process and accessing recognition data
Evernote’s ability to index words and characters within images is one of its most oft-touted features. In this article, we’re going to describe how this process works and how third-party partners can access the recognition information generated by Evernote’s image processing.
How it works
From a user’s perspective, the process is quite seamless: after they synchronize a note with an image Resource attached, the image is sent to a small farm of image processing servers. One of these servers examines the image for hand- and type-written text and inserts the resulting data into the body of the note (in the form of recoIndex elements). The augmented note is then delivered to the user the next time their device is synced.
After processing is complete, the note’s XML will contain one or more recoIndex elements at the end of the note. These elements are validated against Evernote’s recoIndex DTD and describe the size and location of one or more possible occurrences of text within the Resource.
For example, when we export this note as an .enex file, we can view the recoIndex element at the end of the document.
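The recognition data looks something like the following. Note that this is a hand-built illustration: the attribute values and recognized text are hypothetical, not taken from a real note, though the element and attribute names follow the recoIndex DTD.

```xml
<recoIndex docType="unknown" objType="image"
           objID="fc83e58282d8059be17debabb69be900"
           engineVersion="5.5.22.7" recoType="service" lang="en"
           objWidth="2160" objHeight="1620">
  <item x="501" y="635" w="770" h="254">
    <t w="78">Evernote</t>
    <t w="31">Overnote</t>
  </item>
</recoIndex>
```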
The recoIndex element includes several attributes; feel free to peruse the recoIndex.dtd document linked above if you’re curious about the specifics. Its child item elements contain the results of the image processing.
Each item child element of recoIndex represents what Evernote’s image processing servers believe to be a discrete piece of text within the resource. The item element contains four attributes:
x: The x coordinate of the upper-left corner of the item.
y: The y coordinate of the upper-left corner of the item.
w: The width of the item.
h: The height of the item.
These four values create a rectangle containing the text.
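These attributes are easy to work with using any XML parser. As a sketch, the following Python uses the standard library's ElementTree to compute each item's bounding rectangle; the sample fragment and its coordinate values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical recoIndex fragment, for illustration only.
sample = """
<recoIndex objType="image" objWidth="2160" objHeight="1620">
  <item x="501" y="635" w="770" h="254">
    <t w="78">Evernote</t>
  </item>
</recoIndex>
"""

def item_rectangles(reco_xml):
    """Return a (left, top, right, bottom) bounding box for each item."""
    root = ET.fromstring(reco_xml)
    boxes = []
    for item in root.iter("item"):
        x, y = int(item.get("x")), int(item.get("y"))
        w, h = int(item.get("w")), int(item.get("h"))
        boxes.append((x, y, x + w, y + h))
    return boxes

print(item_rectangles(sample))  # → [(501, 635, 1271, 889)]
```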
Each item element will, in turn, contain one or more
t elements. Each of these elements contains the text as evaluated by the image processing server as well as the weight attributed to the text. The
t elements are listed in descending order according to the
weight attribute. Put simply, the
weight attribute is a numeric representation of the Evernote image processor’s confidence in that particular interpretation of the image.
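A common use of the weight attribute is to keep only the most confident interpretation of each item, optionally discarding low-confidence items entirely. Here is a small sketch of that idea; the function name, threshold parameter, and sample data are our own, not part of the Evernote API.

```python
import xml.etree.ElementTree as ET

# A hypothetical recoIndex fragment, for illustration only.
sample = """
<recoIndex objType="image">
  <item x="853" y="1278" w="140" h="37">
    <t w="31">EVER</t>
    <t w="24">OVER</t>
  </item>
  <item x="501" y="635" w="770" h="254">
    <t w="78">Evernote</t>
  </item>
</recoIndex>
"""

def best_guesses(reco_xml, min_weight=0):
    """Return (text, weight) for the highest-weight t element of each
    item, skipping items whose best guess falls below min_weight."""
    root = ET.fromstring(reco_xml)
    guesses = []
    for item in root.iter("item"):
        # t elements are listed in descending weight order, but we use
        # max() defensively rather than relying on document order.
        best = max(item.findall("t"), key=lambda t: int(t.get("w")))
        if int(best.get("w")) >= min_weight:
            guesses.append((best.text, int(best.get("w"))))
    return guesses

print(best_guesses(sample))                 # → [('EVER', 31), ('Evernote', 78)]
print(best_guesses(sample, min_weight=50))  # → [('Evernote', 78)]
```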
Accessing recognition data programmatically
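One way to fetch this data is to request a note with its resources' recognition data included. The following is a minimal sketch using the Evernote Python SDK; the token and GUID values are placeholders and error handling is omitted.

```python
from evernote.api.client import EvernoteClient

client = EvernoteClient(token="YOUR_DEV_TOKEN", sandbox=True)
note_store = client.get_note_store()

# getNote(guid, withContent, withResourcesData,
#         withResourcesRecognition, withResourcesAlternateData)
note = note_store.getNote("YOUR_NOTE_GUID", False, False, True, False)

for resource in note.resources or []:
    if resource.recognition is not None:
        print(resource.recognition.body)  # raw recoIndex XML
```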
The API returns the raw XML of the recoIndex element that corresponds to the single image in the example note linked above. If the note used in this example had contained more than one resource, the output would have included one recoIndex element per indexed resource.
This data can also be retrieved directly using NoteStore.getResourceRecognition if you know the GUID of the Resource in question; this call returns the same recognition XML for that single Resource.
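A minimal sketch of that call, again using the Evernote Python SDK with placeholder token and GUID values:

```python
from evernote.api.client import EvernoteClient

client = EvernoteClient(token="YOUR_DEV_TOKEN", sandbox=True)
note_store = client.get_note_store()

# getResourceRecognition returns the raw recoIndex XML for one resource.
reco_xml = note_store.getResourceRecognition("YOUR_RESOURCE_GUID")
print(reco_xml)
```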
This data will change if a
Resource is changed (obviously), so make sure that you’ve got the latest version when putting it to your favorite creative use.
If you have any questions, don’t hesitate to get in touch with Evernote Developer Support.