Understanding Evernote's recognition process and accessing recognition data
Evernote’s ability to index words and characters within images is one of its most oft-touted features. In this article, we’re going to describe how this process works and how third-party partners can access the recognition information generated by Evernote’s image processing.
How it works
From a user’s perspective, the process is quite seamless: after they synchronize a note with an image Resource attached, the image is sent to a small farm of image processing servers. One of these servers examines the image for hand- and type-written text and inserts the resulting data into the body of the note (in the form of recoIndex elements). The augmented note is then delivered to the user the next time their device is synced.
After processing is complete, the note’s XML will contain one or more recoIndex elements at the end of the note. These elements are validated against Evernote’s recoIndex DTD and describe the size and location of one or more possible occurrences of text within the Resource.
For example, when we export this note as an .enex file, we can view the recoIndex element at the end of the document.
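The recognition data looks something like the following. Note that this is a hand-built illustration: the attribute values and recognized text are hypothetical, not taken from a real note, though the element and attribute names follow the recoIndex DTD.

```xml
<recoIndex docType="unknown" objType="image"
           objID="fc83e58282d8059be17debabb69be900"
           engineVersion="5.5.22.7" recoType="service" lang="en"
           objWidth="2160" objHeight="1620">
  <item x="501" y="635" w="770" h="254">
    <t w="78">Evernote</t>
    <t w="31">Overnote</t>
  </item>
</recoIndex>
```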
The recoIndex element includes several attributes; feel free to peruse the recoIndex.dtd document linked above if you’re curious about the specifics. Its child item elements contain the results of the image processing.
Each item child element of recoIndex represents what Evernote’s image processing servers believe to be a discrete piece of text within the resource. The item element contains four attributes:
x: The x coordinate of the upper-left corner of the item.
y: The y coordinate of the upper-left corner of the item.
w: The width of the item.
h: The height of the item.
These four values create a rectangle containing the text.
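These attributes are easy to work with using any XML parser. As a sketch, the following Python uses the standard library's ElementTree to compute each item's bounding rectangle; the sample fragment and its coordinate values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical recoIndex fragment, for illustration only.
sample = """
<recoIndex objType="image" objWidth="2160" objHeight="1620">
  <item x="501" y="635" w="770" h="254">
    <t w="78">Evernote</t>
  </item>
</recoIndex>
"""

def item_rectangles(reco_xml):
    """Return a (left, top, right, bottom) bounding box for each item."""
    root = ET.fromstring(reco_xml)
    boxes = []
    for item in root.iter("item"):
        x, y = int(item.get("x")), int(item.get("y"))
        w, h = int(item.get("w")), int(item.get("h"))
        boxes.append((x, y, x + w, y + h))
    return boxes

print(item_rectangles(sample))  # → [(501, 635, 1271, 889)]
```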
Each item element will, in turn, contain one or more
t elements. Each of these elements contains the text as evaluated by the image processing server as well as the weight attributed to the text. The
t elements are listed in descending order according to the
weight attribute. Put simply, the
weight attribute is a numeric representation of the Evernote image processor’s confidence in that particular interpretation of the image.
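A common use of the weight attribute is to keep only the most confident interpretation of each item, optionally discarding low-confidence items entirely. Here is a small sketch of that idea; the function name, threshold parameter, and sample data are our own, not part of the Evernote API.

```python
import xml.etree.ElementTree as ET

# A hypothetical recoIndex fragment, for illustration only.
sample = """
<recoIndex objType="image">
  <item x="853" y="1278" w="140" h="37">
    <t w="31">EVER</t>
    <t w="24">OVER</t>
  </item>
  <item x="501" y="635" w="770" h="254">
    <t w="78">Evernote</t>
  </item>
</recoIndex>
"""

def best_guesses(reco_xml, min_weight=0):
    """Return (text, weight) for the highest-weight t element of each
    item, skipping items whose best guess falls below min_weight."""
    root = ET.fromstring(reco_xml)
    guesses = []
    for item in root.iter("item"):
        # t elements are listed in descending weight order, but we use
        # max() defensively rather than relying on document order.
        best = max(item.findall("t"), key=lambda t: int(t.get("w")))
        if int(best.get("w")) >= min_weight:
            guesses.append((best.text, int(best.get("w"))))
    return guesses

print(best_guesses(sample))                 # → [('EVER', 31), ('Evernote', 78)]
print(best_guesses(sample, min_weight=50))  # → [('Evernote', 78)]
```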
Accessing recognition data programmatically
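One way to fetch this data is to request a note with its resources' recognition data included. The following is a minimal sketch using the Evernote Python SDK; the token and GUID values are placeholders and error handling is omitted.

```python
from evernote.api.client import EvernoteClient

client = EvernoteClient(token="YOUR_DEV_TOKEN", sandbox=True)
note_store = client.get_note_store()

# getNote(guid, withContent, withResourcesData,
#         withResourcesRecognition, withResourcesAlternateData)
note = note_store.getNote("YOUR_NOTE_GUID", False, False, True, False)

for resource in note.resources or []:
    if resource.recognition is not None:
        print(resource.recognition.body)  # raw recoIndex XML
```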
The API returns the raw XML of the recoIndex element that corresponds to the single image in the example note linked above. If the note used in this example had contained more than one resource, the output would have included one recoIndex element per indexed resource.
This data can also be retrieved directly using NoteStore.getResourceRecognition if you know the GUID of the Resource in question; this call returns the same recognition XML for that single Resource.
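A minimal sketch of that call, again using the Evernote Python SDK with placeholder token and GUID values:

```python
from evernote.api.client import EvernoteClient

client = EvernoteClient(token="YOUR_DEV_TOKEN", sandbox=True)
note_store = client.get_note_store()

# getResourceRecognition returns the raw recoIndex XML for one resource.
reco_xml = note_store.getResourceRecognition("YOUR_RESOURCE_GUID")
print(reco_xml)
```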
This data will change if a
Resource is changed (obviously), so make sure that you’ve got the latest version when putting it to your favorite creative use.
If you have any questions, don’t hesitate to get in touch with Evernote Developer Support.