Image Recognition
Understanding Evernote's recognition process and accessing recognition data
Evernote’s ability to index words and characters within images is one of its most oft-touted features. In this article, we’re going to describe how this process works and how third-party partners can access the recognition information generated by Evernote’s image processing.
How it works
From a user’s perspective, the process is quite seamless: after they synchronize a note with an image Resource attached, the image is sent to a small farm of image processing servers. The server examines the image for hand- and type-written text and inserts the resulting data into the body of the note (in the form of recoIndex elements). The augmented note is then delivered to the user the next time their device is synced.
Understanding recoIndex
After processing is complete, the note’s XML will contain one or more recoIndex elements at the end of the note. These elements are validated against Evernote’s recoIndex DTD and describe the size and location of one or more possible occurrences of text within the Resource.
For example, when we export this note as .enex, we can view the recoIndex element at the end of the document:
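The shape of that element looks roughly like the fragment below. The attribute values and recognized text here are illustrative only, not taken from a real note; consult the recoIndex DTD for the authoritative list of attributes.

```xml
<recoIndex docType="unknown" objType="image"
           objID="a1b2c3d4e5f6a7b8c9d0a1b2c3d4e5f6"
           engineVersion="5.5.22.7" recoType="service" lang="en"
           objWidth="1024" objHeight="768">
  <item x="132" y="87" w="210" h="36">
    <t w="82">INVOICE</t>
    <t w="31">lNVOlCE</t>
  </item>
</recoIndex>
```

Each item is one candidate region of text, and each t inside it is one possible reading of that region, highest confidence first.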
The recoIndex element includes several attributes; feel free to peruse the recoIndex.dtd document linked above if you’re curious about the specifics. The child item elements will contain the results of the image processing.
item elements
Each item child element of recoIndex represents what Evernote’s image processing servers believe to be a discrete piece of text within the resource. The item element contains four attributes:
x - The x coordinate of the upper-left corner of the item.
y - The y coordinate of the upper-left corner of the item.
w - The width of the item.
h - The height of the item.
These four values create a rectangle containing the text.
The item element will, in turn, contain one or more t elements. Each of these elements contains the text as evaluated by the image processing server, as well as the weight attributed to that text. The t elements are listed in descending order according to the weight attribute. Put simply, the weight attribute is a numeric representation of the Evernote image processor’s confidence in that particular interpretation of the image.
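To make the structure concrete, here is a small Python sketch that pulls the highest-weight interpretation out of each item, along with its bounding rectangle. The XML fragment and the helper name best_guesses are illustrative, not part of the Evernote API.

```python
import xml.etree.ElementTree as ET

# A hypothetical recoIndex fragment; real values come from Evernote's servers.
RECO_XML = """
<recoIndex objType="image" objID="a1b2c3d4" objWidth="1024" objHeight="768">
  <item x="132" y="87" w="210" h="36">
    <t w="82">INVOICE</t>
    <t w="31">lNVOlCE</t>
  </item>
  <item x="140" y="160" w="95" h="28">
    <t w="77">Total</t>
  </item>
</recoIndex>
"""

def best_guesses(reco_xml):
    """Return (rect, text, weight) for the top-weighted reading of each item."""
    results = []
    for item in ET.fromstring(reco_xml).iter("item"):
        # The four attributes describe the rectangle containing the text.
        rect = tuple(int(item.get(k)) for k in ("x", "y", "w", "h"))
        # t elements arrive sorted by descending weight, but we pick the
        # maximum explicitly rather than relying on document order.
        top = max(item.findall("t"), key=lambda t: int(t.get("w")))
        results.append((rect, top.text, int(top.get("w"))))
    return results

for rect, text, weight in best_guesses(RECO_XML):
    print(rect, text, weight)
```

Because a single item can carry several t elements, an application searching this data should generally consider every interpretation, not just the top one, weighting matches by the w value.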
Accessing recognition data programmatically
Recognition data can be retrieved from the Evernote Cloud API in two different ways: using NoteStore.getNote or NoteStore.getResourceRecognition. Here’s an example using getNote:
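A sketch of what this might look like with the Python SDK: the key is passing withResourcesRecognition as true so each resource arrives with its recognition body populated. The client setup, auth_token, and note_guid below are assumptions you would supply yourself, so the network call is shown only as a comment; the helper recognition_bodies is a hypothetical name and is exercised here against a stand-in note.

```python
from types import SimpleNamespace

def recognition_bodies(resources):
    """Return the raw recoIndex XML body of every resource that has one."""
    return [res.recognition.body
            for res in (resources or [])
            if res.recognition and res.recognition.body]

# With the official Python SDK, the call would look roughly like:
#
#   from evernote.api.client import EvernoteClient
#   note_store = EvernoteClient(token=auth_token).get_note_store()
#   # withContent=False, withResourcesData=False,
#   # withResourcesRecognition=True, withResourcesAlternateData=False
#   note = note_store.getNote(note_guid, False, False, True, False)
#   for body in recognition_bodies(note.resources):
#       print(body)

# A stand-in note so the helper can be run without a network call:
fake_note = SimpleNamespace(resources=[
    SimpleNamespace(recognition=SimpleNamespace(body=b"<recoIndex/>")),
    SimpleNamespace(recognition=None),  # e.g. a resource that was not indexed
])
print(recognition_bodies(fake_note.resources))
```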
As you can see, the API will return the raw XML of the recoIndex element that corresponds to the single image in the example note linked above. If the note used in this example had contained more than one resource, the output would have been one recoIndex element per indexed resource.
This data can also be retrieved quickly using NoteStore.getResourceRecognition if you know the GUID of the Resource in question. This example will produce the same output as the last snippet:
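A sketch of this second approach, again assuming the Python SDK: getResourceRecognition returns just the recoIndex XML for one resource, with no need to fetch the whole note. The auth handling and resource_guid are placeholders, so the call itself appears as a comment; the small helper has_recognized_text is a hypothetical convenience for checking whether the servers actually found any text.

```python
import xml.etree.ElementTree as ET

def has_recognized_text(reco_xml):
    """True if the recoIndex contains at least one item with a t element."""
    return any(item.find("t") is not None
               for item in ET.fromstring(reco_xml).iter("item"))

# With the Python SDK the call itself would look roughly like:
#
#   reco_xml = note_store.getResourceRecognition(resource_guid)
#   if has_recognized_text(reco_xml):
#       print(reco_xml)

# Exercising the helper on an illustrative fragment:
sample = ('<recoIndex><item x="0" y="0" w="10" h="10">'
          '<t w="50">hi</t></item></recoIndex>')
print(has_recognized_text(sample))  # True
```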
Conclusion
This data will change if a Resource is changed (obviously), so make sure that you’ve got the latest version when putting it to your favorite creative use.
If you have any questions, don’t hesitate to get in touch with Evernote Developer Support.