I OCR Wine Labels

Keywords: uses, OCR

I have used OCR for some strange things, but this weekend I was presented with a new one. My fiancée’s mom owns one of the local wineries. I often help with various administrative tasks because, as I’ve found, technology is not high on the priority list for most wineries. The task I was given was to come up with a new digital version of the winery’s de facto back label. The original digital file had been sequestered by a printer, and a new one needed to be created for wine that was to be bottled in just a few days.

In California, the wording and look of a wine label are very serious business. Each label has to be approved by the state, so you want to make as few changes as possible to ensure approval. However, the design of a wine label is also very important to people purchasing a bottle. For that reason, I wanted to stay as true as possible to what had already been approved. I also did not want to type the label in by hand, so I decided to OCR it. Here is what the label looked like after a 600 DPI color scan.

As you can see, this is not an optimal image for OCR. The vertical lines are fairly obtrusive against the dark background. There is stylized text at the beginning. The font and background colors are not significantly different. The fonts are small. And the dark background prevents me from OCRing anything without some planning. So an OCR challenge it was. This is how I approached it:

First, I used the trick of inversion: I inverted the image so that the text stood out more. This trick can be used in a lot of scenarios. One of the coolest is improving OCR results with a two-pass read of a document, one pass inverted and one not, and then reconciling the results.
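Here is a minimal Python sketch of that two-pass idea, not the tool I actually used. It assumes the image is a plain list of rows of 8-bit grayscale values, and that each OCR pass returns a list of (word, confidence) pairs. The function names `invert` and `reconcile`, and the confidence format, are made up for illustration. Real OCR engines report results in their own formats.

```python
from difflib import SequenceMatcher


def invert(image):
    """Invert an 8-bit grayscale image given as rows of 0-255 values."""
    return [[255 - px for px in row] for row in image]


def _mean_conf(chunk):
    # An empty chunk scores below any real confidence, so a word that only
    # one pass saw is kept rather than dropped.
    return sum(c for _, c in chunk) / len(chunk) if chunk else -1.0


def reconcile(pass_a, pass_b):
    """Merge two OCR passes, each a list of (word, confidence) pairs.

    Words both passes agree on are kept as-is. Where the passes disagree,
    the higher-confidence reading wins (hypothetical rule for this sketch).
    """
    words_a = [w for w, _ in pass_a]
    words_b = [w for w, _ in pass_b]
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    merged = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        chunk_a, chunk_b = pass_a[a0:a1], pass_b[b0:b1]
        if tag == "equal":
            merged.extend(words_a[a0:a1])
        elif len(chunk_a) == len(chunk_b):
            # Same number of words on both sides: compare word by word.
            for (wa, ca), (wb, cb) in zip(chunk_a, chunk_b):
                merged.append(wa if ca >= cb else wb)
        else:
            # Different lengths: keep whichever side is more confident overall.
            if _mean_conf(chunk_a) >= _mean_conf(chunk_b):
                pick = chunk_a
            else:
                pick = chunk_b
            merged.extend(w for w, _ in pick)
    return merged
```

In practice you would run your OCR engine once on the original scan and once on `invert(scan)`, then feed both word lists to `reconcile`. Aligning with `difflib` first matters because one pass may drop or add a word, which would throw off a simple position-by-position comparison.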

After inverting the image, I adjusted the contrast so that the letters were more complete, despeckled (though it really did not need it), and straightened the lines to compensate for a slight vertical bend. Next was zoning. Below is how the OCR engine automatically zoned the document (its document analysis step).
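For readers curious about the cleanup steps before zoning, here is a rough Python sketch of a contrast stretch and a despeckle pass. It assumes the same list-of-rows grayscale format, and the function names are hypothetical. My imaging software did this for me, and its exact algorithms are unknown to me. Straightening the lines is left out because it takes considerably more code.

```python
def stretch_contrast(image):
    """Linearly rescale pixel values so the darkest becomes 0, the lightest 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return [row[:] for row in image]
    return [[round((px - lo) * 255 / (hi - lo)) for px in row] for row in image]


def despeckle(image):
    """Replace each interior pixel with the median of its 3x3 neighborhood.

    Isolated specks disappear because they are outvoted by their neighbors.
    Border pixels are left untouched to keep the sketch short.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = sorted(
                image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out
```

A median filter is only one way to despeckle. On very small fonts like the ones on this label it can eat thin strokes, which is one reason to skip it when the scan does not really need it.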

With experience, you find that auto zoning can be either really good or really bad. In this case, I knew that the automatic zoning would hinder the active pattern training during OCR (the second pass that OCR engines make after an untrained first pass), and that it would introduce some strange spacing, so I would not get a nice page layout after the fact. Because I knew it would not work well with the final export, I did not accept the automatic zoning. Instead, I manually drew two zones: one for the title and one for the body.

Doing this, I could ensure better formatting on the body, and it allowed me to enable pattern training for the first zone to handle that crazy font. And it worked. After about 5 minutes of work, I was able to produce a Word document that needed only 5 edits to be a 100% accurate digital representation of the back label.
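If you wanted to script the two-zone approach rather than draw the zones by hand, it could look something like the sketch below. This is an illustration under assumptions, not what my OCR software does internally: the `Zone` class, the `train_patterns` flag, and the idea of splitting at a fixed title height are all hypothetical, and the image is again a list of pixel rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular region to OCR separately (right and bottom are exclusive)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int
    train_patterns: bool = False  # hypothetical per-zone OCR setting


def manual_zones(width, height, title_height):
    """Split a label into a title zone on top and a body zone below it."""
    return [
        # Pattern training only for the stylized title font.
        Zone("title", 0, 0, width, title_height, train_patterns=True),
        Zone("body", 0, title_height, width, height),
    ]


def crop(image, zone):
    """Cut a zone out of an image given as a list of pixel rows."""
    return [row[zone.left:zone.right] for row in image[zone.top:zone.bottom]]
```

Each cropped zone would then go to the OCR engine with its own settings, and the two results are joined in order. Keeping the body as one zone is what avoids the odd spacing that many small auto-detected zones tend to produce on export.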

There are a lot of crazy things I OCR: email, Flash and Silverlight screen captures, programming code snippets, even YouTube videos. Above and beyond the traditional scanning of paper documents and converting them to text, there are many other ways simple OCR technology can make you more efficient.

Now that the back label is converted, it’s time to design a front.

Comments

Natalie Lyda

These are great tips and tricks for getting the most out of conversions. Thank you for sharing. Ricoh Innovations is offering a free beta version of their online document conversion software that you may want to check out. It's available at: http://beta.rii.ricoh.com/betalabs/content/document-conversion


Daniel O'Leary

Well done Chris, this tells a great story. This is a really good use of the technology, and could just as easily be applied to other documents that have similar properties. For things like historical documents, having this type of technology is key.

This post and comment(s) reflect the personal perspectives of community members, and not necessarily those of their employers or of AIIM International