As great as the Web is for surveys, there are times paper is a better fit (and phones and in-person too of course). When paper is what you need, people often look to scanning as a "no data entry" solution. While you can get close with some surveys, as with all technology there's fine print.

When scanning works best

  • Almost all fixed scales, as anything written in will have to be manually typed by someone. Even software which recognizes block print in tick-mark boxes has a significant error rate and needs a QA pass.
  • Enough responses that the efficiency of scanning offsets the initial setup of the form specification and the time to scan a batch. For small batches, just keying in the forms is sometimes faster.
  • A form return process that won't fold, spindle, or mutilate the pages.
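
To put a rough number on "enough responses," here is a minimal break-even sketch. The function name and all of the timings are assumptions for illustration, not measurements; plug in your own setup time and per-form times.

```python
import math

def break_even_forms(setup_minutes, scan_minutes_per_form, key_minutes_per_form):
    """Smallest number of forms at which scanning takes no longer than keying.

    Returns None when scanning never catches up, i.e. the per-form time to
    scan and clean is not lower than the per-form time to key by hand.
    """
    saved_per_form = key_minutes_per_form - scan_minutes_per_form
    if saved_per_form <= 0:
        return None
    return math.ceil(setup_minutes / saved_per_form)

# Hypothetical figures: 4 hours to build and test the form definition,
# 0.5 minutes per form to scan and clean, 3 minutes per form to key by hand.
print(break_even_forms(240, 0.5, 3.0))  # 96
```

With those made-up numbers, scanning only pays off once you expect roughly a hundred forms back; a write-in-heavy survey pushes the per-form scanning time up and the break-even point with it.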

Now all of this assumes you've invested in a reasonable set of technology to process the forms, which will run you somewhere from $2,000-10,000+ depending on your volume. If you're relying on bargain-basement software with poor logic for ambiguous marks, you'll be spending all your time telling it whether that was a mark or a smudge, and if you're using a scanner you picked up at OfficeMax, you may find yourself standing over it all day as you move stacks of paper in and out.

The scanning process

From start to finish, four components have to pass information back and forth, so making sure all the pieces are compatible and testing fully is a must:

[Figure: Scanning stages]

Some survey software embeds the scanning function in its own code, while other packages have recommended partners to make it all run smoothly. Whatever you do, don't print your forms before buying and testing your scanning setup! And don't scan all 5,000 forms before testing that the data format imports into your reporting software. Seriously, I've seen it tried and it's just technology roulette.

Scanner types and functions

The scanner itself is a simple-minded device. It will recognize light and dark on the page; everything else is software. There are two main types of scanners: mark readers and image scanners. Mark readers are the machines behind the Scantron and NCS type forms we all grew up with, where the bubbles are in tidy columns and an edge of black rectangles marks the rows. While these still have their place and you may have one around your office, they tend to be more expensive and less flexible, so this article is about image scanners.

Image scanners use the same technology as the multi-purpose office scanners you see in your local office supply store. They take a picture of an entire page, including all the instructions, and save it just like you would a scan of a photo. Software then interprets the picture, and extracts data from the pixels.

The two main types of image scanners are flatbed and sheet feeder.

Flatbed scanners have a glass surface on which you place the document to be scanned. Some will have a sheet feeder option as well (also known as an ADF or automatic document feeder) which will pass pages across the scanning surface. Flatbeds tend to be general-purpose office scanners, with fairly high resolution, color, single-sided scanning, and a modest page-per-minute scanning rate.

Pure sheet feeders, on the other hand, are designed to quickly process stacks of paper. They may only scan in shades of gray, and may have a lower resolution than the general-purpose scanners because they’re being used to capture data rather than illustrations. The advantages of sheet feeders are their greater speed, greater sheet feeder capacity, the option to scan both sides of a sheet at once (duplex), and longer duty cycle.
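
To see what speed, feeder capacity, and duplex mean for your day, here is a back-of-the-envelope sketch. The function and every figure in it are hypothetical; substitute the rated numbers for the devices you are actually comparing.

```python
import math

def batch_estimate(sheets, sheets_per_minute, feeder_capacity, passes=1):
    """Return (feeder_loads, scanning_minutes) for a batch of sheets.

    passes is 1 for single-sided forms or a duplex scanner, and 2 when
    double-sided forms must be fed through a single-sided scanner twice.
    Time spent loading paper, clearing jams, and cleaning data is not counted.
    """
    loads = math.ceil(sheets / feeder_capacity) * passes
    minutes = sheets * passes / sheets_per_minute
    return loads, minutes

# Hypothetical office flatbed with an ADF: 10 sheets/minute, 50-sheet feeder,
# double-sided forms scanned one side at a time.
print(batch_estimate(5000, 10, 50, passes=2))   # (200, 1000.0)

# Hypothetical duplex sheet feeder: 60 sheets/minute, 100-sheet feeder.
loads, minutes = batch_estimate(5000, 60, 100)
print(loads, round(minutes))                    # 50 83
```

Under these assumptions the same 5,000 sheets take about two working days of feeding on the office scanner and under an hour and a half on the sheet feeder.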

In the sheet feeder realm, popular scanner manufacturers are Fujitsu, Canon, Panasonic and Ricoh, not the brands you're used to seeing in a store. Most surveyors handling batches of a few hundred to 10,000 forms should be well served by a scanner in the $1,000-4,000 range. If you don't already have a departmental scanner (these often make sense as a shared resource, like workgroup laser printers), find the software that fits you first, and its vendor can point you toward a compatible device.

Software essentials and extras

Since the scanner simply produces an image, what’s next? OMR (Optical Mark Recognition) or OCR (Optical Character Recognition) software converts the scanner’s file into a data file that you import into your database, reporting software, or survey program. While individual applications vary, you’ll typically “train” the software to understand your survey form. This is done by running a blank survey through the scanner. When the software brings up the page image, you then identify regions on the page which contain data fields (often by drawing boxes around a set of bubbles). For each of these fields you’ll specify a field name, whether it takes a single answer or multiple answers, and the value each bubble represents. If the program supports it, you’ll also identify write-in or image fields, for which it will capture pictures of the responses for manual data entry.
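
The result of that training step is, in effect, a template: a list of page regions with a name and answer rules attached. The sketch below shows one way such a template could be represented. The class names, field names, and pixel coordinates are all invented for illustration and do not reflect the file format of any particular OMR product.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A rectangle on the page image, in pixels from the top-left corner."""
    left: int
    top: int
    width: int
    height: int

@dataclass
class MarkField:
    """A set of bubbles that together answer one question."""
    name: str
    bubbles: dict            # maps each answer value to the Box of its bubble
    multiple: bool = False   # True if more than one bubble may be marked

@dataclass
class ImageField:
    """A write-in area captured as a picture for later manual entry."""
    name: str
    area: Box

def is_valid(field, marked_values):
    """Check a list of marked values against the field's definition."""
    if any(value not in field.bubbles for value in marked_values):
        return False
    return field.multiple or len(marked_values) <= 1

# Hypothetical 1-to-5 satisfaction scale with bubbles 40 pixels apart.
satisfaction = MarkField(
    name="q1_satisfaction",
    bubbles={value: Box(100 + 40 * i, 200, 24, 24)
             for i, value in enumerate([1, 2, 3, 4, 5])},
)
comments = ImageField(name="q2_comments", area=Box(100, 300, 600, 120))

print(is_valid(satisfaction, [3]))     # True
print(is_valid(satisfaction, [2, 3]))  # False: single-answer field
```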

Once the software has all this information, you start scanning completed surveys. It compares each respondent form to the template, ignoring all the pre-printed information, and converting the extra marks to data fields per your definition. If your survey has write-in fields, a data entry person will need to review each one and type the value (don’t worry, this is done quickly as a batch after all the forms are scanned). There’s also usually a cleaning pass for borderline marks, such as a respondent only partially filling in a bubble, or erasing a response. Then you’ll export the data file for analysis in your survey application, spreadsheet, database, or statistical software.
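
As a rough picture of how the comparison and the cleaning pass fit together, here is a minimal sketch. It assumes the page is already a grid of grayscale pixels and that the blank and completed scans are perfectly aligned; the thresholds are made up, and commercial OMR software is considerably more sophisticated about skew, smudges, and erasures.

```python
def fill_ratio(image, box, dark_below=128):
    """Fraction of pixels inside box that are darker than dark_below.

    image is a list of rows of grayscale values, 0 (black) to 255 (white).
    box is (left, top, width, height) and must lie inside the image.
    """
    left, top, width, height = box
    dark = sum(
        1
        for row in image[top:top + height]
        for pixel in row[left:left + width]
        if pixel < dark_below
    )
    return dark / (width * height)

def extra_ink(completed, blank, box):
    """Ink the respondent added: completed form minus the blank template."""
    return fill_ratio(completed, box) - fill_ratio(blank, box)

def classify(ratio, blank_max=0.15, filled_min=0.50):
    """Sort a bubble into marked, blank, or needing a human look."""
    if ratio >= filled_min:
        return "marked"
    if ratio <= blank_max:
        return "blank"
    return "review"

# Tiny made-up page: two 4x4 "bubbles" side by side on white paper.
blank = [[255] * 8 for _ in range(4)]
completed = [row[:] for row in blank]
for y in range(4):
    for x in range(4):
        completed[y][x] = 0      # first bubble fully filled in
for y in range(2):
    for x in range(4, 6):
        completed[y][x] = 0      # second bubble only a quarter filled

print(classify(extra_ink(completed, blank, (0, 0, 4, 4))))  # marked
print(classify(extra_ink(completed, blank, (4, 0, 4, 4))))  # review
```

The "review" bucket is the borderline pile: the wider you set the gap between the two thresholds, the fewer wrong guesses the software makes and the more marks a person has to look at.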

Most surveys are scanned with the less expensive OMR software because of the limitations of “character recognition.” At this time, software can only automatically recognize machine printed text, such as the pages in a book, or block printed handwriting spaced out in one character per box as you’ve seen on forms. Both of these methods can have significant error rates, so they require auditing and/or interpretation of characters the software identifies as questionable. And software which can recognize block printed script is still very expensive—sometimes costing ten times more than OMR software.

The software which comes bundled with your scanner is unlikely to process your surveys well (though the bar is constantly being raised). When shopping for survey processing OMR software, this is the basic feature set:

  • Reads forms designed by a range of layout programs, or at the least, the applications you care about
  • Recognizes marks from pen as well as pencil
  • Flags dubious marks for correction
  • Mechanism for data entry of written answers
  • Error correction is done in a batch, not holding up the scanning process
  • Reads multiple page surveys
  • Handles double-sided surveys
  • Variety of industry-standard data file formats

Additional niceties include:

  • Automated form training from your survey software, or streamlined data transfer to your survey software or statistics program
  • Captures text and numeric barcodes
  • Archive of entire form image
  • Recognition of block printing (again with audit or correction mechanism)
  • Recognizes the survey version as it scans, letting you mix surveys in a scanning batch
  • Routing of form validation/data entry to a set of data entry clerks (very high volume software)

There are a number of applications which may suit your needs, but one I've worked with for many years, and found to be a solid fit and good value for surveyors, is Remark Office OMR.

