How to configure Symantec OCR

Adding an OCR profile
  1. Go to System > Settings > OCR Engine Configuration.

  2. Click Add OCR Engine Configuration.

Configuring the OCR Engine
  1. Enter the Name of the profile.

  2. Enter an optional Description of the profile.

  3. Enter the OCR server hostname of the server where the OCR requests should be sent. It can be a single load balancer or an individual OCR Server.

  4. Enter the Port number where requests should be sent. The default port is 8555.

  5. Enter the OCR Engine timeout (seconds) value. This setting defines how long an OCR request can run before it times out. The default timeout is 30 seconds.

    The timeout is how much time the request is allowed to spend inside the OCR Server, and does not include transit time or other delays.

    Set this timeout in coordination with the other content timeout settings in the Advanced Settings. As with other content extraction operations, if the timeout is reached, the OCR component is skipped and the previously extracted content moves on to detection.

  6. Enter a value for Accuracy vs speed. By default, the OCR Server sets the value dynamically for each document. A Sensitive Image Recognition pre-classifier on the detection server inspects each image, determines whether it is suitable for OCR content extraction (and form recognition), and then determines which preset is most appropriate. If you clear the check box for this default behavior, you can select a single preset to use for all images: Accurate, Balanced, or Fast. A fixed preset can be appropriate for Discover scans, where accuracy is prioritized over time.

  7. In the Supported Languages section, select the candidate languages for OCR.

    You can select one or more languages; the OCR Server then selects a language from that pool to use for each image. Symantec assumes that documents are primarily in one language (for example, all French or all English, as opposed to mixed English and French). Keep the number of languages as small as possible, because each additional language slows processing.

    Even if a language is not selected, you may still get accurate text from that language. For example, you can select English and German and submit a mixed English-French image to the OCR Server. It may choose English and still return some French text. The language selection determines which spell-check dictionary is used. It also determines the pool of characters to choose from when a character in the image is unclear.

  8. In the Languages and Dictionaries > Specialized Dictionaries section, enable supplemental spell checking for different industries (legal, financial, medical) across different languages.

  9. In the Languages and Dictionaries > Custom Dictionary section, specify the name of your custom dictionary file to aid recognition accuracy. For example, if certain proper nouns give the OCR Server difficulty, you can place them in this custom dictionary.

    Using dictionaries and spell checking improves recognition results for low-quality scans and images (such as faxes). If the characters are crisp and clean, the engine has less uncertainty about what they might be, and the dictionaries are less useful.

  10. The custom dictionary is a text file, with one entry per line. This text file must be placed in the dictionary directory of each server at c:SymantecDLPOCRProtectbin.
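The custom dictionary format described above (a text file with one entry per line) can be produced with a short script. The following is a minimal sketch, not a Symantec tool: the function names, the sample terms, and the UTF-8 encoding are assumptions made for illustration. Confirm the encoding your OCR Server expects, and copy the resulting file to the dictionary directory of each server yourself.

```python
from pathlib import Path
from typing import Iterable, List


def normalize_terms(terms: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blank entries, and remove duplicates (order kept)."""
    seen = set()
    cleaned = []
    for term in terms:
        entry = term.strip()
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return cleaned


def write_custom_dictionary(terms: Iterable[str], path: str) -> int:
    """Write one entry per line and return the number of entries written."""
    entries = normalize_terms(terms)
    # UTF-8 is an assumption; confirm the encoding your OCR Server expects.
    Path(path).write_text("".join(e + "\n" for e in entries), encoding="utf-8")
    return len(entries)


if __name__ == "__main__":
    # Hypothetical proper nouns that the OCR Server has trouble recognizing.
    count = write_custom_dictionary(["Acme ", "", "Zyxtra", "Acme"], "custom_dictionary.txt")
    print(f"Wrote {count} entries")
```

Removing blank lines and duplicates keeps the file small and makes it easier to compare copies across servers.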

Assigning a profile to a detection server
  1. Go to System > Servers and Detectors > Overview.

  2. Select a monitor.

  3. On the Server/Detector Detail page, click Configure.

  4. On the Configure Server page, click OCR Engine. In OCR Engine Configuration, select the configuration that you want to use for the server.

  5. Click Save.
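After a profile is assigned, it can be useful to confirm that the detection server can open a connection to the OCR server hostname and port entered in the profile. The following is a minimal sketch that uses only the Python standard library. The function name and the hostname `ocr.example.internal` are hypothetical, and the check only confirms that a TCP connection opens. It does not verify that the OCR service is healthy, and its `timeout` argument is a connection timeout, unrelated to the OCR Engine timeout setting.

```python
import socket


def ocr_server_reachable(host: str, port: int = 8555, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical hostname; replace it with the value from your OCR profile.
    host = "ocr.example.internal"
    status = "reachable" if ocr_server_reachable(host) else "not reachable"
    print(f"{host}:8555 is {status}")
```

Run the script from the detection server itself, because that is the machine that sends the OCR requests.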
