PDF OCR Accuracy Guide: DPI, PSM, OEM and Language Tuning

March 1, 2026 8 min read

Get better OCR output quality for invoices, receipts, and scanned forms.

1. DPI matters most

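For low-quality scans, raise dpi; values in the 200–300 range usually help. A higher DPI gives the recognizer more pixels per character, which improves recognition, but the pixel count grows with the square of the DPI, so processing time increases as well.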
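To make the cost side of that trade-off concrete, here is a small arithmetic sketch of how many pixels a page contains at different DPI values. The page size (US Letter, 8.5 x 11 inches) and the function name are assumptions chosen for illustration; they are not part of the API.

```python
# Pixel dimensions of a page rasterized at a given DPI.
# Assumption: US Letter page, 8.5 x 11 inches.
LETTER_INCHES = (8.5, 11.0)


def rendered_pixels(dpi, size_inches=LETTER_INCHES):
    """Return (width_px, height_px) for a page rasterized at `dpi`."""
    width_in, height_in = size_inches
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (150, 200, 240, 300):
    width, height = rendered_pixels(dpi)
    print(f"{dpi} dpi: {width} x {height} px ({width * height / 1e6:.1f} MP)")
```

Going from 200 to 300 DPI multiplies the pixel count by 2.25, so it is worth checking on a few sample pages whether the accuracy gain justifies the extra processing.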

2. Tune PSM to document layout

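Single-block invoices: use a simpler segmentation mode that treats the page as one block of text, so the engine does not split it into spurious regions.
Multi-column reports: use automatic layout analysis so that columns are detected and read in order.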
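The layout-to-mode choice can be sketched as a lookup table. This assumes the psm parameter follows Tesseract's page segmentation mode numbering, which the request example suggests but this guide does not confirm. The layout labels and the choose_psm helper are hypothetical names introduced here.

```python
# Hypothetical layout labels mapped to page segmentation modes.
# Assumption: psm values follow Tesseract's numbering.
PSM_BY_LAYOUT = {
    "auto": 3,           # fully automatic page segmentation (Tesseract default)
    "multi_column": 3,   # let the engine detect columns itself
    "single_column": 4,  # a single column of text of variable sizes
    "single_block": 6,   # a single uniform block of text
    "single_line": 7,    # a single text line
    "sparse": 11,        # sparse text in no particular order
}


def choose_psm(layout):
    """Return a psm value for a layout label, falling back to automatic mode."""
    return PSM_BY_LAYOUT.get(layout, PSM_BY_LAYOUT["auto"])


print(choose_psm("single_block"))
print(choose_psm("multi_column"))
```

Falling back to the automatic mode for unknown layouts keeps the default behavior safe; the mapping is best checked against real samples of each document type.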

3. Set correct language packs

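Set lang to match the language(s) in the document, e.g. eng for English or eng+hin for English plus Hindi. A wrong or missing language pack is a common cause of character substitution errors.

Example request that sets lang, dpi, psm, and oem together: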

curl -X POST https://pdfmunk.com/api/v1/pdf/ocr/parse \
  -H "CLIENT-API-KEY: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/scanned.pdf",
    "pages": "1-3",
    "lang": "eng",
    "dpi": 240,
    "psm": 3,
    "oem": 3
  }'
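The same request can be prepared from Python. The sketch below reuses the endpoint, header names, and parameters from the curl example; the helper names are hypothetical, and the response format is not covered here. It only builds the request and does not send it.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://pdfmunk.com/api/v1/pdf/ocr/parse"


def build_ocr_payload(pdf_url, langs, pages="1-3", dpi=240, psm=3, oem=3):
    """Build the JSON body; `langs` is a list such as ["eng", "hin"]."""
    if not langs:
        raise ValueError("at least one language code is required")
    return {
        "url": pdf_url,
        "pages": pages,
        "lang": "+".join(langs),  # multiple languages are joined with "+"
        "dpi": dpi,
        "psm": psm,
        "oem": oem,
    }


def build_ocr_request(payload, api_key):
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"CLIENT-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


payload = build_ocr_payload("https://example.com/scanned.pdf", ["eng", "hin"])
request = build_ocr_request(payload, "your_api_key_here")
# To send it: urllib.request.urlopen(request)
```

Keeping payload construction separate from sending makes it easy to sweep dpi and psm values over a test set without duplicating request code.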

4. Run post-processing

After OCR, normalize whitespace, repair broken lines, and validate fields that follow known patterns (invoice numbers, totals, dates) so that recognition errors surface early.
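A minimal sketch of such a post-processing pass follows. The field formats are assumptions made for illustration (invoice numbers like INV-00123, ISO dates, totals with two decimals); real documents will need their own patterns. Note that rejoining words split by a hyphen at a line end can also merge genuine hyphenated compounds.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical field formats; adjust to the documents being processed.
INVOICE_RE = re.compile(r"\bINV-\d{4,}\b")
TOTAL_RE = re.compile(r"\b\d[\d,]*\.\d{2}\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def normalize_text(raw):
    """Clean up whitespace and line breaks in raw OCR output."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)          # trim spaces around newlines
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # rejoin words hyphenated across lines
    text = re.sub(r"\n{3,}", "\n\n", text)        # keep at most one blank line
    return text.strip()


def extract_fields(text):
    """Return validated fields; a field is None when no valid match is found."""
    invoice = INVOICE_RE.search(text)
    total = TOTAL_RE.search(text)
    date = None
    for match in DATE_RE.finditer(text):
        try:
            datetime.strptime(match.group(), "%Y-%m-%d")  # reject impossible dates
        except ValueError:
            continue
        date = match.group()
        break
    return {
        "invoice_number": invoice.group() if invoice else None,
        "total": Decimal(total.group().replace(",", "")) if total else None,
        "date": date,
    }


raw = "Invoice  No:  INV-00123 \r\nDate: 2026-03-01\n\n\n\nTotal   due: 1,234.50\nPay-\nment terms: 30 days"
clean = normalize_text(raw)
fields = extract_fields(clean)
print(clean)
print(fields)
```

A field that comes back as None is a useful signal: it usually points to either a recognition error or a pattern that needs widening.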

Conclusion

DPI, language, and segmentation settings together deliver most of the OCR gains. Iterate on a small golden dataset (a handful of documents with hand-verified text) before scaling. Start with the PDF OCR API.
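One way to score a golden dataset is the character error rate (CER): the edit distance between the verified text and the OCR output, divided by the length of the verified text. This metric is a common choice rather than something this guide prescribes, and the function names and sample strings below are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(expected, actual):
    """Edit distance divided by the length of the verified text."""
    if not expected:
        return 0.0 if not actual else 1.0
    return levenshtein(expected, actual) / len(expected)


def score_dataset(samples):
    """Mean CER over a list of (expected_text, ocr_text) pairs."""
    if not samples:
        raise ValueError("golden dataset is empty")
    rates = [character_error_rate(expected, ocr) for expected, ocr in samples]
    return sum(rates) / len(rates)


# Illustrative pairs: verified text vs. OCR output for one settings combination.
golden = [
    ("Total: 100", "Total: 1O0"),  # digit zero misread as the letter O
    ("INV-00123", "INV-00123"),
]
print(score_dataset(golden))
```

Running the same golden set through each combination of dpi, psm, and lang and comparing the mean CER gives a simple basis for choosing settings before scaling.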