March 1, 2026 8 min read
PDF OCR Accuracy Guide: DPI, PSM, OEM and Language Tuning
Get higher-quality OCR output from invoices, receipts, and scanned forms.
1. DPI matters most
For low-quality scans, raise the dpi parameter; values in the 200–300 range usually help. A higher DPI gives the recognizer more pixels per character, which improves recognition but increases processing time.
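To make the cost side of that trade-off concrete, here is a minimal sketch that computes how many pixels a page contains at a given DPI. The helper name page_pixels and the US Letter page size are assumptions chosen for illustration; they are not part of the API.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int, int]:
    """Return (width_px, height_px, total_px) for a page rasterized at the given DPI."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px


# US Letter page (8.5 x 11 inches) at two resolutions
low = page_pixels(8.5, 11, 200)
high = page_pixels(8.5, 11, 300)

print(low)                # (1700, 2200, 3740000)
print(high)               # (2550, 3300, 8415000)
print(high[2] / low[2])   # 2.25
```

Pixel count grows with the square of the DPI, so going from 200 to 300 DPI means 2.25 times as many pixels to process per page.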
2. Tune PSM to document layout
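- Single-block invoices: use a mode that treats the page as one uniform block of text, which lowers segmentation complexity.
- Multi-column reports: use a mode with automatic page segmentation so that columns are detected and read in order.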
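The layout-to-mode choice above could be captured in a small lookup. This sketch assumes the psm parameter follows Tesseract's page segmentation mode numbering, which this guide does not confirm; the layout labels and the choose_psm helper are hypothetical names used only for illustration.

```python
# Assumption: psm values follow Tesseract's page segmentation mode numbering.
# The layout labels on the left are hypothetical, not API values.
PSM_BY_LAYOUT = {
    "single_block": 6,    # Tesseract: a single uniform block of text
    "single_column": 4,   # Tesseract: a single column of text of variable sizes
    "multi_column": 3,    # Tesseract: fully automatic page segmentation, no OSD
    "sparse": 11,         # Tesseract: sparse text, in no particular order
}
DEFAULT_PSM = 3  # fall back to automatic segmentation for unknown layouts


def choose_psm(layout: str) -> int:
    """Map a coarse layout label to a psm value, defaulting to automatic."""
    return PSM_BY_LAYOUT.get(layout, DEFAULT_PSM)


print(choose_psm("single_block"))   # 6
print(choose_psm("multi_column"))   # 3
```

Treat the mapping as a starting point and confirm each choice against your own sample documents.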
3. Set correct language packs
Set lang to match your document's language(s), e.g. eng or eng+hin, to reduce character substitution errors.
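If the languages come from a list in your own code, the lang string can be assembled with a small helper. This is a sketch only: build_lang is a hypothetical name, and it assumes the API accepts codes joined with "+" as in the eng+hin example.

```python
def build_lang(codes: list[str]) -> str:
    """Join language codes with '+', dropping blanks and duplicates while keeping order."""
    seen: list[str] = []
    for code in codes:
        code = code.strip()
        if code and code not in seen:
            seen.append(code)
    if not seen:
        raise ValueError("at least one language code is required")
    return "+".join(seen)


print(build_lang(["eng"]))                 # eng
print(build_lang(["eng", "hin", "eng"]))   # eng+hin
```

Keeping the list short matters: only include languages that actually appear in the document.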
curl -X POST https://pdfmunk.com/api/v1/pdf/ocr/parse \
-H "CLIENT-API-KEY: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/scanned.pdf",
"pages": "1-3",
"lang": "eng",
"dpi": 240,
"psm": 3,
"oem": 3
}'
4. Run post-processing
Normalize whitespace, fix line breaks, and validate known patterns (invoice numbers, totals, dates).
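A minimal sketch of those three steps using Python's standard re module follows. The invoice number, date, and amount patterns are assumptions made up for illustration (an INV- prefix, ISO dates, amounts with two decimals); replace them with the formats your documents actually use.

```python
import re

# Hypothetical patterns; adjust them to your own document formats.
INVOICE_NO = re.compile(r"\bINV-\d{4,}\b")
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT = re.compile(r"\b\d{1,3}(?:,\d{3})*\.\d{2}\b")


def normalize(text: str) -> str:
    """Clean up whitespace and line breaks in raw OCR text."""
    # Rejoin words hyphenated across a line break: "in-\nvoice" -> "invoice"
    text = re.sub(r"([A-Za-z])-\n([a-z])", r"\1\2", text)
    # Collapse runs of spaces and tabs into one space
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces next to newlines, then cap blank lines at one
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def extract_fields(text: str) -> dict[str, list[str]]:
    """Return every match of each known pattern; an empty list flags a missing field."""
    return {
        "invoice_numbers": INVOICE_NO.findall(text),
        "dates": ISO_DATE.findall(text),
        "amounts": AMOUNT.findall(text),
    }


raw = "Invoice  INV-00123 \n\n\n\nDate:\t2026-03-01\nTotal:   1,250.00  "
clean = normalize(raw)
print(extract_fields(clean))
# {'invoice_numbers': ['INV-00123'], 'dates': ['2026-03-01'], 'amounts': ['1,250.00']}
```

An empty list for a field you expect on every page is a useful signal to route that document to manual review.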
Conclusion
DPI, language, and segmentation strategy together deliver most OCR gains. Iterate on a small golden dataset (a handful of documents with hand-verified text) before scaling. To get started, see the PDF OCR API.
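One way to score a golden dataset is character error rate (CER): the edit distance between the expected and recognized text, divided by the length of the expected text. The sketch below is a plain-Python implementation under that definition; the sample pairs are made-up strings for illustration, not measured results.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(expected: str, actual: str) -> float:
    """Character error rate relative to the expected (hand-verified) text."""
    if not expected:
        return 0.0 if not actual else 1.0
    return levenshtein(expected, actual) / len(expected)


# Made-up (expected, recognized) pairs standing in for a golden dataset
golden = [
    ("INV-00123", "INV-00l23"),              # one substituted character
    ("Total: 1,250.00", "Total: 1,250.00"),  # exact match
]
for expected, actual in golden:
    print(f"{expected!r}: CER = {cer(expected, actual):.3f}")
```

Re-run the same score after each change to dpi, psm, or lang so that every setting is judged against the same documents.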