Embeddable OCR stable

Tesseract.js

JavaScript OCR engine (Tesseract compiled to WebAssembly) running in the browser and Node.js

36.0K stars 100 contributors Since 2016

License
Apache-2.0
Min RAM
128 MB
Min CPUs
1 core
Scaling
single_node
Complexity
beginner
Performance
medium
Self-hostable
K8s native
Offline
Pricing
fully free
Docs quality
good
Vendor lock-in
none

Use cases

  • Client-side OCR in web apps without a server
  • Browser-based document scanning
  • Privacy-preserving text extraction
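
A minimal sketch of the client-side use case above, assuming tesseract.js v5 or later, where `createWorker` accepts a language code and resolves to a ready worker (earlier major versions needed separate load and initialize steps). The `extractText` helper and the `scan.png` file name are hypothetical; `createWorker` is passed in as a parameter so the same helper can be used from Node.js or a browser bundle.

```javascript
// Hypothetical helper: run OCR on one image and return the recognized text.
// Assumes the tesseract.js v5+ worker API (createWorker, recognize, terminate).
async function extractText(image, createWorker, lang = 'eng') {
  // Creating a worker loads the WASM core and the language data.
  const worker = await createWorker(lang);
  try {
    // `image` can be a path or URL in Node.js, or an image/canvas/File in the browser.
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}

// Usage (after `npm install tesseract.js`):
//   const { createWorker } = require('tesseract.js');
//   extractText('scan.png', createWorker).then((text) => console.log(text));
```

Because recognition runs locally in the worker, the image itself is not sent to an OCR service, which is what makes the privacy-preserving use case possible. For many images, keeping one worker alive and calling `recognize` repeatedly avoids paying the startup cost each time.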

Anti-patterns / when NOT to use

  • Slower than native Tesseract
  • Large WASM binary
  • Same accuracy limitations as native Tesseract

Replaces / alternatives to

  • Google Vision API (for browser)

Technical specs

Language
JavaScript
API type
SDK
Protocols
HTTP
Deployment
npm
SDKs
javascript

Community

GitHub stars 36.0K
Contributors 100
Commit frequency weekly
Plugin ecosystem none
Backing Naptha
Funding community

Release

Latest version
Last release
Since 2016

Best fit

Team size
solo, small, medium
Industries
general

Tags

  • ocr
  • javascript
  • browser
  • wasm
  • nodejs
  • 100-languages
  • client-side