tesseract-ocr 4.0.0

tesseract-ocr is an OCR engine originally developed by Hewlett Packard and now sponsored by Google. It is highly accurate and will read a binary, gray, or color image and output text.
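
As a rough illustration of the image-in, text-out workflow described above, the sketch below drives the `tesseract` command-line tool from Python. It assumes the `tesseract` executable is installed and on the PATH, and it uses the `--psm` spelling that the 3.05.00 release notes below introduce. The helper names `build_tesseract_command` and `ocr_image` and the file name `page.png` are hypothetical and are not part of tesseract itself.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image_path: str, output_base: str = "stdout",
                            lang: str = "eng", psm: int = 3) -> List[str]:
    """Build the argument list for one tesseract CLI run (hypothetical helper).

    Passing "stdout" as the output base asks tesseract to print the
    recognized text instead of writing an output file.
    """
    return ["tesseract", image_path, output_base,
            "-l", lang, "--psm", str(psm)]


def ocr_image(image_path: str, lang: str = "eng", psm: int = 3) -> str:
    """Run tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, "stdout", lang, psm)
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    # Only show the command here; call ocr_image("page.png") to run it.
    print(" ".join(build_tesseract_command("page.png")))
```

Building the argument list separately from running it keeps the command easy to inspect, and it means versions older than 3.05.00 could be handled by swapping `--psm` for the earlier `-psm` spelling.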

Tags c++ c ocr library cli
License Apache
State stable

Recent Releases

4.0.0 30 Oct 2018 21:45 minor feature: Add deconfiguration for LSTM. add lstmdeconfig to distribution and installation process. 4.0.0 Release.
4.0.0-beta.4 31 Jul 2018 15:25 minor feature: CID 1393540 (Explicit null dereferenced). CID 1393244 and CID 1393244 (Uninitialized scalar variable). CID 1393243 (Uninitialized scalar field). CID 1393239 (Dereference null return value). CID 1393238 (Dereference null return value). CID 1393241 (Dereference null return value). Replace ASSERT_HOST in genericvector.h. Remove errcode.h from public API. Remove public API file ndminx.h. Clean usage of assert.h. Replace string.h by standard C++ cstring. Remove LSTM header files from public API. Remove arch header files from public API. Remove unneeded include statements for scanutils.h. Remove recursive header. Clean some include statements. Remove memry.h from public API. Remove empty tessbox.h. Clean more include files and include statements. coutln: Replace alloc_mem, free_mem by standard functions. adaptions: Remove unneeded include statement. qspline: Remove unneeded include statement. strngs: Replace alloc_mem, free_mem by standard functions. gap_map: Replace alloc_mem, free_mem by C++ new, delete. pitsync1: Remove unneeded include statement. qspline: Replace alloc_mem, free_mem by C++ new, delete. makerow: Replace alloc_mem, free_mem by C++ new, delete, std::vector. oldbasel: Replace alloc_mem, free_mem by C++ new, delete, std::vector. pithsync: Replace alloc_mem, free_mem by C++ std::vector. tordmain: Replace alloc_mem, free_mem by C++ std::vector. Remove memry.cpp, memry.h. Remove stderr.h and its include statements. dotproductsse: include statements. Update VERSION. CID 1386094 (Unread field). CID 1386098 (Dubious method used). CID 1386104 (Dereference null return value). CID 1386083 (Dereference null return value). CID 1164746 (Big parameter passed by value). CID 1157757 (Logically dead code). CID 1158180 (Argument cannot be negative) and clean code a bit. CID 1242849 (U
4.0.0-beta.2 20 Jun 2018 03:17 minor feature: Download the leptonica source from github. Add new line to a few error messages. filenames in comments. from pull of cleanups: clang tidied, reviewed, new,. Added script-specific validation and normalization for virama-using s. build broken by previous commits that added use of string in lo. Deleted some dead LSTM code, making everything use the recoder. Removed changes from last commit that didn't belong. Move LSTM unicharset and recoder to traineddata with version string p. type of bit values. wrong data type in argument for sscanf. Remove extra semicolons. windows build. regression of. PangoFontInfo: Remove unused method is_fraktur. PangoFontInfo: Remove unused method is_monospace. PangoFontInfo: Remove unused method is_smallcaps. PangoFontInfo: Remove unused method is_bold. PangoFontInfo: Remove unused method is_italic. Use lept_free to free memory allocated by Leptonica. regression of again!. BestPix to always return the highest resolution available, even. Removed unnecessary using statements and cleaned up google/non-google. Important to RTL languages saves last space on each line, which w. clang tidy on previous pull. Add googletest submodule. cmake: Add googletest. googletest: Add dummy test. Changed the way unicharsets are handled to allow support for the ch. Rewrote the recoder to use an encoding based on wubi instead of radic. Define std::max under VS2017 x64. Part 2 of separating out the unicharset from the LSTM model, ing c. Added ADAM optimizer, unless git screwed it up, cos there is no diff. Removed errors introduced by git merge. Added AVX2 and AVX512 detector. Added convert to int and directory listing to combine_tessdata.
4.0.0-beta.1 11 Mar 2018 19:05 minor feature: Remove unused method TessdataManager::OverwriteEntry. Remove unused method TessdataManager::LoadFileLater. crash if output file could not be opened. : cleanup. : inside main() use return rather than exit. Improve robustness of TessdataManager. automake: Enable all warnings and a warning. genericvector: Add overloaded LoadDataFromFile. Remove unneeded null pointer check. Replace Standard C library header files by C++ header files. Remove obsolete comments and unused code from ccutil/host.h. EquationDetect: Remove unneeded new / delete operations. and improve Dockerfile. opencl: Remove more unused code. README: Add Coverity badge. Update README.md. Reduce number of new / delete operations for class KDTreeSearch. Reduce number of new / delete operations for class LanguageModel. UNICHARSET: Add missing initialization. Optimize LSTM code for builds without OpenMP. use correct name for Mac OS X, correct link to training wiki;. Update documentation for installation. Reorganize Readme.md. Update Template. Add link to ` the guidelines for this repository`. Add link to guidelines for this repository. Add badges for Doxygen and Wiki documentation. typo. Update readme for 3.05.01. StringRenderer::pen_color_: int 3 - double 3. Change Mac OS X - macOS. PangoFontInfo: Remove unused method is_fraktur. Remove strcasestr which is no longer needed. PangoFontInfo: Remove unused method is_monospace. PangoFontInfo: Remove unused method is_smallcaps. PangoFontInfo: Remove unused method is_bold. PangoFontInfo: Remove unused method is_italic. Make less verbose. opencl: Remove unused code. opencl: some compiler warnings. LSTMTrainer: Catch empty vectors. Update from Leptonica 1.74.1 to 1.74.2. Travis CI for Leptonica 1.74.2. Remove local implementation of
3.05.01 02 Jun 2017 06:39 major bugfix: Bugfix release for stable tesseract version
3.05.00 17 Feb 2017 11:05 minor feature: Made some fine tuning to the hOCR output. Added TSV as another optional output format. Fixed ABI break introduced in 3.04.00 with the AnalyseLayout() method. text2image tool - Enable all OpenType ligatures available in a font. This feature requires Pango 1.38 or newer. Training tools - Replaced asserts with tprintf() and exit(1). Fixed Cygwin compatibility. Improved multipage tiff processing. Improved the embedded pdf font (pdf.ttf). Enable selection of OCR engine mode from command line. Changed tesseract command line parameter '-psm' to '--psm'. Added new C API for orientation and script detection, removed the old one. Increased minimum autoconf version to 2.59. Removed dead code. Fixed many compiler warnings. Fixed memory and resource leaks. Fixed some issues with the 'Cube' OCR engine. Fixed some openCL issues. Added option to build Tesseract with CMake build system. Implemented CPPAN support for easy Windows building.
4.00.00alpha 16 Dec 2016 09:05 minor feature: Remove unneeded definition for NULL. Use different font list and exposures for "lat" language training. Add info for progress monitor, make it visible in doxygen doc; remove?. Add Junicode to neo-Latin fonts. Update ci scripts. Test release build on windows. Update appveyor.yml. Update appveyor.yml. Update appveyor.yml. Training should work now. Update .travis.yml. Update appveyor.yml. Update CMakeLists.txt. Update .travis.yml. Merge branch 'master' of github.com:tesseract-ocr/tesseract. Update CMakeLists.txt. Update leptonica version. Update .travis.yml. Update appveyor.yml. Merge branch 'master' of github.com-egorpugin:egorpugin/tesseract. Update CMakeLists.txt. Improve leptonica search. Make box training work. Compatibility with Leptonica 1.73. Add more include directories. Merge branch 'master' of github.com:tesseract-ocr/tesseract. Update README.md. Update README.md. Update README.md. Replace pdf.ttf with sharp2.ttf, keep name the same. Document hocr_font_info in config. INCOMPATIBLE to hOCR line height information -. varsize array for Microsoft compiler. Only generate dir for HOCR when needed -. Emit fewer "lang" attributes. Add LTR mixed direction test files. Update README.md. compiler warning (signed / unsigned mismatch). Adds char GetHOCRTSVText(int) as placeholder. Copy of char GetHOCRT?. Adds TessHOcrTsvRenderer class for rendering HOCR info in tsv format. Calls TessHOcrTsvRenderer if tessedit_create_hocrtsv is true. Adds hocrtsv file to configs folder. Adds hocrtsv to tessdata/configs/Makefile.am. Adds BoolParam tessedit_create_hocrtsv in class Tesseract. Render output in TSV format. Avoids HTML escaping. Cleanup TSV renderer. hocrtsv references in Makefile. Add inactivity timeout for icu download on windows. move new delete histogramAllChannels inside the #ifdef USE_OPENCL; fi?. Update INSTALL.GIT.md. improve tesseract.pc.in -. solve segfault for box.train;. update Release Notes. Don't display tesseract's banner when quiet
3.04.01 17 Feb 2016 10:45 minor feature: Add check for opencl requirements. Rework opencl requirements (configure: error: conditional "AMDEP"?. Typo. GRAPHICS_DISABLED build. Strcasestr needed on Cygwin too. Libicui18n is only called libicuin on mingw, not cygwin. Implement build without cube (-DNO_CUBE_BUILD). Tessedit_create_txt 0 blocks box training. Memory leak based on (https://code.google.com/p/tesse?. Remove empty header file secname.h. Replace CubeUtils::UTF8ToUTF32 in pdfrenderer. Enable pdfrender with NO_CUBE_BUILD. NO_CUBE_BUILD with reverting to ANDROID_BUILD in baseapi. Improve NO_CUBE_BUILD. in UTF-16BE conversion. Remove extraneous line feed. VC14 compiler. Enable OpenMP support. Turn off optimisation in Microsoft Visual Studio for TextlineProjecti?. Rename README to README.md -. Remove info about VS 2008. to compile tesseract on mac with clang. For OpenCL reported on Apple Mac. Still get -54 on Apple?. VS2010 build. OpenCL build on Mac. Configure.ac for OS X and -framework. Missing "allheaders.h" when compiling with --enable-opencl on OS X. Various clang compilation errors. Get OpenCL to compile on OS X. Configure.ac unconditionally enabling OpenCL. Add ULL to constants which overflow 32 bits. Simplify build and run of ScrollView. Tesstrain.sh: Only fall back to default Latin fonts if none were prov?. Tesstrain.sh: Only set FONTS if they weren't set on the command line. Tesstrain.sh: Initialise fontconfig even if Arial isn't available. Remove --bin_dir option from tesstrain.sh (should use PATH instead). Add --exposures option to tesstrain.sh. Use mktemp to create workspace directory. COPYING: typo found by codespell. Api: typos in comments (all found by codespell). Ccmain: typos in comments and strings. Typo. Ccstruct: typos in comments and strings. Ccutil: typos in comments and strings. Classify: typos in comments and strings. Cube: typos in comments. Cutil: typos in comments. Dict: typos in comments and strings. Doxyfile: typo in comment. Java: typos in comments and strings. Wordrec: ty
3.04.00 20 Aug 2015 08:26 minor feature: