Discussion:
Tesseract 3.04 - Cygwin64 - Windows 8.1 - Can't open makebox
Schmitz, Marco
2015-09-21 09:03:38 UTC
I am using Windows 8.1 and Cygwin64 in order to run Tesseract 3.04.

Running the following command:

tesseract arbeitsunfaehigkeit.hausarzt.exp0.jpg arbeitsunfaehigkeit batch.nochop makebox

results in the following output:

Tesseract Open Source OCR Engine v3.04.00 with Leptonica
read_params_file: Can't open makebox

And this is the output after I fixed that:

Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Error opening data file C:\DEV\tesseract\Tesseract-OCR\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

I am using the following line in .bash_profile:

export TESSDATA_PREFIX="/cygdrive/c/DEV/cygwin64/usr/share/tessdata/"

It seems to be a Cygwin64 issue, because it works fine from the Windows command line (after adding C:\DEV\cygwin64\bin to PATH and setting TESSDATA_PREFIX to C:\DEV\cygwin64\usr\share\tessdata).

--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Marco Atzeri
2015-09-21 14:14:40 UTC
Post by Schmitz, Marco
I am using Windows 8.1 and Cygwin64 in order to run Tesseract 3.04.
tesseract arbeitsunfaehigkeit.hausarzt.exp0.jpg arbeitsunfaehigkeit batch.nochop makebox
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
read_params_file: Can't open makebox
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Error opening data file C:\DEV\tesseract\Tesseract-OCR\tessdata/eng.traineddata
Are you defining TESSDATA_PREFIX? Why?
Post by Schmitz, Marco
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
export TESSDATA_PREFIX="/cygdrive/c/DEV/cygwin64/usr/share/tessdata/"
The default should be

TESSDATA_PREFIX="/usr/share/tessdata/"

Without defining TESSDATA_PREFIX, I have

$ tesseract.exe --list-langs
List of available languages (4):
deu
deu_frak
eng
osd

and the language files are in :

$ ls /usr/share/tessdata/
configs/              eng.cube.fold    eng.cube.size          osd.traineddata
deu.traineddata       eng.cube.lm      eng.cube.word-freq     pdf.ttf
deu_frak.traineddata  eng.cube.nn      eng.tesseract_cube.nn  tessconfigs/
eng.cube.bigrams      eng.cube.params  eng.traineddata        training/
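
The error message quoted earlier asks for TESSDATA_PREFIX to be the parent of the "tessdata" directory, while the values used in this thread point at the tessdata directory itself. A minimal sketch for checking which of the two layouts actually contains a given language file might look like this. Note that locate_traineddata is a hypothetical helper written for this illustration, not a Tesseract or Cygwin command, and it only tests for the file on disk; it does not reproduce Tesseract's own lookup logic.

```shell
# locate_traineddata PREFIX LANG
# Hypothetical helper: print the first existing LANG.traineddata found
# either in PREFIX/tessdata/ or directly in PREFIX/.
locate_traineddata() {
    prefix=${1%/}    # drop one trailing slash, if present
    lang=$2
    for candidate in "$prefix/tessdata/$lang.traineddata" \
                     "$prefix/$lang.traineddata"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1    # not found in either layout
}
```

For example, `locate_traineddata "$TESSDATA_PREFIX" eng` would show whether eng.traineddata is reachable from the current value, and in which of the two positions.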


Regards
Marco




Schmitz, Marco
2015-09-22 11:23:11 UTC
Hi Marco,

Without setting TESSDATA_PREFIX (neither in the Windows environment variables nor in .bash_profile), I get:

$ tesseract --list-langs
Error opening data file C:\DEV\tesseract\Tesseract-OCR\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.


That was my first problem, which I solved by defining TESSDATA_PREFIX in the Windows environment. Now I get:

$ tesseract --list-langs
List of available languages (13):
arbeitsunfaehigkeit
deu
deu_frak
eng
fra
ita
ita_old
nld
osd
por
spa
spa_old
vie


Then I try this:

$ tesseract arbeitsunfaehigkeit.hausarzt.exp0.jpg arbeitsunfaehigkeit batch.nochop makebox
Tesseract Open Source OCR Engine v3.04.00 with Leptonica


That works, but I originally reported this issue because I was calling the command from a shell script. This is my box.sh:

#!/usr/bin/env bash
tesseract arbeitsunfaehigkeit.hausarzt.exp0.jpg arbeitsunfaehigkeit batch.nochop makebox

and calling it brings up the original error:

$ ./box.sh
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
read_params_file: Can't open makebox


Best regards,
Marco


Schmitz, Marco
2015-09-22 12:18:49 UTC
Okay, my shell script problem "not finding makebox" was a line ending problem (CR+LF).

But what about TESSDATA_PREFIX?
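
On the line-ending problem: a sketch of how it can be detected and fixed, assuming the script was saved with Windows (CR+LF) line endings. With such endings bash treats the trailing carriage return as part of the last word on the line, so the last argument is most likely passed as "makebox" followed by an invisible CR, which would explain "Can't open makebox". The helper names has_cr and strip_cr are made up for this illustration.

```shell
# has_cr FILE
# Hypothetical helper: succeed if FILE contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_cr INPUT OUTPUT
# Hypothetical helper: write a copy of INPUT to OUTPUT with all CRs removed.
# INPUT and OUTPUT must be different files.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, `has_cr box.sh && strip_cr box.sh box.unix.sh` writes a cleaned copy of the script. The dos2unix tool, if it is installed, does the same conversion in place.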

Marco Atzeri
2015-09-22 16:15:24 UTC
Post by Schmitz, Marco
Okay, my shell script problem "not finding makebox" was a line ending problem (CR+LF).
But how about TESSDATA_PREFIX ?
-----Original Message-----
Sent: Tuesday, 22 September 2015 13:23
Subject: Re: Tesseract 3.04 - Cygwin64 - Windows 8.1 - Can't open makebox
Hi Marco,
$ tesseract --list-langs
Error opening data file C:\DEV\tesseract\Tesseract-OCR\tessdata/eng.traineddata
This file path is surely not coming from the Cygwin tesseract default.
Do you have another version somewhere in your PATH?

$ which tesseract
/usr/bin/tesseract

$ tesseract --version
tesseract 3.04.00
leptonica-1.72
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.3.1) : libpng 1.6.17 :
libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.3
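
To see whether more than one tesseract executable is reachable, bash's builtin `type -a tesseract` lists every match in PATH order. The same check as a small portable sketch, where find_on_path is a hypothetical helper name used only for this illustration:

```shell
# find_on_path NAME
# Hypothetical helper: print every executable file called NAME that is
# found in the directories of $PATH, in search order.
# The body runs in a subshell, so the IFS and "set -f" changes do not leak.
find_on_path() (
    name=$1
    IFS=:
    set -f    # disable globbing while $PATH is split on ':'
    for dir in $PATH; do
        candidate="${dir:-.}/$name"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    exit 0
)
```

If `find_on_path tesseract` prints more than one line, the first one listed is the one the shell runs.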


As described at https://www.cygwin.com/problems.html, please:

Run cygcheck -s -v -r > cygcheck.out and include that file as an
attachment in your report. Please do not compress or otherwise encode
the output. Just attach it as a straight text file so that it can be
easily viewed.

Regards
Marco

--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Schmitz, Marco
2015-10-05 09:09:27 UTC
Hello Marco,

Here are the two requested attachments.

Without setting the Windows environment variable TESSDATA_PREFIX to C:\DEV\cygwin64\usr\share\tessdata I get:

$ tesseract --list-langs
Error opening data file C:\DEV\tesseract\Tesseract-OCR\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

This I put into cygcheck-without-windows-environment-variable-set.out.

----------------------------------------------------------------------------------------------------------------------------------------------------

Then I set TESSDATA_PREFIX to C:\DEV\cygwin64\usr\share\tessdata and restart cygwin. I get:

$ tesseract --list-langs
List of available languages (13):
arbeitsunfaehigkeit
deu
deu_frak
eng
fra
ita
ita_old
nld
osd
por
spa
spa_old
vie

This I put into cygcheck-with-windows-environment-variable-set.out.


Greetings,
Marco Schmitz







Marco Atzeri
2015-10-06 04:43:49 UTC
Post by Schmitz, Marco
Hello Marco,
Here are the two requested attachments.
$ tesseract --list-langs
Error opening data file C:\DEV\tesseract\Tesseract-OCR\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
This I put into cygcheck-without-windows-environment-variable-set.out.
----------------------------------------------------------------------------------------------------------------------------------------------------
$ tesseract --list-langs
arbeitsunfaehigkeit
deu
deu_frak
eng
fra
ita
ita_old
nld
osd
por
spa
spa_old
vie
This I put into cygcheck-with-windows-environment-variable-set.out.
Greetings,
Marco Schmitz
Please note that TESSDATA_PREFIX is present in both cases.
So in the first case it should be a leftover from a previous Tesseract
(for Windows) installation.


$ grep TESS cygcheck-with*
cygcheck-without-windows-environment-variable-set.out:TESSDATA_PREFIX = 'C:\DEV\tesseract\Tesseract-OCR\'
cygcheck-without-windows-environment-variable-set.out:_TESSDATA_PREFIX = 'C:\DEV\cygwin64\usr\share\tessdata'
cygcheck-with-windows-environment-variable-set.out:TESSDATA_PREFIX = 'C:\DEV\cygwin64\usr\share\tessdata'

I suggest checking your environment variables, both the user-specific and the system-wide ones.
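
A quick way to see what the running shell has actually inherited is to list every exported variable whose name contains TESSDATA. This is only a sketch; list_tess_vars is a hypothetical helper name, and it shows the effective values in the current shell, not whether they come from the user or the system settings.

```shell
# list_tess_vars
# Hypothetical helper: print NAME=VALUE for every exported variable whose
# name contains TESSDATA, sorted in byte order.
list_tess_vars() {
    env | grep '^[A-Za-z0-9_]*TESSDATA[A-Za-z0-9_]*=' | LC_ALL=C sort
}
```

Running `list_tess_vars` inside the Cygwin shell would show a renamed _TESSDATA_PREFIX and any remaining TESSDATA_PREFIX side by side.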



Schmitz, Marco
2015-10-06 08:00:16 UTC
Hi Marco,

You got me. I simply renamed the environment variable TESSDATA_PREFIX to _TESSDATA_PREFIX for that first case instead of deleting it. But a deliberately misnamed variable is the same as an unset one, right?

Greetings,
Marco

Marco Atzeri
2015-10-06 10:01:50 UTC
Post by Schmitz, Marco
Hi Marco,
You got me. I simply renamed the environment variable TESSDATA_PREFIX to _TESSDATA_PREFIX for that first case instead of deleting it. But a deliberately misnamed variable is the same as an unset one, right?
Greetings,
Marco
$ grep TESS cygcheck-with*
cygcheck-without-windows-environment-variable-set.out:TESSDATA_PREFIX = 'C:\DEV\tesseract\Tesseract-OCR\'
cygcheck-without-windows-environment-variable-set.out:_TESSDATA_PREFIX = 'C:\DEV\cygwin64\usr\share\tessdata'
You missed the point.
TESSDATA_PREFIX = 'C:\DEV\tesseract\Tesseract-OCR\'
is listed before your renamed _TESSDATA_PREFIX,
so something else is setting it for you.
Post by Schmitz, Marco
cygcheck-with-windows-environment-variable-set.out:TESSDATA_PREFIX = 'C:\DEV\cygwin64\usr\share\tessdata'
I suggest to look on your environment variables: user or system specific.
Regards
Marco


Brian Inglis
2015-09-22 16:43:30 UTC
Post by Schmitz, Marco
I am using Windows 8.1 and Cygwin64 in order to run Tesseract 3.04.
tesseract arbeitsunfaehigkeit.hausarzt.exp0.jpg arbeitsunfaehigkeit batch.nochop makebox
Looking at the usage and your --list-langs output, try:

$ tesseract arbeitsunfaehigkeit.hausarzt.exp0.jpg batch.nochop -l arbeitsunfaehigkeit makebox

assuming your output is batch.nochop, you have lang data in arbeitsunfaehigkeit,
and your config file is makebox; if your output is makebox and config is
batch.nochop, try:

$ tesseract arbeitsunfaehigkeit.hausarzt.exp0.jpg makebox -l arbeitsunfaehigkeit batch.nochop
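
For orientation: as far as I know, the usage line printed by Tesseract 3.04 has the form `tesseract imagename outputbase [options...] [configfile...]`. The sketch below labels the arguments of a command line under that assumption. describe_args is a hypothetical helper written for this illustration; it only recognises the -l option and does not call tesseract at all.

```shell
# describe_args IMAGE OUTPUTBASE [-l LANG] [CONFIG...]
# Hypothetical helper: print the role each argument would have, assuming
# the "imagename outputbase [options...] [configfile...]" usage.
describe_args() {
    [ "$#" -ge 2 ] || return 1
    printf 'image:      %s\n' "$1"
    printf 'outputbase: %s\n' "$2"
    shift 2
    while [ "$#" -gt 0 ]; do
        case $1 in
            -l)
                [ "$#" -ge 2 ] || return 1    # -l needs a language name
                printf 'language:   %s\n' "$2"
                shift 2
                ;;
            *)
                printf 'config:     %s\n' "$1"
                shift
                ;;
        esac
    done
}
```

Under that reading, `describe_args arbeitsunfaehigkeit.hausarzt.exp0.jpg arbeitsunfaehigkeit batch.nochop makebox` labels arbeitsunfaehigkeit as the output base and both batch.nochop and makebox as config files.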


