From: Digest <deadmail>
To: "OS/2GenAu Digest"<deadmail>
Date: Tue, 5 Aug 2008 00:00:37 +1000
Subject: [os2genau_digest] No. 1688
Reply-To: <deadmail>
X-List-Unsubscribe: www.os2site.com/list/

**************************************************
Monday 04 August 2008
 Number  1688
**************************************************

Subjects for today
 
1  Re:  Tesseract : Alan Duval <amoht at westnet dot com dot au>
2  Re:  Tesseract : Voytek Eymont <voytek at sbt dot net dot au>
3  Re:  Tesseract : Alan Duval <amoht at westnet dot com dot au>

**= Email   1 ==========================**

Date:  Mon, 04 Aug 2008 11:50:35 +1100
From:  Alan Duval <amoht at westnet dot com dot au>
Subject:  Re:  Tesseract

Peter Moylan wrote:
> Alan Duval wrote:
>> Dennis Nolan wrote:
>
>>> A better way is to create a Program object on your desktop. There is
>>> a Program Object template in the Templates folder.
>>> Make your OCR program the Object. From memory, you just need to drag
>>> it to the Object when creating it.
>>> During the creation you need to specify the dropped file as the
>>> input parameter. The Help file that you can access in the program
>>> object explains how to do this.
>>>
>>> If it is set up correctly you only need to drag and drop your tif 
>>> files on the object for it to do its stuff.
>>
>> I can drag and drop my tif files on the program object that I created,
>> and it will process each one and save the result to C:\OCR.
>>
>>> There is a way to get it to open your word processor too, but it's 
>>> been too long for me to clearly remember how I used to do it.
>>
>> That's what I now want, but I can't see how to do it. I can drag and
>> drop the txt file that has been created onto the word processor
>> object and it opens in the word processor, but I would like that to
>> happen without doing this second drag and drop.
>
> I haven't tried this (I don't even have Tesseract), but I think what 
> you need to do is write a simple Rexx script that does three things in 
> sequence:
>   - parse its program argument to decompose the file name, so as to
>     construct the arguments for the next two steps
>   - call Tesseract
>   - call the word processor
>
> For someone who is not familiar with Rexx (I don't know whether you 
> are), the only hard part is the parsing of the file name, and even 
> that is easy once you look up the Rexx manual because Rexx has an 
> explicit PARSE command. The rest is just like writing a batch file.
>
> Suppose this script is called "script.cmd". Then you can create a 
> program object which has the program name specified as "CMD.EXE" 
> (without the quotes), and the parameter string "/C SCRIPT.CMD" (also 
> without the quotes). The working directory should be the directory 
> where script.cmd lives. Alternatively, you can give a full path 
> specification for script.cmd, and set the working directory to be 
> where you want your data files to live. That part is not particularly 
> important, because you can always include CD (i.e. change directory) 
> commands in your script, or use full path names for every file that 
> has to be mentioned.
>
> On further thought, it's possible that the parameter string in the 
> program object should be something like "/C SCRIPT.CMD %1", or 
> something similar, to ensure that the parameter is passed to the 
> script. I can't check that now because I don't have OS/2 at work.
>
Hi Peter,

After much trial and error, I have finally made a REXX command that both
OCRs a tif file and then opens the resulting txt file in StarOffice.
I named the command tes.cmd and wrote it as follows:

/*program for running tesseract*/
SAY 'number'
PULL nbr
parse arg program
program 'C:\ocr\tes  c:\ocr\'nbr'.tif' nbr
parse arg program
program 'D:\office51\soffice C:\'nbr'.txt'
exit

I guess it is poor programming, but it works. At a command prompt I type
"tes" and it asks for the number.
If I have a scanned file in C:\ocr, e.g. 001.tif, I just have to type
001; tesseract converts it to 001.txt in C:\ and then StarOffice opens,
displaying the text file.

Thanks for the help,

Alan
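Peter's suggestion of a script launched from a program object can also be written as a drag-and-drop variant of tes.cmd, so that dropping a tif file on the object runs the OCR and opens the result without typing a number. This is a minimal, untested sketch: it assumes the program object passes the dropped file's full path to the script as its argument (Peter's "/C SCRIPT.CMD %1" idea; the exact substitution code is described in the program object's Help), and it reuses the C:\ocr\tes and D:\office51\soffice paths from Alan's script. The name tesdrop.cmd is hypothetical. Unlike tes.cmd, it writes the .txt file next to the .tif file instead of into the current directory.

```rexx
/* tesdrop.cmd - hypothetical drag-and-drop variant of tes.cmd   */
/* Assumes the full path of the dropped file arrives as the       */
/* argument, e.g.  C:\ocr\001.tif                                 */
parse arg infile
infile = strip(infile)                /* remove surrounding blanks  */
infile = strip(infile, 'B', '"')      /* and quotes, if any         */
if infile = '' then do
  say 'Usage: tesdrop <file.tif>'
  exit 1
end

/* Decompose the file name with the OS/2 REXX FILESPEC built-in */
dir  = filespec('drive', infile) || filespec('path', infile) /* C:\ocr\ */
name = filespec('name', infile)                              /* 001.tif */
dot  = lastpos('.', name)
if dot > 0 then base = left(name, dot - 1)                   /* 001     */
else base = name
outbase = dir || base        /* tesseract appends .txt to this name */

/* Step 1: OCR the image with the same executable tes.cmd uses */
'C:\ocr\tes "'infile'" "'outbase'"'
if rc <> 0 then do
  say 'OCR failed with return code' rc
  exit rc
end

/* Step 2: open the resulting text file in StarOffice */
'D:\office51\soffice "'outbase'.txt"'
exit 0
```

Typed at a prompt, "tesdrop C:\ocr\001.tif" should do the same job as running tes and entering 001, except that 001.txt ends up in C:\ocr rather than C:\.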
----------------------------------------------------------------------------------
 

**= Email   2 ==========================**

Date:  Mon, 4 Aug 2008 21:14:03 +1000 (EST)
From:  "Voytek Eymont" <voytek at sbt dot net dot au>
Subject:  Re:  Tesseract

Alan Duval wrote:
> Voytek Eymont wrote:

>> I just made a quick hack of a script I've used for something (like
>> archiving log files):
>>
>> it looks through one or more predefined directories and OCRs any
>> files of the predefined types (like TIF and FAX)

>> ----
>> /* ocr.cmd  */



> However, I'm afraid it's too complicated for me, as I am not familiar
> with REXX.

you don't need to know REXX to use REXX...
just save the script as 'ocr.cmd' somewhere on your PATH, then
edit the user defines section like this:

/* user defines below */

extlist= 'tif fax' /* list all target extensions to process */
dirlist= '\scanner \scanner\out' /* list all target dirs to process */
logdir= '\logs\' /* target dir for logs, NEEDS trailing '\' */

what type of files do you wish to process?
current default is to process *.tif and *.fax

to ALSO process files with the 'jpg' extension, edit it like this:
extlist= 'tif fax jpg'

where are the source files?
dirlist= '\scanner \scanner\out'

to ALSO process all faxes in \pmfax, add \pmfax to the list, like this:
dirlist= '\scanner \scanner\out \pmfax'


current default is to process all of the above file types found in these
dirs on the current drive:
\scanner
\scanner\out

where do you wish to log what's done?
current default is
\logs\

as well, supply the path to the tesseract executable:

ocrdir= '\ute\tesseract\usr\bin'        /* where the tesseract exe is */
'SET TESSDATA_PREFIX=F:/ute/tesseract/usr/share/' /* tesseract's language data is there */

/* user defines end */

when you run it, it will find ALL defined file types in ALL defined
directories and OCR them

so, scan as many files as you like; when scanning is completed, run the
script, and it will find all files with the defined extensions and OCR
them

I think...

I tried it with some fax files, and it seems to have worked


-- 
Voytek
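Voytek's ocr.cmd itself is not reproduced in this digest, so the following is only a hypothetical sketch of the behaviour he describes: loop over every directory in dirlist and every extension in extlist, OCR each matching file, and log what was done in logdir. It uses SysFileTree from the RexxUtil library that ships with OS/2 REXX. The executable name "tesseract", the log file name ocr.log and the output location (a .txt file next to each source file) are assumptions, not details taken from Voytek's script.

```rexx
/* ocrsketch.cmd - hypothetical sketch, NOT Voytek's ocr.cmd */
call RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
call SysLoadFuncs

/* user defines below (values copied from Voytek's post) */
extlist = 'tif fax'               /* extensions to process        */
dirlist = '\scanner \scanner\out' /* directories to scan          */
logdir  = '\logs\'                /* NEEDS the trailing '\'       */
ocrdir  = '\ute\tesseract\usr\bin'
'SET TESSDATA_PREFIX=F:/ute/tesseract/usr/share/'
/* user defines end */

logfile = logdir || 'ocr.log'     /* assumed log file name        */

do d = 1 to words(dirlist)
  dir = word(dirlist, d)
  do e = 1 to words(extlist)
    ext = word(extlist, e)
    /* 'FO': files only, return fully qualified names only */
    call SysFileTree dir || '\*.' || ext, 'files.', 'FO'
    do f = 1 to files.0
      infile  = files.f
      outbase = left(infile, lastpos('.', infile) - 1) /* drop extension */
      /* assumed executable name; tesseract appends .txt to outbase */
      cmd = ocrdir || '\tesseract "' || infile || '" "' || outbase || '"'
      address cmd cmd
      call lineout logfile, date() time() infile 'rc='rc
    end
  end
end
call lineout logfile              /* close the log file */
exit 0
```

As written it OCRs every matching file on each run; a script meant for repeated use would also move or rename the processed files, which this sketch does not do.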


----------------------------------------------------------------------------------
 
**= Email   3 ==========================**

Date:  Mon, 04 Aug 2008 21:48:30 +1100
From:  Alan Duval <amoht at westnet dot com dot au>
Subject:  Re:  Tesseract

Hi Voytek.
I've been able to make a REXX program to do what I want, as described in
my other emails today. It's not good programming, but it works well.

Thanks for all your help. No doubt I'll use your info for something.

Regards,

Alan
----------------------------------------------------------------------------------