From: Digest <deadmail>
To: "OS/2GenAu Digest"<deadmail>
Date: Thu, 31 Jul 2008 00:00:39 EST-10EDT,10,1,0,7200,4,1,0,7200,3600
Subject: [os2genau_digest] No. 1684
Reply-To: <deadmail>
X-List-Unsubscribe: www.os2site.com/list/

**************************************************
Wednesday 30 July 2008
 Number  1684
**************************************************

Subjects for today
 
1  Re:  Tesseract : Alan Duval <amoht at westnet dot com dot au>
2  Re:  Tesseract : Alan Duval <amoht at westnet dot com dot au>
3  Re:  Tesseract : Alan Duval <amoht at westnet dot com dot au>
4  Re:  Tesseract : Dennis Nolan <dennis at jeg-og dot com>

**= Email   1 ==========================**

Date:  Wed, 30 Jul 2008 09:56:49 +1100
From:  Alan Duval <amoht at westnet dot com dot au>
Subject:  Re:  Tesseract

Voytek Eymont wrote:
> <quote who="Alan Duval">
>
>   
>> I've been doing a lot of OCR work lately with Tesseract. I scan
>> documents into C:\OCR where I also have Tesseract.
>>     
>
> is Tesseract the 'best OCR' there is ?
>
> being somewhat curious, I've installed T 2.01 (following this email), and,
> tried it as follows:
>
> 0[roman][F:\ute\tesseract\usr\bin]tesseract.exe F:\FAXBANKSIA\FX006309.FAX test -l eng
>
>
> it took about 60 seconds, and the results (small typeface lettering) were
> not so great...
>
> by contrast, OCRing the same file with PMfax took less than half the time,
> 24 secs, with somewhat better results:
>
> PMfax OCR (24 secs)
> ----
> 24 secs
>   
Hi Voytek,

I haven't used OCRing; I didn't know it existed. Does it actually produce 
a text file rather than a scanned copy?
I know that you use PMfax a lot, so do you scan docs into PMfax and then 
use OCRing to convert them to text? I want to scan articles and convert 
them to text formats that I can store or send to friends. That means 
that they have to be converted to *.doc or *.pdf files.
So far I have found that Tesseract does a good job, but it has trouble 
with the ' and - characters. I have found its word recognition to be 
better than that of SimpleOCR, which I have on Windows XP.
Tesseract works with *.tif files. I didn't think it would work with 
*.fax files.

Regards,

Alan
----------------------------------------------------------------------------------
 

**= Email   2 ==========================**

Date:  Wed, 30 Jul 2008 10:05:17 +1100
From:  Alan Duval <amoht at westnet dot com dot au>
Subject:  Re:  Tesseract

Voytek Eymont wrote:
> <quote who="Alan Duval">
>
>   
>> Program field:              C:\OCR\tesseract.exe
>> Parameters:                   image[N1].tif  [N2]
>> Working directory:     C:\OCR
>>
>>
>> Now when I click on the program object a window comes up asking for the
>> value of N1 which I type in and press enter. Then the same happens for N2
>> and the image is processed. With my word processor opened I can then open
>> the file and correct any mistakes.
>>
>> Kris suggests that a REXX program could be written to simplify further
>> the process. As I don't know REXX, I wonder whether someone could help?
>>     
>
>
> so, N1 is, say, '123' and N2 the corresponding text file, like '123.txt';
> so do you just want to process *.tiff files into likewise-named .txt files,
> is this the general idea?
>   
Hi again,

Yes! I may scan an article and it will be saved as, say, "image123.tif" 
in C:\OCR, the folder in which I also have Tesseract installed.
I would then go to that folder via an OS/2 prompt and type:

"Tesseract image123.tif 123"

That produces a file 123.txt, which can be opened in a word 
processor for correction of any errors and for conversion to other formats.
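Kris's idea of a script that does this typing for you can be sketched in a few lines. This is a minimal sketch in Python rather than REXX (nobody in the thread posted a script), assuming scans are always saved as image<N>.tif in C:\OCR and that a program called tesseract can be started from there; `ocr_command` and `run_ocr` are hypothetical names. The point is that one number is enough: it supplies both N1 and N2.

```python
import subprocess

# Hypothetical helper: builds the same command Alan types by hand,
# e.g. "123" -> tesseract image123.tif 123 (Tesseract then writes 123.txt).
# Assumes scans are always saved as image<N>.tif.
def ocr_command(number: str, exe: str = "tesseract") -> list[str]:
    return [exe, f"image{number}.tif", number]

# Hypothetical wrapper: the single value is used for both N1 and N2.
# cwd mirrors the program object's working directory (C:\OCR).
def run_ocr(number: str, workdir: str = r"C:\OCR") -> None:
    subprocess.run(ocr_command(number), cwd=workdir, check=True)
```

Calling `run_ocr("123")` would have the same effect as typing the Tesseract command above at the OS/2 prompt.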

Regards,

Alan
----------------------------------------------------------------------------------
 

**= Email   3 ==========================**

Date:  Wed, 30 Jul 2008 10:29:13 +1100
From:  Alan Duval <amoht at westnet dot com dot au>
Subject:  Re:  Tesseract

Voytek Eymont wrote:
> <quote who="Voytek Eymont">
>   
>
>   
>> <quote who="Alan Duval">
>>
>>
>>     
>>> Program field:              C:\OCR\tesseract.exe
>>> Parameters:                   image[N1].tif  [N2]
>>> Working directory:     C:\OCR
>>>
>>>
>>>
>>> Now when I click on the program object a window comes up asking for the
>>>  value of N1 which I type in and press enter. Then the same happens for
>>> N2
>>> and the image is processed. With my word processor opened I can then
>>> open the file and correct any mistakes.
>>>
>>> Kris suggests that a REXX program could be written to simplify further
>>> the process. As I don't know REXX, I wonder whether someone could help?
>>>       
>> so, N1 is, say, '123' and N2 the corresponding text file, like '123.txt';
>> so do you just want to process *.tiff files into likewise-named .txt
>> files, is this the general idea?
>>     
>
> actually, wouldn't 'runfor' do it for you ?
>
> runfor  Ver 1.9 - Run a command - Mar 31 1998,  W. Kim
>   
Hi again Voytek,

It probably would, but whatever I do it still seems that I would have to 
enter values. I would like to just double-click on a saved 
"image***.tif" file and have it open as text in a word processor, much 
like one can click on an email attachment and have it opened in a 
word processor.
The best solution I have so far is the one above. So if I scan an article 
and it is saved as "image123.tif", I click on my program object 
and a window comes up requesting the value for N1. I type "123", 
and then a second window comes up requesting the value for N2. I 
again type "123", and the resulting "123.txt" file is 
placed in C:\OCR. With a word processor I can then open the file 
and process it further.
One can make a program or command that works for one specified *.tif 
file, but it has to work for any *.tif file, as I may be scanning many 
pages.
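Handling many scans without typing any values can be sketched as a folder sweep: find every image<N>.tif in the scan folder that has no matching <N>.txt yet, and run Tesseract on just those. This is a hedged Python sketch, not a tested OS/2 tool; the folder C:\OCR, the image<N>.tif naming, and the names `pending_scans` and `run_all` are assumptions taken from Alan's description.

```python
import os
import subprocess

def pending_scans(filenames: list[str]) -> list[tuple[str, str]]:
    """Return (image, output base) for each image<N>.tif lacking <N>.txt."""
    existing = {name.lower() for name in filenames}  # OS/2 names ignore case
    jobs = []
    for name in sorted(filenames):
        stem, _, ext = name.rpartition(".")
        if ext.lower() != "tif" or not stem.lower().startswith("image"):
            continue                       # not a scan, e.g. notes.tif or 2.txt
        base = stem[len("image"):]         # "image123" -> "123"
        if base and f"{base}.txt".lower() not in existing:
            jobs.append((name, base))
    return jobs

# Hypothetical folder; runs "tesseract image<N>.tif <N>" for each new scan.
def run_all(folder: str = r"C:\OCR", exe: str = "tesseract") -> None:
    for name, base in pending_scans(os.listdir(folder)):
        subprocess.run([exe, name, base], cwd=folder, check=True)
```

Calling `run_all()` after a scanning session would OCR only the new pages, since pages already processed have a .txt file beside them.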

Regards,

Alan
----------------------------------------------------------------------------------
 

**= Email   4 ==========================**

Date:  Wed, 30 Jul 2008 16:28:45 +1000
From:  Dennis Nolan <dennis at jeg-og dot com>
Subject:  Re:  Tesseract

Alan

It could be done by associating tif files with your OCR program. 
Unfortunately, this will associate all tif files with the OCR 
program.

A better way is to create a Program object on your desktop. 
There is a Program object template in the Templates folder.
Make your OCR program the program that the object runs. From memory, 
you just need to drag it onto the object when creating it.
During the creation you need to specify the dropped file as the 
input parameter. The Help file that you can access in the 
program object explains how to do this.

If it is set up correctly, you only need to drag and drop your 
tif files on the object for it to do its stuff.
There is a way to get it to open your word processor too, but 
it's been too long for me to remember clearly how I used to do it.
Once it is set up, you can select multiple files and drop them on the 
program object. A window will be created for each dropped file 
and closed when it is finished.
You can also specify which directory to write the output file to.
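The drag-and-drop setup Dennis describes only needs a program that accepts the dropped file names as arguments. Below is a minimal Python sketch of such a wrapper, assuming the program object passes each dropped file's path on the command line; the output folder C:\OCR\text and the names `build_commands` and `main` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

def build_commands(dropped: list[str], out_dir: str,
                   exe: str = "tesseract") -> list[list[str]]:
    """One Tesseract run per dropped image, output named after the image."""
    out = Path(out_dir)
    # "image123.tif" -> output base <out_dir>/image123, giving image123.txt
    return [[exe, name, str(out / Path(name).stem)] for name in dropped]

def main(argv: list[str]) -> None:
    # Hypothetical output folder; change it to wherever the .txt files belong.
    for cmd in build_commands(argv, out_dir=r"C:\OCR\text"):
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the wrapper loops over all of its arguments, it behaves the same whether the object is started once per dropped file or once with several file names.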

Regards
Dennis.


----------------------------------------------------------------------------------