Re: WANTED: Profibuch or ST Internals

Posted: Tue Jun 10, 2008 9:31 am
by ppera
muguk wrote: I concur .. now who's going to translate it (only joking!)


I can translate some parts - mostly those which I have in my older edition of Profibuch (and somehow still in my head). But then someone with better English should correct the translations...

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 12:49 pm
by keili
A short overview for all interested readers:
Book scanned up to page 800 at 400 dpi, greyscale, and saved as JPG with low compression. All pictures together take 465 MB so far, which looks very big, but with this quality OCR should work somewhere between good and very good.

There are two possible ways now. I can use Acrobat Pro for OCR, but someone has to check for errors. In addition, I would create a PDF with reduced quality and size. The other way is for you to download the large pictures and use them for OCR.

So, who wants to help and how do you want to work?

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 4:04 pm
by Mug UK
Upload the JPGs to somewhere - i.e. your FTP or a Rapidshare account (or similar) and I could then download them and OCR them?

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 5:32 pm
by beastie
keili wrote: A short overview for all interested readers:
Book scanned up to page 800 at 400 dpi, greyscale, and saved as JPG with low compression. All pictures together take 465 MB so far, which looks very big, but with this quality OCR should work somewhere between good and very good.

There are two possible ways now. I can use Acrobat Pro for OCR, but someone has to check for errors. In addition, I would create a PDF with reduced quality and size. The other way is for you to download the large pictures and use them for OCR.

So, who wants to help and how do you want to work?


We're ready :-) I'm downloading the first batch right now: Profibuch_Part1.zip. I'll PM you soon on how the work is going.

Thanx,
B.

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 5:36 pm
by CopperCAT
keili wrote: A short overview for all interested readers:
Book scanned up to page 800 at 400 dpi, greyscale, and saved as JPG with low compression. All pictures together take 465 MB so far, which looks very big, but with this quality OCR should work somewhere between good and very good.

There are two possible ways now. I can use Acrobat Pro for OCR, but someone has to check for errors. In addition, I would create a PDF with reduced quality and size. The other way is for you to download the large pictures and use them for OCR.

So, who wants to help and how do you want to work?


I have access to a PC with ABBYY FineReader. I've never used OCR software before, but after my exams are finished, I'm willing to give it a try :)

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 6:26 pm
by keili
Wow, so much help :D . Let's organize it.

beastie works on part 1 now.
Mug, you download part 2 on Saturday and
CopperCAT, part 3 is also up on Saturday.

I'll pm you links.

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 6:44 pm
by Mug UK
Not a problem - am out Saturday most of the day & evening due to a friend's daughter's 21st birthday party so will get the link on Sunday. If you want to swap links and give me the 'Sunday' link and the other person the Saturday link, then this will mean less delay?

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 7:48 pm
by CopperCAT
keili wrote: Wow, so much help :D . Let's organize it.

beastie works on part 1 now.
Mug, you download part 2 on Saturday and
CopperCAT, part 3 is also up on Saturday.

I'll pm you links.


What format should the output text be in? PDF? And what about the page size?
Otherwise, the merged parts could become very inconsistent :)

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jun 19, 2008 9:03 pm
by Mug UK
Stick with PDF all the way .. Adobe Acrobat 8 can stitch multiple PDFs into one file easily enough

Re: WANTED: Profibuch or ST Internals

Posted: Fri Jun 20, 2008 6:08 am
by keili
CopperCAT, use PDF for output. Page size is not a problem: the parts can be merged and, if the sizes differ, printed to PDF again at a uniform size.

Re: WANTED: Profibuch or ST Internals

Posted: Tue Jun 24, 2008 1:02 pm
by keili
Update: I'm done with scanning (uff, finally :D ) and in the meantime Mug and I have played around with Acrobat Professional 8. The first results look good, but there are still some problems to fix. Most of the work has been done in batch mode with different programs, which ran in the background. Now comes the time-consuming part. We'll keep you informed.

Re: WANTED: Profibuch or ST Internals

Posted: Tue Jun 24, 2008 4:40 pm
by Mug UK
It's been a bit of a slog, but after a couple of run-throughs in OCR mode the book does come out really well. It's about 90% accurate and could easily be re-created as a .DOC file for the final editing process, but first we need to sort out other mistakes that are cropping up (words not being recognised by OCR, graphics being interpreted as text, etc.). Still, I'm well impressed by the first few attempts at something that's over 700 pages long in a 10 MB (or so) .PDF file!

Re: WANTED: Profibuch or ST Internals

Posted: Tue Jun 24, 2008 8:07 pm
by CopperCAT
I'm about halfway through my part now :) After a while it gets easy because the wrongly detected parts are always the same. Especially the array brackets seem to confuse FineReader.

Re: WANTED: Profibuch or ST Internals

Posted: Tue Jun 24, 2008 10:52 pm
by Mug UK
prt_block() - always the same error .. the '_' becomes a 'c' and the '()' always turns into an 'O'. Hopefully whatever script keili can come up with for Paint Shop Pro X2 to fine-tune the pictures will be useful to the OCR side of things.
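
Since the error is always the same, it could in principle be undone on the recognised text afterwards. Below is a rough Python sketch of that idea, not something anyone in this thread has actually used. It assumes a hand-made list of function names from the book (only prt_block comes from this thread; the names KNOWN_FUNCTIONS, garbled_form and fix_ocr_identifiers are made up for illustration) and that the OCR always garbles them in exactly the way described above.

```python
import re

# Hypothetical list of identifiers that appear in the book. Only prt_block
# is taken from the thread; further names would have to be added by hand.
KNOWN_FUNCTIONS = ["prt_block"]


def garbled_form(name):
    """Spell a name the way the OCR is reported to: '_' -> 'c', '()' -> 'O'."""
    return name.replace("_", "c") + "O"


def fix_ocr_identifiers(text, names=KNOWN_FUNCTIONS):
    """Replace each garbled identifier with its correct 'name()' spelling."""
    for name in names:
        # Word boundaries keep the pattern from matching inside longer words.
        pattern = r"\b" + re.escape(garbled_form(name)) + r"\b"
        text = re.sub(pattern, lambda match, fixed=name + "()": fixed, text)
    return text


print(fix_ocr_identifiers("Call prtcblockO to print the block."))
```

Working from a fixed list of known names is deliberate: replacing every 'c' or 'O' blindly would damage ordinary words, so only exact garbled spellings are touched.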

Re: WANTED: Profibuch or ST Internals

Posted: Thu Jul 10, 2008 7:29 pm
by CopperCAT
How's this project going, actually? I PMed my part 4 to keili but haven't heard anything since.

Re: WANTED: Profibuch or ST Internals

Posted: Fri Jul 11, 2008 9:17 pm
by Desty
I've got a few ST books I should probably scan... only the Compute! assembly book is easyish (ring-bound) though. There must be some lazier (= easier & better) way :D

Re: WANTED: Profibuch or ST Internals

Posted: Mon Jul 14, 2008 5:55 pm
by Mark_G
Hello,

The Profibuch: http://www.clive.nl/detail/12498/
Price 20,- €
In my opinion that's a reasonable price.
If you scan all the pages and then print them, it costs you more.

Also found on eBay Germany: 4,- € for the Profibuch

Mark