Author Topic: Commodore Book collection reworking project  (Read 2339 times)

Offline RCtech

  • VIC 20 user
  • ****
  • Posts: 190
  • Country: de
  • Reputation: 2
    • View Profile
Commodore Book collection reworking project
« on: August 02, 2011, 05:16 PM »
A short time ago I got a lot of books in PDF format for Commodore computers, as usual all scanned pages. Yesterday I found a very good OCR program (for Intel Macs) which creates text files from scanned PDFs. The quality of the text recognition is rather good (depending on the scan quality), but there are layout problems, especially with multiple columns or unusual fonts.

Because I have so many books, reworking all of them myself would take too long, so I want to ask if some forum users will help me correct the scanned text and rework the layout. I could then create books with images from it using a DTP program. The rework has a lot of advantages: the files are smaller (a 420-page scanned book takes 27 MB, as an RTF text file 724K) and of course they're searchable. So they're ideal for eBook readers and tablet PCs.

This is my list of C128 Books only:

Osborne McGraw-Hill:
 
 C128 Programming Secrets
 Your Commodore 128

SAM's:

The Official Book For The Commodore 128
C128 Assembly Language Programming
Hardware books: C128 flat, 1571, 1581

Tab Books Inc.:

1001 Things to do with your Commodore 128
Advanced C128 Graphics And Sound Programming
C128 Data File Programming

Abacus:

C128 Internals
C128 Tips And Tricks
C128 Peeks And Pokes
C128 1571 Internals

Compute:

First Book Of C128
Second Book Of C128
C128 Machine Language For Beginners
C128 Programmer's Guide
Mapping The Commodore 128

Other:

C128 Programmers Reference Guide (Bantam/Commodore)
How to get the most out of Basic 8 (by Dave Krohne and Roger Silva)
The Black Book of C-128 (Robert H. & Dell Taylor)
The C128 Subroutine Library (Bantam)
Tips & Tricks for Commodore Computers (Windcrest)

If anyone is interested I can upload them. If you have other books, I'm interested too ;-)
« Last Edit: August 02, 2011, 08:30 PM by Naquaada »

Offline maraud

  • VIC 20 user
  • ****
  • Posts: 163
  • Country: 00
  • Reputation: 45
    • View Profile
Re: Commodore Book collection reworking project
« Reply #1 on: August 03, 2011, 05:28 AM »
I can chip in and do one.  Pick a small one - that way I can see just how much effort it takes.
Cheers!  -=Maraud=-
Be sure to "call" maraud.dynalias.com (port 6400)
AABBS 128 12.5c, Rear Admiral Hyperdrive (AKA LtK II)

Offline RCtech

  • VIC 20 user
  • ****
  • Posts: 190
  • Country: de
  • Reputation: 2
    • View Profile
Re: Commodore Book collection reworking project
« Reply #2 on: August 03, 2011, 05:53 AM »
I'm currently converting the books I have and will upload them all to my mediafire account. But you can skip the idea of 'a small one' - the books are from 250 to 750 pages. What the hell can you put in a 1571 Internals book with 516 pages???

I attached some images so you can see what to expect. The pictures are direct screenshots from my PDF viewer and the text editor, without any modification.
« Last Edit: August 03, 2011, 06:01 AM by Naquaada »

Offline RobertB

  • Forum god
  • ********
  • Posts: 3087
  • Country: us
  • Reputation: 451
    • View Profile
    • Fresno Commodore User Group
Re: Commodore Book collection reworking project
« Reply #3 on: August 03, 2011, 09:42 AM »
This is my list of C128 Books only...
     Are you sure these books are not already scanned/archived by David Haynes at http://www.bombjack.org/commodore ?

          Back in California,
          Robert Bernardo
          Fresno Commodore User Group
          http://videocam.net.au/fcug
« Last Edit: August 03, 2011, 09:46 AM by RobertB »

Offline RCtech

  • VIC 20 user
  • ****
  • Posts: 190
  • Country: de
  • Reputation: 2
    • View Profile
Re: Commodore Book collection reworking project
« Reply #4 on: August 03, 2011, 10:45 AM »
Great page, I didn't know it. But it's the same there: all books are scanned. My idea is a rework of the books into real text form. This will make the books much smaller, and they will be searchable. The scan-to-text conversion is done rather well by the OCR software, but the files have to be corrected, laid out with graphics and recreated as text PDFs. Or in an ebook format for ebook readers. Another, much bigger effort would be to translate them from English into other languages...

For people who are scanning books or have scanned material and can't create a PDF from it, there's another interesting format: the Comic Book Archive. These are extremely simple to create: create a folder, put in the pictures and rename them with sequential numbers, e.g. page_001.jpg, page_002.jpg etc. There are batch renamers which can do this. Subfolders are also supported. After this, create an archive from the main folder and rename the suffix depending on the archive type: .cbr (comic book RAR) or .cbz (comic book ZIP). That's all. There are more archive types available, but these are the most common. You can optimize the files: if you have b/w scans, batch convert them to GIF, which is a good lossless format for black and white text. Many image formats are supported (often text document types, too), and you can mix them without problems; only the names have to be in the correct order.
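The steps above (number the pages, pack the folder, use the .cbz suffix) can be sketched in a few lines of Python. This is only a minimal sketch: the function name build_cbz, the list of image suffixes and the page_NNN naming are assumptions for illustration, and it assumes the original file names already sort in page order.

```python
import zipfile
from pathlib import Path

# Image types to pick up from the scan folder (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def build_cbz(scan_dir, cbz_path):
    """Pack the images in scan_dir into a .cbz as page_001.ext, page_002.ext, ..."""
    scans = sorted(
        p for p in Path(scan_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    names = []
    # Scanned images are already compressed, so they are stored as-is.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for number, scan in enumerate(scans, start=1):
            name = f"page_{number:03d}{scan.suffix.lower()}"
            archive.write(scan, arcname=name)  # rename inside the archive only
            names.append(name)
    return names
```

Calling build_cbz("scans", "book.cbz") leaves the original scans untouched; the renaming happens only inside the archive.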

Comic Book Archives can be read with Comic Book Readers, which are available for every current operating system. I had these on my Android tablet, and that was great. Comic Book Archives are sometimes faster than PDFs with scanned graphics.
« Last Edit: August 03, 2011, 11:11 AM by Naquaada »

Offline RobertB

  • Forum god
  • ********
  • Posts: 3087
  • Country: us
  • Reputation: 451
    • View Profile
    • Fresno Commodore User Group
Re: Commodore Book collection reworking project
« Reply #5 on: August 03, 2011, 11:46 AM »
Comic Book Archives can be read with Comic Book Readers, which are available for every current operating system.
     Including GEOS, WiNGs, and Amiga OS 1.2 to 4.1 ?  ;)

          Back in California,
          Robert Bernardo
          Fresno Commodore User Group
          http://videocam.net.au/fcug

Offline Hydrophilic

  • 128D user
  • *******
  • Posts: 1430
  • Reputation: 233
  • Gender: Male
    • View Profile
    • H2Obsesson
Re: Commodore Book collection reworking project
« Reply #6 on: August 03, 2011, 07:27 PM »
That's a great idea you have, converting scanned books into real text.  I'm familiar with RTF and think that it is much better than PDF for most purposes... especially for an 8-bit system.  If there isn't an RTF viewer for the C128, then I could make a simple one. 

The comic book reader format is news to me.  But that just sounds like another version of Acrobat Reader (scanned images).  So I think your original idea is better.

I own a copy of Commodore 128 Programming Secrets, and 1571 Internals, so if you want to make one of those scanned RTFs available, I can take a look and see how much work it would take to make it look right.
I'm kupo for kupo nuts!

Offline RCtech

  • VIC 20 user
  • ****
  • Posts: 190
  • Country: de
  • Reputation: 2
    • View Profile
Re: Commodore Book collection reworking project
« Reply #7 on: August 03, 2011, 09:38 PM »
So, I've finished converting and have the documents, including the original PDFs, here. The text files are in the Mac's .rtf format, which can be opened in Windows using WordPad or any better document editor. It would be best if I could get them back as .rtf; I don't like using the Windows .doc or OpenOffice .odf formats. It's possible, but opening an .rtf takes one second, while an OpenOffice doc takes more than ten times as long.

These are the main goals for pre-preparing the layout:

- Activate page mode (not standard on Macs) and set text alignment to 'justified'
- Remove fragments from converted non-text graphics
- Correct errors created by the conversion, or errors that were already in the manual
- Set the layout according to the original PDF, especially for multi-column parts or program code
- Use tabs for separating, not a bunch of spaces
- Fonts are Helvetica (default in the RTF) for normal text and Courier for fixed-width text (especially for code and tables)
- Important: Remove CRs inside normal text (but keep those that end a paragraph), otherwise neither word wrapping nor justified layout is possible
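
Two of the goals above (removing CRs inside paragraphs, and tabs instead of runs of spaces) are mechanical enough to script. The following is only a rough sketch under some assumptions: it works on plain text exported from the RTF rather than on RTF markup itself, it treats a blank line as the paragraph separator, and the function names and the min_run threshold are made up for illustration. It does not rejoin words hyphenated across lines, and code listings should not be passed through unwrap_paragraphs, since their line breaks are meaningful.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines; blank lines are kept as paragraph breaks."""
    # Normalize CR/LF and bare CR (classic Mac style) to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [
        " ".join(line.strip() for line in paragraph.split("\n") if line.strip())
        for paragraph in paragraphs
    ]
    return "\n\n".join(p for p in joined if p)


def spaces_to_tabs(line, min_run=3):
    """Replace each run of min_run or more spaces with a single tab."""
    return re.sub(r" {%d,}" % min_run, "\t", line)
```

For example, spaces_to_tabs applied to a table row leaves single and double spaces alone and only collapses the longer runs that the OCR used as column gaps.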

I'm not thinking of a complete re-creation of the books with their various 80s-style fonts depending on the publisher; I think a simple, standardized layout for all books would be better. It would also be an idea to recreate the graphics in a modern style, ideally in black and white GIF format.

@ Hydrophilic:

Creating an RTF reader/writer for the C128 isn't a bad idea, although it wouldn't work very well for this. For example, Compute's C128 Programmer's Guide needs 808K, that's tiny for a modern computer, but on a Commodore it wouldn't even fit on a 1581 disk. I don't think it would work on a Commodore; even geoWrite doesn't allow 750 pages.

The comic book archive format is, as its name says, mainly used for comics, which are scanned images. It's basically a kind of batch picture viewer, but with the advantage that all pages are stored in one file. Depending on the reader there are many functions: one-page mode, two-page mode, page reading direction; the one I use also has a magnifier function. So you could even use it for schematics. I've put manuals which I only have as images (CMD HD, JiffyDOS, DPaint III etc.) into a .cbr; it's much easier to use. Of course I use it for its main purpose too, I've got over 22 GB of comics on my computer ;-)

     Including GEOS, WiNGs, and Amiga OS 1.2 to 4.1 ? 
I said for every modern, current (and common) operating system.
« Last Edit: August 03, 2011, 10:29 PM by Naquaada »

Offline Hydrophilic

  • 128D user
  • *******
  • Posts: 1430
  • Reputation: 233
  • Gender: Male
    • View Profile
    • H2Obsesson
Re: Commodore Book collection reworking project
« Reply #8 on: August 04, 2011, 12:43 AM »
Yeah the files would be too big for a 1581, but no problem for mass storage devices.  I would be willing to format the text in a few simple fonts like you suggest and remove any junk that the OCR thought was characters.  Formatting into columns hopefully won't take too much time.  But fixing the author's original mistakes... I don't have time for that!  Anyway if you want to send a link to one of those books I mentioned I'll see what I can do.  Or you could email one to me, since it is less than 1MB.

About DOC files... well MS has annoyed me again with the introduction of their compressed/new DOCX files or whatever they're called.  I'm sure this is so everyone will have to buy the latest Microsoft software.  Same goes for some of their websites that won't work with anything less than IE6 (the rest of the net seems okay with IE5).  Of course it's not just MS.  Adobe likes making new incompatible versions of PDFs.

Anyway, Microsoft Word will open an RTF file if it is named with a DOC extension.  I actually had to do that for a friend.  She made up a job resume in RTF format to send in to a company, but the company's website would only allow you to upload DOC or DOCX files.  She didn't have Microsoft Word, so she couldn't really create a DOC file.  So I just renamed her x.RTF to x.DOC and sent it... just plain silly!
I'm kupo for kupo nuts!

Offline RCtech

  • VIC 20 user
  • ****
  • Posts: 190
  • Country: de
  • Reputation: 2
    • View Profile
Re: Commodore Book collection reworking project
« Reply #9 on: August 04, 2011, 01:51 AM »
Ordinary typos will be spotted while correcting the layout; I don't think that's the problem. The difficult part would be incorrect information or faulty program code.

Offline RobertB

  • Forum god
  • ********
  • Posts: 3087
  • Country: us
  • Reputation: 451
    • View Profile
    • Fresno Commodore User Group
Re: Commodore Book collection reworking project
« Reply #10 on: August 04, 2011, 02:10 AM »
For example, Compute's C128 Programmer's Guide needs 808K, that's tiny for a modern computer, but on a Commodore it wouldn't even fit on a 1581 disk.
     A Commodore 1581 disk drive has a formatted, low-density disk capacity of 880K, more than enough for the C128 PRG.

          Truly,
          Robert Bernardo
          Fresno Commodore User Group
          http://videocam.net.au/fcug

Offline RCtech

  • VIC 20 user
  • ****
  • Posts: 190
  • Country: de
  • Reputation: 2
    • View Profile
Re: Commodore Book collection reworking project
« Reply #11 on: August 04, 2011, 03:45 AM »
Nah, that's the capacity of an Amiga disk. An unformatted DD disk has a raw capacity of 1 MB; the usable capacity depends on the disk format. An Amiga FFS disk has 880K, an Amiga OFS disk has 856K, PC and Atari disks have 720K. The 1581 has a formatted capacity of 800K and needs 10K for the root directory, so we get 790K. Every partition on the 1581 also needs 10K for its directory. If you're using the DirectoryCache filesystems (DC-OFS, DC-FFS) on an Amiga, the directory buffer also needs extra space.
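
The 1581 figures can be checked with a little arithmetic. This sketch assumes the commonly documented 1581 layout (80 tracks of 40 logical 256-byte sectors, with one track reserved for header, BAM and directory); those geometry numbers and the helper name fits_on_1581 are not from the posts above, only the 800K, 10K, 790K, 808K and 724K values are.

```python
# Assumed 1581 geometry (commonly documented, not stated in the thread).
TRACKS = 80
SECTORS_PER_TRACK = 40       # logical sectors per track
BYTES_PER_SECTOR = 256
RESERVED_TRACKS = 1          # header, BAM and root directory

total_blocks = TRACKS * SECTORS_PER_TRACK
reserved_blocks = RESERVED_TRACKS * SECTORS_PER_TRACK
free_blocks = total_blocks - reserved_blocks

formatted_kib = total_blocks * BYTES_PER_SECTOR // 1024    # formatted capacity
directory_kib = reserved_blocks * BYTES_PER_SECTOR // 1024  # directory track
usable_kib = free_blocks * BYTES_PER_SECTOR // 1024         # what is left


def fits_on_1581(file_kib):
    """True if a file of file_kib kilobytes fits in the usable space."""
    return file_kib <= usable_kib


print(formatted_kib, directory_kib, usable_kib)
print(fits_on_1581(808), fits_on_1581(724))
```

Under these assumptions the 808K Programmer's Guide is over the limit, while the 724K book from the first post would fit.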

But anyway, large documents aren't made for disk use. Even with a 1581 in burst mode it wouldn't be fun to use them, and searching inside the complete document would be difficult, too. It's no wonder that GEOS users use an REU. A different method would be a kind of hypertext format, similar to AmigaGuide documents. Such a document always has a main index with links, which allows jumping straight to an address in the document on disk. Inside the text there could be links to other places. If the main index and/or a glossary is always kept in RAM, access would be much easier. A connected REU or GeoRAM could be used as a cache, so every opened part of the document would be stored there for quick access.
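
To make the idea a bit more concrete, here is a rough sketch in Python rather than anything that would run on a C128. The "@node name" heading line is a made-up simplification loosely modeled on AmigaGuide, the class and function names are hypothetical, and a plain dictionary stands in for the REU or GeoRAM cache. The point is only the mechanism: scan once to build an index of byte offsets, keep the index in RAM, seek directly to a section, and cache what has been read.

```python
import io


def build_index(stream, marker=b"@node "):
    """Scan the document once and record the byte offset of every heading line."""
    index = {}
    stream.seek(0)
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break
        if line.startswith(marker):
            name = line[len(marker):].strip().decode("ascii")
            index[name] = offset
    return index


class GuideReader:
    def __init__(self, stream, marker=b"@node "):
        self.stream = stream
        self.marker = marker
        self.index = build_index(stream, marker)  # main index, kept in RAM
        self.cache = {}                           # stand-in for REU/GeoRAM

    def read_section(self, name):
        if name in self.cache:                    # already opened once
            return self.cache[name]
        self.stream.seek(self.index[name])        # jump directly to the section
        self.stream.readline()                    # skip the heading line
        lines = []
        while True:
            line = self.stream.readline()
            if not line or line.startswith(self.marker):
                break
            lines.append(line)
        text = b"".join(lines).decode("ascii")
        self.cache[name] = text
        return text


# Usage with an in-memory "disk file":
document = io.BytesIO(b"@node intro\nWelcome.\n@node vdc\nThe VDC chip.\n")
reader = GuideReader(document)
print(reader.read_section("vdc"))
```

Links inside the text would then just be section names looked up in the same index.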
« Last Edit: August 04, 2011, 03:51 AM by Naquaada »

Offline RobertB

  • Forum god
  • ********
  • Posts: 3087
  • Country: us
  • Reputation: 451
    • View Profile
    • Fresno Commodore User Group
Re: Commodore Book collection reworking project
« Reply #12 on: August 04, 2011, 08:20 AM »
Nah, that's the capacity of an Amiga disk.
     And 1581 mechs were used in Amigas and vice versa.
Quote
...the usable capacity depends on the disk format.
     True.  On a FD-2000 drive, you can format a DD disk to have 1.6 megs, and on a FD-4000 drive, you can format a DD disk to have 3.2 megs (though I have never tried the latter).

          Truly,
          Robert Bernardo
          Fresno Commodore User Group
          http://videocam.net.au/fcug