Hathi Download Helper

An assistant for downloading books from Hathitrust.org.


Quickstart

  1. Copy the book URL into the URL field of Hathi Download Helper and press the 'get Book info' button.
  2. Select the source file format (pdf or images), set the destination folder and press the 'start download' button.
  3. Convert and merge the downloaded files into a pdf book by pressing the 'create pdf' button.


Contents

  1. Quickstart
  2. User interface elements
    1. Menu Bar
      1. Page Setup
      2. Proxy Setup
      3. GUI setup
        1. Style setup
        2. Font setup
        3. Default
    2. Panels
      1. Book info panel
      2. Download settings
      3. PDF merge & conversion
  3. Hathi Download Helper namespace
  4. Hathi Download Helper as "PDF merger"
  5. Hathi Download Helper as "Image to PDF" converter
  6. FAQ
    1. What does the name "Hathi Download Helper" mean?
  7. ERROR FIXING
    1. "Error: unable to execute 'pdftk' application."

User interface elements

Menu Bar



Panels


Book info

Controls: 'Book URL' input field, 'use proxy server' checkbox

The 'Book info' panel holds the URL input field as well as the retrieved book information: title, number of pages and book ID.
After you enter the book URL and press the 'get Book info' button, Hathi Download Helper reads the book's html page and extracts this information.
If desired, a proxy server can be used by selecting the 'use proxy server' checkbox.
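A HathiTrust book URL typically carries the book ID in its query string. As a minimal sketch, assuming page links of the form `https://babel.hathitrust.org/cgi/pt?id=<book ID>` (an assumption, not documented in this manual), the ID can be read from the URL with Python's standard library. Hathi Download Helper itself reads the book's html page; the hypothetical `extract_book_id` function below only illustrates where the book ID comes from.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_book_id(url: str) -> Optional[str]:
    """Return the book ID from a HathiTrust page URL, or None if there is none.

    Assumes the ID is carried in the 'id' query parameter, e.g.
    https://babel.hathitrust.org/cgi/pt?id=njp.32101076400420
    """
    values = parse_qs(urlparse(url).query).get("id")
    if not values:
        return None
    # Some links append further parameters with ';' (e.g. ';seq=7').
    # Depending on the Python version, parse_qs may not split on ';',
    # so anything after the first ';' is cut off here.
    return values[0].split(";")[0]


print(extract_book_id("https://babel.hathitrust.org/cgi/pt?id=njp.32101076400420;seq=7"))
# njp.32101076400420
```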


Download settings

Controls: 'pdfs', 'images', 'image zoom', 'download OCR text', 'pages'

In the 'Download settings' panel the user can choose between two file formats:
  • 'pdfs': select this option to download the book pages as searchable pdf documents generated by Hathitrust.org, one file per page.
  • 'images': select this option to download the book pages as image files (jpeg, png).
Further download options:
  • 'image zoom': adjusts the image quality via a zoom factor.
  • 'download OCR text': downloads the OCR text in addition to the image files, so that 'searchable' pdfs can be generated. The OCR text files are stored as html documents.
  • 'pages': download either the whole book or only selected pages.
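The syntax accepted by the 'pages' field is not documented in this manual. Purely as an illustration, the sketch below assumes a hypothetical syntax of comma-separated single pages and inclusive ranges (e.g. `1-5, 8`), with an empty field meaning the whole book, and turns it into the list of pages to download. The function name `parse_pages` is likewise hypothetical.

```python
from typing import List


def parse_pages(spec: str, page_count: int) -> List[int]:
    """Turn a page selection such as "1-5, 8" into a sorted list of page numbers.

    Hypothetical syntax: comma-separated single pages and inclusive ranges;
    an empty selection means the whole book.
    """
    if not spec.strip():
        return list(range(1, page_count + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
        else:
            first = last = int(part)
        if first > last or first < 1 or last > page_count:
            raise ValueError(f"invalid page selection {part!r} for a book with {page_count} pages")
        pages.update(range(first, last + 1))
    return sorted(pages)


print(parse_pages("1-3, 7, 2", 10))  # [1, 2, 3, 7]
```

Duplicates and overlapping ranges are merged through the set, so each page is downloaded at most once.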


Hathi Download Helper creates the following sub-folder structure for downloaded data:
  • 'pdfs': for pdf files
  • 'images': for image files
  • 'ocr': for OCR text (*.html)
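A minimal sketch of the sub-folder layout above, assuming the folders are created on demand below the chosen destination folder. The function name `download_folder` and the format keys ("pdf", "image", "ocr") are hypothetical; only the folder names come from this manual.

```python
from pathlib import Path

# Sub-folder names as listed above; the keys are hypothetical.
DOWNLOAD_SUBFOLDERS = {"pdf": "pdfs", "image": "images", "ocr": "ocr"}


def download_folder(destination: str, kind: str) -> Path:
    """Return the sub-folder for one kind of download, creating it if necessary."""
    folder = Path(destination) / DOWNLOAD_SUBFOLDERS[kind]
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder
```

For example, `download_folder("MyBook", "image")` would return (and create) the folder `MyBook/images`.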


PDF merge & conversion

Controls: 'merge pdfs', 'convert & merge images to pdf book', 'convert images to single pdf files', 'use plaintext (ocr text) only', 'set pdf resolution'

In the 'PDF merge & conversion' panel the user can choose between the following options:
  • 'merge pdfs': merge single pdf files using the free tool 'pdftk' (http://www.pdflabs.com).
  • 'convert & merge images to pdf book': convert and merge images into a pdf book. Page size and page margins can be edited via 'Options' -> 'Page setup'.
  • 'convert images to single pdf files': create one pdf file per page.
  • 'use plaintext (ocr text) only': create pdfs from the downloaded OCR text only.
  • 'set pdf resolution': sets the output resolution for pdf files generated by Hathi Download Helper from images/OCR text files.

Hathi Download Helper creates the following sub-folder structure for converted data:
  • 'pdfs': for generated pdf files. Existing files will be overwritten.
  • 'pdfs_text_only': for OCR text pdfs.
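Text-only pdfs are built from the downloaded OCR files, which are stored as html documents. How Hathi Download Helper reads those files is not described here; the sketch below only shows one way to recover plain text from such a page with Python's standard `html.parser`, assuming the OCR text is ordinary markup that uses paragraph or line-break tags. The class `OcrTextExtractor` and the function `ocr_html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class OcrTextExtractor(HTMLParser):
    """Collects the visible text of an OCR html page (hypothetical helper)."""

    BREAK_TAGS = {"br", "p", "div", "li"}  # tags treated as line breaks
    SKIP_TAGS = {"script", "style"}        # tags whose content is not page text

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns "&amp;" into "&"
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def ocr_html_to_text(html: str) -> str:
    """Return the page text, one line per block, with whitespace normalized."""
    extractor = OcrTextExtractor()
    extractor.feed(html)
    extractor.close()  # flushes any trailing text
    raw = "".join(extractor.parts)
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)  # drop empty lines
```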

Hathi Download Helper namespace

Hathi Download Helper uses a fixed naming scheme for downloaded data: the document ID with reserved characters removed (e.g. njp.32101076400420), followed by "_page_", the page number and the file extension, e.g. njp.32101076400420_page_001.pdf
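The naming scheme can be sketched as a small function. Two details are assumptions not stated in this manual: the set of "reserved characters" is taken to be the characters Windows forbids in file names, and the page number is zero-padded to three digits, as in the example above. The function name `page_file_name` is hypothetical.

```python
import re

# Assumption: "reserved characters" are those Windows forbids in file names.
RESERVED_CHARS = re.compile(r'[<>:"/\\|?*]')


def page_file_name(document_id: str, page: int, extension: str, digits: int = 3) -> str:
    """Build a file name like njp.32101076400420_page_001.pdf."""
    safe_id = RESERVED_CHARS.sub("", document_id)      # drop reserved characters
    ext = extension.lstrip(".")                         # accept "pdf" or ".pdf"
    return f"{safe_id}_page_{page:0{digits}d}.{ext}"   # zero-padded page number


print(page_file_name("njp.32101076400420", 1, "pdf"))  # njp.32101076400420_page_001.pdf
```

The zero padding matters when merging: with three digits, sorting the file names alphabetically yields the page order for books of up to 999 pages.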

Hathi Download Helper as PDF merger

Hathi Download Helper can merge arbitrary pdf files using the 'pdftk' application. To do so, select the "merge pdfs" radio button. If the selected folder does not contain files or folders downloaded by Hathi Download Helper, a file dialog appears in which you can select the files manually. If you are running Linux or Mac OS, you have to install the 'pdftk' tool yourself (http://www.pdflabs.com).
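pdftk merges files with a command line of the form `pdftk in1.pdf in2.pdf ... cat output merged.pdf`. The sketch below builds and runs such a call for all pdf files of a folder; it is an illustration under the assumption that files are merged in alphabetical (and therefore page) order, not necessarily how Hathi Download Helper invokes pdftk internally. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_pdftk_merge_command(pdf_files: Iterable, output_file, pdftk: str = "pdftk") -> List[str]:
    """Build a pdftk call: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    inputs = sorted(str(f) for f in pdf_files)  # zero-padded page numbers sort in page order
    if not inputs:
        raise ValueError("no pdf files to merge")
    return [pdftk, *inputs, "cat", "output", str(output_file)]


def merge_pdfs(folder, output_file, pdftk: str = "pdftk") -> None:
    """Merge all *.pdf files of a folder into one pdf book (hypothetical helper)."""
    target = Path(output_file).resolve()
    # Skip the output file in case it lies in the same folder and already exists.
    pages = [p for p in Path(folder).glob("*.pdf") if p.resolve() != target]
    subprocess.run(build_pdftk_merge_command(pages, output_file, pdftk), check=True)
```

For example, `merge_pdfs("MyBook/pdfs", "MyBook/book.pdf")` would merge the downloaded page files into `book.pdf`; `check=True` raises an error if pdftk fails.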

Hathi Download Helper as "Image to PDF" converter

Hathi Download Helper can convert a number of different image formats into pdf files. To do so, select the "convert & merge images to pdf book" or the "convert images to single pdf files" radio button. If the selected folder does not contain files or folders downloaded by Hathi Download Helper, a file dialog appears in which you can select the files manually.

FAQ

  • What does the name "Hathi Download Helper" mean?
    Hathi (pronounced hah-tee) is the Hindi word for elephant, an animal highly regarded for its ability to suck a huge amount of water into its trunk and blow it into its mouth. In computer networks, to download means to receive data on a local system from a remote system, or to initiate such a data transfer. A helper is something that helps. Combined, the words convey the key benefit users can expect from this application: downloading single pages or complete books in an easy way.

ERROR FIXING

  • "Error: unable to execute 'pdftk' application."
    Hathi Download Helper uses the 'pdftk' application for merging existing pdf files. The error may be caused by missing permissions for the pdftk executable or by missing files. To fix it, follow the steps for your operating system:
    • Windows
      1. Download and install 'pdftk' from http://www.pdflabs.com
      2. Open the pdftk program folder and copy the files pdftk.exe and libiconv2.dll.
      3. Open the Hathi Download Helper folder containing the hathidownloadhelper.exe file and create a new folder named pdftk.
      4. Copy the files from step 2 into the pdftk subfolder.
      Hint: If you have compiled Hathi Download Helper yourself, place the pdftk subfolder in your Debug/Release target folder containing the HathiDownloadHelper.exe file.
    • Linux/Mac OS
      1. Download and install 'pdftk' from http://www.pdflabs.com, or use the pdftk file placed in the pdftk subfolder attached to this project.
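The causes named above (missing files, missing permissions) can be checked with a small lookup routine. The sketch assumes that pdftk is looked for first in the pdftk subfolder next to the Hathi Download Helper executable and otherwise on the system PATH; this lookup order, the function name `find_pdftk` and the error messages are hypothetical, not taken from the program's source.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_pdftk(app_dir) -> Optional[str]:
    """Locate pdftk: first in <app_dir>/pdftk/, then on the PATH (assumed lookup order)."""
    on_windows = sys.platform.startswith("win")
    exe_name = "pdftk.exe" if on_windows else "pdftk"
    bundled = Path(app_dir) / "pdftk" / exe_name
    if bundled.is_file():
        # On Windows, pdftk.exe also needs libiconv2.dll next to it (step 2 above).
        if on_windows and not (bundled.parent / "libiconv2.dll").is_file():
            raise FileNotFoundError(f"libiconv2.dll is missing in {bundled.parent}")
        # The file exists but may not be run: the "missing permissions" case.
        if not os.access(bundled, os.X_OK):
            raise PermissionError(f"{bundled} is not executable (on Linux/Mac OS: chmod +x {bundled})")
        return str(bundled)
    # shutil.which also tries the PATHEXT extensions (.exe, ...) on Windows.
    return shutil.which("pdftk")
```

A return value of `None` means that pdftk was found neither in the subfolder nor on the PATH, i.e. the installation steps above still have to be carried out.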