Opens the proxy server setup dialog, where you can set the proxy IP, port number and proxy type. If the proxy requires authentication, enter a user name and password; otherwise leave those fields empty.
Opens the export dialog, which creates an html file containing links to each book page. You can use this feature to download all book pages on your own with your favorite download tool.
Help → Help : Opens the help dialog you are currently reading.
Help → About : Opens the Hathi Download Helper about dialog.
Help → About Qt : Opens the About Qt dialog.
Help → Check for Update : Checks whether a new version is available online and provides download links for the source code and installer.
Help → Automatically check for Update : Enables or disables the automatic update check on start-up.
The 'book information' group box holds the URL input field as well as the received book information: title, number of pages, book ID, publisher and author. After you enter the book URL and push the 'get Book info' button, Hathi Download Helper reads the html document.
Alternatively, the book ID can be entered instead of the URL.
If desired, a proxy server can be used by selecting the corresponding checkbox. When the book is blocked, e.g. due to copyright restrictions, the message "Received empty document..." is displayed next to the progress bar.
In the 'Download settings' group box the user can choose between two file formats:
pdfs : select this option to download the book pages as single searchable pdf documents generated by Hathitrust.org. After download you have the option to merge all pdf files. For this operation Hathi Download Helper uses 'pdftk' (see: PDF merge & conversion). Note: The download of pdf files is limited to approximately 15 files per 5 minutes.
images : select this option to download the book pages as image files (jpeg, png). The image quality depends on the selected resolution. The number of files you can download without a waiting period is much higher than for pdf download.
The image quality can be adjusted by selecting a zoom factor. The listed dpi-values are approximations and depend on the selected page size.
To generate 'searchable' pdfs Hathi Download Helper has the option to download ocr text files in addition to the image files. The ocr text files will be stored as html documents on your hard disk.
Using the 'pages' input field the user can decide either to download a whole book or only certain pages. Single page numbers have to be separated by commas (e.g. 1,3,5). Page ranges have to be indicated by a hyphen, starting with the smaller value (e.g.: 5-10, 20-30).
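The page syntax described above can be illustrated with a short sketch. This is only an assumption about how such an input could be interpreted, written in Python for readability; it is not the parser used by Hathi Download Helper, and the function name `parse_pages` is hypothetical.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a page specification such as '1,3,5' or '5-10, 20-30'.

    Returns the selected page numbers sorted and without duplicates.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and blanks
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                # The help text requires ranges to start with the smaller value.
                raise ValueError("range must start with the smaller value: " + part)
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

With this sketch, the input `1,3,5` selects three single pages, and `5-10, 20-30` selects pages 5 to 10 and 20 to 30 inclusive.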
Selecting the 'create pdf book after download' check box will automatically start the pdf merge and conversion process to generate pdf files or a single pdf book file.
When the 'resume book download' option is checked, Hathi Download Helper checks whether files of the specified book were already downloaded during a previous download session and does not download them again. By default this option is not checked and the downloader will re-download all files.
The 'enable WebProxies' check box activates the built-in webproxy support of Hathi Download Helper. This feature automatically generates download requests via several webproxies to bypass the download limitations (e.g. for pdf files) of hathitrust.org. Note: Only non-restricted books, which are accessible for non-US citizens, can be downloaded. See WebProxies for details.
In the 'PDF merge & conversion' group box the user can choose between the following options:
'merge pdfs' : Merge single pdf files using the free tool 'pdftk' (http://www.pdflabs.com)
'convert & merge images to pdf book' : Convert and merge images to a pdf book. Page size and page margins are editable via 'Options' → 'Page setup'.
'convert images to single pdf files' : Create single pdf files for each page.
Sets the output resolution for pdf files generated by Hathi Download Helper from images/ocr text files.
This section holds some information about the file naming and folder structure used by Hathi Download Helper. Furthermore, you will find some explanations about Hathi Download Helper as PDF merger and Image-to-PDF converter.
Hathi Download Helper provides an option that uses a large number of random webproxies to download data from hathitrust.org.
This feature redirects all download requests to free web proxy services, so that the download can continue while the server-side download limitation for the user is active. Please note the following information:
Restrictions:
• Works only for non-restricted books which are accessible for non-US citizens.
• Strongly varying download speed.
Important advice:
• Since this feature utilizes a large number of random webpages, an up-to-date virus scanner is recommended.
• There is no guarantee for proper functioning.
Hathi Download Helper provides an option to enable a network proxy. For implementation the QNetworkProxy class of Qt 4.7.4 is used:
The following types are supported:
Proxy Type | Description | Default capabilities
SOCKS_5 | Generic proxy for any kind of connection. Supports TCP, UDP, binding to a port (incoming connections) and authentication. |
Hathi Download Helper creates the following sub-folder structure for downloaded data inside the target directory:
'pdfs' : Folder for downloaded pdf files
'images' : Folder for downloaded image files
'ocr' : Folder for downloaded ocr text files (*.html)
Note: All downloaded data (images, pdfs, ocr files) will be kept. If you don't need them any more you have to delete them manually.
Note: When restarting a download (with the same book ID to the same destination folder) all files downloaded in the previous session will be overwritten unless you have selected the 'resume book download' option. In that case the downloader checks whether a corresponding file (with the same name) already exists and does not download this file again.
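The resume behaviour described in the note above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the application's own code: the function name `files_to_download` and the file names in the usage comment are hypothetical, and only the sub-folder names ('pdfs', 'images', 'ocr') come from this help file.

```python
import os


def files_to_download(target_dir, subfolder, file_names, resume):
    """Return the file names that still have to be downloaded.

    target_dir -- destination folder chosen by the user
    subfolder  -- 'pdfs', 'images' or 'ocr'
    file_names -- expected file names for the requested pages (hypothetical)
    resume     -- True if 'resume book download' is checked
    """
    folder = os.path.join(target_dir, subfolder)
    os.makedirs(folder, exist_ok=True)  # create the sub-folder if it is missing
    if not resume:
        # Without resume every file is fetched again and overwrites the old one.
        return list(file_names)
    # With resume, files that already exist under the same name are skipped.
    return [name for name in file_names
            if not os.path.exists(os.path.join(folder, name))]


# Example call (file names are placeholders):
# files_to_download("/tmp/book", "pdfs", ["page_1.pdf", "page_2.pdf"], resume=True)
```

Note that a check by file name alone cannot tell a complete file from a partially written one; this sketch makes no attempt to detect that case.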
Hathi Download Helper creates the following sub-folder structure for converted data inside the source directory:
'pdfs' : Folder for generated pdf files. Existing files will be overwritten.
'pdfs_text_only' : Folder for generated pdf files with ocr text only.
Note: Since the target folder for download is also the source folder for conversion, all existing pdf files within the 'pdfs' folder will be overwritten when 'single pdf' conversion is selected as output option!
Hathi Download Helper uses a fixed naming scheme for downloaded data, starting with the document ID (with reserved characters removed).
This naming scheme is used for pdf files, images and ocr text files (html files). Example for document ID: hvd.32044038439063:
Hathi Download Helper is able to merge any pdf files using the 'pdftk' (pdf toolkit) application. For this purpose the radio button "merge pdfs" has to be selected. When you select a folder that does not contain content downloaded by Hathi Download Helper (files/folders), a file dialog for file selection will appear. If you are running a Linux or Mac OS system you have to install the 'pdftk' tool (http://www.pdflabs.com). For Windows systems Hathi Download Helper ships with a copy of 'pdftk'.
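For readers who want to run the same kind of merge by hand, the following Python sketch builds a 'pdftk' command line using its `cat` operation. How Hathi Download Helper itself calls 'pdftk' is not documented here, so treat this as an assumption-based illustration; the function names and file names are hypothetical, and 'pdftk' must be installed and on the search path for `merge_pdfs` to work.

```python
import subprocess


def build_pdftk_merge_command(input_files, output_file, pdftk="pdftk"):
    """Build the argument list for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not input_files:
        raise ValueError("at least one input pdf file is required")
    return [pdftk] + list(input_files) + ["cat", "output", output_file]


def merge_pdfs(input_files, output_file, pdftk="pdftk"):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(build_pdftk_merge_command(input_files, output_file, pdftk),
                   check=True)


# Example call (file names are placeholders):
# merge_pdfs(["page_1.pdf", "page_2.pdf"], "book.pdf")
```

The input files are merged in the order given, so the list should be sorted by page number before the call.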
Hathi Download Helper is able to convert a number of different image formats into pdf files. For this purpose the radio button "convert & merge images to pdf book" or "convert images to single pdf files" has to be selected. When you select a folder that does not contain content downloaded by Hathi Download Helper (files/folders), a file dialog for file selection will appear.
Note: Since the target folder for download is also the source folder for conversion, all existing pdf files within the 'pdfs' folder will be overwritten when 'single pdf' conversion is selected as output option!
Hathi (pronounced hah-tee) is the Hindi word for elephant, an animal highly regarded for its capability to suck a huge amount of water into its trunk, and to blow the water into its mouth. In computer networks, to download means to receive data to a local system from a remote system, or to initiate such a data transfer. Helper refers to a device that helps. In combination, the words convey the key benefits users can expect from this application - to download pages or complete books in an easy way.
Hathitrust.org limits the download of all file types. If you download too many files in a short period of time you will be forced to wait for some time. For pdf files the limit is about 15 files per 5 minutes; afterwards you have to wait for approximately 5 minutes. You may activate the WebProxy feature to download data via several webproxies during this waiting period.
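To get a feeling for what this limit means for a whole book, the following Python sketch estimates the total waiting time. It assumes a simplified model in which 15 files can be fetched in one batch and every further batch costs one 5-minute wait; the real server behaviour is only approximately known, and the function name is hypothetical.

```python
import math


def estimated_wait_minutes(page_count, files_per_batch=15, wait_minutes=5):
    """Rough waiting time for downloading page_count pdf pages.

    Model (an assumption): after every full batch of files_per_batch files
    the server forces one pause of wait_minutes before the next batch.
    """
    if page_count <= 0:
        return 0
    batches = math.ceil(page_count / files_per_batch)
    return (batches - 1) * wait_minutes
```

Under this model a 300-page book needs 20 batches and therefore 19 pauses, i.e. about 95 minutes of waiting in addition to the actual transfer time.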
Hathi Download Helper uses a PDF printer (Qt::QPrinter), which 'prints' the images into the pdf file. Since QPrinter only supports jpg image formats, all pages are stored as jpg images inside the pdf file. Therefore even text-only pages have to be stored in the same way as full-resolution images.
Hathi Download Helper does not have any OCR functionality. Instead it uses the OCR files generated by Hathitrust.org. The downloaded OCR files are stored as html files on your hard disk. For PDF creation the OCR text is printed on each page and overlaid by the corresponding image.
For merging existing pdf files Hathi Download Helper uses the 'pdftk' application. The error may occur due to missing permissions for the pdftk files. To fix this error, follow the steps for your OS:
Windows
1. Download and install 'pdftk' from http://www.pdflabs.com
2. Open the pdftk program folder and copy the files pdftk.exe and libiconv2.dll
3. Open the Hathi Download Helper folder containing the hathidownloadhelper.exe file and create a new folder named pdftk
4. Copy the files from step 2 into the pdftk subfolder.
Hint: If you have compiled Hathi Download Helper on your own you have to place the pdftk subfolder in your Debug/Release target folder containing the HathiDownloadHelper.exe file.
Linux/Mac
Download and install 'pdftk' from http://www.pdflabs.com or use the pdftk file placed in the pdftk subfolder attached to this project.
E.g. when you are using Ubuntu you can install pdftk with the following command: sudo apt-get install pdftk
fixed bug in image resolution setting after 'page setup' dialog, renamed images files in qt resources, copied image files in application directory
2013.05.24:
version 1.0.2 released:
changed development environment to 4.7.4, added compiler switch for qt 5.x, tested on linux and windows system, added options for GUI style and fonts, updated GUI, bug fix for missing ocr files, reduced freezing effect of GUI during pdf creation, added 'pdftk' binary for linux/OS, added selection for proxy type.
2013.06.03:
version 1.0.3 released:
bug fix for proxy type selection. Moved pdf merge & conversion into QThread worker to eliminate freezing effect of GUI during processing. Changed usage from QPixmap to QImage for pdf creation. Changed OCR text extraction method to reduce memory usage (QWebkit is really greedy). Improved text font size adjustment method. Added Author and Publisher information. Changed Windows installer creation from Qt framework installer to Inno Setup compiler to fix kernel32.dll error on Win XP.
2013.07.02:
version 1.0.4 released:
improved download performance by using parallel download requests (it is really much faster now :-D ), added encryption for proxy password, added 'check for update' feature, added batch job feature for downloading several books at once, added link export function
2013.08.18:
version 1.0.5 released:
re-implementation of all GUI elements and dialogs, fixed text clipping of GUI elements, fixed page shrinking on pdf creation due to long ocr text, improved download speed, re-designed help file
2013.10.27:
version 1.0.6 released:
bug fixes: lost destination path for single pdf-file creation, application crash on manual file selection. Added new features for batch job dialog: 'edit book', 'load job', 'save job', added gimmicks for Halloween and Christmas, minor changes.
2014.03.30:
version 1.0.7 released:
added new download options: webproxies, resume of book downloads, added user settings dialog, added auto-update option, coding: separated GUI from file downloader.
2014.05.06:
version 1.0.8 released:
adjustments due to changes in hathitrust.org link structure.