How to Read PDF Files on a PDA with iSilo™

by Ricky Spears
Last Update: January 3, 2005

Note: I first wrote this article in January 2003. For nearly two years the method presented here worked flawlessly and allowed users to read PDF documents on Palm handheld devices with iSilo. Sometime in December 2004, Adobe changed the way its online conversion tool works, and the new tool no longer lends itself to this type of file conversion. The new tool can be found at: http://www.adobe.com/products/acrobat/access_onlinetools.html. Since the new form asks the user "Why aren't you using the Adobe Reader to read this PDF?", I would like to encourage everyone to use this tool and enter one or more of the following reasons:

I hope that if enough people complain, Adobe will resolve these shortcomings in its own programs. I doubt that Adobe will ever bring back the conversion tool as it used to be.

I'm leaving this article on my web site as a tutorial on how disparate systems and formats can be coerced into working the way you need them to, instead of the way their designers intended. Many people say, "I want to learn to be a hacker." Learning to legally use information the way you want to use it is the spirit of hacking. Enjoy!

iSilo is a wonderful off-line web browser for handheld devices. It has a desktop component called iSiloX that users can set up to retrieve web pages (or even entire web sites) and convert them into a format that they can read on a PDA. I use iSilo to bring fresh news and information into my Palm™ m515 every morning with my first HotSync® operation. Then I can read this news as I have time throughout the day, wherever I happen to be (waiting in line, in a boring meeting, or even in the bathroom).

Some of the newsletters and eZines that I wanted to read were only available as PDF (Portable Document Format) files. I know that Adobe® provides a version of the free Adobe® Acrobat® Reader® program that will run on Palm OS devices (http://www.adobe.com/products/acrobat/acrrpalmdload.html) and the download includes a program to convert desktop PDF files for viewing with the Palm OS version of the Acrobat Reader. For daily news and eZines I did not find this convenient because it meant that I would have to download my PDF newsletters individually every morning, save them to my PC, and manually convert each of them for reading on the Palm. This method would also force me to use two separate programs on the Palm for reading my daily news.

After some experimenting, I found a way to bring PDF files into my Palm device in such a way that I can read them using iSilo. In this article we will look at the methods that I use to accomplish this as well as how and why those methods work. I'm including this detail in hopes that it will give you ideas on how to make computers work the way you work instead of how others want you to work. This article assumes that you have already installed iSilo on your PDA, that you have installed iSiloX on the computer to which you synchronize your PDA, and that you are already familiar with the basics of using each program. If you have questions about these programs, please consult the developer's web site.

Note: The information in this document deals with dynamic web sites over which I have no control. I cannot guarantee that this document and the information contained in it will remain current and accurate. If information is not accurate, feel free to send me an email (webmaster@rickyspears.com) with a correction and I will post the correction as I have time. Please don't waste my time and yours by simply telling me that something doesn't work or isn't right; take the initiative to send me the correct information. I will recognize those who help with this project.

The Inspiration

One of my co-workers introduced me to a nice daily newsletter called NetNoozNow. This is a small local publication, but it has some nice light-hearted news that is great for water-cooler conversation throughout the day. Unfortunately, the delivery method didn't fit the way that I read my daily news. To receive the newsletter each day, you subscribe by simply providing your email address on the subscription form on the site. Each morning a PDF version of the newsletter is emailed to subscribers, along with a link where they can view the PDF file online. I decided to do some investigating to see whether I could have iSilo convert NetNoozNow automatically while it was converting my other news for the day. This would allow me to read it along with my other news.

The Investigation

The link that NetNoozNow sent in the email was the launching point for my research. The link was to an HTML page that would display the PDF document. I knew that I wouldn't get the results I wanted by pointing iSiloX to this page, since iSiloX can't natively convert PDF documents. I looked in the title bar of the window and noticed that it said: http://netnooznow.com/editions/triad/triad_today.pdf. I also could have found this by viewing the source code for the page, but since it was so obvious I decided to run with it and see where it led me. I typed the URL into my browser, and the raw PDF document was displayed by itself. Now I knew exactly where they put the latest newsletter each day.
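The check I did by eye, telling the HTML wrapper page apart from the raw PDF, can also be done in code. The Python sketch below is an illustration, not part of the original method: it assumes only that a PDF file normally begins with the "%PDF-" header, and the helper names fetch_head and sniff are hypothetical.

```python
# Illustrative sketch (hypothetical helpers): decide whether a URL serves a
# raw PDF or an HTML wrapper page by inspecting the first bytes of the body.
from urllib.request import urlopen


def fetch_head(url: str, n: int = 1024) -> bytes:
    """Download only the first n bytes of the resource at url."""
    with urlopen(url, timeout=10) as resp:
        return resp.read(n)


def sniff(data: bytes) -> str:
    """Classify a response body as 'pdf', 'html', or 'other'."""
    if data.startswith(b"%PDF-"):  # assumption: PDF files begin with this header
        return "pdf"
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "other"


if __name__ == "__main__":
    # Offline examples; against a live server you would call
    # sniff(fetch_head(some_url)) instead.
    print(sniff(b"%PDF-1.4\n"))
    print(sniff(b"<html><body><embed src='today.pdf'></body></html>"))
```

The page linked from the email would classify as "html", while the address found in the title bar would classify as "pdf".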

My next step was to figure out how to get this document from PDF into an HTML format that iSilo could understand. I assumed that someone had already written a program to do this, so I went to Google and searched for "PDF to HTML." The first result was a page on the Adobe site called "Adobe PDF Conversion by Simple Form." I pasted the URL that I had just discovered into this form, clicked the "Get This Adobe PDF Document as HTML" button, and the text from the newsletter was displayed before me in HTML format. It didn't have the pictures, but I wasn't interested in them anyway. This was going to be easier than I thought!

After reading through some of the text, I looked in the address bar of my browser to see if there was a URL there that I could just copy and paste into iSiloX. All I saw was: http://access.adobe.com/perl/convertPDF.pl. The .pl extension told me that they were using a Perl script to parse the PDF document. I decided to see how information was sent to the Perl script.

I went back to the simple form where I had pasted my URL. I viewed the HTML source for this page and looked for the FORM tag. The ACTION attribute of the FORM tag was http://access.adobe.com/perl/convertPDF.pl, which is what I had seen in my address bar earlier, and the form used the POST method. I scrolled down a little further to find the INPUT tag for the text box where I had entered the URL, and noticed that its NAME attribute was "url". I had all the information that I needed to try to use this script remotely.
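Reading a form's ACTION, METHOD, and INPUT names out of the page source can be sketched with Python's standard html.parser module. SAMPLE is a hypothetical, trimmed stand-in built from the details in this article (the action URL, the POST method, the "url" field, and the button label), not Adobe's actual markup, and FormInspector is an illustrative name.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Record each FORM's ACTION and METHOD and the NAME of each INPUT in it."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form> seen, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action") or "",
                # HTML forms default to GET when METHOD is omitted
                "method": (attrs.get("method") or "get").upper(),
                "inputs": [],
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Simplification: inputs are credited to the most recent <form>.
            self.forms[-1]["inputs"].append(attrs["name"])


# Hypothetical sample markup reconstructed from the article, not Adobe's page.
SAMPLE = """
<form action="http://access.adobe.com/perl/convertPDF.pl" method="post">
  <input type="text" name="url" size="60">
  <input type="submit" value="Get This Adobe PDF Document as HTML">
</form>
"""

if __name__ == "__main__":
    inspector = FormInspector()
    inspector.feed(SAMPLE)
    for form in inspector.forms:
        print(form["method"], form["action"], form["inputs"])
```

Run against the sample, it reports the POST method, the convertPDF.pl action, and a single named input, url; the submit button has no NAME, so it is skipped.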

In my address bar I entered the address of the script, a question mark to indicate that I was going to send the script some variables, the text "url=" to tell the script what had been entered into the INPUT field named "url", and then the URL of the original PDF document. The whole line now looked like this: http://access.adobe.com/perl/convertPDF.pl?url=http://netnooznow.com/editions/triad/triad_today.pdf

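I crossed my fingers, took a deep breath, pressed Enter, and it worked! I saw the same HTML-formatted content that I had seen before. Even though the form itself used POST, the script also accepted its variables in the query string of an ordinary GET request, which meant that I could use the script remotely.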
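The URL typed into the address bar is just three pieces glued together. The Python sketch below builds it two ways: by plain concatenation, as in the article, and with the standard urllib.parse.urlencode, which percent-encodes characters in the PDF address that could otherwise confuse the script. The function names by_hand and encoded are hypothetical.

```python
from urllib.parse import urlencode

# Addresses taken from the article; helper names below are hypothetical.
SCRIPT = "http://access.adobe.com/perl/convertPDF.pl"
PDF_URL = "http://netnooznow.com/editions/triad/triad_today.pdf"


def by_hand(script: str, pdf_url: str) -> str:
    """Script address + '?' + 'url=' + PDF address, as typed into the browser."""
    return script + "?" + "url=" + pdf_url


def encoded(script: str, pdf_url: str) -> str:
    """Same query, but percent-encode characters such as ?, = and & in pdf_url.

    ':' and '/' are left alone so an ordinary PDF address stays readable.
    """
    return script + "?" + urlencode({"url": pdf_url}, safe=":/")


if __name__ == "__main__":
    print(by_hand(SCRIPT, PDF_URL))
    print(by_hand(SCRIPT, PDF_URL) == encoded(SCRIPT, PDF_URL))
```

For this newsletter both versions produce the same string, because the PDF address contains only letters, dots, underscores, colons, and slashes. A PDF address carrying its own ? or & would need the encoded form.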

My next step was to see if this would work from within iSiloX. I added a new document and gave it the customized URL above as the source. I converted the document, synched my Palm, and I was now able to read NetNoozNow on my Palm using iSilo. Mission accomplished!

Before I let the whole thing rest, I remembered that there was a link on the Simple Form page that had caught my eye. It said, "The Advanced Form". Could I have even greater control over how the document was presented? I clicked on the link to find out.

The Advanced Form offered options that would allow me to convert only certain pages of a PDF document, change the reading order, reflow the text in the paragraphs, filter non-printable characters, and provide a password if the document was protected. I had to admit that the sentences were a little choppy, so I decided to try this form with the "Reflow Paragraphs" option turned on. I viewed the source for the Advanced Form page and looked for the INPUT tag that displayed the checkbox for the Reflow option. I found it, and it was named "reflow_p". To make the script think that I had checked the box, I added the text "&reflow_p=true" to the end of my URL in iSiloX (the ampersand separates one variable from the next). It now read http://access.adobe.com/perl/convertPDF.pl?url=http://netnooznow.com/editions/triad/triad_today.pdf&reflow_p=true

After converting the document using this URL, my paragraphs were formatted the way I wanted, and I was fully prepared to read just about any PDF file on the World Wide Web on my Palm with iSilo.
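Why does tacking "&reflow_p=true" onto the end work? The query string after the ? is a list of name=value pairs separated by ampersands. The sketch below uses Python's urllib.parse as a stand-in for whatever parsing convertPDF.pl actually did (the article doesn't show the script's code) to split the final URL into the variables the script would see. The helper name form_fields is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# The final URL from the Advanced Form experiment.
FINAL_URL = (
    "http://access.adobe.com/perl/convertPDF.pl"
    "?url=http://netnooznow.com/editions/triad/triad_today.pdf"
    "&reflow_p=true"
)


def form_fields(url: str) -> dict:
    """Split a URL's query string into {field name: first value}.

    Stand-in only: Python's parser, not the server's own code.
    """
    query = urlsplit(url).query  # the text after the first '?'
    return {name: values[0] for name, values in parse_qs(query).items()}


if __name__ == "__main__":
    for name, value in form_fields(FINAL_URL).items():
        print(name, "=", value)
```

Both url and reflow_p come out as separate fields. The same splitting explains why an unencoded PDF address containing its own & would be cut short: for a hypothetical ...?url=http://example.com/a.pdf?x=1&y=2, the parser reports url as http://example.com/a.pdf?x=1 plus an extra field named y.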

The Implementation

I realize that not everyone understands, or wants to understand, how all the HTML and form stuff works. Some people just want to be able to read a PDF file with iSilo. With this in mind, I have written a JavaScript program (below) that will do most of the hard work for you. All you have to know is the URL of the document that you want to convert; then set the options that you would like. Clicking the "Create iSiloX Source" button will create a string of characters that you can copy and paste into a new document in iSiloX. Consult the original Advanced Form on the Adobe web site for more information on what the options in this form do, or to preview various options in your browser before creating a link for iSiloX.

URL:

First Page: (By default, the entire document is converted.)

Last Page:

Reading Order:

Reflow paragraphs? (Check if paragraphs are to be reflowed.)

Filter? (Check if non-printable characters are to be filtered.)

Password: (Required if document is protected.)
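The JavaScript generator itself isn't reproduced in this copy of the article. As a rough illustration only, here is a Python sketch of what such a generator does, assuming it simply appends each option that is set to the convertPDF.pl address. Only the "url" and "reflow_p" field names come from this article; names for the page range, reading order, filter, and password fields would have to be copied from the Advanced Form's source and are not guessed here. The function name isilox_source is hypothetical, and, as noted at the top, Adobe changed this service in December 2004, so the generated URL is of historical interest.

```python
from urllib.parse import urlencode

CONVERT_SCRIPT = "http://access.adobe.com/perl/convertPDF.pl"


def isilox_source(pdf_url, options=None):
    """Build the text to paste into iSiloX's source field (hypothetical helper).

    options maps form-field NAMEs to values. Only "reflow_p" is confirmed by
    the article; any other name must be copied from the Advanced Form's HTML.
    """
    fields = {"url": pdf_url}
    for name, value in (options or {}).items():
        if value is None or value is False or value == "":
            continue  # an unchecked box or empty field is simply left out
        fields[name] = "true" if value is True else str(value)
    # Keep ':' and '/' readable; encode anything that could split the query.
    return CONVERT_SCRIPT + "?" + urlencode(fields, safe=":/")


if __name__ == "__main__":
    pdf = "http://netnooznow.com/editions/triad/triad_today.pdf"
    print(isilox_source(pdf, {"reflow_p": True}))
```

With reflow turned on, it reproduces the URL from the Advanced Form experiment; an unchecked option (False) is left out, just as a browser omits an unchecked checkbox when it submits a form.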

     


This article was reviewed by Michael J. Blotzer in his column for Occupational Hazards magazine on August 22, 2003.

Copyright © 2003 by Ricky Spears
www.RSInnovative.com