Handling PDFs

The topic today is PDFs and there is probably something for everyone here, whether you aren’t sure how to handle them or why you would want to, or whether you have Acrobat Pro and think that is all you need.

PDFs are files with the suffix .pdf and were designed by Adobe to be portable – to display the same on any machine, PC, Mac or whatever. This makes them ideal for the publishing world and it is much cheaper to send a PDF than to send paper proofs.

Receiving PDFs

For us as indexers, PDFs are nothing but an advantage. Email is much faster than couriers, so typically we get an extra day’s indexing time. Do, however, always confirm that you have received them, and get your clients used to expecting that. Email is never completely reliable and emails with large attachments are prone to getting blocked. You really don’t want to find that your client thinks you have been working on the index for the past week while you have been waiting for the file to arrive.

Merging PDFs

Quite often I find that the client sends a separate PDF file for each chapter of the book. Clearly we don’t want to have to ask the client to spend a chunk of their day reformatting the PDFs for our benefit, and we don’t have to.

We can use the PDFs unmerged. Create a directory and put all the PDFs for the book into that single directory. In Acrobat Reader press Shift-Ctrl-F and in the Search dialogue you can search all documents in a particular directory. This will enable you to use Search throughout the whole book, but if you are editing your index and want to go to, say, page 174, it doesn’t help. You are not sure which file page 174 is in, and even with each file open in a separate window it is still trial and error.
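If you do work with unmerged files, a small lookup table can take some of the trial and error out of finding a page. The following is only a sketch: the file names and starting page numbers are invented, and you would have to note down the real ones from your own chapter files.

```python
from bisect import bisect_right

# Hypothetical chapter files and the printed page each one starts on.
# These names and numbers are invented for illustration only.
CHAPTER_STARTS = [
    (1, "chapter01.pdf"),
    (29, "chapter02.pdf"),
    (97, "chapter03.pdf"),
    (149, "chapter04.pdf"),
]


def file_for_page(page, chapter_starts=CHAPTER_STARTS):
    """Return the file whose page range contains the given printed page."""
    starts = [start for start, _ in chapter_starts]
    # bisect_right counts how many chapters start at or before this page.
    index = bisect_right(starts, page) - 1
    if index < 0:
        raise ValueError(f"page {page} comes before the first chapter")
    return chapter_starts[index][1]


print(file_for_page(174))
```

With the invented table above, page 174 falls in the last file. Merging the files, as described next, removes the need for this kind of bookkeeping altogether.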

Merging the PDFs into a single file gives you maximum flexibility.

The easiest way of merging PDFs is to use Acrobat. Acrobat comes now in 4 versions: Reader, Standard, Pro, and Pro Extended.

The first, Reader, is completely free and can be downloaded from the Adobe site. The latest version is 9.1 and it is an improvement in every way over previous versions. There is no excuse for using an older version. Reader cannot, however, merge PDF files.
The other versions of Acrobat can all merge files and all cost money, quite a lot of money. If you are just indexing, rather than also copy-editing, typesetting etc., then you don’t need more than Standard. Check the Adobe comparison of features: http://bit.ly/tRDPO


Once Acrobat is installed you can select the files in Explorer, right-click and select Combine in Adobe Acrobat. You then get the chance to reorder the files, which you will need because they are not picked up in the order they appear on disk. Press the button and Acrobat will chug away for a while and create a single file.
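One common reason for files turning up out of order is that plain alphabetical sorting puts “chapter10” before “chapter2”. As a rough illustration of the order you are probably aiming for, here is a sketch of a “natural” sort; the file names are invented and this is not how Acrobat itself orders files.

```python
import re


def natural_key(filename):
    """Split a name into text and number parts so 'ch2' sorts before 'ch10'."""
    parts = re.split(r"(\d+)", filename.lower())
    return [int(part) if part.isdigit() else part for part in parts]


# Hypothetical file names, listed in an unhelpful order.
files = ["chapter10.pdf", "chapter2.pdf", "prelims.pdf", "chapter1.pdf"]
ordered = sorted(files, key=natural_key)
print(ordered)
```

Note that the prelims file still sorts after the chapters here, so some manual reordering may be needed whatever tool you use.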

Sometimes, however, you will instead get a message saying that the file is ‘protected’ and it cannot be merged. This is where non-Adobe utilities come into their own.

Merging Protected PDFs

[Screenshot: PDF Split and Merge]

PDF Split and Merge is available from SourceForge. It is completely free, but you can make a donation on the download page if you decide this is saving you money.
From pdfsam.org, download the basic 1.1.4 installer, the one at the top, and double-click it.

Once it starts, click on the left on “Merge/Extract”; click on Add to select the files and click Open.

Adjust the order if required and use Browse to give the output file a name.
Click Run and within a second or so you have a merged PDF file, with none of those pesky “protected file” messages.

Page Numbers

PDFs have two sorts of page numbers – physical and logical. In order to see both, in Reader 9 press Ctrl-K to bring up the Options/Preferences dialogue, select “Page Display” on the left, and check the Use Logical Page Numbers box. If this box is unchecked you will always see only the physical page numbers, so the first page will always be page 1. With it checked, you may see in the navigation bar “97 (1 of 52)”, which means logical page 97, physical page 1. You may see “iii (3 of 158)”, which means that the prelims have been numbered with roman numerals.

This is more helpful for indexing because the logical page number can match the number on the actual page, which is the number that needs to appear in the index. If the publisher hasn’t already done this then you can correct the page numbers yourself using Acrobat Standard upwards. This is not ideal, however, because you don’t see the logical page number everywhere. If you use the Search function and hover your mouse over each of the results displayed, it shows only the physical page number, and so at some point in a large index you are going to confuse the two.
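To make the relationship between the two numbers concrete, here is a small sketch that builds a navigation-bar style label from a physical page number. It is only a model of the idea, not how Reader works internally; the function names are made up, and the figure of 12 prelim pages in the example is an assumption.

```python
def to_roman(number):
    """Convert a positive integer to lower-case roman numerals."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    result = ""
    for value, symbol in numerals:
        while number >= value:
            result += symbol
            number -= value
    return result


def navigation_label(physical, total, prelim_pages=0, first_logical=1):
    """Build a label like 'iii (3 of 158)' for a 1-based physical page."""
    if physical <= prelim_pages:
        # Prelims are numbered i, ii, iii, ... from the first physical page.
        logical = to_roman(physical)
    else:
        # Body pages count up from first_logical once the prelims end.
        logical = str(first_logical + physical - prelim_pages - 1)
    return f"{logical} ({physical} of {total})"


# A whole book with 12 pages of prelims (the 12 is an assumed figure).
print(navigation_label(3, 158, prelim_pages=12))
# A single chapter file whose first page is printed page 97.
print(navigation_label(1, 52, first_logical=97))
```

The two calls reproduce the labels quoted above, “iii (3 of 158)” and “97 (1 of 52)”. The number before the brackets is the one that belongs in the index.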

My preferred option is to physically move the prelims to the back of the document, or delete them altogether, aligning the physical page numbers with the logical ones. You can do that with a paid-for version of Acrobat, but you can also do it with PDF Split and Merge. Click on the “Split” option on the left of the screen, Add your PDF file and enter the number of the last page of the prelims into the “Split after these pages” box. That will give you two files, one of prelims and one of text, with the physical page numbers in the text file matching the numbers on the printed page.
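The effect of a “split after” page can be sketched as follows. This models the arithmetic only, under the assumption that the tool cuts immediately after each page you list; the 158-page book with 12 pages of prelims is an invented example.

```python
def split_ranges(total_pages, split_after):
    """Return (first, last) physical page ranges for each output file."""
    ranges = []
    start = 1
    for last in sorted(split_after):
        # A split point must leave at least one page on each side.
        if not start <= last < total_pages:
            raise ValueError(f"cannot split after page {last}")
        ranges.append((start, last))
        start = last + 1
    ranges.append((start, total_pages))
    return ranges


# Assumed example: 158 physical pages, the first 12 of them prelims.
prelims, body = split_ranges(158, [12])
print(prelims, body)
```

In this example the second file starts at what was physical page 13, so its own physical page 1 is the first page of the main text. The numbers only line up if the main text starts at printed page 1.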

Seek and ye shall find

Acrobat has two seeking mechanisms which are distinct and can be used together. In earlier versions of Reader they got entangled, but in version 9 they work properly.

[Screenshot: Acrobat Reader Find box]

Find works as you might expect. You press Ctrl-F, enter text into a box and press Return, and it takes you to that text in the document. It then gives you arrows next to the box to take you to the next or previous occurrence. Rather less obvious is a small drop-down menu on the side of the box which allows you to seek Whole Words Only or to make the operation Case Sensitive.

[Screenshot: Acrobat Reader Search window]

Search is completely separate from this and is accessed by pressing Shift-Ctrl-F. Again you enter some text, but now a separate window shows each occurrence of the text, with the rest of the line displayed too. On a large document the search can take some time, but you don’t have to wait for it to finish. As soon as you see the occurrence you want you can click on it and the main window will jump straight to it. Pressing the down-arrow key will move you down the list of search results and the main window will again jump to that location (this didn’t work in some versions of Reader earlier than version 9).

It is important to understand that these two functions are completely separate. For example, you could Search for all occurrences of ‘Inquisition’ and then when examining the 4th occurrence you could Find ‘Torquemada’ without disrupting your list of Search results.
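For anyone curious what the Whole Words Only and Case Sensitive options actually change, the sketch below imitates a Search-style results list over some plain text. It assumes the text of each page has already been extracted into a dictionary (the sample text is invented), and it is an illustration of the matching rules rather than a description of how Reader does it.

```python
import re


def search(pages, text, whole_words=False, case_sensitive=False):
    """Return (page_number, line) for every line containing the text."""
    pattern = re.escape(text)
    if whole_words:
        # \b only matches at the edge of a word, so 'Inquisitions' is skipped.
        pattern = rf"\b{pattern}\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    hits = []
    for page_number, page_text in pages.items():
        for line in page_text.splitlines():
            if regex.search(line):
                hits.append((page_number, line.strip()))
    return hits


# Invented sample text keyed by page number.
pages = {
    7: "The Inquisition is introduced here.\nInquisitions elsewhere differed.",
    8: "Torquemada led the inquisition in Castile.",
}
for page_number, line in search(pages, "Inquisition", whole_words=True):
    print(page_number, line)
```

With Whole Words Only the plural on page 7 is skipped; with Case Sensitive switched on instead, the lower-case occurrence on page 8 would be skipped.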

Zooming

Some of what you find may be too small to read, with its location shown on screen surrounded by a blue box. With PDFs you can ZOOM and make the text larger. The easiest way to do this is to hold the control key down and move the mouse wheel, but there are toolbar buttons and menu options too.

One thing to be wary of when you have zoomed to enlarge the text is where the page ends. It is possible to be looking at the bottom of page 7 with the top edge of page 8 showing at the bottom of the window, with the page number in the navigation bar showing 8. At the end of a long week working on a long book, at some time you will put 8 in the index instead of 7. The way to avoid this is to set the view to View> Page Display> Single Page, rather than Single Page Continuous. That will mean that you only ever have one page on the screen at a time and the number shown is the number to go in the index.

This allows you to handle whatever a publisher throws at you for indexing, but there is one further utility which is useful for the business side of indexing. That is creating your own PDFs.

Creating PDFs

Why would you want to? Well, PDFs are by far the best format in which to store any form of electronic document. If you store Word files, for example, then you need the Word software in order to read them and if you open them on a different machine from the one on which they were created then they will look different.

The utility CutePDF is free and creates an artificial printer on your machine. You then print to that printer from any application you like and you get a searchable PDF file. The paid-for versions of Acrobat already contain a similar utility but CutePDF is much faster. The latest version of Word allows you to download a utility and then save files in PDF format, but it can be very slow and sometimes crashes.

CutePDF also, of course, allows you to create PDFs from your browser, so when making a purchase online or filing your tax return, for example, you can “print” the confirmation page to PDF and store that file in your records. This is probably as close to the paperless office as you will ever get.


About James Lamb

James Lamb has a degree in Computer Science and Mathematics from London University, worked for over 20 years as a senior IT technician and team leader, much of that time for dealing rooms of international banks, and became a full-time, professional indexer in 2004.
This entry was posted in indexing, PDFs, SIdelights (SI newsletter).

4 Responses to Handling PDFs

  1. Tom Brown says:

    James,

    I found this page contained some helpful information, especially the link to PDF Split and Merge. I have yet to make any money from my various enterprises, so the cost of Acrobat would be hard to justify. I try to get by with freeware as much as possible. I do have a paid-for desktop publishing application (PagePlusX4) that lets me create my own PDFs, and open, and edit PDFs from other sources. But the merge/split utility is probably a quicker way to combine or rearrange PDF chapter files.

    I use PDF-XChange (free version) in preference to Adobe Reader. I don’t know why, but the fonts are reproduced better on my monitor in XChange than Reader when viewing the same document side-by-side from both applications. And XChange offers more markup options.

    As a member of the local historical society I am scanning, OCRing with ABBYY Fine Reader, and saving as searchable PDFs old documents that are in danger of deteriorating so as to become illegible before long.

    Thanks for the useful information here and in your contributions to the indexing lists.
    ____
    Tom

  2. James Lamb says:

    Thanks for the comments, Tom.
    In Acrobat Reader under Edit> Preferences (or press Ctrl-K) there is a Page Display section, which has options under Rendering, allowing you to switch between “Smooth Text” on LCD screens or Monitors, and choose whether to use Local Fonts. Playing with those might improve the clarity.
    However, Acrobat Reader has a lot of functionality that most of us use rarely and often the non-Adobe PDF viewers load faster and use up less computer resources, which can be helpful if you have other programs running too.

  3. Sue Lewis says:

    Hello James
    Thanks for the very useful breakdown of info about manipulating pdfs. I’m a trainee indexer just starting to figure out the practical, physical (as opposed to cerebral) aspects of the work. I don’t yet, for the training, have a need to split and merge (that I know of) but I will very soon need to create pdf files and had considered buying Acrobat Standard. I am interested in the free software having no income in prospect from the work in the short term. Before I trot off and explore ‘pdf split and merge’ and ‘cute pdf’ can I ask if you have any significant updates to the info given about those tools since you wrote your article last summer?
    Thanks again
    Sue

  4. James Lamb says:

    Hello Sue,
    PDFsam and CutePDF are both still available and free and I would still recommend them. Other postings I create relating to PDFs can be found from the PDFs entry in the Categories box in the right sidebar.
    Thanks for the comment.
