HOW TO FIX EPUBCHECK ERRORS
You probably landed on this page because you're a Smashwords author or publisher, and AutoVetter informed you that your book is failing EpubCheck. Or, maybe you found this page from a search engine. Ordinarily, we'd want to give you a warm welcome to this page, but truthfully, we're sorry you're here because EpubCheck errors are no fun for anyone. We'll do our best to return you to your day as quickly as possible.
Step One: Before you read further, grab yourself a cup of chamomile tea or some other legal non-narcotic, non-alcoholic sedative, so you can prepare for the un-joyous journey ahead into double-plus-ungood-epubcheck-non-bliss.
First, let's start with some quick questions and answers....
Q: What's EPUBCHECK (and why is it taking me from my Happy Place?!?!?!)
A: EpubCheck is an industry-standard validation routine that checks the .epub version of your ebook for compliance with the EPUB specification. Eyes glazed over yet? "Ooga Booga What," you ask? In plain English, EpubCheck validation tells you whether or not your .epub ebook file is constructed according to the .epub rules set by our friends at the International Digital Publishing Forum (IDPF). (Smashwords is a member, but we didn't make the rules. We would have said, "make EpubCheck easy!")
Q: Why do I care what EPUBCHECK thinks of my ebook?
A: EpubCheck is a good tool to ensure your book will operate properly on all epub e-reading devices. If a book fails EpubCheck, it will most likely be rejected by Apple iBooks because their systems perform similar checks to make sure your epub file is properly constructed to industry standards.
Q: My book was already in the Premium Catalog. Why am I getting this error now?
A: EpubCheck was previously not a requirement for Premium Catalog inclusion, yet it has always been a requirement for distribution to Apple. In May 2011, in an effort to better communicate the steps necessary to achieve full book distribution, we began proactively reporting each book's EpubCheck status, whereas previously authors were left wondering why their books weren't shipping to Apple. If your book was previously EpubCheck compliant, it's also possible that your latest book updates introduced EpubCheck errors.
Q: Help! I'm feeling nauseous!
A: We understand. Have another sip of tea, then read on. Now that we set your expectations really low, we'll try to help you have a better-than-expected experience as you correct this error.
Getting on with the FUN
The following information is excerpted from Step 27a of the Smashwords Style Guide. (If you didn't carefully study and implement the Smashwords Style Guide before uploading your book to Smashwords, it might explain why you're here on this page. If you did study the Guide already, then fast forward to the Nuclear Method below.)
Step 27a – Check for EpubCheck Compliance
If you want your book distributed to the Apple iBookstore, the EPUB file Smashwords generates for you must pass EpubCheck, which is an industry standard compliance validation tool. We’ve built a lot of magic into Meatgrinder that automatically repairs many EpubCheck problems without your intervention, but we can’t fix them all.
If your book doesn’t pass EpubCheck, it means there’s a problem with the source file you uploaded to Smashwords, and your book won’t ship to Apple.
** How to Identify your EpubCheck errors ** : After you publish your book, and you’ve fixed any remaining AutoVetter errors, download your EPUB file to your computer’s desktop (click to your book page, click to download the ".EPUB" version, then save it to your computer's desktop), and then click to http://validator.idpf.org/ and click [browse] to locate and select (click on) your file on your desktop, then click [validate] to run the test. If your book passes, it’ll tell you. Congratulations! Go celebrate.
If it fails, it’ll toss up (mostly) incomprehensible spaghetti language telling you why your book failed. Take a deep breath (Very important). Try to study and understand the messages, but don’t pull out too much hair if it’s confusing, because it is SUPER-CONFUSING.
If you're feeling bold and adventurous, and your self-esteem is bulletproof, head over to the official EpubCheck Error Reporting Page at http://code.google.com/p/epubcheck/wiki/Errors to learn more about the errors, or to get more confused. Take another deep breath. No, you’re not stupid! The confusing errors are stupid. Welcome to the early days of the ebook revolution. We’re confident these tools will continue to get better over time.
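If you're comfortable running a small script, one way to tame the spaghetti is to match each line of the validator's output against the message fragments discussed under "Common Reasons for EPUBCHECK Failure" on this page. The sketch below is only an illustration: the names KNOWN_ERRORS and triage are made up for this example, the fragments are taken from the examples on this page, and the exact wording EpubCheck uses may differ between versions.

```python
# Rough triage of EpubCheck output. The fragments and advice are paraphrased
# from the "Common Reasons" list on this page; they are not an official catalog.
TEXT_BOX = "Text box in the Word file: find and delete text boxes."
AUTO_LIST = "Auto-numbered list: key in numbers and bullets manually."

KNOWN_ERRORS = [
    ("does not appear to be of type image/",
     "Image file misidentified: delete the image in Word and re-insert it."),
    ("playorder", "NCX problem: see Step 20 of the Style Guide."),
    ('attribute "clear"',
     "Floating image: set the picture layout to In Line with Text."),
    ('attribute "hspace"', TEXT_BOX),
    ('attribute "vspace"', TEXT_BOX),
    ('attribute "border"', TEXT_BOX),
    ('attribute "cellpadding"', TEXT_BOX),
    ('attribute "width"', "Table in the Word file: import tables as images."),
    ('attribute "value"', AUTO_LIST),
    ('attribute "start"', AUTO_LIST),
    ("fragment identifier is not defined",
     "Bookmark spans more than one paragraph: recreate it on one line."),
    ("mimetype entry",
     "EPUB was zipped incorrectly: repackage it with an EPUB-aware tool."),
    ("duplicate id",
     "Header or footer content: clear the header and footer fields."),
]


def triage(report):
    """Return (line, advice) pairs for report lines containing a known fragment."""
    findings = []
    for line in report.splitlines():
        lowered = line.lower()  # compare case-insensitively
        for fragment, advice in KNOWN_ERRORS:
            if fragment in lowered:
                findings.append((line.strip(), advice))
                break  # one piece of advice per line is enough
    return findings
```

Paste the validator's messages into a string and pass it to triage(). Any line that comes back unmatched falls into the "mere mortals cannot understand" category, where the Nuclear Method is the usual answer.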
If the tips below don’t help you, or you find they drive your EPUB-addled brain to the breaking point, one of the most effective ways to fix the problem is to reformat your book using the Nuclear Method described in Step 5 of the Style Guide (excerpted further down this page), because the Nuclear Method will clear out all the gunk you cannot see (it’ll also remove all your formatting, so use it carefully). Or, if you’d rather pay someone to fix the problem for you, send an email to firstname.lastname@example.org for a low-cost emotional life preserver.
Common Reasons for EPUBCHECK Failure
- Garble in your Image's Metadata - Right mouse click on your images, then click Format Picture, and then click on the Alt Text tab. If you see garble in there, or html, delete it.
- Properties Error - If you examine the Properties in your Word file (In Word 2000 & 2003, go to File: Properties; in Word 2007 click on the round Office button at the upper left of the screen, then hover your mouse pointer over Prepare, then click Properties) and you see strange HTML characters in there, remove them.
- Missing the “http://” in Web Address - If your book contains improperly formed hyperlinks, it’ll fail EpubCheck. For example, if you right mouse click on a hyperlink, and you see the link points to www.website.com instead of http://www.website.com, it’ll fail. Smashwords' Meatgrinder will automatically fix this for you, but for those of you not yet using Smashwords (hey, why not?), this one's for you.
- Image File Misidentified - If Microsoft Word has labeled an image in your .doc file as one image format but EpubCheck thinks it's actually another format, you'll receive an error from EpubCheck such as, "ERROR: [Your File Name].epub: The file tmp_[gobbledygook followed by some number].gif does not appear to be of type image/gif". Or, rather than .gif it might read .jpeg or .png. Here's what one of our test file names looked like: tmp_cddc48cb_m29fb45.gif
Two solutions:
THE QUICKEST OPTION - IF YOU HAVE ONLY A FEW IMAGES IN YOUR WORD .DOC: Simply delete the images and then replace them by clicking Insert: Picture: From File. Then click "Upload new version" from your Smashwords Dashboard.
IF YOU HAVE MULTIPLE IMAGES, here's a longer step-by-step on how to identify the corrupted image and replace it:
1. Take a look at the error message you're receiving from the IDPF Validator above, and note the last few characters of the image file name in your EpubCheck error message. In my example above, they're 45.gif. By noting those characters, you can easily identify the offending image that needs to be replaced.
2. Download your .epub file from your Smashwords book page to your desktop (or better yet, to a new folder on your computer's desktop). Once the file is on your desktop, you'll note its file name ends in .epub. Manually change the file name so that it ends with .zip instead of .epub (to do this, you'll usually just click once on the file name, then highlight the characters of .epub and change them to .zip).
3. Next, use an unzip program such as WinZip to open what was once your .epub file (.epub files are actually a zipped collection of multiple files, including every single image you imported into Microsoft Word).
4. Scan the image files for a file name that ends with the last few characters from the EpubCheck error message you received. In the example case above, you'd look for "45.gif."
5. Next, click on the image within your unzip program and open it (you may need to right-click and then choose a program to open it with, such as Paint or Paint.net or some other graphics program. Some browsers will open .jpeg or .png files, but not .gif files). Seeing the image will help you visually identify the corrupted image in your Microsoft Word document.
6. Open the Microsoft Word .doc file you uploaded to Smashwords (your source document), find the image, then delete it.
7. In Word, click Insert: Picture: From File to re-select your original image and import it again into Word. This will fix the corruption.
8. Once you've replaced the corrupted files this way, click to your Smashwords Dashboard then click "Upload new version." This will regenerate your epub.
9. After conversion completes, click to the book page. If the top of the page is no longer warning you of an EpubCheck error, that means you fixed it!
- PlayOrder Error - If you see “playorder” in the spaghetti messages, see Step 20 in the Smashwords Style Guide, which explains how to create a proper NCX for your .epub file. If the NCX Meatgrinder constructs for you is not formed properly, you may get this PlayOrder error.
- HTML and Styling errors – This is a catch-all for “that which we mere mortals cannot understand.” Microsoft Word often contains the remnants of old or hidden styling that you can’t see with the naked eye, especially if your book originated in a program other than Microsoft Word, or if your book was once in HTML form. Unless you’re a geek or HTML expert, these errors, even after studying the EpubCheck error reporting, are very difficult to decipher and identify. If you can’t figure it out on your own, you may need to reformat your book from scratch by implementing the Nuclear Method, which will purge all the hidden corruption. The Nuclear Method will also purge all other formatting.
- Missing the “mailto:” in front of an email address - If you’re linking to a live email address, the address underneath the hyperlink (right mouse click, then click Edit Hyperlink to see it) should begin with “mailto:” so it looks like mailto:email@example.com where, of course, you’ll replace email@example.com with the actual email address. Meatgrinder should also fix this, so most of you Smashwords authors and publishers can move on to the next item.
- Attribute "Clear" not allowed here - This error can be caused by a floating image in your book. Open your Word .doc then click on your images and try to drag them. If they drag, they're floating. To fix a floating error, right mouse click on the image, click format picture, click Layout, then click "In Line with Text." Once you repair the images, click to your Dashboard and click "upload new version."
- Attribute "hspace" or "vspace" not allowed here or "border", "cellpadding" - These sorts of errors tend to point to the existence of text boxes in your Word file. If you have your Word document open, click on the "File" tab in the top left (in Word 2010), then "Options," then "Advanced." Scroll down to the "Show document content" area and click/check "Show text boundaries." Now you can page through your document and look for any text boxes. Extra text boxes should stand out as dotted outlines. Search for "text in text boxes" (no quotes) in the Style Guide to see an illustration of one. Delete the text boxes.
- Attribute "width" not allowed here - These errors indicate the existence of tables (not converted to images) in your Word file. If you used tables, import them as images instead.
- Attribute "value" not allowed here... often combined with errors beginning with Attribute "start" not allowed here... - Auto-numbered lists will get errors like this. Do not use Word's Autonumbering function. Instead, manually key in your numbered lists or bullets. Some bullets will also trigger an error like this if they use an automated sequence (top-level bullet, secondary-level bullet, tertiary-level bullet...). The auto-numbered items will not be selectable like regular text is, so you can spot them by using "select-all" and then scrolling through your manuscript.
- 'AUTHOR': fragment identifier is not defined in 'tmp_57dc2d0598aece619ebfd455c43924c3__KsAqF.ch.fixed.fc.tidied.stylehacked.xfixed_split_000.html' - This is an example of a bookmark in Word which spans more than one paragraph return. The bookmark was titled "AUTHOR" and the hyperlink to it in the linked Table of Contents was not functioning. Once the bookmark was deleted and re-created to encompass two words on one line, the error no longer occurred. If recreating the bookmark and hyperlink does not work, then try naming the bookmark without capital letters, i.e., author instead of AUTHOR.
- Mimetype entry missing or not the first in archive or Mimetype entry must not have an extra field in its ZIP header. It appears that you attempted to pack the EPUB as a standard zip archive. Specifications state that it is acceptable to compress all contents EXCEPT for the “mimetype” file. If you use a regular file compression utility, it will incorrectly compress ALL files, thus creating an invalid EPUB. Do an internet search for "how to package an epub". Here are a few suggestions we found. For Macs there is a free utility called ePub Zip. For Windows consider ePubPack which also requires Microsoft .Net Framework (also free).
- could not parse followed by tmp_(bunch of numbers and code text)_split_000.html: duplicate id: Frame1. The duplicate id error seems to come up if you have a text box, graphic, or an automated page number in your Word doc's header or footer field. It's best to delete everything in the header and footer fields in your Smashwords-ready Word file.
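For the geeks in the audience, two of the checks above (the mimetype packaging rule and the misidentified image) can be approximated with a short script, since an .epub is just a zip file. This is a minimal sketch, not a replacement for EpubCheck: the function names sniff_image, inspect_epub and package_epub are invented for this example, it reads the order of files from the zip's directory listing rather than the raw bytes EpubCheck examines, and package_epub only handles the zip rules (it does not make the contents themselves valid).

```python
import os
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"

# Leading "magic" bytes of the three image formats mentioned above.
SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
}
EXTENSIONS = {".gif": "gif", ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg"}


def sniff_image(data):
    """Return 'gif', 'png' or 'jpeg' based on the first bytes, else None."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def inspect_epub(path):
    """Return a list of human-readable problems found in the archive."""
    problems = []
    with zipfile.ZipFile(path) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype entry missing or not the first in archive")
        else:
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype entry is compressed")
            if entries[0].extra:
                problems.append("mimetype entry has an extra field")
        for entry in entries:
            extension = os.path.splitext(entry.filename)[1].lower()
            claimed = EXTENSIONS.get(extension)
            if claimed is None:
                continue  # not an image we know how to sniff
            actual = sniff_image(archive.read(entry)[:8])
            if actual != claimed:
                problems.append(
                    f"{entry.filename}: named as {claimed}, content looks like {actual}"
                )
    return problems


def package_epub(path, files):
    """Write a zip from a {archive_name: bytes} mapping (mimetype excluded)."""
    with zipfile.ZipFile(path, "w") as archive:
        # mimetype goes first; ZipInfo defaults to stored (uncompressed)
        # with an empty extra field.
        archive.writestr(zipfile.ZipInfo("mimetype"), EPUB_MIMETYPE)
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Calling inspect_epub on the .epub you downloaded from your book page returns an empty list when neither problem is detected. A reported image mismatch points you to the same file name you would otherwise hunt for by renaming the file to .zip and opening it by hand.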
HELP! Where's the Panic Button?
If the tips above didn't help you identify the problem, and you're running low on Chamomile, then the Nuclear Method is the most reliable solution. See Step 5 in the Style Guide, which I've excerpted below:
Step 5 - THE NUCLEAR METHOD (HOW TO PURGE HIDDEN CORRUPTION)
Your Microsoft Word document can become corrupted if it has been touched by multiple word processors, or if it originated in a program such as InDesign or WordPerfect, or if it originated in PDF and then was converted to Word, or if you imported your images incorrectly.
The Nuclear Method purges all your formatting and allows you to start with a fresh Word document, free of hidden formatting or corrupted styling. Most formatting professionals on Mark’s List (our list of low cost Smashwords formatters. Get the list by emailing list at smashwords.com) employ this method because it maximizes your odds of a good clean multi-format conversion.
The Nuclear Method is also recommended if previous versions of your manuscript failed to convert, or if you’re struggling with EpubCheck errors, text boxes or tables you can’t find, or if you suspect your book is corrupted.
1. Make a backup of your manuscript (VERY IMPORTANT!) and set it aside in case the Nuclear Method fails you.
2. Copy and paste your entire manuscript into Windows Notepad (usually found in Programs: Accessories) or any other text editor. This will strip out all your formatting. Here's how: "Select all" by typing CTRL+A (press the CTRL key, hold it down, then press the A key at the same time), then CTRL+C for “copy,” then paste into an empty Notepad document using the key combination CTRL+V.
3. Close Microsoft Word, but do not shut down your computer.
4. Reopen Microsoft Word so it’s showing a fresh empty document.
5. In Notepad, type CTRL+A to select all, then CTRL+C to copy (or use the Notepad menu options under "Edit"), and then paste into the empty Word document using either CTRL+V (for paste) or Edit: Paste (in Word 2000 and 2003) or Home: Paste (Word 2007).
6. If it looks like an ugly unformatted blob of text, success!!! If not, repeat from Step 1.
7. Finally, reformat the book per the Style Guide, from the start of the Style Guide. If you don't have the time or patience to do this on your own, see "Mark's List."
8. Once your book is reformatted, click to your Smashwords Dashboard, then click "Upload new version" to upload your corrected manuscript. Once the conversion completes, you'll know you were successful if AutoVetter doesn't flag the book for an EpubCheck error. Congrats!
Alternatively, there is a method that can be run in Microsoft Word to clear out formatting problems, using the replace function. In all steps EXCEPT #4, replace all until 0 changes are made.
1. Remove spaces before and after paragraph returns. In the Find area, type "^p " (a paragraph return followed by a space) without the quotes. In the Replace area, put ^p on its own. Repeat this step by typing " ^p" in the Find area without the quotes.
2. Remove all non-breaking spaces. Find: ^s -- Replace with nothing.
3. Replace all soft returns with hard returns. Find: ^l (that's a lowercase L) -- Replace: ^p
4. Replace all section breaks with page breaks. Find: ^12 -- Replace: ^12 (Yes, this is the same code, but it works. Unless you have no section/page breaks to start with, replacing all will never get to 0 changes made.) You only need to perform this step once.
5. Remove all manual tabs. Find: ^t -- Replace with nothing.
6. Save the .doc file as "Web Page, Filtered" and add "-fix" at the end of the filename (excluding the extension) to keep it distinct. This will cut out all excess formatting data that are not being used. Any images in the .doc file will be placed in a folder in the directory you've saved to.
7. Close the .doc file.
8. From the "Recent File" menu in Word, open the .html file. If you have "Show/Hide" activated, you should see a mess of degree symbols; these are non-breaking spaces.
9. Repeat Step 2. If you have images in your book, proceed to Step 10, otherwise skip to Step 12.
10. Your images will appear to be present in the document -- this is sort of true. What you are seeing are links to your images. If you try to submit the document at this point, all the images will be replaced with black boxes with red X's in the epub version, because the links don't exist.
11. Open the folder that holds your document's images. In Word, navigate to each image's location (Find: ^g) and remove it, replacing it immediately with the identical image from the image folder via drag-and-drop (the default alignment should be "in line with text").
12. Save this file as a "Word 97-2003 Document" or .doc file (this should still be the file that ends in "-fix"). On average, a 20-30% drop in file size from the original .doc file is normal, though extremely bloated files have seen drops of 90% or more.
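If the "replace all until 0 changes are made" idea feels abstract, here is a small plain-text analogy of the find-and-replace steps above. It is only a sketch and does not touch Word files: the names replace_until_stable, CLEANUP_STEPS and clean are invented for this example, "\n" stands in for Word's ^p, and treating "\x0b" as Word's ^l soft return is an assumption about how the text was exported. It also shows why the ^12 step is the exception: a replacement that contains its own search text can never reach 0 changes.

```python
NBSP = "\u00a0"       # non-breaking space (Word's ^s)
SOFT_RETURN = "\x0b"  # manual line break (Word's ^l); an assumption for plain text
PARAGRAPH = "\n"      # stands in for Word's ^p
TAB = "\t"            # manual tab (Word's ^t)

# (find, replace) pairs in the same order as the steps above,
# leaving out the ^12 step, which is only ever run once.
CLEANUP_STEPS = [
    (PARAGRAPH + " ", PARAGRAPH),  # spaces after paragraph returns
    (" " + PARAGRAPH, PARAGRAPH),  # spaces before paragraph returns
    (NBSP, ""),                    # non-breaking spaces
    (SOFT_RETURN, PARAGRAPH),      # soft returns become hard returns
    (TAB, ""),                     # manual tabs
]


def replace_until_stable(text, find, replacement):
    """Repeat 'replace all' until a pass would make 0 changes."""
    if find in replacement:
        # Same situation as Find: ^12 / Replace: ^12, which never reaches 0.
        raise ValueError("replacement contains the search text")
    while find in text:
        text = text.replace(find, replacement)
    return text


def clean(text):
    """Apply every cleanup step in order and return the cleaned text."""
    for find, replacement in CLEANUP_STEPS:
        text = replace_until_stable(text, find, replacement)
    return text
```

The loop matters because a single pass over a paragraph return followed by three spaces removes only one of them; repeating until nothing changes is what finally clears the run.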