HOW TO FIX EPUBCHECK ERRORS
You probably landed on this page because you're a Smashwords author or publisher, and AutoVetter informed you that your book is failing EpubCheck. Or, maybe you found this page from a search engine. Ordinarily, we'd want to give you a warm welcome to this page, but truthfully, we're sorry you're here because EpubCheck errors are no fun for anyone. We'll do our best to return you to your day as quickly as possible.
Step One: Before you read further, grab yourself a cup of chamomile tea or some other legal non-narcotic, non-alcoholic sedative, so you can prepare for the un-joyous journey ahead into double-plus-ungood-epubcheck-non-bliss.
First, let's start with some quick questions and answers...
Q: What's EPUBCHECK (and why is it taking me from my Happy Place?!?!?!)
A: EpubCheck is an industry-standard validation routine that checks the .epub version of your ebook for compliance with the EPUB specification. Eyes glazed over yet? "Ooga Booga What," you ask? In plain English, EpubCheck validation tells you whether or not your .epub ebook file is constructed according to the .epub rules set by our friends at the International Digital Publishing Forum (IDPF). (Smashwords is a member, but we didn't make the rules. We would have said, "make EpubCheck easy!")
Q: Why do I care what EPUBCHECK thinks of my ebook?
A: EpubCheck is a good tool to help ensure your book will operate properly on all epub e-reading devices. If a book fails EpubCheck, it will most likely be rejected by Apple iBooks, because their systems perform similar checks to make sure your epub file is properly constructed to industry standards.
Q: My book was already in the Premium Catalog. Why am I getting this error now?
A: EpubCheck was previously not a requirement for Premium Catalog inclusion, yet it has always been a requirement for distribution to Apple, and now to other retailers as well. In May 2011, in an effort to better communicate the steps necessary to achieve full book distribution, we began proactively reporting each book's EpubCheck status; previously, authors were left wondering why their books weren't shipping to Apple. Also, even if your book was previously EpubCheck compliant, it's always possible that your latest book updates introduced EpubCheck errors. In recent years Kobo and OverDrive have also been testing Epubs for compatibility with multiple devices; please see the Q&A below concerning your self-made or non-standard epub.
Q: I'm creating my Epub file with Adobe InDesign (or other software), any tips?
A: This can be less effective than using Microsoft Word and having our converter create the Epub for you. The problems we frequently see with self-made Epubs are: 1) Missing the book cover inside them; it should be on the first "page" of the book. 2) Using embedded non-standard fonts. 3) Fixed-layout pages, including 2-page layouts. You can't read these on a cell phone screen because your fixed layout gets shrunk to fit. Ebooks are a different animal from PDFs; don't go there. 4) Though your self-created epub may pass EpubCheck, you should still test it in Adobe Digital Editions and on your smartphone and tablets. Distribution to dozens of retail outlets requires compatibility with many devices and ereader apps. Text should "flow," be resizable by your customer, and conform to the "Night Reading" setting (white text on a black page). Also test whether, in an ereader app, you can search for a word in your epub and highlight and add notes to the text.
Q. Help! I'm feeling nauseous!
A. We understand. Have another sip of tea, then read on. Now that we've set your expectations really low, we'll try to help you have a better-than-expected experience as you correct this error.
Getting on with the FUN
The following information is excerpted from Step 27a of the Smashwords Style Guide. (If you didn't carefully study and implement the Smashwords Style Guide before uploading your book to Smashwords, that might explain why you're here on this page. If you did study the Guide already, then fast-forward to the Nuclear Method below.)
Step 27a – Check for EpubCheck Compliance
If you want your book distributed to the Apple iBookstore, the EPUB file Smashwords generates for you must pass EpubCheck, which is an industry standard compliance validation tool. We’ve built a lot of magic into Meatgrinder that automatically repairs many EpubCheck problems without your intervention, but we can’t fix them all.
If your book doesn’t pass EpubCheck, it means there’s a problem with the source file you uploaded to Smashwords, and your book won’t ship to Apple.
How to Identify Your EpubCheck Errors: After you publish your book and you’ve fixed any remaining AutoVetter errors, download your EPUB file to your computer’s desktop (click to your book page, click to download the ".EPUB" version, then save it to your computer’s desktop). Then go to http://validator.idpf.org/, click [browse] to locate and select your .EPUB on your desktop, and click [validate] to run the test. If your book passes, it’ll tell you. Congratulations! Go celebrate.
If it fails, it’ll toss up (mostly) incomprehensible spaghetti language telling you why your book failed. Take a deep breath (Very important). Try to study and understand the messages, but don’t pull out too much hair if it’s confusing, because it is SUPER-CONFUSING.
If you're feeling bold and adventurous, and your self-esteem is bulletproof, head over to the official EpubCheck Error Reporting Page at http://code.google.com/p/epubcheck/wiki/Errors to learn more about the errors, or to get more confused. Take another deep breath. No, you’re not stupid! The confusing errors are stupid. Welcome to the early days of the ebook revolution. We’re confident these tools will continue to get better over time.
If the tips below don’t help you, or you find they drive your EPUB-addled brain to the breaking point, one of the most effective ways to fix the problem is to reformat your book using the Nuclear Method described below, because the Nuclear Method will clear out all the gunk you cannot see (it’ll also remove all your formatting, so use it carefully). Or, if you’d rather pay someone to fix the problem for you, see "Mark's List" for a low-cost emotional life preserver.
Common Reasons for EPUBCHECK Failure
- Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons' - Our internal validator will not flag this particular issue. However, if you've seen it by validating the book on your own at http://validator.idpf.org/, please feel free to disregard the error. This error pertains to coding style and as far as we know, this does not affect the epub's readability, nor are our retail partners reporting issues with books that are getting this.
- Error while parsing file 'attribute "start" not allowed here', often combined with errors beginning with 'Attribute "value" not allowed here'... - Auto-numbered lists will get errors like this. Do not use Word's Autonumbering function; instead, manually key in your numbered lists or bullets. Some bullets will also trigger an error like this if they use an automated sequence (top-level, secondary-level, and tertiary-level bullets). Auto-numbered items are not selectable the way regular text is, so you can spot them by doing a "select-all" and then scrolling through your manuscript.
- 'Image File Misidentified' - If Microsoft Word labels an image in your .doc file as one image format but EpubCheck thinks it's actually another format, you'll receive an error from EpubCheck such as, "ERROR: [Your File Name].epub: The file tmp_[gobbledygook followed by some number].gif does not appear to be of type image/gif". Or, rather than .gif, it might read .jpeg or .png. Here's what one of our test file names looked like: tmp_cddc48cb_m29fb45.gif. Two solutions:
THE QUICKEST OPTION - IF YOU HAVE ONLY A FEW IMAGES IN YOUR WORD .DOC: Simply delete the images and then replace them by clicking Insert: Picture: From File. Then click "Upload new version" from your Smashwords Dashboard.
IF YOU HAVE MULTIPLE IMAGES, here's a longer step-by-step on how to identify the corrupted image and replace it:
1. Look at the error message you received from the IDPF Validator and note the last few characters of the image file name. In the example above, they're 45.gif. By noting those characters, you can easily identify the offending image that needs to be replaced.
2. Download your .epub file from your Smashwords book page to your desktop (or better yet, to a new folder on your computer's desktop). Once the file is on your desktop, you'll see that its file name ends in .epub. Manually rename the file so it ends with .zip instead (usually you just click once on the file name, then highlight the characters .epub and change them to .zip).
3. Next, use an unzip program such as WinZip to open what was once your .epub file (.epub files are actually a zipped collection of multiple files, including every single image you imported into Microsoft Word).
4. Scan the image files for a file name that ends with the same last few characters as the file name in the EpubCheck error message you received. In the example above, you'd look for "45.gif."
5. Next, click on the image within your unzip program and open it (you may need to right-click and then choose a program to open it with, such as Paint, Paint.NET, or some other graphics program; most web browsers can also display .jpeg, .png, and .gif files). Seeing the image will help you visually identify the corrupted image in your Microsoft Word document.
6. Open the Microsoft Word .doc file you uploaded to Smashwords (your source document), find the image, then delete it.
7. In Word, click Insert: Picture: From File to re-select your original image and import it again into Word. This will fix the corruption.
8. Once you've replaced the corrupted images this way, click to your Smashwords Dashboard, then click 'Upload New Version.' This will regenerate your epub.
9. After conversion completes, click to the book page. If the top of the page no longer warns you of an EpubCheck error, you fixed it!
- 'could not parse' followed by tmp_(bunch of numbers and code text)_split_000.html: duplicate id: Frame1. The duplicate id error seems to come up if you have a text box, graphic, or an automated page number in your Word doc's header or footer field. It's best to delete everything in the header and footer fields in your Smashwords-ready Word file.
- Error while parsing file 'element "span" not allowed here; expected the element end-tag or element "address", "blockquote", "del", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "ins", "noscript", "ns:svg", "ol", "p", "pre", "script", "table" or "ul"' - This may be difficult to spot in your Word file, but we have found instances where it comes up at the start of paragraphs that use a modified paragraph style. You might consider right-clicking on that paragraph and creating a new quick paragraph style based on it. To read about setting up Paragraph Styles and using them in your manuscript, see Step 7 in the Smashwords Style Guide.
- PlayOrder Error – If you see “playorder” in the spaghetti messages, see Step 20 in the Smashwords Style Guide, which explains how to create a proper NCX for your .epub file. If the NCX Meatgrinder constructs for you is not formed properly, you may get this PlayOrder error.
- Missing the “http://” in Web Address - If your book contains improperly formed hyperlinks, it’ll fail EpubCheck. For example, if you right mouse click on a hyperlink, and you see the link points to www.website.com instead of http://www.website.com, it’ll fail. Smashwords' Meatgrinder will automatically fix this for you, but for those of you not yet using Smashwords (hey, why not?), this one's for you.
- Missing the “mailto:” in front of an email address - If you’re linking to a live email address, the email address underneath the hyperlink (right mouse click, then click Edit Hyperlink to see it) should begin with “mailto:” so it looks like mailto:email@example.com where, of course, you’ll replace email@example.com with the actual email address. Meatgrinder should also fix this, so most of you Smashwords authors and publishers can move on to the next item.
- Attribute "Clear" not allowed here - This error can be caused by a floating image in your book. Open your Word .doc then click on your images and try to drag them. If they drag, they're floating. To fix a floating error, right mouse click on the image, click format picture, click Layout, then click "In Line with Text." Once you repair the images, click to your Dashboard and click "upload new version."
- Attribute "hspace" or "vspace" not allowed here, or "border", "cellpadding" - These sorts of errors tend to point to the existence of text boxes in your Word file. With your Word document open, click on the "File" tab in the top left (in Word 2010), then "Options," "Advanced." Scroll down to the "Show document content" area and click/check "Show text boundaries." Now you can page through your document and look for any text boxes. Extra text boxes should stand out as dotted outlines; search for "text in text boxes" (no quotes) in the Style Guide to see an illustration of one. Delete the text boxes.
- Attribute "width" not allowed here - These errors indicate tables (not converted to images) in your Word file. If you used tables, import them as images instead.
- 'AUTHOR': fragment identifier is not defined in 'tmp_57dc2d0598aece619ebfd455c43924c3__KsAqF.ch.fixed.fc.tidied.stylehacked.xfixed_split_000.html' - This is an example of a bookmark in Word that spans more than one paragraph return. The bookmark was titled "AUTHOR" (all caps), and the hyperlink to it in the linked Table of Contents was not functioning. Once the bookmark was deleted, re-created as "author" (lowercase), and re-hyperlinked in the Table of Contents, the error no longer occurred. If recreating the bookmark and hyperlink does not work, then try naming the bookmark without capital letters, i.e., author instead of AUTHOR.
- 'Mimetype entry missing or not the first in archive' or 'Mimetype entry must not have an extra field in its ZIP header' - It appears that you attempted to pack the EPUB as a standard zip archive. The specifications state that it is acceptable to compress all contents EXCEPT the “mimetype” file, which must be stored uncompressed as the first file in the archive. If you use a regular file compression utility, it will compress ALL files (and may not put “mimetype” first), creating an invalid EPUB. Do an internet search for "how to package an epub". Here are a few suggestions we found: for Macs there is a free utility called ePub Zip; for Windows, consider ePubPack, which also requires the Microsoft .NET Framework (also free).
- CSS selector specifies absolute position - A preliminary fix is to do the first ten steps of the Deep Cleanse (found at the bottom of this webpage). We are still investigating whether this is connected with images not being laid out "In Line with Text," with images resized inside Word, or with page layouts set to use Headers and Footers with "Different odd and even" and "Different first page."
- Garble in your Image's Metadata - Right mouse click on your images, then click Format Picture, and then click on the Alt Text tab. If you see garble in there, or html, delete it.
- Properties Error - If you examine the Properties in your Word file (In Word 2000 & 2003, go to File: Properties; in Word 2007 click on the round Office button at the upper left of the screen, then hover your mouse pointer over Prepare, then click Properties) and you see strange HTML characters in there, remove them.
- HTML and Styling errors – This is a catch-all for “that which we mere mortals cannot understand.” Microsoft Word often contains the remnants of old or hidden styling that you can’t see with the naked eye, especially if your book originated in a program other than Microsoft Word, or if your book was once in HTML form. Unless you’re a geek or HTML expert, these errors, even after studying the EpubCheck error reporting, are very difficult to decipher and identify. If you can’t figure it out on your own, you may need to reformat your book from scratch by implementing the Nuclear Method, which will purge all the hidden corruption. The Nuclear Method will also purge all other formatting.
- Error while parsing file 'different playOrder values for navPoint/navTarget/pageTarget that refer to same target' - This comes up in books that rely on Meatgrinder to create the NCX when the first page of the Word file begins with one of the keywords that get pulled into the NCX, such as "Foreword." Because the first page also gets a "Start (of book)" link, there end up being two links to the same location (hence "...refer to same target").
- 'Filename contains spaces, therefore URI escaping is necessary. Consider removing spaces from filename.' - This warning appears when the references to files (for example, within the OPF file) do not have their spaces escaped, i.e., when a reference looks like "some file.xhtml" instead of "some%20file.xhtml" ('%20' represents the space character, ASCII 32 decimal, aka 20 hex). The simplest way to eliminate the warning is to rename any CSS, image, or other file included in the final EPUB so its name has no spaces, replacing each space with '_' or '-'; escaping spaces as the not-as-readable '%20' in the references also works. The characters a-z, A-Z, 0-9, and underscore are probably safest, with only one dot (.) before the filename extension (".jpg", ".xhtml", etc.). We're talking about the basic 128 characters (0-127) of ASCII, not any of the extended characters (128-255), which in some encodings include "é", "ñ", etc. If you are using InDesign, there is a plug-in available that will batch rename and relink placed images; here is a forum posting: https://forums.adobe.com/message/2624301
- Warning: 'File name contains the following non-ascii characters: (example here). Consider changing the filename.' These may not show up as EpubCheck errors, but they create issues that prevent some retailers, such as Apple, from being able to sell your book. When in doubt, use only the characters A through Z, a through z, 0 through 9, -, and _; this is always safe and will reliably avoid problems. Do not use spaces in file names. For example, "la designación.jpg" or "file name.html" can be renamed "la_designacion.jpg" or "file-name.html". More detailed information follows:
To avoid the issue, please rename these files (or have the authors rename them) using only ASCII characters, and then redeliver. We recommend not including any characters with a tilde, accent marks, etc. in file names. Strictly speaking, file names can include diacriticals, umlauts, etc.; the errors arise from "Unicode Normalization." For example, the same character in a file name inside the Epub, such as "é", can be represented in Unicode using different code points, depending on whether a given computer system uses the NFC or NFD approach (see the Wikipedia article on Unicode equivalence). The likelihood of such a mismatch increases if the EPUB production process crosses several computer systems. For many characters, including plain ASCII and most Chinese characters, this doesn't matter at all, because their NFC and NFD representations are identical. But it very much matters for diacriticals, umlauts, and the like (and also for Korean Hangul syllables and Japanese kana with voicing marks, which decompose under NFD), as that Wikipedia article explains:
For compatibility or other reasons, Unicode sometimes assigns two different code points to entities that are essentially the same character. For example, the character "Å" can be encoded as U+00C5 (standard name "LATIN CAPITAL LETTER A WITH RING ABOVE", a letter of the alphabet in Swedish and several other languages) or as U+212B ("ANGSTROM SIGN").
We sometimes run into situations where the names of the files don't match the references to those files in the manifest and elsewhere in the book, even though they look exactly the same. This doesn't mean that EVERY such character causes the issue, just those that had the misfortune to fall victim to an NFC/NFD representation difference. For more, see Apple's Cross-platform filename Best practices and Conventions FAQ.
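For the 'Image File Misidentified' item above, steps 2 through 5 (renaming, unzipping, and hunting for the bad image) can also be approximated with a short script. This is a minimal Python 3 sketch, not a Smashwords or EpubCheck tool, and the function names are hypothetical. It opens the .epub directly as a ZIP archive and compares each image's file extension with the "magic bytes" at the start of the file. As far as we understand, EpubCheck itself compares the media type declared in the book's manifest against the file's contents, so treat this extension-based check as an approximation.

```python
import io
import posixpath
import zipfile

# Leading bytes ("magic numbers") that identify common image formats.
SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
}

# File extensions mapped to the format names above.
EXTENSIONS = {".gif": "gif", ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg"}


def sniff_format(header):
    """Return the image format suggested by the first bytes of a file, or None."""
    for fmt, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return fmt
    return None


def find_mislabeled_images(epub_source):
    """Return (name, named_as, looks_like) for images whose extension doesn't match their content.

    epub_source may be a file path, a binary file object, or the raw bytes of an .epub.
    """
    if isinstance(epub_source, bytes):
        epub_source = io.BytesIO(epub_source)
    problems = []
    # An .epub is a ZIP archive, so it can be opened without renaming it to .zip.
    with zipfile.ZipFile(epub_source) as epub:
        for name in epub.namelist():
            named_as = EXTENSIONS.get(posixpath.splitext(name)[1].lower())
            if named_as is None:
                continue  # not one of the image types we check
            with epub.open(name) as member:
                looks_like = sniff_format(member.read(16))
            if looks_like != named_as:
                problems.append((name, named_as, looks_like))
    return problems
```

Calling find_mislabeled_images("MyBook.epub") (a made-up file name) returns a list of suspect images, each paired with the format its name claims and the format its contents suggest; the last few characters of each name are what you'd match against the error message in step 4.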
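Meatgrinder handles the missing "http://" and "mailto:" cases for Smashwords books, but for anyone building an EPUB elsewhere, the rule in those two items can be written as a small Python function. fix_href is a hypothetical helper built on a simplified assumption: a target starting with "www." is a web address, and a target containing "@" but no "/" is an email address. It only fixes a single link target; it does not edit your Word file or EPUB.

```python
def fix_href(href):
    """Add a missing 'http://' or 'mailto:' prefix to a hyperlink target (simplified heuristic)."""
    href = href.strip()
    lowered = href.lower()
    # Already has a scheme we recognize, or is an in-book #anchor link: leave it alone.
    if lowered.startswith(("http://", "https://", "mailto:", "#")):
        return href
    # Assumption: something with '@' and no '/' is a bare email address.
    if "@" in href and "/" not in href:
        return "mailto:" + href
    # Assumption: something starting with 'www.' is a bare web address.
    if lowered.startswith("www."):
        return "http://" + href
    # Anything else (for example, a relative link to another file) is returned unchanged.
    return href
```

For example, fix_href("www.website.com") gives "http://www.website.com", and fix_href("email@example.com") gives "mailto:email@example.com".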
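For the mimetype errors, the packaging rule can be illustrated in a few lines of Python using the standard zipfile module. This is a minimal sketch, not a replacement for ePub Zip or ePubPack: it assumes you already have the unpacked book in a folder (the folder layout and the pack_epub name are assumptions), and it writes the mimetype entry first and uncompressed before compressing everything else.

```python
import os
import zipfile

MIMETYPE = b"application/epub+zip"  # exact contents expected, with no trailing newline


def pack_epub(source_dir, epub_path):
    """Zip an unpacked EPUB folder with 'mimetype' first and stored uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as epub:
        # 1. 'mimetype' goes in first, with compression turned off (ZIP_STORED).
        #    A plain ZipInfo like this is written with no extra field in its ZIP header.
        info = zipfile.ZipInfo("mimetype")
        info.compress_type = zipfile.ZIP_STORED
        epub.writestr(info, MIMETYPE)

        # 2. Everything else may be compressed normally.
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # walk subfolders in a predictable order
            for filename in sorted(files):
                full_path = os.path.join(root, filename)
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                epub.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Because zipfile writes entries in the order you add them, writing mimetype before walking the folder is what puts it first in the archive.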
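For the two filename items, here is a minimal Python sketch using only the standard library. escape_reference shows the "some%20file.xhtml" escaping via urllib.parse.quote, and the hypothetical safe_filename produces a name limited to a-z, A-Z, 0-9, '-', '_', and one dot before the extension. Stripping accents through NFD decomposition is one common technique we're assuming here, not something EpubCheck or Apple prescribes. Also note that renaming a file inside an EPUB means updating every reference to it (OPF manifest, NCX, XHTML), which this sketch does not attempt.

```python
import re
import unicodedata
from urllib.parse import quote


def escape_reference(filename):
    """Percent-escape a file name for use in a reference ('some file.xhtml' -> 'some%20file.xhtml')."""
    return quote(filename)


def safe_filename(filename):
    """Return an ASCII-only name using a-z, A-Z, 0-9, '-', '_' and one dot before the extension."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = filename, ""
    # Decompose accented letters (ó -> o + combining accent), then drop anything non-ASCII.
    ascii_stem = unicodedata.normalize("NFD", stem).encode("ascii", "ignore").decode("ascii")
    # Spaces become underscores; any other disallowed character (including extra dots) becomes '-'.
    ascii_stem = ascii_stem.replace(" ", "_")
    ascii_stem = re.sub(r"[^A-Za-z0-9_-]", "-", ascii_stem)
    ascii_ext = re.sub(r"[^A-Za-z0-9]", "", ext)
    return f"{ascii_stem}.{ascii_ext}" if ascii_ext else ascii_stem
```

With this sketch, safe_filename("la designación.jpg") returns "la_designacion.jpg"; a name like "cover.v2.jpg" becomes "cover-v2.jpg" so that only one dot remains.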
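The NFC/NFD mismatch described above can be reproduced with Python's standard unicodedata module. In this minimal sketch (the file name and the names_match helper are only examples), two names that display identically compare as different strings until both are normalized to the same form, which is how a manifest reference can fail to match a file that seems to be right there.

```python
import unicodedata

# The same visible file name in two Unicode forms:
# NFC stores "ó" as one code point (U+00F3); NFD stores "o" followed by a combining accent (U+0301).
nfc_name = unicodedata.normalize("NFC", "la designaci\u00f3n.jpg")
nfd_name = unicodedata.normalize("NFD", "la designaci\u00f3n.jpg")

print(nfc_name == nfd_name)           # False, although both display identically
print(len(nfd_name) - len(nfc_name))  # 1: the NFD form has one extra code point


def names_match(a, b):
    """Compare two file names after normalizing both to NFC (one way a tool could tolerate the mismatch)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(names_match(nfc_name, nfd_name))  # True

# The "Å" example from the quote: ANGSTROM SIGN (U+212B) normalizes to U+00C5.
print(unicodedata.normalize("NFC", "\u212b") == "\u00c5")  # True
```

Renaming files to plain ASCII, as recommended above, sidesteps the problem entirely, because ASCII names are identical in every normalization form.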
HELP! Where's the Panic Button?
If the tips above didn't help you identify the problem, and you're running low on Chamomile, then the Nuclear Method is the most reliable solution. See Step 5 in the Style Guide, which I've excerpted below:
Step 5 - THE NUCLEAR METHOD (HOW TO PURGE HIDDEN CORRUPTION)
Your Microsoft Word document can become corrupted if it has been touched by multiple word processors, or if it originated in a program such as InDesign or WordPerfect, or if it originated in PDF and then was converted to Word, or if you imported your images incorrectly.
The Nuclear Method purges all your formatting and allows you to start with a fresh Word document, free of hidden formatting or corrupted styling. Most formatting professionals on Mark’s List (our list of low-cost Smashwords formatters; get the list by emailing list at smashwords.com) employ this method because it maximizes your odds of a good, clean multi-format conversion.
The Nuclear Method is also recommended if previous versions of your manuscript failed to convert, or if you’re struggling with EpubCheck errors, text boxes or tables you can’t find, or if you suspect your book is corrupted.
1. Make a backup of your manuscript (VERY IMPORTANT!) and set it aside in case the Nuclear Method fails you.
2. Copy and paste your entire manuscript into Windows Notepad (usually found in Programs: Accessories) or any other plain-text editor. This will strip out all your formatting. Here's how: "Select all" by typing CTRL+A (press the CTRL key, hold it down, then press the A key), then CTRL+C for “copy,” then paste into an empty Notepad document using the key combination CTRL+V.
3. Close Microsoft Word, but do not shut down your computer.
4. Reopen Microsoft Word so it’s showing a fresh empty document.
5. In Notepad, type CTRL+A to select all, then CTRL+C to copy (or use the Notepad menu options under "Edit"), and then paste into the empty Word document using either CTRL+V, Edit: Paste (in Word 2000 and 2003), or Home: Paste (Word 2007).
6. If it looks like an ugly unformatted blob of text, success!!! If not, repeat from Step 1.
7. Finally, reformat the book per the Style Guide, starting from the beginning of the Style Guide. If you don't have the time or patience to do this on your own, see "Mark's List."
8. Once your book is reformatted, click to your Smashwords Dashboard, then click "Upload new version" to upload your corrected manuscript. Once the conversion completes, you'll know you were successful if AutoVetter doesn't flag the book for an EpubCheck error. Congrats!
The Deep Cleanse
Alternatively, there is a method that can be run in Microsoft Word to clear out formatting problems, using the replace function. In all steps EXCEPT #4, replace all until 0 changes are made.
1. Remove spaces before and after paragraph returns. In the Find area, type "^p " (a paragraph return followed by a space) without the quotes. In the Replace area, put ^p on its own. Repeat this step by typing " ^p" in the Find area, without the quotes.
2. Remove all non-breaking spaces. Find: ^s -- Replace with nothing.
3. Replace all soft returns with hard returns. Find: ^l (that's a lowercase L) -- Replace: ^p
4. Replace all section breaks with page breaks. Find: ^12 -- Replace: ^12 (Yes, this is the same code, but it works. Unless you have no section/page breaks to start with, replacing all will never get to 0 changes made.) You only need to perform this step once.
5. Remove all manual tabs. Find: ^t -- Replace with nothing.
6. Save the .doc file as "Web Page, Filtered" and add "-fix" at the end of the filename (excluding the extension) to keep it distinct. This will cut out all excess formatting data that are not being used. Any images in the .doc file will be placed in a folder in the directory you've saved to.
7. Close the .doc file.
8. From the "Recent File" menu in Word, open the .html file. If you have "Show/Hide" activated, you should see a mess of degree symbols; these are non-breaking spaces.
9. Repeat Step 2 and save your file as a .doc. If you have no images in your book, you are done; otherwise, continue to Step 10.
10. Your images will appear to be present in the document, and this is sort of true: what you are seeing are links to your images. If you try to submit the document at this point, all the images will be replaced with black boxes with red X’s in the .epub conversion, because the images don’t exist in the submitted file. Close the document and reopen it. You can find images by searching for ^g
11. Press Ctrl+A to select all text, then press Alt+E followed by K on its own. Make sure all links are selected, check the “Save picture in document” checkbox, and click “OK”.
12. Save this file as a “Word 97-2003 Document” or .doc file (this should still be the file that ends in “-fix”). On average, a 20-30% drop in file size from the original .doc file is normal, though extremely bloated files have seen drops of 70% or more.
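The first five Deep Cleanse replacements and their "replace all until 0 changes are made" rule can be mimicked on plain text (for example, the Notepad copy from the Nuclear Method) with a minimal Python sketch. It only illustrates the repeat-until-no-changes idea and is not a substitute for running the steps in Word. The character mappings are assumptions: ^p is treated as a newline, ^s as U+00A0 (non-breaking space), ^l as U+000B (the character Word uses internally for a manual line break), and ^t as a tab. Section breaks (^12) have no plain-text equivalent, so that step is left out.

```python
# Each pair is (find, replace), mirroring the Word Find/Replace codes in the steps above.
# Assumed mappings: ^p -> "\n", ^s -> U+00A0, ^l -> U+000B, ^t -> "\t".
REPLACEMENTS = [
    ("\n ", "\n"),     # space after a paragraph return
    (" \n", "\n"),     # space before a paragraph return
    ("\u00a0", ""),    # non-breaking spaces, replaced with nothing as in the Word step
    ("\u000b", "\n"),  # soft (manual) line breaks -> paragraph returns
    ("\t", ""),        # manual tabs
]


def deep_cleanse_text(text):
    """Apply each replacement repeatedly until it finds 0 matches, like pressing 'Replace All' again."""
    for find, replace in REPLACEMENTS:
        while True:
            if text.count(find) == 0:
                break  # 0 changes made: move on to the next step
            text = text.replace(find, replace)
    return text
```

The inner loop matters for runs like a return followed by several spaces: each pass removes one space after the return, and the loop stops only when a pass would change nothing, just as the instructions above describe.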