The HTML5 cache manifest file

HTML5 introduced a “manifest file”, referenced through the manifest attribute of the <html> element. This file tells the browser what to cache, what not to cache, and a few other things we will see in this section.

Below is an example of how to include a manifest file (the recommended practice is to use the .appcache suffix):

  <html manifest="myCache.appcache">
  </html>

You must include the manifest attribute on every page for which you want to cache resources (images included in the page, JavaScript files, CSS files). The manifest file contains lines that dictate which image, JavaScript and CSS files need to be cached, which must never be cached, etc.

The page with the manifest is de facto included in the list of files that will be cached. This means that any page the user navigates to that includes a manifest will be implicitly added to the application cache.

The browser does not cache a page if it does not contain the manifest attribute. The default cache behavior, prior to HTML5, will be used (depending on the browser versions). If you want a Web site or a Web app to work offline, then please include a manifest in every HTML page!

The manifest attribute value is a relative path (relative to the page URL), but an absolute URL can be used (not recommended) if it respects the same origin policy.
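As a rough illustration of how a relative manifest path is resolved against the page URL, and of what the same-origin requirement means, here is a minimal Python sketch. The function name resolve_manifest and the example.com URLs are assumptions made for this example; the origin check is simplified to comparing scheme and host.

```python
from urllib.parse import urljoin, urlsplit

def resolve_manifest(page_url, manifest_attr):
    """Resolve the manifest attribute against the page URL.

    Returns the absolute manifest URL and whether it has the same
    origin as the page (simplified: same scheme and same host:port).
    """
    manifest_url = urljoin(page_url, manifest_attr)
    page, manifest = urlsplit(page_url), urlsplit(manifest_url)
    same_origin = (page.scheme, page.netloc) == (manifest.scheme, manifest.netloc)
    return manifest_url, same_origin
```

For a page at http://example.com/app/index.html, the value myCache.appcache resolves to http://example.com/app/myCache.appcache, which is on the same origin.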

THE MANIFEST FILE MUST BE SERVED WITH THE CORRECT MIME TYPE

The HTTP server that serves your files must be configured so that .appcache files are served with the MIME type text/cache-manifest. For example, with the Apache server, this line must be added to the httpd.conf configuration file (or to the .htaccess files):

  AddType text/cache-manifest .appcache
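If you only need a quick local server for testing, the same MIME type mapping can be sketched with Python's standard http.server module. This is a minimal sketch, not a production setup; the class name AppCacheHandler, the serve function and port 8000 are assumptions chosen for this example.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class AppCacheHandler(SimpleHTTPRequestHandler):
    # Copy the default extension map, then register the manifest suffix
    # so that .appcache files are sent as text/cache-manifest.
    extensions_map = dict(SimpleHTTPRequestHandler.extensions_map)
    extensions_map[".appcache"] = "text/cache-manifest"

def serve(port=8000):
    """Serve the current directory on localhost until interrupted."""
    HTTPServer(("localhost", port), AppCacheHandler).serve_forever()
```

Calling serve() from the directory that contains your pages makes them available at http://localhost:8000/.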

WHAT DO WE PUT IN THE MANIFEST FILE?

First example: cache an HTML page that displays the current time (clock). It uses three files: HTML, CSS and JavaScript. This example is taken from the W3C specification; an online version is also available.

The manifest file is:

  1. CACHE MANIFEST
  2. clock.html
  3. clock.css
  4. clock.js

Lines 2-4 show that a given HTML page, in which this manifest is included, asks the browser to cache three files: the HTML page itself (clock.html, cached by default, but the specification recommends adding it to the manifest as best practice), a CSS file clock.css and a JavaScript file clock.js.

Note that the first line “CACHE MANIFEST” is mandatory.
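A manifest this simple can also be generated rather than written by hand. The following Python sketch builds the text of a minimal manifest from a list of file names; build_manifest is a hypothetical helper introduced only for this illustration.

```python
def build_manifest(resources):
    """Return the text of a minimal manifest listing `resources`.

    The mandatory "CACHE MANIFEST" line always comes first; each
    resource then goes on its own line (implicit CACHE section).
    """
    lines = ["CACHE MANIFEST"] + list(resources)
    return "\n".join(lines) + "\n"

manifest_text = build_manifest(["clock.html", "clock.css", "clock.js"])
```

Here manifest_text contains the same four lines as the clock example above.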

PITFALLS TO AVOID WHEN USING THE HTML5 CACHE

PITFALL #1: When a file is available in the cache and on the remote HTTP server, it will always be retrieved from the cache! An upcoming section is dedicated to “updating the cache”, and we will explain how to control this and update the files in the cache.

PITFALL #2: If one file cannot be retrieved and cached, zero files will be updated in the cache. There is no such thing as “partial update”. A best practice is to always validate your manifest file using one of the tools listed at the end of this chapter.
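The all-or-nothing behavior of pitfall #2 can be pictured with a small Python model. This is only a sketch of the idea, not how any browser is implemented; update_cache and the fetch callback are hypothetical names, and a failed download is modeled as an OSError.

```python
def update_cache(old_cache, urls, fetch):
    """Return a new cache only if every URL can be fetched.

    `fetch` is a caller-supplied function that returns the content
    of a URL or raises OSError when the download fails.
    """
    new_cache = {}
    for url in urls:
        try:
            new_cache[url] = fetch(url)
        except OSError:
            # A single failure aborts the whole update:
            # the previous cache is kept untouched.
            return old_cache
    return new_cache
```

If one entry of the manifest points to a missing file, the function returns the old cache as-is, which mirrors the absence of partial updates.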

With the same example, Firefox asks whether you agree to cache some data. The prompt, translated from the French version, says: “This Web site would like to save data on your computer for offline use. Authorize? Never? Just for this time?”

Let’s have a look at another example of manifest.appcache (this one comes from the webdirections.org Web site), that does a little more:

  1. CACHE MANIFEST
  2. CACHE:
  3. #images
  4. /images/image1.png
  5. /images/image2.png
  6. #pages
  7. /pages/page1.html
  8. /pages/page2.html
  9. #CSS
  10. /style/style.css
  11. #scripts
  12. /js/script.js
  13. FALLBACK:
  14. / /offline.html
  15. NETWORK:
  16. login.html

This time, notice a few additional things:

    • It’s possible to add comments starting with #
    • There are three different sections in capital letters: CACHE, FALLBACK and NETWORK

These three sections are optional; we did not have them in the first example. Entries that appear before any section header belong to the CACHE section by default, which is why the first example needed no explicit section declarations. Once a section header appears, the entries that follow it belong to that section until the next header.
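To make the role of the section headers concrete, here is a minimal Python sketch of a manifest parser. It is a simplification written for this illustration (parse_manifest is a hypothetical name) and it ignores many details of the real format; it only shows comments being skipped, entries defaulting to CACHE, and FALLBACK lines holding two URLs.

```python
def parse_manifest(text):
    """Split manifest text into CACHE, NETWORK and FALLBACK entries."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "CACHE MANIFEST":
        raise ValueError("first line must be CACHE MANIFEST")
    sections = {"CACHE": [], "NETWORK": [], "FALLBACK": []}
    current = "CACHE"  # entries before any header default to CACHE
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(":") and line[:-1] in sections:
            current = line[:-1]  # switch to the named section
        elif current == "FALLBACK":
            namespace, fallback = line.split()
            sections[current].append((namespace, fallback))
        else:
            sections[current].append(line)
    return sections

example = """CACHE MANIFEST
# no header yet: this entry goes to CACHE
clock.html
NETWORK:
login.html
FALLBACK:
/ /offline.html
"""
parsed = parse_manifest(example)
```

In this sketch, parsed["CACHE"] holds clock.html even though no CACHE: header was written.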

The CACHE section specifies the URLs of the resources that must be cached (generally relative to the page, but they can also be absolute and external, for example for caching jQuery from a Google repository, etc.). These resources will be 1) cached when online, and 2) available from the cache when offline.

The NETWORK section is the opposite of the CACHE section: it specifies resources that should NOT be cached. These resources 1) will not be cached when online, and consequently 2) will not be available when the user is offline, even if the browser has stored them in its own “pre-HTML5” cache! In the previous example, at line 16, the login.html file (the one with the login/password form…) is never cached. Indeed, entering a login/password and pressing a “login/connect/signup” button is useless if you are offline.

Using the wildcard * in this section is also common practice; it means “do not cache any file that is not listed in the CACHE or FALLBACK section”:

  NETWORK:
  *

Partial URLs may also be used in this section, such as “/images”, which means that all URLs that begin with /images should not be cached. Notice that wildcards and partial URLs are not allowed in the CACHE section, where all individual files must be explicitly specified.
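The matching rule for NETWORK entries can be sketched in a few lines of Python. This sketch assumes that a partial URL is treated as a prefix and that * matches everything; needs_network is a hypothetical helper, not part of any browser API.

```python
def needs_network(path, network_entries):
    """Return True if `path` is covered by the NETWORK section.

    An entry matches when it is the wildcard "*" or when the
    path starts with it (prefix match, assumed here).
    """
    for entry in network_entries:
        if entry == "*" or path.startswith(entry):
            return True
    return False
```

With the entry “/images”, a request for /images/photo.png is covered, while a request for /js/script.js is not.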

The FALLBACK section specifies resources that will be displayed when the user requests a resource that is not available offline. For example, a login.html file must not be cached, nor be available when offline. In this case, accessing http://…/login.html will cause offline.html to be displayed (this file is always cached, because it is listed in the FALLBACK section). The “/ /offline.html” in the FALLBACK section of the example says that for any resource that is not available in the cache (here, “/” means “any resource”), show the offline.html page.

Partial URLs can be used too. For example:

  /images/ /images/missing.png

… tells us that any image under the “/images/” directory that is unavailable in the cache when the browser is offline will be replaced by the image named “missing.png”.
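The way FALLBACK entries come into play offline can be modeled with a short Python sketch. It assumes that when several partial URLs match, the longest one wins; resolve_offline and its arguments are hypothetical names used only for this illustration, and NETWORK entries are deliberately left out.

```python
def resolve_offline(path, cached, fallbacks):
    """Return the resource to display for `path` when offline.

    `cached` is a set of cached paths; `fallbacks` is a list of
    (partial URL, fallback resource) pairs from the FALLBACK section.
    """
    if path in cached:
        return path  # served directly from the cache
    matches = [(ns, fb) for ns, fb in fallbacks if path.startswith(ns)]
    if not matches:
        return None  # nothing to show for this request
    # Assumption: the most specific (longest) partial URL wins.
    return max(matches, key=lambda m: len(m[0]))[1]

cached = {"/images/image1.png"}
fallbacks = [("/", "/offline.html"), ("/images/", "/images/missing.png")]
```

With these values, an uncached image under /images/ resolves to /images/missing.png, and any other uncached page resolves to /offline.html.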
