Why caching is important
Browsers often save local copies of static assets to reduce load time and minimize the amount of data that must be transferred. This is called caching.
Retrieving data over the network takes longer than reading it from local storage, because a connection to a remote server is typically slower than local access. Caching therefore reduces load times, and not re-downloading unchanged data also reduces network traffic.
How does browser caching work?
Case 1: The user has never visited the site
In this case, the browser has nothing cached yet, so it downloads everything from the server.
Below is a screenshot of the resources that were downloaded when we first visited the Wiki home page. The status bar at the bottom shows that 265 KB was transferred to the browser.
Case 2: The user has visited the site before
We can see the difference when we refresh the Wiki homepage:
The amount of data transferred dropped to 928 bytes, roughly 0.3% of the original amount. The Size column indicates that most of the data was taken from the cache.
Chrome retrieves files from both the memory cache and the disk cache. Since we haven't closed the browser window since case 1, the data is still in the memory cache.
Show cache in the browser
In Chrome, we can open chrome://cache to view the cache contents. It lists links to pages that show the details of each cached entry.
How does the browser know which file to retrieve from the cache?
The browser checks the headers of the server's HTTP response to decide whether to use the cached copy or download the content again. Four headers are commonly used for caching: ETag, Cache-Control, Expires, and Last-Modified.
An ETag (or Entity Tag) is a string used as a cache validation token. It is usually a hash of the file contents.
The server can add the ETag header to the HTTP response. When the cached copy has expired, the browser sends this value back in a later request (in the If-None-Match request header) to check whether the file content has changed. Because the value comes from a hash function, even a small change to the file contents produces a different ETag string.
If the hash string is unchanged, meaning the resource content has not changed, the server returns status code 304 (Not Modified) with an empty body. This tells the browser that its cached copy can still be used.
Note that the ETag is only sent in requests once the cached copy has expired.
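The validation flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names make_etag and respond are hypothetical, and truncating a SHA-256 digest is one arbitrary way to derive the tag, not something a particular server is known to do.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from a hash of the file contents (assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a request that may carry an ETag."""
    etag = make_etag(body)
    if if_none_match == etag:
        # Content unchanged: answer 304 with an empty body so the
        # browser keeps using its cached copy.
        return 304, {"ETag": etag}, b""
    # First visit, or the content changed: send the full file again.
    return 200, {"ETag": etag}, body
```

Calling respond(b"hello", None) yields a 200 with the full body; repeating the call with the returned ETag yields a 304 with an empty payload, and any change to the body makes the tags differ so the full file is sent again.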
The Cache-Control header has a number of values we can use to control the behavior, expiration, and validation of the cache. These values can be combined.
public means the resource can be cached by any type of cache (browser, CDN, …).
private means only the browser is allowed to cache the content.
no-store means the content must not be cached and must always be downloaded from the server.
no-cache is the most misleading value: it does not mean “do not cache”. This value tells the browser to cache the file but to use it only after validating with the server that it is still the latest version. The validation is done with the ETag header.
This value is often used with HTML files because browsers often have to check for the latest markup.
max-age sets how long the file may be cached. The value after the = sign is in seconds, so max-age=60 caches the file for 1 minute (60 seconds). The RFC recommends that this value not exceed 1 year (max-age=31536000).
In addition, for caching on a CDN, we can set s-maxage, which works like max-age but applies only to shared caches.
must-revalidate requires the browser to validate an expired cached copy with the server (using ETag) instead of reusing it.
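To make the Cache-Control values above more concrete, here is a minimal Python sketch that splits a header value into its directives and summarizes the resulting policy. The helper names parse_cache_control and describe are hypothetical, and the parser is deliberately simplified: it assumes directives are separated by plain commas and does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. no-cache) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def describe(value: str) -> str:
    """Summarize what a browser may do with a response (simplified)."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "do not store"
    if "no-cache" in d:
        return "store, but revalidate before every use"
    if d.get("max-age") is not None:
        return f"store and reuse for {int(d['max-age'])} seconds"
    return "no explicit policy"
```

For example, describe("public, max-age=60") reports that the response can be stored and reused for 60 seconds, while describe("no-cache") reports that the copy is stored but must be revalidated before each use.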
Expires is a header from HTTP/1.0, but many pages still use it. It provides an expiration date for a file, after which the cached copy is considered stale.
Expires: Wed, 25 Jul 2018 21:00:00 GMT
Note that the browser ignores this header if Cache-Control max-age is specified.
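The precedence between max-age and Expires can be illustrated with a short Python sketch. The function name freshness_lifetime is hypothetical, and the sketch assumes the headers arrive as a plain dict with exactly the capitalization shown; real header lookup is case-insensitive.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str]) -> Optional[int]:
    """Seconds a response may be reused; max-age wins over Expires."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            # max-age is present, so Expires is ignored entirely.
            return int(arg)
    if "Expires" in headers and "Date" in headers:
        # Fall back to the HTTP/1.0 style: Expires minus the response Date.
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None  # no explicit freshness information
```

With only Date and Expires set one hour apart, the function returns 3600; once Cache-Control: max-age=60 is added to the same headers, it returns 60 and the Expires value no longer matters.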
Last-Modified is also a header from HTTP/1.0. It records the last time the file was modified:
Last-Modified: Mon, 12 Dec 2016 14:45:00 GMT
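Last-Modified can also be used for validation, in the same spirit as ETag. The sketch below assumes the browser sends the stored date back in the standard If-Modified-Since request header, which the text above does not cover; the function name is_not_modified is hypothetical.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def is_not_modified(last_modified: str, if_modified_since: Optional[str]) -> bool:
    """True when the server could answer 304 for a time-based validation."""
    if if_modified_since is None:
        # The browser has no cached copy to validate.
        return False
    # The file is unchanged if it was last modified at or before
    # the time the browser already knows about.
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
```

If the browser sends back the same date it received, the check passes and the server can reply with 304; if the file has been modified since then, the check fails and the full file is sent.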
HTML Meta Tag
Before HTML5, HTML meta tags were also commonly used to set Cache-Control:
<meta http-equiv="Cache-control" content="no-cache">
However, using meta tags like this is no longer allowed. Only the browser can read a meta tag and cache the data accordingly; intermediate caches never see it.
So always use HTTP headers for caching.
Take a look at the following HTTP response example:
Date: Tue, 25 Jul 2017 17:26:16 GMT
Expires: Tue, 25 Jul 2017 18:26:16 GMT
Keep-Alive: timeout=5, max=93
Last-Modified: Wed, 12 Jul 2017 17:26:05 GMT
- Line 2 tells us that the max-age is 1 hour
- Line 5 indicates that the file in question is a png image
- Line 7 sends an ETag so that the browser can check whether the file has changed once the 1 hour has passed
- Line 8 will be ignored due to the use of Cache-Control max-age
- Line 10 shows the last time the file was modified