Tiny struggles

Always hacking something 🧑‍🔬.

Browser caching with Django & Webpack

What is browser caching and why is it useful?

Fetching stuff from the internet can be a lot of work and take a long time. What if your browser could save itself all this work and return you the result semi-instantly?

It just needs to save a file locally and return it to you the next time you ask for it. This is called browser HTTP caching. Safe, read-only requests, like “GET”-ting a page or a static file, are the ones that usually get cached.

This is great as long as the file at that address doesn’t change. The catch is that it often does, especially if your site is under active development.

When things are just not working the way they are supposed to or styles are off, it’s very likely that the problem is unintended caching.
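A toy model makes the failure mode concrete. In the Python sketch below, a dictionary keyed by URL stands in for the browser cache and a hypothetical `server_files` dictionary stands in for your server. Real browsers also check response headers such as Cache-Control before reusing a stored copy, so this is a simplification, but it shows how an edit on the server can stay invisible:

```python
# Hypothetical server state: URL -> current file content.
server_files = {"/static/base.css": "body { color: black; }"}

# Toy "browser cache": remembers responses by URL only and never expires them.
browser_cache: dict[str, str] = {}


def fetch(url: str) -> str:
    """Return the cached copy if this URL was fetched before, otherwise download it."""
    if url not in browser_cache:
        browser_cache[url] = server_files[url]
    return browser_cache[url]


first = fetch("/static/base.css")

# The developer edits the stylesheet on the server...
server_files["/static/base.css"] = "body { color: red; }"

# ...but the same URL is answered from the cache.
second = fetch("/static/base.css")
print(second)  # body { color: black; }
```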

Crude approaches

Two popular crude approaches are:

  • disabling caching
  • manually changing the filenames after every edit

Disable all caching!

One simple but crude approach is to disable caching.

If caching is the source of my problem, let’s disable it!

In your browser

This is a good approach if you are not sure what your problem is and want to quickly check whether caching is to blame.

Modern browsers have an option to disable the cache in their developer tools:

Disable cache

After you disable caching in the options, reload the page to get fresh content. In Chrome and Firefox this setting only applies while the developer tools are open.

This is one of the “works on my machine” 🤦 types of solutions and you can’t expect your users to clear or disable their cache just because your app doesn’t handle caching well.

We need something better!

On the server side

The good news is that the browser will do what you tell it to do. When your server sends back a response, it can set response headers such as Cache-Control to control how the browser caches it.

If you use nginx (a web server and reverse proxy often put in front of Django), you can disable caching there.

If you follow Django’s static files convention, an nginx config that disables caching would be:

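server {

    location /static {
        # Where collectstatic put the files (STATIC_ROOT in settings.py).
        alias /var/www/mysite.com/static;

        # kill cache
        add_header Last-Modified $date_gmt;             # report every file as modified right now
        add_header Cache-Control 'no-store, no-cache';  # don't store responses; revalidate before any reuse
        if_modified_since off;                          # never answer with 304 Not Modified
        expires off;                                    # don't add Expires / max-age headers
        etag off;                                       # don't send ETag validators
    }
    ...
}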
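The same header can be set anywhere you build responses, not just in nginx. As a minimal sketch using only Python’s standard library (a quick local file server, not something to deploy), a subclass of http.server’s SimpleHTTPRequestHandler can add Cache-Control: no-store to every response it sends. `NoCacheHandler` and `make_server` are hypothetical names:

```python
import functools
import http.server


class NoCacheHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that tells browsers not to store any response."""

    def end_headers(self) -> None:
        # Runs just before the header block is closed, so every response
        # (files, directory listings, errors) gets the header.
        self.send_header("Cache-Control", "no-store")
        super().end_headers()


def make_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build a server that serves `directory` on 127.0.0.1:`port`."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword argument (Python 3.7+).
    handler = functools.partial(NoCacheHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To try it, call make_server("static").serve_forever() and open http://127.0.0.1:8000/ in your browser; every file comes back with the no-store header, which has the same drawback as the nginx config.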

This approach works, but it has a disadvantage: files that don’t change, and could be cached, aren’t cached either, so you miss out on speed-ups.

Change the filename/import every time you edit

If files are cached by URL, why not change the URL whenever the file changes?

Renaming the file every time you edit it would be tedious, and we have version control precisely so that we don’t have to create a new file every time we make changes.

But there is a related technique: changing the URL by adding a versioning query parameter such as “?version=XXX”. If you are just serving static files, without any custom logic, your server ignores the extra parameter, but the browser treats the result as a new URL.

So instead of:

<script src="/static/my_script.js"></script>
<link rel="stylesheet" href="/static/base.css">

you would write:

<script src="/static/my_script.js?version=123"></script>
<link rel="stylesheet" href="/static/base.css?version=123">

In the case of Django, with the static template tag:

Instead of:

<link rel="stylesheet" href="{% static 'base.css' %}">

you would write:

<link rel="stylesheet" href="{% static 'base.css' %}?version=123">
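The trick works because the server and the browser look at different parts of the URL. A small stdlib sketch (the `versioned` helper is hypothetical, not a Django API) adds or replaces the parameter and checks both views:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned(url: str, version: str) -> str:
    """Return `url` with its `version` query parameter added or replaced."""
    parts = urlsplit(url)
    # Keep any other query parameters, drop an old version value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "version"]
    query.append(("version", version))
    return urlunsplit(parts._replace(query=urlencode(query)))


old = versioned("/static/base.css", "122")
new = versioned("/static/base.css", "123")

# A plain static file server only looks at the path, so both map to the same file...
print(urlsplit(old).path == urlsplit(new).path)  # True
# ...but the browser caches by the full URL, so the new one is a cache miss.
print(old != new)  # True
print(new)  # /static/base.css?version=123
```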

With versioned URLs you get finer control than by just disabling the cache, because the things that can be cached stay cached. But it’s still not a great solution.

Adding a version parameter to the URL is a better approach than renaming the file every time you edit it, but it’s still manual and error-prone: it’s easy to forget to bump the version. Why not automate it?

Content hashes in filenames

One good way to update the file name whenever the file changes is to include a hash of the file’s content in its name.

Computing these hashes on every request would be wasteful, so it’s usually done once, during a build process.
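The idea fits in a few lines. This Python sketch (the `hashed_name` helper is hypothetical) puts a short content hash between the file’s stem and its extension. Django’s ManifestStaticFilesStorage produces names of a similar base.<hash>.css shape; the MD5 digest and the 12-character length here are simply choices for the sketch, used as a fingerprint rather than for security:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(name: str, content: bytes, length: int = 12) -> str:
    """Return `name` with a short content hash inserted before the extension."""
    digest = hashlib.md5(content).hexdigest()[:length]  # fingerprint, not security
    path = PurePosixPath(name)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = hashed_name("css/base.css", b"body { color: black; }")
v2 = hashed_name("css/base.css", b"body { color: red; }")

print(v1 != v2)  # True: changed content gets a new URL
print(v1 == hashed_name("css/base.css", b"body { color: black; }"))  # True: same content, same URL
```

Because the name only changes when the bytes change, files named this way can be cached for a very long time.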

If you are using Django together with React (or Vue, or…) so that Django serves a page that then loads a JavaScript bundle built with webpack, your build has two phases:

  • webpack build to build your JS assets
  • Django collectstatic to do any post-processing and copy the files into the static location (STATIC_ROOT).
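Because the second phase works on the output of the first, the two have to run in order. A minimal build script sketch in Python, assuming webpack is run through npx and manage.py sits in the current directory (`BUILD_STEPS` and `build` are hypothetical names):

```python
import subprocess
import sys

# Hypothetical build script: the two phases, in order.
BUILD_STEPS = (
    # 1. webpack writes the JS bundles into ./static
    ("npx", "webpack", "--mode", "production"),
    # 2. collectstatic copies static files into STATIC_ROOT without prompting
    (sys.executable, "manage.py", "collectstatic", "--noinput"),
)


def build(steps: tuple[tuple[str, ...], ...] = BUILD_STEPS) -> None:
    """Run each command in turn and stop at the first one that fails."""
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so collectstatic never runs after a failed webpack build.
        subprocess.run(cmd, check=True)
```

Calling build() from the project root runs both phases.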

Typically, your JS source code lives in some directory, e.g. one called assets, and when your JavaScript bundle is compiled, it is written to the /static directory, the conventional location for Django’s JS and CSS files.

You can read about how to set up Django with JS like this here.

Let Django do it for you

It turns out Django can automatically add hashes to your static files’ names whenever their content changes.

Just set your static files storage to ManifestStaticFilesStorage in your Django settings.py file. When you run collectstatic, ManifestStaticFilesStorage computes a hash of each file’s content, saves a copy of the file with the hash in its name, and records the original-to-hashed mapping in a manifest. The static template tag then resolves paths to the hashed names, and references inside CSS files (such as url() and @import) are rewritten to match.
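In settings.py that might look like the sketch below. The STORAGES setting is how Django 4.2 and later select the storage; older versions used STATICFILES_STORAGE instead. The STATIC_ROOT value is an assumption chosen to match the nginx alias in the example above:

```python
# settings.py (excerpt), a sketch.

STATIC_URL = "/static/"

# Where collectstatic copies (and, with the manifest storage, hashes) the files.
# Assumed to be the directory nginx serves as /static.
STATIC_ROOT = "/var/www/mysite.com/static"

# Django 4.2+: once STORAGES is set, define both the default and staticfiles entries.
STORAGES = {
    "default": {
        "BACKEND": "django.core.files.storage.FileSystemStorage",
    },
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.ManifestStaticFilesStorage",
    },
}

# Before Django 4.2, the equivalent was:
# STATICFILES_STORAGE = "django.contrib.staticfiles.storage.ManifestStaticFilesStorage"
```

Run python manage.py collectstatic after changing this, since the hashed copies and the manifest (staticfiles.json) are only created then. With DEBUG = True, the static tag keeps returning the unhashed names.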

Nice!

Well, that works, unless you use code splitting and hashing with webpack. Unless your JS app is really small, you shouldn’t ship it all as a single file, and this is where the trouble with the Django manifest solution starts.

Let webpack do it for you

If you are still reading, you are probably curious how to manage things on the JavaScript side when your bundle consists of many files.

Well, webpack has support for caching by generating file names with content hashes too.

But once you have a bundle with a hash in its name, how do you integrate it with Django?

A Google search suggests using django-webpack-loader.

Best practices

This tutorial describes in detail how to set it up.

This solution requires a custom plugin on the JavaScript side and a custom package and app on the Django side.

Quite a setup to just load a bunch of correct files. I’m always skeptical when it comes to taking on new dependencies.

The django-webpack-loader solution is not native to webpack, and it broke in 2018 when webpack changed some of its defaults. It is also rather fragile: it depends on reading a webpack stats file, which is only produced as a byproduct of a webpack build if you configure it.

While reading the Webpack caching documentation I came up with a simpler solution that doesn’t require any new dependencies.

Webpack & Django integration without any additional packages

Loading a bundle with a hash in its name is a common problem, not specific to Django, and webpack now has a well-supported solution for it, described in its own guides: the HtmlWebpackPlugin, which generates an HTML file that loads your bundle.

An HTML file, just like Django templates…

Can webpack generate a Django template? Of course it can! If you don’t need anything Django-specific, you don’t even have to customize it, but if you do, it’s easy to specify a “template” file for webpack to use.

Here is how it can be set up:

const path = require('path');
const HtmlWebpackPlugin = require('html-webpack-plugin');

module.exports = {
    plugins: [
        new HtmlWebpackPlugin({
            // This is the template webpack will use to generate the HTML file.
            template: 'templates/index.tmpl.html',
            // This is where the generated file will end up (relative to the `static` directory).
            filename: '../templates/index.webpack.html',
        })
    ],
    entry: './assets/index.js', // Path to our input file.
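    output: {
        filename: '[name].[contenthash].index-bundle.js', // Output bundle file name.
        path: path.resolve(__dirname, './static'), // Path to our Django `static` directory.
        // URL prefix for the generated <script> tags and lazily loaded chunks;
        // it should match Django's STATIC_URL.
        publicPath: '/static/',
    },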
    // Code Splitting and bundle optimizations.
    optimization: {
        usedExports: true,
        moduleIds: 'deterministic',
        runtimeChunk: 'single',
        splitChunks: {
            cacheGroups: {
                vendor: {
                    test: /[\\/]node_modules[\\/]/,
                    name: 'vendors',
                    chunks: 'all',
                },
            },
        },
    },
    // Other webpack configuration...
};

Contents of templates/index.tmpl.html:

{% load static %}
<!DOCTYPE html>
<html lang="en">

<head>
    <title>My page</title>

    <!-- Webpack imports with auto generated bundle names that avoid caching if data changes. -->

    <!-- My other things... -->

    {% block more_head %}
    {% endblock %}
</head>

<body>
    {% block content %}
    {% endblock %}

</body>

</html>

Now when webpack builds your JavaScript assets, it places its output in the /static directory and also creates an additional file in your templates directory. In the example above, it generates templates/index.webpack.html, which loads the relevant bundle entry points (one or more). If the templates directory is listed in the DIRS option of your TEMPLATES setting, your other Django templates can build on it with {% extends "index.webpack.html" %} and fill in its more_head and content blocks.

This solution might look a bit complicated, but it’s actually pretty straightforward: there are just two build steps, and webpack writes files into two directories, /static and /templates.
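Conceptually, the plugin takes your template plus the hashed file names produced by the current build and writes the template back out with tags pointing at those files. The toy Python function below only illustrates that idea; it is not how HtmlWebpackPlugin is implemented, and the bundle names and the `/static/` prefix are hypothetical:

```python
def inject_scripts(template: str, bundles: list[str], public_path: str = "/static/") -> str:
    """Insert one deferred <script> tag per bundle right before </head>."""
    tags = "".join(
        f'    <script defer src="{public_path}{name}"></script>\n' for name in bundles
    )
    head_end = template.index("</head>")  # raises ValueError if there is no </head>
    return template[:head_end] + tags + template[head_end:]


template = (
    "{% load static %}\n<html>\n<head>\n    <title>My page</title>\n</head>\n"
    "<body>{% block content %}{% endblock %}</body>\n</html>\n"
)
bundles = [
    "runtime.3f2a1c.index-bundle.js",  # hypothetical hashed names from one build
    "vendors.9b8e7d.index-bundle.js",
    "main.c4d5e6.index-bundle.js",
]
print(inject_scripts(template, bundles))
```

The Django tags pass through untouched, which is why the output is still a valid Django template; on the next build only the hashed names in the script tags change.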

Combine Django and webpack file caching management

You might have figured out that if both webpack and Django add content hashes, webpack’s files get hashed twice, and that might break lazy module imports in your JavaScript app.

The easiest way around that is either to stop using the Django solution or to make it ignore the webpack-generated files.

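Probably the simplest way is to change how you call collectstatic. In the webpack configuration example above, all the files generated by webpack have the index-bundle.js suffix, so you can ignore them (quoting the pattern so your shell doesn’t expand it):

python manage.py collectstatic --ignore '*.index-bundle.js'

Keep in mind that --ignore skips matching files entirely, so they are not copied into STATIC_ROOT either; make sure the bundles still end up in the directory your web server serves as /static.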
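To double-check which files a pattern would skip, you can try it against a list of names. collectstatic treats ignore patterns as glob-style patterns, so this sketch uses Python’s fnmatch.fnmatchcase as an approximation (it is not Django’s code, and the file names are hypothetical):

```python
from fnmatch import fnmatchcase

IGNORE_PATTERNS = ("*.index-bundle.js",)


def is_ignored(filename: str, patterns: tuple[str, ...] = IGNORE_PATTERNS) -> bool:
    """Return True if the file name matches any glob-style ignore pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


static_files = [
    "main.c4d5e6.index-bundle.js",     # hypothetical webpack outputs
    "vendors.9b8e7d.index-bundle.js",
    "css/base.css",                    # regular Django static files
    "js/admin.js",
]
for name in static_files:
    print(name, "->", "ignored" if is_ignored(name) else "collected")
```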

Conclusion & resources

So now we have multiple solutions for always loading the fresh version of a file when it changes. And all the files that don’t change can now be cached for a very long time, speeding up our site ✨.

I hope you enjoyed this article, please follow me on Twitter where I talk more about tech, Python, Django and building software!

If you are hungry for more, here is a list of resources I refer to in this article you can read as well:
