Base64: Encoding Binary Data

Base64 is a binary-to-text encoding scheme that takes 8-bit binary data three bytes (24 bits) at a time and splits each group into four 6-bit values, each of which is written as one ASCII character. This is useful for transferring data through environments that are restricted to ASCII, or for avoiding accidentally triggering control characters. It is used to create Data URLs, which allow media (such as images) or other binary assets to be embedded in textual HTML, XML, and CSS files. Earlier forms of SMTP supported only 7-bit ASCII, so Base64 was used to transfer attachments.

Data encoded with Base64 grows by about 33 percent, since every 3 bytes become 4 characters. Line breaks every 76 characters, as MIME requires, add a further 2 to 3 percent or so.
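To make the arithmetic concrete, here is a minimal sketch of the size calculation. The function names b64_encoded_len and b64_wrapped_len are my own for illustration and are not part of the implementation further down; the line-ending length is a parameter because a bare newline costs one byte while MIME's CRLF costs two.

```c
#include <assert.h>
#include <stddef.h>

/* Characters produced for n input bytes, padding included:
   4 characters per 3-byte group, rounding the group count up. */
size_t b64_encoded_len(size_t n) {
    return (n + 2) / 3 * 4;
}

/* Total size when a line break of eol_len bytes (1 for "\n", 2 for "\r\n")
   is inserted after every `width` characters. A trailing line break at the
   very end of the output is not counted. */
size_t b64_wrapped_len(size_t n, size_t width, size_t eol_len) {
    assert(width > 0);
    size_t chars = b64_encoded_len(n);
    size_t breaks = chars == 0 ? 0 : (chars - 1) / width;
    return chars + breaks * eol_len;
}
```

For 3000 bytes of input, b64_encoded_len gives 4000 characters (33 percent more), and b64_wrapped_len(3000, 76, 2) gives 4104 bytes once CRLF breaks are counted (just under 37 percent more than the original).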

If the data being encoded does not divide evenly into 3-byte chunks, the final chunk is filled out with zero bits, and equal signs are commonly used to pad the output to a multiple of four characters. The 64 characters used for Base64 are, in order from 0–63: A–Z, a–z, 0–9, then two symbols that vary. RFC 4648 specifies those two symbols as the plus sign, then the forward slash.
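As a small worked example of the padding rule, the sketch below encodes a single group of one to three bytes. The name encode_group and its signature are assumptions made for this illustration; it is separate from the full program that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode one group of n bytes (1 <= n <= 3) into 4 characters plus a NUL.
   Characters that would carry no input bits are replaced with '='. */
void encode_group(const unsigned char *in, size_t n, char out[5]) {
    assert(n >= 1 && n <= 3);
    unsigned char b[3] = {0, 0, 0};   /* missing bytes count as zero bits */
    memcpy(b, in, n);

    out[0] = ALPHABET[b[0] >> 2];                       /* top 6 bits of byte 0 */
    out[1] = ALPHABET[(b[0] & 0x03) << 4 | b[1] >> 4];  /* 2 bits + 4 bits */
    out[2] = n > 1 ? ALPHABET[(b[1] & 0x0f) << 2 | b[2] >> 6] : '=';
    out[3] = n > 2 ? ALPHABET[b[2] & 0x3f] : '=';
    out[4] = '\0';
}
```

With this sketch, the three bytes of "Man" become TWFu, the two bytes of "Ma" become TWE=, and the single byte "M" becomes TQ==.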

Here is my implementation of a Base64 encoder in C:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

const int WRAP_AT = 76;
const char *B64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

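/* Split 3 input bytes into four 6-bit indexes into B64_CHARS. */
void encode_block(char buf[3], char out[4]) {
    /* Work on unsigned bytes: right-shifting a negative char is
       implementation-defined. */
    const unsigned char *b = (const unsigned char *)buf;

    out[0] = B64_CHARS[b[0] >> 2];
    out[1] = B64_CHARS[(b[0] & 0x03) << 4 | b[1] >> 4];
    out[2] = B64_CHARS[(b[1] & 0x0f) << 2 | b[2] >> 6];
    out[3] = B64_CHARS[b[2] & 0x3f];
}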

void putchar_wrap(char c, int *read) {
    if(*read >= WRAP_AT) {
        putchar('\n');
        *read = 0;
    }
    putchar(c);
    ++*read;
}

void encode_to_stdout(FILE *fp) {
    int read = 0;
    char buf[3], out[4];

    while(1) {
        /* read */
        for(int i=0; i<3; ++i) {
            int c;
            if((c=fgetc(fp)) == EOF) {
                if(ferror(fp)) {
                    perror("fgetc");
                    exit(EXIT_FAILURE);
                }

                /* last few bits: encode, pad, and finish. */
                if(i != 0) {
                    memset(buf+i, 0, 3-i);
                    encode_block(buf, out);

                    for(int j=0; j<4; ++j) {
                        putchar_wrap(j < i+1 ? out[j] : '=', &read);
                    }
                }
                putchar('\n');
                goto finish;
            }
            buf[i] = c;
        }

        /* encode */
        encode_block(buf, out);
        for(int i=0; i<4; ++i) {
            putchar_wrap(out[i], &read);
        }
    }
finish:
    return; /* before C23, a label must be followed by a statement */
}

int main(int argc, char **argv) {
    if(argc == 1) {
        if(!freopen(NULL, "rb", stdin)) {
            perror("freopen");
            exit(EXIT_FAILURE);
        }
        encode_to_stdout(stdin);
    }

    for(int i=1; i<argc; ++i) {
        FILE *fp = fopen(argv[i], "rb");
        if(!fp) {
            perror("fopen");
            exit(EXIT_FAILURE);
        }
        encode_to_stdout(fp);
        fclose(fp);
    }
}

With no arguments, the program encodes whatever is fed to it on stdin. Otherwise, each command-line argument is treated as a file name, and each file is read and encoded in turn. The program uses RFC 4648's Base64 alphabet and breaks lines every 76 characters.

For example, here's what my program outputs for a compressed 256x256 WebP version of my website's logo:

$ ./a.out favicon.webp 
UklGRpgKAABXRUJQVlA4WAoAAAASAAAA/wAA/wAAQU5JTQYAAAD/////AABBTk1GbAoAAAQAABYA
AOwAAKIAAAAAAAJBTFBI1QkAAAGwh23bObnZM7O2N2w33SK2bRu1tXVj2zanKWJMUkZHGzv7pdqN
WcRZWzPzfEh/7/M+7/v7zX/fERETAE84VU4AiQFONbeA7uujynVlBKOaJf+fRey4r06lKfHwxFfj
YmT4jnAc/MvD7M5hx1DtPTKy4bOjHSeymWT97JyT1CoCWPondkpauv+GS1rxBeecpFbhYLbxrd+a
vTOlkCz31+0zXm8eCwoG1uo30nHwbw/Fle/mvdO2HJh6B7KnQfGqmQQOsIC6ir6K4rt8vRa/gyh+
JgS8ljUofjEavJbhKH4nAbyWbi6xnHqg1cjyiYmJ0VFR1qBGNgqXdAYd+jzR5cPFey6lF+A/e9Kv
nNr9xYSXW1QwsdjrKOx5FZQP6Dj9QC7KzDq86LXavmYUeArFh4LiUW/uykeWRcmrkxoGmIttI4rP
A6V9ev9QgqxLf/tiRJ+qfmYxAcW32lUKH3ML1Sy7tucZMxjgFjsUAOrGTslAhRubQIN8FE6NBGXL
LcpHpavpr8ItFL5ZHlS1v5aOilfUXlAyCqdXBVUbnkXasutHtzlWrXY4tnx37MJ9SWG6s21D4cIW
oGjIchcS5nz7YaNAMB5cu9/UH7Op3HbdzUJhVz9QtNp5FC/d2dcfiO3NZpwnyQHNPe8R8rwNir6Q
h8L5Cx8DuY3Wlojd1lyLYhSeCGoGrEBh9xcVQX7CHqFLeqvyAIUdoGbYYRS+3AJY2k6KJGstLBWF
d/mqEZ2MwquDgOlCkQM6s/+AwmdCQMnyqSha+BqwfUnkW50tQeGrcaBkwjUUzWoNfKuJrNfYWyh8
...

The ellipsis above isn't output from my program; it just stands in for three more pages' worth of gibberish. That gibberish can be appended to a Data URL. Data URLs begin with data:, then a MIME type indicating the type of data, then an optional ;base64 token to signal that the data is Base64-encoded, and conclude with a comma followed by the data.

For example, this is the Data URL of my logo: ....

It's in WebP format, so it has a MIME type of image/webp. The data is the output of my program, minus the newlines, so the optional ;base64 token is present.
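Putting those pieces together can be sketched roughly as follows. The helper make_data_url is a hypothetical name of my own, not something the encoder above provides; it assumes the Base64 text is already in memory as a NUL-terminated string and simply drops the line breaks while copying.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build "data:<mime>;base64,<payload>" from a MIME type and Base64 text,
   skipping any '\n' or '\r' in the payload. Returns a malloc'd string that
   the caller must free, or NULL if allocation fails. */
char *make_data_url(const char *mime, const char *b64) {
    assert(mime != NULL && b64 != NULL);

    /* Upper bound: the payload can only shrink once line breaks are removed. */
    size_t cap = strlen("data:") + strlen(mime) + strlen(";base64,")
               + strlen(b64) + 1;
    char *url = malloc(cap);
    if (!url) return NULL;

    strcpy(url, "data:");
    strcat(url, mime);
    strcat(url, ";base64,");

    size_t len = strlen(url);
    for (const char *s = b64; *s; ++s) {
        if (*s != '\n' && *s != '\r') url[len++] = *s;
    }
    url[len] = '\0';
    return url;
}
```

Calling make_data_url("image/webp", text), where text holds the program's output, would yield a string starting with data:image/webp;base64,UklGRpgK and continuing with the rest of the encoded logo on a single line.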

Try clicking on the link; in most browsers you should see my logo, a big K followed by a small C. Originally I encoded the SVG version of my logo, built a Data URL from it, and used that as a demonstration, but it turns out Mozilla Firefox (and Chrome also, I think) refuses to open any Data URL with MIME type image/svg+xml because of security concerns.