The [fine manual][1] has this to say:
> **encode64(bin)**
> Returns the Base64-encoded version of bin. This method complies with RFC 2045.

Section 6.8 of [RFC 2045][2] says:
> **6.8. Base64 Content-Transfer-Encoding**
>
> The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable. [...]
>
> A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)

So Base64 encodes *bytes* into ASCII. If those bytes represent a UTF-8 encoded string, the string is broken down into its individual bytes and those bytes are converted to Base64. For example, the UTF-8 string `'µ'` is encoded as the bytes `0xc2` and `0xb5` (in that order), whose Base64 representation is `"wrU=\n"`. If you start with the binary string `"\xc2\xb5"` (which just happens to match the UTF-8 version of `'µ'`), you get the same `"wrU=\n"` output.
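
A quick way to check this, as a minimal sketch assuming Ruby 1.9 or later with the standard `base64` library available (the variable names are only illustrative):

```ruby
require "base64"

text   = "\u00B5"      # the character µ as a UTF-8 string
binary = "\xC2\xB5".b  # the same two bytes, tagged as a binary string

# Both strings hold exactly the same bytes.
p text.bytes.map { |b| format("0x%02x", b) }  # ["0xc2", "0xb5"]

encoded_text   = Base64.encode64(text)
encoded_binary = Base64.encode64(binary)

p encoded_text    # "wrU=\n"
p encoded_binary  # "wrU=\n"
```

The encoder never looks at the string's encoding tag, only at its bytes, so the two inputs are indistinguishable to it.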

When you decode `"wrU=\n"`, you get the bytes `"\xc2\xb5"` back, and you have to know that those bytes are supposed to be UTF-8 encoded text rather than some arbitrary blob of bits. This is why separate content type and character set metadata are attached to the Base64 data.
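
The decoding side can be sketched the same way. This assumes you already know, from some out-of-band metadata, that the payload is UTF-8 text; `force_encoding` only relabels the bytes and does not convert them:

```ruby
require "base64"

decoded = Base64.decode64("wrU=\n")

# The decoder hands back raw bytes tagged as binary, not as text.
p decoded.encoding == Encoding::BINARY  # true
p decoded.bytes == [0xc2, 0xb5]         # true

# Relabel the bytes as UTF-8 because we were told that is what they are.
text = decoded.dup.force_encoding(Encoding::UTF_8)

p text.valid_encoding?  # true
p text == "\u00B5"      # true
```

If the metadata had said something other than UTF-8, the same two bytes would have to be interpreted differently; Base64 itself carries no hint either way.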

Similarly, if you have a UTF-16 string, it will be broken into bytes and those bytes will be encoded just like any other byte string. Of course, this case is a little more complicated due to byte order issues, but that's why we have content type and character set headers and BOMs.
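
To illustrate the byte order point, here is a small sketch (again assuming the standard `base64` library) that encodes the same character in both UTF-16 byte orders, without a BOM:

```ruby
require "base64"

mu = "\u00B5"  # µ as UTF-8

little = mu.encode(Encoding::UTF_16LE)  # bytes: 0xb5 0x00
big    = mu.encode(Encoding::UTF_16BE)  # bytes: 0x00 0xb5

encoded_little = Base64.encode64(little)
encoded_big    = Base64.encode64(big)

p encoded_little  # "tQA=\n"
p encoded_big     # "ALU=\n"

# Getting the text back requires knowing which byte order was used.
round_trip = Base64.decode64(encoded_little)
                   .force_encoding(Encoding::UTF_16LE)
                   .encode(Encoding::UTF_8)
p round_trip == mu  # true
```

The same character yields different Base64 output depending on the byte order, which is exactly the information the headers or a BOM have to supply.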

The main point is that Base64 works with *bytes*, not characters. What format those bytes are in (UTF-8 text, UTF-16 text, a PNG image, ...) is someone else's problem. Base64 just converts a byte stream to a subset of US-ASCII and then back to bytes; the format of those bytes must be specified separately.

---

I did some poking around in the source and the results might be of interest even if they're not completely relevant. The [`encode64` method][3] is simply this:

    def encode64(bin)
      [bin].pack("m")
    end

Then if you look through [`Array#pack`][4]:

    static VALUE
    pack_pack(VALUE ary, VALUE fmt)
    {
        /*...*/
        int enc_info = 1; /* 0 - BINARY, 1 - US-ASCII, 2 - UTF-8 */

and keep an eye on `enc_info`, you'll see that an `'m'` format leaves `enc_info` alone, so the packed string comes out as US-ASCII and `encode64` produces US-ASCII output as expected.
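
You can confirm this from Ruby itself without reading the C source. The following sketch calls `Array#pack` directly, just as the `encode64` definition quoted above does, and inspects the encoding of the result:

```ruby
# Pack a UTF-8 string with the "m" (Base64) directive.
packed = ["\u00B5"].pack("m")

p packed                                  # "wrU=\n"
p packed.encoding == Encoding::US_ASCII   # true
p packed.ascii_only?                      # true
```

So even though the input was tagged as UTF-8, the output is tagged as US-ASCII.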
[1]:
[To see links please register here]
[2]:
[To see links please register here]
[3]:
[To see links please register here]
[4]:
[To see links please register here]