Multiple Escape Character Encoding

MECE improves on the encoding efficiency of quoted-printable and percent encoding by using 5 escape characters instead of 1 and a base 52 character set instead of hexadecimal. Since 5 × 52 = 260, two symbols are sufficient to represent all 256 characters in Unicode page zero. For evenly distributed binary data, the encoded output is on average 1.758 times the original size, compared to 1.333 times for base 64 encoding.
MECE is a generic encoding scheme and thus does not have a predefined set of reserved characters. The encoder decides which characters need to be escaped depending on the intended usage, while the decoder is universal and passes any unescaped characters through unmodified. For example, only the characters defined in RFC 3986 section 2.2 need to be escaped when encoding a URL string. The example implementation at the bottom of this page encodes everything except alphanumerics, spaces and underscores.
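
Because the choice of reserved characters is left to the encoder, it can be modelled as a predicate supplied by the caller. The following Python sketch shows two such predicates: one for the RFC 3986 reserved set and one matching the policy attributed to the example implementation. The names url_policy and conservative_policy are hypothetical, and "alphanumerics" is assumed to mean ASCII letters and digits.

```python
import string

# Reserved characters from RFC 3986 section 2.2 (gen-delims plus sub-delims).
RFC3986_RESERVED = frozenset(":/?#[]@!$&'()*+,;=")

# Characters left unescaped by the policy the text attributes to the example
# implementation: alphanumerics, space and underscore (ASCII assumed).
PLAIN = frozenset(string.ascii_letters + string.digits + " _")


def url_policy(ch: str) -> bool:
    """Return True if ch should be escaped when encoding a URL string."""
    return ch in RFC3986_RESERVED


def conservative_policy(ch: str) -> bool:
    """Return True for everything except alphanumerics, space and underscore."""
    return ch not in PLAIN
```

Neither predicate covers escape characters that occur in the data itself; that rule depends on the preceding symbol and is independent of the chosen policy.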

Base 52 is defined as "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP". MECE never uses any non-alphanumeric characters, unlike base 64, where the last two characters have spawned multiple variants due to availability issues on some systems.

When encoding, the escape character is placed after the base 52 data (the reverse of the order used by quoted-printable). This avoids having to escape a commonly occurring sequence: the first word of a sentence, where an uppercase first character is followed by a lowercase character. If an escape character (one of "VWXYZ") in the original data is preceded by a base 52 character or one of "QRSTU", it must be escaped. Otherwise, it may be left intact to conserve space and preserve readability.
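
The rule for escape characters that occur in the original data can be sketched in Python as follows. The function name must_escape is hypothetical, and "preceded by" is assumed to refer to the last symbol already written to the encoded output.

```python
BASE52 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP"
BASE5 = "QRSTU"
ESCAPES = "VWXYZ"

# Symbols that would make a following escape character look like a sequence.
TRIGGERS = frozenset(BASE52 + BASE5)


def must_escape(prev_symbol: str, ch: str) -> bool:
    """Return True if the data character ch has to be escaped.

    prev_symbol is the last symbol already written to the output,
    or "" at the start of the stream (assumed interpretation).
    """
    return ch in ESCAPES and prev_symbol in TRIGGERS
```

Under this reading, the "W" in " We" stays intact because it follows a space, while the "W" in "aW" has to be escaped.
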
The higher order digits always appear first in the encoded sequence. For example, without the offset described below, 63 = 12 × 5 + 3 is encoded as "cY". Characters above Unicode page zero are represented with 4 symbol sequences. For example, 2919 = ((2 × 52 + 12) × 5 + 3) × 5 + 4 encodes to "2cTZ". The third symbol is a base 5 digit drawn from "QRSTU", which does not overlap with the base 52 characters. Characters beyond the BMP are encoded as surrogate pairs.
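
A minimal Python sketch of the 4 symbol form, written to reproduce the 2919 example. The function names are hypothetical, and it is assumed that no offset is applied to 4 symbol sequences, since the example works out without one.

```python
BASE52 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP"
BASE5 = "QRSTU"
ESCAPES = "VWXYZ"


def encode_unit(unit: int) -> str:
    """Encode one 16-bit code unit as a 4 symbol sequence."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("expected a 16-bit code unit")
    rest, esc = divmod(unit, 5)      # lowest digit: escape character
    rest, third = divmod(rest, 5)    # next digit: one of "QRSTU"
    hi, lo = divmod(rest, 52)        # two base 52 digits, higher order first
    return BASE52[hi] + BASE52[lo] + BASE5[third] + ESCAPES[esc]


def encode_code_point(cp: int) -> str:
    """Encode a code point, splitting it into a surrogate pair beyond the BMP."""
    if cp < 0x10000:
        return encode_unit(cp)
    cp -= 0x10000
    return encode_unit(0xD800 + (cp >> 10)) + encode_unit(0xDC00 + (cp & 0x3FF))
```

With these definitions, encode_unit(2919) yields "2cTZ", and a character beyond the BMP takes 8 symbols.
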
To further improve readability, 4 is added to the integer value of the escaped character before it is encoded. This maps each escape character to itself plus a base 52 prefix. For example, "X" is mapped to "iX" instead of the "hY" that would result without adding 4.
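
The effect of the offset can be checked with a short Python sketch of the 2 symbol form. The function name encode_page_zero is hypothetical; the alphabets and the offset are taken from the text.

```python
BASE52 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP"
ESCAPES = "VWXYZ"


def encode_page_zero(value: int) -> str:
    """Encode a value in 0..255 as a base 52 digit followed by an escape character."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value in Unicode page zero")
    digit, esc = divmod(value + 4, 5)  # offset of 4, higher order digit first
    return BASE52[digit] + ESCAPES[esc]


# Every escape character maps to itself with the prefix "i".
for ch in ESCAPES:
    assert encode_page_zero(ord(ch)) == "i" + ch
```

The largest value, 255, becomes 259 after the offset and uses the last of the 260 available combinations.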

When decoding, a buffer needs to hold up to 3 previous symbols before they are released to the decoded data stream. When an escape character is seen, the previous symbol is examined. If the previous symbol is neither a base 52 character nor one of "QRSTU", the previous and current symbols are treated as raw data. If the previous symbol is a base 52 character, it is removed from the buffer and forms a 2 symbol sequence with the current symbol, representing a character in Unicode page zero. If it is one of "QRSTU", the current symbol ends a 4 symbol sequence, which either represents a character in the Unicode BMP or must be discarded if its first two symbols are not base 52 characters. Discarding an invalid 4 symbol sequence allows the insertion of a soft line break when needed, a presence signature (indicating that the text has been encoded), and other implementation specific sidebands.
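
The decoding procedure can be sketched in Python as below. This is an illustration under stated assumptions rather than the reference decoder: the function name decode is hypothetical, an invalid 4 symbol sequence is assumed to be dropped together with its buffered symbols, and 2 symbol sequences that fall below the offset of 4 are rejected because the text does not say how to treat them.

```python
BASE52 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP"
BASE5 = "QRSTU"
ESCAPES = "VWXYZ"


def decode(text: str) -> str:
    out = []  # symbols and decoded characters already released
    buf = []  # at most 3 pending symbols that could still start a sequence
    for ch in text:
        if ch not in ESCAPES:
            buf.append(ch)
            if len(buf) > 3:
                out.append(buf.pop(0))  # too old to belong to any sequence
            continue
        esc = ESCAPES.index(ch)
        prev = buf[-1] if buf else None
        if prev is not None and prev in BASE52:
            # 2 symbol sequence: release older symbols, undo the offset of 4.
            buf.pop()
            out.extend(buf)
            value = BASE52.index(prev) * 5 + esc - 4
            if value < 0:
                raise ValueError("sequence below the offset (not covered by the text)")
            out.append(chr(value))
        elif prev is not None and prev in BASE5:
            # 4 symbol sequence: valid only if two base 52 symbols come first.
            if len(buf) == 3 and buf[0] in BASE52 and buf[1] in BASE52:
                hi, lo = BASE52.index(buf[0]), BASE52.index(buf[1])
                value = ((hi * 52 + lo) * 5 + BASE5.index(prev)) * 5 + esc
                out.append(chr(value))
            # Otherwise the buffered symbols are dropped (assumed reading).
        else:
            # Escape character used as raw data.
            out.extend(buf)
            out.append(ch)
        buf.clear()
    out.extend(buf)
    # Recombine UTF-16 surrogate pairs produced by 4 symbol sequences.
    joined = "".join(out)
    return joined.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")
```

In this sketch a line break followed by "QV" acts as a soft line break: the four symbols form an invalid sequence and are removed from the output.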

Estimated sizes of encoded data for each Unicode range:

Unicode lower bound     00    *     80    100   800   10000
Unicode upper bound     7f    7f    ff    7ff   ffff  10ffff
UTF8 Quoted Printable   1     3     6     6     9     12
MECE case sensitive     1     2     2     4     4     8
MECE case insensitive   1     2     4     4     4     8
UTF8 Base64             1⅓    1⅓    2⅔    2⅔    4     5⅓
UTF9 Base64                               3     3

* Reserved characters

Example Implementation

Case insensitive systems require a different scheme. All digits are used as escape characters, while the letters of the alphabet form the base 26 data. To map digits back to themselves, 2 is added to the integer value before encoding. Non-ASCII characters in the BMP are represented by 4 symbol sequences in which the third symbol is a base 13 digit between N and Z.
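
A minimal Python sketch of the 2 symbol form of the case insensitive scheme. The text does not state the ordering of the base 26 alphabet or the digit order, so alphabetical order and higher order digit first are assumed here by analogy with the case sensitive scheme; the function name encode_page_zero_ci is hypothetical.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"  # base 26 data (ordering is an assumption)
DIGITS = "0123456789"                   # the 10 escape characters


def encode_page_zero_ci(value: int) -> str:
    """Encode a value in 0..255 as a letter followed by a digit."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value in Unicode page zero")
    letter, digit = divmod(value + 2, 10)  # offset of 2, higher order digit first
    return LETTERS[letter] + DIGITS[digit]


# Every digit maps to itself behind a letter prefix.
for d in DIGITS:
    assert encode_page_zero_ci(ord(d)).endswith(d)
```

As in the case sensitive scheme, 10 × 26 = 260 combinations are available, and the offset shifts the 256 values to the range 2 to 257.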