The Base3z encoding method encodes binary data as sequences of Basic Multilingual Plane (BMP) private-use code points, as defined in the Unicode® Standard. The encoded data can be embedded in a stream of UTF-8 or UTF-16 code units and later decoded to recover the original binary data. Each encoded datum carries its type and encoded length, which speeds up parsing and searching. The type system includes elements for creating structured data and supports application-defined extensions.

How It Works

A block of 4096 code points, U+E000 through U+EFFF, carries the type-tagged binary data. Each 12-bit group of information is OR'd with the constant 0xE000 to form one of these code points. Decoding recovers the original 12 bits from each code point by a bitwise AND with the constant 0x0FFF. The Base3z encoding of unsigned integers is illustrated below:

Type-Size   Type-Tag   Value                 Code Point Sequence
16 bits     C0         1234H                 EC01, E234
32 bits     2          12345678H             E212, E345, E678
64 bits     C4         123456789ABCDEF0H     EC41, E234, E567, E89A, EBCD, EEF0
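
The OR and AND steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bitLab implementation: the function names are hypothetical, the type tag is assumed to be concatenated in front of the value with the most significant bits first (which is what the rows above suggest), and the total bit count is assumed to be a multiple of 12. The specification's actual tag widths and any padding rules are not covered here.

```python
BASE = 0xE000  # first code point of the U+E000..U+EFFF block
MASK = 0x0FFF  # the 12 data bits carried by each code point


def encode_bits(value: int, bit_length: int) -> str:
    """Split a bit string (held in an int) into 12-bit groups, most significant first."""
    if bit_length % 12 != 0:
        raise ValueError("this sketch only handles multiples of 12 bits")
    if value < 0 or value >> bit_length:
        raise ValueError("value does not fit in bit_length bits")
    shifts = range(bit_length - 12, -1, -12)
    # OR each 12-bit group with 0xE000 to obtain a code point.
    return "".join(chr(BASE | ((value >> s) & MASK)) for s in shifts)


def decode_bits(text: str) -> int:
    """Recover the original bit string by ANDing each code point with 0x0FFF."""
    value = 0
    for ch in text:
        cp = ord(ch)
        if not 0xE000 <= cp <= 0xEFFF:
            raise ValueError(f"U+{cp:04X} is outside the encoding block")
        value = (value << 12) | (cp & MASK)
    return value


def encode_tagged(tag: int, tag_bits: int, value: int, value_bits: int) -> str:
    """Hypothetical helper: place the type tag ahead of the value, then encode."""
    return encode_bits((tag << value_bits) | value, tag_bits + value_bits)


# The 16-bit row of the table: tag C0, value 1234H.
encoded = encode_tagged(0xC0, 8, 0x1234, 16)
print([f"{ord(c):04X}" for c in encoded])  # ['EC01', 'E234']
print(f"{decode_bits(encoded):06X}")       # C01234
```

Under these assumptions the same helper reproduces the 32-bit and 64-bit rows as well, using a 4-bit tag of 2 and an 8-bit tag of C4 respectively.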

This encoding method requires minimal processing for both encoding and decoding. Because each 16-bit UTF-16 code unit carries 12 bits of data, storage efficiency is bounded at 75% in UTF-16. UTF-16 encode and decode operations each take less than 1 nanosecond per code point when running well-optimized C code on a 2.5 GHz Intel® processor core.
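
The 75% figure is 12 data bits divided by the 16 bits of a UTF-16 code unit. The short Python check below measures it directly; the function name is hypothetical, and the UTF-8 comparison is an addition based on the fact that code points in U+E000..U+EFFF occupy three bytes each in UTF-8. Note that this counts tag bits as payload, so it gives the upper limit rather than the efficiency for the value alone.

```python
def storage_efficiency(encoded: str, encoding: str) -> float:
    """Data bits carried (12 per code point) divided by bits actually stored."""
    payload_bits = 12 * len(encoded)
    stored_bits = 8 * len(encoded.encode(encoding))
    return payload_bits / stored_bits


sample = "\uEC01\uE234"  # the 16-bit row of the table above
print(storage_efficiency(sample, "utf-16-le"))  # 0.75
print(storage_efficiency(sample, "utf-8"))      # 0.5
```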

Base3z Specification

The Base3z Encoding Specification can be viewed and downloaded as a PDF file: Base3z Encoding Model A.1.

Product Support

Encoding software source code and custom engineering services are available from bitLab customer support.

License Program

The Base3z license program offers models for both commercial and non-commercial uses of the technology.

Intel is a registered trademark of Intel Corporation.
Unicode is a registered trademark of Unicode, Inc.


Copyright © 2011 bitLab, LLC. All rights reserved.