In this day and age, you don’t usually need to think about saving memory. Need a whole number? Just use an int! Who really cares if the maximum value will never get above 255? Using a byte is just archaic!
Well, if you’re looking at optimizing network traffic or database usage, those bytes could add up. For example, say you have a number that you know (based on business requirements, of course) will never get larger than 255. Say, a day of the week (expressed as a number, 0 – 6). If you stored it as a 32-bit integer, you’re basically guaranteed 3 bytes of wasted space.
Now consider hundreds of thousands of records.
If you have 200,000 records x 3 bytes, now you’re talking about 600000 bytes, or ~600 KB. That could make a sizeable difference in an application that is relying on network traffic (after all, the typical maximum transmission unit is only 1500 bytes.
Pack it up
A day of the week (0 – 6) can really be expressed in 3 bits:
Day | Binary | Decimal |
Sunday | 000 | 0 |
Monday | 001 | 1 |
Tuesday | 010 | 2 |
Wednesday | 011 | 3 |
Thursday | 100 | 4 |
Friday | 101 | 5 |
Saturday | 110 | 6 |
Hot Damn! A byte has 8 bits, so we could even put TWO days into a single byte. That would double our storage density!
For simplicity, we can just let each day take up four bits so that the byte is neatly bisected (but you could apply the same principle to any division of the 8 bits).
Bit Shifting and Bitwise operations
This works by shifting the bits and doing some bitwise arithmetic to the byte in question.
>> shifts all of the bits to the right. Any new bits that are shifted in are 0s. In a little-endian systems (most Intel CPUs), this means the integer 5 is stored as 00000101. If you shift it 4 bits to the right (5 >> 4), you end up with 0 because the bits are pushed outside the bounds of the byte. Conversely, if you have the integer 80 (01010000) and shift it four bits to the right, you end up with 5 (00000101).
<< shifts all of the bits to the left. As above, except the bits move in the other direction. This means the integer 5 (00000101) shifted 4 bits to the left will give you 80 (01010000).
& performs a bitwise AND on the byte. Think of each bit as a true or false, and then remember your truth tables for an AND –
true | false | |
true | true | false |
false | false | false |
And then apply it to the bits of a byte. So the integer 87 (01010111) AND’d with the integer 15 (00001111) gets rid of all of the higher order bits and you end up with just 7 (00000111).
0 1 0 1 0 1 1 1
0 0 0 0 1 1 1 1 &
0 0 0 0 0 1 1 1
| performs a bitwise OR on the byte. Similar to the &, but any bit that is turned on in either byte will result in the bit being turned on in the end. So the integer 87 (01010111) and the integer 15 (00001111) will become 95 (01011111).
We can use these together to essentially split the byte into two separate parts.
By right shifting the byte by 4 bits, we can get the first four bits (the high numbers), and set them again by shifting the byte to the left by 4 bits.
By AND-ing the number by 15, we can get only the lower number bits (the first four will be ignored). Setting the lower bits is easy, since the number is already low.
I’ve found that I’ve had to do this more than once, so I figured I’d leave this here for myself to find later.
Michael West
Cool. Thanks for sharing. Should Saturday be 110?
Robert
Yep, you’re right! I’ll fix it