How Computers Store Data: Binary, Bits, and Bytes
Why This Matters (Everything Is Just Numbers)
Here’s something that blew my mind when I first learned it: every single thing your computer does—displaying this text, playing videos, running your code—is built on top of billions of tiny switches that can only be “on” or “off.” That’s it. No magic. Just on and off.
Understanding how computers represent data isn’t just trivia. It’s the key to understanding why an integer can overflow, why floating-point math is weird, why Unicode exists, and why you can’t just “add more memory” to fix every performance problem. Once you get this, entire categories of bugs suddenly make sense.
Binary: The Language of Computers
Computers don’t understand decimal numbers (0-9) like we do. They only understand two states: on or off, which we represent as 1 and 0.
This is called binary (base-2), as opposed to our normal decimal (base-10) system.
How Decimal Works (A Quick Refresher)
In decimal, each position represents a power of 10:
1      2      5
10²    10¹    10⁰
(100)  (10)   (1)

125 = (1 × 100) + (2 × 10) + (5 × 1)
How Binary Works
In binary, each position represents a power of 2:
1    0    1    1
2³   2²   2¹   2⁰
(8)  (4)  (2)  (1)

1011 = (1 × 8) + (0 × 4) + (1 × 2) + (1 × 1) = 11 in decimal
Examples:
Binary  Decimal
0       0
1       1
10      2
11      3
100     4
101     5
110     6
111     7
1000    8
1111    15
Why Binary?
Because it’s easy to build physical switches that have two states:
- Electrical charge present or absent
- Magnetic field pointing north or south
- Light on or off (fiber optics)
- Voltage high or low
Trying to build a switch with 10 different states would be way harder and less reliable.
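If you want to see the place-value arithmetic from earlier in running code, here is a minimal Python sketch. The helper name `binary_to_decimal` is made up for illustration; `int(text, 2)`, `bin()`, and `format()` are the built-ins that do the same job in practice.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each digit times its power of 2, reading right to left."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * (2 ** position)
    return total

print(binary_to_decimal("1011"))  # 11
print(int("1011", 2))             # 11, using the built-in parser
print(bin(11))                    # 0b1011
print(format(11, "08b"))          # 00001011, padded to one byte
```

The hand-written loop is only there to mirror the table; in real code the built-ins are the better choice.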
Bits and Bytes: The Building Blocks
A bit (binary digit) is a single 0 or 1. It’s the smallest unit of data a computer can work with.
A byte is 8 bits grouped together. This is the standard unit for measuring data.
What Can You Store in One Byte?
With 8 bits, you can represent 2⁸ = 256 different values (0-255).
One byte (8 bits):
00000000 = 0
00000001 = 1
00000010 = 2
...
11111111 = 255
Common byte-based units:
- 1 kilobyte (KB) = 1,024 bytes
- 1 megabyte (MB) = 1,024 KB = 1,048,576 bytes
- 1 gigabyte (GB) = 1,024 MB = ~1 billion bytes
- 1 terabyte (TB) = 1,024 GB = ~1 trillion bytes
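(Yes, it’s 1,024 not 1,000 here because computers work in powers of 2: 2¹⁰ = 1,024. Strictly speaking, the 1,024-based units are named KiB, MiB, GiB, and TiB, and drive manufacturers count in multiples of 1,000, which is why a “1 TB” disk shows up as about 931 GB.)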
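To make the unit arithmetic concrete, here is a small sketch that turns a raw byte count into the 1,024-based units listed above. `format_bytes` is a hypothetical helper written for this article, not a standard library function.

```python
KIB = 2 ** 10  # 1,024 bytes

def format_bytes(count: int) -> str:
    """Render a byte count using 1,024-based units (hypothetical helper)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value = float(count)
    for unit in units:
        # Stop at the first unit that keeps the number below 1,024,
        # or at the largest unit we know about.
        if value < KIB or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= KIB

print(1024 ** 3)                # 1073741824, the "~1 billion bytes" in 1 GB
print(format_bytes(1536))       # 1.5 KB
print(format_bytes(1024 ** 3))  # 1.0 GB
```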
How Numbers Are Stored
Integers (Whole Numbers)
The simplest way to store a number is just to convert it to binary.
Example: The number 42
42 in decimal = 00101010 in binary

Breaking it down:
32 + 8 + 2 = 42
(2⁵ + 2³ + 2¹)
Signed Integers (Positive and Negative)
How do you store negative numbers when you only have 1s and 0s?
Computers use a system called two’s complement. The leftmost bit indicates the sign:
- If the first bit is 0, the number is positive
- If the first bit is 1, the number is negative
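The sign bit is only half the story: in two’s complement you negate a number by flipping every bit and adding 1, so +1 (00000001) becomes 11111110 + 1 = 11111111, which is -1. Here is a rough sketch of encoding and decoding 8-bit values in Python. The names `encode` and `decode` are my own, and the `& 0xFF` mask is one common way to get the bit pattern, not the only one.

```python
BITS = 8

def encode(value: int) -> str:
    """Return the 8-bit two's complement pattern for value (-128..127)."""
    if not -(1 << (BITS - 1)) <= value < (1 << (BITS - 1)):
        raise ValueError("value does not fit in 8 signed bits")
    return format(value & 0xFF, "08b")  # masking maps negatives to 128..255

def decode(bits: str) -> int:
    """Interpret an 8-character bit string as a signed integer."""
    raw = int(bits, 2)
    return raw - (1 << BITS) if raw >= (1 << (BITS - 1)) else raw

print(encode(-1))          # 11111111
print(encode(-128))        # 10000000
print(decode("01111111"))  # 127
print(decode("11111110"))  # -2
```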
Example with 8-bit signed integers:
01111111 = +127 (largest positive)
00000001 = +1
00000000 = 0
11111111 = -1
10000000 = -128 (most negative)
Integer Overflow: Why Numbers Wrap Around
Here’s where it gets interesting. If you use an 8-bit signed integer, the range is -128 to +127. What happens if you add 1 to 127?
127 in binary:   01111111
+ 1:            +00000001
                ---------
Result:          10000000 = -128 (!!)
It wraps around! This is called integer overflow, and it’s a real bug in production code.
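Python’s own integers never wrap, so to watch this happen without any extra libraries you have to simulate the 8-bit behavior yourself. The sketch below does that, and also shows the defensive alternative of checking the range before trusting the result. Both function names are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127

def wrapping_add_int8(a: int, b: int) -> int:
    """Add two values and keep only the low 8 bits, as fixed-width hardware would."""
    raw = (a + b) & 0xFF
    return raw - 256 if raw > INT8_MAX else raw

def checked_add_int8(a: int, b: int) -> int:
    """Add two values but refuse to return a result that would not fit."""
    result = a + b
    if not INT8_MIN <= result <= INT8_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 8 signed bits")
    return result

print(wrapping_add_int8(127, 1))    # -128
print(wrapping_add_int8(-128, -1))  # 127
try:
    checked_add_int8(127, 1)
except OverflowError as error:
    print(error)                    # 127 + 1 does not fit in 8 signed bits
```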
In Java:
byte smallNum = 127;
smallNum++; // Now it's -128 (overflow!)
System.out.println(smallNum); // Prints: -128
In Python:
# Python integers have unlimited precision, so they don't overflow
big_num = 127
big_num += 1
print(big_num) # Prints: 128 (no problem)

# But if you use NumPy with fixed-size integers:
import numpy as np
small_num = np.int8(127)
small_num += 1 # Recent NumPy versions also emit a RuntimeWarning here
print(small_num) # Prints: -128 (overflow!)
Floating-Point Numbers (Decimals)
Decimal numbers like 3.14 or -0.001 are stored using a format called IEEE 754 floating-point.
It works like scientific notation:
3.14 = 314 × 10⁻²
     = (mantissa) × (base) ^ (exponent)
In binary:
A float is stored in three parts:
- Sign bit (1 bit): positive or negative
- Exponent (8 bits for 32-bit float): the power
- Mantissa (23 bits): the significant digits
Floating-Point Weirdness
Because of how floats are stored, some decimal numbers can’t be represented exactly in binary.
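You can pull those three parts out of a real 32-bit float with Python’s `struct` module. This is a sketch under two assumptions the text above doesn’t spell out but that are part of IEEE 754: the exponent is stored with a bias of 127, and normal numbers have an implicit leading 1 in front of the mantissa bits. `float32_fields` is a name I made up.

```python
import struct

def float32_fields(value: float) -> tuple[int, int, int]:
    """Split a number, stored as a 32-bit float, into sign, exponent, and mantissa bits."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF   # stored with a bias of 127
    mantissa = raw & 0x7FFFFF       # 23 bits after the implicit leading 1
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(3.14)
print(sign, exponent - 127, format(mantissa, "023b"))

# Rebuild the value from its parts (this formula covers normal numbers only).
rebuilt = (-1) ** sign * (1 + mantissa / 2 ** 23) * 2 ** (exponent - 127)
print(rebuilt)  # close to 3.14, but not exactly equal
```

The rebuilt value is the nearest number a 32-bit float can hold, which is the root of the weirdness in the rest of this section.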
The classic example:
# Python
print(0.1 + 0.2) # Prints: 0.30000000000000004 (not exactly 0.3!)

// Java
System.out.println(0.1 + 0.2); // Prints: 0.30000000000000004
This isn’t a bug; it’s how floating-point math works. Never compare floats for exact equality.
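If you’re curious what actually gets stored when you type 0.1, Python’s `decimal` module can show you. Passing a float to `Decimal` converts the stored binary value exactly, so this small sketch prints the number the hardware is really working with.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, digit for digit.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

print(Decimal(0.1) == Decimal("0.1"))  # False: the float is slightly too large
print(0.1 + 0.2 == 0.3)                # False
print(0.5 + 0.25 == 0.75)              # True: sums of powers of 2 are exact
```

Fractions like 0.5 and 0.25 are powers of 2, so they survive the trip into binary intact. 0.1 is not, in the same way that 1/3 has no finite decimal expansion.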
The right way:
# Python - check if close enough
def almost_equal(a, b, tolerance=1e-9):
    return abs(a - b) < tolerance

print(almost_equal(0.1 + 0.2, 0.3)) # True

// Java - use a tolerance
double result = 0.1 + 0.2;
double expected = 0.3;
boolean close = Math.abs(result - expected) < 0.0001;
System.out.println(close); // true
How Text Is Stored
Computers don’t understand letters. They only understand numbers. So how do we store text?
ASCII: The Original Character Encoding
ASCII (American Standard Code for Information Interchange) assigns a number to each character.
Character  ASCII Code (decimal)  Binary
'A'        65                    01000001
'B'        66                    01000010
'a'        97                    01100001
'0'        48                    00110000
' '        32                    00100000 (space)
'\n'       10                    00001010 (newline)
ASCII uses 7 bits (0-127), fitting nicely in one byte.
Example: The word “Hi”
'H' = 72 = 01001000
'i' = 105 = 01101001

"Hi" in binary: 01001000 01101001
In Python:
# Get ASCII code of a character
print(ord('A')) # Prints: 65
print(ord('a')) # Prints: 97

# Convert ASCII code back to character
print(chr(65)) # Prints: A
print(chr(97)) # Prints: a
In Java:
// Get ASCII code of a character
char letter = 'A';
int code = (int) letter;
System.out.println(code); // Prints: 65

// Convert ASCII code back to character
char fromCode = (char) 65;
System.out.println(fromCode); // Prints: A
The Problem with ASCII
ASCII only has 128 characters. What about:
- Accented letters (é, ñ, ü)?
- Non-Latin alphabets (Greek, Arabic, Chinese)?
- Emojis (😀, 🎉)?
ASCII can’t handle any of this.
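You can watch ASCII give up in Python by asking it to encode a string with an accented letter. This is just a sketch; `first_non_ascii` is a hypothetical helper that reports the first character that doesn’t fit.

```python
def first_non_ascii(text: str):
    """Return the first character ASCII cannot encode, or None if all fit."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as error:
        return text[error.start]  # error.start is the index of the offender
    return None

print(first_non_ascii("Cafe"))  # None
print(first_non_ascii("Café"))  # é
print("Café".isascii())         # False
```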
Unicode: The Universal Character Set
Unicode assigns a unique number (called a code point) to every character in every writing system in the world.
Character  Unicode Code Point
'A'        U+0041
'é'        U+00E9
'中'       U+4E2D (Chinese)
'😀'       U+1F600 (emoji)
Unicode has over 140,000 characters and counting.
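The table above is easy to reproduce: Python’s `ord()` returns a character’s code point, and `chr()` goes the other way. The `code_point` formatter is a small helper I’m assuming for illustration.

```python
def code_point(character: str) -> str:
    """Format a single character's code point in the usual U+XXXX notation."""
    return f"U+{ord(character):04X}"

for character in ["A", "é", "中", "😀"]:
    print(character, code_point(character), ord(character))

print(chr(0x4E2D))  # 中
```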
UTF-8: How Unicode Is Stored
UTF-8 is a variable-length way to store Unicode code points efficiently. It uses:
- 1 byte for ASCII characters (U+0000 to U+007F)
- 2 bytes for U+0080 to U+07FF (accented Latin letters, Greek, Cyrillic, Hebrew, Arabic)
- 3 bytes for U+0800 to U+FFFF (most other characters, including most Chinese and Japanese)
- 4 bytes for U+10000 and above (most emojis and rarer characters)
This is why an emoji takes up more space than a regular letter!
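Here’s a quick way to check those byte counts yourself. The sketch encodes one character from each size class and prints the raw bits; `utf8_bits` is a made-up helper name.

```python
def utf8_bits(character: str) -> str:
    """Return the UTF-8 bytes of one character as space-separated bit groups."""
    return " ".join(format(byte, "08b") for byte in character.encode("utf-8"))

for character in ["A", "é", "中", "😀"]:
    size = len(character.encode("utf-8"))
    print(f"{character}: {size} byte(s) -> {utf8_bits(character)}")

print(utf8_bits("é"))                       # 11000011 10101001
print(bytes([0xC3, 0xA9]).decode("utf-8"))  # é
```

Notice that every byte of a multi-byte character starts with a 1 bit, so it can never be mistaken for a plain ASCII byte.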
Example: Storing “Café”
'C' = U+0043 = 1 byte: 01000011
'a' = U+0061 = 1 byte: 01100001
'f' = U+0066 = 1 byte: 01100110
'é' = U+00E9 = 2 bytes: 11000011 10101001

Total: 5 bytes
In Python:
# Python 3 strings are Unicode by default
text = "Café"
print(len(text)) # 4 characters
print(len(text.encode('utf-8'))) # 5 bytes
In Java:
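// Java strings are Unicode (stored as UTF-16 internally)
import java.nio.charset.StandardCharsets;

String text = "Café";
System.out.println(text.length()); // 4 (UTF-16 code units; same as the character count here)
// StandardCharsets.UTF_8 avoids the checked exception thrown by getBytes("UTF-8")
System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 5 bytes
How Everything Else Is Stored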
Booleans
A boolean (true/false) only needs 1 bit, but most languages store it in at least 1 byte, because a byte is the smallest unit of memory the CPU can address directly.
true = 00000001 (1)
false = 00000000 (0)
Images
Images are grids of pixels. Each pixel’s color is stored as three numbers (red, green, blue), each 0-255.
One pixel in RGB:
Red: 00000000 (0) = black
Green: 00000000 (0)
Blue: 00000000 (0)

Red: 11111111 (255) = white
Green: 11111111 (255)
Blue: 11111111 (255)
A 1920×1080 image has over 2 million pixels. Each pixel is 3 bytes. So one uncompressed image is ~6 MB!
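The arithmetic behind that number is short enough to check in code. This sketch assumes plain 3-byte RGB with no alpha channel, padding, or compression; `raw_image_bytes` is a hypothetical helper.

```python
def raw_image_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of an uncompressed image: one fixed-size record per pixel."""
    return width * height * bytes_per_pixel

pixels = 1920 * 1080
size = raw_image_bytes(1920, 1080)
print(pixels)            # 2073600
print(size)              # 6220800
print(size / 1_000_000)  # 6.2208 (megabytes, counting 1 MB as a million bytes)
```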
Audio
Audio is stored as thousands of samples per second. Each sample is a number representing the sound wave’s amplitude at that moment.
CD-quality audio: 44,100 samples per second at 16 bits (2 bytes) per sample = ~88 KB per second per channel, or ~176 KB per second for stereo.
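Working the audio numbers through in Python makes the per-channel cost explicit. The constants are the CD figures quoted above; the function name is my own.

```python
SAMPLE_RATE = 44_100   # samples per second
BYTES_PER_SAMPLE = 2   # 16 bits

def audio_bytes_per_second(channels: int) -> int:
    """Uncompressed data rate for the given number of channels."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * channels

print(audio_bytes_per_second(1))       # 88200  (mono)
print(audio_bytes_per_second(2))       # 176400 (stereo)
print(audio_bytes_per_second(2) * 60)  # 10584000 bytes per minute of stereo
```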
Video
Video is just images (frames) displayed rapidly plus audio. A 1080p video at 30 frames per second is basically 30 images per second plus audio.
This is why video files are huge!
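To put a number on “huge,” here is the same back-of-the-envelope math for uncompressed 1080p video. It’s a rough sketch that assumes 3 bytes per pixel and ignores the audio track and any compression, which real video formats rely on heavily.

```python
FRAME_BYTES = 1920 * 1080 * 3  # one uncompressed 1080p RGB frame
FPS = 30

per_second = FRAME_BYTES * FPS
per_minute = per_second * 60
print(per_second)  # 186624000   (about 187 MB every second)
print(per_minute)  # 11197440000 (about 11 GB every minute)
```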
Real-World War Story
Let me tell you about a bug that cost our team a week of debugging. We were building an e-commerce site, and the product prices kept getting corrupted.
We stored prices as floats:
price = 19.99
tax = 0.07
total = price * (1 + tax) # Should be 21.3893
But when we rounded to two decimal places, we’d sometimes get weird values like $21.38 or $21.40 instead of $21.39.
The problem? Floating-point precision. The number 19.99 can’t be represented exactly in binary, so tiny rounding errors accumulated.
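You don’t need our codebase to see errors pile up. The values below are illustrative stand-ins, not the ones from our bug: adding ten dimes as floats doesn’t quite make a dollar, and rounding doesn’t always rescue you.

```python
total = 0.0
for _ in range(10):
    total += 0.1        # ten dimes should make a dollar

print(total)            # 0.9999999999999999
print(total == 1.0)     # False
print(round(total, 2))  # 1.0, rounding hides the error here...
print(round(2.675, 2))  # 2.67, ...but not here: 2.675 is stored slightly low
```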
The fix? Store money as integers (cents, not dollars):
price_cents = 1999 # $19.99 = 1999 cents
tax_rate = 7 # 7%
# Adding 50 before the floor division rounds to the nearest cent;
# a bare // 100 would truncate 2138.93 down to 2138.
total_cents = (price_cents * (100 + tax_rate) + 50) // 100 # 2139 cents

dollars = total_cents // 100 # 21
cents = total_cents % 100 # 39
print(f"${dollars}.{cents:02d}") # $21.39 (perfect!)
The lesson: Never use floats for money. Use integers (smallest currency unit) or decimal libraries.
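For the decimal-library route, Python ships one in the standard library. This is a minimal sketch using the same price and tax rate; the choice of `ROUND_HALF_UP` is my assumption, so check which rounding rule your business or tax authority actually requires.

```python
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("19.99")   # build from strings, not floats
tax_rate = Decimal("0.07")

exact_total = price * (1 + tax_rate)
print(exact_total)         # 21.3893

total = exact_total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(total)               # 21.39
```

Building the `Decimal` from a string matters: `Decimal(19.99)` would faithfully copy the float’s binary error into the decimal value.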
What You Need to Remember
Here’s what I wish someone had told me when I was learning this:
- Everything is binary - computers only understand 1s and 0s
- One byte = 8 bits - can store 0-255
- Integers have fixed ranges - they can overflow
- Never compare floats for equality - use a tolerance
- Floats can’t represent all decimals exactly - use integers for money
- ASCII is 7-bit, Unicode is universal - UTF-8 uses 1-4 bytes per character
- Text, images, audio, video are all just numbers - different interpretations of the same 1s and 0s
How This Helps Your Career
Understanding data representation shows up everywhere:
- Debugging weird behavior - “Why did my counter go negative?” (integer overflow)
- Performance optimization - “Why is this JSON so huge?” (Unicode vs ASCII encoding)
- Data processing - Working with binary file formats, network protocols
- Database design - Choosing the right data types (INT vs BIGINT, FLOAT vs DECIMAL)
- Security - Understanding buffer overflows and injection attacks
Six months from now, when you’re debugging why a price calculation is off by a penny, you’ll immediately think “floating-point precision issue” and know to use integers or decimal types. When your counter mysteriously becomes negative, you’ll recognize integer overflow.
Trust me, this foundational knowledge pays off. Every language, every platform, every database—they all work with the same binary fundamentals. Learn it once, use it forever.
Now go forth and think in binary!