How Computers Store Data: Binary, Bits, and Bytes

Why This Matters (Everything Is Just Numbers)

Here’s something that blew my mind when I first learned it: every single thing your computer does—displaying this text, playing videos, running your code—is built on top of billions of tiny switches that can only be “on” or “off.” That’s it. No magic. Just on and off.

Understanding how computers represent data isn’t just trivia. It’s the key to understanding why an integer can overflow, why floating-point math is weird, why Unicode exists, and why you can’t just “add more memory” to fix every performance problem. Once you get this, entire categories of bugs suddenly make sense.

Binary: The Language of Computers

Computers don’t understand decimal numbers (0-9) like we do. They only understand two states: on or off, which we represent as 1 and 0.

This is called binary (base-2), as opposed to our normal decimal (base-10) system.

How Decimal Works (A Quick Refresher)

In decimal, each position represents a power of 10:

  1   2   5
 10² 10¹ 10⁰
(100)(10)(1)

125 = (1 × 100) + (2 × 10) + (5 × 1)

How Binary Works

In binary, each position represents a power of 2:

  1   0   1   1
 2³  2²  2¹  2⁰
 (8) (4) (2) (1)

1011 = (1 × 8) + (0 × 4) + (1 × 2) + (1 × 1) = 11 in decimal

Examples:

Binary  Decimal
0       0
1       1
10      2
11      3
100     4
101     5
110     6
111     7
1000    8
1111    15
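
If you want to check the table yourself, here’s a minimal sketch. The helper names `binary_to_decimal` and `decimal_to_binary` are made up for this example; Python’s built-in `int(text, 2)` and `bin()` do the same job.

```python
def binary_to_decimal(bits):
    """Read digits left to right, doubling the running total at each step."""
    total = 0
    for digit in bits:
        total = total * 2 + int(digit)
    return total


def decimal_to_binary(number):
    """Divide by 2 repeatedly and collect the remainders (non-negative input only)."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % 2))
        number //= 2
    return "".join(reversed(digits))


# Reproduce the rows of the table
for value in [0, 1, 2, 3, 4, 5, 6, 7, 8, 15]:
    print(decimal_to_binary(value), value)

# The built-ins agree with the hand-written helpers
print(binary_to_decimal("1011"))  # 11
print(int("1011", 2))             # 11
print(bin(11))                    # 0b1011
```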

Why Binary?

Because it’s easy to build physical switches that have two states:

  • Electrical charge present or absent
  • Magnetic field pointing north or south
  • Light on or off (fiber optics)
  • Voltage high or low

Trying to build a switch with 10 different states would be way harder and less reliable.

Bits and Bytes: The Building Blocks

A bit (binary digit) is a single 0 or 1. It’s the smallest unit of data a computer can work with.

A byte is 8 bits grouped together. This is the standard unit for measuring data.

What Can You Store in One Byte?

With 8 bits, you can represent 2⁸ = 256 different values (0-255).

One byte (8 bits):
00000000 = 0
00000001 = 1
00000010 = 2
...
11111111 = 255

Common byte-based units:

  • 1 kilobyte (KB) = 1,024 bytes
  • 1 megabyte (MB) = 1,024 KB = 1,048,576 bytes
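  • 1 gigabyte (GB) = 1,024 MB = 1,073,741,824 bytes (~1.07 billion)
  • 1 terabyte (TB) = 1,024 GB = 1,099,511,627,776 bytes (~1.1 trillion)

(Yes, it’s 1,024 and not 1,000 here because computers work in powers of 2: 2¹⁰ = 1,024. Drive manufacturers and the SI standard use 1,000 instead, and the unambiguous names for the 1,024-based units are KiB, MiB, GiB, and TiB.)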
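
The unit sizes are just powers of 2, so they’re easy to check. This is a quick sketch using the 1,024-based convention from the list above; the constant names are my own.

```python
# Each unit is 1,024 (2 ** 10) times the previous one
KB = 2 ** 10
MB = 2 ** 20
GB = 2 ** 30
TB = 2 ** 40

print(2 ** 8)      # 256 distinct values in one byte
print(f"{KB:,}")   # 1,024
print(f"{MB:,}")   # 1,048,576
print(f"{GB:,}")   # 1,073,741,824
print(f"{TB:,}")   # 1,099,511,627,776
```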

How Numbers Are Stored

Integers (Whole Numbers)

The simplest way to store a number is just to convert it to binary.

Example: The number 42

42 in decimal = 00101010 in binary

Breaking it down:
32 + 8 + 2 = 42
(2⁵ + 2³ + 2¹)
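
Here’s a small sketch of the same breakdown in Python. It pads the number to 8 bits and then lists the power of 2 behind each 1 bit. The variable names are just for illustration.

```python
number = 42

# Format as binary, zero-padded to a full byte
bits = format(number, "08b")
print(bits)  # 00101010

# Collect the power of 2 for every bit position that is set
powers = [2 ** position for position in range(8) if number & (1 << position)]
print(powers)       # [2, 8, 32]
print(sum(powers))  # 42
```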

Signed Integers (Positive and Negative)

How do you store negative numbers when you only have 1s and 0s?

Computers use a system called two’s complement. The leftmost bit carries a negative weight (-128 in an 8-bit number) instead of a positive one, so it also tells you the sign:

  • If the first bit is 0, the number is zero or positive
  • If the first bit is 1, the number is negative

Example with 8-bit signed integers:

01111111 = +127 (largest positive)
00000001 = +1
00000000 = 0
11111111 = -1
10000000 = -128 (most negative)
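
To see how those bit patterns map to values, here’s a minimal sketch of 8-bit two’s complement. The function names `to_signed_8bit` and `to_bits_8bit` are hypothetical helpers, not anything from a library.

```python
def to_signed_8bit(bits):
    """Interpret an 8-character bit string as a two's complement integer."""
    value = int(bits, 2)   # read as unsigned first: 0..255
    if value >= 128:       # leftmost bit is set, so the number is negative
        value -= 256
    return value


def to_bits_8bit(number):
    """Encode an integer in the range -128..127 as 8 two's complement bits."""
    if not -128 <= number <= 127:
        raise ValueError("value does not fit in 8 signed bits")
    return format(number & 0xFF, "08b")  # masking maps negatives to 128..255


for pattern in ["01111111", "00000001", "00000000", "11111111", "10000000"]:
    print(pattern, to_signed_8bit(pattern))

print(to_bits_8bit(-1))    # 11111111
print(to_bits_8bit(-128))  # 10000000
```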

Integer Overflow: Why Numbers Wrap Around

Here’s where it gets interesting. If you use an 8-bit signed integer, the range is -128 to +127. What happens if you add 1 to 127?

127 in binary:  01111111
  + 1:         +00000001
               ----------
Result:         10000000 = -128 (!!)

It wraps around! This is called integer overflow, and it causes real bugs in production code.

In Java:

byte smallNum = 127;
smallNum++;  // Now it's -128 (overflow!)
System.out.println(smallNum);  // Prints: -128

In Python:

# Python integers have unlimited precision, so they don't overflow
big_num = 127
big_num += 1
print(big_num)  # Prints: 128 (no problem)

# But NumPy arrays use fixed-size integers, and their arithmetic wraps silently.
# (An array is used here because scalar overflow behavior differs between NumPy versions.)
import numpy as np
small_nums = np.array([127], dtype=np.int8)
small_nums += 1
print(small_nums[0])  # Prints: -128 (overflow!)
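
You can also imitate the wraparound in plain Python, with no libraries. This is only a sketch of what 8-bit hardware does, assuming two’s complement; `wrap_int8` is a made-up helper name.

```python
def wrap_int8(value):
    """Keep only the low 8 bits, then reinterpret them as a signed value."""
    low_bits = value & 0xFF
    return low_bits - 256 if low_bits >= 128 else low_bits


print(wrap_int8(127 + 1))    # -128
print(wrap_int8(-128 - 1))   # 127
print(wrap_int8(100 + 100))  # -56
```

The last line is the kind of result that shows up as a counter that “mysteriously” goes negative.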

Floating-Point Numbers (Decimals)

Decimal numbers like 3.14 or -0.001 are stored using a format called IEEE 754 floating-point.

It works like scientific notation:

3.14 = 314 × 10⁻²
     = (mantissa) × (base) ^ (exponent)

In binary:

A 32-bit float is stored in three parts:
- Sign bit (1 bit): positive or negative
- Exponent (8 bits): the power of 2
- Mantissa (23 bits): the significant digits
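
Here’s a sketch that pulls those three parts out of a real 32-bit float using Python’s `struct` module. Two details go beyond the summary above: IEEE 754 stores the exponent with a bias of 127, and normal numbers have an implicit leading 1 in front of the mantissa. The function names are my own.

```python
import struct


def float32_parts(value):
    """Split a number, stored as a 32-bit float, into (sign, exponent, mantissa)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))  # the 32 bits as an unsigned int
    sign = raw >> 31                 # top bit
    exponent = (raw >> 23) & 0xFF    # next 8 bits, stored with a bias of 127
    mantissa = raw & 0x7FFFFF        # low 23 bits
    return sign, exponent, mantissa


def rebuild_normal_float32(sign, exponent, mantissa):
    """Rebuild a normal float32 (not zero, subnormal, infinity, or NaN) from its parts."""
    fraction = 1 + mantissa / 2 ** 23  # implicit leading 1
    return (-1) ** sign * fraction * 2.0 ** (exponent - 127)


print(float32_parts(1.0))   # (0, 127, 0)
print(float32_parts(-2.0))  # (1, 128, 0)

sign, exponent, mantissa = float32_parts(3.14)
print(f"{sign:01b} {exponent:08b} {mantissa:023b}")
print(rebuild_normal_float32(sign, exponent, mantissa))  # close to 3.14, but not exact
```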

Floating-Point Weirdness

Because of how floats are stored, some decimal numbers can’t be represented exactly in binary.

The classic example:

# Python
print(0.1 + 0.2)  # Prints: 0.30000000000000004 (not exactly 0.3!)

// Java
System.out.println(0.1 + 0.2);  // Prints: 0.30000000000000004

This isn’t a bug—it’s how floating-point math works. Never compare floats for exact equality.

The right way:

# Python - check if close enough
def almost_equal(a, b, tolerance=1e-9):
    return abs(a - b) < tolerance

print(almost_equal(0.1 + 0.2, 0.3))  # True

// Java - use a tolerance
double result = 0.1 + 0.2;
double expected = 0.3;
boolean close = Math.abs(result - expected) < 0.0001;
System.out.println(close);  // true
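
A fixed tolerance works for small numbers, but it can be too strict for large ones. As an alternative sketch, Python’s standard library has `math.isclose`, which uses a relative tolerance by default (`rel_tol=1e-09`) and an optional absolute one (`abs_tol`) for comparisons near zero.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# A relative tolerance scales with the size of the numbers
print(math.isclose(1_000_000.0, 1_000_000.0001))  # True

# Near zero a relative tolerance is tiny, so pass abs_tol explicitly
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```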

How Text Is Stored

Computers don’t understand letters. They only understand numbers. So how do we store text?

ASCII: The Original Character Encoding

ASCII (American Standard Code for Information Interchange) assigns a number to each character.

Character  ASCII Code (decimal)  Binary
'A'        65                    01000001
'B'        66                    01000010
'a'        97                    01100001
'0'        48                    00110000
' '        32                    00100000 (space)
'\n'       10                    00001010 (newline)

ASCII uses 7 bits (0-127), fitting nicely in one byte.

Example: The word “Hi”

'H' = 72 = 01001000
'i' = 105 = 01101001

"Hi" in binary: 01001000 01101001

In Python:

# Get ASCII code of a character
print(ord('A'))  # Prints: 65
print(ord('a'))  # Prints: 97

# Convert ASCII code back to character
print(chr(65))  # Prints: A
print(chr(97))  # Prints: a

In Java:

// Get ASCII code of a character
char letter = 'A';
int code = (int) letter;
System.out.println(code);  // Prints: 65

// Convert ASCII code back to character
char fromCode = (char) 65;
System.out.println(fromCode);  // Prints: A
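
Putting the pieces together, here’s a sketch that turns a whole ASCII string into its bits and back again, like the “Hi” example. `text_to_bits` and `bits_to_text` are hypothetical helper names.

```python
def text_to_bits(text):
    """Encode ASCII text as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Decode space-separated 8-bit groups back into ASCII text."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("ascii")


encoded = text_to_bits("Hi")
print(encoded)                # 01001000 01101001
print(bits_to_text(encoded))  # Hi
```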

The Problem with ASCII

ASCII only has 128 characters. What about:

  • Accented letters (é, ñ, ü)?
  • Non-Latin alphabets (Greek, Arabic, Chinese)?
  • Emojis (😀, 🎉)?

ASCII can’t handle any of this.

Unicode: The Universal Character Set

Unicode assigns a unique number (called a code point) to each character, with the goal of covering every writing system in the world.

Character  Unicode Code Point
'A'        U+0041
'é'        U+00E9
'中'        U+4E2D (Chinese)
'😀'        U+1F600 (emoji)

Unicode has over 140,000 characters and counting.
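
In Python, `ord` and `chr` work with Unicode code points, not just ASCII, so you can look these up yourself. A small sketch follows; `code_point_label` is a made-up helper, and the characters are written as escapes so the example doesn’t depend on how the file is saved.

```python
def code_point_label(character):
    """Format a single character's code point in the usual U+XXXX style."""
    return f"U+{ord(character):04X}"


# "A", "é", "中", and the grinning-face emoji, written as escapes
for character in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(character, code_point_label(character))

print(chr(0x4E2D))  # 中
```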

UTF-8: How Unicode Is Stored

UTF-8 is a way to store Unicode characters efficiently. It uses:

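  • 1 byte for ASCII characters (U+0000 to U+007F)
  • 2 bytes for U+0080 to U+07FF (accented Latin letters, Greek, Cyrillic, Hebrew, Arabic)
  • 3 bytes for U+0800 to U+FFFF (most other characters, including most Chinese and Japanese)
  • 4 bytes for U+10000 and above (most emojis and rare characters)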

This is why an emoji takes up more space than a regular letter!

Example: Storing “Café”

'C' = U+0043 = 1 byte:  01000011
'a' = U+0061 = 1 byte:  01100001
'f' = U+0066 = 1 byte:  01100110
'é' = U+00E9 = 2 bytes: 11000011 10101001

Total: 5 bytes

In Python:

# Python 3 strings are Unicode by default
text = "Café"
print(len(text))  # 4 characters
print(len(text.encode('utf-8')))  # 5 bytes

In Java:

// Java strings are Unicode
// Requires: import java.nio.charset.StandardCharsets;
String text = "Café";
System.out.println(text.length());  // 4 characters
System.out.println(text.getBytes(StandardCharsets.UTF_8).length);  // 5 bytes
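
To see the 1-to-4-byte pattern directly, this sketch encodes one character from each size class and prints the bytes in hex. `utf8_bytes` is a hypothetical helper; `bytes.hex` with a separator needs Python 3.8 or newer.

```python
def utf8_bytes(character):
    """Return the UTF-8 encoding of one character as raw bytes."""
    return character.encode("utf-8")


# "A", "é", "中", and the grinning-face emoji, written as escapes
for character in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    encoded = utf8_bytes(character)
    print(f"U+{ord(character):04X}", len(encoded), "byte(s):", encoded.hex(" "))
```

The byte count goes 1, 2, 3, 4 down the list, matching the size classes above.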

How Everything Else Is Stored

Booleans

A boolean (true/false) only needs 1 bit, but most languages store it as a full byte because a byte is typically the smallest unit of memory a CPU can address directly.

true  = 00000001 (1)
false = 00000000 (0)

Images

Images are grids of pixels. Each pixel’s color is stored as three numbers (red, green, blue), each 0-255.

One pixel in RGB:
Red:   00000000 (0)   = black
Green: 00000000 (0)
Blue:  00000000 (0)

Red:   11111111 (255) = white
Green: 11111111 (255)
Blue:  11111111 (255)

A 1920×1080 image has over 2 million pixels. Each pixel is 3 bytes. So one uncompressed image is ~6 MB!
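
The arithmetic is worth doing once by hand. This sketch assumes plain 3-byte RGB with no alpha channel and no compression; the variable names are just for illustration.

```python
width, height = 1920, 1080
bytes_per_pixel = 3  # one byte each for red, green, and blue

pixel_count = width * height
image_bytes = pixel_count * bytes_per_pixel
print(f"{pixel_count:,} pixels")  # 2,073,600 pixels
print(f"{image_bytes:,} bytes")   # 6,220,800 bytes

# A single pixel is just three bytes in a row
white = bytes([255, 255, 255])
black = bytes([0, 0, 0])
print(white.hex(), black.hex())  # ffffff 000000
```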

Audio

Audio is stored as thousands of samples per second. Each sample is a number representing the sound wave’s amplitude at that moment.

CD-quality audio: 44,100 samples per second, 16 bits (2 bytes) per sample = ~88 KB per second per channel, or ~176 KB per second of stereo audio.

Video

Video is just images (frames) displayed rapidly plus audio. A 1080p video at 30 frames per second is basically 30 images per second plus audio.

This is why video files are huge!
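
To put numbers on “huge”, here’s a rough sketch of the raw data rates for audio and video. It assumes everything is uncompressed, the video uses 3 bytes per pixel, and its audio track is ignored. Real files are far smaller because of compression.

```python
# CD-quality audio
sample_rate = 44_100       # samples per second
bytes_per_sample = 16 // 8  # 16 bits = 2 bytes
channels = 2               # stereo

audio_per_channel = sample_rate * bytes_per_sample
audio_stereo = audio_per_channel * channels
print(f"{audio_per_channel:,} bytes/s per channel")  # 88,200 bytes/s per channel
print(f"{audio_stereo:,} bytes/s stereo")            # 176,400 bytes/s stereo

# 1080p video at 30 frames per second
frame_bytes = 1920 * 1080 * 3
frames_per_second = 30

video_per_second = frame_bytes * frames_per_second
video_per_minute = video_per_second * 60
print(f"{video_per_second:,} bytes/s")    # 186,624,000 bytes/s
print(f"{video_per_minute:,} bytes/min")  # 11,197,440,000 bytes/min
```

That’s roughly 11 GB for a single minute of raw 1080p video.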

Real-World War Story

Let me tell you about a bug that cost our team a week of debugging. We were building an e-commerce site, and the product prices kept getting corrupted.

We stored prices as floats:

price = 19.99
tax = 0.07
total = price * (1 + tax)  # Should be 21.3893

But when we rounded to two decimal places, we’d sometimes get weird values like $21.38 or $21.40 instead of $21.39.

The problem? Floating-point precision. The number 19.99 can’t be represented exactly in binary, so tiny rounding errors accumulated.

The fix? Store money as integers (cents, not dollars):

price_cents = 1999  # $19.99 = 1999 cents
tax_rate = 7  # 7%
# Add 50 before dividing so the result rounds to the nearest cent
# (plain // would truncate 2138.93 down to 2138)
total_cents = (price_cents * (100 + tax_rate) + 50) // 100  # 2139 cents

dollars = total_cents // 100  # 21
cents = total_cents % 100     # 39
print(f"${dollars}.{cents:02d}")  # $21.39 (perfect!)

The lesson: Never use floats for money. Use integers (smallest currency unit) or decimal libraries.
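
For the “decimal libraries” option, here’s a minimal sketch using Python’s standard `decimal` module. Rounding half up is my assumption for the example; the right rounding rule depends on your tax and accounting requirements.

```python
from decimal import Decimal, ROUND_HALF_UP

# Build Decimals from strings so the values are exact
price = Decimal("19.99")
tax_rate = Decimal("0.07")

total = (price * (1 + tax_rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(total)  # 21.39

# Building a Decimal from a float carries the float's error along with it
print(Decimal(19.99) == Decimal("19.99"))  # False
```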

What You Need to Remember

Here’s what I wish someone had told me when I was learning this:

  1. Everything is binary - computers only understand 1s and 0s
  2. One byte = 8 bits - can store 0-255
  3. Integers have fixed ranges - they can overflow
  4. Never compare floats for equality - use a tolerance
  5. Floats can’t represent all decimals exactly - use integers for money
  6. ASCII is 7-bit, Unicode is universal - UTF-8 uses 1-4 bytes per character
  7. Text, images, audio, video are all just numbers - different interpretations of the same 1s and 0s
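
That last point is easy to demonstrate. In this sketch the same four bytes are read as text, as an integer, as a float, and as color values. The byte values are an arbitrary choice for the example, and big-endian byte order is assumed.

```python
import struct

raw = bytes([0x48, 0x69, 0x21, 0x21])  # four bytes, nothing more

as_text = raw.decode("ascii")           # read as ASCII characters
(as_uint,) = struct.unpack(">I", raw)   # read as one 32-bit unsigned integer
(as_float,) = struct.unpack(">f", raw)  # read as one 32-bit float
as_numbers = list(raw)                  # read as four values from 0 to 255

print(as_text)       # Hi!!
print(hex(as_uint))  # 0x48692121
print(as_float)
print(as_numbers)    # [72, 105, 33, 33]
```

Nothing in the bytes says which reading is “correct”. The program decides.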

How This Helps Your Career

Understanding data representation shows up everywhere:

  • Debugging weird behavior - “Why did my counter go negative?” (integer overflow)
  • Performance optimization - “Why is this JSON so huge?” (Unicode vs ASCII encoding)
  • Data processing - Working with binary file formats, network protocols
  • Database design - Choosing the right data types (INT vs BIGINT, FLOAT vs DECIMAL)
  • Security - Understanding buffer overflows and injection attacks

Six months from now, when you’re debugging why a price calculation is off by a penny, you’ll immediately think “floating-point precision issue” and know to use integers or decimal types. When your counter mysteriously becomes negative, you’ll recognize integer overflow.

Trust me, this foundational knowledge pays off. Every language, every platform, every database—they all work with the same binary fundamentals. Learn it once, use it forever.

Now go forth and think in binary!