Understanding “disk space math”

Posted by Pat Kelly on November 11, 2019

Everything in a PC, laptop, or server is represented as binary digits (a.k.a. bits, where each bit can only be 1 or 0). There are no characters like the ones we use for writing, and no numbers as we write them, anywhere in a computer’s memory or in secondary storage such as disk drives. For general purposes, the unit of measure for groups of bits is the byte: eight bits. The byte is an agreed-upon measure that helped standardize computer memory, storage, and how computers handle data.

There are various terms in use to specify the capacity of a disk drive (either a magnetic hard disk or an electronic solid-state drive). The same measures are applied to a computer’s random access memory (RAM) and other memory devices inside your computer. So let’s see how the numbers are made up.

Prefixes are used with the number that specifies the capacity of the device. Each prefix designates a multiplier that is applied to the number that precedes it. Commonly used prefixes are:

Kilo = 10³ = 1,000 (one thousand)
Mega = 10⁶ = 1,000,000 (one million)
Giga = 10⁹ = 1,000,000,000 (one billion)
Tera = 10¹² = 1,000,000,000,000 (one trillion)

For example, 500 GB (gigabytes) is 500,000,000,000 bytes.
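As a quick check of that arithmetic, here is a minimal Python sketch that applies the decimal multipliers listed above. The names `DECIMAL_PREFIXES` and `to_bytes` are made up for this illustration; they are not part of any Fedora tool.

```python
# Decimal (SI) prefixes from the list above, expressed as powers of ten.
DECIMAL_PREFIXES = {
    "k": 10**3,   # kilo
    "M": 10**6,   # mega
    "G": 10**9,   # giga
    "T": 10**12,  # tera
}


def to_bytes(amount: int, prefix: str) -> int:
    """Apply the multiplier that the prefix designates to the number before it."""
    return amount * DECIMAL_PREFIXES[prefix]


print(f"500 GB = {to_bytes(500, 'G'):,} bytes")  # 500 GB = 500,000,000,000 bytes
```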

Memory and storage capacities in advertisements, on boxes in the store, and so on are stated in the decimal units shown above. However, computers work in binary, and many operating systems and utilities report sizes in powers of two, so the capacity your computer reports can look smaller than the advertised capacity even though the number of bytes is the same.

You saw that the decimal numbers above were shown with their equivalent powers of ten. In the binary system, numbers are represented as powers of two. The table below shows the power of two that each bit represents in an 8-bit byte. At the bottom of the table is an example of how the decimal number 109 can be represented as a binary number held in a single 8-bit byte (01101101).

Eight-bit binary number
Bit position	Bit 7	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1	Bit 0
Power of 2	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
Decimal Value	128	64	32	16	8	4	2	1
Example Number	0	1	1	0	1	1	0	1

The example bit values comprise the binary number 01101101. To get the equivalent decimal value just add the decimal values from the table where the bit is set to 1. That is 64 + 32 + 8 + 4 + 1 = 109.
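The same add-up-the-place-values procedure can be sketched in Python. The hypothetical `binary_to_decimal` function mirrors the table by hand; Python's built-in `int(text, 2)` and `format(n, "08b")` perform the same conversions in both directions.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place value (a power of two) of every bit that is set to 1."""
    total = 0
    # Walk from the rightmost bit (bit 0, worth 2**0) to the leftmost (bit 7, worth 2**7).
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_decimal("01101101"))  # 109
print(int("01101101", 2))            # 109, using Python's built-in conversion
print(format(109, "08b"))            # 01101101, back to eight bits
```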

By the time you get up to 2³⁰, you have decimal 1,073,741,824 with just 31 bits (don’t forget the 2⁰ bit). That is a large enough number to start specifying memory and storage sizes.
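You can confirm those figures in a Python shell; `int.bit_length()` reports how many bits a number needs, counting from the 2⁰ bit.

```python
n = 2 ** 30
print(f"{n:,}")        # 1,073,741,824
print(n.bit_length())  # 31: bit 30 is set, plus bits 29 down to 0
print(bin(n))          # '0b' followed by a 1 and thirty 0s
```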

Now comes what you have been waiting for. The table below lists the common unit names and symbols for decimal and binary values.

Decimal unit	Bytes	Binary unit	Bytes
kB (kilobyte)	1,000	KiB (kibibyte)	1,024
MB (megabyte)	1,000,000	MiB (mebibyte)	1,048,576
GB (gigabyte)	1,000,000,000	GiB (gibibyte)	1,073,741,824
TB (terabyte)	1,000,000,000,000	TiB (tebibyte)	1,099,511,627,776
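To see how far apart the decimal and binary columns drift, this Python sketch recomputes each pair and the percentage by which the binary unit exceeds the decimal one. The `gap_percent` and `to_gib` helpers are illustrative names only. The last line shows why software that counts in powers of two describes a 500 GB drive as roughly 465 GiB.

```python
def gap_percent(power: int) -> float:
    """How much larger 1024**power is than 1000**power, as a percentage."""
    return (1024 ** power / 1000 ** power - 1) * 100


def to_gib(n_bytes: int) -> float:
    """Express a byte count in gibibytes (units of 2**30 bytes)."""
    return n_bytes / 1024 ** 3


# kilo, mega, giga, tera; the binary symbols use an uppercase K (KiB).
for power, prefix in enumerate(["k", "M", "G", "T"], start=1):
    print(
        f"1 {prefix}B = {1000 ** power:>17,} bytes   "
        f"1 {prefix.upper()}iB = {1024 ** power:>17,} bytes   "
        f"(+{gap_percent(power):.1f}%)"
    )

print(f"500 GB = {to_gib(500 * 1000 ** 3):.2f} GiB")  # 500 GB = 465.66 GiB
```

The gap grows with each prefix, from about 2.4% at kilo to about 10% at tera.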

Note that all of the byte counts in the table above are written as decimal numbers. They are not shown in binary because the larger ones would be unwieldy: 1 GB takes 30 binary digits, and 1 TiB takes 41.
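For the curious, a short Python sketch (the `binary_digits` helper is hypothetical) prints how many binary digits each byte count from the table would need.

```python
def binary_digits(n: int) -> int:
    """Number of binary digits needed to write n (bin() adds a '0b' prefix, so drop it)."""
    return len(bin(n)) - 2


units = [("kB", "KiB"), ("MB", "MiB"), ("GB", "GiB"), ("TB", "TiB")]
for power, (decimal_unit, binary_unit) in enumerate(units, start=1):
    print(
        f"1 {decimal_unit}: {binary_digits(1000 ** power)} digits   "
        f"1 {binary_unit}: {binary_digits(1024 ** power)} digits"
    )
```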

Most users and programmers need not be concerned with the differences between the binary and decimal storage sizes, although the gap grows with each prefix: about 2.4% at kilo and about 10% at tera. If you’re developing software or hardware that deals with data at the binary level, you may need the binary numbers.

As for what this means to your PC: Your PC will make use of the full capacity of your storage and memory devices. If you want to see the capacity of your disk drives, thumb drives, and so on, the Disks utility in Fedora shows the actual capacity of each storage device as a decimal number of bytes.

There are also command line tools that can provide you with more flexibility in seeing how your storage bytes are being used. Two such command line tools are du (for files and directories) and df (for file systems). You can read about these by typing man du or man df at the command line in a terminal window.
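If you want to pull similar numbers into a script, here is a rough Python sketch using the standard library's `shutil.disk_usage` and `os.walk`. It is an approximation, not a replacement for df or du: `tree_size` sums apparent file sizes (closer to `du --apparent-size`) and counts hard-linked files once per link, whereas du by default reports allocated disk blocks. The function names and the `~/Documents` path are only examples.

```python
import os
import shutil


def filesystem_usage(path: str = "/") -> None:
    """Roughly what df reports: totals for the file system that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (in bytes)
    for label, value in (("total", usage.total), ("used", usage.used), ("free", usage.free)):
        print(f"{label:>5}: {value:>20,} bytes = {value / 1000**3:8.2f} GB = {value / 1024**3:8.2f} GiB")


def tree_size(path: str) -> int:
    """Roughly what `du --apparent-size` reports: the sum of file sizes under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):  # a missing path simply yields nothing
        for name in files:
            try:
                # lstat: measure symlinks themselves instead of following them
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # the file vanished or is unreadable; skip it
    return total


filesystem_usage("/")
print(f"{tree_size(os.path.expanduser('~/Documents')):,} bytes in ~/Documents")
```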

Photo by Franck V. on Unsplash.

Fedora Project community

Pat Kelly

I have been a Fedora user for 8 years and support a small group of users. For about the last two years, I have been helping in the Fedora Quality group.

31 Comments

Kees de Jong

Thanks for this article. A lot of people are confused about this basic computer science concept. For example, Redis does not use the standard notation: https://github.com/antirez/redis/blob/unstable/redis.conf

And neither do many how-tos, manual pages, and whatnot. I guess the problem stems from the fact that Americans don’t use the metric system. Anything with kilo, mega, etc. is a power of 10, not of 2. Weirdly, I also see people who should know the metric system write a KB as 1024… 🙂

November 11, 2019
- t0w0i7ne
  
  Some people learned their computer science either before the existence of the KiB style units, or from books and other materials that predate them. The confusion actually caused the adoption of the difference between KB and KiB as SI unit standards.
  
  November 12, 2019
  - cmurf
    
    Strictly speaking, K is Kelvin, k is kilo, so the SI unit is kB, and the IEC unit is KiB.
    
    November 12, 2019
atolstoy

Thanks for the article. But, I’m afraid the material is much too basic for the Fedora Magazine. We’re already tech-savvy!

November 11, 2019
- Paul W. Frields
  
  @atolstoy: According to feedback from other readers and forums, not everyone is. Glad you didn’t need the article, though.
  
  November 11, 2019
Peter

Sorry, but it is more like:
bit7 bit6 bit5 … bit0

Your explanation is not wrong but very misleading. Could you please correct this?

November 11, 2019
- Paul W. Frields
  
  @Peter: The table has been updated to be more correct.
  
  November 11, 2019
  - Pat Kelly
    
    I really hadn’t planned to change the table. Correct has to be qualified by use and architecture design choice.
    
    November 11, 2019
    - Peter
      
      Ponder this – How do you read this decimal number: 123?
      
      [ ] one hundred and twenty three
      [ ] three hundred and twenty one
      
      1*10^2 + 2*10^1 + 3*10^0
      
      Does this help?
      
      November 12, 2019
    - Jim Simmonds
      
      The decimal numbers are only what suppliers want to use to give you less than they should. They should all stick with binary so that everyone knows what they are getting, namely 1024 bytes, megabytes, gigabytes, or even, in times to come, terabytes, all calculated in binary.
      
      November 12, 2019
- Pat Kelly
  
  I considered covering Big-Endian and Little-Endian in this article, but since this is a magazine article I didn’t want it to get too long. I used Little-Endian to see if I would get feedback on this point. I’ve been thinking about proposing another article to cover this, the other larger data structures that bytes are used to form, and some of their uses. Thanks for your comment; you have encouraged me to go ahead with that proposal.
  
  November 11, 2019
Joao Rodrigues

Even though the disk manufacturer may say that the disk has a capacity of 1,000,000,000,000 bytes (or 1 terabyte), that is the raw storage capacity. It doesn’t mean you can store 1 terabyte of data on it.

Some of that space will be lost in partitioning, partition alignment and filesystem structure.

A very cool tool to analyze disk usage in GNOME is Baobab (Disk Usage Analyzer):
https://wiki.gnome.org/action/show/Apps/DiskUsageAnalyzer

Also, in the short scale vs. long scale war:

Long scale users (mostly European) use the following nomenclature:
10^9 is a thousand million or a milliard
10^12 is a billion
10^15 is a thousand billion or a billiard
10^18 is a trillion
10^21 is a thousand trillion or a trilliard

November 11, 2019
Jakfrost

Not that I want to pick nits, but there are 4 Bits in a Byte, two Bits in a nibble, two Bytes in a Word (8 Bit Word). A 16 Bit Word was originally termed a Double Word I think, and a 32 Bit word is a Long Word. You need at least 32 Bits in order to express a single precision floating point value in a PC. Binary is merely a Base 2 Number system (0..1 range), just as Decimal is a Base 10 number system (0…9 range).

November 11, 2019
- Paul W. Frields
  
  There are 8 bits in a byte, and 4 bits in a nibble — although the byte was never strictly defined, that’s been the length as long as I can remember. Words are ambiguous due to processor architecture differences, though many of the platforms have maintained a word at 16 bits, and a double word at 32 bits.[1]
  
  November 11, 2019
  - Stuart Gathman
    
    My favorite machine was the Dec-20 with 36-bit words. There were instructions to unpack words in bytes of different sizes. The system software used 7-bit bytes (ASCII) for strings. The compiler used 6-bit bytes internally for names in its symbol table. The favorite interpreted language was PPL – Polymorphic Programming Language. (I used it to generate a Sociology paper from a random transition network of buzzphrases that got a B+) I still daydream sometimes about how to implement the C language standard on such an architecture.
    
    I also used a CDC with 60-bit words, but it wasn’t as endearing for some reason.
    
    November 12, 2019
- AdamN
  
  Standardised bytes are 8 bits, though in the early years a byte was sometimes 6 bits. A nibble has always been 4 bits. A word varies in bit length according to the natural size of the unit of data handled in a single processing operation.
  
  November 13, 2019
Stuart Gathman

You left out the most important part. A KiB is 1024 bytes. 1024 is 2^10, which is close to 1000. So all the binary multipliers are powers of 2^10. This approximate equivalence is handy for all sorts of estimations: 3 decimal digits ~= 10 binary bits. How many bits are needed to count to 10 billion? 10*10^9 is approx 10 * 2^30; 4 bits are needed to count to 10, so 34 bits are needed. A MiB is 2^10 * 2^10.
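The estimate in the comment above can be checked against the exact answer with a few lines of Python; `int.bit_length()` gives the number of bits needed to write a number in binary.

```python
# 2**10 = 1024 is within 2.4% of 10**3 = 1000: ten bits stand in for three decimal digits.
print(2 ** 10, 10 ** 3)

ten_billion = 10 * 10 ** 9

# The comment's estimate: 4 bits to count to 10, plus 30 bits for 10**9 (about 2**30).
estimate = (10).bit_length() + 30

# The exact answer: how many bits it takes to write 10,000,000,000 in binary.
exact = ten_billion.bit_length()

print(estimate, exact)  # 34 34
```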

Historically, 1024 bytes was called a KiloByte in the context of binary computers until a few decades ago, and 2^20 was called a MegaByte. Eventually, enough lay people were confused, exacerbated by deceptive marketing that used decimal in a binary computer context, that a standards committee was formed to come up with new terms for the binary prefixes.

As usual, the committee solution was hated by all. 2^10 bytes was now to be called a “KibiByte”, 2^20 bytes a “MeBiByte”, and worst of all, 2^30 bytes a “GiBiByte”. Hence the new nomenclature was pronounced “GiBiRish”. Fortunately, the abbreviations were more acceptable.

2^10 bytes KiBiByte KiB
2^20 bytes MeBiByte MiB
2^30 bytes GiBiByte GiB
2^40 bytes TeBiByte TiB
2^50 bytes PeBiByte PiB

November 11, 2019
- Martin
  
  Thank you for this clarification. As I was reading the article, I was silently telling the author that his labeling was the other way around. I grew up with KB = 1024 🙂
  
  November 12, 2019
Stuart Gathman

Confusingly, many unix utilities in Fedora still use the old convention of Kilo = 1024 in a binary computer context. For instance, df uses K,M,G,T,P,E,Z,Y to mean powers of 1024. It then added KB,MB,… for powers of 1000 and KiB,MiB,… for powers of 1024.
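To make the convention described above concrete, here is a hypothetical Python parser that follows it: a bare letter or an `iB` suffix means powers of 1024, and a `B` suffix means powers of 1000. It is only a sketch of the rule as stated in the comment, not the coreutils implementation, and it ignores other spellings the real tools may accept.

```python
import re

LETTERS = "KMGTPEZY"  # K = 1st power, M = 2nd power, ..., Y = 8th power


def parse_size(text: str) -> int:
    """Convert a size such as '10K', '10KiB' or '10KB' into bytes."""
    match = re.fullmatch(r"(\d+)(?:([KMGTPEZY])(iB|B)?)?", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, letter, suffix = match.groups()
    if letter is None:
        return int(number)                    # a plain byte count
    power = LETTERS.index(letter) + 1
    base = 1000 if suffix == "B" else 1024    # only a bare "B" suffix means decimal
    return int(number) * base ** power


for size in ("10K", "10KiB", "10KB", "1GB", "1G"):
    print(f"{size:>6} = {parse_size(size):,} bytes")
```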

The binary prefixes get more unpronounceable past PeBiByte.

2^60 bytes ExBiByte EiB ~ Exabyte
2^70 bytes ZeBiByte ZiB ~ Zettabyte
2^80 bytes YoBiByte YiB ~ Yottabyte

November 11, 2019
Ray McCaffity

One thing that helps, is newer versions of many utilities include the “h” option.

df -h
free -mh
ls -lh

h = human readable: it automatically converts the number to kilobytes, megabytes, gigabytes or terabytes for you. With -h these are powers of 1024; df -H (--si) uses powers of 1000 instead.
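The idea behind human-readable output can be sketched in Python: keep dividing by the base (1024 for the -h style, 1000 for the --si style) until the value is small, then attach a unit letter. `human_readable` is a hypothetical helper; the real tools round differently (GNU coreutils rounds up, for example), so their output will not always match this sketch.

```python
def human_readable(n_bytes: int, base: int = 1024) -> str:
    """Divide by `base` until the value drops below it, then attach a unit letter."""
    if n_bytes < base:
        return f"{n_bytes}B"  # small sizes stay in plain bytes
    value = float(n_bytes)
    for unit in ("K", "M", "G", "T", "P", "E"):
        value /= base
        if value < base or unit == "E":
            return f"{value:.1f}{unit}"
    raise AssertionError("unreachable")  # the loop always returns by "E"


print(human_readable(500 * 10**9))             # 465.7G (powers of 1024)
print(human_readable(500 * 10**9, base=1000))  # 500.0G (powers of 1000)
```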

November 11, 2019
Bartosz Lis

The prefix for 1000-fold is lowercase k, not uppercase K (see the articles “Metric prefix” and “Binary prefix” on Wikipedia, or consult any high school physics textbook). Thus 1 kg = 1000 g (one kilogram is one thousand grams), 1 km = 1000 m (one kilometer is one thousand meters), 1 kHz = 1000 Hz (one kilohertz is one thousand hertz), 1 kb = 1000 b (one kilobit is one thousand bits), 1 kB = 1000 B (one kilobyte is one thousand bytes), and so on. The prefix K was introduced long ago to distinguish 1 KB = 2^10 B = 1024 B from 1 kB = 10^3 B = 1000 B. There was no such distinction between higher powers of 1024 and 1000: 1 MB was used as 1024*1024 B in the context of RAM and 1000*1000 B in the context of HDDs. The difference was ca. 5%. However, the difference between 1024^4 and 1000^4 is ca. 10%, which is significant. Thus, when terabyte (TB) sizes came into use, it became necessary to distinguish binary multipliers from decimal multipliers and to make this distinction a scientific/industrial standard. Finally, standardization organizations introduced binary prefixes (Ki, Mi, Gi, Ti, …) for powers of 1024.

The prefix K was used for 1024-fold and now should be replaced with Ki. The prefix K was never used for 1000-fold!

November 12, 2019
Leslie Satenstein

The decimal system uses 1000 as the base, so 1k is 1000.
The binary system uses 1024 as the base, so 1K is 1024 in decimal.

A 1 terabyte (decimal) drive holds noticeably less than 1 terabyte (binary), about 0.91 TiB, which is less storage than one expects. That is why we see storage described in decimal rather than binary.

November 12, 2019
- Pat Kelly
  
  Leslie I’m glad you commented. I haven’t been on the Forum much lately so I haven’t seen you around. You should try this; I’m sure there are some topics you would like to write about.
  
  November 12, 2019
Tracy Baker

“Your PC will make use of the full capacity of your storage …” This statement is incorrect.

One cannot create an 11GiB logical volume from an 11GiB volume group. When using the lvcreate command, the -L 11G option will not work; -L 10G will. This creates a 10GiB logical volume. When -L 10G is used, 254 4MiB extents (1016MiB, assuming a 4MiB physical/logical extent size) are left unused.

Almost all of the space can be used using the -l (lowercase) option. That requires specifying extents and doing math. Most administrators don’t/won’t do the math.

Even so: if extents are used you’ll end up with a 10.99GiB logical volume, not 11GiB. The resulting logical volume will be short of full capacity by 2 extents, or 8MiB.
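The extent arithmetic in this comment can be reproduced in Python. The sketch assumes, as the comment's figures imply, 4 MiB extents and a volume group with 2,814 free extents (2 fewer than a nominal 11 GiB); the comment does not say where those 2 extents went, so that figure is taken as given.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
EXTENT = 4 * MIB                      # physical/logical extent size from the comment

nominal_extents = 11 * GIB // EXTENT  # 2816 extents in a nominal 11 GiB
free_extents = nominal_extents - 2    # assumption: the 2 missing extents reported above
lv_10g = 10 * GIB // EXTENT           # lvcreate -L 10G: 2560 extents

left_over = free_extents - lv_10g
print(left_over, left_over * EXTENT // MIB)      # 254 1016 (extents and MiB left unused)
print(f"{free_extents * EXTENT / GIB:.2f} GiB")  # 10.99 GiB when -l uses every free extent
```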

…and let’s not get into slack space.

November 12, 2019
- Pat Kelly
  
  Thanks Tracy. I could definitely have phrased that sentence better. I had thought about the complications of setting up volumes, partitioning, and formatting, but I didn’t want this article to get too big. On the other hand, in creating volumes, partitions, file systems, etc., the user is making use of the disk’s space; that use just results in space that is not available for executables, user data, etc. It is certainly true, though, that in these processes some space is wasted and is of no use at all. One of the things I like about Fedora Magazine is that there are users such as yourself who are willing to fill in where the author of an article has missed things.
  
  November 12, 2019
Yazan Al Monshed

Thanks for the basics. A lot of people find this confusing.

November 12, 2019
Felix Pojtinger

Loving the CS-themed articles on the magazine so far!

November 14, 2019
Andrej

I’m not sure if there is a special logic in this article, but what is meant by “suffix” should be “prefix” since it comes before a unit, not after it.

November 14, 2019
- Paul W. Frields
  
  Not sure how this was missed by the editors… -ENOTENOUGHCOFFEE perhaps? Fixed!
  
  November 15, 2019
  - Andrej
    
    Great. Tnx 🙂
    
    November 15, 2019
- Pat Kelly
  
  Right you are; it should have been prefix.
  
  November 15, 2019