Understanding “disk space math”

Everything in a PC, laptop, or server is represented as binary digits (a.k.a. bits, where each bit can only be 1 or 0). Nowhere in a computer’s memory, or in secondary storage such as disk drives, are there characters or numbers in the form we write them. For general purposes, the unit of measure for groups of bits is the byte: eight bits. Bytes are an agreed-upon measure that helped standardize computer memory, storage, and how computers handled data.

There are various terms in use to specify the capacity of a disk drive (either magnetic or electronic). The same measures are applied to a computer’s random access memory (RAM) and other memory devices inside your computer. Now let’s see how the numbers are made up.

Prefixes are used with the number that specifies the capacity of the device. Each prefix designates a multiplier that is applied to the number in front of it. Commonly used prefixes are:

  • Kilo = 10^3 = 1,000 (one thousand)
  • Mega = 10^6 = 1,000,000 (one million)
  • Giga = 10^9 = 1,000,000,000 (one billion)
  • Tera = 10^12 = 1,000,000,000,000 (one trillion)

As an example, 500 GB (gigabytes) is 500,000,000,000 bytes.
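A minimal Python sketch of the prefixes as multipliers follows. The dictionary and function names are illustrative only and are not part of any library.

```python
# Decimal (SI) prefixes: each step up multiplies by 1,000.
DECIMAL_PREFIXES = {
    "kilo": 10**3,
    "mega": 10**6,
    "giga": 10**9,
    "tera": 10**12,
}

def decimal_to_bytes(value, prefix):
    """Convert a size such as 500 gigabytes into a plain byte count."""
    return value * DECIMAL_PREFIXES[prefix]

print(f"{decimal_to_bytes(500, 'giga'):,} bytes")  # 500,000,000,000 bytes
```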

The units that memory and storage are specified in for advertisements, on boxes in the store, and so on are decimal, as shown above. However, because computers work in binary, operating systems and utilities often report sizes in powers of two, so the capacity your computer shows can differ from the advertised capacity even though the number of bytes is the same.
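To see why a drive can appear to shrink, the sketch below (plain Python arithmetic, using a hypothetical drive sold as "500 GB") expresses the same byte count in GiB, where 1 GiB is 2^30 = 1,073,741,824 bytes.

```python
GIB = 2**30  # 1 GiB = 1,073,741,824 bytes

advertised_bytes = 500 * 10**9  # hypothetical drive sold as "500 GB"
reported_gib = advertised_bytes / GIB

print(f"{advertised_bytes:,} bytes = {reported_gib:.2f} GiB")
# 500,000,000,000 bytes = 465.66 GiB
```

A tool that divides by 2^30 but still labels the result "GB" would show such a drive as about 465 GB, even though no bytes are missing.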

You saw that the decimal numbers above were shown with their equivalent powers of ten. In the binary system, numbers can be represented as powers of two. The table below shows how bits are used to represent powers of two in an 8-bit byte. At the bottom of the table there is an example of how the decimal number 109 can be represented as a binary number that fits in a single 8-bit byte (01101101).

Eight-bit binary number

                Bit 7  Bit 6  Bit 5  Bit 4  Bit 3  Bit 2  Bit 1  Bit 0
Power of 2      2^7    2^6    2^5    2^4    2^3    2^2    2^1    2^0
Decimal value   128    64     32     16     8      4      2      1
Example number  0      1      1      0      1      1      0      1

The example bit values make up the binary number 01101101. To get the equivalent decimal value, add the decimal values from the table wherever the bit is set to 1. That is 64 + 32 + 8 + 4 + 1 = 109.
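The same add-up-the-set-bits procedure can be sketched in Python. The built-in int() with base 2 and format() with the "08b" specifier are used only to cross-check the result.

```python
bits = "01101101"  # the example byte from the table, bit 7 first

# Add 2**position for every bit that is set to 1.
total = 0
for position, bit in enumerate(reversed(bits)):  # position 0 is the rightmost bit
    if bit == "1":
        total += 2**position

print(total)               # 109
print(int(bits, 2))        # 109, using Python's built-in base-2 parser
print(format(109, "08b"))  # 01101101, padded back out to 8 binary digits
```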

By the time you get out to 2^30, you have decimal 1,073,741,824, a number that takes 31 bits to write (bits 30 down to 0; don’t forget the 2^0 position). That is large enough to start specifying memory and storage sizes.
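Python’s int.bit_length() method returns the number of binary digits an integer needs, which makes bit counts like this one easy to check:

```python
# int.bit_length() returns how many binary digits a positive integer needs.
for exponent in (7, 10, 20, 30):
    value = 2**exponent
    print(f"2^{exponent} = {value:,} needs {value.bit_length()} bits")

# 2^7 = 128 needs 8 bits
# 2^10 = 1,024 needs 11 bits
# 2^20 = 1,048,576 needs 21 bits
# 2^30 = 1,073,741,824 needs 31 bits
```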

Now comes what you have been waiting for. The table below lists common designations as they are used for labeling decimal and binary values.

Decimal                                         Binary
KB (Kilobyte): 1 KB = 1,000 bytes               KiB (Kibibyte): 1 KiB = 1,024 bytes
MB (Megabyte): 1 MB = 1,000,000 bytes           MiB (Mebibyte): 1 MiB = 1,048,576 bytes
GB (Gigabyte): 1 GB = 1,000,000,000 bytes       GiB (Gibibyte): 1 GiB = 1,073,741,824 bytes
TB (Terabyte): 1 TB = 1,000,000,000,000 bytes   TiB (Tebibyte): 1 TiB = 1,099,511,627,776 bytes

Note that all of the byte quantities in the table above are expressed as decimal numbers. They are not shown in binary because the larger ones would be unwieldy: 1 TiB alone is a 1 followed by 40 zeros, or 41 binary digits.
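If you want to reproduce the decimal and binary columns and see how many binary digits each quantity would need, here is a short sketch. The prefix list is an assumption chosen to match the units in the table, so it stops at tera.

```python
PREFIXES = ["kilo", "mega", "giga", "tera"]

rows = []
for power, name in enumerate(PREFIXES, start=1):
    decimal_bytes = 1000**power  # KB, MB, GB, TB
    binary_bytes = 1024**power   # KiB, MiB, GiB, TiB
    # bit_length() gives the number of binary digits needed to write the value.
    rows.append((name, decimal_bytes, decimal_bytes.bit_length(),
                 binary_bytes, binary_bytes.bit_length()))

for name, dec, dec_bits, bin_, bin_bits in rows:
    print(f"{name:>4}  {dec:>17,} ({dec_bits} bits)  {bin_:>17,} ({bin_bits} bits)")
```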

Most users and programmers need not be concerned with the differences between the binary and decimal storage size numbers, although the gap grows with each prefix: 1 KiB is 2.4% larger than 1 KB, while 1 TiB is almost 10% larger than 1 TB. If you’re developing software or hardware that deals with data at the binary level, you may need the binary numbers.
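A sketch that quantifies how far apart each binary unit and its decimal counterpart are:

```python
UNIT_PAIRS = [("KB", "KiB"), ("MB", "MiB"), ("GB", "GiB"), ("TB", "TiB")]

gaps = {}
for power, (dec_name, bin_name) in enumerate(UNIT_PAIRS, start=1):
    # Percentage by which 1024**power exceeds 1000**power.
    gaps[bin_name] = (1024**power / 1000**power - 1) * 100
    print(f"1 {bin_name} is {gaps[bin_name]:.1f}% larger than 1 {dec_name}")

# 1 KiB is 2.4% larger than 1 KB
# 1 MiB is 4.9% larger than 1 MB
# 1 GiB is 7.4% larger than 1 GB
# 1 TiB is 10.0% larger than 1 TB
```

The gap compounds by a factor of 1.024 per prefix step, which is why it matters more for multi-terabyte drives than for kilobyte-sized files.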

As for what this means to your PC: Your PC will make use of the full capacity of your storage and memory devices. If you want to see the capacity of your disk drives, thumb drives, etc., the Disks utility in Fedora will show you the actual capacity of the storage device in number of bytes as a decimal number.

There are also command line tools that can provide you with more flexibility in seeing how your storage bytes are being used. Two such command line tools are du (for files and directories) and df (for file systems). You can read about these by typing man du or man df at the command line in a terminal window.
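At the command line, du and df are the right tools. For scripts, a rough Python equivalent of df for a single file system is the standard library’s shutil.disk_usage(). The sketch below assumes a Linux system, checks the file system mounted at /, and prints each figure in both decimal and binary units.

```python
import shutil

# shutil.disk_usage() returns total, used and free bytes for the file system
# that contains the given path, similar to one line of df output.
usage = shutil.disk_usage("/")

for label, value in (("total", usage.total), ("used", usage.used), ("free", usage.free)):
    print(f"{label:>5}: {value:,} bytes = {value / 10**9:.2f} GB = {value / 2**30:.2f} GiB")
```

On many Linux file systems, used plus free adds up to less than total, because some blocks are reserved (for example, for the root user on ext4).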



Fedora Project community

31 Comments

  1. Kees de Jong

    Thanks for this article. A lot of people are confused about this basic computer science understanding. For example Redis, their notation is not the standard: https://github.com/antirez/redis/blob/unstable/redis.conf

    And many howto’s, manual pages and whatnot. I guess the problem stems from the fact that Americans don’t use the metric system. Anything with kilo, mega, etc. are powers of 10, not of 2. Weirdly I also see people that should know the metric system write a KB as 1024… 🙂

    • t0w0i7ne

      Some people learned their computer science either before the existence of the KiB style units, or from books and other materials that predate them. The confusion actually caused the adoption of the difference between KB and KiB as SI unit standards.

      • cmurf

        Strictly speaking, K is Kelvin, k is kilo, so the SI unit is kB, and the IEC unit is KiB.

  2. Thanks for the article. But, I’m afraid the material is much too basic for the Fedora Magazine. We’re already tech-savvy!

  3. Peter

    Sorry, but it is more like:
    bit7 bit6 bit5 … bit0

    Your explanation is not wrong but very misleading. Could you please correct this?

    • @Peter: The table has been updated to be more correct.

      • I really hadn’t planned to change the table. Correct has to be qualified by use and architecture design choice.

        • Peter

          Ponder this – How do you read this decimal number: 123?

          [ ] one hundred and twenty three
          [ ] three hundred and twenty one

          1*10^2 + 2*10^1 + 3*10^0

          Does this help?

        • Jim Simmonds

          The decimal numbers are only what suppliers want to use to give you less than they should – they should all stick with binary so that everyone knows what they are getting namely 1024 bytes, megabytes, gigabytes, or even in times to come terabytes – all calculated in binary

    • I considered covering Big-Endian and Little-Endian in this article, but since this is a magazine article I didn’t want it to get too long. I used Little-Endian to see if I would get feedback on this point. I’ve been thinking about proposing another article to cover this, the other larger data structures that bytes are used to form, and some of the uses. Thanks for your comment; you have encouraged me to go ahead with that proposal.

  4. Joao Rodrigues

    Even though the disk manufacturer may say that the disk has a capacity of 1,000,000,000,000 bytes (or 1 terabyte), it’s the raw capability of storage. It doesn’t mean you can store 1 terabyte of data on it.

    Some of that space will be lost in partitioning, partition alignment and filesystem structure.

    A very cool tool to analyze disk usage in gnome is baobab
    https://wiki.gnome.org/action/show/Apps/DiskUsageAnalyzer

    Also, in the short scale vs. long scale war:

    Long scale users (mostly european) use the following nomenclature:
    10^9 is a thousand million or a milliard
    10^12 is a billion
    10^15 is a thousand billion or a billiard
    10^18 is a trillion
    10^21 is a thousand trillion or a trilliard

  5. Jakfrost

    Not that I want to pick nits, but there are 4 Bits in a Byte, two Bits in a nibble, two Bytes in a Word (8 Bit Word). A 16 Bit Word was originally termed a Double Word I think, and a 32 Bit word is a Long Word. You need at least 32 Bits in order to express a single precision floating point value in a PC. Binary is merely a Base 2 Number system (0..1 range), just as Decimal is a Base 10 number system (0…9 range).

    • There are 8 bits in a byte, and 4 bits in a nibble — although the byte was never strictly defined, that’s been the length as long as I can remember. Words are ambiguous due to processor architecture differences, though many of the platforms have maintained a word at 16 bits, and a double word at 32 bits.[1]

      • My favorite machine was the Dec-20 with 36-bit words. There were instructions to unpack words in bytes of different sizes. The system software used 7-bit bytes (ASCII) for strings. The compiler used 6-bit bytes internally for names in its symbol table. The favorite interpreted language was PPL – Polymorphic Programming Language. (I used it to generate a Sociology paper from a random transition network of buzzphrases that got a B+) I still daydream sometimes about how to implement the C language standard on such an architecture.

        I also used a CDC with 60-bit words, but it wasn’t as endearing for some reason.

    • AdamN

      Standardised bytes are 8 bits, though in the early years a byte has been known at 6 bits. A nibble has always been 4 bits. A word varies in bit length according to the natural size of the unit of data handled in a single operation of processing.

  6. You left out the most important part. A KiB is 1024 bytes. 1024 is 2^10, which is close to 1000. So all the binary multipliers are multiples of 2^10. This approximate equivalence is handy for all sorts of estimations. 3 decimal digits ~= 10 binary bits. How many bits needed to count to 10 billion? 10*10^9 is approx 10 * 2^30, 4 bits are needed to count to 10, so 34 bits are needed. A MiB is 2^10 * 2^10.

    Historically, 1024 bytes was called a KiloByte in the context of binary computers until a few decades ago, and 2^20 was called a MegaByte. Eventually, enough lay people were confused, exacerbated by deceptive marketing that used decimal in a binary computer context, that a standards committee was formed to come up with new terms for the binary prefixes.

    As usual, the committee solution was hated by all. 2^10 bytes was now to be called a “KibiByte”, 2^20 bytes is a “MibiByte”, and worst of all, 2^30 bytes is a “GiBiByte”. Hence the new nomenclature was pronounced “GiBiRish”. Fortunately, the abbreviations were more acceptable.

    2^10 bytes KiBiByte KiB
    2^20 bytes MiBiByte MiB
    2^30 bytes GiBiByte GiB
    2^40 bytes TiBiByte TiB
    2^50 bytes PiBiByte PiB

    • Martin

      Thank you for this clarification. As I was reading the article I was silently telling the author that his labeling was the other way around – I grew up with KB = 1024 🙂

  7. Confusingly, many unix utilities in Fedora still use the old convention of Kilo = 1024 in a binary computer context. For instance, df uses K,M,G,T,P,E,Z,Y to mean powers of 1024. It then added KB,MB,… for powers of 1000 and KiB,MiB,… for powers of 1024.

    The binary prefixes get more unpronounceable past PeBiByte.

    2^60 bytes ExBiByte EiB ~ Exabyte
    2^70 bytes ZeBiByte ZiB ~ Zettabyte
    2^80 bytes YoBiByte YiB ~ Yottabyte

  8. Ray McCaffity

    One thing that helps, is newer versions of many utilities include the “h” option.

    df -h
    free -mh
    ls -lh

    h = human readable: it automatically converts the number length to kilobytes, megabytes, gigabytes or terabytes for you.

  9. Bartosz Lis

    The prefix for 1000-fold is lowercase k, not uppercase K (see the articles “Metric prefix” and “Binary prefix” on Wikipedia or consult any high school physics textbook). Thus 1kg = 1000g (one kilogram is one thousand grams), 1km = 1000m (one kilometer is one thousand meters), 1kHz = 1000Hz (one kilohertz is one thousand hertz), 1kb = 1000b (one kilobit is one thousand bits), 1kB = 1000B (one kilobyte is one thousand bytes) and so on. The prefix K was introduced long ago to distinguish 1KB = 2^10B = 1024B from 1kB = 10^3B = 1000B. There was no such distinction between higher powers of 1024 and 1000: 1MB was used as 1024*1024B in the context of RAM and 1000*1000B in the context of HDDs. The difference was ca. 5%. However, the difference between 1024^4 and 1000^4 is ca. 10%, so it is significant. Thus, when terabyte (TB) sizes came into use, it became necessary to distinguish binary multipliers from decimal multipliers and to make this distinction a scientific/industrial standard. Finally, standardization organizations introduced binary prefixes (Ki, Mi, Gi, Ti,…) for powers of 1024.

    The prefix K was used for 1024-fold and now should be replaced with Ki. The prefix K was never used for 1000-fold!

  10. Leslie Satenstein

    The Decimal system has 1000 as the base, so that 1k is 1000.
    The binary system uses 1024 as a base, so that 1k is 1024 in Decimal.

    A 1 terabyte drive (decimal) is much less storage when it is expressed as 1 terabyte (binary) than what one expects. That is why we see storage described in decimal as opposed to base binary.

    • Leslie I’m glad you commented. I haven’t been on the Forum much lately so I haven’t seen you around. You should try this; I’m sure there are some topics you would like to write about.

  11. Tracy Baker

    “Your PC will make use of the full capacity of your storage …” This statement is incorrect.

    One cannot create an 11GiB logical volume from an 11GiB volume group. When using the lvcreate command, the -L 11G option will not work; -L 10G will. This creates a 10GiB logical volume. When -L 10G is used, 254 4MiB extents (1016MiB, assuming a 4MiB physical/logical extent size) are left unused.

    Almost all of the space can be used using the -l (lowercase) option. That requires specifying extents and doing math. Most administrators don’t/won’t do the math.

    Even so: if extents are used you’ll end up with a 10.99GiB logical volume, not 11GiB. The resulting logical volume will be short of full capacity by 2 extents, or 8MiB.

    …and let’s not get into slack space.

    • Thanks Tracy. I could definitely have phrased that sentence better. I had thought about the complications of setting up volumes, partitioning, and formatting, but I didn’t want this article to get too big. On the other hand, in the creation of volumes, partitions, formatting, etc., the user is making use of the disk’s space. That use just results in space being used that is not available for executables, user data, etc. Though it certainly is true that in these processes some space is wasted and is of no use at all. One of the things I like about Fedora Magazine is that there are users such as yourself who are willing to fill in where the author of an article has missed things.

  12. Yazan Al Monshed

    Thanks for the basics. A lot of people find this confusing.

  13. Loving the CS-themed articles on the magazine so far!

  14. Andrej

    I’m not sure if there is a special logic in this article, but what is meant by “suffix” should be “prefix” since it comes before a unit, not after it.


The opinions expressed on this website are those of each author, not of the author's employer or of Red Hat. Fedora Magazine aspires to publish all content under a Creative Commons license but may not be able to do so in all cases. You are responsible for ensuring that you have the necessary permission to reuse any work on this site. The Fedora logo is a trademark of Red Hat, Inc. Terms and Conditions
