Use Diffoscope in packager workflows


For a packager, updating packages is a recurring task. For some projects, the packager is involved in upstream maintenance, or well-written release notes make it easy to figure out what changed between releases. This isn’t always the case, for instance with a small project maintained by one or two people somewhere on GitHub, and then it can be useful to verify what exactly changed. Diffoscope can help determine the changes between package releases.

Diffoscope is a “smart binary diff” tool that was born in the Reproducible Builds project in Debian and is also available in Fedora. It “knows” about various text and binary formats, and will try to recursively unpack and compare two blobs. In particular, it knows that some objects need to be decompressed before comparing, that archives need to be unpacked, and how to deconstruct binary objects like ELF programs and libraries, Java .jar files, Windows .cab files, etc.

Just today I received a bug report stating that a new version of python-libarchive-c is available (3.2, while 3.1 is what is currently packaged). It is a simple Python package. But even a simple Python package has some binary files, so a straightforward diff on the unpacked rpms doesn’t really work. Let’s see how Diffoscope can be used to show the differences between the binary packages in detail.

Comparing upstream archives

The first step is to compare the upstream archives:

$ diffoscope python-libarchive-c-3.{1,2}.tar.gz
--- python-libarchive-c-3.1.tar.gz
+++ python-libarchive-c-3.2.tar.gz
│   --- python-libarchive-c-3.1.tar        ❶
├── +++ python-libarchive-c-3.2.tar
│ ├── file list
│ │ @@ -1,46 +1,46 @@                      ❷
│ │ -drwxrwxr-x  0 root (0) root (0)     0 2021-06-01 07:32:24.000000 python-libarchive-c-3.1/                     
│ │ --rw-rw-r--  0 root (0) root (0)    25 2021-06-01 07:32:24.000000 python-libarchive-c-3.1/.gitattributes
│ │ -drwxrwxr-x  0 root (0) root (0)     0 2021-06-01 07:32:24.000000 python-libarchive-c-3.1/.github/
│ │ --rw-rw-r--  0 root (0) root (0)    20 2021-06-01 07:32:24.000000 python-libarchive-c-3.1/.github/FUNDING.yml
...
│ │ --rw-rw-r--  0 root (0) root (0)  1331 2021-06-01 07:32:24.000000 python-libarchive-c-3.1/version.py
│ │ +drwxrwxr-x  0 root (0) root (0)     0 2021-10-06 12:40:03.000000 python-libarchive-c-3.2/
│ │ +-rw-rw-r--  0 root (0) root (0)    25 2021-10-06 12:40:03.000000 python-libarchive-c-3.2/.gitattributes
│ │ +drwxrwxr-x  0 root (0) root (0)     0 2021-10-06 12:40:03.000000 python-libarchive-c-3.2/.github/
│ │ +-rw-rw-r--  0 root (0) root (0)    20 2021-10-06 12:40:03.000000 python-libarchive-c-3.2/.github/FUNDING.yml
...
│ │ +-rw-rw-r--  0 root (0) root (0)  1331 2021-10-06 12:40:03.000000 python-libarchive-c-3.2/version.py
...
│ │   --- python-libarchive-c-3.1/libarchive/ffi.py
│ ├── +++ python-libarchive-c-3.2/libarchive/ffi.py
│ │┄ Files 0% similar despite different names
│ │ @@ -43,15 +43,15 @@
│ │  SEEK_CALLBACK = CFUNCTYPE(
│ │ -    c_longlong, c_int, c_void_p, c_longlong, c_int ❸
│ │ +    c_longlong, c_void_p, c_void_p, c_longlong, c_int
│ │  )
│ │   --- python-libarchive-c-3.1/libarchive/read.py
│ ├── +++ python-libarchive-c-3.2/libarchive/read.py
│ │┄ Files 2% similar despite different names
│ │ @@ -61,17 +61,18 @@
│ │      close_cb = CLOSE_CALLBACK(close_func) if close_func else NO_CLOSE_CB
│ │ +    seek_cb = SEEK_CALLBACK(seek_func)
│ │      with new_archive_read(format_name, filter_name, passphrase) as archive_p:
│ │          if seek_func:
│ │ -            ffi.read_set_seek_callback(archive_p, SEEK_CALLBACK(seek_func))
│ │ +            ffi.read_set_seek_callback(archive_p, seek_cb)
│ │          ffi.read_open(archive_p, None, open_cb, read_cb, close_cb)
│ │          yield archive_read_class(archive_p)
...
│ │   --- python-libarchive-c-3.1/libarchive/write.py
│ ├── +++ python-libarchive-c-3.2/libarchive/write.py
│ │┄ Files identical despite different names
│ │   --- python-libarchive-c-3.1/setup.py              ❹
│ ├── +++ python-libarchive-c-3.2/setup.py
│ │┄ Files identical despite different names
...
│ │   --- python-libarchive-c-3.1/version.py
│ ├── +++ python-libarchive-c-3.2/version.py
│ │┄ Files 1% similar despite different names
│ │ @@ -9,15 +9,15 @@
│ │  def get_version():
│ │      # Return the version if it has been injected into the file by git-archive
│ │ -    version = tag_re.search('HEAD -> master, tag: 3.1')
│ │ +    version = tag_re.search('HEAD -> master, tag: 3.2') ❺
│ │      if version:
│ │          return version.group(1)

At ❶ we see that we’re comparing two archives. (Note: this diff output has been heavily trimmed for readability.)

At ❷ we see that the listings of both archives are different, but the difference is as expected: the version number is included in the name of the top-level directory in the archives, so all paths are different. We also see that the two archives were created at different dates (2021-06-01 07:32:24 and 2021-10-06 12:40:03 respectively). This listing would alert us if upstream added or removed files unexpectedly.
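The path noise at ❷ comes only from the versioned top-level directory. As a rough illustration (this is not how diffoscope itself is implemented), the sketch below strips that directory before comparing two tarballs, so that only added, removed, or changed files remain. The function names and the tiny in-memory archives are made up for this example.

```python
import gzip
import io
import tarfile


def read_members(blob: bytes) -> dict[str, bytes]:
    """Map each regular file in a .tar.gz blob to its content.

    The top-level directory (e.g. "pkg-3.1/") is stripped from the path,
    so that two releases can be compared by relative path.
    """
    members = {}
    with tarfile.open(fileobj=io.BytesIO(gzip.decompress(blob))) as tar:
        for info in tar.getmembers():
            if info.isfile():
                relative = info.name.split("/", 1)[-1]
                members[relative] = tar.extractfile(info).read()
    return members


def compare_tarballs(old: bytes, new: bytes) -> dict[str, list[str]]:
    """Report files that were added, removed, or changed between two tarballs."""
    a, b = read_members(old), read_members(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def make_tarball(top_dir: str, files: dict[str, bytes]) -> bytes:
    """Hypothetical helper: build a small .tar.gz in memory for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(f"{top_dir}/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    old = make_tarball("pkg-3.1", {"setup.py": b"same", "version.py": b"3.1"})
    new = make_tarball("pkg-3.2", {"setup.py": b"same", "version.py": b"3.2"})
    print(compare_tarballs(old, new))
```

With real releases, the two blobs would simply be the contents of the downloaded .tar.gz files. Diffoscope additionally compares metadata such as permissions and timestamps, which this sketch ignores.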

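At ❸ we see a code change in the SEEK_CALLBACK prototype: the type of the first callback argument changed from c_int to c_void_p. In read.py, the callback object is now also stored in a variable (seek_cb) before it is passed to libarchive. The upstream release notes say that “this release fixes the seek callbacks passed to libarchive by the custom_reader and stream_reader functions”, so that change looks reasonable.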
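To make the change at ❸ more concrete: in ctypes, the first argument to CFUNCTYPE is the return type and the remaining ones are the argument types. A minimal sketch of why the argument type matters, assuming that this argument carries a pointer (as the new type suggests) and a 64-bit platform; the seek function and the pointer value are invented for the example.

```python
from ctypes import CFUNCTYPE, c_int, c_longlong, c_void_p

# Prototypes as they appear in the 3.1 and 3.2 sources.
OLD_SEEK_CALLBACK = CFUNCTYPE(c_longlong, c_int, c_void_p, c_longlong, c_int)
NEW_SEEK_CALLBACK = CFUNCTYPE(c_longlong, c_void_p, c_void_p, c_longlong, c_int)

# A made-up 64-bit pointer value (assumption: 64-bit platform).
fake_pointer = 0x7FFF12345678

# c_int keeps only the low 32 bits, so the value is silently truncated ...
truncated = c_int(fake_pointer).value
# ... while c_void_p holds the whole pointer.
preserved = c_void_p(fake_pointer).value


def seek_func(archive_p, context, offset, whence):
    """Toy callback: pretend the seek succeeded and report the new offset."""
    return offset


# Keep a reference to the callback object, as the 3.2 code does with seek_cb.
seek_cb = NEW_SEEK_CALLBACK(seek_func)
result = seek_cb(fake_pointer, None, 42, 0)

if __name__ == "__main__":
    print(hex(truncated), hex(preserved), result)
```

The ctypes documentation notes that callback objects must stay referenced for as long as C code may call them. Storing the callback in seek_cb, as the 3.2 code does, is consistent with that advice, although the release notes do not spell out the details.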

Most files are not changed, so at ❹ diffoscope dutifully reports that they are identical despite the different file names. At ❺ we get another version-related change.

And that’s it: no more changes in the upstream tarball.
Let’s build the package and compare it with a previous build.

Comparing binary packages

After adjusting the spec file for the new version, the second step is to build the package and compare with an older version. fedpkg mockbuild puts the resulting packages in a subdirectory named after the version:

$ fedpkg mockbuild
...
INFO: Results and/or logs in: ~/fedora/python-libarchive-c/results_python-libarchive-c/3.2/1.fc36

$ diffoscope results_python-libarchive-c/3.1/3.fc36/python3-libarchive-c-3.1-3.fc36.noarch.rpm \
             results_python-libarchive-c/3.2/1.fc36/python3-libarchive-c-3.2-1.fc36.noarch.rpm
--- results_python-libarchive-c/3.1/3.fc36/python3-libarchive-c-3.1-3.fc36.noarch.rpm
+++ results_python-libarchive-c/3.2/1.fc36/python3-libarchive-c-3.2-1.fc36.noarch.rpm
├── header
│ @@ -1,79 +1,79 @@            ❶
│ -HEADERIMMUTABLE: 0000003d00001ed50000003f...1000003e8000000060000
│ +HEADERIMMUTABLE: 0000003d00001e8d0000003f...1000003e8000000060000

In the case of the binary packages, there are many more differences than in the upstream tarball. But let’s try to go through them.

│  HEADERI18NTABLE:
│   - C
│ -SIGSIZE: 26733              ❷
│ -SIGMD5: 1aa148ac91484fe8cb55fe3334aae10b
│ -SHA1HEADER: 1659a1431af930a0a824c193780e27f28fc2d03e
│ -SHA256HEADER: 60e4f84e905bd42693cabe88e63542916a7dfffef052f3e7499cb80a1770c736
│ +SIGSIZE: 26678
│ +SIGMD5: e35e3157e01b6ec26b8e1981f0ba38af
│ +SHA1HEADER: faab0b7ee86b23f753b2c49a52d6a8f3deefc7ca
│ +SHA256HEADER: 66d39a70dc9e081ba5cc4243e72f434e953d68c9bce8be1127b2b87fa1923d06
│  NAME: python3-libarchive-c
│ -VERSION: 3.1                ❸
│ -RELEASE: 3.fc36
│ +VERSION: 3.2
│ +RELEASE: 1.fc36
│  SUMMARY: Python interface to libarchive
│  DESCRIPTION: The libarchive library provides a flexible interface for reading and writing archives in various
│  formats such as tar and cpio. libarchive also supports reading and writing archives compressed using
│  various compression filters such as gzip and bzip2.  A Python interface to libarchive. It uses the
│  standard ctypes module to dynamically load and access the C library.
│ -BUILDTIME: 1638015932
│ +BUILDTIME: 1638015863       ❹
│  BUILDHOST: spora.local
│ -SIZE: 68979
│ +SIZE: 69052
│  LICENSE: CC0
│  GROUP: Unspecified
│  URL: https://github.com/Changaco/python-libarchive-c
│  OS: linux
│  ARCH: noarch

We can see that Diffoscope does something similar to rpmdiff on the two packages, but with much more detail.
At ❶ we see that the rpm header changed, which is not surprising 😉 At ❷ we get the details of the signatures, and at ❸ the version info. The build timestamp and rpm size at ❹ may also be interesting. Diffoscope then prints a comparison of the FILESIZES, FILEMTIMES, and FILEMD5S tables in the rpm headers. This would be useful if we were trying to chase down some unexpected difference between packages.

│  CHANGELOGTIME:
│ + - 1638014400
│   - 1627387200
...
│ - - 1570104000
│ - - 1566216000
│  CHANGELOGNAME:
│ + - Zbigniew Jędrzejewski-Szmek  3.2-1
│   - Fedora Release Engineering  - 3.1-2
...
│ - - Miro Hrončok  - 2.8-10
│ - - Miro Hrončok  - 2.8-9
│  CHANGELOGTEXT:
│ + - - Version 3.2 (fixes #2027027)
│   - - Second attempt - Rebuilt for   https://fedoraproject.org/wiki/Fedora_35_Mass_Rebuild
...
│ - - - Rebuilt for Python 3.8.0rc1 (#1748018)
│ - - - Rebuilt for Python 3.8

Here we see that the changelog got trimmed (my entry is added, and two from Miro are dropped). In Fedora we set %_changelog_trimage to 2 years, so even if the spec file defines a longer changelog, in the built package the oldest entries are trimmed away.
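The CHANGELOGTIME values are Unix timestamps, so the trimming can be checked by hand. The sketch below is only an approximation of the rule: it assumes that “2 years” means 2 × 365 days and that the age is measured relative to the newest changelog entry, which may not match exactly what rpm does. It uses only the four timestamps visible in the trimmed listing above.

```python
from datetime import datetime, timezone

# Assumption: "2 years" expressed in seconds as 2 * 365 days.
TRIM_AGE = 2 * 365 * 24 * 60 * 60


def split_changelog(times, trim_age=TRIM_AGE):
    """Split changelog timestamps into (kept, trimmed).

    Assumption: entries older than trim_age, measured from the newest
    entry, are dropped from the built package.
    """
    cutoff = max(times) - trim_age
    kept = [t for t in times if t >= cutoff]
    trimmed = [t for t in times if t < cutoff]
    return kept, trimmed


def as_date(timestamp):
    """Format a Unix timestamp as a UTC calendar date."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")


if __name__ == "__main__":
    # Only the timestamps shown in the listing; the elided ones are unknown.
    visible = [1638014400, 1627387200, 1570104000, 1566216000]
    kept, trimmed = split_changelog(visible)
    print("kept:   ", [as_date(t) for t in kept])
    print("trimmed:", [as_date(t) for t in trimmed])
```

Under these assumptions the two entries from 2019 fall outside the window, which agrees with the two entries that diffoscope reports as removed.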

Then we get some expected but important differences:

│  PROVIDEVERSION:
│ - - 3.1-3.fc36
│ + - 3.2-1.fc36       ❶
│  OBSOLETEVERSION:
│ - - 3.1-3.fc36
│ + - 3.2-1.fc36
...
│  DIRNAMES:           ❷
│ - - /usr/lib/python3.10/site-packages/libarchive_c-3.1-py3.10.egg-info/
│ + - /usr/lib/python3.10/site-packages/libarchive_c-3.2-py3.10.egg-info/
...
│ @@ -1 +1 @@
│ -RPM v3.0 bin i386/x86_64 python3-libarchive-c-3.1-3.fc36
│ +RPM v3.0 bin i386/x86_64 python3-libarchive-c-3.2-1.fc36
├── content
│ ├── file list
│ │ @@ -1,35 +1,35 @@
│ │ -drwxr-xr-x   1   0 0    0 2021-10-27 12:20:33.000000 ./usr/lib/python3.10/site-packages/libarchive    ❸
│ │ --rw-r--r--   1   0 0  601 2021-06-01 07:32:24.000000 ./usr/lib/python3.10/site-packages/libarchive/__init__.py
│ │ -drwxr-xr-x   1   0 0    0 2021-10-27 12:20:34.000000 ./usr/lib/python3.10/site-packages/libarchive/__pycache__
...
│ │ +drwxr-xr-x   1   0 0    0 2021-11-27 12:24:23.000000 ./usr/lib/python3.10/site-packages/libarchive    ❹
│ │ +-rw-r--r--   1   0 0  601 2021-10-06 12:40:03.000000 ./usr/lib/python3.10/site-packages/libarchive/__init__.py
│ │ +drwxr-xr-x   1   0 0    0 2021-11-27 12:24:24.000000 ./usr/lib/python3.10/site-packages/libarchive/__pycache__
...
│ ├── ./usr/lib/python3.10/site-packages/libarchive/ffi.py
│ │ @@ -43,15 +43,15 @@
│ │  SEEK_CALLBACK = CFUNCTYPE(
│ │ -    c_longlong, c_int, c_void_p, c_longlong, c_int ❺
│ │ +    c_longlong, c_void_p, c_void_p, c_longlong, c_int
│ │  )
│ │ @@ -61,17 +61,18 @@
│ │      close_cb = CLOSE_CALLBACK(close_func) if close_func else NO_CLOSE_CB
│ │ +    seek_cb = SEEK_CALLBACK(seek_func)
│ │      with new_archive_read(format_name, filter_name, passphrase) as archive_p:
│ │          if seek_func:
│ │ -            ffi.read_set_seek_callback(archive_p, SEEK_CALLBACK(seek_func))
│ │ +            ffi.read_set_seek_callback(archive_p, seek_cb)
│ │          ffi.read_open(archive_p, None, open_cb, read_cb, close_cb)
...

At ❶ we see that the package version change is reflected in the PROVIDEVERSION and OBSOLETEVERSION tables in the rpm header. There are also PROVIDENAME and OBSOLETENAME tables, but those are unchanged in this case. This is good: we are not expecting any changes to Provides or Obsoletes, except for the version bump.

At ❷ we again see that the version is reflected in a path.

At ❸ and ❹ we see the heavily trimmed lists of files in both packages. All files are reported as different because the modification times changed. But look closely at the timestamps: __init__.py is a file provided by upstream, and its mtime is preserved during installation, so we see the same timestamps as in the first listing from the upstream tarball comparison. The __pycache__ directory, however, was created during the build and has a timestamp that shows when the build was done. We would see the same for other files produced during the build. At ❺ we see the same code changes as in the upstream tarball comparison.

Now comes the boring part:

│ ├── ./usr/lib/python3.10/site-packages/libarchive/__pycache__/__init__.cpython-310.pyc
│ │ @@ -1,8 +1,8 @@
│ │ -00000000: 6f0d 0d0a 0000 0000 88e2 b560 5902 0000  o..........`Y... ❶
│ │ +00000000: 6f0d 0d0a 0000 0000 2399 5d61 5902 0000  o.......#.]aY...
│ ├── ./usr/lib/python3.10/site-packages/libarchive/__pycache__/entry.cpython-310.pyc
│ │ @@ -1,8 +1,8 @@
│ │ -00000000: 6f0d 0d0a 0000 0000 88e2 b560 0c14 0000  o..........`.... ❷
│ │ +00000000: 6f0d 0d0a 0000 0000 2399 5d61 0c14 0000  o.......#.]a....
...
│ │   --- ./usr/lib/python3.10/site-packages/libarchive_c-3.1-py3.10.egg-info/PKG-INFO
│ ├── +++ ./usr/lib/python3.10/site-packages/libarchive_c-3.2-py3.10.egg-info/PKG-INFO
│ │┄ Files 0% similar despite different names
│ │ @@ -1,10 +1,10 @@
│ │  Name: libarchive-c
│ │ -Version: 3.1     ❸
│ │ +Version: 3.2
│ │  Summary: Python interface to libarchive
│ │  Home-page: https://github.com/Changaco/python-libarchive-c

… yes, diffoscope shows the diff between the hexdump listings of the .pyc files (Python bytecode at ❶ and ❷). Those files were created during the build, so we know that the changes correspond to the changes in the sources shown in the previous listings. The differing bytes in the first line of each hexdump belong to the .pyc header, which records the modification time of the source file. At the end we again see the version change in the PKG-INFO file (❸).
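The first 16 bytes in those hexdumps can be decoded by hand. A small sketch, assuming the timestamp-based .pyc layout used since Python 3.7 (a 2-byte magic number followed by "\r\n", a 4-byte flags field, the source mtime, and the source size, all little-endian); the function name is made up for this example.

```python
import struct
from datetime import datetime, timezone


def parse_pyc_header(header: bytes) -> dict:
    """Decode the 16-byte header of a timestamp-based .pyc file."""
    # H = magic number, xx = the "\r\n" that follows it, then three 32-bit
    # fields: flags, source mtime, source size.
    magic, flags, mtime, source_size = struct.unpack("<HxxIII", header[:16])
    return {
        "magic": magic,
        "flags": flags,  # 0 means the file is validated by mtime and size
        "source_mtime": datetime.fromtimestamp(mtime, tz=timezone.utc),
        "source_size": source_size,
    }


# First line of the hexdumps at ❶, for the 3.1 and 3.2 builds of __init__.pyc.
OLD_HEADER = bytes.fromhex("6f0d 0d0a 0000 0000 88e2 b560 5902 0000")
NEW_HEADER = bytes.fromhex("6f0d 0d0a 0000 0000 2399 5d61 5902 0000")

if __name__ == "__main__":
    for label, header in (("3.1", OLD_HEADER), ("3.2", NEW_HEADER)):
        info = parse_pyc_header(header)
        print(label, info["source_mtime"].isoformat(), info["source_size"])
```

Decoded this way, the embedded mtimes are 2021-06-01 07:32:24 and 2021-10-06 12:40:03 UTC, and the source size is 601 bytes in both cases. Those are the same values that the file listings show for __init__.py.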

Currently, diffoscope does not know how to show Python bytecode in a better way. But it is possible that in the future it will be able to disassemble the bytecode into a more readable form and show the diff on that. New parsers are regularly being added to diffoscope. For compiled programs, it already shows a diff of the disassembled machine code.
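Until then, a more readable comparison can be approximated by hand with the standard library. The following is a hypothetical sketch, not something diffoscope does: it skips the 16-byte header, loads the code object with marshal, and diffs the output of dis. It assumes the .pyc files were produced by the same Python version that runs the script, because the marshal format is version-specific, and it should only be used on files you trust, since marshal is not designed to be safe against malicious input.

```python
import difflib
import dis
import io
import marshal
import py_compile
import tempfile
from pathlib import Path


def disassemble_pyc(pyc_path) -> list[str]:
    """Return the dis listing of a .pyc file as a list of lines."""
    data = Path(pyc_path).read_bytes()
    code = marshal.loads(data[16:])  # skip the 16-byte header (Python 3.7+)
    out = io.StringIO()
    dis.dis(code, file=out)
    return out.getvalue().splitlines()


def compile_source(source: str, directory: Path, name: str) -> Path:
    """Demo helper: write source to a file and byte-compile it."""
    src = directory / f"{name}.py"
    src.write_text(source)
    pyc = directory / f"{name}.pyc"
    py_compile.compile(str(src), cfile=str(pyc), doraise=True)
    return pyc


def bytecode_diff(old_source: str, new_source: str) -> list[str]:
    """Unified diff of the disassembled bytecode of two source snippets."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        old = disassemble_pyc(compile_source(old_source, tmp_dir, "old"))
        new = disassemble_pyc(compile_source(new_source, tmp_dir, "new"))
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


if __name__ == "__main__":
    print("\n".join(bytecode_diff("x = 1\n", "x = 2\n")))
```

For real packages, disassemble_pyc would be pointed at the .pyc files extracted from the two rpms. The listing can still be noisy, because line numbers and the addresses of nested code objects show up in the output.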

Conclusion

After looking at all those diffs, I think one can say with some confidence that the upgrade from python-libarchive-c-3.1 to python-libarchive-c-3.2 is safe. In particular, it is suitable even for a stable release because it contains only a bug fix.

Big shout out to Chris Lamb and the other maintainers of diffoscope.


6 Comments

  1. Karlis K.

    This is awesome, I hadn’t even thought of binary diff’ing. Always great to read about something new.

  2. Christopher

    Uhm… interesting plot twist.

    I just installed diffoscope-193-1.fc35 on my Fedora 35 workstation, and it… has a dependency on apt.

    Or am I just dumb?

    $ diffoscope Template-0.2.{2,5}.tgz
    Traceback (most recent call last):
    File “/usr/lib/python3.10/site-packages/diffoscope/main.py”, line 751, in main
    sys.exit(run_diffoscope(parsed_args))
    File “/usr/lib/python3.10/site-packages/diffoscope/main.py”, line 706, in run_diffoscope
    difference = compare_root_paths(path1, path2)
    File “/usr/lib/python3.10/site-packages/diffoscope/comparators/utils/compare.py”, line 66, in compare_root_paths
    file1 = specialize(FilesystemFile(path1, container=container1))
    File “/usr/lib/python3.10/site-packages/diffoscope/comparators/utils/specialize.py”, line 53, in specialize
    for cls in ComparatorManager().classes:
    File “/usr/lib/python3.10/site-packages/diffoscope/comparators/init.py”, line 127, in init
    self.reload()
    File “/usr/lib/python3.10/site-packages/diffoscope/comparators/init.py”, line 138, in reload
    mod = importlib.import_module(
    File “/usr/lib64/python3.10/importlib/init.py”, line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    File “”, line 1050, in _gcd_import
    File “”, line 1027, in _find_and_load
    File “”, line 1006, in _find_and_load_unlocked
    File “”, line 688, in _load_unlocked
    File “”, line 883, in exec_module
    File “”, line 241, in _call_with_frames_removed
    File “/usr/lib/python3.10/site-packages/diffoscope/comparators/debian.py”, line 24, in
    from debian.deb822 import Dsc, Deb822
    File “/usr/lib/python3.10/site-packages/debian/deb822.py”, line 283, in
    import debian.debian_support
    File “/usr/lib/python3.10/site-packages/debian/debian_support.py”, line 46, in
    apt_pkg.init()
    apt_pkg.Error: W:Unable to read /etc/apt/apt.conf.d/ – DirectoryExists (2: No such file or directory), E:Unable to determine a suitable packaging system type

    • That is strange indeed. It seems that apt_pkg.init() fails… It’s a compiled module, so we don’t get the traceback for that part. But I tried both installing and removing python3-apt, and even though I don’t have /etc/apt/apt.conf.d, the issue does not reproduce. Please file a bug in bugzilla, and report the versions of diffoscope, python3-apt, python3-debian, and apt, and whether you have anything in /etc/apt/. Does

      python3 -c 'import apt_pkg; apt_pkg.init()'

      also blow up like that?

  3. Dan

    Interesting, on Fedora 34 installing diffoscope would require installing 945M of dependencies, such as ghc-compiler, mono-devel, gnumeric and goffice!

    • Yes, those are all tools that diffoscope uses (and their dependencies). They are listed as “Recommends”, not “Requires” though, so you can disable installation of weak deps to avoid that.
