Within KDE we have a service called the binary factory. It’s a Jenkins-driven build pipeline which we use in the KMyMoney project to build certain binary installable versions of the project. To generate our AppImage version, we package all dependencies into a large tar file so that we don’t have to rebuild them every day.
Due to new versions of the online banking libraries we use, I updated some package information and let the service do its thing to create the tar file with the pre-built dependencies. This is usually a matter of a few hours. When I checked the progress after a while, I found that the build had failed. Too bad, I thought, and took a look at the console log of the build to see what was going on. To my surprise, I saw the following:
10:24:51 [ 0%] Performing download step (download, verify and extract) for 'ext_tcl'
10:24:51 -- Downloading...
10:24:51    dst='/home/appimage//appimage-workspace//downloads/core-8-6-8.zip'
10:24:51    timeout='none'
10:24:51    inactivity timeout='none'
10:24:51 -- Using src='https://github.com/tcltk/tcl/archive/core-8-6-8.zip'
10:24:51 -- [download 100% complete]
10:24:52 -- verifying file...
10:24:52        file='/home/appimage//appimage-workspace//downloads/core-8-6-8.zip'
10:24:52 -- MD5 hash of
10:24:52     /home/appimage//appimage-workspace//downloads/core-8-6-8.zip
10:24:52   does not match expected value
10:24:52     expected: '36fbbc668961044fdda89c5ee2ba67a2'
10:24:52       actual: 'b018a409832df1788f22b1a983fd7c5b'
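The log looks like the output of CMake’s ExternalProject download step, which compares the MD5 digest of the downloaded archive with a value recorded in the build recipe. As a rough illustration of what such a check does (a Python sketch, not CMake’s actual implementation), consider the following; md5_of_file and verify_download are hypothetical helper names.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at EOF
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> None:
    """Raise ValueError if the file's MD5 differs from the expected value."""
    actual = md5_of_file(path)
    if actual != expected_md5.lower():
        raise ValueError(
            f"MD5 hash of {path} does not match expected value\n"
            f"  expected: '{expected_md5}'\n"
            f"    actual: '{actual}'"
        )
```

Calling verify_download on the new archive with the old expected value 36fbbc668961044fdda89c5ee2ba67a2 would raise with essentially the message from the log. Reading in chunks keeps memory use flat even for large archives.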
So the verification of a package that has existed for quite some time failed due to a checksum mismatch. What? How could that be? This had already happened with this very same package in the past, because someone added a file to the ZIP without changing the version number (a bad habit, after all), so we had to adjust our checksum to the new value. Did this happen again? Who is so dumb as to do such strange things? I set out on a quest to figure out what happened this time.
I downloaded the file manually and got the same checksum as the actual value shown above. I looked for another copy of the file on the internet, but could not find one anymore. Luckily, I still had a copy of the old version on one of my disks, so I was able to compare the two:
thb@thb-nb:~/Downloads$ md5sum tcl-core-8-6-8-old.zip tcl-core-8-6-8-new.zip
36fbbc668961044fdda89c5ee2ba67a2 tcl-core-8-6-8-old.zip
b018a409832df1788f22b1a983fd7c5b tcl-core-8-6-8-new.zip
These match the values I found in the console log, and checking the sizes shows that both files are exactly the same size:
thb@thb-nb:~/Downloads$ ls -l tcl-core-8-6-8-old.zip tcl-core-8-6-8-new.zip
-rw-r--r-- 1 thb users 7875812 13. Feb 17:05 tcl-core-8-6-8-new.zip
-rw-r--r-- 1 thb users 7875812 13. Feb 17:08 tcl-core-8-6-8-old.zip
Next, I extracted the contents into two directories and compared them recursively:
thb@thb-nb:~/Downloads$ diff -r tcl-core-8-6-8-old tcl-core-8-6-8-new
thb@thb-nb:~/Downloads$
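The same content comparison can be done without extracting anything. The following is a minimal sketch using Python’s standard zipfile module (diff_zip_members is a hypothetical helper name). Like diff -r on the extracted trees, it only looks at member names and decompressed contents, so anything stored in the archive outside the members’ data is invisible to it.

```python
import zipfile


def diff_zip_members(old_path: str, new_path: str) -> list[str]:
    """List member names whose presence or content differs between two ZIPs."""
    with zipfile.ZipFile(old_path) as old, zipfile.ZipFile(new_path) as new:
        old_names = set(old.namelist())
        new_names = set(new.namelist())
        # members present in only one of the two archives
        differing = list(old_names ^ new_names)
        for name in old_names & new_names:
            # compare the decompressed bytes of each common member
            if old.read(name) != new.read(name):
                differing.append(name)
    return sorted(differing)
```

An empty list for the two tcl archives would correspond to the silent diff -r above.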
No differences at all. So why does the checksum fail? Time for a deep dive with a binary comparison:
thb@thb-nb:~/Downloads$ diff -u <(hexdump -C tcl-core-8-6-8-old.zip) <(hexdump -C tcl-core-8-6-8-new.zip)
--- /dev/fd/63 2021-02-13 17:38:06.439702799 +0100
+++ /dev/fd/62 2021-02-13 17:38:06.435702801 +0100
@@ -492233,8 +492233,8 @@
 00782c80  75 00 74 63 6c 2d 63 6f  72 65 2d 38 2d 36 2d 38  |u.tcl-core-8-6-8|
 00782c90  2f 77 69 6e 2f 74 63 6c  73 68 2e 72 63 55 54 05  |/win/tclsh.rcUT.|
 00782ca0  00 01 47 7c 39 5a 50 4b  05 06 00 00 00 00 2c 08  |..G|9ZPK......,.|
-00782cb0  2c 08 cc 02 03 00 da 29  75 00 28 00 39 32 33 39  |,......)u.(.9239|
-00782cc0  61 62 64 65 65 62 39 31  36 38 66 63 31 34 39 65  |abdeeb9168fc149e|
-00782cd0  31 66 61 32 63 32 36 35  64 63 30 35 62 63 63 64  |1fa2c265dc05bccd|
-00782ce0  38 66 30 66                                       |8f0f|
+00782cb0  2c 08 cc 02 03 00 da 29  75 00 28 00 31 37 32 35  |,......)u.(.1725|
+00782cc0  62 37 34 36 39 35 36 30  66 38 30 32 66 31 33 32  |b7469560f802f132|
+00782cd0  61 38 37 61 62 63 65 35  39 61 65 33 32 32 38 64  |a87abce59ae3228d|
+00782ce0  64 30 65 64                                       |d0ed|
 00782ce4
The offset 0x782ce4 is 7875812 in decimal, which is exactly the file size, so the difference is at the very end of the file, and up to address 0x782cb0 both files have the same content. On the one hand that’s good news, but on the other I still have no idea why the files differ. Let’s see what the built-in test of unzip has to report:
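The hexdump diff boils down to finding the first offset at which the two files disagree, which is what cmp reports. A minimal Python sketch of that search follows; first_difference is a hypothetical helper name.

```python
from typing import Optional


def first_difference(old_path: str, new_path: str) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    chunk_size = 65536
    with open(old_path, "rb") as a, open(new_path, "rb") as b:
        offset = 0
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # locate the first mismatching byte within this chunk
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # one file is a prefix of the other: they diverge where the shorter ends
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files hit EOF with identical content
            offset += len(chunk_a)
```

Based on the hexdump above, running it on the two tcl archives should point at 0x782cbc, twelve bytes into the first differing line. The hex-to-decimal conversion can be checked the same way: int("782ce4", 16) yields 7875812, the file size.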
thb@thb-nb:~/Downloads$ unzip -t tcl-core-8-6-8-old.zip | grep -v OK
Archive:  tcl-core-8-6-8-old.zip
9239abdeeb9168fc149e1fa2c265dc05bccd8f0f
No errors detected in compressed data of tcl-core-8-6-8-old.zip.
thb@thb-nb:~/Downloads$ unzip -t tcl-core-8-6-8-new.zip | grep -v OK
Archive:  tcl-core-8-6-8-new.zip
1725b7469560f802f132a87abce59ae3228dd0ed
No errors detected in compressed data of tcl-core-8-6-8-new.zip.
Everything is OK (I suppressed the per-file details here using grep -v OK), and the numbers unzip prints are exactly the ones I see at the end of the files.
Looking at Wikipedia’s article about the ZIP format reveals that the difference lies in a field called comment. Great! Still, the question remains: why did it change in the first place, and when will it happen again (maybe other modules are affected as well)?
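The comment lives in the end-of-central-directory (EOCD) record that closes every ZIP file: a 4-byte signature PK\x05\x06, four 16-bit counters, the 32-bit size and offset of the central directory, a 16-bit comment length, and then the comment bytes. Read against the hexdump, 50 4b 05 06 at 0x782ca6 is that signature, and the 28 00 just before the 40 hex characters is the little-endian comment length (0x28 = 40); 0x782ca6 + 22 + 40 = 0x782ce4, the end of the file. Below is a minimal sketch that locates and decodes the record; read_eocd is a hypothetical name, and ZIP64 archives are deliberately ignored.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
# Fixed part of the EOCD record: signature, 4 x uint16, 2 x uint32, uint16 comment length
EOCD_FORMAT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def read_eocd(data: bytes) -> dict:
    """Locate the EOCD record in a ZIP image and decode its main fields."""
    # The comment is at most 65535 bytes, so the record starts within the last 22 + 65535 bytes.
    search_start = max(0, len(data) - EOCD_SIZE - 0xFFFF)
    pos = data.rfind(EOCD_SIGNATURE, search_start)
    while pos != -1:
        if pos + EOCD_SIZE <= len(data):
            (_sig, _disk, _cd_disk, _entries_disk, entries_total,
             cd_size, cd_offset, comment_len) = struct.unpack_from(EOCD_FORMAT, data, pos)
            # Accept a candidate only if its comment runs exactly to the end of the file;
            # this skips signature bytes that merely happen to appear inside a comment.
            if pos + EOCD_SIZE + comment_len == len(data):
                return {
                    "offset": pos,
                    "entries": entries_total,
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment": data[pos + EOCD_SIZE:],
                }
        pos = data.rfind(EOCD_SIGNATURE, search_start, pos)
    raise ValueError("no end-of-central-directory record found")
```

For everyday use, Python’s zipfile module exposes the same field directly as the ZipFile.comment attribute (raw bytes); the manual parse just makes the byte layout from the hexdump visible.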