June 13, 2025

Reproducible Builds (diffoscope)

diffoscope 298 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 298. This version includes the following changes:

[ Chris Lamb ]
* Handle RPM's HEADERSIGNATURES and HEADERIMMUTABLE specially to avoid
  unnecessarily large diffs. Based almost entirely on code by Daniel Duan.
  (Closes: reproducible-builds/diffoscope#410)
* Update copyright years.

You can find out more by visiting the project homepage.

13 June, 2025 12:00AM

June 12, 2025

Dirk Eddelbuettel

#50: Introducing ‘almm: Activate-Linux (based) Market Monitor’

Welcome to post 50 in the R4 series.

Today we reconnect to a previous post, namely #36 on pub/sub for live market monitoring with R and Redis. It introduced both Redis as well as the (then fairly recent) extensions to RcppRedis to support the publish-subscribe (“pub/sub”) model of Redis. In short, it manages both subscribing clients as well as producers for live, fast and lightweight data transmission. Using pub/sub is generally more efficient than the (conceptually simpler) ‘poll-sleep’ loops, as polling creates CPU and network load. Subscriptions are lighter weight because they get notified; they are also a little (but not much!) more involved as they require a callback function.
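
To make the pub/sub flow concrete, here is a minimal subscriber sketch in Python using the redis-py client (my own illustration, not from the post, which uses R and RcppRedis; the channel name is made up):

import redis

r = redis.Redis(host="localhost", port=6379)   # local Redis or Valkey server
pubsub = r.pubsub()
pubsub.subscribe("SYMBOL")                     # illustrative channel name

# A publisher elsewhere would run r.publish("SYMBOL", "123.45");
# listen() blocks until the next notification arrives.
for message in pubsub.listen():
    if message["type"] == "message":
        print("received:", message["data"])

This is the same pattern the RcppRedis extensions expose to R.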

We should mention that Redis has a recent fork in Valkey, which arose when the former committed one of these not-uncommon-among-db-companies license suicides—which, happy to say, they reversed more recently—so that we now have both the original as well as this leading fork (among others). Both work, the latter is now included in several Linux distributions, and the C library hiredis used to connect to either is still permissively licensed as well.

All this came about because Yahoo! Finance recently had another ‘hiccup’ in which they changed something, leading to some data clients having hiccups. This includes the GNOME applet Stocks Extension I had been running. There is a lively discussion on its issue #120 suggesting, for example, a curl wrapper (which then makes each access a new system call).

Separating data acquisition and presentation becomes an attractive alternative, especially given how the standard Python and R accessors to the Yahoo! Finance service continued to work (and how, per post #36, I already run data acquisition). Moreover, and somewhat independently, it occurred to me that the cute (and both funny in its pun, and very pretty in its display) ActivateLinux program might offer an easy-enough way to display updates on the desktop.

There were two aspects to address. First, the subscription side needed to be covered in either plain C or C++. That, it turns out, is very straightforward: there is existing documentation and there are prior examples (e.g. at StackOverflow), as well as the ability to have an LLM generate a quick stanza, as I did with Claude. A modified variant is now in the example repo ‘redis-pubsub-examples’ in file subscriber.c. It is deliberately minimal and the directory does not even have a Makefile: just compile and link against both libevent (for the event loop controlling this) and libhiredis (for the Redis or Valkey connection). This should work on any standard Linux (or macOS) machine with those two (very standard) libraries installed.

The second aspect was trickier. While we can get Claude to modify the program to also display under X11, it still uses a single controlling event loop. It took a little bit of probing on my end to understand how to modify (the X11 use of) ActivateLinux, but as always it was reasonably straightforward in the end: instead of one single while loop awaiting events, we now first check for pending events and deal with them if present, but otherwise do not idle and wait; instead we continue … in another loop that also checks on the Redis or Valkey “pub/sub” events. So two thumbs up to vibe coding, which clearly turned me into an X11-savvy programmer too…
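
For illustration, here is a rough Python sketch (my own, not taken from the almm sources, which are in C) of that combined loop: check for pending UI events without blocking, handle them if present, then poll the pub/sub connection with a short timeout before looping again. The UI helpers are placeholders standing in for the X11 calls.

import redis

def pending_ui_events():
    return []            # placeholder for an XPending()-style check

def handle_ui_event(event):
    pass                 # placeholder for the X11 event handling

r = redis.Redis()
pubsub = r.pubsub()
pubsub.subscribe("SYMBOL")                 # channel name is illustrative

while True:
    for event in pending_ui_events():      # drain UI events without blocking
        handle_ui_event(event)
    msg = pubsub.get_message(timeout=0.1)  # short poll on the Redis/Valkey side
    if msg and msg.get("type") == "message":
        print("update:", msg["data"])      # e.g. refresh the on-screen text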

The result is in a new (and currently fairly bare-bones) repo almm. It includes all files needed to build the application, borrowed with love from ActivateLinux (which is GPL-licensed, as is of course our minimal extension) and adds the minimal modifications we made, namely linking with libhiredis and some minimal changes to x11/x11.c. (Supporting wayland as well is on the TODO list, and I also need to release a new RcppRedis version to CRAN as one currently needs the GitHub version.)

We also made a simple mp4 video with a sound overlay which describes the components briefly:

Comments and questions welcome. I will probably add a little bit of command-line support to almm. Selecting the symbol subscribed to is currently done in the most minimal way via the environment variable SYMBOL (NB: not SYM, as the video using the default value shows). I also worked out how to show the display on only one of my multiple monitors, so I may add an explicit screen id selector too. A little bit of discussion (including minimal Docker use around r2u) is also in issue #121 where I first floated the idea of having StocksExtension listen to Redis (or Valkey). Other suggestions are most welcome; please use issue tickets at the almm repository.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

12 June, 2025 04:42PM

June 11, 2025

Gunnar Wolf

Understanding Misunderstandings - Evaluating LLMs on Networking Questions

This post is a review for Computing Reviews of Understanding Misunderstandings - Evaluating LLMs on Networking Questions, an article published in the Association for Computing Machinery (ACM) SIGCOMM Computer Communication Review.

Large Language Models have awed the world, emerging as the fastest-growing application of all time — ChatGPT reached 100 million active users in January 2023, just two months after its launch. After an initial cycle, they have gradually been mostly accepted and incorporated into various workflows, and their basic mechanics are no longer beyond the understanding of people with moderate computer literacy. Now, given the technology is better understood, we face the question of how convenient LLM chatbots are for different occupations. This article embarks on the question of how useful LLMs can be for networking applications.

This article systematizes querying three popular LLMs (GPT-3.5, GPT-4 and Claude 3) with questions taken from several network management online courses and certifications, and presents a taxonomy of six axes along which the incorrect responses were classified: Accuracy (correctness of the answers provided by LLMs), Detectability (how easily errors in the LLM output can be identified), Cause (for each incorrect answer, the underlying causes behind the error), Explainability (the quality of explanations with which the LLMs support their answers), Effects (impact of wrong answers on the users) and Stability (whether a minor change, such as the change of the order of prompts, yields vastly different answers for a single query).

The authors also measure four strategies towards improving answers: Self-correction (giving back the LLM the original question and received answer, as well as the expected correct answer, as part of the prompt), One-shot prompting (adding to the prompt, “when answering user questions, follow this example” followed by a similar correct answer), Majority voting (using the answer that most models agree upon) and Fine tuning (further train on a specific dataset to adapt the LLM to the particular task or domain). The authors noted that, while some of those strategies were marginally useful, they sometimes resulted in degraded performance.

The authors queried the commercially available instances of these models, reaching quite high results overall (89.4% for Claude 3, 88.7% for GPT-4 and 76.0% for GPT-3.5), with scores over 90% for basic subjects, but faring notably worse in topics that require understanding and converting between different numeric notations, such as working with IP addresses, even when they are trivial (e.g. presenting the subnet mask for a given network address expressed in the typical IPv4 dotted-quad representation).

As a last item in the article, the authors mentioned they also compared performance with three popular open source models (Llama3.1, Gemma2 and Mistral, with their default settings). They mention that, although those models are almost 20 times smaller than the GPT-3.5 commercial model used, they reached comparable performance levels. Sadly, the article does not delve deeper into these models, which can be deployed locally and adapted to specific scenarios.

The article is easy to read and does not require deep mathematical or AI-related knowledge. It presents a clear comparison along the described axes for the 503 multiple-choice questions presented. This article can be used as a guide for structuring similar studies over different fields.

11 June, 2025 09:58PM

Sven Hoexter

HaProxy: Two Ways of Activating PROXY Protocol

If you ever face the need to activate the PROXY Protocol in HaProxy (e.g. if you're as unlucky as I am, and you have to use the Google Cloud TCP proxy load balancer), be aware that there are two ways to do that. Both are part of the frontend configuration.

accept-proxy

This one is the big hammer and forces the usage of the PROXY protocol on all connections. Sample:

      frontend vogons
          bind *:2342 accept-proxy ssl crt /etc/haproxy/certs/vogons/tls.crt

tcp-request connection expect-proxy

If you have to, e.g. during a phase of migrations, receive traffic both directly, without the PROXY protocol header, and from a proxy with the header, there is also a more flexible option based on a tcp-request connection action. Sample:

      frontend vogons
          bind *:2342 ssl crt /etc/haproxy/certs/vogons/tls.crt
          tcp-request connection expect-proxy layer4 if { src 35.191.0.0/16 130.211.0.0/22 }

Source addresses here are those of GCP global TCP proxy frontends. Replace them with whatever suits your case. Since this happens just after establishing a TCP connection, there is barely anything else available to match on besides the source address.
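
To exercise such a frontend by hand, a client has to send the plain-text PROXY v1 header before anything else on the connection, including before the TLS handshake. The following Python sketch is my own illustration (hostname, addresses and ports are placeholders) and assumes the frontend actually expects the header from your source address, e.g. the accept-proxy variant above:

import socket
import ssl

# PROXY v1 header: protocol, source address, destination address, source port, destination port
HEADER = b"PROXY TCP4 192.0.2.10 192.0.2.1 40000 2342\r\n"

ctx = ssl.create_default_context()
ctx.check_hostname = False        # test setup only
ctx.verify_mode = ssl.CERT_NONE   # test setup only

with socket.create_connection(("haproxy.example.org", 2342)) as raw:
    raw.sendall(HEADER)           # the PROXY header goes first, unencrypted
    with ctx.wrap_socket(raw, server_hostname="haproxy.example.org") as tls:
        tls.sendall(b"hello\n")   # application payload over TLS
        print(tls.recv(1024))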

HaProxy Documentation

11 June, 2025 03:54PM

Iustin Pop

This blog finally goes git-annex!

A long, long time ago…

I have a few pictures on this blog, mostly in earlier years, because even with small pictures the git repository soon became 80MiB—this is not much in absolute terms, but the actual Markdown/Haskell/CSS/HTML total size is tiny compared to the pictures, PDFs and fonts. I realised, probably about ten years ago, that I needed a better solution and that I should investigate git-annex. Then time passed, and I heard about git-lfs, so I thought that’s the way forward.

Now, I recently got interested again in doing something about this repository, and started researching.

Detour: git-lfs

I was sure that git-lfs, being supported by large providers, would be the modern solution. But to my surprise, git-lfs is very server-centric, which in hindsight makes sense, but for a home setup it’s not very good. Maybe I misunderstood, but git-lfs is more a protocol/method for a forge to store files, rather than an end-user solution. But then you need to back up those files separately (together with the rest of the forge), or implement another way of safeguarding them.

Further details, such as the fact that it keeps two copies of the files (one in the actual checked-out tree, one in internal storage), mean it’s not a good solution. Well, for my blog it would be fine, but not in general. Then posts on Reddit about horror stories (people being locked out of GitHub due to quota, as an example, or this Stack Overflow post about git-lfs constraining how one uses git) convinced me that’s not what I want. To each their own, but not for me—I might want to push this blog’s repo to GitHub, but I definitely wouldn’t want in that case to pay for GitHub storage for my blog images (which are copies, not originals). And yes, even in 2025, those quotas are real—GitHub limits—and I agree with GitHub, storage and large bandwidth can’t be free.

Back to the future: git-annex

So back to git-annex. I thought it was going to be a simple thing, but oh boy, was I wrong. It took me half a week of continuous (well, in my free time) reading and discussions with LLMs to understand a bit how it works. I think, honestly, it’s a bit too complex, which is why the workflows page lists seven (!) levels of workflow complexity, from fully managed to fully manual. IMHO, respect to the author for the awesome tool, but if you need a web app to help you manage git, it hints that the tool is too complex.

I made the mistake of running git annex sync once, only to realise it actually starts pushing to my upstream repo and creating new branches and whatnot, so after enough reading I settled on workflow 6/7, since I don’t want another tool to manage my git history. Maybe I’m an outlier here, but everything “automatic” is a bit too much for me.

Once you do manage to understand how git-annex works (on the surface, at least), it is a pretty cool thing. It uses a git-annex git branch to store meta-information, and that is relatively clean. If you do run git annex sync, it creates some extra branches, which I don’t like, but meh.

Trick question: what is a remote?

One of the most confusing things about git-annex was understanding its “remote” concept. I thought a “remote” is a place where you replicate your data. But no, that’s a special remote. A normal remote is a git remote, but one which is expected to be reachable via git/ssh with command-line access. So if you have a git+ssh remote, git-annex will not only try to push its above-mentioned branch, but also copy the files. If such a remote is on a forge that doesn’t support git-annex, then it will complain and get confused.

Of course, if you read the extensive docs, you just do git config remote.<name>.annex-ignore true, and it will understand that it should not “sync” to it.

But, aside from this case, git-annex expects that all checkouts and clones of the repository hold both metadata and data. And if you do any annex commands in them, all other clones will know about them! This can be unexpected, and you find people complaining about it, but nowadays there’s a solution:

git clone … dir && cd dir
git config annex.private true
git annex init "temp copy"

This is important. Any “leaf” git clone must be followed by that annex.private true config, especially on CI/CD machines. Honestly, I don’t understand why by default clones should be official data stores, but it is what it is.

I settled on not making any of my checkouts “stable”, but only the actual storage places. Except those are not git repositories, but just git-annex storage things. I.e., special remotes.

Is it confusing enough yet ? 😄

Special remotes

The special remotes, as said, are what I expected to be the normal git-annex remotes, i.e. places where the data is stored. But well, they exist, and while I’m only using a couple of simple ones, there is a large number of them. Among the interesting ones: git-lfs, a remote that also allows storing the git repository itself (git-remote-annex), although I’m a bit confused about this one, and most of the common storage providers via the rclone remote.

Plus, all of the special remotes support encryption, so this is a really neat way to store your files across a large number of things, and handle replication, number of copies, from which copy to retrieve, etc. as you wish.

And many other features

git-annex has tons of other features, so to some extent the sky’s the limit: automatic selection of what to add to the annex vs plain git, encryption handling, number of copies, clusters, computed files, etc. etc. etc. I still think it’s cool but too complex, though!

Uses

Aside from my blog post, of course.

I’ve seen blog posts/comments about people using git-annex to track/store their photo collection, and I could see very well how the remote encrypted repos (any of the services supported by rclone) could be an N+2 copy or so. For me, tracking photos would be a bit too tedious, but it could maybe work after more research.

A more practical thing would probably be replicating my local movie collection (all legal, to be clear) better than “just run rsync from time to time” and tracking the large files in it via git-annex. That’s an exercise for another day, though, once I get more mileage with it - my blog pictures are copies, so I don’t care much if they get lost, but movies are primary online copies, and I don’t want to re-dump the discs. Anyway, for later.

Migrating to git-annex

Migrating here means ending in a state where all large files are in git-annex, and the plain git repo is small. Just moving the files to git-annex at the current head doesn’t remove them from history, so your git repository is still large; it won’t grow in the future, but remains at its old size (and contains the large files in its history).

In my mind, a nice migration would be: run a custom command, and all the history is migrated to git-annex, so I can go back in time and still use git-annex. I naïvely expected this would be easy and already available, only to find comments on the git-annex site with unsure git-filter-branch calls and some web discussions. This is the discussion on the git-annex website, but it didn’t make me confident it would do the right thing.

But that discussion is now 8 years old. Surely in 2025, with git-filter-repo, it’s easier? And, maybe I’m missing something, but it is not. Not from the point of view of plain git (that part is easy), but because git-annex stores its data in git itself, so doing this properly across successive steps of a repo (when replaying the commits) is, I think, not well-defined behaviour.

So I was stuck here for a few days, until I got an epiphany: As I’m going to rewrite the repository, of course I’m keeping a copy of it from before git-annex. If so, I don’t need the history, back in time, to be correct in the sense of being able to retrieve the binary files too. It just needs to be correct from the point of view of the actual Markdown and Haskell files that represent the “meat” of the blog.

This simplified the problem a lot. At first, I wanted to just skip these files, but this could also drop commits (git-filter-repo, by default, drops commits if they’re empty), and removing the files loses information - when they were added, what the paths were, etc. So instead I came up with a rather clever idea, if I might say so: since git-annex replaces files with symlinks already, just replace the files with symlinks in the whole history, except that these symlinks are dangling (to represent the fact that the files are missing). One could also use empty files, but empty files are more “valid” in a sense than dangling symlinks, hence why I settled on the latter.

Doing this with git-filter-repo is easy, in newer versions, with the new --file-info-callback. Here is the simple code I used:

# This is the body of a git-filter-repo --file-info-callback: for every file
# in every commit, the callback receives `filename`, `mode`, `blob_id` and
# `value`, and returns a (filename, mode, blob_id) tuple.
import os
import os.path
import pathlib

SKIP_EXTENSIONS = {'jpg', 'jpeg', 'png', 'pdf', 'woff', 'woff2'}
FILE_MODES = {b"100644", b"100755"}
SYMLINK_MODE = b"120000"

fas_string = filename.decode()
path = pathlib.PurePosixPath(fas_string)
ext = path.suffix.removeprefix('.')

# Keep anything that is not one of the large binary types...
if ext not in SKIP_EXTENSIONS:
  return (filename, mode, blob_id)

# ...and anything that is not a regular file (e.g. existing symlinks).
if mode not in FILE_MODES:
  return (filename, mode, blob_id)

print(f"Replacing '{filename}' (extension '.{ext}') in {os.getcwd()}")

# Replace the blob with a dangling symlink pointing at an explanatory path.
symlink_target = '/none/binary-file-removed-from-git-history'.encode()
new_blob_id = value.insert_file_with_contents(symlink_target)
return (filename, SYMLINK_MODE, new_blob_id)

This goes and replaces files with a symlink to nowhere, but the symlink should explain why it’s dangling. Then later renames or moving the files around work “naturally”, as the rename/mv doesn’t care about file contents. Then, when the filtering is done via:

git-filter-repo --file-info-callback <(cat ~/filter-big.py ) --force

It is easy to onboard to git annex:

  • remove all dangling symlinks
  • copy the (binary) files from the original repository
  • since they’re named the same, and in the same places, git sees a type change
  • then simply run git annex add on those files

For me it was easy as all such files were in a few directories, so just copying those directories back, a few git-annex add commands, and done.
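
For reference, a small Python sketch (my own, not from the original workflow) of the first step, deleting dangling symlinks, could look like this; it walks the working tree, skips .git, and removes symlinks whose target no longer exists:

import os

def remove_dangling_symlinks(root):
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]   # skip git metadata
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                print("removing dangling symlink", path)
                os.unlink(path)

remove_dangling_symlinks(".")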

Of course, then adding a few rsync remotes, git annex copy --to, and the repository was ready.

Well, I also found a bug in my own Hakyll setup: on a fresh clone, when the large files are just dangling symlinks, the builder doesn’t complain, just ignores the images. Will have to fix.

Other resources

This is a blog that I read at the beginning, and I found it very useful as an intro: https://switowski.com/blog/git-annex/. It didn’t help me understand how it works under the covers, but it is well written. The author does use the ‘sync’ command though, which is too magic for me, but also agrees about its complexity 😅

The proof is in the pudding

And now, for the actual first image to be added that never lived in the old plain git repository. It’s not full-res/full-size, it’s cropped a bit on the bottom.

Earlier in the year, I went to Paris for a very brief work trip, and I walked around a bit—it was more beautiful than what I remembered from way way back. So a bit random selection of a picture, but here it is:

Un bateau sur la Seine (a boat on the Seine)

Enjoy!

11 June, 2025 02:41PM

John Goerzen

I Learned We All Have Linux Seats, and I’m Not Entirely Pleased

I recently wrote about How to Use SSH with FIDO2/U2F Security Keys, which I now use on almost all of my machines.

The last one that needed this was my Raspberry Pi hooked up to my DEC vt510 terminal and IBM mechanical keyboard. Yes I do still use that setup!

To my surprise, generating a key on it failed. I very quickly saw that /dev/hidraw0 had incorrect permissions, accessible only to root.

On other machines, it looks like this:

crw-rw----+ 1 root root 243, 16 May 24 16:47 /dev/hidraw16

And, if I run getfacl on it, I see:

# file: dev/hidraw16
# owner: root
# group: root
user::rw-
user:jgoerzen:rw-
group::---
mask::rw-
other::---

Yes, something was setting an ACL on it. Thus began the saga to figure out what was doing that.

Firing up inotifywatch, I saw it was systemd-udevd or its udev-worker. But cranking up logging on that to maximum only showed me that uaccess was somehow doing this.

I started digging. uaccess turned out to be almost entirely undocumented. People say to use it, but there’s no description of what it does or how. Its purpose appears to be to grant device access to users logged in to a machine, by dynamically adding them to the devices’ ACLs. OK, that’s a nice goal, but why was machine A doing this and not machine B?

I dug some more. I came across a hint that uaccess may only do that for a “seat”. A seat? I’ve not heard of that in Linux before.

Turns out there’s some information (older and newer) about this out there. Sure enough, on the machine with KDE, loginctl list-sessions shows me on seat0, but on the machine where I log in from ttyUSB0, it shows an empty seat.

But how to make myself part of the seat? I tried various udev rules to add the “seat” or “master-of-seat” tags, but nothing made any difference.

I finally gave up and did the old-fashioned rule to just make it work already:

TAG=="security-device",SUBSYSTEM=="hidraw",GROUP="mygroup"

I still don’t know how to teach logind to add a seat for ttyUSB0, but oh well. At least I learned something. An annoying something, but hey.

This all had a laudable goal, but when there are so many layers of indirection, poorly documented, with poor logging, it gets pretty annoying.

11 June, 2025 02:12PM by John Goerzen

Scarlett Gately Moore

KDE Application snaps 25.04.2 released!

KDE Mascot

Release notes: https://kde.org/announcements/gear/25.04.2/

Now available in the snap store!

Along with that, I have fixed some outstanding bugs:

Ark: can now open/save files on removable media

Kasts: Once again has sound

WIP: Updating Qt6 to 6.9 and frameworks to 6.14

Enjoy everyone!

Unlike our software, life is not free. Please consider a donation, thanks!

11 June, 2025 01:14PM by sgmoore

Freexian Collaborators

Monthly report about Debian Long Term Support, May 2025 (by Roberto C. Sánchez)

Like each month, have a look at the work funded by Freexian’s Debian LTS offering.

Debian LTS contributors

In May, 22 contributors have been paid to work on Debian LTS; their reports are available:

  • Abhijith PA did 8.0h (out of 0.0h assigned and 8.0h from previous period).
  • Adrian Bunk did 26.0h (out of 26.0h assigned).
  • Andreas Henriksson did 1.0h (out of 15.0h assigned and 3.0h from previous period), thus carrying over 17.0h to the next month.
  • Andrej Shadura did 3.0h (out of 10.0h assigned), thus carrying over 7.0h to the next month.
  • Bastien Roucariès did 20.0h (out of 20.0h assigned).
  • Ben Hutchings did 8.0h (out of 20.0h assigned and 4.0h from previous period), thus carrying over 16.0h to the next month.
  • Carlos Henrique Lima Melara did 12.0h (out of 11.0h assigned and 1.0h from previous period).
  • Chris Lamb did 15.5h (out of 0.0h assigned and 15.5h from previous period).
  • Daniel Leidert did 25.0h (out of 26.0h assigned), thus carrying over 1.0h to the next month.
  • Emilio Pozuelo Monfort did 21.0h (out of 16.75h assigned and 11.0h from previous period), thus carrying over 6.75h to the next month.
  • Guilhem Moulin did 11.5h (out of 8.5h assigned and 6.5h from previous period), thus carrying over 3.5h to the next month.
  • Jochen Sprickerhof did 3.5h (out of 8.75h assigned and 17.5h from previous period), thus carrying over 22.75h to the next month.
  • Lee Garrett did 26.0h (out of 12.75h assigned and 13.25h from previous period).
  • Lucas Kanashiro did 20.0h (out of 18.0h assigned and 2.0h from previous period).
  • Markus Koschany did 20.0h (out of 26.25h assigned), thus carrying over 6.25h to the next month.
  • Roberto C. Sánchez did 20.75h (out of 24.0h assigned), thus carrying over 3.25h to the next month.
  • Santiago Ruano Rincón did 15.0h (out of 12.5h assigned and 2.5h from previous period).
  • Sean Whitton did 6.25h (out of 6.0h assigned and 2.0h from previous period), thus carrying over 1.75h to the next month.
  • Sylvain Beucler did 26.25h (out of 26.25h assigned).
  • Thorsten Alteholz did 15.0h (out of 15.0h assigned).
  • Tobias Frost did 12.0h (out of 12.0h assigned).
  • Utkarsh Gupta did 1.0h (out of 15.0h assigned), thus carrying over 14.0h to the next month.

Evolution of the situation

In May, we released 54 DLAs.

The LTS Team was particularly active in May, publishing a higher than normal number of advisories, as well as helping with a wide range of updates to packages in stable and unstable, plus some other interesting work. We are also pleased to welcome several updates from contributors outside the regular team.

  • Notable security updates:
    • containerd, prepared by Andreas Henriksson, fixes a vulnerability that could cause containers launched as non-root users to be run as root
    • libapache2-mod-auth-openidc, prepared by Moritz Schlarb, fixes a vulnerability which could allow an attacker to crash an Apache web server with libapache2-mod-auth-openidc installed
    • request-tracker4, prepared by Andrew Ruthven, fixes multiple vulnerabilities which could result in information disclosure, cross-site scripting and use of weak encryption for S/MIME emails
    • postgresql-13, prepared by Bastien Roucariès, fixes an application crash vulnerability that could affect the server or applications using libpq
    • dropbear, prepared by Guilhem Moulin, fixes a vulnerability which could potentially result in execution of arbitrary shell commands
    • openjdk-17, openjdk-11, prepared by Thorsten Glaser, fixes several vulnerabilities, which include denial of service, information disclosure or bypass of sandbox restrictions
    • glibc, prepared by Sean Whitton, fixes a privilege escalation vulnerability
  • Notable non-security updates:
    • wireless-regdb, prepared by Ben Hutchings, updates information reflecting changes to radio regulations in many countries

This month’s contributions from outside the regular team include the libapache2-mod-auth-openidc update mentioned above, prepared by Moritz Schlarb (the maintainer of the package); the update of request-tracker4, prepared by Andrew Ruthven (the maintainer of the package); and the updates of openjdk-17 and openjdk-11, also noted above, prepared by Thorsten Glaser.

Additionally, LTS Team members contributed stable updates of the following packages:

  • rubygems and yelp/yelp-xsl, prepared by Lucas Kanashiro
  • simplesamlphp, prepared by Tobias Frost
  • libbson-xs-perl, prepared by Roberto C. Sánchez
  • fossil, prepared by Sylvain Beucler
  • setuptools and mydumper, prepared by Lee Garrett
  • redis and webpy, prepared by Adrian Bunk
  • xrdp, prepared by Abhijith PA
  • tcpdf, prepared by Santiago Ruano Rincón
  • kmail-account-wizard, prepared by Thorsten Alteholz

Other contributions were also made by LTS Team members to packages in unstable:

  • proftpd-dfsg DEP-8 tests (autopkgtests) were provided to the maintainer, prepared by Lucas Kanashiro
  • a regular upload of libsoup2.4, prepared by Sean Whitton
  • a regular upload of setuptools, prepared by Lee Garrett

Freexian, the entity behind the management of the Debian LTS project, has been working for some time now on the development of an advanced CI platform for Debian-based distributions, called Debusine. Recently, Debusine has reached a level of feature implementation that makes it very usable. Some members of the LTS Team have been using Debusine informally, and during May LTS coordinator Santiago Ruano Rincón has made a call for the team to help with testing of Debusine, and to help evaluate its suitability for the LTS Team to eventually begin using as the primary mechanism for uploading packages into Debian. Team members who have started using Debusine are providing valuable feedback to the Debusine development team, thus helping to improve the platform for all users. Actually, a number of updates, for both bullseye and bookworm, made during the month of May were handled using Debusine, e.g. rubygems’s DLA-4163-1.

By the way, if you are a Debian Developer, you can easily test Debusine following the instructions found at https://wiki.debian.org/DebusineDebianNet.

DebConf, the annual Debian Conference, is coming up in July and, as is customary each year, the week preceding the conference will feature an event called DebCamp. The DebCamp week provides an opportunity for teams and other interested groups/individuals to meet together in person in the same venue as the conference itself, with the purpose of doing focused work, often called “sprints”. LTS coordinator Roberto C. Sánchez has announced that the LTS Team is planning to hold a sprint primarily focused on the Debian security tracker and the associated tooling used by the LTS Team and the Debian Security Team.

Thanks to our sponsors

Sponsors that joined recently are in bold.

11 June, 2025 12:00AM by Roberto C. Sánchez

Debian Contributions: Updated Austin, DebConf 25 preparations continue and more! (by Anupa Ann Joseph)

Debian Contributions: 2025-05

Contributing to Debian is part of Freexian’s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Updated Austin, by Colin Watson and Helmut Grohne

Austin is a frame stack sampling profiler for Python. It allows profiling Python applications without instrumenting them, while losing some accuracy in the process, and is the only one of its kind presently packaged for Debian. Unfortunately, it hadn’t been uploaded in a while and hence the last Python version it worked with was 3.8. We updated it to a current version and also dealt with a number of architecture-specific problems (such as unintended sign promotion, 64-bit time_t fallout and strictness due to -Wformat-security) in cooperation with upstream. With luck, it will migrate in time for trixie.

Preparing for DebConf 25, by Stefano Rivera and Santiago Ruano Rincón

DebConf 25 is quickly approaching, and the organization work doesn’t stop. In May, Stefano continued supporting the different teams. Just to give a couple of examples, Stefano made changes to the DebConf 25 website to make BoF and sprint submissions public, so interested people can already know whether a BoF or sprint for a given subject is planned, allowing coordination with the proposer, and to enhance how statistics are made public to help the work of the local team.

Santiago has participated in different tasks, including the logistics of the conference, like preparing more information about the public transportation that will be available. Santiago has also taken part in activities related to fundraising and reviewing more event proposals.

Miscellaneous contributions

  • Lucas fixed security issues in Valkey in unstable.
  • Lucas tried to help with the update of Redis to version 8 in unstable. The package hadn’t been updated for a while due to licensing issues, but now upstream maintainers fixed them.
  • Lucas uploaded around 20 ruby-* packages to unstable that hadn’t been updated for some years, to make their builds reproducible. Thanks to the Reproducible Builds folks for pointing out those issues. Some unblock requests (and follow-ups) were also needed to make them reach trixie in time for the release.
  • Lucas is organizing a Debian Outreach session for DebConf 25, reaching out to all interns of the Google Summer of Code and Outreachy programs from the last year. The session will be presented by in-person interns and will also include video recordings from interns who were interested in participating but could not attend the conference.
  • Lucas continuously works on DebConf Content team tasks: replying to speakers and sponsors, and communicating internally with the team.
  • Carles improved po-debconf-manager: fixed bugs reported by the Catalan translator, added the possibility to import packages out of Salsa, added support for using non-default project branches on Salsa, and polished it to get ready for DebCamp.
  • Carles tested the new “apt” in trixie and reported bugs against “apt”, “installation-report” and “libqt6widget6”.
  • Carles used po-debconf-manager and imported remaining 80 packages, reviewed 20 translations, submitted (MR or bugs) 54 translations.
  • Carles prepared some topics for translation BoF in DebConf (gathered feedback, first pass on topics).
  • Helmut gave an introductory talk about the mechanics of Linux namespaces at MiniDebConf Hamburg.
  • Helmut sent 25 patches for cross compilation failures.
  • Helmut reviewed, refined and applied a patch from Jochen Sprickerhof to make the Multi-Arch hinter emit more hints for pure Python modules.
  • Helmut sat down with Christoph Berg (not affiliated with Freexian) and extended unschroot to support directory-based chroots with overlayfs. This is a feature that was lost in transitioning from sbuild’s schroot backend to its unshare backend. unschroot implements the schroot API just enough to be usable with sbuild and otherwise works a lot like the unshare backend. As a result, apt.postgresql.org now performs its builds contained in a user namespace.
  • Helmut looked into a fair number of rebootstrap failures most of which related to musl or gcc-15 and imported patches or workarounds to make those builds proceed.
  • Helmut updated dumat to use sqop, fixing earlier PGP verification problems, thanks to Justus Winter and Neal Walfield explaining a lot of Sequoia at MiniDebConf Hamburg.
  • Helmut got the previous zutils update for /usr-move wrong again and had to send another update.
  • Helmut looked into why debvm’s autopkgtests were flaky and with lots of help from Paul Gevers and Michael Tokarev tracked it down to a race condition in qemu. He updated debvm to trigger the problem less often and also fixed a wrong dependency using Luca Boccassi’s patch.
  • Santiago continued the switch to sbuild for Salsa CI (that was stopped for some months), and has been mainly testing linux, since it’s a complex project that heavily customizes the pipeline. Santiago is preparing the changes for linux to submit a MR soon.
  • In openssh, Colin tracked down some intermittent sshd crashes to a root cause, and issued bookworm and bullseye updates for CVE-2025-32728.
  • Colin spent some time fixing up fail2ban, mainly reverting a patch that caused its tests to fail and would have banned legitimate users in some common cases.
  • Colin backported upstream fixes for CVE-2025-48383 (django-select2) and CVE-2025-47287 (python-tornado) to unstable.
  • Stefano supported video streaming and recording for 2 miniDebConfs in May: Maceió and Hamburg. These had overlapping streams for one day, which is a first for us.
  • Stefano packaged the new version of python-virtualenv that includes our patches for not including the wheel for wheel.
  • Stefano got all involved parties to agree (in principle) to meet at DebConf for a mediated discussion on a dispute that was brought to the technical committee.
  • Anupa coordinated the swag purchase for DebConf 25 with Juliana and Nattie.
  • Anupa joined the publicity team meeting for discussing the upcoming events and BoF at DebConf 25.
  • Anupa worked with the publicity team to publish Bits post to welcome GSoc 2025 Interns.

11 June, 2025 12:00AM by Anupa Ann Joseph

June 08, 2025

Thorsten Alteholz

My Debian Activities in May 2025

Debian LTS

This was my hundred-thirty-first month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on:

  • [DLA 4168-1] openafs security update fixing three CVEs related to theft of credentials, crashes or buffer overflows.
  • [DLA 4196-1] kmail-account-wizard security update to fix one CVE related to a man-in-the-middle attack when using http instead of https to get some configuration.
  • [DLA 4198-1] espeak-ng security update to fix five CVEs related to buffer overflow or underflow in several functions and a floating point exception. Thanks to Samuel Thibault for having a look at my debdiff.
  • [#1106867] created Bookworm pu-bug for kmail-account-wizard. Thanks to Patrick Franz for having a look at my debdiff.

I also continued my work on libxmltok and suricata. This month I also had to do some support on seger, for example to inject packages newly needed for builds.

Debian ELTS

This month was the eighty-second ELTS month. During my allocated time I uploaded or worked on:

  • [ELA-1444-1] kmail-account-wizard security update to fix two CVEs in Buster. One is related to a man-in-the-middle attack when using http instead of https to get some configuration; the other issue is about a misleading UI, in which the state of encryption is shown wrongly.
  • [ELA-1445-1] espeak-ng security update to fix five CVEs in Stretch and Buster. The issues are related to buffer overflow or underflow in several functions and a floating point exception.

All packages I worked on have been on the list of longstanding packages. For example, espeak-ng has been on this list for more than nine months. I now understand that there is a reason why packages end up on this list: some parts of the software have been almost completely reworked, so the patches need a “reverse” rework. For some packages this is easy, but for others this rework needs quite some time. I also continued to work on libxmltok and suricata.

Debian Printing

Unfortunately I didn’t find any time to work on this topic.

Debian Astro

This month I uploaded bugfix versions of:

Debian Mobcom

This month I uploaded bugfix versions of:

misc

This month I uploaded bugfix versions of:

Thanks a lot to the Release Team who quickly handled all my unblock bugs!

FTP master

It is that time of the year when just a few packages arrive in NEW: it is Hard Freeze. So I enjoy this period and basically just take care of kernels and other important packages. As people seem to be more interested in discussions than in fixing RC bugs, my period of rest seems set to continue for a while. So thanks for all these valuable discussions, and really thanks to the few people who still take care of Trixie. This month I accepted 146 and rejected 10 packages. The overall number of packages that got accepted was 147.

08 June, 2025 05:48PM by alteholz

Colin Watson

Free software activity in May 2025

My Debian contributions this month were all sponsored by Freexian. Things were a bit quieter than usual, as for the most part I was sticking to things that seemed urgent for the upcoming trixie release.

You can also support my work directly via Liberapay or GitHub Sponsors.

OpenSSH

After my appeal for help last month to debug intermittent sshd crashes, Michel Casabona helped me put together an environment where I could reproduce it, which allowed me to track it down to a root cause and fix it. (I also found a misuse of strlcpy affecting at least glibc-based systems in passing, though I think that was unrelated.)

I worked with Daniel Kahn Gillmor to fix a regression in ssh-agent socket handling.

I fixed a reproducibility bug depending on whether passwd is installed on the build system, which would have affected security updates during the lifetime of trixie.

I backported openssh 1:10.0p1-5 to bookworm-backports.

I issued bookworm and bullseye updates for CVE-2025-32728.

groff

I backported a fix for incorrect output when formatting multiple documents as PDF/PostScript at once.

debmirror

I added a simple autopkgtest.

Python team

I upgraded these packages to new upstream versions:

  • automat
  • celery
  • flufl.i18n
  • flufl.lock
  • frozenlist
  • python-charset-normalizer
  • python-evalidate (including pointing out an upstream release handling issue)
  • python-pythonjsonlogger
  • python-setproctitle
  • python-telethon
  • python-typing-inspection
  • python-webargs
  • pyzmq
  • trove-classifiers (including a small upstream cleanup)
  • uncertainties
  • zope.testrunner

In bookworm-backports, I updated these packages:

  • python-django to 3:4.2.21-1 (issuing BSA-124)
  • python-django-pgtrigger to 4.14.0-1

I fixed problems building these packages reproducibly:

I backported fixes for some security vulnerabilities to unstable (since we’re in freeze now so it’s not always appropriate to upgrade to new upstream versions):

I fixed various other build/test failures:

I added non-superficial autopkgtests to these packages:

I packaged python-django-hashids and python-django-pgbulk, needed for new upstream versions of python-django-pgtrigger.

I ported storm to Python 3.14.

Science team

I fixed a build failure in apertium-oci-fra.

08 June, 2025 12:20AM by Colin Watson

June 07, 2025

Evgeni Golov

show your desk - 2025 edition

Back in 2020 I posted about my desk setup at home.

Recently someone in our #remotees channel at work asked about WFH setups and given quite a few things changed in mine, I thought it's time to post an update.

But first, a picture! [Image: standing desk with a monitor, laptop, etc.] (Yes, it's cleaner than usual, how could you tell?!)

desk

It's still the same Flexispot E5B, no change here. After 7 years (I bought mine in 2018) it still works fine. If I had to buy a new one, I'd probably get a four-legged one for more stability (they have gotten quite affordable now), but there is no immediate need for that.

chair

It's still the IKEA Volmar. Again, no complaints here.

hardware

Now here we finally have some updates!

laptop

A Lenovo ThinkPad X1 Carbon Gen 12, Intel Core Ultra 7 165U, 32GB RAM, running Fedora (42 at the moment).

It's connected to a Lenovo ThinkPad Thunderbolt 4 Dock. It just works™.

workstation

It's still the P410, but mostly unused these days.

monitor

An AOC U2790PQU 27" 4K. I'm running it at 150% scaling, which works quite decently these days (no comparison to when I got it).

speakers

As the new monitor didn't want to take the old Dell soundbar, I have upgraded to a pair of Alesis M1Active 330 USB.

They sound good and were not too expensive.

I had to fix the volume control after some time though.

webcam

It's still the Logitech C920 Pro.

microphone

The built-in mic of the C920 is really fine, but to do conference-grade talks (and some podcasts 😅), I decided to get something better.

I got a FIFINE K669B, with a nice arm.

It's not a Shure, for sure, but does the job well and Christian was quite satisfied with the results when we recorded the Debian and Foreman specials of Focus on Linux.

keyboard

It's still the ThinkPad Compact USB Keyboard with TrackPoint.

I had to print a few fixes and replacement parts for it, but otherwise it's doing great.

Seems Lenovo stopped making those, so I really shouldn't break it any further.

mouse

Logitech MX Master 3S. The surface of the old MX Master 2 got very sticky at some point and it had to be replaced.

other

notepad

I'm still terrible at remembering things, so I still write them down in an A5 notepad.

whiteboard

I've also added a (small) whiteboard on the wall right of the desk, mostly used for long term todo lists.

coaster

Turns out Xeon-based coasters are super stable, so it lives on!

yubikey

Yepp, still a thing. Still USB-A because... reasons.

headphones

Still the Bose QC25, by now on the third set of ear cushions, but otherwise working great and the odd 15€ cushion replacement does not justify buying anything newer (which would have the same problem after some time, I guess).

I did add a cheap (~10€) Bluetooth-to-Headphonejack dongle, so I can use them with my phone too (shakes fist at modern phones).

And I do use the headphones more in meetings, as the Alesis speakers fill the room more with sound and thus sometimes produce a bit of an echo.

charger

The Bose need AAA batteries, and so do some other gadgets in the house, so there is a technoline BC 700 charger for AA and AAA on my desk these days.

light

Yepp, I've added an IKEA Tertial and an ALDI "face" light. No, I don't use them much.

KVM switch

I've "built" a KVM switch out of an USB switch, but given I don't use the workstation that often these days, the switch is also mostly unused.

07 June, 2025 03:17PM by evgeni

June 06, 2025

Reproducible Builds

Reproducible Builds in May 2025

Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we’ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website.

In this report:

  1. Security audit of Reproducible Builds tools published
  2. When good pseudorandom numbers go bad
  3. Academic articles
  4. Distribution work
  5. diffoscope and disorderfs
  6. Website updates
  7. Reproducibility testing framework
  8. Upstream patches

Security audit of Reproducible Builds tools published

The Open Technology Fund’s (OTF) security partner Security Research Labs recently conducted an audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a “whitebox” audit, is a form of testing in which auditors have complete knowledge of the item being tested. The auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denials of service.

The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to test reproducibility.

OTF’s announcement contains more of an overview of the audit, and the full 24-page report is available in PDF form as well.


When good pseudorandom numbers go bad

Danielle Navarro published an interesting and amusing article on their blog on When good pseudorandom numbers go bad. Danielle sets the stage as follows:

[Colleagues] approached me to talk about a reproducibility issue they’d been having with some R code. They’d been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using set.seed() to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren’t “just a little bit different” in the way that we’ve all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducible different. Somewhere, somehow, something broke.

Thanks to David Wheeler for posting about this article on our mailing list.


Academic articles

There were two scholarly articles published this month that related to reproducibility:

Daniel Hugenroth and Alastair R. Beresford of the University of Cambridge in the United Kingdom and Mario Lins and René Mayrhofer of Johannes Kepler University in Linz, Austria published an article titled Attestable builds: compiling verifiable binaries on untrusted systems using trusted execution environments. In their paper, they:

present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.

The authors compare “attestable builds” with reproducible builds by noting an attestable build requires “only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it”, and proceed by determining that “the overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time”.


Timo Pohl, Pavel Novák, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:

However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.

Ultimately, the three authors find that the literature is “sparse”, focusing on few individual problems and ecosystems, and therefore identify space for more critical research.


Distribution work

In Debian this month:


Hans-Christoph Steiner of the F-Droid catalogue of open source applications for the Android platform published a blog post on Making reproducible builds visible. Noting that “Reproducible builds are essential in order to have trustworthy software”, Hans also mentions that “F-Droid has been delivering reproducible builds since 2015”. However:

There is now a “Reproducibility Status” link for each app on f-droid.org, listed on every app’s page. Our verification server shows ✔️️ or 💔 based on its build results, where ✔️️ means our rebuilder reproduced the same APK file and 💔 means it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays a ✅ for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects about which processes were run.

Hans compares the approach with projects such as Arch Linux and Debian that “provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.”


Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.


In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.


Lastly in Fedora news, Jelle van der Waa opened issues tracking reproducible issues in Haskell documentation, Qt6 recording the host kernel and R packages recording the current date. The R packages can be made reproducible with packaging changes in Fedora.


diffoscope & disorderfs

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:

  • Don’t rely on zipdetails’ --walk argument being available, and only add that argument on newer versions after we test for that. []
  • Review and merge support for NuGet packages from Omair Majid. []
  • Update copyright years. []
  • Merge support for an lzma comparator from Will Hollywood. [][]

Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues []. This was then uploaded to Debian as version 0.6.0-1.

Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 [][] and 297 [][], and disorderfs to version 0.6.0 [][].


Website updates

Once again, there were a number of improvements made to our website this month including:


Reproducibility testing framework

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.

However, Holger Levsen posted to our mailing list this month in order to bring a wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned on OSL’s public post, “recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year”. As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Nevertheless, Lance Albertson of OSL posted an update to the funding situation later in the month with broadly positive news.


Separate to this, there were various changes to the Jenkins setup this month, which is used as the backend driver for both tests.reproducible-builds.org and reproduce.debian.net, including:

  • Migrating the central jenkins.debian.net server from AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
  • After testing it for almost ten years, the i386 architecture has been dropped from tests.reproducible-builds.org. This is because, with the upcoming release of Debian trixie, i386 is no longer supported as a ‘regular’ architecture — there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
  • Another node, ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS) has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
  • Lastly, we have been granted access to more riscv64 architecture boards, so now we have seven such nodes, all with 16GB memory and 4 cores that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.


Outside of this, a number of smaller changes were also made by Holger Levsen:

  • reproduce.debian.net-related:

    • Only use two workers for the ppc64el architecture due to RAM size. []
    • Monitor nginx_request and nginx_status with the Munin monitoring system. [][]
    • Detect various variants of network and memory errors. [][][][]
    • Add a prominent link to reproducible-builds.org. []
    • Add a rebuilderd-cache-cleanup.service and run it daily via timer. [][][][][]
    • Be more verbose about what sources are being downloaded. []
    • Correctly deal with packages with an epoch in their version [] and deal with binNMU versions with an epoch as well [][].
    • Document how to reschedule all other errors on all archs. []
    • Misc documentation improvements. [][][][]
    • Include the $HOSTNAME variable in the rebuilderd logfiles. []
    • Install the equivs package on all worker nodes. [][]
  • Jenkins nodes:

    • Permit the sudo tool to fix up permission issues. [][]
    • Document how to manage diskspace with OpenStack. []
    • Ignore a number of spurious monitoring errors on riscv64, FreeBSD, etc. [][][][]
    • Install ntpsec-ntpdate (instead of ntpdate) as the former is available on Debian trixie and bookworm. [][]
    • Use the same SSH ControlPath for all nodes. []
    • Make sure the munin user uses the same SSH config as the jenkins user. []
  • tests.reproducible-builds.org-related:

    • Disable testing of the i386 architecture. [][][][][]
    • Document the current disk usage. [][]
    • Address some image placement now that we only test three architectures. []
    • Keep track of build performance. []
  • Misc:

    • Fix a (harmless) typo in the multiarch_versionskew script. []

In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:

  • Add out of memory detection to the statistics page. []
  • Reverse the sorting order on the statistics page. [][][][]
  • Improve the spacing between statistics groups. []
    • Update a (hard-coded) line number used in detecting debrebuild error messages. []
    • Support Debian unstable in the rebuilder-debian.sh script. []
  • Rely on rebuildctl to sync only ‘arch-specific’ packages. [][]


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:



Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:

06 June, 2025 09:17PM

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

#49: The Two Cultures of Deploying Statistical Software

Welcome to post 49 in the R4 series.

The Two Cultures is a term first used by C.P. Snow in a 1959 speech and monograph focused on the split between humanities and the sciences. Decades later, the term was (quite famously) re-used by Leo Breiman in a (somewhat prophetic) 2001 article about the split between ‘data models’ and ‘algorithmic models’. In this note, we argue that statistical computing practice and deployment can also be described via this Two Cultures moniker.

Referring to the term linking these foundational pieces is of course headline bait. Yet when preparing for the discussion of r2u in the invited talk in Mons (video, slides), it occurred to me that there is in fact a wide gulf between two alternative approaches of using R and, specifically, deploying packages.

On the one hand we have the approach described by my friend Jeff as “you go to the Apple store, buy the nicest machine you can afford, install what you need and then never ever touch it”. A computer / workstation / laptop is seen as an immutable object where every attempt at change may lead to breakage, instability, and general chaos—and is hence best avoided. If you know Jeff, you know he exaggerates. Maybe only slightly though.

Similarly, an entire sub-culture of users striving for “reproducibility” (and sometimes also “replicability”) does the same. This is for example evidenced by the popularity of package renv by Rcpp collaborator and pal Kevin. The expressed hope is that by nailing down a (sub)set of packages, outcomes are constrained to be unchanged. Hope springs eternal, clearly. (Personally, if need be, I do the same with Docker containers and their respective Dockerfile.)

On the other hand, ‘rolling’ is a fundamentally different approach. One (well known) example is Google building “everything at @HEAD”. The entire (ginormous) code base is considered a mono-repo which at any point in time is expected to be buildable as is. All changes made are pre-tested to be free of side effects on other parts. This sounds hard, and likely is more involved than the alternative ‘whatever works’ approach of independent changes and just hoping for the best.

Another example is a rolling (Linux) distribution such as Debian. Changes are first committed to a ‘staging’ place (Debian calls this the ‘unstable’ distribution) and, if no side effects are seen, propagated after a fixed number of days to the rolling distribution (called ‘testing’). With this mechanism, ‘testing’ should always be installable too. And based on the rolling distribution, at certain times (for Debian roughly every two years) a release is made from ‘testing’ into ‘stable’ (following more elaborate testing). The released ‘stable’ version is then immutable (apart from fixes for seriously grave bugs and of course security updates). So this provides the connection between frequent and rolling updates, and produces an immutable fixed set: a release.

This Debian approach has been influential for many other projects—including CRAN, as can be seen in aspects of its system providing a rolling set of curated packages. Instead of a staging area for all packages, extensive tests are made for candidate packages before adding an update. This aims to ensure quality and consistency—and has worked remarkably well. We argue that it has clearly contributed to the success and renown of CRAN.

Now, when accessing CRAN from R, we fundamentally have two accessor functions. But seemingly only one is widely known and used. In what we may call ‘the Jeff model’, everybody is happy to deploy install.packages() for initial installations.

That sentiment is clearly expressed by this bsky post:

One of my #rstats coding rituals is that every time I load a @vincentab.bsky.social package I go check for a new version because invariably it’s been updated with 18 new major features 😆

And that is why we have two cultures.

Because some of us, yours truly included, also use update.packages() at recurring (frequent !!) intervals: daily or near-daily for me. The goodness and, dare I say, gift of packages is not limited to those by my pal Vincent. CRAN updates all the time, and updates are (generally) full of (usually excellent) changes, fixes, or new features. So update frequently! Doing (many but small) updates (frequently) is less invasive than (large, infrequent) ‘waterfall’-style changes!

But the fear of change, or disruption, is clearly pervasive. One can only speculate why. Is the experience of updating so painful on other operating systems? Is it maybe a lack of exposure / tutorials on best practices?

These ‘Two Cultures’ coexist. When I delivered the talk in Mons, I briefly asked for a show of hands among all the R users in the audience to see who in fact does use update.packages() regularly. And maybe a handful of hands went up: surprisingly few!

Now back to the context of installing packages: Clearly ‘only installing’ has its uses. For continuous integration checks we generally install into ephemeral temporary setups. Some debugging work may be with one-off container or virtual machine setups. But all other uses may well be under ‘maintained’ setups. So consider calling update.packages() once in a while. Or even weekly or daily. The rolling feature of CRAN is a real benefit, and it is there for the taking and enrichment of your statistical computing experience.

So to sum up, the real power is to use

  • install.packages() to obtain fabulous new statistical computing resources, ideally in an instant; and
  • update.packages() to keep these fabulous resources current and free of (known) bugs.

For both tasks, relying on binary installations accelerates and eases the process. And where available, using binary installation with system-dependency support as r2u does makes it easier still, following the r2u slogan of ‘Fast. Easy. Reliable. Pick All Three.’ Give it a try!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

06 June, 2025 01:35AM

June 05, 2025

hackergotchi for Matthew Garrett

Matthew Garrett

How Twitter could (somewhat) fix their encrypted DMs

As I wrote in my last post, Twitter's new encrypted DM infrastructure is pretty awful. But the amount of work required to make it somewhat better isn't large.

When Juicebox is used with HSMs, it supports encrypting the communication between the client and the backend. This is handled by generating a unique keypair for each HSM. The public key is provided to the client, while the private key remains within the HSM. Even if you can see the traffic sent to the HSM, it's encrypted using the Noise protocol and so the user's encrypted secret data can't be retrieved.

But this is only useful if you know that the public key corresponds to a private key in the HSM! Right now there's no way to know this, but there's worse - the client doesn't have the public key built into it, it's supplied as a response to an API request made to Twitter's servers. Even if the current keys are associated with the HSMs, Twitter could swap them out with ones that aren't, terminate the encrypted connection at their endpoint, and then fake your query to the HSM and get the encrypted data that way. Worse, this could be done for specific targeted users, without any indication to the user that this has happened, making it almost impossible to detect in general.

This is at least partially fixable. Twitter could prove to a third party that their Juicebox keys were generated in an HSM, and the key material could be moved into clients. This makes attacking individual users more difficult (the backdoor code would need to be shipped in the public client), but can't easily help with the website version[1] even if a framework exists to analyse the clients and verify that the correct public keys are in use.

It's still worse than Signal. Use Signal.

[1] Since they could still just serve backdoored Javascript to specific users. This is, unfortunately, kind of an inherent problem when it comes to web-based clients - we don't have good frameworks to detect whether the site itself is malicious.


05 June, 2025 01:18PM

Twitter's new encrypted DMs aren't better than the old ones

(Edit: Twitter could improve this significantly with very few changes - I wrote about that here. It's unclear why they'd launch without doing that, since it entirely defeats the point of using HSMs)

When Twitter[1] launched encrypted DMs a couple of years ago, it was the worst kind of end-to-end encrypted - technically e2ee, but in a way that made it relatively easy for Twitter to inject new encryption keys and get everyone's messages anyway. It was also lacking a whole bunch of features such as "sending pictures", so the entire thing was largely a waste of time. But a couple of days ago, Elon announced the arrival of "XChat", a new encrypted message platform built on Rust with (Bitcoin style) encryption, whole new architecture. Maybe this time they've got it right?

tl;dr - no. Use Signal. Twitter can probably obtain your private keys, and admit that they can MITM you and have full access to your metadata.

The new approach is pretty similar to the old one in that it's based on pretty straightforward and well tested cryptographic primitives, but merely using good cryptography doesn't mean you end up with a good solution. This time they've pivoted away from using the underlying cryptographic primitives directly and into higher level abstractions, which is probably a good thing. They're using Libsodium's boxes for message encryption, which is, well, fine? It doesn't offer forward secrecy (if someone's private key is leaked then all existing messages can be decrypted) so it's a long way from the state of the art for a messaging client (Signal's had forward secrecy for over a decade!), but it's not inherently broken or anything. It is, however, written in C, not Rust[2].

That's about the extent of the good news. Twitter's old implementation involved clients generating keypairs and pushing the public key to Twitter. Each client (a physical device or a browser instance) had its own private key, and messages were simply encrypted to every public key associated with an account. This meant that new devices couldn't decrypt old messages, and also meant there was a maximum number of supported devices and terrible scaling issues and it was pretty bad. The new approach generates a keypair and then stores the private key using the Juicebox protocol. Other devices can then retrieve the private key.
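
(As an aside, to make that old "encrypt to every registered device key" model and the forward secrecy point concrete, here's a rough sketch using PyNaCl, the Python bindings to libsodium. This is only an illustration of the general pattern, not Twitter's actual code, and every name in it is made up.)

from nacl.public import PrivateKey, Box

# The sender has a long-term keypair.
sender = PrivateKey.generate()

# Each of the recipient's devices has its own keypair; only the public
# halves ever get uploaded to the server.
devices = [PrivateKey.generate() for _ in range(3)]

message = b"hello"

# The sender encrypts the same message once per registered device key.
ciphertexts = [Box(sender, dev.public_key).encrypt(message) for dev in devices]

# Each device can decrypt the copy addressed to its key...
for dev, ct in zip(devices, ciphertexts):
    assert Box(dev, sender.public_key).decrypt(ct) == message

# ...but a device registered later has no ciphertext encrypted to its key,
# so it cannot read old messages, and since these boxes are not forward
# secret, leaking any device's private key exposes every message that was
# ever encrypted to that key.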

Doesn't this mean Twitter has the private key? Well, no. There's a PIN involved, and the PIN is used to generate an encryption key. The stored copy of the private key is encrypted with that key, so if you don't know the PIN you can't decrypt the key. So we brute force the PIN, right? Juicebox actually protects against that - before the backend will hand over the encrypted key, you have to prove knowledge of the PIN to it (this is done in a clever way that doesn't directly reveal the PIN to the backend). If you ask for the key too many times while providing the wrong PIN, access is locked down.

But this is true only if the Juicebox backend is trustworthy. If the backend is controlled by someone untrustworthy[3] then they're going to be able to obtain the encrypted key material (even if it's in an HSM, they can simply watch what comes out of the HSM when the user authenticates if there's no validation of the HSM's keys). And now all they need is the PIN. Turning the PIN into an encryption key is done using the Argon2id key derivation function, using 32 iterations and a memory cost of 16MB (the Juicebox white paper says 16KB, but (a) that's laughably small and (b) the code says 16 * 1024 in an argument that takes kilobytes), which makes it computationally and moderately memory expensive to generate the encryption key used to decrypt the private key. How expensive? Well, on my (not very fast) laptop, that takes less than 0.2 seconds. How many attempts do I need to crack the PIN? Twitter's chosen to fix that to 4 digits, so a maximum of 10,000. You aren't going to need many machines running in parallel to bring this down to a very small amount of time, at which point private keys can, to a first approximation, be extracted at will.
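
(To put rough numbers on that arithmetic, here's a small back-of-the-envelope sketch using the argon2-cffi Python bindings with the parameters described above, 32 iterations and a 16MB memory cost. The salt and the extrapolation are illustrative assumptions, not measurements of Twitter's deployment.)

import time
from argon2.low_level import hash_secret_raw, Type

SALT = b"0123456789abcdef"  # placeholder 16-byte salt, purely illustrative

def derive_key(pin: str) -> bytes:
    # Argon2id with the parameters described above: 32 iterations, 16 MiB
    return hash_secret_raw(
        secret=pin.encode(),
        salt=SALT,
        time_cost=32,
        memory_cost=16 * 1024,  # argon2 takes this in KiB, i.e. 16 MiB
        parallelism=1,
        hash_len=32,
        type=Type.ID,
    )

# Time a single derivation, then extrapolate over the full 4-digit PIN space.
start = time.monotonic()
derive_key("0000")
per_attempt = time.monotonic() - start
print(f"{per_attempt:.3f}s per attempt, worst case "
      f"{per_attempt * 10_000 / 60:.1f} minutes on a single core")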

Juicebox attempts to defend against this by supporting sharding your key over multiple backends, and only requiring a subset of those to recover the original. Twitter does seem to be making use of this: it uses three backends and requires data from at least two, but all the backends used are under x.com so are presumably under Twitter's direct control. Trusting the keystore without needing to trust whoever's hosting it requires a trustworthy communications mechanism between the client and the keystore. If the device you're talking to can prove that it's an HSM that implements the attempt limiting protocol and has no other mechanism to export the data, this can be made to work. Signal makes use of something along these lines using Intel SGX for contact list and settings storage and recovery, and Google and Apple also have documentation about how they handle this in ways that make it difficult for them to obtain backed up key material. Twitter has no documentation of this, and as far as I can tell does nothing to prove that the backend is in any way trustworthy. (Edit to add: The Juicebox API does support authenticated communication between the client and the HSM, but that relies on you having some way to prove that the public key you're presented with corresponds to a private key that only exists in the HSM. Twitter gives you the public key whenever you communicate with them, so even if they've implemented this properly you can't prove they haven't made up a new key and MITMed you the next time you retrieve your key)

On the plus side, Juicebox is written in Rust, so Elon's not 100% wrong. Just mostly wrong.

But ok, at least you've got viable end-to-end encryption even if someone can put in some (not all that much, really) effort to obtain your private key and render it all pointless? Actually no, since you're still relying on the Twitter server to give you the public key of the other party and there's no out of band mechanism to do that or verify the authenticity of that public key at present. Twitter can simply give you a public key where they control the private key, decrypt the message, and then reencrypt it with the intended recipient's key and pass it on. The support page makes it clear that this is a known shortcoming and that it'll be fixed at some point, but they said that about the original encrypted DM support and it never was, so that's probably dependent on whether Elon gets distracted by something else again. And the server knows who and when you're messaging even if they haven't bothered to break your private key, so there's a lot of metadata leakage.

Signal doesn't have these shortcomings. Use Signal.

[1] I'll respect their name change once Elon respects his daughter

[2] There are implementations written in Rust, but Twitter's using the C one with these JNI bindings

[3] Or someone nominally trustworthy but who's been compelled to act against your interests - even if Elon were absolutely committed to protecting all his users, his overarching goals for Twitter require him to have legal presence in multiple jurisdictions that are not necessarily above placing employees in physical danger if there's a perception that they could obtain someone's encryption keys


05 June, 2025 11:02AM

June 04, 2025

hackergotchi for Gunnar Wolf

Gunnar Wolf

The subjective value of privacy • Assessing individuals' calculus of costs and benefits in the context of state surveillance

This post is an unpublished review for The subjective value of privacy • Assessing individuals' calculus of costs and benefits in the context of state surveillance

Internet users, software developers, academics, entrepreneurs – basically everybody is now aware of the importance of considering privacy as a core part of our online experience. User demand, and various national or regional laws, have made privacy a continuously present subject. However, how do regular people – like ourselves, in our many capacities – feel about privacy? Lukas Antoine presents a series of experiments aiming at better understanding how people throughout the world understand privacy, and when privacy is held to be more or less important than security in different contexts.

In particular, privacy is often portrayed as a value set in tension against surveillance, and particularly state surveillance, in the name of security: conventional wisdom presents the idea of a privacy calculus. That is, it is often assumed that individuals continuously evaluate the costs and benefits of divulging their personal data, sharing data when they expect a positive net outcome, and withholding it otherwise. This framework has been accepted for decades, and the author wishes to challenge it. The book is clearly his doctoral thesis in political science, and its contents are as thorough as expected in this kind of product.

The author presents three empirical studies based on cross-survey analysis. The first experiment explores the security justifications for surveillance and how they influence support for it. The second one examines whether the stance on surveillance can be made dependent on personal convenience or financial cost. The third study explores whether privacy attitude is context-dependent or can be seen as a stable personality trait. The studies aim to address the shortcomings of the published literature in the field, mainly: (a) the lack of comprehensive research on state surveillance, needed for a better understanding of privacy appreciation; (b) while several studies have tackled the subjective measure of privacy, there is a lack of cross-national studies to explain wide-ranging phenomena; (c) most studies in this regard are based on population-based surveys, which cannot establish causal relationships; and (d) a seemingly blind acceptance of the privacy calculus mentioned above, with no strong evidence that it accurately measures people’s motivations for disclosing or withholding their data.

The book is full of theoretical references and does a very good job of explaining the path followed by the author. It is, though, a heavy read, and, for people not coming from the social sciences tradition, it leads to the occasional feeling of being lost. The conceptual and theoretical frameworks and the presented studies are thorough and clear. The author is honest in explaining when the data points at some of his hypotheses being disproven, while others are confirmed.

The book is aimed at people digging deep into this topic. Personally, I have authored several works on different aspects of privacy, but this book did get me thinking about many issues I had not previously considered. My only complaint would be that, for a publication from such a highly prestigious publisher, little attention has been paid to editorial aspects: sub-subsection depth is often excessive and unclear. Also, when publishing monographs based on doctoral works, it is customary to no longer refer to the work as a “thesis” and to soften some of the formal requirements such a work often has, with the aim of producing a gentler and more readable book; this book seems just like the mass production of an (otherwise very interesting and well made) thesis work.

04 June, 2025 03:40PM

Humanities and big data in Ibero-America • Theory, methodology and practical applications

This post is an unpublished review for Humanities and big data in Ibero-America • Theory, methodology and practical applications

Digital humanities is a young–though established–field. It deals with different expressions in which digital data manipulation techniques can be applied and used to analyze subjects that are identified as belonging to the humanities. Although most often used to analyze different aspects of literature or social network analysis, it can also be applied to other humanistic disciplines or artistic expressions. Digital humanities employs many tools, but those categorized as big data are among the most frequently employed. This book samples different takes on digital humanities, with the particularity that it focuses on Ibero-American uses. It is worth noting that this book is the second in a series of four volumes, published or set to be published between 2022 and 2026. Being the output of a field survey, I perceive this book to be targeted towards fellow Digital Humanists – people interested in applying computational methods to further understand and research topics in the humanities. It is not a technical book in the sense Computer Science people would recognize as such, but several of the presented works do benefit from understanding some technical concepts.

The 12 articles (plus an introduction) that make up this book are organized in three parts:

(1) “Theoretical Framework” presents the ideas and techniques of data science (that make up the tools for handling big data), and explores how data science can contribute to literary analysis, all while noting that many such techniques are usually frowned upon in Latin America as data science “smells neoliberal”;

(2) “Methodological Issues” looks at specific issues through the lens of how they can be applied to big data, with specific attention given to works in Spanish; and

(3) “Practical Applications” analyzes specific Spanish works and communities based on big data techniques.

Several chapters treat a recurring theme: the simultaneous resistance and appropriation of big data by humanists. For example, at least three of the chapters describe the tensions between humanism (“aesthesis”) and cold, number-oriented data analysis (“mathesis”).

The analyzed works of Parts 2 and 3 are interesting and relatively easy to follow.

Some inescapable ideological stance gleams from several word choices – starting with the book’s and series’ name, which refers to the Spanish-speaking regions as “Ibero-America”, often seen as Eurocentric, in contrast with the “Latin America” term much more widely used throughout the region.

I will end with some notes about the specific versions of the book I reviewed. I read both an EPUB version and a print copy. The EPUB did not include links for easy navigation to footnotes; that is, the typographical superscripts are not hyperlinked to the location of the notes, so it is very impractical to try to follow them. The print version (unlike the EPUB) did not have an index; that is, the six pages before the introduction are missing from the print copy I received. For a book such as this one, not having an index hampers the ease of reading and referencing.

04 June, 2025 03:40PM

Beyond data poisoning in federated learning

This post is an unpublished review for Beyond data poisoning in federated learning

The current boom of artificial intelligence (AI) is based upon neural networks (NNs). In order for these to be useful, the network has to undergo a machine learning (ML) process: working over a series of inputs and adjusting the inner weights of the connections between neurons so that each of the data samples the network was trained on produces the right set of labels. Federated learning (FL) appeared as a reaction to the data centralization power that traditional ML implies: instead of centrally controlling the whole training data, various different actors analyze disjoint subsets of data and provide only the results of this analysis, thus increasing privacy while still analyzing a large dataset. Finally, given that multiple actors are involved in FL, how hard is it for a hostile actor to provide data that will confuse the NN instead of helping it reach better performance? This kind of attack is termed a poisoning attack, and it is the main focus of this paper. The authors set out to research how effective a hyperdimensional data poisoning attack (HDPA) can be at confusing a NN and causing it to misclassify both the items it was trained on and yet unseen items.

Data used for NN training is usually represented as a large set of orthogonal vectors, each describing a different aspect of the item, allowing for very simple vector arithmetic operations. Thus, NN training is termed high-dimensional or hyperdimensional. The attack method described by the authors employs cosine similarity: in order to preserve similarity, a target hypervector is reflected over a given dimension, yielding a cosine-similar result that will trick ML models, even those using byzantine-robust defenses.

The paper is clear, though not an easy read. It explains in detail the mathematical operations, following several related although different threat models. The authors present the results of the experimental evaluation of their proposed model, comparing it to several other well-known adversarial attacks for visual recognition tasks, over pre-labeled datasets frequently used as training data, such as MNIST, Fashion-MNIST and CIFAR-10. They show that their method is not only more effective as an attack, but falls within the same time range as other surveyed attacks.

Adversarial attacks are, all in all, an important way to advance any field of knowledge; by publishing this attack, the authors will surely spark other works to detect and prevent this kind of alteration. It is important for AI implementers to understand the nature of this field and be aware of the risks that this work, as well as others cited in it, highlight: ML will train a computer system to recognize a dataset, warts and all; efficient as AI is, if noise is allowed into the training data (particularly adversarially generated noise), the trained model might present impaired performance.

04 June, 2025 03:39PM

Computational modelling of robot personhood and relationality

This post is an unpublished review for Computational modelling of robot personhood and relationality

If humans and robots were to be able to roam around the same spaces, mutually recognizing each other for what they are, how would interaction be? How can we model such interactions in a way that we can reason about and understand the implications of a given behavior? This book aims at answering this question.

The book is split into two very different parts. Chapters 1 through 3 are mostly written with a philosophical angle. It starts by framing the possibility of having sentient androids exist in the same plane as humans, without them trying to pass as us or vice versa. The first chapters look at issues related to personhood, that is, how androids can be treated as valid interaction partners in a society with humans, and how interactions with them can be seen as meaningful. In doing so, several landmarks of the past 40 years in the AI field are reviewed. The book then introduces and explains the “Significant Concerns” that make up a society and give it coherence, and “Personhood and Relationality”, which describe how those concerns permeate from a society into each of the individuals that make it up, the relations between them, and the social objects that bring individuals closer together (or farther apart).

The second part of the book is written from a very different angle, and the change in pace took me somewhat by surprise. Each subsequent chapter presents a different angle of the “Affinity” system, a model that follows some aspects of human behavior over time and in a given space. Chapter 4 introduces the “Affinity” environment: a 3D simulated environment with simulated physical laws and characteristics, where a number of agents (30-50 is mentioned as usual) interact. Agents have a series of attributes (“value memory”), can adhere to different programs (“narratives”), and gain or lose on some vectors (“economy”). They can sense the world around them with sensors, and can modify the world or signal other agents using effectors.

The last two chapters round out the book, as expected: the first presents a set of results from analyzing a given set of value systems, and the second gives readers the conclusions reached by the author. However, I was expecting more – at least a link to download the “Affinity” system so as to continue exploring it, modify some of the aspects it models to cover a set of agents with different stories and narratives, or extend it to yet unforeseen behaviors; or, failing that, a more complete comparison of results than the evaluation of patterns resulting from a given run. The author is a well-known, prolific author in the field, and I was expecting bigger insights from this book.

Nevertheless, the book is an interesting and fun read, with important insights in both the first and second parts. There is a certain lack of connection between their respective rhythms, even though the second part builds on the concepts introduced in the first one. Overall, I enjoyed reading the book despite expecting more.

04 June, 2025 03:39PM

Russell Coker

Trying DeepSeek R1

I saw this document on running DeepSeek R1 [1] and decided to give it a go. I downloaded the llama.cpp source, compiled it, and downloaded the 131G of data as described. Running it with the default options gave about 7 CPU cores in use. Changing the --threads parameter to 44 caused it to use 17 CPU cores (changing it to larger numbers like 80 made it drop to 2.5 cores). I used the --n-gpu-layers parameter with the value of 1 as I currently have a GPU with only 6G of RAM (AliExpress is delaying my delivery of a PCIe power adaptor for a better GPU). Running it like this makes the GPU take 12W more power than standby and use 5.5G of VRAM according to nvidia-smi, so it is doing a small amount of work, but not much. The documentation refers to the DeepSeek R1 1.58bit model which I’m using as having 61 layers so presumably less than 2% of the work is done on the GPU.

Running like this it takes 2 hours of CPU time (just over 3 minutes of elapsed time at 17 cores) to give 8 words of output. I didn’t let any tests run long enough to give complete output.

The documentation claims that it will run on CPU with 20G of RAM. In my tests it takes between 161G and 195G of RAM to run depending on the number of threads. The documentation describes running on the CPU as “very slow” which presumably means 3 words per minute on a system with a pair of E5-2699A v4 CPUs and 256G of RAM.

When I try to use more than 44 threads I get output like “system_info: n_threads = 200 (n_threads_batch = 200) / 44” and it seems that I only have a few threads actually in use. Apparently there’s some issue with having more threads than the 44 CPU cores in the system.

I was expecting this to go badly and it met my expectations in that regard. But it was interesting to see exactly how it went badly. It seems that if I had a GPU with 24G of VRAM I’d still have 54/61 layers running on the CPU so even the largest of home GPUs probably wouldn’t make much difference.

Maybe if I configured the server to have hyper-threading enabled and 88 HT cores then I could have 88 threads and about 34 CPU cores in use which might help. But even if I got the output speed from 3 to 6 words per minute that still wouldn’t be very usable.

04 June, 2025 01:02PM by etbe

June 01, 2025

hackergotchi for Ben Hutchings

Ben Hutchings

FOSS activity in May 2025

01 June, 2025 10:12PM by Ben Hutchings

hackergotchi for Guido Günther

Guido Günther

Free Software Activities May 2025

Another short status update of what happened on my side last month. Larger blocks besides the Phosh 0.47 release are on-screen keyboard and cell broadcast improvements, work on separate volume streams, the switch of phoc to wlroots 0.19.0, and efforts to make Phosh work on Debian's upcoming stable release (Trixie) out of the box. Trixie will ship with Phosh 0.46; if you want to try out 0.47 you can fetch it from Debian's experimental suite.

See below for details on the above and more:

phosh

  • Track volume control based on media role priority (MR)
  • Release 0.47~rc1, 0.47.0

phoc

  • Release 0.47~rc1, 0.47.0
  • More signal unlink fixes (MR)
  • Further polish the wlroots 0.19 switch and undraft (MR)
  • Start to track wlroots 0.20 dev branch (MR)
  • Remember output configuration and restore scale,mode,transform for single output configs (MR)

phosh-mobile-settings

phosh-osk-stub / stevia

  • Smoke test completers (MR)
  • Better use horizontal space in completion bar (MR)
  • Add emojis to completion bar (MR)
  • Drop separate emoji data (MR)
  • Add some keyword completions (MR)
  • Better handle lots of completions (MR)
  • Release 0.47~rc1, 0.47.0
  • Rename to Stevia (MR), (MR)
  • Release 0.48~alpha1 to ease the rename for distros
  • Some minor GTK4 preps (MR)

phosh-tour

phosh-osk-data

  • Check packaging step (MR)

pfs

xdg-desktop-portal-phosh

phrog

  • Allow systemd to fail to help bootstrapping (MR)

phosh-debs

  • Skip noinsttest profiles (MR) - no need for these in nightly builds and helps nocheck triggering errors
  • Work around phrog install failure until fixed upstream (MR)
  • Switch from osk-stub to stevia (MR)

meta-phosh

  • Support automatic restart in more projects (MR)
  • Fix glob for meson checks (MR)

feedbackd

  • Expand media role MR with volume tracking (MR) - this is basically the same as the Wireplumber MR from below
  • vibra-pattern: Don't overwrite magnitude when changing global level (MR)
  • Update fbcli manpage (MR)
  • Drop custom script (MR)
  • Add key-{pressed,released} events (MR)
  • Release 0.8.2

feedbackd-device-themes

  • Release 0.8.3
  • Use stronger button-pressed feedback for google,sargo (MR)

gmobile

  • Install tests so they can run through ginsttest-runner (MR)
  • Release 0.3.0, 0.3.1
  • Small timer cleanups (MR)
  • Add mcc to iso country code conversion (MR)
  • Fix manpage subject (MR)

GNOME calls

  • Fix crash on shutdown (MR)
  • Drop meson version check (MR)
  • Backport fixes for 48 (MR)

Debian

  • phosh: Backport patches from 0.46 stable (MR)
  • phosh: Upload 0.47~rc1, 0.47.0
  • gnome-calls: Backport outgoing SIP call UI fix (MR)
  • gmobile: Upload 0.3.0 with improved autopkg tests
  • phoc: Upload 0.47~rc1, 0.47.0
  • phosh-osk-stub: 0.47.0
  • release-dom-compoenent: Drop tmp branch (MR)
  • phosh-tour: Upload 0.47.0
  • phosh-mobile-settings: Upload 0.47~rc1, 0.47.0
  • xdg-desktop-portal-phosh: Upload 0.47.0
  • wlroots: Upload 0.19.0
  • stevia: Upload 0.48~alpha1
  • meta-phosh: phosh-full: Depend on xwayland (MR)
  • feedbackd-device-themes: Upload 0.8.3
  • feedbackd: Upload 0.8.2
  • feedbackd-device-themes: Backport key-press fix for google,sargo (MR)

ModemManager

  • Simplify builds that need updated libqmi/libmbim (MR)
  • CellBroadcast: Fix QMI channel reading corner cases (MR)
  • Fix crash in CBM code (caused by unrelated bind-to refactor) (MR)
  • Handle more data codings and expose CBM's language (MR)

osmo-cbc

  • api-tool: Allow to set language of cell broadcast message (MR)

gsm-cell-testing

  • Update docs to use our new MM subproject builds (MR)

mobile-broadband-provider-info

  • Allow emergency number information (MR)
  • Allow Cell Broadcast information (MR)
  • Add .dir-locales.el and make tests easier to parse (MR)

Cellbroadcastd

  • Add channel handling (MR)
  • Indicate on DBus whether Cell Broadcasts are supported (MR)
  • Add ASAN check and fix fallout (MR)

phosh-site

pipewire

  • Simplify header use for projects that use -Wswitch-default (MR)

wireplumber

  • Add lua script to track suitable volume control when role based policy linking is in use (MR)

python-dbusmock

  • ModemManager: Add set-channel support (MR)

Bugs

  • Standardize audio stream roles (MR). Otherwise we'll have a hard time with e.g. WirePlumbers role based policy linking as apps might use all kinds of types.

Reviews

This is not code by me but reviews on other peoples code. The list is (as usual) slightly incomplete. Thanks for the contributions!

  • gmobile/pp: Ignore buttons on headsets (MR)
  • p-o-s: Add Arabic translation (MR)
  • p-o-s: pipe completer examples (MR)
  • p-o-s: US Dvorak layout (MR)
  • p-m-s: Use static library for building (MR)
  • p-m-s: pmos tweaks UI parts (MR)
  • p-m-s: Notification categories that light up the screen (MR)
  • gmobile: Border radius for oneplus,enchilada (MR)
  • m-b-p-i: Telia IOT provider (MR)
  • phosh-site: Highlight selected footnote (MR)
  • bluez: Don't start mpris proxy for root user (Patch)
  • phoc: shortcuts-inhibit: Add support for the keyboard shortcuts inhibit protocol (MR)

Help Development

If you want to support my work see donations.

Comments?

Join the Fediverse thread

01 June, 2025 02:19PM

hackergotchi for Junichi Uekawa

Junichi Uekawa

June that is.

June that is.

01 June, 2025 01:14PM by Junichi Uekawa

hackergotchi for Emmanuel Kasper

Emmanuel Kasper

ARM64 desktop as daily driver

I have bought myself an expensive ARM64 workstation, the System 76 Thelio Astra that I intend to use as my main desktop computer for the next 15 years, running Debian.

The box is basically a server motherboard repurposed in a good desktop chassis. In Europe it seems you can order similar ready systems here.

The hardware is well supported by Debian 12 and Debian testing. I had some initial issues with graphics, due to the board being designed for server use, but I am solving these as we go.

Annoyances I got so far:

  • When you power on the machine using the power supply switch, you have to wait for the BMC to finish its startup sequence before the front power button does anything. As starting the BMC can take 90 seconds, I initially thought the machine was dead on arrival.

  • The default graphical output is redirected to the BMC Serial over LAN, which means that if you want to install Debian using an attached display you need to force the output to the attached display by passing console=tty0 as an installer parameter.

  • Finally the Xorg Nouveau driver does not work with the Nvidia A400 GPU I got with the machine. After passing nodemodeset as a kernel parameter, I can force Xorg to use an unaccelerated framebuffer, which at least displays something. I passed this parameter to the installer, so that I could install in graphical mode. The driver from Nvidia works, but I’d like very much to get Nouveau running.

Ugly point

  • A server motherboard, we said. This means there is NO suspend to RAM; you have to power off if you don't want to keep the machine on all the time. As the boot sequence is long (server board again) I am pondering setting a startup time in the UEFI firmware to turn the box on at specific usage times.

Good points

  • The firmware of the machine is a standard EFI, which means you can use the Debian arm64 installer on a USB stick straight away, without any kind of device tree / bootloader fiddling.
  • The 3 NICs, WiFi and Bluetooth were all recognized on first boot.
  • I was afraid the machine would be loud. However it is quiet, you hear the humming of a fan, but it is quieter than most desktops I owned, from the Atari TT to an all in one Lenovo M92z I used for 10 years. I am certainly not a hardware and cooling specialist, but meseems the quietness comes from slow rotating but very large fans.
  • Due to the clean design of Linux and Debian, thousands of packages work correctly on ARM64, starting with the Gnome desktop environment and Firefox.
  • The documentation from system76 is fine, their Ubuntu 20.04 setup guide was helpful to understand the needed parameters mentioned above.

Update: The display is working correctly with the nouveau driver after installing the non-free Nvidia firmware. See the Debian wiki.

01 June, 2025 08:47AM by Manu

May 31, 2025

Paul Wise

FLOSS Activities May 2025

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

  • Crash/privacy/security issue in liferea
  • Usability in glab

Sponsors

All work was done on a volunteer basis.

31 May, 2025 10:09PM


Antoine Beaupré

Traffic meter per ASN without logs

Have you ever found yourself in the situation where you had no or anonymized logs and still wanted to figure out where your traffic was coming from?

Or you have multiple upstreams and are looking to see if you can save fees by getting into peering agreements with some other party?

Or your site is getting heavy load but you can't pinpoint it on a single IP and you suspect some amoral corporation is training their degenerate AI on your content with a bot army?

(You might be getting onto something there.)

If that rings a bell, read on.

TL;DR:

... or just skip the cruft and install asncounter:

pip install asncounter

Also available in Debian 14 or later, or possibly in Debian 13 backports (soon to be released) if people are interested:

apt install asncounter

Then count whoever is hitting your network with:

awk '{print $2}' /var/log/apache2/*access*.log | asncounter

or:

tail -F /var/log/apache2/*access*.log | awk '{print $2}' | asncounter

or:

tcpdump -q -n | asncounter --input-format=tcpdump --repl

or:

tcpdump -q -i eth0 -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)" | asncounter --input-format=tcpdump --repl

Read on for why this matters, and why I wrote yet another weird tool (almost) from scratch.

Background and manual work

This is a tool I've been dreaming of for a long, long time. Back in 2006, at Koumbit, a colleague had set up TAS ("Traffic Accounting System", "Система учета трафика" in Russian, apparently), a collection of Perl scripts that would do per-IP accounting. It was pretty cool: it would count bytes per IP address and, from that, you could do analysis. But the project died, and it was kind of bespoke.

Fast forward twenty years, and I find myself fighting off bots at the Tor Project (the irony...), with our GitLab suffering pretty bad slowdowns (see issue tpo/tpa/team#41677 for the latest public issue, the juicier one is confidential, unfortunately).

(We did have some issues caused by overloads in CI, as we host, after all, a fork of Firefox, which is a massive repository, but the applications team did sustained, awesome work to fix issues on that side, again and again (see tpo/applications/tor-browser#43121 for the latest, and tpo/applications/tor-browser#43121 for some pretty impressive correlation work, I work with really skilled people). But those issues, I believe were fixed.)

So I had the feeling it was our turn to get hammered by the AI bots. But how do we tell? I could tell something was hammering at the costly /commit/ and (especially costly) /blame/ endpoint. So at first, I pulled out the trusted awk, sort | uniq -c | sort -n | tail pipeline I am sure others have worked out before:

awk '{print $1}' /var/log/nginx/*.log | sort | uniq -c | sort -n | tail -10

For people new to this, that pulls the first field out of the web server log files, sorts the list, counts the number of unique entries, and sorts that so that the most common entries (or IPs) show up first, then shows the top 10.

That, in other words, answers the question of "which IP address visits this web server the most?" Based on this, I found a couple of IP addresses that looked like Alibaba. I had already addressed an abuse complaint to them (tpo/tpa/team#42152) but never got a response, so I just blocked their entire network blocks, rather violently:

for cidr in 47.240.0.0/14 47.246.0.0/16 47.244.0.0/15 47.235.0.0/16 47.236.0.0/14; do 
  iptables-legacy -I INPUT -s $cidr -j REJECT
done

That made Ali Baba and his forty thieves (specifically their AL-3 network) go away, but our load was still high, and I was still seeing various IPs crawling the costly endpoints. And this time, it was hard to tell who they were: you'll notice all the Alibaba IPs are inside the same 47.0.0.0/8 prefix. Although it's not a /8 itself, it's all inside the same prefix, so it's visually easy to pick it apart, especially for a brain like mine who's stared too long at logs flowing by too fast for their own mental health.

What I had then was different, and I was tired of doing the stupid thing I had been doing for decades at this point. I had stumbled upon pyasn recently (in January, according to my notes) and somehow found it again, and thought "I bet I could write a quick script that loops over IPs and counts IPs per ASN".

(Obviously, there are lots of other tools out there for that kind of monitoring. Argos, for example, presumably does this, but it's kind of a huge stack. You can also get into netflows, but there are serious privacy implications with those. There are also lots of per-IP counters like promacct, but that doesn't scale.

Or maybe someone already had solved this problem and I just wasted a week of my life, who knows. Someone will let me know, I hope, either way.)

ASNs and networks

A quick aside, for people not familiar with how the internet works. People that know about ASNs, BGP announcements and so on can skip.

The internet is the network of networks. It's made of multiple networks that talk to each other. The way this works is there is a Border Gateway Protocol (BGP), a relatively simple TCP-based protocol, that the edge routers of those networks used to announce each other what network they manage. Each of those network is called an Autonomous System (AS) and has an AS number (ASN) to uniquely identify it. Just like IP addresses, ASNs are allocated by IANA and local registries, they're pretty cheap and useful if you like running your own routers, get one.

When you have an ASN, you'll use it to, say, announce to your BGP neighbors "I have 198.51.100.0/24 over here", and the others might say "okay, and I have 216.90.108.31/19 over here, and I know of this other ASN over there that has 192.0.2.1/24 too!" And gradually, those announcements flood the entire network, and you end up with each BGP router having a routing table of the global internet, with a map of which network block, or "prefix", is announced by which ASN.

It's how the internet works, and it's a useful thing to know, because it's what, ultimately, makes an organisation responsible for an IP address. There are "looking glass" tools like the one provided by routeviews.org which allow you to effectively run "trace routes" (though not the same as traceroute, which actively sends probes from your location); type an IP address into that form to fiddle with it. You will end up with an "AS path", the way to get from the looking glass to the announced network. But I digress, and that's kind of out of scope.

Point is, the internet is made of networks, networks are autonomous systems (AS) and they have numbers (ASNs), and they announce IP prefixes (or "network blocks"), which ultimately tells you who is responsible for traffic on the internet.

Introducing asncounter

So my goal was to get from "lots of IP addresses" to "list of ASNs", possibly also the list of prefixes (because why not). Turns out pyasn makes that really easy. I managed to build a prototype in probably less than an hour, just look at the first version, it's 44 lines (sloccount) of Python, and it works, provided you have already downloaded the required datafiles from routeviews.org. (Obviously, the latest version is longer at close to 1000 lines, but it downloads the data files automatically, and has many more features).

The way the first prototype (and later versions too, mostly) worked is that you feed it a list of IP addresses on standard input, it looks up the ASN and prefix associated with the IP, and increments a counter for those, then print the result.
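
For the curious, a minimal sketch of that core loop could look something like the following. This is a hypothetical illustration rather than the actual asncounter code, and it assumes pyasn is installed and that an ipasn data file has already been downloaded and converted with pyasn's utility scripts:

import sys
from collections import Counter

import pyasn

# The data file name is an assumption; point it at whatever ipasn file
# you generated with pyasn's helper scripts.
asndb = pyasn.pyasn("ipasn.dat")
asn_counts, prefix_counts = Counter(), Counter()

for line in sys.stdin:
    ip = line.strip()
    if not ip:
        continue
    # lookup() returns an (asn, prefix) tuple, or (None, None) if unannounced
    asn, prefix = asndb.lookup(ip)
    asn_counts[asn] += 1
    prefix_counts[prefix] += 1

for asn, count in asn_counts.most_common(10):
    print(f"{count}\t{asn}")
for prefix, count in prefix_counts.most_common(10):
    print(f"{count}\t{prefix}")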

Running that prototype on the GitLab server showed me something like this:

root@gitlab-02:~/anarcat-scripts# tcpdump -q -i eth0 -n -Q in "(udp or tcp)" | ./asncounter.py --tcpdump                                                                                                                                                                          
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode                                                                
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes                                                             
INFO: collecting IPs from stdin, using datfile ipasn_20250523.1600.dat.gz                                                                
INFO: loading datfile /root/.cache/pyasn/ipasn_20250523.1600.dat.gz...                                                                   
INFO: loading /root/.cache/pyasn/asnames.json                       
ASN     count   AS               
136907  7811    HWCLOUDS-AS-AP HUAWEI CLOUDS, HK                                                                                         
[----]  359     [REDACTED]
[----]  313     [REDACTED]
8075    254     MICROSOFT-CORP-MSN-AS-BLOCK, US
[---]   164     [REDACTED]
[----]  136     [REDACTED]
24940   114     HETZNER-AS, DE  
[----]  98      [REDACTED]
14618   82      AMAZON-AES, US                                                                                                           
[----]  79      [REDACTED]
prefix  count                                         
166.108.192.0/20        1294                                                                                                             
188.239.32.0/20 1056                                          
166.108.224.0/20        970                    
111.119.192.0/20        951              
124.243.128.0/18        667                                         
94.74.80.0/20   651                                                 
111.119.224.0/20        622                                         
111.119.240.0/20        566           
111.119.208.0/20        538                                         
[REDACTED]  313           

Even without ratios and a total count (which will come later), it was quite clear that Huawei was doing something big on the server. At that point, it was responsible for a quarter to half of the traffic on our GitLab server or about 5-10 queries per second.

But just looking at the logs, or per IP hit counts, it was really hard to tell. That traffic is really well distributed. If you look more closely at the output above, you'll notice I redacted a couple of entries except major providers, for privacy reasons. But you'll also notice almost nothing is redacted in the prefix list, why? Because all of those networks are Huawei! Their announcements are kind of bonkers: they have hundreds of such prefixes.

Now, clever people in the know will say "of course they do, it's a hyperscaler; just ASN 14618 (AMAZON-AES) over there has way more announcements, they have 1416 prefixes!" Yes, of course, but they are not generating half of my traffic (at least, not yet). But even then: this also applies to Amazon! This way of counting traffic is way more useful for large-scale operations like this, because you group by organisation instead of by server or individual endpoint.

And, ultimately, this is why asncounter matters: it allows you to group your traffic by organisation, the place you can actually negotiate with.

Now, of course, that assumes those are entities you can talk with. I have written to both Alibaba and Huawei, and have yet to receive a response. I assume I never will. In their defence, I wrote in English, perhaps I should have made the effort of translating my message in Chinese, but then again English is the Lingua Franca of the Internet, and I doubt that's actually the issue.

The Huawei and Facebook blocks

Another aside, because this is my blog and I am not looking for a Pullitzer here.

So I blocked Huawei from our GitLab server (and before you tear your shirt open: only our GitLab server, everything else is still accessible to them, including our email server to respond to my complaint). I did so 24h after emailing them, and after examining their user agent (UA) headers. Boy that was fun. In a sample of 268 requests I analyzed, they churned out 246 different UAs.
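
As an aside, counting distinct UAs in a log sample is a quick one-off job. Here's a rough sketch of one way to do it, assuming combined-format logs where the user agent is the last double-quoted field (an illustration, not necessarily the exact method used here):

import re
import sys
from collections import Counter

# In combined log format, the user agent is the last double-quoted field.
ua_re = re.compile(r'"([^"]*)"\s*$')

uas = Counter()
for line in sys.stdin:
    match = ua_re.search(line)
    if match:
        uas[match.group(1)] += 1

print(f"{sum(uas.values())} requests, {len(uas)} distinct user agents")
for ua, count in uas.most_common(5):
    print(f"{count:6d}  {ua}")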

At first glance, they looked legit, like:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36

Safari on a Mac, so far so good. But when you start digging, you notice some strange things, like here's Safari running on Linux:

Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.457.0 Safari/534.3

Was Safari ported to Linux? I guess that's.. possible?

But here is Safari running on a 15 year old Ubuntu release (10.10):

Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Ubuntu/10.10 Chromium/12.0.702.0 Chrome/12.0.702.0 Safari/534.24

Speaking of old, here's Safari again, but this time running on Windows NT 5.1, AKA Windows XP, released 2001, EOL since 2019:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-CA) AppleWebKit/534.13 (KHTML like Gecko) Chrome/9.0.597.98 Safari/534.13

Really?

Here's Firefox 3.6, released 14 years ago, there were quite a lot of those:

Mozilla/5.0 (Windows; U; Windows NT 6.1; lt; rv:1.9.2) Gecko/20100115 Firefox/3.6

I remember running those old Firefox releases, those were the days.

But to me, those look like entirely fake UAs, deliberately rotated to make it look like legitimate traffic.

In comparison, Facebook seemed a bit more legit, in the sense that they don't fake it. Most hits are from:

meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

which, according to their documentation:

crawls the web for use cases such as training AI models or improving products by indexing content directly

From what I could tell, it was even respecting our rather liberal robots.txt rules, in that it wasn't crawling the sprawling /blame/ or /commit/ endpoints, explicitly forbidden by robots.txt.

So I've blocked the Facebook bot in robots.txt and, amazingly, it just went away. Good job Facebook, as much as I think you've given the empire to neo-nazis, cause depression and genocide, you know how to run a crawler, thanks.

Huawei was blocked at the web server level, with a friendly 429 status code telling people to contact us (over email) if they need help. And they don't care: they're still hammering the server, from what I can tell, but then again, I didn't block the entire ASN just yet, just the blocks I found crawling the server over a couple of hours.

A full asncounter run

So what does a day in asncounter look like? Well, you start with a problem, say you're getting too much traffic and want to see where it's from. First you need to sample it. Typically, you'd do that with tcpdump or tailing a log file:

tail -F /var/log/apache2/*access*.log | awk '{print $2}' | asncounter

If you have lots of traffic or care about your users' privacy, you're not going to log IP addresses, so tcpdump is likely a good option instead:

tcpdump -q -n | asncounter --input-format=tcpdump --repl

If you really get a lot of traffic, you might want to feed asncounter only a subset of it to avoid overwhelming it (it's not fast enough to handle multiple gigabits per second, I bet), so here's only incoming SYN IPv4 packets:

tcpdump -q -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)" | asncounter --input-format=tcpdump --repl
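
If even that is too much, one crude way to cap the sample further is to let tcpdump exit after a fixed number of packets; the count below is arbitrary, purely for illustration:

tcpdump -q -n -c 100000 -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)" | asncounter --input-format=tcpdump --repl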

In any case, at this point you're staring at a process, just sitting there. If you passed the --repl or --manhole arguments, you're lucky: you have a Python shell inside the program. Otherwise, send SIGHUP to the thing to have it dump the nice tables out:

pkill -HUP asncounter

Here's an example run:

> awk '{print $2}' /var/log/apache2/*access*.log | asncounter
INFO: using datfile ipasn_20250527.1600.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
INFO: finished reading data
INFO: loading /home/anarcat/.cache/pyasn/asnames.json
count   percent ASN AS
12779   69.33   66496   SAMPLE, CA
3361    18.23   None    None
366 1.99    66497   EXAMPLE, FR
337 1.83    16276   OVH, FR
321 1.74    8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
309 1.68    14061   DIGITALOCEAN-ASN, US
128 0.69    16509   AMAZON-02, US
77  0.42    48090   DMZHOST, GB
56  0.3 136907  HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
53  0.29    17621   CNCGROUP-SH China Unicom Shanghai network, CN
total: 18433
count   percent prefix  ASN AS
12779   69.33   192.0.2.0/24    66496   SAMPLE, CA
3361    18.23   None        
298 1.62    178.128.208.0/20    14061   DIGITALOCEAN-ASN, US
289 1.57    51.222.0.0/16   16276   OVH, FR
272 1.48    2001:DB8::/48   66497   EXAMPLE, FR
235 1.27    172.160.0.0/11  8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
94  0.51    2001:DB8:1::/48 66497   EXAMPLE, FR
72  0.39    47.128.0.0/14   16509   AMAZON-02, US
69  0.37    93.123.109.0/24 48090   DMZHOST, GB
53  0.29    27.115.124.0/24 17621   CNCGROUP-SH China Unicom Shanghai network, CN

Those numbers are actually from my home network, not GitLab. Over there, the battle still rages on, but at least the vampire bots are banging their heads against the solid Nginx wall instead of eating the fragile heart of GitLab. We had a significant improvement in latency thanks to the Facebook and Huawei blocks... Here are the "workhorse request duration stats" for various time ranges, 20h after the block:

range   mean    max     stdev
20h     449ms   958ms   39ms
7d      1.78s   5m      14.9s
30d     2.08s   3.86m   8.86s
6m      901ms   27.3s   2.43s

We went from two seconds mean to 500ms! And look at that standard deviation! 39ms! It was ten seconds before! I doubt we'll keep it that way very long but for now, it feels like I won a battle, and I didn't even have to set up anubis or go-away, although I suspect that will unfortunately come.

Note that asncounter also supports exporting Prometheus metrics, but you should be careful with this, as it can lead to cardinality explosion, especially if you track by prefix (which can be disabled with --no-prefixes).

Folks interested in more details should read the fine manual for more examples, usage, and discussion. It shows, among other things, how to effectively block lots of networks from Nginx, aggregate multiple prefixes, block entire ASNs, and more!
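
I won't reproduce the manual here, but to give a flavour of the Nginx side, here is a rough sketch of turning a list of prefixes into a deny list; the file names are made up and the manual's actual recipe may well differ:

# prefixes.txt: one CIDR per line, e.g. collected from asncounter output
awk '{print "deny " $1 ";"}' prefixes.txt > /etc/nginx/conf.d/blocklist.conf
nginx -t && systemctl reload nginx

A bare deny returns a 403; returning a 429 with a contact message, as I did, takes a bit more configuration, so treat this only as the general shape.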

So there you have it: I now have the tool I wish I had 20 years ago. Hopefully it will stay useful for another 20 years, although I'm not sure we'll still have internet in 20 years.

I welcome constructive feedback, "oh no you rewrote X", Grafana dashboards, bug reports, pull requests, and "hell yeah" comments. Hacker News, let it rip, I know you can give me another juicy quote for my blog.

This work was done as part of my paid work for the Tor Project, currently in a fundraising drive, give us money if you like what you read.

31 May, 2025 02:32AM

May 30, 2025

Russell Coker

Service Setup Difficulties

Marco wrote a blog post opposing hyperscale systems which included “‘We want to use an hyperscaler cloud because our developers do not want to operate a scalable and redundant database’ just means that you need to hire competent developers and/or system administrators.” [1].

I previously wrote a blog post Why Clusters Usually Don’t Work [2] and I believe that all the points there are valid today – and possibly exacerbated by clusters getting less direct use as clustering is increasingly being done by hyperscale providers.

Take a basic need, a MySQL or PostgreSQL database for example. You want it to run and basically do the job and to have good recovery options. You could set it up locally, run backups, test the backups, have a recovery plan for failures, maybe have a hot-spare server if it’s really important, have tests for backups and hot-spare server, etc. Then you could have documentation for this so if the person who set it up isn’t available when there’s a problem they will be able to find out what to do. But the hyperscale option is to just select a database in your provider and have all this just work. If the person who set it up isn’t available for recovery in the event of failure the company can just put out a job advert for “person with experience on cloud company X” and have them just immediately go to work on it.

I don’t like hyperscale providers as they are all monopolistic companies that do anti-competitive actions. Google should be broken up, Android development and the Play Store should be separated from Gmail etc which should be separated from search and adverts, and all of them should be separated from the GCP cloud service. Amazon should be broken up, running the Amazon store should be separated from selling items on the store, which should be separated from running a video on demand platform, and all of them should be separated from the AWS cloud. Microsoft should be broken up, OS development should be separated from application development, all of that should be separated from cloud services (Teams and Office 365), and everything else should be separate from the Azure cloud system.

But the cloud providers offer real benefits at small scale. Running a MySQL or PostgreSQL database for local services is easy, it’s a simple apt command to install it and then it basically works. Doing backup and recovery isn’t so easy. One could say “just hire competent people” but if you do hire competent people do you want them running MySQL databases etc or have them just click on the “create mysql database” option on a cloud control panel and then move on to more important things?

The FreedomBox project is a great project for installing and managing home/personal services [3]. But it’s not about running things like database servers; it’s aimed at a higher level, running mail servers and other things for the user, not for the developer.

The Debian packaging of OpenStack looks interesting [4]; it’s a complete setup for running your own hyperscale cloud service. For medium and large organisations running OpenStack could be a good approach. But for small organisations it’s cheaper and easier to just use a cloud service to run things.

The issue of when to run things in-house and when to put them in the cloud is very complex. I think that if the organisation is going to spend less money on cloud services than on the salary of one sysadmin then it’s probably best to have things in the cloud. When cloud costs start to exceed the salary of one person who manages systems then having them spend the extra time and effort to run things locally starts making more sense. There is also an opportunity cost in having a good sysadmin work on the backups for all the different systems instead of letting the cloud provider just do it. Another possibility of course is to run things in-house on low end hardware and just deal with the occasional downtime to save money. Knowingly choosing less reliability to save money can be quite reasonable as long as you have considered the options and all the responsible people are involved in the discussion.

The one situation that I strongly oppose is having hyperscale services set up by people who don’t understand them. Running a database server on a cloud service because you don’t want to spend the time managing it is a reasonable choice in many situations. Running a database server on a cloud service because you don’t understand how to set up a database server is never a good choice. While the cloud services are quite resilient there are still ways of breaking the overall system if you don’t understand it. Also, while it is quite possible for someone to know how to develop for databases (including avoiding SQL injection etc) but be unable to set up a database server, that’s probably not going to be common; if someone can’t set it up (a generally easy task) then they probably can’t do the harder task of making it secure.

30 May, 2025 07:32AM by etbe

Machine Learning Security

I just read an interesting blog post about ML security recommended by Bruce Schneier [1].

This approach of having 2 AI systems where one processes user input and the second performs actions on quarantined data is good and solves some real problems. But I think the bigger issue is the need to do this. Why not have a multi stage approach, instead of a single user input to do everything (the example given is “Can you send Bob the document he requested in our last meeting? Bob’s email and the document he asked for are in the meeting notes file”) you could have “get Bob’s email address from the meeting notes file” followed by “create a new email to that address” and “find the document” etc.

A major problem with many plans for ML systems is that they are based around automating relatively simple tasks. The example of sending an email based on meeting notes is a trivial task that’s done many times a day but for which expressing it verbally isn’t much faster than doing it the usual way. The usual way of doing such things (manually finding the email address from the meeting notes etc) can be accelerated without ML by having a “recent documents” access method that gets the notes, having the email address be a hot link to the email program (i.e. the word processor or note-taking program being able to call the MUA), having a “put all data objects of type X into the clipboard” operation (where X can be an email address, URL, filename, or whatever), and maybe optimising the MUA UI. The problems that people are talking about solving via ML, treating everything as text to be arbitrarily parsed, can in many cases be solved by having the programs dealing with the data know what they have and having support for calling system services accordingly.

The blog post suggests a problem of “user fatigue” from asking the user to confirm all actions. That is a real concern if the system is going to automate everything, such that the user gives a verbal description of the problem and then says “yes” many times to confirm it. But if the user is pushing the process at every step of the way (“take this email address”, “attach this file”) it won’t be a series of “yes” operations with a risk of saying “yes” once too often.

I think that one thing that should be investigated is better integration between services to allow working live on data. If in an online meeting someone says “I’ll work on task A please send me an email at the end of the meeting with all issues related to it” then you should be able to click on their email address in the meeting software to bring up the MUA to send a message and then just paste stuff in. The user could then not immediately send the message and clicking on the email address again would bring up the message in progress to allow adding to it (the behaviour of most MUAs of creating a new message for every click on a mailto:// URL is usually not what you desire). In this example you could of course use ALT-TAB or other methods to switch windows to the email, but imagine the situation of having 5 people in the meeting who are to be emailed about different things and that wouldn’t scale.

Another thing for the meeting example is that having a text chat for a video conference is a standard feature now and being able to directly message individuals is available in BBB and probably some other online meeting systems. It shouldn’t be hard to add a feature to BBB and similar programs to have each user receive an email at the end of the meeting with the contents of every DM chat they were involved in and have everyone in the meeting receive an emailed transcript of the public chat.

In conclusion I think that there are real issues with ML security and something like this technology is needed. But for most cases the best option is to just not have ML systems do such things. Also there is significant scope for improving the integration of various existing systems in a non-ML way.

30 May, 2025 06:04AM by etbe

Utkarsh Gupta

FOSS Activites in May 2025

Here’s my 68th monthly but brief update about the activities I’ve done in the F/L/OSS world.

Debian

This was my 77th month of actively contributing to Debian. I became a DM in late March 2019 and a DD on Christmas ‘19! \o/

This month I’ve just been sort of MIA, mostly because of a combination of the Canonical engineering sprints in Frankfurt, a bit of vacation in Italy, and then being sick. So I didn’t really get much done in Debian this month.


Ubuntu

This was my 53rd month of actively contributing to Ubuntu. I joined Canonical to work on Ubuntu full-time back in February 2021.

Whilst I can’t give a full, detailed list of things I did (there’s so much and some of it might not be public…yet!), here’s a quick TL;DR of what I did:


Debian (E)LTS

Debian Long Term Support (LTS) is a project to extend the lifetime of all Debian stable releases to (at least) 5 years. Debian LTS is not handled by the Debian security team, but by a separate group of volunteers and companies interested in making it a success.

And Debian Extended LTS (ELTS) is its sister project, extending support to the buster, stretch, and jessie releases (+2 years after LTS support).

This was my 68th month as a Debian LTS and 55th month as a Debian ELTS paid contributor.
Due to a combination of the Canonical engineering sprints in Frankfurt, a bit of vacation in Italy, and then being sick, I was barely able to do (E)LTS work. So this month, I worked for only 1.00 hours for LTS and 0 hours for ELTS.

I did the following things:

  • [LTS] Attended the hourly LTS meeting on IRC. Summary here.

Until next time.
:wq for today.

30 May, 2025 05:41AM

Reproducible Builds (diffoscope)

diffoscope 297 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 297. This version includes the following changes:

[ Will Hollywood ]
* Add a LZMA comparator and tests.

You find out more by visiting the project homepage.

30 May, 2025 12:00AM

May 29, 2025

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

#48: r2u Talk Re-Recorded

Welcome to post 48 in the R4 series, and to video 8 in this series.

Last week I had the honour of giving the opening talk at the 11eme Rencontres R at the Université de Mons in Belgium as an invited plenary talk. Big thanks again to Philippe Grosjean and Kathy Huet for the invitation, and for organising a lovely conference.

As it was the opening talk, we were still sorting out projector issues when I started, so I forgot to set a timer and consequently ran out of time like a newbie. It occurred to me that I could simply re-record the talk in front of my slides just as I do for my STAT 447 students. So I sat down this morning and did this, and the video is now online:

The slides are available as well.

For questions or other feedback, please consider using the r2u GitHub repo issues section.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

29 May, 2025 08:38PM

RcppDate 0.0.6: New Upstream

RcppDate wraps the featureful date library written by Howard Hinnant for use with R. This header-only modern C++ library has been in pretty wide-spread use for a while now, and adds to C++11/C++14/C++17 what will be (with minor modifications) the ‘date’ library in C++20. The RcppDate package adds no extra R or C++ code and can therefore be a zero-cost dependency for any other project; yet a number of other projects decided to re-vendor it resulting in less-efficient duplication. Oh well. C’est la vie.

This release syncs with upstream release 3.0.4 made yesterday which contains a few PRs (including one by us) for the clang++-20 changes, some of which we already had in release 0.0.5. We also made a routine update to the continuous integration.

Changes in version 0.0.6 (2025-05-29)

  • Updated to upstream version 3.0.4

Courtesy of my CRANberries, there is also a diffstat report for the most recent release. More information is available at the repository or the package page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

29 May, 2025 06:52PM

hackergotchi for Debian XMPP Team

Debian XMPP Team

XMPP/Jabber Debian 13 Trixie News

Debian 13 "Trixie" full freeze started on 2025-05-17, so this is a good time to take a look at some of the features that this release will bring. Here we will focus on packages related to XMPP, a.k.a. Jabber.

XMPP is a universal communication protocol for instant messaging, push notifications, IoT, WebRTC, and social applications. Originally called "Jabber", it has existed since 1999 and has a diverse and active developer community.

Clients

Servers

Libraries

  • libomemo-c 0.5.0 to 0.5.1
  • libstrophe, an XMPP library in C, has been upgraded from 0.12.2 to 0.14.0
    It now supports XEP-0138: Stream Compression and adds various modern SCRAM mechanisms.
  • omemo-dr, an OMEMO library used by Gajim is now in Debian, in version 1.0.1
  • python-nbxmpp, a non-blocking Jabber/XMPP Python 3 library, upgraded from 4.2.2 to 6.1.1
  • python-oldmemo, a python-omemo backend for OMEMO 1, 1.0.3 to 1.1.0
  • python-omemo, a Python 3 implementation of the OMEMO protocol, 1.0.2 to 1.2.0
  • python-twomemo, a python-omemo backend for OMEMO 2, 1.0.3 to 1.1.0
  • qxmpp 1.4.0 to 1.10.3
  • slixmpp-omemo new 1.2.2
  • slixmpp 1.8.3 to 1.10.0
  • strophejs, a library for writing XMPP clients has been upgraded from 1.2.14 to 3.1.0

Gateways/Transports

  • Biboumi, a gateway between XMPP and IRC, upgrades from 9.0 to 9.0+20241124.
  • Debian 13 Trixie includes Slidge 0.2.12 and Matridge 0.2.3 for the first time! It is a gateway between XMPP and Matrix, with support for many chat features.

Not in Trixie

  • Spectrum 2, a gateway from XMPP to various other messaging systems, did not make it into Debian 13, because it depends on Swift, which has release critical bugs and therefore cannot be part of a stable release.

29 May, 2025 12:00AM by Debian XMPP Team

Arthur Diniz

Bringing Kubernetes Back to Debian

I’ve been part of the Debian Project since 2019, when I attended DebConf held in Curitiba, Brazil. That event sparked my interest in the community, packaging, and how Debian works as a distribution.

In the early years of my involvement, I contributed to various teams such as the Python, Golang and Cloud teams, packaging dependencies and maintaining various tools. However, I soon felt the need to focus on packaging software I truly enjoyed, tools I was passionate about using and maintaining.

That’s when I turned my attention to Kubernetes within Debian.


A Broken Ecosystem

The Kubernetes packaging situation in Debian had been problematic for some time. Given its large codebase and complex dependency tree, the initial packaging approach involved vendorizing all dependencies. While this allowed a somewhat functional package to be published, it introduced several long-term issues, especially security concerns.

Vendorized packages bundle third-party dependencies directly into the source tarball. When vulnerabilities arise in those dependencies, it becomes difficult for Debian’s security team to patch and rebuild affected packages system-wide. This approach broke Debian’s best practices, and it eventually led to the abandonment of the Kubernetes source package, which had stalled at version 1.20.5.

Due to this abandonment, critical bugs emerged and the package was removed from Debian’s testing channel, as we can see in the package tracker.


New Debian Kubernetes Team

Around this time, I became a Debian Maintainer (DM), with permissions to upload certain packages. I saw an opportunity to both contribute more deeply to Debian and to fix Kubernetes packaging.

In early 2024, just before DebConf Busan in South Korea, I founded the Debian Kubernetes Team. The mission of the team was to repackage Kubernetes in a maintainable, security-conscious, and Debian-compliant way. At DebConf, I shared our progress with the broader community and received great feedback and more visibility, along with people interested in contributing to the team.

Our first task was to migrate existing Kubernetes-related tools such as kubectx, kubernetes-split-yaml and kubetail into a dedicated namespace on Salsa, Debian’s GitLab instance.

Many of these tools were stored across different teams (like the Go team), and consolidating them helped us organize development and focus our efforts.


De-vendorizing Kubernetes

Our main goal was to un-vendorize Kubernetes and bring it up-to-date with upstream releases.

This meant:

  • Removing the vendor directory and all embedded third-party code.
  • Trimming the build scope to focus solely on building kubectl, Kubernetes’ CLI.
  • Using Files-Excluded in debian/copyright to cleanly drop unneeded files during source imports.
  • Rebuilding the dependency tree, ensuring all Go modules were separately packaged in Debian.

We used uscan, a standard Debian packaging tool that fetches upstream tarballs and prepares them accordingly. The Files-Excluded directive in our debian/copyright file instructed uscan to automatically remove unnecessary files during the repackaging process:

$ uscan
Newest version of kubernetes on remote site is 1.32.3, specified download version is 1.32.3
Successfully repacked ../v1.32.3 as ../kubernetes_1.32.3+ds.orig.tar.gz, deleting 30616 files from it.
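
For reference, Files-Excluded lives in the header stanza of the machine-readable debian/copyright; a hypothetical minimal example (the patterns here are illustrative, not the ones actually used) looks like this:

$ grep -A1 Files-Excluded debian/copyright
Files-Excluded: vendor/*
                third_party/*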

The results were dramatic. By comparing the original upstream tarball with our repackaged version, we can see that our approach reduced the tarball size by over 75%:

$ du -h upstream-v1.32.3.tar.gz kubernetes_1.32.3+ds.orig.tar.gz
14M	upstream-v1.32.3.tar.gz
3.2M	kubernetes_1.32.3+ds.orig.tar.gz

This significant reduction wasn’t just about saving space. By removing over 30,000 files, we simplified the package, making it more maintainable. Each dependency could now be properly tracked, updated, and patched independently, resolving the security concerns that had plagued the previous packaging approach.


Dependency Graph

To give you an idea of the complexity involved in packaging Kubernetes for Debian, the image below is a dependency graph generated with debtree, visualizing all the Go modules and other dependencies required to build the kubectl binary.

kubectl-depgraph

This web of nodes and edges represents every module and its relationship during the compilation process of kubectl. Each box is a Debian package, and the lines connecting them show how deeply intertwined the ecosystem is. What might look like a mess of blue spaghetti is actually a clear demonstration of the vast and interconnected upstream world that tools like kubectl rely on.

But more importantly, this graph is a testament to the effort that went into making kubectl build entirely using Debian-packaged dependencies only, no vendoring, no downloading from the internet, no proprietary blobs.
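
The post does not show the exact invocation, but a graph like this can be generated with something along these lines (the --build-dep option and the output file name are my assumptions, not taken from the post):

debtree --build-dep kubernetes | dot -Tsvg > kubectl-depgraph.svg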


Upstream Version 1.32.3 and Beyond

After nearly two years of work, we successfully uploaded version 1.32.3+ds of kubectl to Debian unstable.

kubernetes/-/merge_requests/1

The new package also includes:

  • Zsh, Fish, and Bash completions installed automatically
  • Man pages and metadata for improved discoverability
  • Full integration with kind and docker for testing purposes

Integration Testing with Autopkgtest

To ensure the reliability of kubectl in real-world scenarios, we developed a new autopkgtest suite that runs integration tests using real Kubernetes clusters created via Kind.

Autopkgtest is a Debian tool used to run automated tests on binary packages. These tests are executed after the package is built but before it’s accepted into the Debian archive, helping catch regressions and integration issues early in the packaging pipeline.

Our test workflow validates kubectl by performing the following steps (a condensed sketch is shown after the list):

  • Installing Kind and Docker as test dependencies.
  • Spinning up two local Kubernetes clusters.
  • Switching between cluster contexts to ensure multi-cluster support.
  • Deploying and scaling a sample nginx application using kubectl.
  • Cleaning up the entire test environment to avoid side effects.

  • debian/tests/kubectl.sh
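
Here is a condensed shell sketch of that flow; the cluster names, the sample deployment, and the replica count are illustrative, and the authoritative version is debian/tests/kubectl.sh itself:

# spin up two disposable clusters (requires docker and kind)
kind create cluster --name one
kind create cluster --name two

# switch between contexts to exercise multi-cluster support
kubectl config use-context kind-one
kubectl config use-context kind-two

# deploy and scale a sample nginx application
kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=3
kubectl rollout status deployment/nginx

# clean up the entire test environment to avoid side effects
kind delete cluster --name one
kind delete cluster --name two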

Popcon: Measuring Adoption

To measure real-world usage, we rely on data from Debian’s popularity contest (popcon), which gives insight into how many users have each binary installed.

popcon-graph popcon-table

Here’s what the data tells us:

  • kubectl (new binary): Already installed on 2,124 systems.
  • golang-k8s-kubectl-dev: This is the Go development package (a library), useful for other packages and developers who want to interact with Kubernetes programmatically.
  • kubernetes-client: The legacy package that kubectl is replacing. We expect this number to decrease in future releases as more systems transition to the new package.

Although the popcon data shows activity for kubectl before the official Debian upload date, it’s important to note that those numbers represent users who had it installed from upstream source-lists, not from the Debian repositories. This distinction underscores a demand that existed even before the package was available in Debian proper, and it validates the importance of bringing it into the archive.

Also worth mentioning: this number is not the real total number of installations, since users can choose not to participate in the popularity contest. So the actual adoption is likely higher than what popcon reflects.


Community and Documentation

The team also maintains a dedicated wiki page which documents:

  • Maintained tools and packages
  • Contribution guidelines
  • Our roadmap for the upcoming Debian releases

https://debian-kubernetes.org


Looking Ahead to Debian 13 (Trixie)

The next stable release of Debian will ship with kubectl version 1.32.3, built from a clean, de-vendorized source. This version includes nearly all the latest upstream features, and will be the first time in years that Debian users can rely on an up-to-date, policy-compliant kubectl directly from the archive.

Compared with upstream, our Debian package even delivers more out of the box, including shell completions, which upstream still requires users to generate manually.

In 2025, the Debian Kubernetes team will continue expanding our packaging efforts for the Kubernetes ecosystem.

Our roadmap includes:

  • kubelet: The primary node agent that runs on each node. This will enable Debian users to create fully functional Kubernetes nodes without relying on external packages.

  • kubeadm: A tool for creating Kubernetes clusters. With kubeadm in Debian, users will then be able to bootstrap minimum viable clusters directly from the official repositories.

  • helm: The package manager for Kubernetes that helps manage applications through Kubernetes YAML files defined as charts.

  • kompose: A conversion tool that helps users familiar with docker-compose move to Kubernetes by translating Docker Compose files into Kubernetes resources.


Final Thoughts

This journey was only possible thanks to the amazing support of the debian-devel-br community and the collective effort of contributors who stepped up to package missing dependencies, fix bugs, and test new versions.

Special thanks to:

  • Carlos Henrique Melara (@charles)
  • Guilherme Puida (@puida)
  • João Pedro Nobrega (@jnpf)
  • Lucas Kanashiro (@kanashiro)
  • Matheus Polkorny (@polkorny)
  • Samuel Henrique (@samueloph)
  • Sergio Cipriano (@cipriano)
  • Sergio Durigan Junior (@sergiodj)

I look forward to continuing this work, bringing more Kubernetes tools into Debian and improving the developer experience for everyone.

29 May, 2025 12:00AM

May 28, 2025

hackergotchi for Clint Adams

Clint Adams

Potted meat is viewed differently by different cultures

I've been working on a multi-label email classification model. It's been a frustrating slog, fraught with challenges, including a lack of training data. Labeling emails is labor-intensive and error-prone. Also, I habitually delete certain classes of email immediately after their usefulness has been reduced. I use a CRM-114-based spam filtering system (actually I use two different instances of the same mailreaver config, but that's another story), which is differently frustrating, but I delete spam when it's detected or when it's trained. Fortunately, there's no shortage of incoming spam, so I can collect enough, but other, arguably more important, labels arrive infrequently. So, those labels need to be excluded, or the small sample sizes wreck the training feedback loop. Currently, I have ten active labels, and even though the point of this is not to be a spam filter, “spam” is one of the labels.

Out of curiosity, I decided to compare the performance of my three different models, and to do so on a neutral corpus (in other words, emails that none of them had ever been trained on). I grabbed the full TREC 2007 corpus and ran inference. The results were unexpected in many ways. For example, the Pearson correlation coefficient between my older CRM-114 model and my newer CRM-114 was only about 0.78.

I was even more surprised by how poorly all three performed. Were they overfit to my email? So, I decided to look at the TREC corpus for the first time, and lo and behold, the first spam-labeled email I checked was something I would definitely train all three models with as non-spam, but ham for CRM-114 and an entirely different label for my experimental model.

Posted on 2025-05-28
Tags:

28 May, 2025 06:32PM

hackergotchi for Jonathan Dowland

Jonathan Dowland

Linux Mount Namespaces

I've been refreshing myself on the low-level guts of Linux container technology. Here's some notes on mount namespaces.

In the below examples, I will use more than one root shell simultaneously. To disambiguate them, the examples will feature a numbered shell prompt: 1# for the first shell, and 2# for the second.

Preliminaries

Namespaces are normally associated with processes and are removed when the last associated process terminates. To make them persistent, you have to bind-mount the corresponding virtual file from an associated process's entry in /proc to another path[1].

The receiving path needs to have its "propagation" property set to "private". Most likely your system's existing mounts are mostly "public". You can check the propagation setting for mounts with

1# findmnt -o+PROPAGATION

We'll create a new directory to hold mount namespaces we create, and set its Propagation to private, via a bind-mount of itself to itself.

1# mkdir /root/mntns
1# mount --bind --make-private /root/mntns /root/mntns

The namespace itself needs to be bind-mounted over a file rather than a directory, so we'll create one.

1# touch /root/mntns/1

Creating and persisting a new mount namespace

1# unshare --mount=/root/mntns/1

We are now 'inside' the new namespace in a new shell process. We'll change the shell prompt to make this clearer

PS1='inside# '

We can make a filesystem change, such as mounting a tmpfs

inside# mount -t tmpfs /mnt /mnt
inside# touch /mnt/hi-there

And observe it is not visible outside that namespace

2# findmnt /mnt
2# stat /mnt/hi-there
stat: cannot statx '/mnt/hi-there': No such file or directory

Back to the namespace shell, we can find an integer identifier for the namespace via the shell process's /proc entry:

inside# readlink /proc/$$/ns/mnt

It will be something like mnt:[4026533646]. From another shell, we can list namespaces and see that it exists:

2# lsns -t mnt
        NS TYPE NPROCS   PID USER             COMMAND
…
4026533646 mnt       1 52525 root             -bash

If we exit the shell that unshare created,

inside# exit

running lsns again should[2] still list the namespace, albeit with the NPROCS column now reading 0.

2# lsns -t mnt

We can see that a virtual filesystem of type nsfs is mounted at the path we selected when we ran unshare:

2# grep /root/mntns/1 /proc/mounts 
nsfs /root/mntns/1 nsfs rw 0 0

Entering the namespace from another process

This is relatively easy:

1# nsenter --mount=/root/mntns/1
1# stat /mnt/hi-there
  File: /mnt/hi-there
…

More to come in future blog posts!

References

These were particularly useful in figuring this out:


  1. This feels really weird to me. At least at first. I suppose it fits with the "everything is a file" philosophy.
  2. I've found lsns in util-linux 2.38.1 (from 2022-08-04) doesn't list mount namespaces with no associated processes; but 2.41 (from 2025-03-18) does. The fix landed in 2022-11-08. For extra fun, I notice that a namespace can be held persistent with a file descriptor which is unlinked from the filesystem; a quick sketch of that follows below.
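
As a quick, untested sketch of that file-descriptor trick (the fd number is arbitrary): opening the nsfs file keeps the namespace alive for as long as the descriptor stays open, and the namespace can still be reached through the descriptor.

2# exec 7< /root/mntns/1
2# nsenter --mount=/proc/self/fd/7 stat /mnt/hi-there
2# exec 7<&-

Closing the descriptor (the last line) drops our reference again.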

28 May, 2025 05:53PM

hackergotchi for Yves-Alexis Perez

Yves-Alexis Perez

Running autopkgtests locally

As a small addendum to the last post, here are the relevant commands #debci helpfully provided.

First, you need to install the autopkgtest package, obviously:

# apt install autopkgtest

Then you need to create a Debian virtual machine to run the tests (put the sid.raw wherever you prefer):

# autopkgtest-build-qemu sid /tmp/sid.raw

Then you can run the tests themselves, using the just created virtual machine. The autopkgtest command can use the tests from various sources, using the last argument to the command. In my case what was the most helpful was to run the tests from my git clone (which uses gbp) so I could edit the tests directly. So I didn't give anything for testsrc (but . would work as well I guess).

$ autopkgtest -BU --add-apt-release=unstable --pin-packages=unstable=strongswan -- qemu /tmp/sid.raw --ram-size=4096 --cpus=1

Then I could play with the tests themselves, the number of CPUs for the Qemu VM, and run everything in a loop.
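
For the "run everything in a loop" part, a trivial sketch that just re-runs the exact command above until it fails (handy when chasing a flaky test):

$ while autopkgtest -BU --add-apt-release=unstable --pin-packages=unstable=strongswan \
    -- qemu /tmp/sid.raw --ram-size=4096 --cpus=1; do :; done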

28 May, 2025 01:12PM by Yves-Alexis (corsac@debian.org)

hackergotchi for Bits from Debian

Bits from Debian

Debian welcomes the 2025 GSOC contributors/students

GSoC logo

We are very excited to announce that Debian has selected nine contributors to work under mentorship on a variety of projects with us during the Google Summer of Code.

Here is a list of the projects and students, along with details of the tasks to be performed.


Project: Quality assurance and continuous integration for biological and medical applications inside Debian

  • Intern: Harish Chavre

Deliverables of the project: Continuous integration tests for Debian Med applications lacking a test, Quality Assurance review, and bug fixing if issues are uncovered.


Project: Device-specific Tweaks Management

  • Intern: Mohammed ElDegwi

Deliverables of the project: Analysis and discussion of the current state of device tweaks management in Debian and Mobian. Proposal for a unified, run-time approach. Packaging of this service and tweaks data/configuration for at least one device.


Project: Enhancing Debian packages with ROCm GPU acceleration

  • Interns: Spaarsh, utk4r-sh

Deliverables of the project: New Debian packages with GPU support. Enhanced GPU support within existing Debian packages. More autopkgtests running on the Debian ROCm CI.


Project: Make Debian for Raspberry Pi Build Again

  • Interns: Kurva Prashanth

Deliverables of the project: Refreshing the set of daily-built images. Having the set of daily-built images become automatic again—that is, go back to the promise of having it daily-built. Write an Ansible playbook/Chef recipe/Puppet whatsitsname to define a virtual server and have it build daily. Do the (very basic!) hardware testing on several Raspberry computers. Do note, naturally, this will require having access to the relevant hardware.


Project: Package LLM Inference Libraries

  • Intern: k1000dai

Deliverables of the project: Eventually I hope we can make vLLM into the Debian archive, based on which we can deliver something for LLM inference out-of-the-box. If the amount of work eventually turns out to be beyond my expectation, I'm still happy to see how far we can go towards this goal. If the amount of work required for vLLM is less than I expected, we can also look at something else like SGLang, another open source LLM inference library.


Project: Autopkgtests for the rsync package

  • Intern: puer robustus

Deliverables of the project: Autopkgtests for the rsync package.


Project: Enhancing Salsa CI in Debian

  • Interns: Aayush (@44yu5h), aquilamacedo

Deliverables of the project: More features, robustness, speed.


Congratulations and welcome to all the contributors!

The Google Summer of Code program is possible in Debian thanks to the efforts of Debian Developers and Debian Contributors that dedicate part of their free time to mentor contributors and outreach tasks.

Join us and help extend Debian! You can follow the contributors' weekly reports on the debian-outreach mailing-list, chat with us on our IRC channel or reach out to the individual projects' team mailing lists.

28 May, 2025 10:04AM by Abhijith PA

Debian Day 2025 - call for celebration

Each year on August the 16th, we celebrate the Debian Project Anniversary.

Several communities around the world join us in celebrating "Debian Day" with local events, parties, or gatherings.

So, how about celebrating the 32nd anniversary of the Debian Project in 2025 in your city? As the 16th of August falls on a Saturday this year, we believe it is great timing to gather people around your event.

We invite you and your local community to organize a Debian Day by hosting an event with talks, workshops, a bug squashing party, or OpenPGP keysigning gathering, etc.

You could also hold a meeting with others in the Debian community in a smaller social setting like a bar/pizzeria/cafeteria/restaurant to celebrate. In other words, any type of celebrating is valid!

Remember to add your city to the Debian Day wiki page

There is a list of Debian Local Groups around the world. If your city is listed, talk to them to organize DebianDay together.

To inspire you and your local community, see some photos from 2023 and 2024

Let's use hashtags #DebianDay #DebianDay2025 on social media.

debianday-logo

28 May, 2025 07:30AM by The Debian Publicity Team

May 27, 2025

Ravi Dwivedi

Singapore Visa Process

In November 2024, Badri and I applied for a Singapore visa to visit the country. To apply for a Singapore visa, you need to visit an authorized travel agent listed by the Singapore High Commission on their website. Unlike the Schengen visa (where only VFS can process applications), the Singapore visa has many authorized travel agents to choose from. I remember that the list mentioned as many as 25 authorized agents in Chennai. For my application, I randomly selected Ria International in Karol Bagh, New Delhi from the list.

Further, you need to apply not more than a month before your travel dates. As our travel dates were in December, we applied in the month of November.

For your reference, I submitted the following documents:

  • Passport
  • My photograph (35 mm x 45 mm)
  • Visa application form (Form 14A)
  • Cover letter to the Singapore High Commission, New Delhi
  • Proof of employment
  • Hotel booking
  • Flight ticket (reservations are sufficient)
  • Bank account statement for the last 6 months

I didn’t have my photograph in the specified dimensions, so the travel agent took my photo on the spot. The visa application fee was ₹2,567. Furthermore, I submitted my application on a Saturday and received a call from the travel agent on Tuesday informing me that they had received my visa from the Singapore High Commission.

The next day, I visited the travel agent’s office and picked up my passport and a black and white copy of my e-visa. Later, I downloaded a PDF of my visa from the website mentioned on it, and took a colored printout myself.

Singapore granted me a multiple-entry visa for 2 months, even though I had applied for a 4-day single-entry visa. We were planning to add more countries to this trip; therefore, a multiple-entry visa would be helpful in case we wanted to use Singapore Airport, as it has good connectivity. However, it turned out that flights from Kuala Lumpur were much cheaper than those from Singapore, so we didn’t enter Singapore again after leaving.

Badri also did the same process but entirely remotely—he posted the documents to the visa agency in Chennai, and got his e-visa in a few days followed by his original passport which was delivered by courier.

He got his photo taken in the same dimensions mentioned above, and printed as matte finish as instructed. However, the visa agents asked why his photo was looking so faded. We don’t know if they thought the matte finish was faded or what. To rectify this, Badri emailed them a digital copy of the photo (both the cropped version and the original) and they handled the reprinting on their end (which he never got to see).

Before entering Singapore, we had to fill an arrival card - an online form asking a few details about our trip - within 72 hours of our arrival in Singapore.

That’s it for now. Meet you in the next post.

Thanks to Badri for reviewing the draft.

27 May, 2025 02:50PM

Russell Coker

Leaf ZE1

I’ve just got a second hand Nissan LEAF. It’s not nearly as luxurious as the Genesis EV that I test drove [1]. It’s also just over 5 years old so it’s not as slick as the MG4 I test drove [2]. But the going rate for a LEAF of that age is $17,000 vs $35,000 or more for a new MG4 or $130,000+ for a Genesis. At this time the LEAF is the only EV in Australia that’s available on the second hand market in quantity. Apparently the cheapest new EV in Australia is a Great Wall one which is $32,000 and which had a wait list last time I checked, so $17,000 is a decent price if you want an electric car and aren’t interested in paying the price of a new car.

Starting the Car

One thing I don’t like about most recent cars (petrol as well as electric) is that they needlessly break traditions of car design. Inserting a key and turning it clockwise to start a car is a long standing tradition that shouldn’t be broken without a good reason. With the use of traditional keys you know that when a car has the key removed it can’t be operated, there’s no situation of the person with the key walking away and leaving the car driveable and there’s no possibility of the owner driving somewhere without the key and then being unable to start it. To start a LEAF you have to have the key fob device in range, hold down the brake pedal, and then press the power button. To turn on accessories you do the same but without holding down the brake pedal. They also have patterns of pushes, push twice to turn it on, push three times to turn it off. This is all a lot easier with a key where you can just rotate it as many clicks as needed.

The change of car design for the key means that no physical contact is needed to unlock the car. If someone stands by a car fiddling with the door lock it will get noticed which deters certain types of crime. If a potential thief can sit in a nearby car to try attack methods and only walk to the target vehicle once it’s unlocked it makes the crime a lot easier. Even if the electronic key is as secure as a physical key allowing attempts to unlock remotely weakens security. Reports on forums suggest that the electronic key is vulnerable to replay attacks. I guess I just have to hope that as car thieves typically get less than 10% of the value of a car it’s just not worth their effort to steal a $17,000 car. Unlocking doors remotely is a common feature that’s been around for a while but starting a car without a key being physically inserted is a new thing.

Other Features

The headlights turn on automatically when the car thinks that the level of ambient light warrants it. There is an option to override this to turn on lights but no option to force the lights to be off. So if you have your car in the “on” state while parked the headlights will be on even if you are parked and listening to the radio.

The LEAF has a bunch of luxury features which seem a bit ridiculous like seat warmers. It also has a heated steering wheel which has turned out to be a good option for me as I have problems with my hands getting cold. According to the My Nissan LEAF Forum the seat warmer uses a maximum of 50W per seat while the car heater uses a minimum of 250W [3]. So if there are one or two people in the car then significantly less power is used by just heating the seats and also keeping the car air cool reduces window fog.

The Bluetooth audio support works well. I’ve done hands free calls and used it for playing music from my phone. This is the first car I’ve owned with Bluetooth support. It also has line-in which might have had some use in 2019 but is becoming increasingly useless as phones with Bluetooth become more popular. It has support for two devices connecting via Bluetooth at the same time which could be handy if you wanted to watch movies on a laptop or tablet while waiting for someone.

The LEAF has some of the newer safety features: it tracks lane markers and notifies the driver via beeps and vibration if they stray from their lane. It also tries to read speed limit signs and display the last observed speed limit on the dash display. It also has a skid alert which in my experience goes off under hard acceleration when it’s not skidding but doesn’t go off if you lose grip when cornering. The features for detecting changing lanes when close to other cars and for emergency braking when another car is partly in the lane (even if moving out of the lane) don’t seem well tuned for Australian driving; the common trend on Australian roads is lawful-evil, to use D&D terminology.

Range

My most recent trip was just over 2 hours of driving covering a bit over 100km, which took the battery from 62% to 14%. So it looks like I can drive a bit over 200km at an average speed of 50km/h. I have been unable to find out the battery size for my car; my model will have either a 40kWh or 62kWh battery. Google results say it should be printed on the B pillar (it’s not) and that it can be deduced from the VIN (it can’t). I’m guessing that my car is the cheaper option, which is supposed to do 240km when new, which means that a bit over 200km at an average speed of 50km/h when six years old is about what’s expected. If it has the larger battery designed to do 340km then doing 200km in real use would be rather disappointing.

Assuming the battery is 40kWh, that means it’s doing 5km/kWh, or a 10kW average draw for the duration. That means that the 250W or so used by the car heater should only make about a 2% difference to range, which is something that a human won’t usually notice. If I was to drive to another state I’d definitely avoid using the heater or airconditioner, as an extra 4km could really matter when trying to find a place to charge when you aren’t familiar with the area. It’s also widely reported that the LEAF is less efficient at highway speeds, which is an extra difficulty for that.

It seems that the LEAF just isn’t designed for interstate driving in Australia; it would be fine for driving between provinces of the Netherlands, as it’s difficult to drive for 200km without leaving that country. Driving 700km to another city in a car with 200km range would mean charging 3 times along the way, which is 2 hours of charging time when using fast chargers. This isn’t a problem at all as the average household in Australia has 1.8 cars and battery electric vehicles only comprise 6.3% of the market. So if a household had a LEAF and a Prius they could just use the Prius for interstate driving. A recent Prius could drive from Melbourne to Canberra or Adelaide without refuelling on the way.

If I was driving to another state a couple of times a year I could rent an old fashioned car to do that and still be saving money when compared to buying petrol all the time.

Running Cost

Currently I’m paying about $0.28 per kWh for electricity, and it’s reported that the efficiency of charging a LEAF is as low as 83%, with the best efficiency when fast charging. I don’t own the fast charge hardware and don’t plan to install it as that would require getting a replacement of the connection to my home from the street, a new switchboard, and other expenses. So I expect I’ll be getting 83% efficiency when charging, which means about 48kWh from the grid for 200km, or about 96kWh for the equivalent of a $110 tank of petrol (roughly twice that distance). At $0.28/kWh that’s about $27 for the same amount of driving as $110 of petrol. I also anticipate saving money on service as there’s no need for engine oil changes and all the other maintenance of a petrol engine, and regenerative braking will reduce the incidence of brake pad replacement.

I expect to save over $1100 per annum by using electricity instead of petrol even if I pay the full rate. But if I charge my car in the middle of the day when there is oversupply and I don’t get paid for feeding electricity from my solar panels into the grid (as is common nowadays), it could be almost free to charge the car and I could save about $1500 on fuel.

Comfort

Electric cars are much quieter than cars with petrol or Diesel engines, which is a major luxury feature. This car is also significantly newer than any other car I’ve driven much, so it has features like Bluetooth audio which weren’t in other cars I’ve driven. When doing 100km/h I can hear a lot of noise from the airflow; part of that would be due to the LEAF not having the extreme streamlining features that are associated with Teslas (such as retracting door handles), and part of that would be due to the car being older and the door seals not being as good as they were when new. It’s still a very quiet car with a very smooth ride. It would be nice if they used the quality of seals and soundproofing that VW uses in the Passat, but I guess the car would be heavier and have a shorter range if they did that.

This car has less space for the driver than any other car I’ve driven (with the possible exception of a 1989 Ford Laser AKA Mazda 323). The front seats have less space than the Prius. Also the batteries seem to be under the front seats, so there’s a bulge in the floor extending slightly in front of the front seats when they are moved back, which gives less space for the front passenger to move their legs and less space for the driver when sitting in a parked car. There is a selection of electric cars from MG, BYD, and Great Wall that have more space in the front seats; if those cars were on the second hand market I might have made a different choice, but a second hand LEAF is the only option for a cheap electric car in Australia now.

The heated steering wheel and heated seats took a bit of getting used to but I have come to appreciate the steering wheel and the heated seats are a good way of extending the range of the car.

Misc Notes

The LEAF is a fun car to drive, and being quiet is a luxury feature; it’s no different to other EVs in this regard. It isn’t nearly as fast as a Tesla, but it is faster than most cars are actually driven on the road.

When I was looking into buying a LEAF from one of the car sales sites I was looking at models less than 5 years old. But the ZE1 series went from 2017 to 2023, so there’s probably not much difference between a 2019 model and a 2021 model, but there is a significant price difference. I didn’t deliberately choose a 2019 car, it was what a relative was selling at a time when I needed a new car. But knowing what I know now I’d probably look at that age of LEAF if choosing from the car sales sites.

Problems

When I turn the car off the side mirrors fold in but when I turn it on they usually don’t automatically unfold if I have anything connected to the cigarette lighter power port. This is a well known problem and documented on forums. This is something that Nissan really should have tested before release because phone chargers that connect to the car cigarette lighter port have been common for at least 6 years before my car was manufactured and at least 4 years before the ZE1 model was released.

The built in USB port doesn’t supply enough power to match the power use of a Galaxy Note 9 running Google Maps and playing music through Bluetooth. On its own this isn’t a big deal, but combined with the mirror issue caused by using a charger in the cigarette lighter port it’s a problem.

The cover over the charging ports doesn’t seem to latch securely enough; I had it come open when doing 100km/h on a freeway. This wasn’t a big deal, but as the cover opens in a suicide-door manner, at a higher speed it could have broken off.

The word is that LEAF service in Australia is not done well. Why do you need regular service of an electric car anyway? For petrol and Diesel cars it’s engine oil replacement that makes it necessary to have regular service. Surely you can just drive it until either the brakes squeak or the tires seem worn.

I have been having problems charging: sometimes it will charge from ~20% to 100% in under 24 hours, but sometimes after 14+ hours it has only got to 30%.

Conclusion

This is a good car and the going price on them is low. I generally recommend them as long as you aren’t really big and aren’t too worried about the poor security.

It’s a fun car to drive even with a few annoying things like the mirrors not automatically extending on start.

The older ones like this are cheap enough that they should be able to cover the entire purchase cost in 10 years by the savings from not buying petrol, even if you don’t drive a lot. With a petrol car I use about 13 tanks of petrol a year, so my driving is about half the average for Australia. Some people could cover the purchase price of a second hand LEAF in under 5 years.

27 May, 2025 10:24AM by etbe

Russ Allbery

INN 2.7.3

This is a bug fix and minor feature release over INN 2.7.2, and the upgrade should be painless. You can download the new release from ISC or my personal INN pages. The latter also has links to the full changelog and the other INN documentation.

For the full list of changes, see the INN 2.7.3 NEWS file.

As always, thanks to Julien ÉLIE for preparing this release and doing most of the maintenance work on INN!

27 May, 2025 03:24AM

May 26, 2025

hackergotchi for Otto Kekäläinen

Otto Kekäläinen

Creating Debian packages from upstream Git

In this post, I demonstrate the optimal workflow for creating new Debian packages in 2025, preserving the upstream git history. The motivation for this is to lower the barrier for sharing improvements to and from upstream, and to improve software provenance and supply-chain security by making it easy to inspect every change at any level using standard git tooling.

Key elements of this workflow include:

  • Using a Git fork/clone of the upstream repository as the starting point for creating Debian packaging repositories.
  • Consistent use of the same git-buildpackage commands, with all package-specific options in gbp.conf.
  • DEP-14 tag and branch names for an optimal Git packaging repository structure.
  • Pristine-tar and upstream signatures for supply-chain security.
  • Use of Files-Excluded in the debian/copyright file to filter out unwanted files in Debian.
  • Patch queues to easily rebase and cherry-pick changes across Debian and upstream branches.
  • Efficient use of Salsa, Debian’s GitLab instance, for both automated feedback from CI systems and human feedback from peer reviews.

To make the instructions so concrete that anyone can repeat all the steps themselves on a real package, I demonstrate the steps by packaging the command-line tool Entr. It is written in C, has very few dependencies, and its final Debian source package structure is simple, yet exemplifies all the important parts that go into a complete Debian package:

  1. Creating a new packaging repository and publishing it under your personal namespace on salsa.debian.org.
  2. Using dh_make to create the initial Debian packaging.
  3. Posting the first draft of the Debian packaging as a Merge Request (MR) and using Salsa CI to verify Debian packaging quality.
  4. Running local builds efficiently and iterating on the packaging process.

Create new Debian packaging repository from the existing upstream project git repository

First, create a new empty directory, then clone the upstream Git repository inside it:

shell
mkdir debian-entr
cd debian-entr
git clone --origin upstreamvcs --branch master \
 --single-branch https://github.com/eradman/entr.git

Using a clean directory makes it easier to inspect the build artifacts of a Debian package, which will be output in the parent directory of the Debian source directory.

The extra parameters given to git clone lay the foundation for the Debian packaging git repository structure where the upstream git remote name is upstreamvcs. Only the upstream main branch is tracked to avoid cluttering git history with upstream development branches that are irrelevant for packaging in Debian.

Next, enter the git repository directory and list the git tags. Pick the latest upstream release tag as the commit to start the branch upstream/latest. This latest refers to the upstream release, not the upstream development branch. Immediately after, branch off the debian/latest branch, which will have the actual Debian packaging files in the debian/ subdirectory.

shell
cd entr
git tag # shows the latest upstream release tag was '5.6'
git checkout -b upstream/latest 5.6
git checkout -b debian/latest

mermaid
%%{init: { 'gitGraph': { 'mainBranchName': 'master' } } }%%
gitGraph:
checkout master
commit id: "Upstream 5.6 release" tag: "5.6"
branch upstream/latest
checkout upstream/latest
commit id: "New upstream version 5.6" tag: "upstream/5.6"
branch debian/latest
checkout debian/latest
commit id: "Initial Debian packaging"
commit id: "Additional change 1"
commit id: "Additional change 2"
commit id: "Additional change 3"

At this point, the repository is structured according to DEP-14 conventions, ensuring a clear separation between upstream and Debian packaging changes, but there are no Debian changes yet. Next, add the Salsa repository as a new remote called origin, the same as the default remote name in git.

shell
git remote add origin git@salsa.debian.org:otto/entr-demo.git
git push --set-upstream origin debian/latest

This is an important preparation step to later be able to create a Merge Request on Salsa that targets the debian/latest branch, which does not yet have any debian/ directory.

Launch a Debian Sid (unstable) container to run builds in

To ensure that all packaging tools are of the latest versions, run everything inside a fresh Sid container. This has two benefits: you are guaranteed to have the most up-to-date toolchain, and your host system stays clean without getting polluted by various extra packages. Additionally, this approach works even if your host system is not Debian/Ubuntu.

shell
cd ..
podman run --interactive --tty --rm --shm-size=1G --cap-add SYS_PTRACE \
 --env='DEB*' --volume=$PWD:/tmp/test --workdir=/tmp/test debian:sid bash

Note that the container should be started from the parent directory of the git repository, not inside it. The --volume parameter will bind-mount the current directory inside the container. Thus all files created and modified are on the host system, and will persist after the container shuts down.

Once inside the container, install the basic dependencies:

shell
apt update -q && apt install -q --yes git-buildpackage dpkg-dev dh-make

Automate creating the debian/ files with dh-make

To create the files needed for the actual Debian packaging, use dh_make:

shell
# dh_make --packagename entr_5.6 --single --createorig
Maintainer Name : Otto Kekäläinen
Email-Address : otto@debian.org
Date : Sat, 15 Feb 2025 01:17:51 +0000
Package Name : entr
Version : 5.6
License : blank
Package Type : single
Are the details correct? [Y/n/q]

Done. Please edit the files in the debian/ subdirectory now.

Due to how dh_make works, the package name and version need to be written as a single underscore-separated string. In this case, you should choose --single to specify that the package type is a single binary package. Other options would be --library for library packages (see libgda5 sources as an example) or --indep (see dns-root-data sources as an example). The --createorig option will create a mock upstream release tarball (entr_5.6.orig.tar.xz) from the current release directory. This is necessary for historical reasons and due to how dh_make worked before git repositories became common, when Debian source packages were based off upstream release tarballs (e.g. *.tar.gz).

At this stage, a debian/ directory has been created with template files, and you can start modifying the files and iterating towards actual working packaging.

shell
git add debian/
git commit -a -m "Initial Debian packaging"

Review the files

The full list of files after the above steps with dh_make would be:

|-- entr
| |-- LICENSE
| |-- Makefile.bsd
| |-- Makefile.linux
| |-- Makefile.linux-compat
| |-- Makefile.macos
| |-- NEWS
| |-- README.md
| |-- configure
| |-- data.h
| |-- debian
| | |-- README.Debian
| | |-- README.source
| | |-- changelog
| | |-- control
| | |-- copyright
| | |-- gbp.conf
| | |-- entr-docs.docs
| | |-- entr.cron.d.ex
| | |-- entr.doc-base.ex
| | |-- manpage.1.ex
| | |-- manpage.md.ex
| | |-- manpage.sgml.ex
| | |-- manpage.xml.ex
| | |-- postinst.ex
| | |-- postrm.ex
| | |-- preinst.ex
| | |-- prerm.ex
| | |-- rules
| | |-- salsa-ci.yml.ex
| | |-- source
| | | `-- format
| | |-- upstream
| | | `-- metadata.ex
| | `-- watch.ex
| |-- entr.1
| |-- entr.c
| |-- missing
| | |-- compat.h
| | |-- kqueue_inotify.c
| | |-- strlcpy.c
| | `-- sys
| | `-- event.h
| |-- status.c
| |-- status.h
| `-- system_test.sh
`-- entr_5.6.orig.tar.xz

You can browse these files in the demo repository.

The mandatory files in the debian/ directory are:

  • changelog,
  • control,
  • copyright,
  • and rules.

All the other files have been created for convenience so the packager has template files to work from. The files with the suffix .ex are example files that won’t have any effect until their content is adjusted and the suffix removed.
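
For illustration, taking one template into use and discarding the rest could look like this sketch (which templates you keep depends entirely on the package):

shell
mv debian/watch.ex debian/watch   # take the watch template into use, then edit it
rm debian/*.ex                    # drop the remaining unused template files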

For detailed explanations of the purpose of each file in the debian/ subdirectory, see the following resources:

  • The Debian Policy Manual: Describes the structure of the operating system, the package archive and requirements for packages to be included in the Debian archive.
  • The Developer’s Reference: A collection of best practices and process descriptions Debian packagers are expected to follow while interacting with one another.
  • Debhelper man pages: Detailed information of how the Debian package build system works, and how the contents of the various files in ‘debian/’ affect the end result.

As Entr, the package used in this example, is a real package that already exists in the Debian archive, you may want to browse the actual Debian packaging source at https://salsa.debian.org/debian/entr/-/tree/debian/latest/debian for reference.

Most of these files have standardized formatting conventions to make collaboration easier. To automatically format the files following the most popular conventions, simply run wrap-and-sort -vast or debputy reformat --style=black.

Identify build dependencies

The most common reason for builds to fail is missing dependencies. The easiest way to identify which Debian package ships the required dependency is using apt-file. If, for example, a build fails complaining that pcre2posix.h cannot be found or that libpcre2-posix.so is missing, you can use these commands:

shell
$ apt install -q --yes apt-file && apt-file update
$ apt-file search pcre2posix.h
libpcre2-dev: /usr/include/pcre2posix.h
$ apt-file search libpcre2-posix.so
libpcre2-dev: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so
libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3
libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3.0.6

The output above implies that the debian/control file should be extended to define a Build-Depends: libpcre2-dev relationship.
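
As a sketch of how that looks in the source stanza of debian/control (the pcre2 dependency here is only the hypothetical example from above, and the debhelper-compat level is an assumption, not taken from the real entr packaging):

debian
Source: entr
Maintainer: Otto Kekäläinen <otto@debian.org>
Build-Depends: debhelper-compat (= 13), libpcre2-dev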

There is also dpkg-depcheck that uses strace to trace the files the build process tries to access, and lists what Debian packages those files belong to. Example usage:

shell
dpkg-depcheck -b debian/rules build

Build the Debian sources to generate the .deb package

After the first pass of refining the contents of the files in debian/, test the build by running dpkg-buildpackage inside the container:

shell
dpkg-buildpackage -uc -us -b

The options -uc -us will skip signing the resulting Debian source package and other build artifacts. The -b option will skip creating a source package and only build the (binary) *.deb packages.

The output is very verbose and gives a large amount of context about what is happening during the build to make debugging build failures easier. In the build log of entr you will see for example the line dh binary --buildsystem=makefile. This and other dh commands can also be run manually if there is a need to quickly repeat only a part of the build while debugging build failures.
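
For example, to repeat just that step manually while debugging (a minimal sketch; the exact dh options come from your debian/rules):

shell
# re-run only the binary target of the build, as debian/rules would invoke it
dh binary --buildsystem=makefile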

To see what files were generated or modified by the build simply run git status --ignored:

shell
$ git status --ignored
On branch debian/latest

Untracked files:
 (use "git add <file>..." to include in what will be committed)
 debian/debhelper-build-stamp
 debian/entr.debhelper.log
 debian/entr.substvars
 debian/files

Ignored files:
 (use "git add -f <file>..." to include in what will be committed)
 Makefile
 compat.c
 compat.o
 debian/.debhelper/
 debian/entr/
 entr
 entr.o
 status.o

Re-running dpkg-buildpackage will include running the command dh clean, which, assuming it is configured correctly in the debian/rules file, will reset the source directory to the original pristine state. The same can of course also be done with the regular git commands git reset --hard; git clean -fdx. To avoid accidentally committing unnecessary build artifacts in git, a debian/.gitignore can be useful, and it would typically include all four files listed as “untracked” above.
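
As a sketch, such a file could be created like this, listing exactly the four files mentioned above (the entr-specific names obviously vary per package):

shell
cat > debian/.gitignore << 'EOF'
debhelper-build-stamp
entr.debhelper.log
entr.substvars
files
EOF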

After a successful build you would have the following files:

shell
|-- entr
| |-- LICENSE
| |-- Makefile -> Makefile.linux
| |-- Makefile.bsd
| |-- Makefile.linux
| |-- Makefile.linux-compat
| |-- Makefile.macos
| |-- NEWS
| |-- README.md
| |-- compat.c
| |-- compat.o
| |-- configure
| |-- data.h
| |-- debian
| | |-- README.source.md
| | |-- changelog
| | |-- control
| | |-- copyright
| | |-- debhelper-build-stamp
| | |-- docs
| | |-- entr
| | | |-- DEBIAN
| | | | |-- control
| | | | `-- md5sums
| | | `-- usr
| | | |-- bin
| | | | `-- entr
| | | `-- share
| | | |-- doc
| | | | `-- entr
| | | | |-- NEWS.gz
| | | | |-- README.md
| | | | |-- changelog.Debian.gz
| | | | `-- copyright
| | | `-- man
| | | `-- man1
| | | `-- entr.1.gz
| | |-- entr.debhelper.log
| | |-- entr.substvars
| | |-- files
| | |-- gbp.conf
| | |-- patches
| | | |-- PR149-expand-aliases-in-system-test-script.patch
| | | |-- series
| | | |-- system-test-skip-no-tty.patch
| | | `-- system-test-with-system-binary.patch
| | |-- rules
| | |-- salsa-ci.yml
| | |-- source
| | | `-- format
| | |-- tests
| | | `-- control
| | |-- upstream
| | | |-- metadata
| | | `-- signing-key.asc
| | `-- watch
| |-- entr
| |-- entr.1
| |-- entr.c
| |-- entr.o
| |-- missing
| | |-- compat.h
| | |-- kqueue_inotify.c
| | |-- strlcpy.c
| | `-- sys
| | `-- event.h
| |-- status.c
| |-- status.h
| |-- status.o
| `-- system_test.sh
|-- entr-dbgsym_5.6-1_amd64.deb
|-- entr_5.6-1.debian.tar.xz
|-- entr_5.6-1.dsc
|-- entr_5.6-1_amd64.buildinfo
|-- entr_5.6-1_amd64.changes
|-- entr_5.6-1_amd64.deb
`-- entr_5.6.orig.tar.xz

The contents of debian/entr are essentially what goes into the resulting entr_5.6-1_amd64.deb package. Familiarizing yourself with the majority of the files in the original upstream source as well as all the resulting build artifacts is time consuming, but it is a necessary investment to get high-quality Debian packages.

There are also tools such as Debcraft that automate generating the build artifacts in separate output directories for each build, thus making it easy to compare the changes to correlate what change in the Debian packaging led to what change in the resulting build artifacts.

Re-run the initial import with git-buildpackage

When upstreams publish releases as tarballs, they should also be imported for optimal software supply-chain security, in particular if upstream also publishes cryptographic signatures that can be used to verify the authenticity of the tarballs.

To achieve this, the files debian/watch, debian/upstream/signing-key.asc, and debian/gbp.conf need to be present with the correct options. In the gbp.conf file, ensure you have the correct options based on:

  1. Does upstream release tarballs? If so, enforce pristine-tar = True.
  2. Does upstream sign the tarballs? If so, configure explicit signature checking with upstream-signatures = on.
  3. Does upstream have a git repository, and does it have release git tags? If so, configure the release git tag format, e.g. upstream-vcs-tag = %(version%~%.)s.
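
Putting these options together with the DEP-14 branch names used earlier, a debian/gbp.conf could look roughly like this (a sketch; the exact values depend on how the upstream project does its releases):

debian
[DEFAULT]
debian-branch = debian/latest
upstream-branch = upstream/latest
pristine-tar = True
upstream-signatures = on
upstream-vcs-tag = %(version%~%.)s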

To validate that the above files are working correctly, run gbp import-orig with the current version explicitly defined:

shell
$ gbp import-orig --uscan --upstream-version 5.6
gbp:info: Launching uscan...
gpgv: Signature made 7. Aug 2024 07.43.27 PDT
gpgv: using RSA key 519151D83E83D40A232B4D615C418B8631BC7C26
gpgv: Good signature from "Eric Radman <ericshane@eradman.com>"
gbp:info: Using uscan downloaded tarball ../entr_5.6.orig.tar.gz
gbp:info: Importing '../entr_5.6.orig.tar.gz' to branch 'upstream/latest'...
gbp:info: Source package is entr
gbp:info: Upstream version is 5.6
gbp:info: Replacing upstream source on 'debian/latest'
gbp:info: Running Postimport hook
gbp:info: Successfully imported version 5.6 of ../entr_5.6.orig.tar.gz

As the original packaging was done based on the upstream release git tag, the above command will fetch the tarball release, create the pristine-tar branch, and store the tarball delta on it. This command will also attempt to create the tag upstream/5.6 on the upstream/latest branch.

Import new upstream versions in the future

Forking the upstream git repository, creating the initial packaging, and creating the DEP-14 branch structure are all one-off work needed only when creating the initial packaging.

Going forward, to import new upstream releases, one would simply run git fetch upstreamvcs; gbp import-orig --uscan, which fetches the upstream git tags, checks for new upstream tarballs, and automatically downloads, verifies, and imports the new version. See the galera-4-demo example in the Debian source packages in git explained post as a demo you can try running yourself and examine in detail.
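
In other words, the routine for each new upstream release boils down to the two commands from the paragraph above:

shell
git fetch upstreamvcs
gbp import-orig --uscan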

You can also try running gbp import-orig --uscan without specifying a version. It will notice that there is now Entr version 5.7 available, fetch it, and import it.

Build using git-buildpackage

From this stage onwards you should build the package using gbp buildpackage, which will do a more comprehensive build.

shell
gbp buildpackage -uc -us

The git-buildpackage build also includes running Lintian to find potential Debian policy violations in the sources or in the resulting .deb binary packages. Many Debian Developers run lintian -EviIL +pedantic after every build to check that there are no new nags, and to validate that changes intended to previous Lintian nags were correct.

Open a Merge Request on Salsa for Debian packaging review

Getting everything perfectly right takes a lot of effort, and may require reaching out to an experienced Debian Developer for review and guidance. Thus, you should aim to publish your initial packaging work on Salsa, Debian’s GitLab instance, for review and feedback as early as possible.

For somebody to be able to easily see what you have done, you should rename your debian/latest branch to another name, for example next/debian/latest, and open a Merge Request that targets the debian/latest branch on your Salsa fork, which still has only the unmodified upstream files.

If you have followed the workflow in this post so far, you can simply run:

  1. git checkout -b next/debian/latest
  2. git push --set-upstream origin next/debian/latest
  3. Open in a browser the URL visible in the git remote response
  4. Write the Merge Request description in case the default text from your commit is not enough
  5. Mark the MR as “Draft” using the checkbox
  6. Publish the MR and request feedback

Once a Merge Request exists, discussion regarding what additional changes are needed can be conducted as MR comments. With an MR, you can easily iterate on the contents of next/debian/latest, rebase, force push, and request re-review as many times as you want.

While at it, make sure the Settings > CI/CD page has under CI/CD configuration file the value debian/salsa-ci.yml so that the CI can run and give you immediate automated feedback.

For an example of an initial packaging Merge Request, see https://salsa.debian.org/otto/entr-demo/-/merge_requests/1.

Open a Merge Request / Pull Request to fix upstream code

Due to the high quality requirements in Debian, it is fairly common that while doing the initial Debian packaging of an open source project, issues are found that stem from the upstream source code. While it is possible to carry extra patches in Debian, it is not good practice to deviate too much from upstream code with custom Debian patches. Instead, the Debian packager should try to get the fixes applied directly upstream.

Using git-buildpackage patch queues is the most convenient way to make modifications to the upstream source code so that they automatically convert into Debian patches (stored at debian/patches), and can also easily be submitted upstream as any regular git commit (and rebased and resubmitted many times over).

First, decide if you want to work out of the upstream development branch and later cherry-pick to the Debian packaging branch, or work out of the Debian packaging branch and cherry-pick to an upstream branch.

The example below starts from the upstream development branch and then cherry-picks the commit into the git-buildpackage patch queue:

shell
git checkout -b bugfix-branch master
nano entr.c
make
./entr # verify change works as expected
git commit -a -m "Commit title" -m "Commit body"
git push # submit upstream
gbp pq import --force --time-machine=10
git cherry-pick <commit id>
git commit --amend # extend commit message with DEP-3 metadata
gbp buildpackage -uc -us -b
./entr # verify change works as expected
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."

The example below starts by making the fix on a git-buildpackage patch queue branch, and then cherry-picking it onto the upstream development branch:

shell
gbp pq import --force --time-machine=10
nano entr.c
git commit -a -m "Commit title" -m "Commit body"
gbp buildpackage -uc -us -b
./entr # verify change works as expected
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."
git checkout -b bugfix-branch master
git cherry-pick <commit id>
git commit --amend # prepare commit message for upstream submission
git push # submit upstream

The key git-buildpackage commands to enter and exit the patch-queue mode are:

shell
gbp pq import --force --time-machine=10
gbp pq export --drop --commit

mermaid
%%{init: { 'gitGraph': { 'mainBranchName': 'debian/latest' } } }%%
gitGraph
checkout debian/latest
commit id: "Initial packaging"
branch patch-queue/debian/latest
checkout patch-queue/debian/latest
commit id: "Delete debian/patches/..."
commit id: "Patch 1 title"
commit id: "Patch 2 title"
commit id: "Patch 3 title"

These can be run at any time, regardless if any debian/patches existed prior, or if existing patches applied cleanly or not, or if there were old patch queue branches around. Note that the extra -b in gbp buildpackage -uc -us -b instructs to build only binary packages, avoiding any nags from dpkg-source that there are modifications in the upstream sources while building in the patches-applied mode.

Programming-language specific dh-make alternatives

As each programming language has its specific way of building the source code, and many other conventions regarding the file layout and more, Debian has multiple custom tools to create new Debian source packages for specific programming languages.

Notably, Python does not have its own tool, but there is a dh_make --python option for Python support directly in dh_make itself. The list is not complete and many more tools exist. For some languages there are even competing options; for Go, for example, there is Gophian in addition to dh-make-golang.

When learning Debian packaging, there is no need to learn these tools upfront. Being aware that they exist is enough, and one can learn them only if and when one starts to package a project in a new programming language.

The difference between source git repository vs source packages vs binary packages

As seen in the earlier example, running gbp buildpackage on the Entr packaging repository above will result in several files:

entr_5.6-1_amd64.changes
entr_5.6-1_amd64.deb
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc

The entr_5.6-1_amd64.deb is the binary package, which can be installed on a Debian/Ubuntu system. The rest of the files constitute the source package. To do a source-only build, run gbp buildpackage -S and note the files produced:

entr_5.6-1_source.changes
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc

The source package files can be used to build the binary .deb for amd64, or any architecture that the package supports. It is important to grasp that the Debian source package is the preferred form to be able to build the binary packages on various Debian build systems, and the Debian source package is not the same thing as the Debian packaging git repository contents.

mermaid
flowchart LR
git[Git repository<br>branch debian/latest] -->|gbp buildpackage -S| src[Source Package<br>.dsc + .tar.xz]
src -->|dpkg-buildpackage| bin[Binary Packages<br>.deb]

If the package is large and complex, the build could result in multiple binary packages. One set of package definition files in debian/ will however only ever result in a single source package.
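
To illustrate the idea, a hypothetical debian/control with one source stanza producing two binary packages could look roughly like this (all names here are made up for the example, not taken from any real package):

debian
Source: example
Maintainer: Jane Developer <jane@example.org>
Build-Depends: debhelper-compat (= 13)

Package: example
Architecture: any
Depends: ${misc:Depends}, ${shlibs:Depends}
Description: example command-line tool
 One source package can declare several binary packages.

Package: libexample1
Architecture: any
Depends: ${misc:Depends}, ${shlibs:Depends}
Description: example shared library
 Each Package stanza produces its own .deb from the same source.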

Option to repackage source packages with Files-Excluded lists in the debian/copyright file

Some upstream projects may include binary files in their release, or other undesirable content that needs to be omitted from the source package in Debian. The easiest way to filter them out is by adding to the debian/copyright file a Files-Excluded field listing the undesired files. The debian/copyright file is read by uscan, which will repackage the upstream sources on-the-fly when importing new upstream releases.

For a real-life example, see the debian/copyright file in the Godot package, which lists:

debian
Files-Excluded: platform/android/java/gradle/wrapper/gradle-wrapper.jar

The resulting repackaged upstream source tarball, as well as the upstream version component, will have an extra +ds to signify that it is not the true original upstream source but has been modified by Debian:

godot_4.3+ds.orig.tar.xz
godot_4.3+ds-1_amd64.deb

Creating one Debian source package from multiple upstream source packages is also possible

In some rare cases the upstream project may be split across multiple git repositories or the upstream release may consist of multiple components each in their own separate tarball. Usually these are very large projects that get some benefits from releasing components separately. If in Debian these are deemed to go into a single source package, it is technically possible using the component system in git-buildpackage and uscan. For an example see the gbp.conf and watch files in the node-cacache package.

Using this type of structure should be a last resort, as it creates complexity and inter-dependencies that are bound to cause issues later on. It is usually better to work with upstream and champion universal best practices with clear releases and version schemes.

When not to start the Debian packaging repository as a fork of the upstream one

Not all upstreams use Git for version control. It is by far the most popular, but there are still some that use e.g. Subversion or Mercurial. Who knows — maybe in the future some new version control systems will start to compete with Git. There are also projects that use Git in massive monorepos and with complex submodule setups that invalidate the basic assumptions required to map an upstream Git repository into a Debian packaging repository.

In those cases one can’t use a debian/latest branch on a clone of the upstream git repository as the starting point for the Debian packaging, but one must revert to the traditional way of starting from an upstream release tarball with gbp import-orig package-1.0.tar.gz.

Conclusion

Created in August 1993, Debian is one of the oldest Linux distributions. In the 32 years since inception, the .deb packaging format and the tooling to work with it have evolved several generations. In the past 10 years, more and more Debian Developers have converged on certain core practices evidenced by https://trends.debian.net/, but there is still a lot of variance in workflows even for identical tasks. Hopefully, you find this post useful in giving practical guidance on how exactly to do the most common things when packaging software for Debian.

Happy packaging!

26 May, 2025 12:00AM

May 25, 2025

Iustin Pop

Corydalis v2025.21.0 - new features!

I just released yesterday a new version of Corydalis (https://demo.corydalis.io, https://github.com/iustin/corydalis). To me personally, it’s a major improvement, since the native (my own) image viewer finally gets zooming, panning, gesture handling, etc. This is table-stakes for an image viewer, but oh well, it took me a long time to implement it because of multiple things: lack of time, the JS library I was using for gestures being pretty old and unmaintained and causing more trouble than it was helping, etc.

The feature is not perfect, and on the demo site there’s already a bug: all images are smaller than the screen, a case I didn’t test 😅, so double-click to zoom doesn’t work and just says "Already at minimum zoom". Zooming otherwise (+/- on the keyboard, mouse wheel, gestures) works.

End-to-end, the major development for this release was done over around two weeks, which is pretty short: I extensively used Claude Sonnet and Grok to unblock myself. Not to write code per se - although there is code written 1:1 by LLMs, most of it is weirdly wrong, and I have to either correct it or just use it as a starter and rewrite most of it. But discussing, unblocking, and learning about new things is what the current LLMs are very good at.

And yet, sometimes even what they’re good at, fails hard. I asked for ideas to simplify a piece of code, and it went nowhere, even if there were significant rewrite possibilities. I spent the brain cycles on it, reverse engineered my own code, then simplified. I’ll have to write a separate blog post on this…

In any case, this (zooming) was the last major feature I was missing. There are image viewer libraries, but most of them are slow compared to the bare-bones (well, now not so much anymore) viewer that I use as the main viewer. From now on, it will be minor incremental features, mostly around Exif management/handling, etc. Or, well, internal cleanups: extend test coverage, remove the use of jQuery in the frontend, etc.; there are tons of things to do.

Fun fact: I managed to discover a Safari iOS bug. Or at least I think it’s a bug, so reported it and curious what’ll come out of it.

Finally, I still couldn’t fix the GitHub actions bug where the git describe doesn’t see the just pushed tag, sigh, so the demo site still lists Corydalis v2024.12.0-133-g00edf63 as the version 😅.

25 May, 2025 02:37PM

hackergotchi for Otto Kekäläinen

Otto Kekäläinen

New Debian package creation from upstream git repository

Featured image of post New Debian package creation from upstream git repository

In this post, I demonstrate the optimal workflow for creating new Debian packages in 2025, preserving the upstream git history. The motivation for this is to lower the barrier for sharing improvements to and from upstream, and to improve software provenance and supply-chain security by making it easy to inspect every change at any level using standard git tooling.

Key elements of this workflow include:

  • Using a Git fork/clone of the upstream repository as the starting point for creating Debian packaging repositories.
  • Consistent use of the same git-buildpackage commands, with all package-specific options in gbp.conf.
  • DEP-14 tag and branch names for an optimal Git packaging repository structure.
  • Pristine-tar and upstream signatures for supply-chain security.
  • Use of Files-Excluded in the debian/copyright file to filter out unwanted files in Debian.
  • Patch queues to easily rebase and cherry-pick changes across Debian and upstream branches.
  • Efficient use of Salsa, Debian’s GitLab instance, for both automated feedback from CI systems and human feedback from peer reviews.

To make the instructions so concrete that anyone can repeat all the steps themselves on a real package, I demonstrate the steps by packaging the command-line tool Entr. It is written in C, has very few dependencies, and its final Debian source package structure is simple, yet exemplifies all the important parts that go into a complete Debian package:

  1. Creating a new packaging repository and publishing it under your personal namespace on salsa.debian.org.
  2. Using dh_make to create the initial Debian packaging.
  3. Posting the first draft of the Debian packaging as a Merge Request (MR) and using Salsa CI to verify Debian packaging quality.
  4. Running local builds efficiently and iterating on the packaging process.

Create new Debian packaging repository from the existing upstream project git repository

First, create a new empty directory, then clone the upstream Git repository inside it:

shell
mkdir debian-entr
cd debian-entr
git clone --origin upstreamvcs --branch master \
 --single-branch https://github.com/eradman/entr.git

Using a clean directory makes it easier to inspect the build artifacts of a Debian package, which will be output in the parent directory of the Debian source directory.

The extra parameters given to git clone lay the foundation for the Debian packaging git repository structure where the upstream git remote name is upstreamvcs. Only the upstream main branch is tracked to avoid cluttering git history with upstream development branches that are irrelevant for packaging in Debian.

Next, enter the git repository directory and list the git tags. Pick the latest upstream release tag as the commit to start the branch upstream/latest. This latest refers to the upstream release, not the upstream development branch. Immediately after, branch off the debian/latest branch, which will have the actual Debian packaging files in the debian/ subdirectory.

shell
cd entr
git tag # shows the latest upstream release tag was '5.6'
git checkout -b upstream/latest 5.6
git checkout -b debian/latest
%%{init: { 'gitGraph': { 'mainBranchName': 'master' } } }%%
gitGraph:
checkout master
commit id: "Upstream 5.6 release" tag: "5.6"
branch upstream/latest
checkout upstream/latest
commit id: "New upstream version 5.6" tag: "upstream/5.6"
branch debian/latest
checkout debian/latest
commit id: "Initial Debian packaging"
commit id: "Additional change 1"
commit id: "Additional change 2"
commit id: "Additional change 3"

At this point, the repository is structured according to DEP-14 conventions, ensuring a clear separation between upstream and Debian packaging changes, but there are no Debian changes yet. Next, add the Salsa repository as a new remote which called origin, the same as the default remote name in git.

shell
git remote add origin git@salsa.debian.org:otto/entr-demo.git
git push --set-upstream origin debian/latest

This is an important preparation step to later be able to create a Merge Request on Salsa that targets the debian/latest branch, which does not yet have any debian/ directory.

Launch a Debian Sid (unstable) container to run builds in

To ensure that all packaging tools are of the latest versions, run everything inside a fresh Sid container. This has two benefits: you are guaranteed to have the most up-to-date toolchain, and your host system stays clean without getting polluted by various extra packages. Additionally, this approach works even if your host system is not Debian/Ubuntu.

shell
cd ..
podman run --interactive --tty --rm --shm-size=1G --cap-add SYS_PTRACE \
 --env='DEB*' --volume=$PWD:/tmp/test --workdir=/tmp/test debian:sid bash

Note that the container should be started from the parent directory of the git repository, not inside it. The --volume parameter will loop-mount the current directory inside the container. Thus all files created and modified are on the host system, and will persist after the container shuts down.

Once inside the container, install the basic dependencies:

shell
apt update -q && apt install -q --yes git-buildpackage dpkg-dev dh-make

Automate creating the debian/ files with dh-make

To create the files needed for the actual Debian packaging, use dh_make:

shell
# dh_make --packagename entr_5.6 --single --createorig
Maintainer Name : Otto Kekäläinen
Email-Address : otto@debian.org
Date : Sat, 15 Feb 2025 01:17:51 +0000
Package Name : entr
Version : 5.6
License : blank
Package Type : single
Are the details correct? [Y/n/q]

Done. Please edit the files in the debian/ subdirectory now.

Due to how dh_make works, the package name and version need to be written as a single underscore separated string. In this case, you should choose --single to specify that the package type is a single binary package. Other options would be --library for library packages (see libgda5 sources as an example) or --indep (see dns-root-data sources as an example). The --createorig will create a mock upstream release tarball (entr_5.6.orig.tar.xz) from the current release directory, which is necessary due to historical reasons and how dh_make worked before git repositories became common and Debian source packages were based off upstream release tarballs (e.g. *.tar.gz).

At this stage, a debian/ directory has been created with template files, and you can start modifying the files and iterating towards actual working packaging.

shell
git add debian/
git commit -a -m "Initial Debian packaging"

Review the files

The full list of files after the above steps with dh_make would be:

|-- entr
| |-- LICENSE
| |-- Makefile.bsd
| |-- Makefile.linux
| |-- Makefile.linux-compat
| |-- Makefile.macos
| |-- NEWS
| |-- README.md
| |-- configure
| |-- data.h
| |-- debian
| | |-- README.Debian
| | |-- README.source
| | |-- changelog
| | |-- control
| | |-- copyright
| | |-- gbp.conf
| | |-- entr-docs.docs
| | |-- entr.cron.d.ex
| | |-- entr.doc-base.ex
| | |-- manpage.1.ex
| | |-- manpage.md.ex
| | |-- manpage.sgml.ex
| | |-- manpage.xml.ex
| | |-- postinst.ex
| | |-- postrm.ex
| | |-- preinst.ex
| | |-- prerm.ex
| | |-- rules
| | |-- salsa-ci.yml.ex
| | |-- source
| | | `-- format
| | |-- upstream
| | | `-- metadata.ex
| | `-- watch.ex
| |-- entr.1
| |-- entr.c
| |-- missing
| | |-- compat.h
| | |-- kqueue_inotify.c
| | |-- strlcpy.c
| | `-- sys
| | `-- event.h
| |-- status.c
| |-- status.h
| `-- system_test.sh
`-- entr_5.6.orig.tar.xz

You can browse these files in the demo repository.

The mandatory files in the debian/ directory are:

  • changelog,
  • control,
  • copyright,
  • and rules.

All the other files have been created for convenience so the packager has template files to work from. The files with the suffix .ex are example files that won’t have any effect until their content is adjusted and the suffix removed.

For detailed explanations of the purpose of each file in the debian/ subdirectory, see the following resources:

  • The Debian Policy Manual: Describes the structure of the operating system, the package archive and requirements for packages to be included in the Debian archive.
  • The Developer’s Reference: A collection of best practices and process descriptions Debian packagers are expected to follow while interacting with one another.
  • Debhelper man pages: Detailed information of how the Debian package build system works, and how the contents of the various files in ‘debian/’ affect the end result.

As Entr, the package used in this example, is a real package that already exists in the Debian archive, you may want to browse the actual Debian packaging source at https://salsa.debian.org/debian/entr/-/tree/debian/latest/debian for reference.

Most of these files have standardized formatting conventions to make collaboration easier. To automatically format the files following the most popular conventions, simply run wrap-and-sort -vast or debputy reformat --style=black.

Identify build dependencies

The most common reason for builds to fail is missing dependencies. The easiest way to identify which Debian package ships the required dependency is using apt-file. If, for example, a build fails complaining that pcre2posix.h cannot be found or that libcre2-posix.so is missing, you can use these commands:

shell
$ apt install -q --yes apt-file && apt-file update
$ apt-file search pcre2posix.h
libpcre2-dev: /usr/include/pcre2posix.h
$ apt-file search libpcre2-posix.so
libpcre2-dev: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so
libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3
libpcre2-posix3: /usr/lib/x86_64-linux-gnu/libpcre2-posix.so.3.0.6

The output above implies that the debian/control should be extended to define a Build-Depends: libpcre2-dev relationship.

There is also dpkg-depcheck that uses strace to trace the files the build process tries to access, and lists what Debian packages those files belong to. Example usage:

shell
dpkg-depcheck -b debian/rules build

Build the Debian sources to generate the .deb package

After the first pass of refining the contents of the files in debian/, test the build by running dpkg-buildpackage inside the container:

shell
dpkg-buildpackage -uc -us -b

The options -uc -us will skip signing the resulting Debian source package and other build artifacts. The -b option will skip creating a source package and only build the (binary) *.deb packages.

The output is very verbose and gives a large amount of context about what is happening during the build to make debugging build failures easier. In the build log of entr you will see for example the line dh binary --buildsystem=makefile. This and other dh commands can also be run manually if there is a need to quickly repeat only a part of the build while debugging build failures.

To see what files were generated or modified by the build simply run git status --ignored:

shell
$ git status --ignored
On branch debian/latest

Untracked files:
 (use "git add <file>..." to include in what will be committed)
 debian/debhelper-build-stamp
 debian/entr.debhelper.log
 debian/entr.substvars
 debian/files

Ignored files:
 (use "git add -f <file>..." to include in what will be committed)
 Makefile
 compat.c
 compat.o
 debian/.debhelper/
 debian/entr/
 entr
 entr.o
 status.o

Re-running dpkg-buildpackage will include running the command dh clean, which assuming it is configured correctly in the debian/rules file will reset the source directory to the original pristine state. The same can of course also be done with regular git commands git reset --hard; git clean -fdx. To avoid accidentally committing unnecessary build artifacts in git, a debian/.gitignore can be useful and it would typically include all four files listed as “untracked” above.

After a successful build you would have the following files:

shell
|-- entr
| |-- LICENSE
| |-- Makefile -> Makefile.linux
| |-- Makefile.bsd
| |-- Makefile.linux
| |-- Makefile.linux-compat
| |-- Makefile.macos
| |-- NEWS
| |-- README.md
| |-- compat.c
| |-- compat.o
| |-- configure
| |-- data.h
| |-- debian
| | |-- README.source.md
| | |-- changelog
| | |-- control
| | |-- copyright
| | |-- debhelper-build-stamp
| | |-- docs
| | |-- entr
| | | |-- DEBIAN
| | | | |-- control
| | | | `-- md5sums
| | | `-- usr
| | | |-- bin
| | | | `-- entr
| | | `-- share
| | | |-- doc
| | | | `-- entr
| | | | |-- NEWS.gz
| | | | |-- README.md
| | | | |-- changelog.Debian.gz
| | | | `-- copyright
| | | `-- man
| | | `-- man1
| | | `-- entr.1.gz
| | |-- entr.debhelper.log
| | |-- entr.substvars
| | |-- files
| | |-- gbp.conf
| | |-- patches
| | | |-- PR149-expand-aliases-in-system-test-script.patch
| | | |-- series
| | | |-- system-test-skip-no-tty.patch
| | | `-- system-test-with-system-binary.patch
| | |-- rules
| | |-- salsa-ci.yml
| | |-- source
| | | `-- format
| | |-- tests
| | | `-- control
| | |-- upstream
| | | |-- metadata
| | | `-- signing-key.asc
| | `-- watch
| |-- entr
| |-- entr.1
| |-- entr.c
| |-- entr.o
| |-- missing
| | |-- compat.h
| | |-- kqueue_inotify.c
| | |-- strlcpy.c
| | `-- sys
| | `-- event.h
| |-- status.c
| |-- status.h
| |-- status.o
| `-- system_test.sh
|-- entr-dbgsym_5.6-1_amd64.deb
|-- entr_5.6-1.debian.tar.xz
|-- entr_5.6-1.dsc
|-- entr_5.6-1_amd64.buildinfo
|-- entr_5.6-1_amd64.changes
|-- entr_5.6-1_amd64.deb
`-- entr_5.6.orig.tar.xz

The contents of debian/entr are essentially what goes into the resulting entr_5.6-1_amd64.deb package. Familiarizing yourself with the majority of the files in the original upstream source as well as all the resulting build artifacts is time consuming, but it is a necessary investment to get high-quality Debian packages.

There are also tools such as Debcraft that automate generating the build artifacts in separate output directories for each build, thus making it easy to compare the changes to correlate what change in the Debian packaging led to what change in the resulting build artifacts.

Re-run the initial import with git-buildpackage

When upstreams publish releases as tarballs, they should also be imported for optimal software supply-chain security, in particular if upstream also publishes cryptographic signatures that can be used to verify the authenticity of the tarballs.

To achieve this, the files debian/watch, debian/upstream/signing-key.asc, and debian/gbp.conf need to be present with the correct options. In the gbp.conf file, ensure you have the correct options based on:

  1. Does upstream release tarballs? If so, enforce pristine-tar = True.
  2. Does upstream sign the tarballs? If so, configure explicit signature checking with upstream-signatures = on.
  3. Does upstream have a git repository, and does it have release git tags? If so, configure the release git tag format, e.g. upstream-vcs-tag = %(version%~%.)s.

To validate that the above files are working correctly, run gbp import-orig with the current version explicitly defined:

shell
$ gbp import-orig --uscan --upstream-version 5.6
gbp:info: Launching uscan...
gpgv: Signature made 7. Aug 2024 07.43.27 PDT
gpgv: using RSA key 519151D83E83D40A232B4D615C418B8631BC7C26
gpgv: Good signature from "Eric Radman <ericshane@eradman.com>"
gbp:info: Using uscan downloaded tarball ../entr_5.6.orig.tar.gz
gbp:info: Importing '../entr_5.6.orig.tar.gz' to branch 'upstream/latest'...
gbp:info: Source package is entr
gbp:info: Upstream version is 5.6
gbp:info: Replacing upstream source on 'debian/latest'
gbp:info: Running Postimport hook
gbp:info: Successfully imported version 5.6 of ../entr_5.6.orig.tar.gz

As the original packaging was done based on the upstream release git tag, the above command will fetch the tarball release, create the pristine-tar branch, and store the tarball delta on it. This command will also attempt to create the tag upstream/5.6 on the upstream/latest branch.

Import new upstream versions in the future

Forking the upstream git repository, creating the initial packaging, and creating the DEP-14 branch structure are all one-off work needed only when creating the initial packaging.

Going forward, to import new upstream releases, one would simply run git fetch upstreamvcs; gbp import-orig --uscan, which fetches the upstream git tags, checks for new upstream tarballs, and automatically downloads, verifies, and imports the new version. See the galera-4-demo example in the Debian source packages in git explained post as a demo you can try running yourself and examine in detail.

You can also try running gbp import-orig --uscan without specifying a version. It would fetch it, as it will notice there is now Entr version 5.7 available, and import it.

Build using git-buildpackage

From this stage onwards you should build the package using gbp buildpackage, which will do a more comprehensive build.

shell
gbp buildpackage -uc -us

The git-buildpackage build also includes running Lintian to find potential Debian policy violations in the sources or in the resulting .deb binary packages. Many Debian Developers run lintian -EviIL +pedantic after every build to check that there are no new nags, and to validate that changes intended to previous Lintian nags were correct.

Open a Merge Request on Salsa for Debian packaging review

Getting everything perfectly right takes a lot of effort, and may require reaching out to an experienced Debian Developers for review and guidance. Thus, you should aim to publish your initial packaging work on Salsa, Debian’s GitLab instance, for review and feedback as early as possible.

For somebody to be able to easily see what you have done, you should rename your debian/latest branch to another name, for example next/debian/latest, and open a Merge Request that targets the debian/latest branch on your Salsa fork, which still has only the unmodified upstream files.

If you have followed the workflow in this post so far, you can simply run:

  1. git checkout -b next/debian/latest
  2. git push --set-upstream origin next/debian/latest
  3. Open in a browser the URL visible in the git remote response
  4. Write the Merge Request description in case the default text from your commit is not enough
  5. Mark the MR as “Draft” using the checkbox
  6. Publish the MR and request feedback

Once a Merge Request exists, discussion regarding what additional changes are needed can be conducted as MR comments. With an MR, you can easily iterate on the contents of next/debian/latest, rebase, force push, and request re-review as many times as you want.

While at it, make sure that on the Settings > CI/CD page the CI/CD configuration file is set to debian/salsa-ci.yml, so that the CI can run and give you immediate automated feedback.
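
If the repository does not yet have a debian/salsa-ci.yml, a minimal one typically just includes the shared Salsa CI pipeline definition (a sketch based on the Salsa CI team's documented default at the time of writing):

yaml
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml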

For an example of an initial packaging Merge Request, see https://salsa.debian.org/otto/entr-demo/-/merge_requests/1.

Open a Merge Request / Pull Request to fix upstream code

Due to the high quality requirements in Debian, it is fairly common that while doing the initial Debian packaging of an open source project, issues are found that stem from the upstream source code. While it is possible to carry extra patches in Debian, it is not good practice to deviate too much from upstream code with custom Debian patches. Instead, the Debian packager should try to get the fixes applied directly upstream.

Using git-buildpackage patch queues is the most convenient way to make modifications to the upstream source code so that they automatically convert into Debian patches (stored at debian/patches), and can also easily be submitted upstream as any regular git commit (and rebased and resubmitted many times over).

First, decide if you want to work out of the upstream development branch and later cherry-pick to the Debian packaging branch, or work out of the Debian packaging branch and cherry-pick to an upstream branch.

The example below starts from the upstream development branch and then cherry-picks the commit into the git-buildpackage patch queue:

shell
git checkout -b bugfix-branch master
nano entr.c
make
./entr # verify change works as expected
git commit -a -m "Commit title" -m "Commit body"
git push # submit upstream
gbp pq import --force --time-machine=10
git cherry-pick <commit id>
git commit --amend # extend commit message with DEP-3 metadata
gbp buildpackage -uc -us -b
./entr # verify change works as expected
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."
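
For reference, the DEP-3 metadata added to the amended commit message could look roughly like the following; every value here is a placeholder that needs to be replaced with the details of the actual patch and upstream submission:

Description: Short summary of what the patch changes
 Optional longer description, indented by one space.
Author: Jane Developer <jane@example.com>
Forwarded: <URL of the upstream pull request or email thread>
Last-Update: 2025-05-25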

The example below starts by making the fix on a git-buildpackage patch queue branch, and then cherry-picking it onto the upstream development branch:

shell
gbp pq import --force --time-machine=10
nano entr.c
git commit -a -m "Commit title" -m "Commit body"
gbp buildpackage -uc -us -b
./entr # verify change works as expected
gbp pq export --drop --commit
git commit --amend # Write commit message along lines "Add patch to .."
git checkout -b bugfix-branch master
git cherry-pick <commit id>
git commit --amend # prepare commit message for upstream submission
git push # submit upstream

The key git-buildpackage commands to enter and exit the patch-queue mode are:

shell
gbp pq import --force --time-machine=10
gbp pq export --drop --commit

[Diagram: starting from the debian/latest branch (commit "Initial packaging"), gbp pq creates and switches to the patch-queue/debian/latest branch, which carries a commit deleting the old debian/patches contents followed by one commit per patch: "Patch 1 title", "Patch 2 title", "Patch 3 title".]

These can be run at any time, regardless of whether any debian/patches existed before, whether existing patches applied cleanly, or whether there were old patch queue branches around. Note that the extra -b in gbp buildpackage -uc -us -b instructs it to build only binary packages, avoiding any nags from dpkg-source about modifications in the upstream sources while building in patches-applied mode.

Programming-language specific dh-make alternatives

As each programming language has its specific way of building the source code, and many other conventions regarding the file layout and more, Debian has multiple custom tools to create new Debian source packages for specific programming languages.

Notably, Python does not have its own tool, but there is a dh_make --python option for Python support directly in dh_make itself. The set of such tools is by no means complete, and many more exist. For some languages there are even competing options: for Go, for example, there is Gophian in addition to dh-make-golang.

When learning Debian packaging, there is no need to learn these tools upfront. Being aware that they exist is enough, and one can learn them only if and when one starts to package a project in a new programming language.

The difference between the source git repository, source packages, and binary packages

As seen in the earlier example, running gbp buildpackage on the Entr packaging repository above results in several files:

entr_5.6-1_amd64.changes
entr_5.6-1_amd64.deb
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc

The entr_5.6-1_amd64.deb is the binary package, which can be installed on a Debian/Ubuntu system. The rest of the files constitute the source package. To do a source-only build, run gbp buildpackage -S and note the files produced:

entr_5.6-1_source.changes
entr_5.6-1.debian.tar.xz
entr_5.6-1.dsc
entr_5.6.orig.tar.gz
entr_5.6.orig.tar.gz.asc

The source package files can be used to build the binary .deb for amd64, or any architecture that the package supports. It is important to grasp that the Debian source package is the preferred form to be able to build the binary packages on various Debian build systems, and the Debian source package is not the same thing as the Debian packaging git repository contents.

[Diagram: Git repository (branch debian/latest) → gbp buildpackage -S → source package (.dsc + .tar.xz) → dpkg-buildpackage → binary packages (.deb)]
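
To follow the second arrow of the diagram manually, the source package can be unpacked and built with plain dpkg tools (a sketch; run it in the directory containing the files listed above):

shell
dpkg-source -x entr_5.6-1.dsc
cd entr-5.6
dpkg-buildpackage -uc -us -b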

If the package is large and complex, the build could result in multiple binary packages. One set of package definition files in debian/ will however only ever result in a single source package.

Option to repackage source packages with Files-Excluded lists in the debian/copyright file

Some upstream projects may include binary files in their release, or other undesirable content that needs to be omitted from the source package in Debian. The easiest way to filter them out is by adding to the debian/copyright file a Files-Excluded field listing the undesired files. The debian/copyright file is read by uscan, which will repackage the upstream sources on-the-fly when importing new upstream releases.

For a real-life example, see the debian/copyright file in the Godot package, which lists:

debian
Files-Excluded: platform/android/java/gradle/wrapper/gradle-wrapper.jar

The resulting repackaged upstream source tarball, as well as the upstream version component, will have an extra +ds to signify that it is not the true original upstream source but has been modified by Debian:

godot_4.3+ds.orig.tar.xz
godot_4.3+ds-1_amd64.deb
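
The +ds suffix itself typically comes from the repacksuffix option in debian/watch. A sketch of the relevant options only (the rest of the watch line is omitted here, and the exact option set varies from package to package):

debian
opts="repacksuffix=+ds,dversionmangle=auto" \
  ...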

Creating one Debian source package from multiple upstream source packages is also possible

In some rare cases the upstream project may be split across multiple git repositories or the upstream release may consist of multiple components each in their own separate tarball. Usually these are very large projects that get some benefits from releasing components separately. If in Debian these are deemed to go into a single source package, it is technically possible using the component system in git-buildpackage and uscan. For an example see the gbp.conf and watch files in the node-cacache package.

Using this type of structure should be a last resort, as it creates complexity and inter-dependencies that are bound to cause issues later on. It is usually better to work with upstream and champion universal best practices with clear releases and version schemes.

When not to start the Debian packaging repository as a fork of the upstream one

Not all upstreams use Git for version control. It is by far the most popular, but there are still some that use e.g. Subversion or Mercurial. Who knows — maybe in the future some new version control systems will start to compete with Git. There are also projects that use Git in massive monorepos and with complex submodule setups that invalidate the basic assumptions required to map an upstream Git repository into a Debian packaging repository.

In those cases one can’t use a debian/latest branch on a clone of the upstream git repository as the starting point for the Debian packaging, but one must revert to the traditional way of starting from an upstream release tarball with gbp import-orig package-1.0.tar.gz.

Conclusion

Created in August 1993, Debian is one of the oldest Linux distributions. In the 32 years since inception, the .deb packaging format and the tooling to work with it have evolved several generations. In the past 10 years, more and more Debian Developers have converged on certain core practices evidenced by https://trends.debian.net/, but there is still a lot of variance in workflows even for identical tasks. Hopefully, you find this post useful in giving practical guidance on how exactly to do the most common things when packaging software for Debian.

Happy packaging!

25 May, 2025 12:00AM

Valhalla's Things

Honeycomb shirt

Posted on May 25, 2025
Tags: madeof:atoms, craft:sewing, FreeSoftWear, GNU Terry Pratchett

A woman wearing a purplish blue shirt with very wide sleeves, gathered at the cuffs and shoulder with honeycombing, and also a rectangle of honeycombing in the front between the neckline and just above the bust. The shirt is gathered at the waist with a wide belt, and an almost lilac towel hangs from the belt.

After cartridge pleating, the next fabric manipulation technique I wanted to try was smocking, of the honeycombing variety, on a shirt.

My current go-to pattern for shirts is the 1880 menswear one I have on my website: I love the fact that most of the fabric is still cut as big rectangles, but the shaped yoke and armscyes make it significantly more comfortable than the earlier style where most of the shaping at the neck was done with gathers into a straight collar.

A woman wearing a shirt in the same fabric; this one has a slit in the front, is gathered into a tall rectangular collar and has dropped shoulders because it's cut from plain rectangles. The sleeves are still huge, and gathered into tall cuffs. It is worn belted (with the same wide white elastic belt used in the previous picture) and the woman is wearing a matching fabric mask, because the picture has been taken in 2021.

In my stash I had a cut of purple-blue, hopefully cotton, fabric I had bought for a cheap price and used for my first attempt at an historically accurate pirate / vampire shirt, which has now become my official summer vaccine jab / blood test shirt (because it has the long sleeves I need, but they are pretty easy to roll up to give access to my arm).

That shirt tends to get out of the washing machine pretty wearable even without ironing, which made me think it could be a good fabric for something that may be somewhat hard to iron (but also made me suspicious about the actual composition of the fabric, even if it feels nice enough even when worn in the summer).

A piece of fabric with many rows of honeycombing laid on top of the collar and yoke of the shirt; a metal snap peeks from behind the piece of honeycombed fabric.  There are still basting lines for the armscyes.

Of course I wanted some honeycombing on the front, but I was afraid that the slit in the middle of it would interfere with the honeycombing and gape, so I decided to have the shirt open in a horizontal line at the yoke.

I added instructions to the pattern page for how I changed the opening in the front; basically it involved finishing the front edge of the yoke, and sewing the honeycombed yoke to a piece of tape with snaps.

Another change from the pattern is that I used plain rectangles for the sleeves, and a square gusset, rather than the new style tapered sleeve, because I wanted to have more fabric to gather at the wrist. I did the side and sleeve seams with a hem + whipstitch method rather than a felled seam, which may have helped, but the sleeves went into the fitted armscyes with no issue.

I think that if (yeah, right. when) I make another sleeve in this style I’ll sew it into the side seam starting 2-3 cm lower than the place I’ve marked on the pattern for the original sleeve.

The back of the unbelted shirt: it has a fitted yoke, and then it is quite wide and unfitted, with the fabric gathered into the yoke with a row of honeycombing and some pleating on top.

I also used a row of honeycombing on the back and two on the upper part of the sleeves, instead of the gathering, and of course some rows to gather the cuffs.

The honeycombing on the back was a bit too far away from the edge, so it’s a bit of an odd combination of honeycombing and pleating that I don’t hate, but don’t love either. It’s on the back, so I don’t mind. On the sleeves I’ve done the honeycombing closer to the edge and I’ve decided to sew the sleeve as if it was a cartridge pleated sleeve, and that worked better.

Because circumstances are still making access to my sewing machine more of a hassle than I’d want it to be, this was completely sewn by hand, and at a bit more than a month of work I have to admit that near the end it felt like it had taken forever. I’m not sure whether it was the actual sewing being slow, some interruptions that happened when I had little time to work on it, or the fact that I’ve just gone through a time when my brain kept throwing new projects at me, and I kept thinking of how to make those. Thanks brain.

Even when in a hurry to finish it, however, it was still enjoyable sewing, and I think I’ll want to do more honeycombing in the future.

The same woman with arms wide to show the big sleeves and the shirt unbelted to show that it is pretty wide also from the front, below the yoke and the honeycombing. The back can be seen as about 10 cm longer than the front.

Anyway, it’s done! And it’s going straight into my daily garment rotation, because the weather is getting hot, and that means it’s definitely shirt time.

25 May, 2025 12:00AM

May 24, 2025

hackergotchi for Bits from Debian

Bits from Debian

New Debian Developers and Maintainers (March and April 2025)

The following contributors got their Debian Developer accounts in the last two months:

  • Moritz Schlarb (moschlar)
  • Sérgio de Almeida Cipriano Júnior (cipriano)
  • Mario Anthony Limonciello (superm1)

The following contributor was added as Debian Maintainer in the last two months:

  • Martin-Éric Racine

Congratulations!

24 May, 2025 06:00PM by Jean-Pierre Giraud

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

Some demoparty stream firsts

A discussion the other day made me remember some of the demoparty stream “firsts” that I'm still proud of, most of which still haven't been matched:

  • Live voting user counts during the compo (example, at the bottom). A combination of gamification and deliberate peer pressure; if you see that others are voting, you'll feel compelled to follow their example. (The counter would never go down during a compo, only up, even if people stopped adding new votes. Also deliberate.)
  • Locking the frame rate to the compo machine throughout the entire chain; in practice, this means that oldschool demos would come in 50 Hz and newschool in 60 Hz, creating a fully VFR stream. A pain during switches, and YouTube messes it up and makes the entire file a bit choppy, but so cool. I always wanted it to do the infamous 50.12 Hz from the C64, but never really figured it out.
  • And last but not least, the “Eurovision” effect. Don't remember which entry was which during voting? No problem, we wget a 10-second sample (the advantage of having software video mixing for the bigscreen!) during the compo, furiously transcode it to something reasonable on an Arm thingie, and then put a loop of all of them together at the end of the compo. (A glimpse of an example)

Oh, and we streamed the street basketball compo, of course. Through a terrible, terrible video chain that made everything choppy. Would do better if I ever did it again :-)

24 May, 2025 02:08PM

Julian Andres Klode

A SomewhatMaxSAT Solver

As you may recall from previous posts and elsewhere I have been busy writing a new solver for APT. Today I want to share some of the latest changes in how to approach solving.

The idea for the solver was that manually installed packages are always protected from removals – in terms of SAT solving, they are facts. Automatically installed packages become optional unit clauses. Optional clauses are solved after manual ones; they don’t partake in normal unit propagation.

This worked fine, say you had

A                                   # install request for A
B                                   # manually installed, keep it
A depends on: conflicts-B | C

Installing A on a system with B installed resulted in C being installed, as the solver was not allowed to install the conflicts-B package while B was present.

However, I also introduced a mode to allow removing manually installed packages, and that’s where it broke down: now, instead of B being a fact, our clauses looked like:

A                               # install request for A
A depends on: conflicts-B | C
Optional: B                     # try to keep B installed

As a result, we installed conflicts-B and removed B; the steps the solver takes are:

  1. A is a fact, mark it
  2. A depends on: conflicts-B | C is the strongest clause, try to install conflicts-B
  3. We unit propagate that conflicts-B conflicts with B, so we mark not B
  4. Optional: B is reached, but not satisfiable, ignore it because it’s optional.

This isn’t correct: Just because we allow removing manually installed packages doesn’t mean that we should remove manually installed packages if we don’t need to.

Fixing this turns out to be surprisingly easy. In addition to adding our optional (soft) clauses, let’s first assume all of them!

But to explain how this works, we first need to explain some terminology:

  1. The solver operates on a stack of decisions
  2. “enqueue” means a fact is being added at the current decision level, and enqueued for propagation
  3. “assume” bumps the decision level, and then enqueues the assumed variable
  4. “propagate” looks at all the facts and sees if any clause becomes unit, and then enqueues it
  5. “unit” is when a clause has a single literal left to assign

To illustrate this in pseudo Python code:

  1. We introduce all our facts, and if they conflict, we are unsat:

    for fact in facts:
        enqueue(fact)
    if not propagate():
        return False
    
  2. For each optional literal, we register a soft clause and assume it. If the assumption fails, we ignore it. If it succeeds, but propagation fails, we undo the assumption.

    for optionalLiteral in optionalLiterals:
        registerClause(SoftClause([optionalLiteral]))
        if assume(optionalLiteral) and not propagate():
            undo()
    
  3. Finally we enter the main solver loop:

    while True:
        if not propagate():
            if not backtrack():
                return False
        elif <all clauses are satisfied>:
            return True
        elif it := find("best unassigned literal satisfying a hard clause"):
            assume(it)
        elif it := find("best literal satisfying a soft clause"):
            assume(it)
    

The key point to note is that the main loop will undo the assumptions in order; so if you assume A,B,C and B is not possible, we will have also undone C. But since C is also enqueued as a soft clause, we will then later find it again:

  1. Assume A: State=[Assume(A)], Clauses=[SoftClause([A])]
  2. Assume B: State=[Assume(A),Assume(B)], Clauses=[SoftClause([A]),SoftClause([B])]
  3. Assume C: State=[Assume(A),Assume(B),Assume(C)], Clauses=[SoftClause([A]),SoftClause([B]),SoftClause([C])]
  4. Solve finds a conflict, backtracks, and sets not C: State=[Assume(A),Assume(B),not(C)]
  5. Solve finds a conflict, backtracks, and sets not B: State=[Assume(A),not(B)] – C is no longer assumed either
  6. Solve, assume C as it satisfies SoftClause([C]) as next best literal: State=[Assume(A),not(B),Assume(C)]
  7. All clauses are satisfied, solution is A, not B, and C.

This is not (correct) MaxSAT, because we actually do not guarantee that we satisfy as many soft clauses as possible. Consider you have the following clauses:

Optional: A
Optional: B
Optional: C
B Conflicts with A
C Conflicts with A

There are two possible results here:

  1. {A} – If we assume A first, we are unable to satisfy B or C.
  2. {B,C} – If we assume either B or C first, A is unsat.

The question to ponder, though, is whether we actually need a global maximum, or whether a local maximum is satisfactory in practice for a dependency solver. If you look at it, a naive MaxSAT solver needs to run the SAT solver 2**n times for n soft clauses, whereas our heuristic only needs n runs.

For dependency solving, we do not seem to have a strong need for a global maximum: there are various other preferences between our literals, say priorities; and empirically, from evaluating hundreds of regressions without the initial assumptions, I can say that the assumptions do fix those cases and the result is correct.

Further improvements exist, though, and we can look into them if they are needed, such as:

  • Use a better heuristic:

    If we assume 1 clause and solve, and we cause 2 or more clauses to become unsatisfiable, then that clause is a local minimum and can be skipped. This is a more common heuristic for MaxSAT solvers. It gives us a better local maximum, but not a global one.

    This is more or less what the Smart package manager did, except that in Smart, all packages were optional, and the entire solution was scored. It calculated a basic solution without optimization and then toggled each variable and saw if the score improved.

  • Implement an actual search for a global maximum:

    This involves reading the literature. There are various versions of this, for example:

    1. Find unsatisfiable cores and use those to guide relaxation of clauses.

    2. A bounds-based search, where we translate sum(satisfied clauses) > k into SAT, and then search in one of the following ways:

      1. from 0 upward
      2. from n downward
      3. perform a binary search on [0, k] satisfied clauses.

      Actually, we do not even need to translate the sum constraints into CNF, because we can just add a specialized new type of constraint to our code.

24 May, 2025 10:14AM

May 23, 2025

hackergotchi for Yves-Alexis Perez

Yves-Alexis Perez

strongSwan autopkgtests

For a while, the strongSwan Debian package has had an autopkgtest. The initial version was proposed by Christian Ehrhardt in 2016 (presumably especially for downstream use in Ubuntu) and updated in 2019, but since then not much has changed, at least in Debian.

With the metapackage dependencies update in 6.0.0-1 I had to tune the test dependencies a bit so they wouldn't totally fail, and I noticed the amd64 tests had been failing basically since the beginning (the other architectures would pass, but only because the tests wouldn't actually run at all there, since they rely on the isolation-machine restriction, which is not available on those architectures).

So I tried to fix them, and it actually took me quite a while because I wasn't able to easily run the tests locally, and the Salsa CI doesn't have the isolation-machine restriction either. And some test runs would pass while others would not.

With some nice help from #debci, and using my newly received X13G5, I set up an autopkgtest VM and started experimenting. The 6.0.0-4 autopkgtests were failing 19 times out of 20 runs, but passing once. So it looked like a race condition, which we narrowed down to the fact that starting the daemons (using invoke-rc.d, which calls systemctl) is asynchronous. So depending on the load and maybe the machine, the tests would usually fail but sometimes pass.

There's no easy way to make the call synchronous, so as a stopgap I added a small sleep 1 command and it fixed it for now. Tada! strongSwan now has passing autopkgtests on amd64 in unstable (and testing). It's not entirely satisfying, but still.

Next steps would be to add tests for the new daemon using the swanctl interface, but that'll be for Forky (Trixie+1).

23 May, 2025 02:49PM by Yves-Alexis (corsac@debian.org)

Sven Hoexter

pflogsumm 1.1.6

Mainly relevant for the few who still run their own mail server and use Postfix + pflogsumm.

A few weeks back Jim contacted me to say he's going to pick up work on pflogsumm again, and as a first step wanted to release 1.1.6 to incorporate patches from the Debian package. That release is now out. Since we're already in the Trixie freeze, the package is in experimental, but as usual it should be fine to install manually.

Heads Up - Move to /usr/bin

I took that as an opportunity to move pflogsumm from /usr/sbin to /usr/bin! There was not really a good reason to ever have it in sbin. It's neither a system binary, nor statically linked (like in the very old days), nor something that really only makes sense to be used as root. Some out there likely have custom scripts which use the full path rather than relying on an adjusted PATH variable; those scripts require an update, for example as sketched below.
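
A minimal illustration of the change needed in such a script (the options and log path are illustrative; adjust them to your own setup):

shell
# before the move
/usr/sbin/pflogsumm -d yesterday /var/log/mail.log
# after the move
/usr/bin/pflogsumm -d yesterday /var/log/mail.log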

23 May, 2025 11:52AM

May 22, 2025

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppArmadillo 14.4.3-1 on CRAN: Small Upstream Bug Fix

armadillo image

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language–and is widely used by (currently) 1251 other packages on CRAN, downloaded 39.8 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 628 times according to Google Scholar.

Conrad released a minor bugfix version yesterday which addresses corner cases with non-finite values in sparse matrices. And despite conference traveling, I managed to wrap this up and ship it to CRAN where it appeared yesterday. The changes since the last CRAN release are summarised below.

Changes in RcppArmadillo version 14.4.3-1 (2025-05-21)

  • Upgraded to Armadillo release 14.4.3 (Filtered Espresso)

    • Fix for several corner cases involving handling of non-finite elements by sparse matrices

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

22 May, 2025 01:19PM

Scarlett Gately Moore

KDE Application Snaps 25.04.1 with Major Bug Fix!, Life (Good news finally!)

Snaps!

I actually released last week 🙂 I haven’t had time to blog, but today is my birthday and I’m taking some time to myself!

This release came with a major bugfix. As it turns out, our applications were very crashy on non-KDE platforms, including Ubuntu proper. Unfortunately this had been the case for years, and I didn’t know: developers were closing the bug reports as invalid because users couldn’t provide a stacktrace. I have now convinced most developers to assign snap bugs to the Snap platform so I at least get a chance to try and fix them. So with that said, if you tried our snaps in the past and gave up in frustration, please do try them again! I also spent some time cleaning up our snaps to only have current releases in the store, as rumor has it snapcrafters will be responsible for any security issues. With the 200+ snaps I maintain, that is a lot of responsibility. We’ll see if I can pull it off.

Life!

My last surgery was a success! I am finally healing and out of a sling for the first time in almost a year. I have also lined up a good amount of web work for next month and hopefully beyond. I have decided to drop the piece work for donations and will only accept per project proposals for open source work. I will continue to maintain KDE snaps for as long as time allows. A big thank you to everyone that has donated over the last year to fund my survival during this broken arm fiasco. I truly appreciate it!

With that said, if you want to drop me a donation for my work, birthday, or well-being until I get paid for the aforementioned web work, please do so here:

22 May, 2025 12:49PM by sgmoore

May 21, 2025

hackergotchi for Bits from Debian

Bits from Debian

EDF Platinum Sponsor of DebConf25

edf-logo

We are pleased to announce that EDF has committed to sponsor DebConf25 as a Platinum Sponsor.

EDF is a leading global utility company focused on low-carbon power generation. The group uses advanced engineering and scientific computing tools to drive innovation and efficiency in its operations, especially in nuclear power plant design and safety assessment.

Since 2003, the EDF Group has been using Debian as its main scientific computing environment. Debian's focus on stability and reproducibility ensures that EDF's calculations and simulations produce consistent and accurate results.

With this commitment as Platinum Sponsor, EDF is contributing to the annual Debian Developers' Conference, directly supporting the progress of Debian and Free Software. EDF contributes to strengthening the worldwide community that collaborates on Debian projects year-round.

Thank you very much, EDF, for your support of DebConf25!

Become a sponsor too!

DebConf25 will take place from July 14th to 19th 2025 in Brest, France, and will be preceded by DebCamp, from July 7th to 13th 2025.

DebConf25 is accepting sponsors! Interested companies and organizations may contact the DebConf team through sponsors@debconf.org, and visit the DebConf25 website at https://debconf25.debconf.org/sponsors/become-a-sponsor/.

21 May, 2025 12:50AM by Sahil Dhiman

May 20, 2025

Arturo Borrero González

Wikimedia Cloud VPS: IPv6 support

Cape Town (ZA), Sea Point, Nachtansicht

Dietmar Rabich, Cape Town (ZA), Sea Point, Nachtansicht — 2024 — 1867-70 – 2, CC BY-SA 4.0

This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.

Wikimedia Cloud VPS is a service offered by the Wikimedia Foundation, built using OpenStack and managed by the Wikimedia Cloud Services team. It provides cloud computing resources for projects related to the Wikimedia movement, including virtual machines, databases, storage, Kubernetes, and DNS.

A few weeks ago, in April 2025, we were finally able to introduce IPv6 to the cloud virtual network, enhancing the platform’s scalability, security, and future-readiness. This is a major milestone, many years in the making, and serves as an excellent point to take a moment to reflect on the road that got us here. There were definitely a number of challenges that needed to be addressed before we could get into IPv6. This post covers the journey to this implementation.

The Wikimedia Foundation was an early adopter of the OpenStack technology, and the original OpenStack deployment in the organization dates back to 2011. At that time, IPv6 support was still nascent and had limited implementation across various OpenStack components. In 2012, the Wikimedia cloud users formally requested IPv6 support.

When Cloud VPS was originally deployed, we had set up the network following some of the upstream-recommended patterns:

  • nova-networks as the engine in charge of the software-defined virtual network
  • using a flat network topology – all virtual machines would share the same network
  • using a physical VLAN in the datacenter
  • using Linux bridges to make this physical datacenter VLAN available to virtual machines
  • using a single virtual router as the edge network gateway, also executing a global egress NAT – barring some exceptions, using what was called “dmz_cidr” mechanism

In order for us to be able to implement IPv6 in a way that aligned with our architectural goals and operational requirements, pretty much all the elements in this list would need to change. First of all, we needed to migrate from nova-networks into Neutron, a migration effort that started in 2017. Neutron was the more modern component to implement software-defined networks in OpenStack. To facilitate this transition, we made the strategic decision to backport certain functionalities from nova-networks into Neutron, specifically the “dmz_cidr” mechanism and some egress NAT capabilities.

Once in Neutron, we started to think about IPv6. In 2018 there was an initial attempt to decide on the network CIDR allocations that Wikimedia Cloud Services would have. This initiative encountered unforeseen challenges and was subsequently put on hold. We focused on removing the previously backported nova-networks patches from Neutron.

Between 2020 and 2021, we initiated another significant network refresh. We were able to introduce the cloudgw project, as part of a larger effort to rework the Cloud VPS edge network. The new edge routers allowed us to drop all the custom backported patches we had in Neutron from the nova-networks era, unblocking further progress. Worth mentioning that the cloudgw router would use nftables as firewalling and NAT engine.

A pivotal decision in 2022 was to expose the OpenStack APIs to the internet, which crucially enabled infrastructure management via OpenTofu. This was key in the IPv6 rollout as will be explained later. Before this, management was limited to Horizon – the OpenStack graphical interface – or the command-line interface accessible only from internal control servers.

Later, in 2023, following the OpenStack project’s announcement of the deprecation of the neutron-linuxbridge-agent, we began to seriously consider migrating to the neutron-openvswitch-agent. This transition would, in turn, simplify the enablement of “tenant networks” – a feature allowing each OpenStack project to define its own isolated network, rather than all virtual machines sharing a single flat network.

Once we replaced neutron-linuxbridge-agent with neutron-openvswitch-agent, we were ready to migrate virtual machines to VXLAN. Demonstrating perseverance, we decided to execute the VXLAN migration in conjunction with the IPv6 rollout.

We prepared and tested several things, including the rework of the edge routing to be based on BGP/OSPF instead of static routing. In 2024 we were ready for the initial attempt to deploy IPv6, which failed for unknown reasons. There was a full network outage and we immediately reverted the changes. This quick rollback was feasible due to our adoption of OpenTofu: deploying IPv6 had been reduced to a single code change within our repository.

We started an investigation, corrected a few issues, and increased our network functional testing coverage before trying again. One of the problems we discovered was that Neutron would enable the “enable_snat” configuration flag for our main router when adding the new external IPv6 address.

Finally, in April 2025, after many years in the making, IPv6 was successfully deployed.

Compared to the network from 2011, we would have:

  • Neutron as the engine in charge of the software-defined virtual network
  • Ready to use tenant-networks
  • Using a VXLAN-based overlay network
  • Using neutron-openvswitch-agent to provide networking to virtual machines
  • A modern and robust edge network setup

Over time, the WMCS team has skillfully navigated numerous challenges to ensure our service offerings consistently meet high standards of quality and operational efficiency. Often engaging in multi-year planning strategies, we have enabled ourselves to set and achieve significant milestones.

The successful IPv6 deployment stands as further testament to the team’s dedication and hard work over the years. I believe we can confidently say that the 2025 Cloud VPS represents its most advanced and capable iteration to date.

This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.

20 May, 2025 01:00PM

May 19, 2025

Melissa Wen

A Look at the Latest Linux KMS Color API Developments on AMD and Intel

This week, I reviewed the last available version of the Linux KMS Color API. Specifically, I explored the proposed API by Harry Wentland and Alex Hung (AMD), their implementation for the AMD display driver and tracked the parallel efforts of Uma Shankar and Chaitanya Kumar Borah (Intel) in bringing this plane color management to life. With this API in place, compositors will be able to provide better HDR support and advanced color management for Linux users.

To get a hands-on feel for the API’s potential, I developed a fork of drm_info compatible with the new color properties. This allowed me to visualize the display hardware color management capabilities being exposed. If you’re curious and want to peek behind the curtain, you can find my exploratory work on the drm_info/kms_color branch. The README there will guide you through the simple compilation and installation process.

Note: You will need to update libdrm to match the proposed API. You can find an updated version in my personal repository here. To avoid potential conflicts with your official libdrm installation, you can compile and install it in a local directory. Then, use the following command: export LD_LIBRARY_PATH="/usr/local/lib/"

In this post, I invite you to familiarize yourself with the new API that is about to be released. You can start doing as I did below: just deploy a custom kernel with the necessary patches and visualize the interface with the help of drm_info. Or, better yet, if you are a userspace developer, you can start developing user cases by experimenting with it.

The more eyes the better.

KMS Color API on AMD

The great news is that AMD’s driver implementation for plane color operations is being developed right alongside their Linux KMS Color API proposal, so it’s easy to apply to your kernel branch and check it out. You can find details of their progress in the AMD’s series.

I just needed to compile a custom kernel with this series applied, intentionally leaving out the AMD_PRIVATE_COLOR flag. The AMD_PRIVATE_COLOR flag guards driver-specific color plane properties, which experimentally expose hardware capabilities while we don’t have the generic KMS plane color management interface available.

If you don’t know or don’t remember the details of AMD driver specific color properties, you can learn more about this work in my blog posts [1] [2] [3]. As driver-specific color properties and KMS colorops are redundant, the driver only advertises one of them, as you can see in AMD workaround patch 24.

So, with the custom kernel image ready, I installed it on a system powered by AMD DCN3 hardware (i.e. my Steam Deck). Using my custom drm_info, I could clearly see the Plane Color Pipeline with eight color operations as below:

└───"COLOR_PIPELINE" (atomic): enum {Bypass, Color Pipeline 258} = Bypass
    ├───Bypass
    └───Color Pipeline 258
        ├───Color Operation 258
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D Curve
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"CURVE_1D_TYPE" (atomic): enum {sRGB EOTF, PQ 125 EOTF, BT.2020 Inverse OETF} = sRGB EOTF
        ├───Color Operation 263
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = Multiplier
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"MULTIPLIER" (atomic): range [0, UINT64_MAX] = 0
        ├───Color Operation 268
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 3x4 Matrix
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"DATA" (atomic): blob = 0
        ├───Color Operation 273
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D Curve
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"CURVE_1D_TYPE" (atomic): enum {sRGB Inverse EOTF, PQ 125 Inverse EOTF, BT.2020 OETF} = sRGB Inverse EOTF
        ├───Color Operation 278
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D LUT
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   ├───"SIZE" (atomic, immutable): range [0, UINT32_MAX] = 4096
        │   ├───"LUT1D_INTERPOLATION" (immutable): enum {Linear} = Linear
        │   └───"DATA" (atomic): blob = 0
        ├───Color Operation 285
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 3D LUT
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   ├───"SIZE" (atomic, immutable): range [0, UINT32_MAX] = 17
        │   ├───"LUT3D_INTERPOLATION" (immutable): enum {Tetrahedral} = Tetrahedral
        │   └───"DATA" (atomic): blob = 0
        ├───Color Operation 292
        │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D Curve
        │   ├───"BYPASS" (atomic): range [0, 1] = 1
        │   └───"CURVE_1D_TYPE" (atomic): enum {sRGB EOTF, PQ 125 EOTF, BT.2020 Inverse OETF} = sRGB EOTF
        └───Color Operation 297
            ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, Multiplier, 3D LUT} = 1D LUT
            ├───"BYPASS" (atomic): range [0, 1] = 1
            ├───"SIZE" (atomic, immutable): range [0, UINT32_MAX] = 4096
            ├───"LUT1D_INTERPOLATION" (immutable): enum {Linear} = Linear
            └───"DATA" (atomic): blob = 0

Note that Gamescope is currently using AMD driver-specific color properties implemented by me, Autumn Ashton and Harry Wentland. It doesn’t use this KMS Color API, and therefore COLOR_PIPELINE is set to Bypass. Once the API is accepted upstream, all users of the driver-specific API (including Gamescope) should switch to the KMS generic API, as this will be the official plane color management interface of the Linux kernel.

KMS Color API on Intel

On the Intel side, the driver implementation available upstream was built upon an earlier iteration of the API. This meant I had to apply a few tweaks to bring it in line with the latest specifications. You can explore their latest work here. For a more simplified handling, combining the V9 of the Linux Color API, Intel’s contributions, and my necessary adjustments, check out my dedicated branch.

I then compiled a kernel from this integrated branch and deployed it on a system featuring Intel TigerLake GT2 graphics. Running my custom drm_info revealed a Plane Color Pipeline with three color operations as follows:

├───"COLOR_PIPELINE" (atomic): enum {Bypass, Color Pipeline 480} = Bypass
│   ├───Bypass
│   └───Color Pipeline 480
│       ├───Color Operation 480
│       │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, 1D LUT Mult Seg, 3x3 Matrix, Multiplier, 3D LUT} = 1D LUT Mult Seg
│       │   ├───"BYPASS" (atomic): range [0, 1] = 1
│       │   ├───"HW_CAPS" (atomic, immutable): blob = 484
│       │   └───"DATA" (atomic): blob = 0
│       ├───Color Operation 487
│       │   ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, 1D LUT Mult Seg, 3x3 Matrix, Multiplier, 3D LUT} = 3x3 Matrix
│       │   ├───"BYPASS" (atomic): range [0, 1] = 1
│       │   └───"DATA" (atomic): blob = 0
│       └───Color Operation 492
│           ├───"TYPE" (immutable): enum {1D Curve, 1D LUT, 3x4 Matrix, 1D LUT Mult Seg, 3x3 Matrix, Multiplier, 3D LUT} = 1D LUT Mult Seg
│           ├───"BYPASS" (atomic): range [0, 1] = 1
│           ├───"HW_CAPS" (atomic, immutable): blob = 496
│           └───"DATA" (atomic): blob = 0

Observe that Intel’s approach introduces additional properties like “HW_CAPS” at the color operation level, along with two new color operation types: 1D LUT with Multiple Segments and 3x3 Matrix. It’s important to remember that this implementation is based on an earlier stage of the KMS Color API and is awaiting review.

A Shout-Out to Those Who Made This Happen

I’m impressed by the solid implementation and clear direction of the V9 of the KMS Color API. It aligns with the many insightful discussions we’ve had over the past years. A huge thank you to Harry Wentland and Alex Hung for their dedication in bringing this to fruition!

Beyond their efforts, I deeply appreciate Uma and Chaitanya’s commitment to updating Intel’s driver implementation to align with the freshest version of the KMS Color API. The collaborative spirit of the AMD and Intel developers in sharing their color pipeline work upstream is invaluable. We’re now gaining a much clearer picture of the color capabilities embedded in modern display hardware, all thanks to their hard work, comprehensive documentation, and engaging discussions.

Finally, thanks to all the userspace developers, color science experts, and kernel developers from various vendors who actively participate in the upstream discussions, meetings, workshops, each iteration of this API, and the crucial code review process. I’m happy to be part of the final stages of this long kernel journey, but I know that when it comes to colors, one step is completed for new challenges to be unlocked.

Looking forward to meeting you at this year's Linux Display Next hackfest, organized by AMD in Toronto, to further discuss HDR, advanced color management, and other display trends.

19 May, 2025 09:05PM

May 17, 2025

Andrew Cater

Debian 12.11 - testing completed, images being signed and we'll be back for the next point release on ???

 All finished and wrapping up. The bug I thought was fixed has been identified on two distinct sets of hardware. There are workarounds: the most sensible is *not* to use i386 without a modeset parameter but to just use amd64 instead. amd64 works on the identical problematic hardware in question - just use 64 bit.

17 May, 2025 06:00PM by Andrew Cater (noreply@blogger.com)