June 19, 2023

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppArmadillo 0.12.4.1.0 on CRAN: New Upstream Bugfix

armadillo image

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language, and is widely used by (currently) 1079 other packages on CRAN, downloaded 29.6 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 543 times according to Google Scholar.

This release brings the upstream bugfix release 12.4.1 made by Conrad at the end of last week. As usual, I prepared a release candidate, tested it against the over 1000 reverse dependencies (which sadly takes a long time on old hardware), found no issues, and sent it to CRAN. There it got tested again and was, by a stroke of bad luck, held up over two unrelated issues (one package fell over one of its other dependencies changing a data representation, another fell afoul of a tightened limit on total test time), so this awaited the usual email handshake with the CRAN maintainers … and the weekend got in the way. The release also contains a PR kindly provided by Mikael Jagan for an upcoming change in package Matrix.

As a bugfix release, the set of changes is fairly small.

Changes in RcppArmadillo version 0.12.4.1.0 (2023-06-17)

  • Upgraded to Armadillo release 12.4.1 (Cortisol Profusion Redux)

    • fix bug in SpMat::shed_cols()

    • functions such as .is_finite() and find_nonfinite() will now emit a runtime warning when compiled in fast math mode; such compilation mode disables detection of non-finite values

  • Accommodate upcoming change in package Matrix (Mikael Jagan in #417 addressing #415)

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

19 June, 2023 02:05PM

hackergotchi for Daniel Lange

Daniel Lange

Linux kernel USB errors -71 and -110

After an upgrade of my PC's mainboard BIOS the boot would take a minute or more to complete and sometimes the lightdm login screen would sit there but not accept keyboard input for another minute or so. Then the keyboard got enabled and I could log in normally. Everything worked fine after that bootup struggle completed. This was fully reproducible and persisted across reboots. Weird.

The kernel dmesg log showed entries that looked suspicious:

dmesg log excerpt showing USB error messages

Googling these error -110 and error -71 messages is a bit hard. Why the USB driver does not give useful error messages instead of archaic errno-style numbers escapes me. This is not the 80s anymore.

The wisdom of the crowd (citation needed, Wikipedia style) says error -110 is something around "the USB port power supply was exceeded" [source].

Now lsusb -tv shows device 1-7 ... to be my USB keyboard. I somehow doubt that it wants more power than the hub is willing to provide.

The Archlinux BBS Forums recommend piecing together information from drivers/usb/host/ohci.h and (updated from their advice, which is from 2012) /tools/include/uapi/asm-generic/errno.h. This is why some people consider -110 to mean "Connection timed out". Nah, not likely either.
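If you want to do that lookup yourself, the userspace copy of the kernel's errno headers will do (a quick sketch; the header path may vary by distribution):

$ grep -w 110 /usr/include/asm-generic/errno.h
#define ETIMEDOUT       110     /* Connection timed out */
$ grep -w 71 /usr/include/asm-generic/errno.h
#define EPROTO          71      /* Protocol error */

So the numbers do map to ETIMEDOUT and EPROTO. Whether "connection timed out" and "protocol error" describe what actually happened on the USB bus is another question.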

Reading through the kernel source around drivers/usb/host did not enlighten me either. To the contrary. Uuugly. There seems to be no comprehensive list of what these error codes mean. The numbers are assigned to error conditions quite arbitrarily. And - of course - there is no documentation. "It was hard to do, so it should be hard to understand as well."

Luckily some of the random musings I read through contained some curious advice: power cycle the host. So I did, but that did not make the error go away. Other people insisted on pulling cables out of wall sockets, unplugging everything and conducting esoteric rituals. That made it dawn on me: the mainboard of course nicely keeps the USB powered even in the "off" state. So switching the power supply off (yes, these have a separate switch, go find yours), waiting a bit for the capacitors to drain, switching things back on and ... the errors were gone, and the system booted within seconds again.

So the takeaway message: If you get random error messages like

device descriptor read/64, error -110
device not accepting address 42, error -71

on devices that previously worked fine ... completely remove power from the host, the hubs and the USB devices, so they forget they ever saw each other on the bus. When they see each other again after that blackout, they will happily negotiate the protocol details with each other again, successfully.

19 June, 2023 11:00AM by Daniel Lange

hackergotchi for C.J. Collier

C.J. Collier

First taste of Debian 12

As some of you may know, the Debian project released v12, bookworm, to stable on the 10th of this month. I haven’t had a reason to try it yet, but I’m downloading it now. My first thought is that it’s much larger than I expected. The normal-sized version used to fit on a CD-ROM, so around 650MB. The netinst has until now been even smaller, with the most recent versions being about 256MB if I recall correctly. The netinst, now with proprietary firmware, weighs in at over 700MB:

2023-06-18 20:29:29 (6.85 MB/s) - ‘debian-12.0.0-amd64-netinst.iso’ saved [773849088/773849088]

That works out to 738.0 MiB!

The system I’m installing is for a piece of software I’m crafting for use by Remote Online Notaries. I’m building a disk image for the notary’s side of the connection. The notary will have a yubikey-style PGP card which they will use as a digital seal, similar to the one required to perform notarial acts for the State of Washington. I’ll leave some ramblings on the blog as I go through the process of implementation. Here’s a link to whet your appetite!

https://app.leg.wa.gov/wac/default.aspx?cite=308-30-020

19 June, 2023 04:49AM by C.J. Collier

June 18, 2023

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

spdl 0.0.5 on CRAN: Small Extension

Another quick update to the still somewhat new package spdl is now on CRAN, and will go to Debian soon too. The key focus of spdl is to offer the exact same interface to logging from both R and C++ by relying on spdlog via my RcppSpdlog package. Usage examples are shown on the RcppSpdlog docs page.

This release adds support for the wrappers init() and log(), which wrap the existing setup() function but require only the level argument. This requires version 0.0.13 of RcppSpdlog, which was released to CRAN yesterday.

The short NEWS entry follows.

Changes in spdl version 0.0.5 (2023-06-18)

  • Add simple aliases init() and log() wrapping setup() but requiring only the logging level argument

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the spdl page.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

18 June, 2023 04:11PM

hackergotchi for Louis-Philippe Véronneau

Louis-Philippe Véronneau

Solo V2: nice but flawed

I recently received the two Solo V2 hardware tokens I ordered as part of their crowdfunding campaign, back in March 2022. It did take them longer than advertised to ship me the tokens, but that's hardly unexpected from such a small-scale, crowdfunded undertaking.

I'm mostly happy about my purchase and I'm glad to get rid of the aging Tomu boards I was using as U2F tokens1. Still, beware: I am not sure it's a product I would recommend if what you want is simply something that works. If you do not care about open-source hardware, the Solo V2 is not for you.

The Good

A side-by-side view of the Solo V2's top and back sides

I first want to mention I find the Solo V2 gorgeous. I really like the black and gold color scheme of the USB-A model (which is reversible!) and it seems like a well built and solid device. I'm not afraid to have it on my keyring and I fully expect it to last a long time.

An animation of the build process, showing how the PCB is assembled and then slotted into the shell

I'm also very impressed by the modular design: the PCB sits inside a shell, which decouples the logic from the USB interface and lets them manufacture a single board for both the USB-C and USB-A models. The clear epoxy layer on top of the PCB module also looks very nice in my opinion.

A picture of the Solo V2 with its silicone case on my keyring, showing the 3 capacitive buttons

I'm also very happy the Solo V2 has capacitive touch buttons instead of physical "clicky" buttons, as it means the device has no moving parts. The token has three buttons (the gold metal strips): one on each side of the device and a third one near the keyhole.

As far as I've seen, the FIDO2 functions seem to work well via the USB interface and do not require any configuration on a Debian 12 machine. I've already migrated to the Solo V2 for web-based 2FA and I am in the process of migrating to an SSH ed25519-sk key. Here is a guide I recommend if you plan on setting those up with a Solo V2.

The Bad and the Ugly

Sadly, the Solo V2 is far from being a perfect project. First of all, since the crowdfunding campaign is still being fulfilled, it is not currently commercially available. Chances are you won't be able to buy one directly before at least Q4 2023.

I've also hit what seems to be a pretty big firmware bug, or at least, one that affects my use case quite a bit. Invoking gpg crashes the Solo V2 completely if you also have scdaemon installed. Since scdaemon is necessary to use gpg with an OpenPGP smartcard, this means you cannot issue any gpg commands (like signing a git commit...) while the Solo V2 is plugged in.

Any gpg command that queries scdaemon, such as gpg --edit-card or gpg --sign foo.txt, times out after about 20 seconds and leaves the token unresponsive to both touch and CLI commands.

The way to "fix" this issue is to make sure scdaemon does not interact with the Solo V2 anymore, using the reader-port argument (a consolidated transcript follows the steps):

  1. Plug both your Solo V2 and your OpenPGP smartcard

  2. To get a list of the tokens scdaemon sees, run the following command: $ echo scd getinfo reader_list | gpg-connect-agent --decode | awk '/^D/ {print $2}'

  3. Identify your OpenPGP smartcard. For example, my Nitrokey Start is listed as 20A0:4211:FSIJ-1.2.15-43211613:0

  4. Create a file at ~/.gnupg/scdaemon.conf with the following line: reader-port $YOUR_TOKEN_ID. For example, in my case I have: reader-port 20A0:4211:FSIJ-1.2.15-43211613:0

  5. Reload scdaemon: $ gpgconf --reload scdaemon
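Put together, the whole workaround looks something like this (a sketch using my token ID from step 3; substitute your own):

$ echo scd getinfo reader_list | gpg-connect-agent --decode | awk '/^D/ {print $2}'
20A0:4211:FSIJ-1.2.15-43211613:0
...
$ echo 'reader-port 20A0:4211:FSIJ-1.2.15-43211613:0' >> ~/.gnupg/scdaemon.conf
$ gpgconf --reload scdaemon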

Although this is clearly a firmware bug2, I do believe GnuPG is also partly to blame here. Let's just say I was not very surprised to have to battle scdaemon again, as I've had previous issues with it.

Which leads me to my biggest gripe so far: it seems SoloKeys (the company) isn't really fixing firmware issues anymore and doesn't seem to care. The last firmware release is about a year old.

Although people are experiencing serious bugs, there is no official way to report them, which leads to issues being seemingly ignored. For example, the NFC feature is apparently killing keys (!!!), but no one from the company seems to have acknowledged the issue. The same goes for my GnuPG bug, which was flagged in September 2022.

For a project that mainly differentiates itself from its (superior) competition by being "Open", it's not a very good look... Although "SoloKeys is still an unprofitable open source side business of its creators"3, this kind of attitude certainly doesn't help foster trust.

Conclusion

If you want to have a nice, durable FIDO2 token, I would suggest you get one of the many models Yubico offers. They are similarly priced, are readily commercially available, are part of a nice and maintained software ecosystem and have more features than the Solo V2 (OpenPGP support being the one I miss the most). Yubikeys are the practical option.

What they are not is open-source hardware, whereas the Solo V2 is. As bunnie explained very well on his blog in 2019, it does not mean the latter is inherently more trustworthy than the former, but it does make the Solo V2 the ideological option. Knowledge is power and it should be free.

As such, tread carefully with SoloKeys, but don't dismiss them altogether: the Solo V2 is certainly functioning well enough for me.


  1. Although U2F is still part of the FIDO2 specification, the Tomus predate this standard and were thus not fully compliant with FIDO2. So long, and thanks for all the fish, little boards; you've served me well! 

  2. It appears the Solo V2 shares its firmware with the Nitrokey 3, which had a similar issue a while back. 

  3. This is a direct quote from one of the Solo V2 firmware maintainers. 

18 June, 2023 04:00AM by Louis-Philippe Véronneau

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppSpdlog 0.0.13 on CRAN: Small Extensions

Version 0.0.13 of RcppSpdlog is now on CRAN and will soon be uploaded to Debian too. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want, written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the package documentation site.

This release adds a small (but handy) accessor generalisation: Instead of calling setup() with two arguments for a label and the logging level we now only require the desired level. We also cleaned up one implementation detail for the stopwatch feature added in January, and simplified the default C++ compilation standard setting.

The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.13 (2023-06-17)

  • Minor tweak to stopwatch setup avoids pulling in fmt

  • No longer set a C++ compilation standard as the default choices by R are sufficient for the package

  • Add convenience wrapper log_init omitting first argument to log_setup while preserving the interface from the latter

  • Add convenience setup wrappers init and log to API header file spdl.h

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppSpdlog page, or the package documentation site.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

18 June, 2023 01:25AM

June 17, 2023

hackergotchi for Bastian Venthur

Bastian Venthur

Blag 2.0 released

A few days ago, I released a major update on blag, my blog-aware static-site generator, which introduces a few backwards-incompatible changes and many improvements over the old version.

Good-looking default theme

The old bare-bones default theme has been replaced with a good-looking one, based on the one used on this blog:

Blag Screenshot

It comes with a light and a dark theme that switch automatically based on the browser setting, as well as matching light and dark syntax-highlighting themes for code blocks.

Improved quickstart

The blag quickstart command has been improved. In addition to generating the configuration, it now also populates the working directory with the templates, static and content directories, containing the updated default theme and a few content pages to get you started.

No internal fallback templates anymore

Related to the changes in quickstart, the internal fallback template has been removed; blag now relies completely on the templates in the local templates directory. This makes it more transparent to the user what is happening, while simplifying blag’s internal logic.

However, this is a backwards incompatible change! In the case of a missing template, the user will be warned with a hint on how to obtain the missing template.

Index and archive are now separate

Previously, the front page would always show the archive of all articles. This is not very useful once your blog contains more than a few dozen articles. With blag 2.0, the previous archive has been split into index and archive, where index is the front page showing only the most recent articles (15 by default) and linking to the archive, which shows all articles. There are also two corresponding templates in the templates directory.

Miscellaneous

  • Various dependencies have been updated.
  • blag’s documentation has been migrated from Sphinx to MkDocs, which is a bit more lightweight and easier to maintain.
  • A packaging issue has been fixed, where tests/conftest.py was missing from the source distribution.

Blag 2.0 is available on PyPI, Debian unstable and GitHub.

17 June, 2023 01:45PM by Bastian Venthur

John Goerzen

Using dar for Data Archiving

This is the third post in a series about data archiving to removable media (optical discs and hard drives). In the first, I explained the difference between backing up and archiving, established goals for the project, and said I’d evaluate git-annex and dar. The second post evaluated git-annex, and now it’s time to look at dar. The series will conclude with a post comparing git-annex with dar.

What is dar?

I could open with the same thing I did with git-annex, just changing the name of the program: “[dar] is a fantastic and versatile program that does… well, it’s one of those things that can do so much that it’s a bit hard to describe.” It is, fundamentally, an archiver like tar or zip (makes one file representing a bunch of other files), but it goes far beyond that. dar’s homepage lays out a comprehensive list of features, which I will try to summarize here.

  • Dar itself is both a library (with C++ and Python bindings) for interacting with data, and a CLI tool (dar itself).
  • Alongside this, there is an ecosystem of tools around dar, including GUIs for multiple platforms, backup scripts, and FUSE implementations.
  • Dar is like tar in that it can read and write files sequentially if desired. Dar archives can be streamed, just like tar archives. But dar takes it further; if you have dar_slave on the remote end, random access is possible over ssh (dramatically speeding up certain operations).
  • Dar is like zip in that a dar archive contains a central directory (called a catalog) which permits random access to the contents of an archive. In other words, you don’t have to read an entire archive to extract just one file (assuming the archive is on disk or something that itself permits random access). Also, dar can compress each file individually, rather than the tar approach of compressing the archive as a whole. This increases archive performance (dar knows not to try to compress already-compressed data), boosts restore resilience (corruption of one part of an archive doesn’t invalidate the entire rest of it), and boosts restore performance (permitting random access).
  • Dar can split an archive into multiple pieces called slices, and it can even split member files among the slices. The catalog contains information allowing you to know which slice(s) a given file is saved in.
  • The catalog can also be saved off in a file of its own (dar calls this an “isolated catalog”). Isolated catalogs record just metadata about files archived.
  • dar_manager can assemble a database by reading archives or isolated catalogs, letting you know where files are stored and facilitating restores using the minimal number of discs.
  • Dar supports differential/incremental backups, which record changes since the last backup. These backups record not just additions, but also deletions. dar can optionally use rsync-style binary deltas to minimize the space needed to record changes. Dar does not suffer from GNU tar’s data loss bug with incrementals.
  • Dar can “slice and dice” archives like Perl does strings. The usage notes page shows how you can merge archives, create decremental archives (where the full backup always reflects the current state of the system, and incrementals go backwards in time instead of forwards), etc. You can change the compression algorithm on an existing archive, re-slice it, etc.
  • Dar is extremely careful about preserving all metadata: hard links, sparse files, symlinks, timestamps (including subsecond resolution), EAs, POSIX ACLs, resource forks on Mac, detecting files being modified while being read, etc. It makes a nice way to copy directories, sort of similar to rsync -avxHAXS.

So to tie this together for this project, I will set up a 400MB slice size (to mimic what I did with git-annex), and see how dar saves the data and restores it.

Isolated catalogs aren’t strictly necessary for this, but by using them (and/or dar_manager), we can build up a database of files and locations and thus directly compare dar to git-annex’s location tracking.

Walkthrough: Creating the first archive

As with the git-annex walkthrough, I’ll set some variables to make it easy to remember:

  • $SOURCEDIR is the directory being backed up
  • $DRIVE is the directory for backups to be stored in. Since dar can split by a specified size, I don’t need to make separate filesystems to simulate the separate drive experience as I did with git-annex.
  • $CATDIR will hold isolated catalogs
  • $DARDB points to the dar_manager database

OK, we can run the backup immediately. No special setup is needed. dar supports both short-form (single-character) parameters and long-form ones. Since the parameters probably aren’t familiar to everyone, I will use the long-form ones in these examples.

Here’s how we create our initial full backup. I’ll explain the parameters below:


$ dar \
--verbose \
--create $DRIVE/bak1 \
--on-fly-isolate $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR

Let’s look at each of these parameters:

  • --verbose does what you expect
  • --create selects the operation mode (like tar -c) and gives the archive basename
  • --on-fly-isolate says to write an isolated catalog as well, right while making the archive. You can always create an isolated catalog later (which is fast, since it only needs to read the last bits of the last slice) but it’s more convenient to do it now, so we do. We give the base name for the isolated catalog also.
  • --slice 400M says to split the archive, and create slices of 400MB each.
  • --min-digits 2 pertains to naming files. Without it, dar would create files named bak1.dar.1, bak1.dar.2, bak1.dar.10, etc. dar works fine with this, but it can be annoying in ls. This is just a convenience for humans.
  • --pause tells dar to pause after writing each slice. This would let us swap drives, burn discs, etc. I do this for demonstration purposes only; it isn’t strictly necessary in this situation. For a more powerful option, dar also supports --execute, which can run commands after each slice.
  • --fs-root gives the path to actually back up.

This same command could have been written with short options as:


$ dar -v -c $DRIVE/bak1 -@ $CATDIR/bak1 -s 400M -9 2 -p -R $SOURCEDIR

What does it look like while running? Here’s an excerpt:


...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/[redacted]
Finished writing to file 1, ready to continue ? [return = YES | Esc = NO]
...
Writing down archive contents...
Closing the escape layer...
Writing down the first archive terminator...
Writing down archive trailer...
Writing down the second archive terminator...
Closing archive low layer...
Archive is closed.

--------------------------------------------
581 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
0 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
0 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 581 inode(s)
--------------------------------------------
Making room in memory (releasing memory used by archive of reference)...
Now performing on-fly isolation...
...

That was easy! Let’s look at the contents of the backup directory:


$ ls -lh $DRIVE
total 3.7G
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.01.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.02.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.03.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.04.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.05.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.06.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.07.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.08.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:29 bak1.09.dar
-rw-r--r-- 1 jgoerzen jgoerzen 156M Jun 16 19:33 bak1.10.dar

And the isolated catalog:


$ ls -lh $CATDIR
total 37K
-rw-r--r-- 1 jgoerzen jgoerzen 35K Jun 16 19:33 bak1.1.dar

The isolated catalog is stored compressed automatically.

Well this was easy. With one command, we archived the entire data set, split into 400MB chunks, and wrote out the catalog data.

Walkthrough: Inspecting the saved archive

Can dar tell us which slice contains a given file? Sure:


$ dar --list $DRIVE/bak1 --list-format=slicing | less
Slice(s)|[Data ][D][ EA ][FSA][Compr][S]|Permission| Filemane
--------+--------------------------------+----------+-----------------------------
...
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
1-2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
...

This illustrates the transition from slice 1 to slice 2. The first file was stored entirely in slice 1; the second partially in slice 1 and partially in slice 2; and the third solely in slice 2. We can get other kinds of information as well.


$ dar --list $DRIVE/bak1 | less
[Data ][D][ EA ][FSA][Compr][S]| Permission | User | Group | Size | Date | filename
--------------------------------+------------+-------+-------+---------+-------------------------------+------------
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 24 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 16 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 22 Mio Mon Mar 5 07:58:09 2018 [redacted]

These are the same files I was looking at before. Here we see they are 24MB, 16MB, and 22MB in size, and some additional metadata. Even more is available in the XML list format.

Walkthrough: updates

As with git-annex, I’ve made some changes in the source directory: moved a file, added another, and deleted one. Let’s create an incremental backup now:


$ dar \
--verbose \
--create $DRIVE/bak2 \
--on-fly-isolate $CATDIR/bak2 \
--ref $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR

This command is very similar to the earlier one. Instead of writing an archive and catalog named bak1, we write one named bak2. What’s new here is --ref $CATDIR/bak1. That says, make an incremental based on an archive of reference. All that is needed from that archive of reference is the detached catalog. --ref $DRIVE/bak1 would have worked equally well here.

Here’s what I did to the $SOURCEDIR:

  • Renamed a file to file01-unchanged
  • Deleted a file
  • Copied /bin/cp to a file named cp

Let’s see if dar’s command output matches this:


...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/cp
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/cp
Adding folder to archive: [redacted]
Saving Filesystem Specific Attributes for [redacted]
Adding reference to files that have been destroyed since reference backup...
...
--------------------------------------------
3 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
578 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
2 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 583
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 3 inode(s)
--------------------------------------------
...

Yes, it does. The rename is recorded as a deletion and an addition, since dar doesn’t directly track renames. So the rename plus the deletion account for the two deletions. The rename plus the addition of cp count as 2 of the 3 inodes saved; the third is the modified directory from which files were deleted and moved out.

Let’s see the files that were created:


$ ls -lh $DRIVE/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 18M Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/drive/bak2.01.dar
$ ls -lh $CATDIR/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 22K Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/cat/bak2.1.dar

What does --list look like now?


Slice(s)|[Data ][D][ EA ][FSA][Compr][S]|Permission| Filemane
--------+--------------------------------+----------+-----------------------------
[ ][ ] [---][-----][X] -rwxr--r-- [redacted]
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- file01-unchanged
...
[--- REMOVED ENTRY ----][redacted]
[--- REMOVED ENTRY ----][redacted]

Here I show an example of:

  1. A file that was not changed from the initial backup. Its presence was simply noted, but because we’re doing an incremental, the data wasn’t saved.
  2. A file that is saved in this incremental, on slice 1.
  3. The two deleted files

Walkthrough: dar_manager

As we’ve seen above, the two archives (or their detached catalogs) give us a complete picture of what files were present at the time of the creation of each archive, and what files were stored in a given archive. We can certainly continue working that way. We can also use dar_manager to build a comprehensive database of these archives, to be able to find what media is necessary to restore each given file. Or, with dar_manager’s --when parameter, we can restore files as of a particular date.

Let’s try it out. First, we create our database:


$ dar_manager --create $DARDB
$ dar_manager --base $DARDB --add $DRIVE/bak1
Auto detecting min-digits to be 2
$ dar_manager --base $DARDB --add $DRIVE/bak2
Auto detecting min-digits to be 2

Here we created the database, and added our two catalogs to it. (Again, we could have as easily used $CATDIR/bak1; either the archive or its isolated catalog will work here.) It’s important to add the catalogs in order.

Let’s do some quick experimentation with dar_manager:


$ dar_manager -v --base $DARDB --list
Decompressing and loading database to memory...

dar path :
dar options :
database version : 6
compression used : gzip
compression level: 9

archive # | path | basename
------------+--------------+---------------
1 /acrypt/no-backup/jgoerzen/dar-testing/drive bak1
2 /acrypt/no-backup/jgoerzen/dar-testing/drive bak2

$ dar_manager --base $DARDB --stat
archive # | most recent/total data | most recent/total EA
--------------+-------------------------+-----------------------
1 580/581 0/0
2 3/3 0/0

The --list option shows the correlation between the dar_manager archive number (1, 2) and the filenames (bak1, bak2). It is coincidence here that 1/bak1 and 2/bak2 correlate; that’s not necessarily the case. Most dar_manager commands operate on the archive number, while dar commands operate on the archive path/basename.

Now let's see just what files are saved in archive 2, the incremental:


$ dar_manager --base $DARDB --used 2
[ Saved ][ ] [redacted]
[ Saved ][ ] file01-unchanged
[ Saved ][ ] cp

Now we can also see where a file is stored. Here’s one that was saved in the full backup and unmodified in the incremental:


$ dar_manager --base $DARDB --file [redacted]
1 Fri Jun 16 19:15:12 2023 saved absent
2 Fri Jun 16 19:15:12 2023 present absent

(The absent at the end refers to extended attributes, which the file didn’t have.)

Similarly, for files that were added or removed, they’ll be listed only at the appropriate place.

Walkthrough: Restoration

I’m not going to repeat the author’s full restoration with dar page, but here are some quick examples.

A simple way of doing everything is using incrementals for the whole series. To do that, you’d have bak1 be full, bak2 based on bak1, bak3 based on bak2, bak4 based on bak3, etc. To restore from such a series, you have two options:

  • Use dar to simply extract each archive in order. It will handle deletions, renames, etc. along the way.
  • Use dar_manager with the backup database to manage the process. It may be somewhat more efficient, as it won’t bother to restore files that will later be modified or deleted.

If you get fancy — for instance, bak2 is based on bak1, bak3 on bak2, bak4 on bak1 — then you would want to use dar_manager to ensure a consistent restore is completed. Either way, the process is nearly identical. Also, I figure, to make things easy, you can save a copy of the entire set of isolated catalogs before you finalize each disc/drive. They’re so small, and this would let someone with just the most recent disc build a dar_manager database without having to go through all the other discs.
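A sketch of that catalog-stashing idea (a hypothetical layout; dar does not require this, it just makes later database rebuilds convenient):

# before finalizing a disc/drive, stash a copy of every isolated
# catalog so far alongside the data slices
$ mkdir -p $DRIVE/catalogs
$ cp $CATDIR/*.dar $DRIVE/catalogs/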

Anyhow, let’s do a restore using just dar. I’ll make a $RESTOREDIR and do it that way.


$ dar \
--verbose \
--extract $DRIVE/bak1 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"

This --execute lets us see how dar works; it is an illustration of the power it has (above --pause); it’s a snippet interpreted by /bin/sh, with %n being one of the dar placeholders. If memory serves, it’s not strictly necessary, as dar will prompt you for slices it needs if they’re not mounted. Anyhow, you’ll see it first reading the last slice, which contains the catalog, then reading from the beginning.

Here we go:


Auto detecting min-digits to be 2
Opening archive bak1 ...
Opening the archive using the multi-slice abstraction layer...
Ready for slice 10. Press Enter
...
Loading catalogue into memory...
Locating archive contents...
Reading archive contents...
File ownership will not be restored du to the lack of privilege, you can disable this message by asking not to restore file ownership [return = YES | Esc = NO]
Continuing...
Restoring file's data: [redacted]
Restoring file's FSA: [redacted]
Ready for slice 1. Press Enter
...
Ready for slice 2. Press Enter
...
--------------------------------------------
581 inode(s) restored
including 0 hard link(s)
0 inode(s) not restored (not saved in archive)
0 inode(s) not restored (overwriting policy decision)
0 inode(s) ignored (excluded by filters)
0 inode(s) failed to restore (filesystem error)
0 inode(s) deleted
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA restored for 0 inode(s)
FSA restored for 0 inode(s)
--------------------------------------------

The warning is because I’m not doing the extraction as root, which limits dar’s ability to fully restore ownership data.

OK, now the incremental:


$ dar \
--verbose \
--extract $DRIVE/bak2 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"
...
Ready for slice 1. Press Enter
...
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/[redacted directory]
Removing file (reason is file recorded as removed in archive): [redacted file]
Removing file (reason is file recorded as removed in archive): [redacted file]

This all looks right! Now how about we compare the restore to the original source directory?


$ diff -durN $SOURCEDIR $RESTOREDIR

No changes – perfect.

We could instead do this restore via a single dar_manager command, though annoyingly, we’d have to pass all top-level files/directories to dar_manager --restore. But still, it’s one command, and it basically automates and optimizes the dar restores shown above.

Conclusions

Dar makes it extremely easy to just Do The Right Thing when making archives. One command makes a backup. It saves things in simple files. You can make an isolated catalog if you want, and it too is saved in a simple file. You can query what is in the files and where. You can restore from all or part of the files. You can simply play the backups forward, in order, to achieve a full and consistent restore. Or you can load data about them into dar_manager for an optimized restore.

A bit of scripting will be necessary to make incrementals: finding the most recent backup or catalog, and so on. If backup files are named with care — for instance, by date — then this should be a pretty easy task; a minimal sketch follows.
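Here is one such minimal sketch, assuming the bakNNN naming and shell variables from this walkthrough, and that the chosen names sort correctly:

#!/bin/sh
# create a new incremental based on the most recent isolated catalog
set -e
LAST=$(ls "$CATDIR" | sort | tail -n 1 | sed 's/\..*//')
NEW=bak$(date +%Y%m%d)
dar --create "$DRIVE/$NEW" \
    --on-fly-isolate "$CATDIR/$NEW" \
    --ref "$CATDIR/$LAST" \
    --slice 400M --min-digits 2 \
    --fs-root "$SOURCEDIR"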

I haven’t touched on resiliency yet. dar comes with tools for recovering archives that have had portions corrupted or lost. It can also rebuild the catalog if it is corrupted or lost. It adds “tape marks” (or “escape sequences”) to the archive along with the data stream. So every entry in the catalog is actually stored in the archive twice: once alongside the file data, and once at the end in the collected catalog. This allows dar to scan a corrupted file for the tape marks and reconstruct whatever is still intact, even if the catalog is lost. dar also integrates with tools like sha256sum and par2 to simplify archive integrity testing and restoration.

This balances against the need to use a tool (dar, optionally with a GUI frontend) to restore files. I’ll discuss that more in the next post.

17 June, 2023 01:16AM by John Goerzen

June 16, 2023

hackergotchi for Lisandro Damián Nicanor Pérez Meyer

Lisandro Damián Nicanor Pérez Meyer

Qt 6 in Debian bullseye, take 2

Bookworm has been released and Bullseye is now oldstable. Nonetheless, today I took the time to update the Qt 6 backports so they are as close to Bookworm as possible. Unless security fixes are needed, these ought to be the last uploads of Qt 6 to bullseye-backports.

Hope you enjoyed them, and thanks again The Qt Company and ICS for making this possible.

16 June, 2023 01:45PM by Lisandro Damián Nicanor Pérez Meyer

Russell Coker

BOINC and Idle Users

The BOINC distributed computing client in Debian (Bookworm and previous releases) can check the idle time via the X11 protocol and run GPU jobs when the interactive user is idle, so the user gets GPU power for graphics when they need it, and when it’s idle BOINC uses it. This doesn’t work on Wayland, and unfortunately no-one has written a Wayland equivalent of xprintidle (which prints how long the X11 session has been idle, in milliseconds).

In the Debian bug system there is bug #813728 about a message every second due to failed attempts to find X11 idle time [1]. On my main workstation with Wayland it logs “Authorization required, but no authorization protocol specified“.

There is also bug #775125 about BOINC not detecting mouse movements [2]; I added to it about the issues with Wayland. There’s the package swayidle in Debian that is designed to manage the screen-saving process on Wayland; below is an example of how to use it to display output after 5 seconds and 10 seconds of idle.

swayidle -w timeout 5 'echo 5' timeout 10 'echo 10' resume 'echo resume' before-sleep 'echo before-sleep'

The code for swayidle has only 7 comments and isn’t easy to read. I looked into writing a Wayland equivalent of xprintidle, but it would take more work than I’m prepared to invest in it. So it seems to me that the best option might be to have BOINC receive SIGUSR1 and SIGUSR2 for the start and stop of idle time, and then have scripts call xprintidle, swayidle, a wrapper for “w” (for systems without graphics), or other methods; a rough sketch of that glue follows. To run swayidle as root you can set WAYLAND_DISPLAY=../$USER_ID/wayland-0.
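For illustration, something like this could drive the signals on Wayland (hypothetical: stock BOINC does not currently act on SIGUSR1/SIGUSR2, and the 300 second threshold is arbitrary):

#!/bin/sh
# hypothetical glue: tell a (patched) BOINC client about idle state,
# SIGUSR1 = user went idle, SIGUSR2 = user came back
PID=$(pidof boinc) || exit 1
exec swayidle -w \
  timeout 300 "kill -USR1 $PID" \
  resume "kill -USR2 $PID"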

16 June, 2023 01:37PM by etbe

hackergotchi for Junichi Uekawa

Junichi Uekawa

Upgraded my main machines to bookworm.

Upgraded my main machines to bookworm. Things look relatively uneventful. Nice. Emacs is noisy. Why is native-comp-async-report-warnings-errors t?

16 June, 2023 01:14PM by Junichi Uekawa

John Goerzen

Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I’d evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I’ll make a comparison post.

What is git-annex?

git-annex is a fantastic and versatile program that does… well, it’s one of those things that can do so much that it’s a bit hard to describe. Its homepage says:

git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.

I think the particularly interesting features of git-annex aren’t actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily say things like:

  • “I want exactly 1 copy of every file to exist within the set of backup drives. Here’s a drive in that set; copy to it whatever needs to be copied to satisfy that requirement.”
  • “Now I have another set of backup drives. Periodically I will swap sets offsite. Copy whatever is needed to this drive in the second set, making sure that there is 1 copy of every file within this set as well, regardless of what’s in the first set.”
  • “Here’s a directory I want to use to track the status of everything else. I don’t want any copies at all here.”

git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient!

git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives.
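For reference, such an export would in principle use a directory special remote with exporttree=yes, something like the sketch below ($EXPORTDIR and the remote name exportdir are placeholders; as I relate in the conclusions, I had trouble getting this to actually do anything in my own tests):

$ git annex initremote exportdir type=directory directory=$EXPORTDIR \
    exporttree=yes encryption=none
$ git annex export main --to exportdir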

git-annex challenges

In my prior post, I related some challenges with git-annex. The biggest of them – quite poor performance of the directory special remote when dealing with many files – has been resolved by Joey, git-annex’s author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release.

git-annex no doubt may still have performance challenges with repositories in the 100,000+ file range, but at that order of magnitude it now looks usable. I’m not sure about 1,000,000-file repositories (I haven’t tested); there is a page about scalability.

A few other more minor challenges remain:

  • git-annex doesn’t really preserve POSIX attributes; for instance, permissions, symlink destinations, and timestamps are all not preserved. Of these, timestamps are the most important for my particular use case.
  • If your data set to archive contains Git repositories itself, these will not be included.

I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save:

mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec

And, after restoration, the timestamps can be applied with:

mtree -t -U -e < /tmp/spec

Walkthrough: initial setup

To use git-annex in this way, we have to do some setup. My general approach is this:

  • There is a source of data that lives outside git-annex. I'll call this $SOURCEDIR.
  • I'm going to name the directories holding my data $REPONAME.
  • There will be a "coordination" git-annex repo. It will hold metadata only, and no data. This will let us track where things live. I'll call it $METAREPO.
  • There will be drives. For this example, I'll call their mountpoints $DRIVE01 and $DRIVE02. For easy demonstration purposes, I used a ZFS dataset with a refquota set (to observe the size handling), but I could have as easily used a LVM volume, btrfs dataset, loopback filesystem, or USB drive. For optical discs, this would be a staging area or a UDF filesystem.

Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.


$ REPONAME=testdata
$ mkdir "$METAREPO"
$ cd "$METAREPO"
$ git init
$ git config annex.thin true

There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that.

Let's continue:


$ git annex init 'local hub'
init local hub ok
(recording state in git...)
$ git annex wanted . "include=* and exclude=$REPONAME/*"
wanted . ok
(recording state in git...)

In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says everything except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here.

Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.


$ touch mtree
$ git annex add mtree
add mtree
ok
(recording state in git...)
$ git annex sync
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
[main (root-commit) 6044742] git-annex in local hub
1 file changed, 1 insertion(+)
create mode 120000 mtree
ok
$ ls -l
total 9
lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...

OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:


git annex adjust --unlock-present
adjust
Switched to branch 'adjusted/main(unlockpresent)'
ok
$ ls -l
total 1
-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree

You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.


$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \
encryption=none
initremote source ok
(recording state in git...)
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git config remote.source.annex-tracking-branch main:$REPONAME

OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.


$ git annex sync

At this point, you'll see git-annex computing a hash for every file in the source directory.

I can verify with du that my metadata-only repo uses only 14MB of disk space, while my source is around 4GB.
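(A sketch of that check, with the real paths redacted:)

$ du -sh $METAREPO $SOURCEDIR
14M     [metadata repo]
4.0G    [source directory]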

Now we can see what git-annex thinks about file locations:


$ git-annex whereis | less
whereis mtree (1 copy)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
ok
whereis testdata/[redacted] (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
... many more lines ...

So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them.

Walkthrough: removable drives

I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.


$ cd $DRIVE01
$ df -h .
Filesystem Size Used Avail Use% Mounted on
acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01
$ git clone $METAREPO
Cloning into 'testdata'...
done.
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #1"
$ git annex adjust --hide-missing --unlock
adjust
Switched to branch 'adjusted/main(hidemissing-unlocked)'
ok
$ git annex sync

OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:


$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M

Now, we'll add the drive to a group called "driveset01" and configure what we want on it:


$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'

What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01.

Now let's load up some files!


$ git annex sync --content

As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted!

Now, we need to teach $METAREPO about $DRIVE01.


$ cd $METAREPO
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync drive01
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/main(unlockpresent)
nothing to commit, working tree clean
ok
merge synced/main (Merging into main...)
Updating d1d9e53..817befc
Fast-forward
(Merging into adjusted branch...)
Updating 7ccc20b..861aa60
Fast-forward
ok
pull drive01
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (110/110), 13.01 KiB | 1.44 MiB/s, done.
Resolving deltas: 100% (6/6), completed with 6 local objects.
From /acrypt/no-backup/annexdrive01/testdata
* [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)
* [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent)
* [new branch] git-annex -> drive01/git-annex
* [new branch] main -> drive01/main
* [new branch] synced/main -> drive01/synced/main
ok

OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:


$ git annex whereis | less
whereis mtree (2 copies)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
ok
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok

If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive!

Walkthrough: Adding a second drive

The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:


whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
whereis testdata/[redacted] (1 copy)
c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]

The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok

Look at that! Some files on drive01, some on drive02, some neither place. Perfect!

Walkthrough: Updates

So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this?

First, we update the metadata repo:


$ cd $METAREPO
$ git annex sync
$ git annex dropunused all

OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:


$ git annex whereis | less
...
whereis testdata/cp (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
whereis testdata/file01-unchanged (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]

The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok

So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01.

Well, let's update drive01.


$ cd $DRIVE01/$REPONAME
$ git annex sync --content

Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.


$ git annex get --auto
$ git annex drop --auto
$ git annex dropunused all

And now, let's make sure metarepo is updated with its state.


$ cd $METAREPO
$ git annex sync

We could do the same for drive02. This is how we would proceed with every update.

Walkthrough: Restoration

Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.


$ mv $METAREPO $METAREPO.disabled
$ mv $SOURCEDIR $SOURCEDIR.disabled
$ git clone $DRIVE01/$REPONAME $RESTOREDIR
$ cd $RESTOREDIR
$ git config annex.thin true
$ git annex init "restore"
$ git annex adjust --hide-missing --unlock

Now, we need to connect drive01 and pull the files from it.


$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync --content

Now, repeat with drive02:


$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync --content

Now we've got all our content back! Here's what whereis looks like:


whereis testdata/file01-unchanged (3 copies)
3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]
9e48387e-b096-400a-8555-a3caf5b70a64 -- source
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]
ok
...

I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought it would have been able to do that automatically.
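
My guess (and it is only a guess) is that location tracking data lives in the git-annex branch, and therefore only propagates between repositories that actually sync with each other. Something along these lines, run while both drives were attached, might have let drive01 learn what drive02 holds (untested; the remote name is my own choice):

$ cd $DRIVE01/$REPONAME
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync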

Conclusions

I think I have demonstrated two things:

First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users.

Secondly, it is also a complex tool and difficult to get right for this purpose (though, I think, much easier for some other purposes). For someone who doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps the restore needs more care, or a different order of operations, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I could never get it to actually do anything, and I don't know why.

These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

16 June, 2023 04:59AM by John Goerzen

Valhalla's Things

Shawl Calculations

Posted on June 16, 2023

Update 2023-06-17: I had missed an N in the formulas; they have been updated, and since I was editing this I've added the Haskell bit.

I’ve just realized that I’m not anywhere close to finishing the shawl I’m knitting, so I’ve done the perfectly logical and rational thing and started a new one.

This one is using some yarn from the stash, so its size is limited by the available yarn, and I wanted to estimate how long it may be, so I weighed the ball of yarn at the beginning and then again after knitting 10 and 20 rows.

It’s a top-down crescent, with 6 increases every two rows (but these calculations should work for any uniform top-down shawl with a regular number of increases), so each block of 10 rows should use an approximately fixed weight of yarn more than the previous block of 10 rows.

So, let w0 be the weight of the first block of rows, wr the (average) difference between two consecutive blocks, and wT the total weight of the shawl. Then the weight used by block i (counting blocks from 0) should be wi = w0 + wr ⋅ i, and the total weight of the shawl should be:


$$w_T = \sum_{i=0}^{N-1}w_i = N ⋅ w_0 + w_r ⋅ \frac{N ( N - 1)}{2}$$

where N is the number of blocks in the whole shawl.

This gives:


N² + (2 ⋅ w0/wr − 1) ⋅ N − 2 ⋅ wT/wr = 0

and the only positive solution will be:


$$N = 1/2 - w_0/w_r + \sqrt{1/4 + w_0^2/w_r^2 - w_0/w_r + 2 ⋅ w_T/w_r}$$
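
As a quick sanity check with the placeholder values used in the code below (w0 = 2, wr = 2, wT = 200): N = 1/2 − 1 + √(1/4 + 1 − 1 + 200) ≈ −0.5 + 14.15 ≈ 13.7 blocks of 10 rows, i.e. roughly 135 rows, consistent with the figure quoted at the end.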

or, in a few lines of python that can be easily copypasted (changing the values in ws and w_T, of course):

import math
import statistics

w_T = 200
ws = [2, 4, 6]
w_r = statistics.mean(map(lambda x: x[0] - x[1], zip(ws[1:], ws)))
1/2 - ws[0] / w_r + math.sqrt(1/4 + ws[0]**2 / w_r**2 - ws[0]/w_r + 2 * w_T / w_r)

Or, in Haskell:

let ws = [2, 4, 6]
let w_T = 200
let w_0 = head ws
let w_r = ( sum (map (\(x,y) -> y-x) (zip ws (drop 1 ws))) ) / (fromIntegral (length ws - 1))
1/2 - w_0 / w_r + sqrt (1/4 + (w_0/w_r)**2 - w_0/w_r + 2 * w_T / w_r)

Which right now (using the actual measured values) tells me I will have about 135 rows in my shawl, but I’d really want to do a few more blocks of 10 rows and have more datapoints before I trust the numbers I’ve put in.

Which means that this shawl will also take forever.

16 June, 2023 12:00AM

June 15, 2023

hackergotchi for Shirish Agarwal

Shirish Agarwal

Ayisha, Manju Warrier, Debutsav, Books

Ayisha

After a long time I saw a movie that I enjoyed wholeheartedly. And it unexpectedly touched my heart. The name of the movie is Ayisha. The first frame of the movie itself sets the pace, where we see Ayisha (Manju Warrier) deciding to help out a gang as a lot of women were being hassled. So she agrees to hoodwink cops and help launder some money. Then she is shown working as a maid for an elite Arab family. To portray a Muslim character in these polarized times really shows guts, especially when the othering of the Muslim has been happening 24×7. In fact, just a few days back I was shocked to learn that Muslim homes were being marked, as Jewish homes had been marked in the 1930s. Not just homes but also businesses too. And a few days later, in a totally hypocritical fashion, one of the judges said that you cannot push people to buy or not buy from a shop. This is after the whole hate campaign had been run systematically for almost 2 weeks. What value do the judge's statements have after 2 weeks??? The poison has already seeped in 😦 But I'm drifting from the topic/movie.

The real fun of the movie is the beautiful relationship that develops between Ayisha and Mama, who is the biggest maternal figure in the house; in fact, her command is what goes in the house. The house, or 'palace' which is the more apt description, is shown as opulent, but not as rich as Mama and Ayisha are spiritually and emotionally, both giving and sharing of each other: almost a mother-daughter relationship, although with others Mama is shown as having a bit of an iron hand. Halfway through the movie we come to know that Ayisha was also a dramatist and an actress, having worked in early Malayalam movies. I do not want to go through all the ups and downs, as that is the beauty of the movie and it needs to be seen for that aspect. I am always of two minds about whether to promote a book or series or movie, because most of the time it is the unexpected that works; when we have expectations, it often doesn't. Avatar: The Way of Water is an exception, one of the few movies I can recall where I had expectations and it still surpassed them. So maybe go with no expectations at all 😉

Manju Warrier

Manju Warrier should really be called 'Manju Warrior', as she chose to stand with the survivor rather than with the sexism that is prevalent in the Malayalam film industry, which is more or less a mirror of Bollywood and of society as a whole. These three links should give enough background on what has been happening, although I'm sure my Malayalam friends could add whatever is missing. In quite a few movies, women are making inroads without a significant male presence; Manju's last few movies, especially, have had no male lead. Whether that is deliberate on Manju's part or an obstacle being put in front of her, I cannot say. Having both a male lead and a female lead enriches a movie considerably. This doesn't mean one is better than the other, but having both enriches the end product, as simple as that. This is sadly not happening. Having POSH training and having an ICC is something that each organization should look to provide; it is the need of the hour, especially when we have young people all around us. I am hopeful that people from Kerala will shed some more light on what has been happening.

Books

I haven't yet submitted an application for Debconf. But my idea is that, irrespective of whether or not I'm there, I do hope we can have a library where people can donate books and take books away as well: a kind of circular marketplace/library where somebody just notes what books are available. Even if only 100-odd people come to Debconf, that easily means 100 books in various languages. That in itself would be interesting, and we would get to see what people are reading, want to discuss, etc. We could even have readings. IIRC, in 2016 we had a children's area; maybe we could do some readings from books to children, fueling their imagination. Even people like me who are deaf would be willing to look at excerpts and be charmed by them. For instance, in all my forays into fantasy literature, except for Babylon Steel I haven't read one book that has a female lead character, and I have read probably around 100-odd fantasy books to date. Not a lot, but still, to my mind, a big gap as far as literature is concerned. How would more women write fantasy if they don't have heroes to look up to? :( Or maybe I am missing some authors and characters that others know and I do not. Do others feel the same, or hasn't this question even been asked??? Dunno. Please let me know.

Debutsav

So apparently Debutsav is happening 2 days from now. While I did come to know about it a few days back, I had to think about whether I wanted to apply for this or for Debconf, as I physically and emotionally can't do justice to both, even though they are a few months apart. I wish all the best to the attendees as well as the presenters sharing their projects, and hopefully somebody will share at least some of the projects presented there so we know what new projects or software to follow. Till later.

15 June, 2023 06:37PM by shirishag75

hackergotchi for Thomas Lange

Thomas Lange

20.000 customized images created by the FAI.me build service

The counter of the FAI.me build service has reached 20.000. This counter was added shortly after the service was started in November 2017. Since then, this service has built more than 21.000 installation images and more than 1300 cloud disk images. In the last few months we have averaged about 100 requests per week.

Some statistics which settings are popular:

  • Language/keyboard layout selected

    12000 us
    4000 de
    2500 fr
    800 gb
    500 es
    300 ru
    300 cn
    200 pt

  • Desktop environments selected

    12000 NONE (without any desktop)
    5000 GNOME
    1800 XFCE
    800 KDE
    700 CINNAMON
    700 MATE
    500 LXDE

  • In April 2023, support for building your own Ubuntu installation ISO was added. Since then, 200 Ubuntu ISOs have been created.

  • Packages that are often added: tmux screen apt-transport-https build-essential sudo net-tools mc git wget htop vim curl

  • A postinst script was provided more than 1500 times, even though the feature was not added until 2021.

  • Packages from backports were used 4000 times.

I still have some more ideas for the future: Build your own custom Live ISO

Thanks for all your feedback I got to improve this service.

The build service is available on the FAI project website at https://fai-project.org/FAIme

15 June, 2023 05:21PM

hackergotchi for Jonathan Dowland

Jonathan Dowland

containers as first-class network citizens

I've moved to having containers be first-class citizens on my home network, so any local machine (laptop, phone, tablet) can communicate directly with them all, but they're not (by default) exposed to the wider Internet. Here's why, and how.

After I moved containers from docker to Podman and systemd, it became much more convenient to run web apps on my home server, but the default approach to networking (each container gets an address on a private network between the host server and containers) meant tedious work (maintaining and reconfiguring a HTTP reverse proxy) to make them reachable by other devices. A more attractive arrangement would be if each container received an IP from the range used by my home LAN, and were automatically addressable from any device on it.

To make the containers first-class citizens on my home LAN, first I needed to configure a Linux network bridge and attach the host machine's interface to it (I've done that many times before); then define a new Podman network, of type "bridge". podman-network-create (1) serves as reference, but the blog post Exposing Podman containers fully on the network is an easier read (skip past the macvlan bit).

I've opted to choose IP addresses for each container by hand. The Podman network is narrowly defined to a range of IPs that are within the subnet that my ISP-provided router uses, but outside the range of IPs that it allocates.
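
As a concrete illustration, the network definition might look something like this (a sketch only: the subnet, gateway and range below are stand-ins for my LAN's actual values, and attaching the network to the host bridge may need extra options depending on the Podman version, as the post linked above describes):

$ podman network create --driver bridge \
        --subnet 192.168.1.0/24 --gateway 192.168.1.1 \
        --ip-range 192.168.1.32/27 \
        bridge_local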

When I start up a container by hand for the first time, I choose a free IP from the sub-range by hand and add a line to /etc/avahi/hosts on the parent machine, e.g.

192.168.1.33 octoprint.local

I then start the container specifying that address, e.g.

podman run --rm -d --name octoprint \
        ...
        --network bridge_local --ip 192.168.1.33 \
        octoprint/octoprint

I can now access that container from any device in my house (laptop, phone, tablet...) via octoprint.local.

What's next

Although it's not a huge burden, it would be nice to not need to statically define the addresses in /etc/avahi/hosts (perhaps via "IPAM"). I've also been looking at WireGuard (which should be the subject of a future blog post) and combining this with that would be worthwhile.

15 June, 2023 02:11PM

June 14, 2023

hackergotchi for Jonathan Carter

Jonathan Carter

CLUG Talk: Running Debian on a 100Gbps router

Last night I attended the first local Linux User Group talk since before the pandemic (possibly even… long before the pandemic!)

Topic: How and why Atomic Access runs Debian on a 100Gbps router

Speaker: Joe Botha

This is the first time CLUG used Woodstock Brewery as a venue. It’s great, because now we can have snacks and beer during the talks :)

Joe has worked in the internet space for quite some time, and co-founded companies like Teraco, Frogfoot, Amobia, Octotel and Atomic Access. Through all of these he’s done interesting and noteworthy work, which I’ve only seen some glimpses of before in the few moments we’ve interacted at CLUG events.

It was nice seeing a lot more detail of a project that I wouldn't even know about if he didn't give this talk. It doesn't seem that anyone else is running Debian on big switches for commercial ISPs. He goes to great lengths to run Debian so that he can have a decent set of tools and familiar commands on the switch, as opposed to the (my word here) crappy tooling that you would get on the brand-name switches.

By total coincidence, David Plonka happened to be at the brewery too, he’s a network expert who works at Akamai. He didn’t know this talk was taking place, so this was a fun happenstance, he had some good inputs during the talk too. He also bought everyone a round of beer, thanks David!

I asked Joe for his slides and I’ll share them here when I get them. Unfortunately we don’t have video for this talk, but I asked Joe to consider coming to DebConf23, I think this topic would be really interesting to the wider Debian crowd. By the way, both registration and the call for proposals are now officially open for DebConf23, it’s taking place in September in Kochi, India this year.

14 June, 2023 04:41PM by jonathan

Russell Coker

Do Not Use …

When I connect my Desklab USB-C monitor [1] (which has been vastly underused for the last 3 years) into a Linux system the display type is listed as “DO NOT USE – RTK“.

One of the more informative discussions of this was on the Linux Mint forums [2], which revealed that it's a mapping for a code that shouldn't be used. So it's not saying "don't use this monitor", it's saying "don't use this code". The Desklab people, when they implemented a display with an RTK chipset, should have changed the ID field from "RTK" to something representing their use. On Debian the file /usr/share/hwdata/pnp.ids has the IDs, and you can grep for RTK in it.
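
For example (the exact file layout may differ between hwdata versions, but the three-letter ID is at the start of each line):

$ grep '^RTK' /usr/share/hwdata/pnp.ids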

Also for programmers, please use more descriptive strings than "do not use": when I was trying to find this via Debian code search [3] it turned up hundreds of pages of results, which is more than a human could read through. If the text had been something that would make sense to a user, such as "OEM please replace with company name", it would have made it very clear to me (and all the other people searching for this) what it meant, and the fact that Desklab had stuffed up. So instead of wondering about this for years before eventually finding the right Google search, I could have worked it out immediately if the text had been clearer.

14 June, 2023 02:25PM by etbe

Sven Hoexter

htop on stage in the theatre

Always amusing to see some more or less famous open source tools on stage or in movies. Lately we watched THE ME (German only), which mixes live acting by the performers with pre-recorded video material. In one of the early video sequences a fictional console interface is displayed, claiming to run on a MacBook, and htop is used to look for a suspicious process.

14 June, 2023 09:28AM

hackergotchi for Freexian Collaborators

Freexian Collaborators

Monthly report about Debian Long Term Support, May 2023 (by Roberto C. Sánchez)

Like each month, have a look at the work funded by Freexian’s Debian LTS offering.

Debian LTS contributors

In May, 18 contributors were paid to work on Debian LTS; their reports are available:

  • Abhijith PA did 6.0h (out of 6.0h assigned and 8.0h from previous period), thus carrying over 8.0h to the next month.
  • Anton Gladky did 6.0h (out of 8.0h assigned and 7.0h from previous period), thus carrying over 9.0h to the next month.
  • Bastien Roucariès did 17.0h (out of 17.0h assigned and 3.0h from previous period), thus carrying over 3.0h to the next month.
  • Ben Hutchings did 17.0h (out of 16.0h assigned and 8.0h from previous period), thus carrying over 7.0h to the next month.
  • Chris Lamb did 18.0h (out of 18.0h assigned).
  • Daniel Leidert did 0.0h (out of 0h assigned and 12.0h from previous period), thus carrying over 12.0h to the next month.
  • Dominik George did 0.0h (out of 0h assigned and 20.34h from previous period), thus carrying over 20.34h to the next month.
  • Emilio Pozuelo Monfort did 32.0h (out of 18.5h assigned and 16.0h from previous period), thus carrying over 2.5h to the next month.
  • Guilhem Moulin did 20.0h (out of 8.5h assigned and 11.5h from previous period).
  • Holger Levsen did 0.0h (out of 0h assigned and 10.0h from previous period), thus carrying over 10.0h to the next month.
  • Lee Garrett did 0.0h (out of 0h assigned and 40.5h from previous period), thus carrying over 40.5h to the next month.
  • Markus Koschany did 34.5h (out of 34.5h assigned).
  • Roberto C. Sánchez did 18.25h (out of 20.5h assigned and 11.5h from previous period), thus carrying over 13.75h to the next month.
  • Scarlett Moore did 20.0h (out of 20.0h assigned).
  • Sylvain Beucler did 34.5h (out of 29.0h assigned and 5.5h from previous period).
  • Thorsten Alteholz did 14.0h (out of 14.0h assigned).
  • Tobias Frost did 16.0h (out of 15.0h assigned and 1.0h from previous period).
  • Utkarsh Gupta did 5.5h (out of 5.0h assigned and 26.0h from previous period), thus carrying over 25.5h to the next month.

Evolution of the situation

In May, we have released 34 DLAs.

Several of the DLAs constituted notable security updates to LTS during the month of May. Of particular note were the linux (4.19) and linux-5.10 packages, both of which addressed a considerable number of CVEs. Additionally, the postgresql-11 package was updated by synchronizing it with the 11.20 release from upstream.

Notable non-security updates were made to the distro-info-data database and the timezone database. The distro-info-data package was updated with the final expected release date of Debian 12, made aware of Debian 14 and Ubuntu 23.10, and was updated with the latest EOL dates for Ubuntu releases. The tzdata and libdatetime-timezone-perl packages were updated with the 2023c timezone database. The changes in these packages ensure that in addition to the latest security updates LTS users also have the latest information concerning Debian and Ubuntu support windows, as well as the latest timezone data for accurate worldwide timekeeping.

LTS contributor Anton implemented an improvement to the Debian Security Tracker “Unfixed vulnerabilities in unstable without a filed bug” view, allowing for more effective management of CVEs which do not yet have a corresponding bug entry in the Debian BTS.

LTS contributor Sylvain concluded an audit of obsolete packages still supported in LTS to ensure that new CVEs are properly associated. In this case, a package being obsolete means that it is no longer associated with a Debian release for which the Debian Security Team has direct responsibility. When this occurs, it is the responsibility of the LTS team to ensure that incoming CVEs are properly associated to packages which exist only in LTS.

Finally, LTS contributors also contributed several updates to packages in unstable/testing/stable to fix CVEs. This helps package maintainers, addresses CVEs in current and future Debian releases, and ensures that the CVEs do not remain open for an extended period of time only for the LTS team to be required to deal with them much later in the future.

Thanks to our sponsors

Sponsors that joined recently are in bold.

14 June, 2023 12:00AM by Roberto C. Sánchez

June 13, 2023

hackergotchi for Matt Brown

Matt Brown

Ventilation Monitoring Market Research

Over the last month I’ve performed some market research to better understand the potential for co2mon.nz and to help me decide whether the product I’ve built has a fit with the market or not. The key conclusions I’ve drawn from this work are:

  • Air quality is acknowledged as important, but monitoring it is not an urgent or pressing problem for most people.
  • Most of the value is seen in the hardware rather than the software service.

Keep reading to hear more about the results that lead to those conclusions.

Survey

The first piece of research I undertook was a survey covering three topics: views on indoor air quality, how respondents currently monitor indoor air quality and the desired features, including price, for a CO2 monitor.

The survey was distributed to my extended personal network via social media, email and word of mouth. I offered respondents the opportunity to win a year of free monitoring as an incentive and received just under 70 responses overall - the lucky winner of that prize was Sam H of Auckland whose shiny new CO2 monitor will be in the mail shortly.

Views on indoor air quality

  • Nearly all respondents strongly agreed that clean, fresh indoor air is important for avoiding sickness and enabling our best work, learning and general cognitive performance, with not a single negative response.
  • 25% of respondents indicated they did not have a good understanding of the quality of the indoor air they were breathing versus 43% who indicated they had a good understanding of their indoor air quality.
  • Nearly 70% of respondents agreed (and greater than 40% strongly agreed) that real-time monitoring is beneficial and worth investing time and money in providing, with a similar distribution of responses agreeing it should be required in all shared indoor spaces.

Current ventilation monitoring approaches

  • For the home setting, using our senses was the most common method of understanding air quality, and only 6% of respondents were unhappy with their ability to monitor ventilation at home.
  • At work, trusting the owner of the building to monitor ventilation was the most common method, although using our senses and some personally collected data also featured for 20% of respondents. While the majority of respondents saw some room for improvement here, less than 20% of respondents were unsatisfied with the ability to monitor ventilation at work.
  • In shared public spaces, using our senses and trusting the owner were equally popular, with very little use of any data reported. A plurality of respondents (40%) were unsatisfied with this situation, with 34% seeing some room for improvement and very few being satisfied overall.

CO2 monitoring product features

  • A screen and WiFi were both strongly supported features with less than 10% of respondents seeing them as irrelevant and a large majority of answers skewing towards essential.
  • Coloured lights providing a quick indication were not viewed as important by 13% of respondents and while the majority of answers were towards essential there was also a large (22%) set of respondents who were indifferent to this feature.
  • The ability to access measurements and reports via a web interface drew very mixed responses. Around 20% of respondents reported the feature as irrelevant and 20% as essential, with the majority seeing it as useful but not essential.
  • Almost all respondents strongly indicated that additional air quality metrics beyond CO2 were important to collect.
  • Respondents mostly indicated the proposed prices are too high (64%), with essentially no responses suggesting they were too low and the balance (43%) in the middle. Only 5% of respondents indicated a preference for a rental option over a straight purchase.

Advertising

In parallel with the survey, I worked with my cousin who runs a marketing agency, The Asset, to place some Facebook ads aiming to systematically evaluate what combination of images and text would draw the best response. It’s been an interesting process - despite working for Google for 15 years, I know relatively little about the day to day practice of online advertising!

I think we're about 50% of the way through the process of systematically building a funnel of traffic. It's been a steep learning curve, and it's clear there's significantly more thought and time that would need to be invested were this to be the primary driver of sales for a business. It's interesting to see how what resonates or doesn't resonate with the audience is often completely different from what I expect, confirming the importance of having a process to evaluate and tweak how the advertising runs.

After just under 2 weeks of advertising with a daily budget in the $20 - $30 range, my ads have had just under 17k impressions by 10k distinct people, resulting in 76 visits to the co2mon.nz website, and zero sales. The ads themselves received 233 clicks, so there's clearly a lot of room for further improvement and revision of the ad text itself to present a more compelling message. Unfortunately the most common response and feedback to the ads themselves has been comments arguing that CO2 is wonderful, climate change is invented, and all our problems would be solved if we had more CO2 everywhere. Tedious to deal with, but also a useful reminder about awareness of and interest in the problem, to contrast with the results from the survey of my extended personal network!

Feedback from other conversations

In addition to the survey and advertising I’ve had conversations with some local air conditioning and ventilation businesses as well as a commercial building management firm - all providing similar feedback to the results from the survey - acknowledgement that air quality is important and relatively immaturely measured currently, but low urgency or pain to change or remedy that situation.

Another interesting point that’s come up in conversations with various small business owners is what to do if or when the monitoring shows a ventilation problem? The obvious answer of opening the windows more does not seem to be particularly well received. Without a compelling solution to offer to the potential problem that the monitoring might reveal I often sense a reluctance from people to invest too much time and money in something which may create a problem in a space they don’t currently see as urgent.

Conclusions

The responses are interesting and surprising to me in a few ways (no interest in rental, favouring a web interface over an app), but at the end of the day lead to the two conclusions described above:

Air quality is acknowledged as important, but monitoring it is not an urgent or pressing problem for most people.

At home and work the majority of people are OK with relying on their senses or trusting someone else to maintain ventilation. They wouldn’t object to improvements, but the feedback is that ventilation monitoring is not a problem people are actively looking to solve.

The number of people who do see this as an urgent enough problem to invest money into solving is low - even within the biased sample of my extended network. There is a stronger set of evidence for the problem being seen as more urgent by the users of shared public spaces - but I’ve not been able to find any evidence that the owners and managers of those spaces feel the same urgency or duty of care towards their users to invest in this space.

Most of the opportunity is in the hardware rather than the software service.

This signal comes through in the feedback on the pricing (preferring outright purchase vs rental), but it's also been directly expressed in the free-form comments and other conversations I've had, and in the relative importance given to the physical product features over the web/app interfaces in the survey results.

Wrap Up

I’m glad I finally spent the time doing this research, particularly the survey, these are good lessons to learn, even if I should have taken the time to learn them a year ago - so I can write that reminder (do your research before building a product) down as a key outcome of this process too!

Stay tuned for more details on the other work I’ve been doing recently on the hardware side of co2mon.nz and what these results mean for my overall plans. As always, I’d love to hear from you if these results give you ideas or questions you’d like to discuss.

13 June, 2023 11:49PM

June 12, 2023

hackergotchi for Bits from Debian

Bits from Debian

Registration and the Call for Proposals for DebConf23 are now open!

For DebConf23, we're pleased to announce the opening of registration and the call for proposals. The announcement text follows:


Registration and the Call for Proposals for DebConf23 are now open. The 24th edition of the Debian annual conference will be held from September 10th to September 17th, 2023, in Infopark, Kochi, India. The main conference will be preceded by DebCamp, which will take place from September 3rd to September 9th, 2023.

The registration form can be accessed by creating an account on the DebConf23 website and clicking on "register" in the profile section. The number of attendees is capped at 300 this year. All registrations will be reviewed by the bursary team, and completing the registration form does not guarantee attendance.

As always, basic registration for DebConf is free of charge for attendees. If you are attending the conference in a professional capacity or as a representative of your company, we kindly ask that you consider registering in one of our paid categories to help cover the costs of organizing the conference and to support subsidizing other community members.

The last day to register with guaranteed swag is 5th August.

We also encourage eligible individuals to apply for a diversity bursary. Travel, food, and accommodation bursaries are available. More details can be found on the bursary info page.

The last day to apply for a bursary is 1st July. Applicants should receive feedback on their bursary application by 16th July.

The call for proposals for talks, discussions and other activities is also open. To submit a proposal you need to create an account on the website, and then use the "Submit Talk" button in the profile section. The last day to submit and have your proposal be considered for the main conference schedule, with video coverage guaranteed, is 13th August.

DebConf23 is also accepting sponsors. Interested companies and organizations may contact the DebConf team through sponsors@debconf.org or visit the DebConf23 website.

12 June, 2023 04:17PM by Sahil Dhiman

hackergotchi for Matthew Palmer

Matthew Palmer

Private Key Redaction: Redux

[Note: the original version of this post named the author of the referenced blog post, and the tone of my writing could be construed to be mocking or otherwise belittling them. While that was not my intention, I recognise that was a possible interpretation, and I have revised this post to remove identifying information and try to neutralise the tone. On the other hand, I have kept the identifying details of the domain involved, as there are entirely legitimate security concerns that result from the issues discussed in this post.]

I have spoken before about why it is tricky to redact private keys. Although that post demonstrated a real-world, presumably-used-in-the-wild private key, I’ve been made aware of commentary along the lines of this representative sample:

I find it hard to believe that anyone would take their actual production key and redact it for documentation. Does the author have evidence of this in practice, or did they see example keys and assume they were redacted production keys?

Well, buckle up, because today’s post is another real-world case study, with rather higher stakes than the previous example.

When Helping Hurts

Today’s case study begins with someone who attempted to do a very good thing: they wrote a blog post about using HashiCorp Vault to store certificates and their private keys. In the post, they included some “test” data, a certificate and a private key, which they redacted.

Unfortunately, they did not redact these very well. Each base64 “blob” has had one line replaced with all xs. Based on the steps I explained previously, it is relatively straightforward to retrieve the entire, intact private key.

From Bad to OMFG

Now, if this post author had, say, generated a fresh private key (after all, there’s no shortage of possible keys), that would not be worthy of a blog post. As you may surmise, that is not what happened.

After reconstructing the insufficiently-redacted private key, you end up with a key that has a SHA256 fingerprint (in hex) of:

72bef096997ec59a671d540d75bd1926363b2097eb9fe10220b2654b1f665b54
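
For anyone wanting to check a key of their own, one way to compute this style of fingerprint (SHA-256, in hex, over the DER-encoded public key; whether that matches exactly how the fingerprint above was derived is an assumption on my part):

$ openssl pkey -in key.pem -pubout -outform DER | sha256sum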

Searching for certificates which use that key fingerprint, we find one result: a certificate for hiltonhotels.jp (and a bunch of other, related, domains, as subjectAltNames). As of the time of writing, that certificate is not marked as revoked, and appears to be the same certificate that is currently presented to visitors of that site.

This is, shall we say, not great.

Anyone in possession of this private key – which, I should emphasise, has presumably been public information since the post’s publication date of February 2023 – has the ability to completely transparently impersonate the sites listed in that certificate. That would provide an attacker with the ability to capture any data a user entered, such as personal information, passwords, or payment details, and also modify what the user’s browser received, including injecting malware or other unpleasantness.

In short, no good deed goes unpunished, and this attempt to educate the world at large about the benefits of secure key storage has instead published private key material. Remember, kids: friends don’t let friends post redacted private keys to the Internet.

12 June, 2023 12:00AM by Matt Palmer (mpalmer@hezmatt.org)

June 11, 2023

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

sanitizers 0.1.1 on CRAN: Updated and Expanded

bleach

The second release of the sanitizers package is now on CRAN. sanitizers provides ‘true positives’ for programming errors detected by Address Sanitizers and friends. This permits validation of the setup when chasing such bug reports: it allows us to ascertain that the compiler (and instrumented R version) are correctly set up and the errors we expect to be reported are in fact reported.

Almost nine years (!!) since the first release, this update brings an added integer overflow sanitizer contributed by Greg Jeffries so long ago that I had thought it was part of the CRAN releases—my bad for delaying this. It also updates programming practices by switching to symbol registration for the compiled functions. And of course several R packaging best practices have improved since the initial release so we updated a few small things throughout.

A very good resource for all things sanitizers is the Google repo at GitHub and especially its wiki.

The brief NEWS entry follows.

Changes in version 0.1.1 (2023-06-11)

  • Added integer overflow example kindly contributed by Greg Jeffries

  • Added continuous integration and badges

  • Updated package to use symbol registration for compiled code

  • Updated and edited DESCRIPTION, README and help pages for current packaging standards

  • Expanded README with usage example via r-devel-san Rocker container

Courtesy of my CRANberries, there is a diffstat report for this release. See the project page, the github repo, and the package documentation for more details.

If you like the open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

11 June, 2023 06:35PM

hackergotchi for Michael Prokop

Michael Prokop

What to expect from Debian/bookworm #newinbookworm

Bookworm Banner, Copyright 2022 Juliette Taka

Debian v12 with codename bookworm was released as new stable release on 10th of June 2023. Similar to what we had with #newinbullseye and previous releases, now it’s time for #newinbookworm!

I was the driving force at several of my customers to be well prepared for bookworm. As usual with major upgrades, there are some things to be aware of, and hereby I’m starting my public notes on bookworm that might be worth also for other folks. My focus is primarily on server systems and looking at things from a sysadmin perspective.

Further readings

As usual start at the official Debian release notes, make sure to especially go through What’s new in Debian 12 + Issues to be aware of for bookworm.

Package versions

As a starting point, let’s look at some selected packages and their versions in bullseye vs. bookworm as of 2023-02-10 (mainly having amd64 in mind):

Package bullseye/v11 bookworm/v12
ansible 2.10.7 2.14.3
apache 2.4.56 2.4.57
apt 2.2.4 2.6.1
bash 5.1 5.2.15
ceph 14.2.21 16.2.11
docker 20.10.5 20.10.24
dovecot 2.3.13 2.3.19
dpkg 1.20.12 1.21.22
emacs 27.1 28.2
gcc 10.2.1 12.2.0
git 2.30.2 2.39.2
golang 1.15 1.19
libc 2.31 2.36
linux kernel 5.10 6.1
llvm 11.0 14.0
lxc 4.0.6 5.0.2
mariadb 10.5 10.11
nginx 1.18.0 1.22.1
nodejs 12.22 18.13
openjdk 11.0.18 + 17.0.6 17.0.6
openssh 8.4p1 9.2p1
openssl 1.1.1n 3.0.8-1
perl 5.32.1 5.36.0
php 7.4+76 8.2+93
podman 3.0.1 4.3.1
postfix 3.5.18 3.7.5
postgres 13 15
puppet 5.5.22 7.23.0
python2 2.7.18 – (gone!)
python3 3.9.2 3.11.2
qemu/kvm 5.2 7.2
ruby 2.7+2 3.1
rust 1.48.0 1.63.0
samba 4.13.13 4.17.8
systemd 247.3 252.6
unattended-upgrades 2.8 2.9.1
util-linux 2.36.1 2.38.1
vagrant 2.2.14 2.3.4
vim 8.2.2434 9.0.1378
zsh 5.8 5.9

Linux Kernel

The bookworm release ships a Linux kernel based on version 6.1, whereas bullseye shipped kernel 5.10. As usual there are plenty of changes in the kernel area, including better hardware support, and this might warrant a separate blog entry, but to highlight some changes:

See Kernelnewbies.org for further changes between kernel versions.

Configuration management

puppet‘s upstream sadly still doesn’t provide packages for bookworm (see PA-4995), though Debian provides puppet-agent and puppetserver packages, and even puppetdb is back again, see release notes for further information.

ansible is also available and made it with version 2.14 into bookworm.

Prometheus stack

Prometheus server was updated from v2.24.1 to v2.42.0 and all the exporters that got shipped with bullseye are still around (in more recent versions of course).

Virtualization

docker (v20.10.24), ganeti (v3.0.2-3), libvirt (v9.0.0-4), lxc (v5.0.2-1), podman (v4.3.1), openstack (Zed), qemu/kvm (v7.2), xen (v4.17.1) are all still around.

Vagrant is available in version 2.3.4, also Vagrant upstream provides their packages for bookworm already.

If you’re relying on VirtualBox, be aware that upstream doesn’t provide packages for bookworm yet (see ticket 21524), but thankfully version 7.0.8-dfsg-2 is available from Debian/unstable (as of 2023-06-10). (VirtualBox hasn’t been shipped with stable releases for quite some time, due to lack of cooperation from upstream on security support for older releases; see #794466.)

rsync

rsync was updated from v3.2.3 to v3.2.7, and we got a few new options:

  • --fsync: fsync every written file
  • --old-dirs: works like --dirs when talking to an old rsync
  • --old-args: disable the modern arg-protection idiom
  • --secluded-args, -s: use the protocol to safely send the args (replaces the --protect-args option)
  • --trust-sender: trust the remote sender’s file list
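
A quick illustration of two of the new options in use (assuming rsync >= 3.2.7 on both ends; the paths are hypothetical):

$ rsync -a --fsync --secluded-args /srv/data/ backup@host:/srv/backup/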

OpenSSH

OpenSSH was updated from v8.4p1 to v9.2p1, so if you’re interested in all the changes, check out the release notes between those versions (8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1 + 9.2). Let’s highlight some notable new features:

  • new system for restricting forwarding and use of keys added to ssh-agent(1) (see SSH agent restriction for details)
  • switched scp(1) from using the legacy scp/rcp protocol to using the SFTP protocol by default (see release notes for v9.0 for details)
  • ssh(1): when prompting the user to accept a new hostkey, display any other host names/addresses already associated with the key
  • ssh(1): allow UserKnownHostsFile=none to indicate that no known_hosts file should be used to identify host keys
  • ssh(1): add a ssh_config KnownHostsCommand option that allows the client to obtain known_hosts data from a command in addition to the usual files
  • ssh(1), sshd(8): add a RequiredRSASize directive to set a minimum RSA key length
  • ssh(1): add a “host” line to the output of ssh -G showing the original hostname argument
  • ssh-keygen -A (generate all default host key types) will no longer generate DSA keys
  • ssh-keyscan(1): allow scanning of complete CIDR address ranges, e.g. ssh-keyscan 192.168.0.0/24

One important change you might wanna be aware of is that as of OpenSSH v8.8, RSA signatures using the SHA-1 hash algorithm got disabled by default, but RSA/SHA-256/512 AKA RSA-SHA2 gets used instead. OpenSSH has supported RFC8332 RSA/SHA-256/512 signatures since release 7.2 and existing ssh-rsa keys will automatically use the stronger algorithm where possible. A good overview is also available at SSH: Signature Algorithm ssh-rsa Error.
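
If a legacy client or tool fails to connect to bookworm's OpenSSH, a quick (and deliberately temporary) way to check whether the disabled SHA-1 signatures are the culprit is to re-enable them for a single connection, e.g. on the client side:

$ ssh -oHostKeyAlgorithms=+ssh-rsa -oPubkeyAcceptedAlgorithms=+ssh-rsa user@host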

Now tools/libraries not supporting RSA-SHA2 fail to connect to OpenSSH as present in bookworm. For example python3-paramiko v2.7.2-1 as present in bullseye doesn’t support RSA-SHA2. It tries to connect using the deprecated RSA-SHA-1, which is no longer offered by default with OpenSSH as present in bookworm, and then fails. Support for RSA/SHA-256/512 signatures in Paramiko was requested e.g. at #1734, and eventually got added to Paramiko and in the end the change made it into Paramiko versions >=2.9.0. Paramiko in bookworm works fine, and a backport by rebuilding the python3-paramiko package from bookworm for bullseye solves the problem (BTDT).

Misc unsorted

  • new non-free-firmware component/repository (see Debian Wiki for details)
  • support only the merged-usr root filesystem layout (see Debian Wiki for details)
  • the asterisk package didn’t make it into bookworm (see #1031046)
  • e2fsprogs: the breaking change related to metadata_csum_seed and orphan_file (see #1031325) was reverted with v1.47.0-2 for bookworm (also see #1031622 + #1030939)
  • rsnapshot is back again (see #986709)
  • crmadmin of pacemaker no longer interprets the timeout option (-t/--timeout) in milliseconds (as it did until v2.0.5); as of v2.1.0 (and v2.1.5 is present in bookworm) it interprets the argument as seconds by default

Thanks to everyone involved in the release, happy upgrading to bookworm, and let’s continue with working towards Debian/trixie. :)

11 June, 2023 09:50AM by mika

hackergotchi for Thomas Lange

Thomas Lange

New FAI ISO images for bookworm available and FAI Live ISO

After Debian 12 aka bookworm was released yesterday, I've also created new FAI ISO images using Debian 12.

The default ISO (large) uses FAI 6.0.3, kernel 6.1 and can install the XFCE and GNOME desktops without an internet connection, since all needed packages are included in the ISO. Additionally you can install Ubuntu 22.04 or Rocky Linux 9 with this FAI ISO. During these installations, the packages will be downloaded via the network. There's also the variant FAI ISO UBUNTU, which includes all Ubuntu packages needed for an Ubuntu server or Ubuntu desktop installation.

If you need a small image, you can take the FAI ISO small, which only includes the packages for a XFCE desktop without LibreOffice. This ISO is only 880MB in size.

Currently I'm working on a new feature, so FAI can create Live images that are bootable. It's like the tool live-build, which Debian uses for their official Debian Live images. A first version of the ISO using the XFCE desktop can be downloaded from

https://fai-project.org/fai-cd

There you also find all other FAI ISOs.

11 June, 2023 09:30AM

Petter Reinholdtsen

What did I learn from OpenSnitch this summer?

With yesterday's release of Debian 12 Bookworm, I am happy to know that the interactive application firewall OpenSnitch is available for a wider audience. I have been running it for a few weeks now, and have been surprised by some of the programs connecting to the Internet. Some programs obviously call out from my machine, like the NTP network based clock adjusting system and Tor reaching other Tor clients, but others were more dubious. For example, the KDE window manager tries to look up the host name in DNS, for no apparent reason, but if this lookup is blocked the KDE desktop gets periodically stuck when I use it. Another surprise was how much Firefox calls home directly to mozilla.com, mozilla.net and googleapis.com, to mention a few, when I visit other web pages. This direct connection happens even if I told Firefox to always use a proxy, and the proxy setting is ignored for this traffic. Other surprising connections come from audacity and dirmngr (I do not use Gnome). It took some trial and error to get a good default set of permissions. Without it, I would get popups asking for permissions at any time, including the most inconvenient moments, such as in the middle of a time sensitive gaming session.

I suspect some application developers should rethink when they need to use network connections or DNS lookups, and I recommend testing OpenSnitch (only an apt install opensnitch away in Debian Bookworm) to locate and report any surprising Internet connections on your desktop machine.

At the moment the upstream developer and Debian package maintainer is working on making the system more reliable in Debian, by enabling the eBPF kernel module to track processes and connections instead of depending on content in /proc/. This should enter unstable fairly soon.

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

11 June, 2023 06:30AM

June 10, 2023

hackergotchi for Bits from Debian

Bits from Debian

Debian 12 "bookworm" has been released!


We're happy to announce the release of Debian 12, codenamed bookworm!

Want to install it? Choose your favourite installation media and read the installation manual. You can also use an official cloud image directly on your cloud provider, or try Debian prior to installing it using our "live" images.

Already a happy Debian user and you only want to upgrade? You can easily upgrade from your current Debian 11 "bullseye" installation; please read the release notes.

Do you want to celebrate the release? We provide some bookworm artwork that you can share or use as base for your own creations. Follow the conversation about bookworm in social media via the #ReleasingDebianBookworm and #Debian12Bookworm hashtags or join an in-person or online Release Party!

10 June, 2023 09:30PM by Ana Guerrero Lopez, Laura Arjona Reina and Jean-Pierre Giraud

Andrew Cater

202306101949 - Release of install media - scripts running now

People are working quietly, cross-checking, reading back steps and running individual steps - we're really almost there for the install media.

Just had a friendly, humorous meal out by the barbeque in Sledge's garden. It's been quite a long day but we're just finished.

All this and then we'll probably have the first point release for Bookworm, 12.1, in about a month. That will contain a few fixes which came in at the last minute and any other issues we've found today.

BOOKWORM IS HERE!!

10 June, 2023 07:55PM by Andrew Cater (noreply@blogger.com)

hackergotchi for Marco d'Itri

Marco d'Itri

On having a track record in operating systems development

Now that Debian 12 has been released with proprietary firmwares on the official media, non-optional merged-/usr and systemd adopted by everybody, I want to take a moment to list, not without some pride, a few things that I was right about over the last 20 years:

  • Distribution of proprietary firmwares (#33, #40, #114)
  • udev
  • systemd (#454)
  • merged-/usr

Accepting the obvious solution about firmwares took 18 years. My work on the merged-/usr transition started in 2014, and the first discussions about replacing sysvinit are from 2011. The general adoption of udev (and dynamic device names, and persistent network interface names...) took less time in comparison and no large-scale flame wars, since people could enable it at their own pace. But it required countless little debates in the Debian Bug Tracking System: I still remember the people insisting that they would never use this newfangled dynamic /dev/, or complaining about their beloved /dev/cdrom symbolic link and persistent network interface names.

So follow me for more rants about inevitable technologies.

10 June, 2023 05:00PM

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

plocate 1.1.19 released

I've released version 1.1.19 of plocate; this was mostly to get compatibility with liburing 2.4 out the door. The fix (an external contribution; thanks!) had lingered in git for a while, but evidently, now it's reached distributions and more people were starting to notice.

On a related note, the user base seems to be growing, and also changing a bit. I usually say that as an open-source maintainer, what you want isn't users; you want patches. Users generally come with questions and bugs, and you don't really care that much about the latter because they didn't affect you, yet you feel like you have an obligation to fix them.

In the early days of plocate, what I'd get was indeed patches; not that many and none large, but the ones that I'd get would be high-quality fixes by way of git-send-email that I could just apply and get on with my day. This is the ideal situation. But as time now goes by, I tend to get more and more requests and/or complaints and/or people who haven't actually read the manual; e.g., a common case is how btrfs subvolumes as configured on some distributions trip up updatedb's bind mount detection (because btrfs subvolumes are indeed implemented by bind mounts, and since you can use subvolumes for pretty much everything, it's impossible to know which ones should be included and which ones should not). Another increasingly common case is people who have unusual local setups that they want large plocate changes for. Some of them go away when I try to explain that implementing their changes is nontrivial (sometimes highly so), some don't. I interpret this as plocate slowly reaching mass market status, to the degree any Linux command-line tool can be said to have that.

plocate has long been a “finished” product; it does basically what mlocate does, just much faster, and that's it. I don't intend to do more with it except fixes, unless I hit on something that really annoys me again, and I don't really think a lot of things need to be done within that framework either. So congrats, I'm the owner of a dead program that only now gets users, who discovered that the product exists way too late to actually influence its direction. It's nice to be an open-source maintainer =)

10 June, 2023 03:15PM

Andrew Cater

202306101353 - Release testing of media in full swing

 Most of the install images for Debian media have now been tested.

Various folk are now testing the live media.

We have been joined by a couple of people in IRC who have also done a few tests.

Useful things to note :)

The release name is Bookworm *not* Bookwork.

Debian 13 will be Trixie when it gets here: testing will be re-enabled shortly.

The release notes detail the changes in /etc/apt/sources.list to accommodate the changes to non-free-firmware, but see also Sources List on the Debian wiki.


10 June, 2023 02:01PM by Andrew Cater (noreply@blogger.com)

202306101010 - Debian release preparations and boot media testing in Cambridge

We've all met up in Cambridge - so there's egw_, amacater, kibi (who has travelled over to join us), Isy, RattusRattus and Sledge, mostly sat round a table. The usual number of laptops, three monitors, Rattus' tower machine.

Network running well and we're all ready to go, I think - there's normally a flurry of activity to get things started then a wait for a while for the first images

Coffee and tea at the ready - bacon sandwiches are on the way

[And the build process is under way - and smcv has joined us]

10 June, 2023 11:09AM by Andrew Cater (noreply@blogger.com)

hackergotchi for Freexian Collaborators

Freexian Collaborators

Debian Contributions: /usr-merge updates, tox 4 transition, and more! (by Utkarsh Gupta, Stefano Rivera)

Contributing to Debian is part of Freexian’s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

/usr-merge, by Helmut Grohne, et al

Towards the end of April, the discussion on DEP 17 on debian-devel@l.d.o initiated by Helmut Grohne took off, trying to deal with the fact that while Debian bookworm has a merged /usr, files are still being distributed to / and /usr in Debian binary packages, and moving them currently has some risk of breakage. Most participants of the discussion agreed that files should be moved, and there are several competing design ideas for doing it safely.

Most of the time was spent understanding the practical implications of lifting the moratorium and moving all the files from / to /usr in a coordinated effort. With help from Emilio Pozuelo Monfort, Enrico Zini, and Raphael Hertzog, Helmut Grohne performed extensive analysis of the various aspects, including quantitative analysis of the original file move problem, analysis of effects on dpkg-divert, dpkg-statoverride, and update-alternatives, analysis of effects on filesystem bootstrapping tools. Most of the problematic cases spawned plausible workarounds, such as turning Breaks into Conflicts in selected cases or adding protective diversions for the symbolic links that enable aliasing.

Towards the end of May, Andreas Beckmann reported a new failure scenario which may cause shared resources to inadvertently disappear, such as directories and even regular files in case of Multi-Arch packages, and our work on analyzing these problems and proposing mitigations is on-going.

While the quantitative analysis is funded by Freexian, we wouldn’t be here without the extensive feedback and ideas of many voluntary contributors from multiple areas of Debian, which are too many to name here. Thank you.

Preparing for the tox 4 transition, by Stefano Rivera

While Debian was in freeze for the bookworm release, tox 4 landed in Debian experimental, and some packages are starting to require it upstream. It has some backwards-incompatible behavior that breaks many packages using tox through pybuild. So Stefano had to make some changes to pybuild and to many packages that run build-time tests with tox. The easy bits of this transition are now completed in git / experimental, but a few packages that integrate deeply into tox need upstream work.

Debian Printing, by Thorsten Alteholz

Just before the release of Bookworm, lots of QA tools were used to inspect packages. One of these tools found a systemd service file in the wrong directory, so Thorsten did another upload of the lprint package to correct this.

Thanks a lot to all the hardworking people who run such tools and file bugs.

Thorsten also participated in discussions about the new Common Printing Dialog Backends (CPDB) that will be introduced in Trixie and hopefully can replace the current printing architecture in Forky.

Miscellaneous contributions

  • DebConf 23 preparations by Stefano Rivera. Some work on the website, video team planning, accounting, and team documentation.
  • Utkarsh Gupta started to prep the work on the bursary team’s side for DC23.
  • Stefano spun up a website for the Hamburg mini-DebConf so that the video team could have a machine-readable schedule and a place to stream video from the event.
  • Santiago Ruano Rincón reviewed and sponsored four python packages of a prospective Debian member.
  • Helmut Grohne supported Timo Roehling and Jochen Sprickerhof to improve cross building in 15 ROS packages.
  • Helmut Grohne supported Jochen Sprickerhof with diagnosing an e2fsprogs RC bug.
  • Helmut Grohne continued to maintain rebootstrap and located an issue with lto in gcc-13.
  • Anton Gladky fixed some RC-Bugs and uploaded a new stravalib python library.

10 June, 2023 12:00AM by Utkarsh Gupta, Stefano Rivera

June 09, 2023

hackergotchi for Jonathan Carter

Jonathan Carter

Phone upgraded to Debian 12

A long time ago, before the pandemic, I bought a Librem 5 phone from Purism. I have also moved home since then, and sadly my phone had been sleeping peacefully in a box in the garage ever since the move.

When I was in Hamburg last month, I saw how well Mobian and Phosh were coming along, and this inspired me to go dig up the Librem 5, which was about 2 Debian releases behind, and upgrade it to the latest and greatest version.

I followed the instructions on the Debian wiki, and after some stumbles, managed to flash it with the latest Mobian image:

It’s still a big bulky phone, but Phosh has really come a long way and the phone feels so much more responsive and usable now. It’s also the first time I tried out the new GNOME Console app (I hope they consider taking some features from JuiceSSH so that it’s easier to run apps like mc and irssi on the phone).

Next up I want to try out some progressive web apps and also check what the latest state of emulating Android apps is. I’ve also been meaning to follow some GTK tutorials, and trying out some ideas on a mobile device motivates me a bit more to do that.

It’s really impressive the large amount of work people put into making Debian and a mobile GNOME experience work so well on a phone! Good job to everyone who contributes to this eco-system!

09 June, 2023 05:17PM by jonathan

hackergotchi for Wouter Verhelst

Wouter Verhelst

Planet Debian rendered with PtLink

As I blogged before, I've been working on a Planet Venus replacement. This is necessary, because Planet Venus, unfortunately, has not been maintained for a long time, and is a Python 2 (only) application which has never been updated to Python 3.

Python not being my language of choice, and my having plans to do far more than just the "render RSS streams" functionality that Planet Venus does, meant that I preferred to write "something else" (in Perl) rather than updating Planet Venus to modern Python.

Planet Grep has been running PtLink for over a year now, and my plan had been to update the code so that Planet Debian could run it too, but that has been taking a bit longer.

This month, I have finally been able to work on this, however. This screenshot shows two versions of Planet Debian:

The rendering on the left is by Planet Venus, the one on the right is by PtLink.

It's not quite ready yet, but getting there.

Stay tuned.

09 June, 2023 07:52AM

June 08, 2023

hackergotchi for Lisandro Damián Nicanor Pérez Meyer

Lisandro Damián Nicanor Pérez Meyer

Adventures in Debian's Qt land

Debian (I might as well say "we", this is the beauty of it) is about to release Debian 12 aka Bookworm. Let's take a quick look at what is new in Debian Qt land.

Qt 5

Bookworm has Qt 5.15.8, which is nothing but great news. KDE will be switching to Qt 6 sooner rather than later, and Qt 5 has been a fun ride, but Dmitry Shachnev and I needed a break, or at the very least a break from handling two Qt versions. But in the end I need to be fair: you REALLY need to thank Dmitry for Qt 5. He has been the manpower behind it in 99.5% of the cases.

Qt 6

This will be the first Debian release to have official Qt 6 packages. NOTHING would have happened if it weren't for Patrick "Delta-One" Franz standing up to maintain it. BIG kudos to him!

Well, there is a "little lie" in the paragraph above. Thanks to The Qt Company and ICS, the current Qt 6 version, 6.4.2, is also available in Bullseye's backports. The Qt Company also really helped us here by providing almost-to-be-released tarballs of Qt 6.4.2, so we were able to push them to unstable and do a transition in time for the freeze. Thanks a lot for that!

So, what is the Qt 6 state?

On the binary side, all but OpenGL ES support should be there. Sadly this was discovered too late in the release process, and we still might need help maintaining it (read the link to know why!).

We are still not building the documentation. Properly building the whole documentation, as with Qt 5, would require all the Qt submodules' source code in one place, which we can't (easily?) do in Debian. So building the doc means hacking the build system and getting semi-linked documentation, much like with Qt 5. Now if you think you have an idea to solve this... we are happy to hear from you!

Another great thing to know about Qt 6 is that, thanks to Helmut Grohne, pure Qt 6 applications should be able to cross compile. Applications using multi-arch enabled libraries ought to work too. Even more, many Qt submodules themselves should also cross compile! Not all of them, as we missed some patches in time, but hey, if you need to cross compile Qt, you surely can apply them yourselves!

And finally tests, unit tests. In Qt 5 we had some of those, but none yet in Qt 6. This is one of the areas I would love to be able to put time into... but time is scarce.

The future?

From my point of view, the Debian 13 "Trixie" development cycle will see Qt 5 diminishing in usage and Qt 6 becoming the major Qt version used, but from the Qt 4 experience I do not expect Qt 5 to be dropped during this release cycle... let's see what the future brings us.

Thanks!

While I mentioned Dmitry and Patrick, many more people helped us reach this point. I personally want to thank the people behind the KDE software, both upstream and, of course, the Debian maintainers. You should be thankful to them too; many hours of effort go into this.

And thanks to you, our dear users. We are normally overwhelmed by what we have on our hands and might not be up to the task sometimes, but hey, you are part of the reason we are doing this!

08 June, 2023 03:00AM by Lisandro Damián Nicanor Pérez Meyer

June 06, 2023

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

Faster accurate range reduction

Pretty much everybody who learns geometry and trigonometry starts off by representing angles in degrees. One circle is 360 degrees; it's a bit of a strange number, but it's highly composite (so a lot of fractions can be represented as whole numbers) and has traditions supposedly going back to the Babylonians.

However, there's nothing “natural” about 360; you could just as well use e.g. 3600 and nothing would really get easier or harder. Thus, when going further into trigonometry, most people will eventually end up with radians, where one circle is 2π radians. This does have a natural justification; angles are represented by their arc lengths on the unit circle, and using radians for the argument means the Taylor series of sin(x) becomes x - x³/3! + x⁵/5! - …, which is pretty much as nice as it gets. (Both facts independently mean that sin(x) ≈ x for small angles, whereas for degrees, it would mean we'd get sin(x) ≈ πx/180; hardly as elegant!)

However, when implementing the sin() function in programming languages, we have no need for elegance. What we want is precision and performance, and here, surprisingly, radians turns out to be the wrong choice. Most serious implementations don't use Taylor series anyway, so the simplicity of the formulas is irrelevant; usually, they use some variation of Remez (perhaps with some tweaks) to get optimal accuracy on some range such as [0,π/2] (I know, there's an irony in that I just specified it using radians), and then symmetries for the rest. And to do that, we need range reduction.

Range reduction is the presumably simple act of subtracting the right whole multiples of 2π (or possibly whole multiples of π/2, depending a bit on your implementation) so that we get into a simple case. And this is a pain in floating-point. Why? For the simple, obvious reason that π isn't representable as a floating-point number (no matter what precision you're working in); it's transcendental. So if you try to just do a modulo with M_PI, you'll be doing range reduction by the wrong amount, and for large inputs, it will go completely awry.

For instance (let's work with floats to keep the number of needed decimals smaller; doubles work very much the same), say you want to range-reduce the number 6.2831897735595703125 (the closest float to 6.28319). Modulo the true value of 2π, this is roughly 4.46638e-06. But modulo the nearest float to 2π, it is instead roughly 4.29153e-06! So we missed by 4% before we've even started applying our series. And for larger numbers, it becomes much worse. For instance, for 100000002004087734272 (the closest float to 1e20), the correct range-reduced number modulo 2π is 0.716271, while modulo 2*M_PI, it is… 4.29637. This is not an inaccuracy, it's not even in the right ballpark.
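
A minimal C demonstration of the above, assuming IEEE 754 single and double precision:

#include <math.h>
#include <stdio.h>

int main(void) {
    float x = 6.2831897735595703125f;  /* closest float to 6.28319 */

    /* Reduce by the nearest float to 2*pi, as naive float code does. */
    float naive = fmodf(x, 2.0f * (float)M_PI);

    /* Reduce by double-precision 2*pi, standing in for the exact
       value; plenty accurate at this small magnitude. */
    double good = fmod((double)x, 2.0 * M_PI);

    printf("naive: %g\n", (double)naive);  /* ~4.29153e-06 */
    printf("good:  %g\n", good);           /* ~4.46638e-06 */
    return 0;
}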

You could argue that nobody needs sin() of such values anyway, but those writing libraries can't really rely on that. So they painfully go and write multiprecision arithmetic for range-reduction, doing fast math for easy cases and using hundreds of hex digits of π for the ones that are harder (e.g. when you're on a very large number, or close to a multiple of π). It's a giant pain, and it's slow.

But… if we used degrees instead, this problem would go away completely! Range-reducing a float by 360.0 is easy; it's a whole number that is exactly representable, so fmod can do it exactly. And the polynomial in the end doesn't care much; you'll need constants on every term anyway, so some scaling by π/180 can be baked into those and doesn't matter much. Even better would probably be turns, where one full circle = 1, but it's surprising to me that the “less advanced” unit of degrees does so much better here.
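
For comparison, a small sketch of the same reduction in degrees; since 360.0f is exactly representable and fmod is specified to be exact, nothing is lost even for huge inputs:

#include <math.h>
#include <stdio.h>

int main(void) {
    /* fmodf of two representable floats is computed exactly, so
       reduction by 360 degrees is always correct. */
    printf("%g\n", (double)fmodf(721.0f, 360.0f)); /* exactly 1 */
    printf("%g\n", (double)fmodf(1e20f, 360.0f));  /* still exact */
    return 0;
}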

It seems to me that the tradition for sin() taking radians is very old. The sin() function in C goes at least back to 4.3BSD (1986), and is probably even older than that; I find it in FORTRAN 77, for instance. (Curiously enough, GNU Fortran has a nonstandard and deprecated SIND(), which takes in degrees, “for compatibility reasons”.) And indeed, 4.3BSD seems to do incorrect (non-exact) range reduction, showing that they didn't think or care much about these issues.

Can we fix this? Probably not. Unless you happen to have your numbers in turns or degrees already, in which case you can do your own range reduction before converting it to radians, giving your libc an easier time. It might be worthwhile. Sometimes.

06 June, 2023 09:14PM

Russell Coker

PinePhonePro First Impression

Hardware

I received my PinePhone Pro [1] on Thursday; it seems in many ways better than the Purism Librem 5 [2] that I have previously written about. The PinePhone is thinner, lighter, and yet has a much longer battery life. A friend described the Librem5 as “the CyberTruck phone”, and not in a good way.

In a test I had my PinePhone and my Librem5 fully charged, left them for 4.5 hours without doing anything much with them, and then the PinePhone was at 85% and the Librem5 was at 57%. So the Librem5 will run out of battery after about 10 hours of not being used, while a PinePhonePro can be expected to last about 30 hours. The PinePhonePro isn’t as good as some of the recent Android phones in this regard but it shows the potential to be quite usable. For this test both phones were connected to a 2.4GHz Wifi network (which uses less power than 5GHz) and doing nothing much with an out of the box configuration. A phone that is checking email, social networking, and a couple of IM services will use the battery faster. But even if the PinePhone has its battery used twice as fast in a more realistic test, that will still be usable.

Here are the passmark results from the PinePhone Pro [3] which got a CPU score of 888 compared to 507 for the Librem 5 and 678 for one of the slower laptops I’ve used. The results are excluded from the Passmark averages because they identified the CPU as only having 4 cores (expecting just 4*A72) while the PinePhonePro has 6 cores (2*A72+4*A53). This phone definitely has the CPU power for convergence [4]!

Default OS

By default the PinePhone has a KDE-based GUI and the Librem5 has a GNOME-based GUI. I don’t like any iteration of GNOME (I have tried them all and disliked them all) and I like KDE, so I will tend to like anything that is KDE-based more than anything GNOME-based. But in addition to that, the PinePhone has an interface that looks a lot like Android, with the three on-screen buttons at the bottom of the display and the slide-up tray for installed apps. Android is the most popular phone OS, and looking like the most common option is often a good idea for a new and different product; this seems like an objective criterion for determining that the default GUI on the PinePhone is a better choice (at least as the default).

When I first booted it and connected it to Wifi, the updates app said that there were 633 updates to apply, but it never applied them (I tried clicking on the update button, but to no avail) and didn’t give any error message. For me, not being Debian is enough reason to dislike Manjaro, but if that wasn’t enough then the failure to update would be a good start. When I ran pacman in a terminal window it said that each package was corrupt and asked if I wanted to delete it. According to “tar tvJf” the packages weren’t corrupt. After downloading them again it said that they were corrupt again, so it seemed that pacman wasn’t working correctly.

When the screen is locked and a call comes in, it gives a window with Accept and Reject buttons, but neither of them works. The default country code for “Spacebar” (the SMS app) is +1 (US) even though I specified Australia on the initial login. It also doesn’t get the APN, unlike Android phones, which seem to have some sort of list of APNs.

Upgrading to Debian

The Debian Wiki page about Installing on the PinePhone Pro has the basic information [5]. The first thing it covers is installing the TOW boot loader – which is already installed by default in recent PinePhones (such as mine). You can recognise that TOW is installed by pressing the volume-up button in the early stages of boot up (described as “before and during the second vibration”), then the LED will turn blue and the phone will act as a USB mass storage device which makes it easy to do other install/recovery tasks. The other TOW option is to press volume-down to boot from a MicroSD card (the default is to boot the OS on the eMMC).

The images linked from the Debian wiki page are designed to be installed with bmaptool from the bmap-tools Debian package. After installing that package and downloading the pre-built Mobian image I installed it with the command “bmaptool copy mobian-pinephonepro-phosh-bookworm-12.0-rc3.img.gz /dev/sdb” where /dev/sdb is the device that the USB mapped PinePhone storage was located. That took 6 minutes and then I rebooted my PinePhone into Mobian!

Unfortunately the default GUI for Mobian is GNOME/Phosh. Changing it to KDE is my next task.

06 June, 2023 01:24PM by etbe

Dell 32″ 4K Monitor and DisplayPort Switch

After determining that the Philips 43″ monitor was too large for my taste as well as not having a clear enough display [1], I bought a Dell 32″ 4K monitor for $499 on the 1st of July 2022. That monitor has been working nicely for almost a year now; for DisplayPort its operation is perfect, and 32″ seems like an ideal size for my use. There is one problem: both HDMI ports will sometimes turn off for about half a second. I’ve tested both ports on multiple computers as well as a dock, and it gives the same result, so it’s definitely the monitor. The problem for me is that a casual inspection won’t reveal it, and the monitor is large and difficult to transport as I’ve thrown out the box. If I had this sort of problem with a monitor at work I’d add it to the list of things for Dell to fix next time they visit the office or use one of the many monitor boxes available to ship it back to them. But for home use it’s more of a problem for me. The easiest solution is to avoid HDMI.

A year ago I blogged about using DDC to switch monitor inputs [2]; I have had that running with a cheap USB switch since then to allow a workstation and a laptop to share the same monitor, keyboard, and mouse. Recently I got a USB-C dock that allows a USB-C laptop to talk to a display via DisplayPort as opposed to the HDMI connector that’s built in. But my Dell monitor only has one DisplayPort input.
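
For reference, a minimal sketch of the DDC switching itself with ddcutil; VCP feature 0x60 is the input source on MCCS-compliant monitors, but the value codes vary per monitor, so the values below are assumptions to check against the output of “ddcutil capabilities”:

# switch to DisplayPort-1 (0x0f on many Dell monitors)
ddcutil setvcp 60 0x0f
# switch to HDMI-1 (commonly 0x11)
ddcutil setvcp 60 0x11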

So I have just bought a DisplayPort and USB KVM switch via eBay for $52, a reasonable price given that last year such things were well over $100. It has ports for 3 USB devices which is better than my previous setup of a USB switch with only a single port that I used with a 3 port hub for my keyboard and mouse.

The DisplayPort switch is described as doing 4K at 60Hz; I don’t know how it will perform with a 5K monitor, maybe it will work at 30Hz or 40Hz. But currently Dell 5K monitors are at $2,500 and 6K monitors are about $3,800, so I don’t plan to get one of them any time soon.

06 June, 2023 08:41AM by etbe

hackergotchi for Shirish Agarwal

Shirish Agarwal

Odisha Train Crash and Coverup, Demonetization 2.0 & NHFS-6 Survey

Just a few days back we came to know about the horrific train crash that happened in Odisha (Orissa). There are some things that are known and some things that can be inferred by observation. Sadly, it seems the incident is going to be covered up 😦 . Some of the facts that have not been contested in the public domain are that there were three lines: one loop line, on which the goods train was standing, and an up and a down line. Apparently, the signalling system and the inter-locking system had issues, as highlighted by an official about a month back. That letter, thankfully, is in the public domain and I have downloaded it as well. It’s a letter that runs to 4 pages. The RW is incensed that the letter got leaked and is in the public domain. They are blaming everyone and espousing conspiracy theories rather than taking the minister to task. Incidentally, the Minister currently holds three ministries: the Ministry of Communication, the Ministry of Electronics and Information Technology (MEIT), and the Railways Ministry. Each Ministry in itself is important and has revenues of more than 6 lakh crore rupees. How he is able to do justice to all three ministries is beyond me 😦

The other thing is that funds for both safety and relaying of tracks have either not been sanctioned or gone unutilized. In fact, the CAG and the Railway brass had flagged the rise in derailments and the unfilled vacancies, but they were given no importance 😦 Not talking about safety in the recently held ‘Chintan Shivir’ (brainstorming session) tells you how serious the Govt. is about safety. In fact, most of the programme was on high-speed rail, which is a white elephant. I have shared a whitepaper done by the RW in the U.S. that tells how high-speed rail doesn’t make economic sense, and that is in an economy that is 20+ times the size of the Indian economy. Even the Chinese are stopping with HSR as it doesn’t make economic sense.

Incidentally, air fares again went up 200% yesterday. Somebody shared a figure in the region of INR 20k+ for an air ticket from their place to Bangalore 😦

Coming back to the story itself: the goods train was on the loop line. Some say it was a little bit on the outer side, some say otherwise, but it is established that it was on the loop line. This is standard behavior on and around railway stations around the world. Whether it was on the inner or the outer doesn’t make much of a difference to what happened next. The first train that collided with the goods train was the 12864 (SMVB-HWH) Yashwantpur Howrah Express, which got derailed onto the next track, where from the opposite direction the 12841 (Shalimar-Bangalore) Coramandel Express was coming. Now they have said that around 300 people have died, and that seems to be part of the cover-up. Both are long trains, having 23-odd coaches each. Even if you have reserved tickets you have 80-odd people in a coach, and usually in most of these trains it is at least double that. A lot of money goes to the TC and then above (corruption). The railway fares have gone up enormously, but that’s a question for perhaps another time 😦 . So at the very least, we could be looking at more than 1000 people having died. The numbers are being under-reported so that nobody has to take responsibility. The Railways itself has said that it is unable to identify 80% of the people who have died. This means that 80%, or a majority of them, were unreserved ticket holders. There have been disturbing images of how bodies have been flung onto tractors and whatnot to be either buried or cremated without a thought. We are in peak summer season, so bodies will start to rot within 24-48 hours 😦 No arrangements were made to cool the bodies and take some information and identifying marks or whatever. The whole thing was done in a very callous manner, not giving dignity even to those who died for no fault of their own. The dissent note also suggests that a cover-up is in the picture. Apparently, India doesn’t have, nor does it feel the need for, something like the NTSB that the U.S. used when it hauled up both the plane manufacturer (Boeing) and the FAA when the 737 Max went down due to improper data collection and sharing of data with pilots. And with no accountability being fixed to the Minister or any of the senior staff, a small junior staff person may be fired. Perhaps the same official that actually told them about the signal failures almost 3 months back 😦

There were and are also some reports that some ‘jugaadu’/temporary fixes were applied to the signalling and inter-locking just before this incident happened. I do not know, nor can I confirm, one way or the other whether the above happened. I can however point out that if such a thing happened, then usually a traffic block is announced and all traffic on those lines is stopped. This has been the practice I have known for decades; traveling between Mumbai and Pune multiple times over the years made me aware of traffic blocks. If some repair work was going on and it couldn’t be completed within the time-frame, then that may well have contributed to the accident. There is also a bit of muddying of the waters where it is being said that one of the trains was 4 hours late; which one is a matter of conflicting stories.

On top of the whole thing, they have handed the case to the CBI to investigate, hinting at sabotage. They also tried to paint a religious structure as a mosque; it later turned out to be a temple. The RW says it was done by Muslims as it was a Friday, not taking into account, as shared before, that most railway maintenance work is usually done between Friday and Monday. This is a practice followed not just in India but the world over.

There has also been a move over the past decade to remove wooden sleepers and use concrete sleepers. Unlike the wooden ones, they do not expand and contract as much, and their life is much longer. Funds had been earmarked (although lower than in the last few years) but not yet spent. As we know, an accident happens when all the holes in the cheese line up. Fukushima is a great example of that: no sea wall even though Japan is no stranger to tsunamis, external power at the same level as the plant (10 meters above sea level), and no training for cascading-failure scenarios, which is what happened. The Days mini-series shares some but not all of the faults that happened at Fukushima and the Govt. response to it. There is a difference though: the Japanese Prime Minister resigned on moral grounds. Here, neither the PM nor the Minister would be resigning, on moral grounds or otherwise :(. Zero accountability, and that was partly a natural disaster; here it’s man-made. In fact, both the Minister and the Prime Minister arrived with their entourages and did a PR blitzkrieg showing how concerned they are. Within 50 hours, the lines were cleared. The part-time Railway Minister shared that he knows the root cause and then a few hours later gave the case to the CBI. All are saying, wait for the inquiry report. To date, not one of the accidents even under this Govt. has produced an investigation report. And even if it did, I am sure it would whitewash, as it did in the case of Adani, as I had shared in the previous blog post. Incidentally, it is reported that Adani paid off some of its debt, but when questioned as to where they got the money, complete silence on that part :(. As can be seen, cover-up after cover-up 😦

FWIW, the Coramandel Express is known as the migrant train and so has a huge number of passengers; the other one involved in the collision is known as the ‘sick train’, as a huge number of cancer patients use it to travel to Chennai and come back 😦

Demonetization 2.0

A few days back, India announced demonetization 2.0. Surprised? Don’t be. Apparently, the INR 2k/- note is being used for corruption and Mr. Modi is unhappy about it. He actually didn’t like the INR 2k/- note but was told that it was needed; who told him, we are unaware to date. At that time the RBI Governor was Mr. Urjit Patel, who didn’t speak about the INR 2k/- note; he had said that a redesigned INR 1k/- note would come onto the market. That has yet to happen. What has happened is that, just as with the INR 500/- and INR 1k/- notes, the RBI will no longer honor the INR 2k/- note. Obviously, this has made our neighbors angry, namely Nepal, Sri Lanka, Bhutan etc., who do some trading with us. Two Deccan Herald columns shed light on it. Apparently, India wants to be the world’s reserve currency but doesn’t want to play by the rules it expects of everyone else. It was pointed out that both the U.S. and Singapore have retired currency notes, but they will honor that promise even today. The Singapore example, being a bit closer (as it’s in Asia), is perhaps a bit more relevant than the U.S. one. Singapore retired the SGD $10,000 as of 2014, but even in 2022 it remains legal tender. They also retired the SGD $1,000 in 2020, but it too remains legal tender.

So let’s have a fictitious example to illustrate what is meant by what Singapore has done. Let’s say I go to Singapore, rent a flat, and find a $1000 note in that house somewhere. Both practically and theoretically, I could go down to any of the banks, get the amount transferred to my wallet, bank account etc. and nobody will question. Because they have promised the same. Interestingly, the Singapore Dollar has been pretty resilient against the USD for quite a number of years vis-a-vis other Asian currencies.

Most of the INR 2k/- notes were also found and exchanged in Gujarat in just a few days (the PM and HM’s state). I am sure you can see the mental gymnastics that the RW indulge in :(. What is sadder is that most of the people who try to defend it can’t make sense of it one way or the other and start to name-call and get personal, as they have nothing else 😦

Disability questions dropped in NHFS-6

I just came to know today that in the upcoming National Family Health Survey-6 the disability questions are being dropped. Why is this important? To put it simply, if you don’t have numbers, you won’t and can’t make policies for those people. India is one of the worst countries to live in if you are disabled. The easiest example to draw attention to is that most railway platforms are not level with the trains. Just as Mick Lynch shares in the UK, the same is pretty much true for India too. Meanwhile in Europe, they do make an effort to be level so even disabled people have some dignity. If your public transport is sorted, then people will want much more, and you will be obligated to provide for them as they are citizens. Here, we have had many reports of women being sexually molested when being transferred from platform to coach, irrespective of their age or whatnot 😦 The main takeaway is that if you do not have their voice, you won’t make policies for them. They won’t go away, but you will make life hell for them. One thing to keep in mind is that most people assume the disabled are disabled from birth. This may or may not be true. For e.g., in the above triple railway accident, there are bound to be disabled people, or newly disabled people who were healthy before the accident. The most common cause is road accidents, some involving pedestrians and vehicles or both; the easiest reference is Ministry of Road Transport data that says 4,00,000 people sustained injuries in road mishaps in 2021 alone. And this is in a country where even accidents are highly under-reported, for more than one reason. The biggest reason, especially for 2- and 4-wheelers, is the increased premium they would have to pay after an accident, so they usually settle with the other party and pay off the Traffic Inspector.

Sadly, I haven’t read a new book, although there are a few books I’m looking forward to have. People living in India and neighbors please be careful as more heat waves are expected. Till later.

06 June, 2023 07:12AM by shirishag75

Michael Ablassmeier

updating to bookworm

Just updated to bookworm. The only thing that gave me headaches was OpenVPN refusing to accept the password/username combination specified via the “auth-user-pass” option.

The mystery was solved by adding “providers legacy default” to the configuration file used.
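
For reference, a minimal sketch of what the relevant part of a client configuration might look like (the remote host and port are placeholders):

# client.conf excerpt
client
dev tun
remote vpn.example.com 1194
auth-user-pass
# OpenSSL 3 (as shipped with bookworm) moved several older algorithms
# into a separate "legacy" provider; loading it alongside the default
# provider lets existing setups keep working:
providers legacy default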

06 June, 2023 12:00AM

June 05, 2023

Reproducible Builds

Reproducible Builds in May 2023

Welcome to the May 2023 report from the Reproducible Builds project.

In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.


Holger Levsen gave a talk at the 2023 edition of the Debian Reunion Hamburg, a semi-informal meetup of Debian-related people in northern Germany. The slides are available online.


In April, Holger Levsen gave a talk at foss-north 2023 titled Reproducible Builds, the first ten years. Last month, however, Holger’s talk was covered in a round-up of the conference on the Free Software Foundation Europe (FSFE) blog.


Pronnoy Goswami, Saksham Gupta, Zhiyuan Li, Na Meng and Daphne Yao from Virginia Tech published a paper investigating the Reproducibility of NPM Packages. The abstract includes:

When using open-source NPM packages, most developers download prebuilt packages on npmjs.com instead of building those packages from available source, and implicitly trust the downloaded packages. However, it is unknown whether the blindly trusted prebuilt NPM packages are reproducible (i.e., whether there is always a verifiable path from source code to any published NPM package). […] We downloaded versions/releases of 226 most popularly used NPM packages and then built each version with the available source on GitHub. Next, we applied a differencing tool to compare the versions we built against versions downloaded from NPM, and further inspected any reported difference.

The paper reports that “among the 3,390 versions of the 226 packages, only 2,087 versions are reproducible,” and furthermore that multiple factors contribute to the non-reproducibility including “flexible versioning information in package.json file and the divergent behaviors between distinct versions of tools used in the build process.” The paper concludes with “insights for future verifiable build procedures.”

Unfortunately, a PDF is not publicly available yet, but a Digital Object Identifier (DOI) is available on the paper’s IEEE page.


Elsewhere in academia, Betul Gokkaya, Leonardo Aniello and Basel Halak of the School of Electronics and Computer Science at the University of Southampton published a new paper containing a broad overview of attacks and comprehensive risk assessment for software supply chain security.

Their paper, titled Software supply chain: review of attacks, risk assessment strategies and security controls, analyses the most common software supply-chain attacks by charting the latest trends in such attacks, and identifies the security risks for open-source and third-party software supply chains. Furthermore, their study “introduces unique security controls to mitigate analyzed cyber-attacks and risks by linking them with real-life security incidence and attacks”. (arXiv.org, PDF)


NixOS is now tracking two new reports at reproducible.nixos.org. Aside from the collection of build-time dependencies of the minimal and Gnome installation ISOs, this page now also contains reports that are restricted to the artifacts that make it into the image. The minimal ISO is currently reproducible except for Python 3.10, which hopefully will be resolved with the coming update to Python version 3.11.


On our rb-general mailing list this month:

David A. Wheeler started a thread noting that the OSSGadget project’s oss-reproducible tool was measuring something related to but not the same as reproducible builds. Initially they had adopted the term “semantically reproducible build” for what it measured, which they defined as being “if its build results can be either recreated exactly (a bit for bit reproducible build), or if the differences between the release package and a rebuilt package are not expected to produce functional differences in normal cases.” This generated a significant number of replies, and several were concerned that people might confuse what they were measuring with “reproducible builds”. After discussion, the OSSGadget developers decided to switch to the term “semantically equivalent” for what they measured in order to reduce the risk of confusion.

Vagrant Cascadian (vagrantc) posted an update about GCC, binutils, and Debian’s build-essential set with “some progress, some hope, and I daresay, some fears…”.

Lastly, kpcyrd asked a question about building a reproducible Linux kernel package for Arch Linux (answered by Arnout Engelen). In the same thread, David A. Wheeler pointed out that the Linux Kernel documentation now has a chapter about Reproducible kernel builds as well.


In Debian this month, nine reviews of Debian packages were added, 20 were updated and six were removed, all adding to our knowledge about identified issues. In addition, Vagrant Cascadian added a link to the source code causing various ecbuild issues. []


The F-Droid project updated its Inclusion How-To with a new section explaining why it considers reproducible builds to be best practice and hopes developers will support the team’s efforts to make as many (new) apps reproducible as it reasonably can.


In diffoscope development this month, version 242 was uploaded to Debian unstable by Chris Lamb who also made the following changes:

  • If binwalk is not available, ensure the user knows they may be missing more info. []
  • Factor out generating a human-readable comment when missing a Python module. []

In addition, Mattia Rizzolo documented how to (re)-produce a binary blob in the code [] and Vagrant Cascadian updated the version of diffoscope in GNU Guix to 242 [].


reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Holger Levsen uploaded versions 0.7.24 and 0.7.25 to Debian unstable which added support for Tox versions 3 and 4 with help from Vagrant Cascadian [][][]


Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

In addition, Jason A. Donenfeld filed a bug (now fixed in the latest alpha version) in the Android issue tracker to report that generateLocaleConfig in Android Gradle Plugin version 8.1.0 generates XML files using non-deterministic ordering, breaking reproducible builds. []


Testing framework

The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:

  • Update the kernel configuration of arm64 nodes to only put required modules in the initrd, saving space in the /boot partition. []
  • A huge number of changes to a new tool to document/track Jenkins node maintenance, including adding --fetch, --help, --no-future and --verbose options [][][][] as well as adding a suite of new actions, such as apt-upgrade, command, deploy-git, rmstamp, etc. [][][][] in addition to a significant amount of refactoring [][][][].
  • Issue warnings if apt has updates to install. []
  • Allow Jenkins to run apt-get update in a maintenance job. []
  • Installed bind9-dnsutils on some Ubuntu 18.04 nodes. [][]
  • Fixed the Jenkins shell monitor to correctly deal with little-used directories. []
  • Updated the node health check to warn when apt upgrades are available. []
  • Performed some node maintenance. []

In addition, Vagrant Cascadian added the nocheck, nopgo and nolto build options when building gcc-* and binutils packages [] and performed some node maintenance [][]. In addition, Roland Clobus updated the openQA configuration to specify longer timeouts and access to the developer mode [] and updated the URL used for reproducible Debian Live images [].



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

05 June, 2023 05:35PM

June 04, 2023

Thorsten Alteholz

My Debian Activities in May 2023

FTP master

This month I accepted 157 and rejected 22 packages. The overall number of packages that got accepted was 160.

Debian LTS

This was my hundred-and-seventh month of doing some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian.

This month my all in all workload has been 14h.

During that time I uploaded:

  • [DLA 3430-1] cups-filters security update for one CVE
  • [DSA 5407-1] cups-filters security update for one CVE
  • [unstable] upload of cups-filters to fix CVE-2023-24805
  • [#1036548] unblock bug to fix CVE-2023-24805 in bookworm
  • [unstable] upload of sniproxy to fix CVE-2023-25076
  • [DSA 5413-1] sniproxy security update in Bullseye for one CVE
  • [cups] working to fix CVE-2023-32324 in unstable, Bookworm, Bullseye, Buster

The CVEs for cups-filters and cups were embargoed, so the work on cups was done in May but the uploads will happen in June.

I also did some work on security-master to inject missing dependencies for hugo and gitlab-workhorse.

Last but not least I did some days on frontdesk duties.

Debian ELTS

This month was the fifty-eighth ELTS month.

  • [ELA-852-1] cups-filters security update in Jessie and Stretch for one CVE
  • [ELA-856-1] freetype security update in Jessie and Stretch for two CVEs
  • [ELA-857-1] libtasn1-6 security update in Jessie and Stretch for one CVE
  • [cups] working to fix CVE-2023-32324 in Jessie and Stretch

The CVEs for cups-filters and cups were embargoed, so the work on cups was done in May but the uploads will happen in June.

Last but not least I did some days on frontdesk duties.

Debian Astro

This month I uploaded some packages to fix RC bugs that were detected by one of the many QA tools:

Thanks a lot to all the hardworking people who run these tools!

Debian Printing

This month I could fix RC bugs in:

This work is generously funded by Freexian!

Debian Mobcom

This month I could fix RC bugs in:

Other stuff

Some other packages also had last minute RC bugs:

I even did an upload of a new package, force-ip-protocol. I finally had enough of people using IPv6 for their hosts but being unable to configure it. Now I can force firefox, or whatever software, to only use IPv4. One nuisance settled.

04 June, 2023 10:43AM by alteholz

hackergotchi for Debian Brasil

Debian Brasil

Translation workshop for the Debian Administrator's Handbook on June 13

The Brazilian Portuguese Debian translation team will hold, on June 13 starting at 8 pm, a translation workshop for the Debian Administrator's Handbook (Manual do(a) Administrador(a) Debian).

The goal is to show beginners how to collaborate on the translation of this important material, which has existed since 2004 and has been translated into Portuguese over the years. The translation now needs to be updated for Debian 12 (bookworm), which will be released this month.

The tool used to translate the Handbook is the weblate site, so you can already create your account and access the Debian Handbook project to get familiar with it.

The workshop will take place online, and the link to join the jitsi room will be shared in the debl10nptBR group on Telegram and in the #debian-l10n-br channel on IRC.

04 June, 2023 10:00AM

June 03, 2023

hackergotchi for Ben Hutchings

Ben Hutchings

FOSS activity in May 2023

03 June, 2023 04:50PM

June 02, 2023

Jelmer Vernooij

Porting Python projects to Rust

I’ve recently been working on porting some of my Python code to rust, both for performance reasons, and because of the strong typing in the language. As a fan of Haskell, I also just really enjoy using the language.

Porting any large project to a new language can be a challenge. There is a temptation to do a rewrite from the ground-up in idiomatic rust and using all new fancy features of the language.

Porting in one go

However, this is a bit of a trap:

  • It blocks other work. It can take a long time to finish the rewrite, during which time there is no good place to make other bug fixes/feature changes. If you make the change in the python branch, then you may also have to patch the in-progress rust fork.
  • No immediate return on investment. While the rewrite is happening, all of the investment in it is sunk costs.
  • Throughout the process, you can only run the tests for subsystems that have already been ported. It’s common to find subtle bugs later in code ported early.
  • Understanding existing code, porting it and making it idiomatic rust all at the same time takes more time and post-facto debugging.

Iterative porting

Instead, we’ve found that it works much better to take an iterative approach. One of the hidden gems of rust is the excellent PyO3 crate, which allows creating python bindings for rust code in a way that is several times less verbose and less painful than C or SWIG. Because of rust’s strong ownership model, it’s also really hard to muck up e.g. reference counts when creating Python bindings for rust code.

We port individual functions or classes to rust one at a time, starting with functionality that doesn’t have dependencies on other python code and gradually working our way up the call stack.

Each subsystem of the code is converted to two matching rust crates: one with a port of the code to pure rust, and one with python bindings for the rust code. Generally multiple python modules end up being a single pair of rust crates.

The signatures of the pure Rust code follow rust conventions, but the business logic is mostly ported as-is (just in rust syntax) and the signatures of the python bindings match those of the original python code.

This then allows running the original python tests to verify that the code still behaves the same way. Changes can also immediately land on the main branch.

A subsequent step is usually to refactor the rust code to be more idiomatic - all the while keeping the tests passing. There is also the potential to e.g. switch to using external rust crates (with perhaps subtly different behaviour), or drop functionality altogether.

At some point, we will also port the tests from python to rust, and potentially drop the python bindings - once all the callers have been converted to rust.

Example

For example, imagine I have a Python module janitor/mail_filter.py with this function:

def parse_plain_text_body(text):
   lines = text.splitlines()

   for i, line in enumerate(lines):
       if line == 'Reply to this email directly or view it on GitHub:':
           return lines[i + 1].split('#')[0]
       if (line == 'For more details, see:'
               and lines[i + 1].startswith('https://code.launchpad.net/')):
           return lines[i + 1]
       try:
           (field, value) = line.split(':', 1)
       except ValueError:
           continue
       if field.lower() == 'merge request url':
           return value.strip()
   return None

Porting this to rust naively (in a crate I’ve called “mailfilter”), it might look something like this:

pub fn parse_plain_text_body(text: &str) -> Option<String> {
    let lines: Vec<&str> = text.lines().collect();

    for (i, line) in lines.iter().enumerate() {
        if line == &"Reply to this email directly or view it on GitHub:" {
            return Some(lines[i + 1].split('#').next().unwrap().to_string());
        }
        if line == &"For more details, see:"
            && lines[i + 1].starts_with("https://code.launchpad.net/")
        {
            return Some(lines[i + 1].to_string());
        }
        if let Some((field, value)) = line.split_once(':') {
            if field.to_lowercase() == "merge request url" {
                return Some(value.trim().to_string());
            }
        }
    }
    None
}

Bindings are created in a crate called mailfilter-py, which looks like this:

use pyo3::prelude::*;

#[pyfunction]
fn parse_plain_text_body(text: &str) -> Option<String> {
    janitor_mail_filter::parse_plain_text_body(text)
}

// The module name must match the extension name configured in
// setup.py ("janitor._mailfilter").
#[pymodule]
pub fn _mailfilter(py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(parse_plain_text_body, m)?)?;

    Ok(())
}

The metadata for the crates is what you’d expect. mailfilter-py uses PyO3 and depends on mailfilter.

[package]
name = "mailfilter-py"
version = "0.0.0"
authors = ["Jelmer Vernooij <jelmer@jelmer.uk>"]
edition = "2018"

[lib]
crate-type = ["cdylib"]

[dependencies]
janitor-mail-filter = { path = "../mailfilter" }
pyo3 = { version = ">=0.14", features = ["extension-module"]}

I use python-setuptools-rust to get the python ecosystem to build the python bindings. Here is what setup.py looks like:

#!/usr/bin/python3
from setuptools import setup
from setuptools_rust import RustExtension, Binding

setup(
    rust_extensions=[RustExtension(
        "janitor._mailfilter", "crates/mailfilter-py/Cargo.toml",
        binding=Binding.PyO3)],
)

And of course, setuptools-rust needs to be listed as a setup requirement in pyproject.toml or setup.cfg.
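
For example, a minimal pyproject.toml declaring that build requirement might look like this:

[build-system]
requires = ["setuptools", "setuptools-rust"]
build-backend = "setuptools.build_meta"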

After that, we can replace the original python code with a simple import and verify that the tests still run:

from ._mailfilter import parse_plain_text_body

Of course, not all bindings are as simple as this. Iterators in particular are more complicated, as is code that has a loose idea of ownership in python. But I’ve found that the time investment is usually well worth the ability to land changes on the development head early and often.

I’d be curious to hear if people have had success with other approaches to porting Python code to Rust. If you do, please leave a comment.

02 June, 2023 05:00PM by Jelmer Vernooij

hackergotchi for Matt Brown

Matt Brown

Calling time on DNSSEC: The costs exceed the benefits

I’m calling time on DNSSEC. Last week, prompted by a change in my DNS hosting setup, I began removing it from the few personal zones I had signed. Then this Monday the .nz ccTLD experienced a multi-day availability incident triggered by the annual DNSSEC key rotation process. This incident broke several of my unsigned zones, which led me to say very unkind things about DNSSEC on Mastodon and now I feel compelled to more completely explain my thinking:

For almost all domains and use-cases, the costs and risks of deploying DNSSEC outweigh the benefits it provides. Don’t bother signing your zones.

The .nz incident, while topical, is not the motivation or the trigger for this conclusion. Had it been a novel incident, it would still have been annoying, but novel incidents are how we learn so I have a small tolerance for them. The problem with DNSSEC is precisely that this incident was not novel, just the latest in a long and growing list.

It’s a clear pattern. DNSSEC is complex and risky to deploy. Choosing to sign your zone will almost inevitably mean that you will experience lower availability for your domain over time than if you leave it unsigned. Even if you have a team of DNS experts maintaining your zone and DNS infrastructure, the risk of routine operational tasks triggering a loss of availability (unrelated to any attempted attacks that DNSSEC may thwart) is very high - almost guaranteed to occur. Worse, because of the nature of DNS and DNSSEC these incidents will tend to be prolonged and out of your control to remediate in a timely fashion.

The only benefit you get in return for accepting this almost certain reduction in availability is trust in the integrity of the DNS data a subset of your users (those who validate DNSSEC) receive. Trusted DNS data that is then used to communicate across an untrusted network layer. An untrusted network layer which you are almost certainly protecting with TLS which provides a more comprehensive and trustworthy set of security guarantees than DNSSEC is capable of, and provides those guarantees to all your users regardless of whether they are validating DNSSEC or not.

In summary, in our modern world where TLS is ubiquitous, DNSSEC provides only a thin layer of redundant protection on top of the comprehensive guarantees provided by TLS, but adds significant operational complexity, cost and a high likelihood of lowered availability.

In an ideal world, where the deployment cost of DNSSEC and the risk of DNSSEC-induced outages were both low, it would absolutely be desirable to have that redundancy in our layers of protection. In the real world, given the DNSSEC protocol we have today, the choice to avoid its complexity and rely on TLS alone is not at all painful or risky to make as the operator of an online service. In fact, it’s the prudent choice that will result in better overall security outcomes for your users.

Ignore DNSSEC and invest the time and resources you would have spent deploying it improving your TLS key and certificate management.

Ironically, the one use-case where I think a valid counter-argument for this position can be made is TLDs (including ccTLDs such as .nz). Despite its many failings, DNSSEC is an Internet Standard, and as infrastructure providers, TLDs have an obligation to enable its use. Unfortunately this means that everyone has to bear the costs, complexities and availability risks that DNSSEC burdens these operators with. We can’t avoid that fact, but we can avoid creating further costs, complexities and risks by choosing not to deploy DNSSEC on the rest of our non-TLD zones.

But DNSSEC will save us from the evil CA ecosystem!

Historically, the strongest motivation for DNSSEC has not been the direct security benefits themselves (which as explained above are minimal compared to what TLS provides), but in the new capabilities and use-cases that could be enabled if DNS were able to provide integrity and trusted data to applications.

Specifically, the promise of DNS-based Authentication of Named Entities (DANE) is that with DNSSEC we can be free of the X.509 certificate authority ecosystem and along with it the expensive certificate issuance racket and dubious trust properties that have long been its most distinguishing features.

Ten years ago this was an extremely compelling proposition with significant potential to improve the Internet. That potential has gone unfulfilled.

Instead of maturing as deployments progressed and associated operational experience was gained, DNSSEC has been beset by the discovery of issue after issue. Each of these has necessitated further changes and additions to the protocol, increasing complexity and deployment cost. For many zones, including significant zones like google.com (where I led the attempt to evaluate and deploy DNSSEC in the mid 2010s), it is simply infeasible to deploy the protocol at all, let alone in a reliable and dependable manner.

While DNSSEC maturation and deployment has been languishing, the TLS ecosystem has been steadily and impressively improving. Thanks to the efforts of many individuals and companies, although still founded on the use of a set of root certificate authorities, the TLS and CA ecosystem today features transparency, validation and multi-party accountability that comprehensively build trust in the ability to depend and rely upon the security guarantees that TLS provides. When you use TLS today, you benefit from:

  • Free/cheap issuance from a number of different certificate authorities.
  • Regular, automated issuance/renewal via the ACME protocol.
  • Visibility into who has issued certificates for your domain and when through Certificate Transparency logs.
  • Confidence that certificates issued without certificate transparency (and therefore lacking an SCT) will not be accepted by the leading modern browsers.
  • The use of modern cryptographic protocols as a baseline, with a plausible and compelling story for how these can be steadily and promptly updated over time.

DNSSEC with DANE can match the TLS ecosystem on the first benefit (up front price) and perhaps makes the second benefit moot, but has no ability to match any of the other transparency and accountability measures that today’s TLS ecosystem offers. If your ZSK is stolen, or a parent zone is compromised or coerced, validly signed TLSA records for a forged certificate can be produced and spoofed to users under attack with minimal chances of detection.

Finally, in terms of overall trust in the roots of the system, the CA/Browser forum requirements continue to improve the accountability and transparency of TLS certificate authorities, significantly reducing the ability for any single actor (say a nefarious government) to subvert the system. The DNS root has a well established transparent multi-party system for establishing trust in the DNSSEC root itself, but at the TLD level, almost intentionally thanks to the hierarchical nature of DNS, DNSSEC has multiple single points of control (or coercion) which exist outside of any formal system of transparency or accountability.

We’ve moved from DANE being a potential improvement in security over TLS when it was first proposed, to being a definite regression from what TLS provides today.

That’s not to say that TLS is perfect, but given where we’re at, we’ll get a better security return from further investment and improvements in the TLS ecosystem than we will from trying to fix DNSSEC.

But TLS is not ubiquitous for non-HTTP applications

The arguments above are most compelling when applied to the web-based, HTTP-oriented ecosystem which has driven most of the TLS improvements we’ve seen to date. Non-HTTP protocols are lagging in adopting many of the improvements and best practices TLS has on the web. Some claim that the need to provide a solution for non-HTTP, non-web applications is a motivation to continue pushing DNSSEC deployment.

I disagree; I think it provides a motivation to instead double down on moving those applications to TLS. TLS as the new TCP.

The problem is that the costs of deploying and operating DNSSEC are largely fixed regardless of how many protocols you are intending to protect with it, and worse, the negative side-effects of DNSSEC deployment can and will easily spill over to affect zones and protocols that don’t want or need DNSSEC’s protection. To justify continued DNSSEC deployment and operation in this context means using a smaller set of benefits (just for the non-HTTP applications) to justify the already high costs of deploying DNSSEC itself, plus the cost of the risk that DNSSEC poses to the reliability of your websites. I don’t see how that equation can ever balance, particularly when you evaluate it against the much lower costs of just turning on TLS for the rest of your non-HTTP protocols instead of deploying DNSSEC. MTA-STS is a worked example of how this can be achieved.

If you’re still not convinced, consider that even DNS itself is considering moving to TLS (via DoT and DoH) in order to add the confidentiality/privacy attributes the protocol currently lacks. I’m not a huge fan of the latency implications of these approaches, but the ongoing discussion shows that clever solutions and mitigations for that may exist.

DoT/DoH solve distinct problems from DNSSEC and in principle should be used in combination with it. But in a world where DNS itself relies on TLS, and has therefore eliminated the majority of spoofing and cache poisoning attacks through DoT/DoH deployment, the benefit side of the DNSSEC equation gets smaller and smaller still while the costs remain the same.

OK, but better software or more careful operations can reduce DNSSEC’s cost

Some see the current DNSSEC costs simply as teething problems that will reduce as the software and tooling matures to provide more automation of the risky processes, as operational teams learn from their mistakes, or as they opt to simply transfer the risk by outsourcing the management and complexity to larger providers.

I don’t find these arguments compelling. We’ve already had 15+ years to develop improved software for DNSSEC without success. What’s changed that we should expect a better outcome this year or next? Nothing.

Even if we did have better software or outsourced operations, the approach is still only hiding the costs behind automation or transferring the risk to another organisation. That may appear to work in the short-term, but eventually when the time comes to upgrade the software, migrate between providers or change registrars the debt will come due and incidents will occur.

The problem is the complexity of the protocol itself. No amount of software improvement or outsourcing addresses that.

After 15+ years of trying, I think it’s worth considering that combining cryptography, caching and distributed consensus (some of the most fundamental and complex computer science problems) into a slow-moving, hard-to-evolve, low-level infrastructure protocol, while appropriately balancing security, performance and reliability, appears to be beyond our collective ability.

That doesn’t have to be the end of the world; the improvements achieved in the TLS ecosystem over the same time frame provide a positive counter-example - perhaps DNSSEC is simply focusing our attention at the wrong layer of the stack.

Ideally, secure DNS data would be something we could have, but if the complexity of DNSSEC is the price we have to pay to achieve it, I’m out. I would rather remain with the simpler yet insecure DNS protocol and compensate for its shortcomings at higher transport or application layers, where experience shows we are able to more rapidly improve and develop our security capabilities.

Summing up

For the vast majority of domains and use-cases there is simply no net benefit to deploying DNSSEC in 2023. I’d even go so far as to say that if you’ve already signed your zones, you should (carefully) move them back to being unsigned - you’ll reduce the complexity of your operating environment and lower your risk of availability loss triggered by DNS. Your users will thank you.

The threats that DNSSEC defends against are already amply defended by the now mature and still improving TLS ecosystem at the application layer, and investing in further improvements here carries far more return than deployment of DNSSEC.

For TLDs like .nz, whose outage triggered this post, DNSSEC is not going anywhere, and investment in mitigating its complexities and risks is an unfortunate burden that must be shouldered. While the full incident report of what went wrong with .nz is not yet available, the interim report already hints at some useful insights. It is important that InternetNZ publishes a full and comprehensive review so that the full set of learnings and improvements this incident can provide is realised by .nz and other TLD operators stuck with the unenviable task of trying to safely operate DNSSEC.

Postscript

After taking a few days to draft and edit this post, I’ve just stumbled across a presentation from the well-respected Geoff Huston at last week’s RIPE86 meeting. I’ve only had time to skim the slides (video here) - they don’t seem to disagree with my thinking regarding the futility of the current state of DNSSEC, but they also contain some interesting ideas for what it might take for DNSSEC to become a compelling proposition.

Probably worth a read/watch!

02 June, 2023 12:20AM

June 01, 2023

hackergotchi for Gunnar Wolf

Gunnar Wolf

Cheatable e-voting booths in Coahuila, Mexico, detected at the last minute

It’s been a very long time since I last blogged about e-voting, although some might remember it’s been a topic I have long worked on; in particular, it was the topic of my 2018 Masters thesis, plus some five articles I wrote in the 2010-2018 period. After the thesis, I have to admit I got weary of the subject, and haven’t pursued it since.

So, I was saddened and dismayed to read that –once again, as has already happened before– the electoral authorities would set up a pilot e-voting program in the local elections this year, which would probably lead to a wider deployment next year, in the Federal elections.

This year (…this week!), two States will have elections for their Governors and local Legislative branches: Coahuila (North, bordering with Texas) and Mexico (Center, surrounding Mexico City). They are very different states, demographically and in their development level.

Pilot programs with e-voting booths have been seen in four states (TTBOMK) in the last ~15 years: Jalisco (West), Mexico City, State of Mexico and Coahuila. In Coahuila, several universities have teamed up with the Electoral Institute to develop their e-voting booth; one good thing I can say about how this has been done in my country is that, at least, the Electoral Institute is providing its own implementations, instead of outsourcing to e-booth vendors (which have a long, tragic history, mostly in the USA, but also in other places). Not only that: they are subjecting the machines to audit processes. Not open audit processes, as demanded by academics in the field, but nevertheless external, rigorous audit processes.

But still, what I and other colleagues with a computer security background oppose is not any specific e-voting implementation, but the adoption of e-voting in general. If for nothing else, because of the extra complexity it brings, because of the many more checks that have to be put in place, and… because as programmers, we are aware of the ease with which bugs can creep into any given implementation… both honest bugs (mistakes) and, much worse, bugs that are secretly requested and paid for.

Anyway, leave this bit aside for a while. I’m not implying there was any ill intent in the design or implementation of these e-voting booths.

Two days ago, the Electoral Institute announced there was an important bug found in the Coahuila implementation. The bug consists, as far as I can understand from the information reported in newspapers, in:

  • Each voter approaches their electoral authorities, who verify their identity and their authorization to vote in that precinct
  • The voter is given an activation code, with which they go to the voting booth
  • The booth is activated and enables each voter to cast a vote only once

The problem was that the activation codes remained active after voting, so a voter could vote multiple times.

This seems like an easy problem to patch — it most likely is. However, given the inability to patch, properly test, and deploy the fix to all of the booths in a timely manner (even though only 74 e-voting booths were to be deployed for this pilot), the whole pilot for Coahuila was scrapped; Mexico State is voting with a different implementation that is not affected by this issue.

This illustrates very well one of the main issues with e-voting technology: it requires a team of domain-specific experts to perform a highly specialized task (code and physical audits). I am happy and proud to say that some of the auditing experts were professors from the Information Security Masters program at ESIME Culhuacán (the Masters program I was part of).

The reaction by the Electoral Institute was correct. As far as I understand, there is no evidence suggesting this bug was purposefully built, but it is impossible to rule that out.

A traditional, paper-and-ink-based process is not only immune to attacks (or mistakes!) based on code such as this one, but can be audited by anybody. And that is, I believe, a fundamental property of democracy: ensuring the process is done right is not limited to a handful of domain experts. Not only that: in Mexico, I am sure there are hundreds of very proficient developers that could perform a code and equipment audit such as this one, but the audits are by invitation only, so being an expert is not enough to get clearance to do this.

In a democracy, the whole process should be observable and verifiable by anybody interested in doing so.

Some links about this news:

01 June, 2023 04:22PM

hackergotchi for Holger Levsen

Holger Levsen

20230601-developers-reference-translations

src:developers-reference translations wanted

I've just uploaded developers-reference 12.19, bringing the German translation status back to 100% complete, thanks to Carsten Schoenert. Some other translations, however, could use some updates:

$ make status
for l in de fr it ja ru; do     \
    if [ -d source/locales/$l/LC_MESSAGES ] ; then  \
        echo -n "Stats for $l: " ;          \
        msgcat --use-first source/locales/$l/LC_MESSAGES/*.po | msgfmt --statistics - 2>&1 ; \
    fi ;                            \
done
Stats for de: 1374 translated messages.
Stats for fr: 1286 translated messages, 39 fuzzy translations, 49 untranslated messages.
Stats for it: 869 translated messages, 46 fuzzy translations, 459 untranslated messages.
Stats for ja: 891 translated messages, 26 fuzzy translations, 457 untranslated messages.
Stats for ru: 870 translated messages, 44 fuzzy translations, 460 untranslated messages.
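
If you would like to help, the usual gettext workflow applies; here is a rough sketch (the .po file name and template path are hypothetical, the directory layout follows the Makefile above):

# Merge new and changed strings from the template into an existing
# translation, then edit the fuzzy/untranslated entries in the .po file:
$ msgmerge --update source/locales/fr/LC_MESSAGES/index.po developers-reference.pot
$ msgfmt --statistics -o /dev/null source/locales/fr/LC_MESSAGES/index.po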

01 June, 2023 01:39PM

Russell Coker

Do Desktop Computers Make Sense?

Laptop vs Desktop Price

Currently the smaller and cheaper USB-C docks start at about $25 and Dell has a new Vostro with 8G of RAM and 2*USB-C ports for $788. That gives a bit over $800 for a laptop and dock vs $795 for the cheapest Dell desktop, which also has 8G of RAM. For every way of buying laptops and desktops (e.g. buying from Officeworks, buying on ebay, etc.) the prices for laptops and desktops seem very similar. For all those comparisons the desktop will typically have a faster CPU and more options for PCIe cards, larger storage, etc. But if you don’t want to expand storage beyond the affordable 4TB NVMe/SSD devices, don’t need to add PCIe cards, and don’t need much CPU power then a laptop will do well. For the vast majority of the computer work I do, my Thinkpad X1 Carbon Gen1 (from 2012) had plenty of CPU power.

If someone who’s not an expert in PC hardware was to buy a computer of a given age then laptops probably aren’t more expensive than desktops even disregarding the fact that a laptop works without the need to purchase a monitor, a keyboard, or a mouse. I can get regular desktop PCs for almost nothing and get parts to upgrade them very cheaply but most people can’t do that. I can also get a decent second-hand laptop and USB-C dock for well under $400.

Servers and Gaming Systems

For people doing serious programming or other compute or IO intensive tasks some variation on the server theme is the best option. That may be something more like the servers used by the r/homelab people than the corporate servers, or it might be something in the cloud, but a server is a server. If you are going to have a home server that’s a tower PC then it makes sense to put a monitor on it and use it as a workstation. If your server makes so much noise that you can’t spend much time in the same room or if it’s hosted elsewhere then using a laptop to access it makes sense.

Desktop computers for PC gaming makes sense as no-one seems to be making laptops with moderately powerful GPUs. The most powerful GPUs draw 150W which is more than most laptop PSUs can supply and even if a laptop PSU could supply that much there would be the issue of cooling. The Steam Deck [1] and the Nintendo Switch [2] can both work with USB-C docks. The PlayStation 5 [3] has a 350W PSU and doesn’t support video over USB-C. The Steam Deck can do 8K resolution at 60Hz or 4K at 120Hz but presumably the newer Steam games will need a desktop PC with a more powerful GPU to properly use such resolutions.

For people who want the best FPS rates on graphics intensive games it could make sense to have a tower PC. Also a laptop that’s run at high CPU/GPU use for a long time will tend to have its vents clogged by dust and possibly have the cooling fan wear out.

Monitor Resolution

Laptop support for a single 4K monitor became common with the release of Intel’s Ivy Bridge mobile CPUs in 2012. My own experience of setting up 4K monitors for a Linux desktop in 2019 was that it was unreasonably painful; the soon-to-be-released Debian/Bookworm will make things work nicely for 4K monitors with KDE on X11. So laptop hardware has handled the case of a single high-resolution monitor since before such monitors were cheap or common and before software supported it well. Of course at that time you had to use either a proprietary dock or a mini-DisplayPort to HDMI adaptor to get 4K working. But that was still easier than getting PCIe video cards supporting 4K resolution, which is something that according to spec sheets wasn’t well supported by affordable cards in 2017.

Since USB-C became a standard feature in laptops in about 2017, support for more monitors than most people would want through a USB-C dock became standard. My Thinkpad X1 Carbon Gen5, which was released in 2017, will support 2*FullHD monitors plus a 4K monitor via a USB-C dock. I suspect it would do at least 2*4K monitors but haven’t had a chance to test. Cheap USB-C docks supporting this sort of thing have only become common in the last year or so.

How Many Computers per Home

Among middle class Australians it’s common to have multiple desktop PCs per household. One for each child who’s over the age of about 13 and one for the parents seems to be reasonably common. Students in the later years of high-school and university students are often compelled to have laptops so having the number of laptops plus the number of desktops be larger than the population of the house probably isn’t uncommon even among people who aren’t really into computers. As an aside it’s probably common among people who read my blog to have 2 desktops, a laptop, and a cloud server for their own personal use. But even among people who don’t do that sort of thing having computers outnumber people in a home is probably common.

A large portion of computer users can do everything they need on a laptop. For gamers, graphics-intensive games often run well on a console and that’s probably the most effective way of playing them. Of course the fact that there is “RGB RAM” (RAM with Red, Green, and Blue LEDs to light up) along with a lot of other wild products sold to gamers suggests that gaming PCs are not about what runs the game most effectively and that an art/craft project with the PC is more important than actually playing games.

Instead of having one desktop PC per bedroom and laptops for school/university as well, it would make more sense to have a laptop per person and have a USB-C dock and monitor in each bedroom and a USB-C dock connected to a large screen TV in the lounge. This gives plenty of flexibility for moving around to do work and sharing what’s on your computer with other people. It also allows taking a work computer home and using it with your own monitor, having a friend bring their laptop to your home to work on something together, etc.

For most people desktop computers don’t make sense. While I think that convergence of phones with laptops and desktops is the way of the future [4] for most people having laptops take over all functions of desktops is the best option today.

01 June, 2023 12:38PM by etbe

Jamie McClelland

Enough about the AI Apocalypse Already

After watching Democracy Now’s segment on artificial intelligence I started to wonder - am I out of step on this topic?

When people claim artificial intelligence will surpass human intelligence and thus threaten humanity with extinction, they seem to be referring specifically to advances made with large language models.

As I understand them, large language models are probability machines that have ingested massive amounts of text scraped from the Internet. They answer questions based on the probability of one series of words (their answer) following another series of words (the question).

It seems like a stretch to call this intelligence, but if we accept that definition then it follows that this kind of intelligence is nothing remotely like human intelligence, which makes the claim that it will surpass human intelligence confusing. Hasn’t this kind of machine learning surpassed us decades ago?

Or when we say “surpass” does that simply refer to fooling people into thinking an AI machine is a human via conversation? That is an important milestone, but I’m not ready to accept the Turing test as proof of equal intelligence.

Furthermore, large language models “hallucinate” and also reflect the biases of their training data. The word “hallucinate” seems like a euphemism, as if it could be corrected with the right medication when in fact it seems hard to avoid when your strategy is to correlate words based on probability. But even if you could solve the “here is a completely wrong answer presented with sociopathic confidence” problem, reflecting the biases of your data sources seems fairly intractable. In what world would a system with built-in bias be considered on the brink of surpassing human intelligence?

The danger from LLMs seems to be their ability to convince people that their answers are correct, including their patently wrong and/or biased answers.

Why do people think they are giving correct answers? Oh right… terrifying right-wing billionaires with terrifying agendas have been claiming AI will exceed human intelligence and threaten humanity, and every time they sign a hyperbolic statement they get front-page mainstream coverage. And even progressive news outlets are spreading this narrative with minimal space for contrary opinions (thank you Tawana Petty from the Algorithmic Justice League for providing the only glimpse of reason in the segment).

The belief that artificial intelligence is or will soon become omnipotent has real world harms today: specifically it creates the misperception that current LLMs are accurate, which paves the way for greater adoption among police forces, social service agencies, medical facilities and other places where racial and economic biases have life and death consequences.

When the CEO of OpenAI calls the technology dangerous and in need of regulation, he gets both free advertising promoting the power and supposed accuracy of his product and the possibility of freezing further developments in the field that might challenge OpenAI’s current dominance.

The real threat to humanity is not AI, it’s massive inequality and the use of tactics ranging from mundane bureaucracy to deadly force and incarceration to segregate the affluent from the growing number of people unable to make ends meet. We have spent decades training bureaucrats, judges and cops to robotically follow biased laws to maintain this order without compassion or empathy. Replacing them with AI would make things worse and should be stopped. But, let’s be clear, the narrative that AI is poised to surpass human intelligence and make humanity extinct is a dangerous distraction that runs counter to a much more important story about “the very real and very present exploitative practices of the [companies building AI], who are rapidly centralizing power and increasing social inequities.”

Maybe we should talk about that instead?

01 June, 2023 12:27PM

hackergotchi for Junichi Uekawa

Junichi Uekawa

Already June.

Already June.

01 June, 2023 02:23AM by Junichi Uekawa

Paul Wise

FLOSS Activities May 2023

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian IRC: set topic on new #debian-sa channel
  • Debian wiki: unblock IP addresses, approve accounts

Communication

  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors

The SIMDe, gensim, sptag work was sponsored. All other work was done on a volunteer basis.

01 June, 2023 12:09AM

May 31, 2023

Arturo Borrero González

Wikimedia Hackathon 2023 Athens summary

Post logo

During the weekend of 19-23 May 2023 I attended the Wikimedia hackathon 2023 in Athens, Greece. The event reunited, in person for the first time since 2019, folks interested in the more technological aspects of the Wikimedia movement. The scope of the hacking projects included (but was not limited to) tools, wikipedia bots, gadgets, server and network infrastructure, data and other technical systems.

My role in the event was two-fold: on one hand I was at the event because of my role as an SRE in the Wikimedia Cloud Services team, which provides very valuable services to the community, and I was expected to support the technical contributors of the movement that were around. Additionally, and because of that same role, I did some hacking myself, which was especially productive given that I generally collaborate on a daily basis with some community members that were present in the hacking room.

The hackathon had a conference-style track and I ran a session with my coworker Bryan, called Past, Present and Future of Wikimedia Cloud Services (Toolforge and friends) (slides), which was very satisfying to deliver given the friendly space it was held in. I attended a bunch of other sessions, and all of them were interesting and well presented. The number of ML themes present in the program schedule was exciting. I definitely learned a lot from attending those sessions: from how LLMs work, to some fascinating applications for them in the Wikimedia space, to current industry trends for training and hosting ML models.

Session

Despite the sessions, the main purpose of the hackathon was, well, hacking. While I was in the hacking space for more than 12 hours each day, my ability to get things done was greatly reduced by the constant conversations, help requests, and other social interactions with the folks. Don’t get me wrong, I embraced that reality with joy, because the social bonding aspect of it is perhaps the main reason why we gathered in person instead of virtually.

That being said, this is a rough list of what I did:

The hackathon also marked the final days of Technical Engagement as an umbrella group for the WMCS and Developer Advocacy teams within the Technology department of the Wikimedia Foundation, because of an internal reorg. We used the chance to reflect on the pleasant time we have had together since 2019 and take a final picture of the few of us that were present in person at the event.

Technical Engagement

It wasn’t the first Wikimedia Hackathon for me, and I felt the same as in previous iterations: it was a welcoming space, and I was surrounded by friends and nice human beings. I ended the event with a profound feeling of being privileged, because I was part of the Wikimedia movement, and because I was invited to participate in it.

31 May, 2023 12:11PM

Russ Allbery

Review: Night Watch

Review: Night Watch, by Terry Pratchett

Series: Discworld #29
Publisher: Harper
Copyright: November 2002
Printing: August 2014
ISBN: 0-06-230740-1
Format: Mass market
Pages: 451

Night Watch is the 29th Discworld novel and the sixth Watch novel. I would really like to tell people they could start here if they wanted to, for reasons that I will get into in a moment, but I think I would be doing you a disservice. The emotional heft added by having read the previous Watch novels and followed Vimes's character evolution is significant.

It's the 25th of May. Vimes is about to become a father. He and several of the other members of the Watch are wearing sprigs of lilac for reasons that Sergeant Colon is quite vehemently uninterested in explaining. A serial killer named Carcer, whom the Watch has been after for weeks, has just murdered an off-duty sergeant. It's a tense and awkward sort of day and Vimes is feeling weird and wistful, remembering the days when he was a copper and not a manager who has to dress up in ceremonial armor and meet with committees.

That may be part of why, when the message comes over the clacks that the Watch have Carcer cornered on the roof of the New Hall of the Unseen University, Vimes responds in person. He's grappling with Carcer on the roof of the University Library in the middle of a magical storm when lightning strikes. When he wakes up, he's in the past, shortly after he joined the Watch and shortly before the events of the 25th of May that the older Watch members so vividly remember and don't talk about.

I have been saying recently in Discworld reviews that it felt like Pratchett was on the verge of a breakout book that's head and shoulders above Discworld prior to that point. This is it. This is that book.

The setup here is masterful: the sprigs of lilac that slowly tell the reader something is going on, the refusal of any of the older Watch members to talk about it, the scene in the graveyard to establish the stakes, the disconcerting fact that Vetinari is wearing a sprig of lilac as well, and the feeling of building tension that matches the growing electrical storm. And Pratchett never gives into the temptation to explain everything and tip his hand prematurely. We know the 25th is coming and something is going to happen, and the reader can put together hints from Vimes's thoughts, but Pratchett lets us guess and sometimes be right and sometimes be wrong. Vimes is trying to change history, which adds another layer of uncertainty and enjoyment as the reader tries to piece together both the true history and the changes. This is a masterful job at a "what if?" story.

And, beneath that, the commentary on policing and government and ethics is astonishingly good. In a review of an earlier Watch novel, I compared Pratchett to Dickens in the way that he focuses on a sort of common-sense morality rather than political theory. That is true here too, but oh that moral analysis is sharp enough to slide into you like a knife. This is not the Vimes that we first met in Guards! Guards!. He has turned his cynical stubbornness into a working theory of policing, and it's subtle and complicated and full of nuance that he only barely knows how to explain. But he knows how to show it to people.

Keep the peace. That was the thing. People often failed to understand what that meant. You'd go to some life-threatening disturbance like a couple of neighbors scrapping in the street over who owned the hedge between their properties, and they'd both be bursting with aggrieved self-righteousness, both yelling, their wives would either be having a private scrap on the side or would have adjourned to a kitchen for a shared pot of tea and a chat, and they all expected you to sort it out.

And they could never understand that it wasn't your job. Sorting it out was a job for a good surveyor and a couple of lawyers, maybe. Your job was to quell the impulse to bang their stupid fat heads together, to ignore the affronted speeches of dodgy self-justification, to get them to stop shouting and to get them off the street. Once that had been achieved, your job was over. You weren't some walking god, dispensing finely tuned natural justice. Your job was simply to bring back peace.

When Vimes is thrown back in time, he has to pick up the role of his own mentor, the person who taught him what policing should be like. His younger self is right there, watching everything he does, and he's desperately afraid he'll screw it up and set a worse example. Make history worse when he's trying to make it better. It's a beautifully well-done bit of tension that uses time travel as the hook to show both how difficult mentorship is and also how irritating one's earlier naive self would be.

He wondered if it was at all possible to give this idiot some lessons in basic politics. That was always the dream, wasn't it? "I wish I'd known then what I know now"? But when you got older you found out that you now wasn't you then. You then was a twerp. You then was what you had to be to start out on the rocky road of becoming you now, and one of the rocky patches on that road was being a twerp.

The backdrop of this story, as advertised by the map at the front of the book, is a revolution of sorts. And the revolution does matter, but not in the obvious way. It creates space and circumstance for some other things to happen that are all about the abuse of policing as a tool of politics rather than Vimes's principle of keeping the peace. I mentioned when reviewing Men at Arms that it was an awkward book to read in the United States in 2020. This book tackles the ethics of policing head-on, in exactly the way that book didn't.

It's also a marvelous bit of competence porn. Somehow over the years, Vimes has become extremely good at what he does, and not just in the obvious cop-walking-a-beat sort of ways. He's become a leader. It's not something he thinks about, even when thrown back in time, but it's something Pratchett can show the reader directly, and have the other characters in the book comment on.

There is so much more that I'd like to say, but so much would be spoilers, and I think Night Watch is more effective when you have the suspense of slowly puzzling out what's going to happen. Pratchett's pacing is exquisite. It's also one of the rare Discworld novels where Pratchett fully commits to a point of view and lets Vimes tell the story. There are a few interludes with other people, but the only other significant protagonist is, quite fittingly, Vetinari. I won't say anything more about that except to note that the relationship between Vimes and Vetinari is one of the best bits of fascinating subtlety in all of Discworld.

I think it's also telling that nothing about Night Watch reads as parody. Sure, there is a nod to Back to the Future in the lightning storm, and it's impossible to write a book about police and street revolutions without making the reader think about Les Miserables, but nothing about this plot matches either of those stories. This is Pratchett telling his own story in his own world, unapologetically, and without trying to wedge it into parody shape, and it is so much the better book for it.

The one quibble I have with the book is that the bits with the Time Monks don't really work. Lu-Tze is annoying and flippant given the emotional stakes of this story, the interludes with him are frustrating and out of step with the rest of the book, and the time travel hand-waving doesn't add much. I see structurally why Pratchett put this in: it gives Vimes (and the reader) a time frame and a deadline, it establishes some of the ground rules and stakes, and it provides a couple of important opportunities for exposition so that the reader doesn't get lost. But it's not a good story. The rest of the book is so amazingly good, though, that it doesn't matter (and the framing stories for "what if?" explorations almost never make much sense).

The other thing I have a bit of a quibble with is outside the book. Night Watch, as you may have guessed by now, is the origin of the May 25th Pratchett memes that you will be familiar with if you've spent much time around SFF fandom. But this book is dramatically different from what I was expecting based on the memes. You will, for example, see a lot of people posting "Truth, Justice, Freedom, Reasonably Priced Love, And a Hard-Boiled Egg!", and before reading the book it sounds like a Pratchett-style humorous revolutionary slogan. And I guess it is, sort of, but, well... I have to quote the scene:

"You'd like Freedom, Truth, and Justice, wouldn't you, Comrade Sergeant?" said Reg encouragingly.

"I'd like a hard-boiled egg," said Vimes, shaking the match out.

There was some nervous laughter, but Reg looked offended.

"In the circumstances, Sergeant, I think we should set our sights a little higher—"

"Well, yes, we could," said Vimes, coming down the steps. He glanced at the sheets of papers in front of Reg. The man cared. He really did. And he was serious. He really was. "But...well, Reg, tomorrow the sun will come up again, and I'm pretty sure that whatever happens we won't have found Freedom, and there won't be a whole lot of Justice, and I'm damn sure we won't have found Truth. But it's just possible that I might get a hard-boiled egg."

I think I'm feeling defensive of the heart of this book because it's such an emotional gut punch and says such complicated and nuanced things about politics and ethics (and such deeply cynical things about revolution). But I think if I were to try to represent this story in a meme, it would be the "angels rise up" song, with all the layers of meaning that it gains in this story. I'm still at the point where the lilac sprigs remind me of Sergeant Colon becoming quietly furious at the overstep of someone who wasn't there.

There's one other thing I want to say about that scene: I'm not naturally on Vimes's side of this argument. I think it's important to note that Vimes's attitude throughout this book is profoundly, deeply conservative. The hard-boiled egg captures that perfectly: it's a bit of physical comfort, something you can buy or make, something that's part of the day-to-day wheels of the city that Vimes talks about elsewhere in Night Watch. It's a rejection of revolution, something that Vimes does elsewhere far more explicitly.

Vimes is a cop. He is in some profound sense a defender of the status quo. He doesn't believe things are going to fundamentally change, and it's not clear he would want them to if they did.

And yet. And yet, this is where Pratchett's Dickensian morality comes out. Vimes is a conservative at heart. He's grumpy and cynical and jaded and he doesn't like change. But if you put him in a situation where people are being hurt, he will break every rule and twist every principle to stop it.

He wanted to go home. He wanted it so much that he trembled at the thought. But if the price of that was selling good men to the night, if the price was filling those graves, if the price was not fighting with every trick he knew... then it was too high.

It wasn't a decision that he was making, he knew. It was happening far below the areas of the brain that made decisions. It was something built in. There was no universe, anywhere, where a Sam Vimes would give in on this, because if he did then he wouldn't be Sam Vimes any more.

This is truly exceptional stuff. It is the best Discworld novel I have read, by far. I feel like this was the Watch novel that Pratchett was always trying to write, and he had to write five other novels first to figure out how to write it. And maybe to prepare Discworld readers to read it.

There are a lot of Discworld novels that are great on their own merits, but also it is 100% worth reading all the Watch novels just so that you can read this book.

Followed in publication order by The Wee Free Men and later, thematically, by Thud!.

Rating: 10 out of 10

31 May, 2023 02:51AM

May 30, 2023

Review: The Mimicking of Known Successes

Review: The Mimicking of Known Successes, by Malka Older

Series: Mossa and Pleiti #1
Publisher: Tordotcom
Copyright: 2023
ISBN: 1-250-86051-2
Format: Kindle
Pages: 169

The Mimicking of Known Successes is a science fiction mystery novella, the first of an expected series. (The second novella is scheduled to be published in February of 2024.)

Mossa is an Investigator, called in after a man disappears from the eastward platform on the 4°63' line. It's an isolated platform, five hours away from Mossa's base, and home to only four residential buildings and a pub. The most likely explanation is that the man jumped, but his behavior before he disappeared doesn't seem consistent with that theory. He was bragging about being from Valdegeld University, talking to anyone who would listen about the important work he was doing — not typically the behavior of someone who is suicidal. Valdegeld is the obvious next stop in the investigation.

Pleiti is a Classics scholar at Valdegeld. She is also Mossa's ex-girlfriend, making her both an obvious and a fraught person to ask for investigative help. Mossa is the last person she expected to be waiting for her on the railcar platform when she returns from a trip to visit her parents.

The Mimicking of Known Successes is mostly a mystery, following Mossa's attempts to untangle the story of what happened to the disappeared man, but as you might have guessed there's a substantial sapphic romance subplot. It's also at least adjacent to Sherlock Holmes: Mossa is brilliant, observant, somewhat monomaniacal, and very bad at human relationships. All of this story except for the prologue is told from Pleiti's perspective as she plays a bit of a Watson role, finding Mossa unreadable, attractive, frustrating, and charming in turn. Following more recent Holmes adaptations, Mossa is portrayed as probably neurodivergent, although the story doesn't attach any specific labels.

I have no strong opinions about this novella. It was fine? There's a mystery with a few twists, there's a sapphic romance of the second chance variety, there's a bit of action and a bit of hurt/comfort after the action, and it all felt comfortably entertaining but kind of predictable. Susan Stepney has a "passes the time" review rating, and while that may be a bit harsh, that's about where I ended up.

The most interesting part of the story is the science fiction setting. We're some indefinite period into the future. Humans have completely messed up Earth to the point of making it uninhabitable. We then took a shot at terraforming Mars and messed that planet up to the point of uninhabitability as well. Now, what's left of humanity (maybe not all of it — the story isn't clear) lives on platforms connected by rail lines high in the atmosphere of Jupiter. (Everyone in the story calls Jupiter "Giant" for reasons that I didn't follow, given that they didn't rename any of its moons.) Pleiti's position as a Classics scholar means that she studies Earth and its now-lost ecosystems, whereas the Modern faculty focus on their new platform life.

This background does become relevant to the mystery, although exactly how is not clear at the start.

I wouldn't call this a very realistic setting. One has to accept that people are living on platforms attached to artificial rings around the solar system's largest planet, walking around in shirt sleeves with only minor technological support due to "atmoshields" of some unspecified capability, and where the native atmosphere plays the role of London fog. Everything feels vaguely Edwardian, down to the occasional human porter and message runner, which matches the story concept but seems unlikely as a plausible future culture. I also disbelieve in humanity's ability to do anything to Earth that would make it less inhabitable than the clouds of Jupiter.

That said, the setting is a lot of fun, which is probably more important. It's fun to try to visualize, and it has that slightly off-balance, occasionally surprising feel of science fiction settings where everyone is recognizably human but the things they consider routine and unremarkable are unexpected by the reader.

This novella also has a great title. The Mimicking of Known Successes is simultaneously a reference to a specific plot point from late in the story, a nod to the shape of the romance, and an acknowledgment of the Holmes pastiche, and all of those references work even better once you know what the plot point is. That was nicely done.

This was not very memorable apart from the setting, but it was pleasant enough. I can't say that I'm inspired to pre-order the next novella in this series, but I also wouldn't object to reading it. If you're in the mood for gender-swapped Holmes in an exotic setting, you could do worse.

Followed by The Imposition of Unnecessary Obstacles.

Rating: 6 out of 10

30 May, 2023 02:09AM

May 29, 2023

hackergotchi for Shirish Agarwal

Shirish Agarwal

Pearls of Luthra, Dahaad, Tetris & Discord.

Pearls of Luthra

Pearls of Luthra is the first book by Brian Jacques that I have read, and I think I am going to be a fan of his work. You have to be wary of this particular book. While it is a beautiful book with quite a few illustrations, I have to warn that if you are somebody who feels hungry at the very mention of food, then you will be hungry throughout the book. There isn't a single page where food isn't mentioned, and not just any kind of food: the kind of food that is geared towards a sweet tooth. So if you fancy tarts or chocolates or anything sweet you will be right at home. The book also touches upon various teas, wines and liquors, but food is where it literally shines. The tale is very much like a Harry Potter adventure but isn't as dark as HP was. In fact, apart from one death and one missing ear, the rest of our heroes and heroines (and there are quite a few) come through fine. I don't want to give too much away as it's a book to be treasured.

Dahaad

Dahaad (the roar) is Sonakshi Sinha’s entry into OTT/web series. The stage is set somewhere in North India, while the exploits are based on a real-life person called Cyanide Mohan who killed 20 women between 2005-2009. In the web series, however, the antagonist’s crimes span a period of 12 years and claim 29 women as victims. Apart from that it’s pretty much a copy of what was done by the person above. It’s a melting pot of a series, with quite a few stories enmeshed along with the main one. The main plot is about women from lower economic and caste orders whose families want them to be wed but cannot meet the huge demands for dowry. In such a situation, if a person were to give them a bit of attention, promise marriage, and ask them to steal a bit and run away with him, they will do it. The same modus operandi was used by Cyanide Mohan. He had a car that was not actually his but used it to show off that he was from a richer background, entice the women, have sex, promise marriage, and the morning-after pill he gave them would contain cyanide, which the women would unwittingly consume.

This is also framed by the protagonist Sonakshi Sinha to her mother, as her mother is also forcing her to get married because she is becoming older. She shows some of the photographs of the victims and says that while the perpetrator is guilty, so is the overall society that puts women in such vulnerable positions. AFAIK, that is still the state of things. In fact, there is a series called ‘Indian Matchmaking’ that has all the snobbishness that you want. How many people could have a lifestyle like the ones shown in it? Less than 2% of the population. It’s actually shows like the above that make the whole thing even more precarious 😦

Apart from that, the show also depicts prejudice about caste and background. I won’t go much into it as it’s worth seeing and experiencing.

Tetris

Tetris is in many ways a story of greed. It’s also the story of a lone inventor who had to wait 20-odd years to profit from his invention. Forbes does a marvelous job of giving some more background and foreground info about Tetris, the inventor, and the producer that went on to strike it rich. It also shows how copyright misrepresentation happens but does nothing to address it. I could talk a whole lot more, but it’s better to see the movie and draw your own conclusions. For me it was 4/5.

Discord

Discord became Discord 2.0 and is a blank to me. A blank page. I can’t do anything. At first I thought it was a bug and waited for a few days, as sometimes web services do fix themselves. But two weeks on it still wasn’t fixed, so I decided to look under the hood. One of the tools in Firefox is the Web Developer Tools (Ctrl+Shift+I), which tells you if an element of a page is not appearing, or at least gives you a hint. It gave me the following –


Content Security Policy: Ignoring “'unsafe-inline'” within script-src or style-src: nonce-source or hash-source specified
Content Security Policy: The page’s settings blocked the loading of a resource at data:text/css,%0A%20%20%20%20%20%20%20%2… (“style-src”). data:44:30
Content Security Policy: Ignoring “'unsafe-inline'” within script-src or style-src: nonce-source or hash-source specified
TypeError: AudioContext is not a constructor 138875 https://discord.com/assets/cbf3a75da6e6b6a4202e.js:262 l https://discord.com/assets/f5f0b113e28d4d12ba16.js:1ed46a18578285e5c048b.js:241:118

What is happening is that dom.webaudio.enabled is disabled in Firefox.

Then, on a hunch, I searched on Reddit and saw the following. Be careful while visiting the link as it’s labelled NSFW, although to my mind there wasn’t anything remotely NSFW about it. They do mention using another tool, ‘AudioContext Fingerprint Defender’, which supposedly fakes or spoofs an id. As this add-on isn’t tracked by the Firefox privacy team, it’s hard for me to say anything positive or negative.

So, in the end I stopped using Discord, as the alternative was being tracked by them 😦

Last but not least, I saw this about a week back. Sooner or later this had to happen as Elon tries to make money off Twitter.


29 May, 2023 11:49PM by shirishag75

John Goerzen

Recommendations for Tools for Backing Up and Archiving to Removable Media

I have several TB worth of family photos, videos, and other data. This needs to be backed up — and archived.

Backups and archives are often thought of as similar. And indeed, they may be done with the same tools at the same time. But the goals differ somewhat:

Backups are designed to recover from a disaster that you can fairly rapidly detect.

Archives are designed to survive for many years, protecting against disaster not only impacting the original equipment but also the original person that created them.

Reflecting on this, it implies that while a nice ZFS snapshot-based scheme that supports twice-hourly backups may be fantastic for that purpose, if you think about things like family members being able to access it if you are incapacitated, or accessibility in a few decades’ time, it becomes much less appealing for archives. ZFS doesn’t have the wide software support that NTFS, FAT, UDF, ISO-9660, etc. do.

This post isn’t about the pros and cons of the different storage media, nor is it about the pros and cons of cloud storage for archiving; these conversations can readily be found elsewhere. Let’s assume, for the point of conversation, that we are considering BD-R optical discs as well as external HDDs, both of which are too small to hold the entire backup set.

What would you use for archiving in these circumstances?

Establishing goals

The goals I have are:

  • Archives can be restored using Linux or Windows (even though I don’t use Windows, this requirement will ensure the broadest compatibility in the future)
  • The archival system must be able to accommodate periodic updates consisting of new files, deleted files, moved files, and modified files, without requiring a rewrite of the entire archive dataset
  • Archives can ideally be mounted on any common OS and the component files directly copied off
  • Redundancy must be possible. In the worst case, one could manually copy one drive/disc to another. Ideally, the archiving system would automatically track making n copies of data.
  • While a full restore may be a goal, simply finding one file or one directory may also be a goal. Ideally, an archiving system would be able to quickly tell me which discs/drives contain a given file.
  • Ideally, preserves as much POSIX metadata as possible (hard links, symlinks, modification date, permissions, etc). However, for the archiving case, this is less important than for the backup case, with the possible exception of modification date.
  • Must be easy enough to do, and sufficiently automatable, to allow frequent updates without error-prone or time-consuming manual hassle

I would welcome your ideas for what to use. Below, I’ll highlight different approaches I’ve looked into and how they stack up.

Basic copies of directories

The initial approach might be one of simply copying directories across. This would work well if the data set to be archived is smaller than the archival media. In that case, you could just burn or rsync a new copy with every update and be done. Unfortunately, this is much less convenient with data of the size I’m dealing with; no single device is big enough, so a simple rsync is unavailable in that case. With some datasets, you could manually design some rsyncs to store individual directories on individual devices, but that gets unwieldy fast and isn’t scalable.
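
For completeness, the single-device case really is trivial; a sketch with hypothetical paths:

# Refresh a mirror of the archive set on one drive: -a preserves most
# metadata, -H preserves hard links, --delete reflects removals.
$ rsync -aH --delete /data/photos/ /mnt/archive-drive/photos/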

You could use something like my datapacker program to split the data across multiple discs/drives efficiently. However, updates will be a problem; you’d have to re-burn the entire set to get a consistent copy, or rely on external tools like mtree to reflect deletions. Not very convenient in any case.

So I won’t be using this.

tar or zip

While you can split tar and zip files across multiple media, they have a lot of issues. GNU tar’s incremental mode is clunky and buggy; zip is even worse. tar files can’t be read randomly, making it extremely time-consuming to extract just certain files out of a tar file.
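
For reference, the GNU tar incremental mode being criticized looks roughly like this (archive and snapshot names are placeholders):

# Level-0 (full) archive; tar records file state in the snapshot file,
# and updates that file in place on each subsequent run.
$ tar --create --listed-incremental=photos.snar --file=full.tar /data/photos
# A later run against the same snapshot file emits only the changes.
$ tar --create --listed-incremental=photos.snar --file=incr1.tar /data/photos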

The only thing going for these formats (and especially zip) is the wide compatibility for restoration.

dar

Here we start to get into the more interesting tools. Dar is, in my opinion, one of the best Linux tools that few people know about. Since I first wrote about dar in 2008, it’s added some interesting new features; among them, binary deltas and cloud storage support. So, dar has quite a few interesting features that I make use of in other ways, and could also be quite helpful here:

  • Dar can both read and write files sequentially (streaming, like tar), or with random-access (quick seek to extract a subset without having to read the entire archive)
  • Dar can apply compression to individual files, rather than to the archive as a whole, facilitating both random access and resilience (corruption in one file doesn’t invalidate all subsequent files). Dar also supports numerous compression algorithms including gzip, bzip2, xz, lzo, etc., and can omit compressing already-compressed files.
  • The end of each dar file contains a central directory (dar calls this a catalog). The catalog contains everything necessary to extract individual files from the archive quickly, as well as everything necessary to make a future incremental archive based on this one. Additionally, dar can make and work with “isolated catalogs” — a file containing the catalog only, without data.
  • Dar can split the archive into multiple pieces called slices. This can best be done with fixed-size slices (the --slice and --first-slice options), which let the catalog record the slice number and preserve random access capabilities. With the --execute option, dar can easily wait for a given slice to be burned, etc.
  • Dar normally stores an entire new copy of a modified file, but can optionally store an rdiff binary delta instead. This has the potential to be far smaller (think of a case of modifying metadata for a photo, for instance).

Additionally, dar comes with a dar_manager program. dar_manager makes a database out of dar catalogs (or archives). This can then be used to identify the precise archive containing a particular version of a particular file.

All this combines to make a useful system for archiving. Isolated catalogs are tiny, and it would be easy enough to include the isolated catalogs for the entire set of archives that came before (or even the dar_manager database file) with each new incremental archive. This would make restoration of a particular subset easy.
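
A minimal sketch of that workflow, assuming hypothetical paths and a slice size chosen for BD-R media:

# Full archive, split into fixed-size slices, with per-file compression.
$ dar -c /archive/2023-full -R /data -s 23G -z
# Produce the tiny isolated catalog for the full archive.
$ dar -C /archive/2023-full-cat -A /archive/2023-full
# Later: an incremental archive, using the isolated catalog as reference.
$ dar -c /archive/2023-incr1 -R /data -A /archive/2023-full-cat -s 23G -z
# Track everything in a dar_manager database and query it by file name.
$ dar_manager -C archives.dmd
$ dar_manager -B archives.dmd -A /archive/2023-full
$ dar_manager -B archives.dmd -A /archive/2023-incr1
$ dar_manager -B archives.dmd -f photos/2019/beach.jpg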

The main thing to address with dar is that you do need dar to extract the archive. Every dar release comes with source code and a win64 build. dar also supports building a statically-linked Linux binary. It would therefore be easy to include win64 binary, Linux binary, and source with every archive run. dar is also a part of multiple Linux and BSD distributions, which are archived around the Internet. I think this provides a reasonable future-proofing to make sure dar archives will still be readable in the future.

The other challenge is user ability. While dar is highly portable, it is fundamentally a CLI tool and will require CLI abilities on the part of users. I suspect, though, that I could write up a few pages of instructions to include and make that a reasonably easy process. Not everyone can use a CLI, but I would expect a person that could follow those instructions could be found readily enough.

One other benefit of dar is that it could easily be used with tapes. The LTO series is liked by various hobbyists, though it could pose formidable obstacles to non-hobbyists trying to access data in future decades. Additionally, since the archive is a big file, it lends itself to working with par2 to provide redundancy against certain amounts of data corruption.
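
A sketch of the par2 idea, using dar’s slice naming convention and a hypothetical redundancy level:

# Create ~10% recovery data for a slice, then verify the result.
$ par2 create -r10 /archive/2023-full.1.dar
$ par2 verify /archive/2023-full.1.dar.par2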

git-annex

git-annex is an interesting program that is designed to facilitate managing large sets of data and moving it between repositories. git-annex has particular support for offline archive drives and tracks which drives contain which files.

The idea would be to store the data to be archived in a git-annex repository. Then git-annex commands could generate filesystem trees on the external drives (or trees to be burned to read-only media).
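
One way this might look, using the directory special remote’s export feature (the remote name, branch and mount point are hypothetical):

# Register an external drive as a remote holding a plain, browsable tree.
$ git annex initremote archive-a type=directory directory=/mnt/archive-a \
      encryption=none exporttree=yes
# Export the current branch's tree to the drive as regular files.
$ git annex export main --to archive-a
# Later, ask git-annex which repositories/drives contain a given file.
$ git annex whereis photos/2019/beach.jpg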

In a post about using git-annex for blu-ray backups, an earlier thread about DVD-Rs was mentioned.

This has a few interesting properties. For one, with due care, the files can be stored on archival media as regular files. There are some different options for how to generate the archives; some of them would place the entire git-annex metadata on each drive/disc. With that arrangement, one could access the individual files without git-annex. With git-annex, one could reconstruct the final (or any intermediate) state of the archive appropriately, handling deletions, renames, etc. You would also easily be able to know where copies of your files are.

The practice is somewhat more challenging. Hundreds of thousands of files — what I would consider a medium-sized archive — can pose some challenges, running into hours-long execution if used in conjunction with the directory special remote (but only minutes-long with a standard git-annex repo).

Ruling out the directory special remote, I had thought I could maybe just work with my files in git-annex directly. However, I ran into some challenges with that approach as well. I am uncomfortable with git-annex mucking about with hard links in my source data. While it does try to preserve timestamps in the source data, these are lost on the clones. I wrote up my best effort to work around all this.

In a forum post, the author of git-annex comments that “I don’t think that CDs/DVDs are a particularly good fit for git-annex, but it seems a couple of users have gotten something working.” The page he references is Managing a large number of files archived on many pieces of read-only medium. Some of that discussion is a bit dated (for instance, the directory special remote has the importtree feature that implements what was being asked for there), but has some interesting tips.

git-annex supplies win64 binaries, and git-annex is included with many distributions as well. So it should be nearly as accessible as dar in the future. Since git-annex would be required to restore a consistent recovery image, similar caveats as with dar apply; CLI experience would be needed, along with some written instructions.

Bacula and BareOS

Although primarily tape-based archivers, these do also nominally support drives and optical media. However, they are much more tailored as backup tools, especially with the ability to pull from multiple machines. They require a database and extensive configuration, making them a poor fit for both the creation and future extractability of this project.

Conclusions

I’m going to spend some more time with dar and git-annex, testing them out, and hope to write some future posts about my experiences.

29 May, 2023 04:57PM by John Goerzen