March 28, 2017

Axel Beckert

System Tray Icon to Monitor a Linux Software RAID Locally

About a year ago I bought a new workstation computer for myself at home. It’s a Tuxedo XUX_Cube, which is advertised as a gaming PC. But I ordered a slightly atypical non-gamer configuration:

  • As much RAM as possible (64 GB)
  • Intel i7 CPU, but the low power variant
  • Only with the onboard Intel graphics card. (No need for NVidia binary crap drivers.)
  • 2× Samsung 128 GB SSD for OS and $HOME plus 2× 3 TB WD Red disks for media storage; both pairs set up as RAID 1
  • Bitfenix Prodigy-M case in Orange. (Not available in Tuxedo Computer’s online shop, but they nevertheless ordered it for me. :-)

Of course the box runs Debian. To be more precise, it runs Debian Sid with sysvinit-core as init system and i3 as window manager. As I usually have no monitoring clients on my laptops and private workstations, I rather often felt the urge to do a cat /proc/mdstat on that box.

So at some point I wanted something like smart-notifier, but for Linux Software (MD) RAIDs. And since I found nothing, I did what Open Source guys usually do in such cases: I wrote it myself — of course in Perl — and called it systray-mdstat.

First I wondered about which build system would be most suitable for that task, but in the end I once again went with Dist::Zilla for the upstream build system and hence dh-dist-zilla for the Debian packaging.

Ideas for the actual implementation were taken from Wouter’s fdpowermon for the systray icon framework in Perl and Myon’s mdstat Xymon plugin for an already proven logic to parse /proc/mdstat. (Both Wouter and Myon have stated in a GnuPG-signed e-mail that the amount of code I copied was too small to be covered by their copyright, so I was able to license it under a single license, namely the GNU GPL version 3.)
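
For a rough idea of what that parsing amounts to, here is a minimal sketch in Python (not the actual Perl code of systray-mdstat or the Xymon plugin) that reports arrays whose status line shows a missing or failed member:

    import re

    def degraded_arrays(path="/proc/mdstat"):
        """Return the names of MD arrays whose member map contains an underscore."""
        degraded, current = [], None
        with open(path) as mdstat:
            for line in mdstat:
                m = re.match(r"^(md\d+)\s*:", line)
                if m:
                    current = m.group(1)
                    continue
                # Status lines look like "... [2/2] [UU]"; an underscore, as in
                # "[2/1] [U_]", marks a missing or failed member.
                status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
                if current and status:
                    if "_" in status.group(1):
                        degraded.append(current)
                    current = None
        return degraded

    if __name__ == "__main__":
        print(degraded_arrays() or "all arrays OK")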

As of now, systray-mdstat is also available as a package in Debian Unstable. It won’t make it to Stretch, as its first line of code was written after the soft freeze for Stretch was already in place.

28 March, 2017 02:09AM by Axel Beckert (abe+blog@deuxchevaux.org)

March 27, 2017

Dirk Eddelbuettel

#0: Introducing R^4

So I had been toying with the idea of getting back to the blog and more regularly writing / posting little tips and tricks. I even started taking some notes, but because perfect is always the enemy of the good, it never quite materialized.

But the relatively broad discussion spawned by last week's short rant on Suggests != Depends made a few things clear. There appears to be an audience. It doesn't have to be long. And it doesn't have to be too polished.

So with that, let's get the blogging back from micro-blogging.

This note forms post zero of what will be a new segment I call R4, which is shorthand for relatively random R rambling.

Stay tuned.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

27 March, 2017 11:08PM

Elena 'valhalla' Grandi

New pajama

I may have been sewing myself a new pajama.

Image/photo: http://social.gl-como.it/photos/valhalla/image/81b600789aa02a91fdf62f54a71b1ba0

It was plagued with issues: one of the sleeves is wrong side out, and I only realized it when everything was almost done (luckily the pattern is symmetric and it is barely noticeable); the swirl moved while I was sewing it on (and the sewing machine got stuck multiple times: next time I'm using interfacing, full stop), and it's a bit deformed. But it's done.

For the swirl, I used Inkscape to Simplify (Ctrl-L) the original Debian Swirl a few times, removed the isolated bits, adjusted some spline nodes by hand and printed on paper. I've then cut, used water soluble glue to attach it to the wrong side of a scrap of red fabric, cut the fabric, removed the paper and then pinned and sewed the fabric on the pajama top.
As mentioned above, the next time I'm doing something like this, some interfacing will be involved somewhere, to keep me sane and the sewing machine happy.

Blogging about it, because it is somewhat relevant to Free Software :) and there are even sources (https://www.trueelena.org/clothing/projects/pajamas_set.html#downloads), under a DFSG-free license :)

27 March, 2017 07:38AM by Elena ``of Valhalla''

March 26, 2017

Dirk Eddelbuettel

RcppTOML 0.1.2

A new release of RcppTOML is now on CRAN. This release fixes a few parsing issues for less frequently-used inputs: vectors of boolean or date(time) types, as well as table array input.

RcppTOML brings TOML to R. TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML -- though sadly these may be too ubiquitous now. TOML is making good inroads with newer and more flexible projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka "packages") for the Rust language.
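
As a quick illustration of the format, here is a small made-up TOML document (a hypothetical example, not taken from any particular project) showing the kind of typed, human-readable configuration it is designed for:

    # hypothetical configuration file
    title = "example"
    published = 2017-03-26T10:12:00Z

    [server]
    host = "127.0.0.1"
    ports = [ 8000, 8001 ]
    verbose = false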

Changes in version 0.1.2 (2017-03-26)

  • Dates and Datetimes in arrays in the input now preserve their types instead of converting to numeric vectors (#13)

  • Boolean vectors are also correctly handled (#14)

  • TableArray types are now stored as lists in a single named list (#15)

  • The README.md file was expanded with an example and screenshot.

  • Added file init.c with calls to R_registerRoutines() and R_useDynamicSymbols(); also use .registration=TRUE in useDynLib in NAMESPACE

  • Two example files were updated.

Courtesy of CRANberries, there is a diffstat report for this release.

More information is on the RcppTOML page. Issues and bug reports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

26 March, 2017 10:12PM

March 25, 2017

RApiDatetime 0.0.2

Two days after the initial 0.0.1 release, a new version of RApiDatetime has just arrived on CRAN.

RApiDatetime provides six entry points for C-level functions of the R API for Date and Datetime calculations. The functions asPOSIXlt and asPOSIXct convert between long and compact datetime representation, formatPOSIXlt and Rstrptime convert to and from character strings, and POSIXlt2D and D2POSIXlt convert between Date and POSIXlt datetime. These six functions are all fairly essential and useful, but not one of them was previously exported by R.

Josh Ulrich took one hard look at the package -- and added the one line we needed to enable the Windows support that was missing in the initial release. We now build on all platforms supported by R and CRAN. Otherwise, I just added a NEWS file and called it a bugfix release.

Changes in RApiDatetime version 0.0.2 (2017-03-25)

  • Windows support has been added (Josh Ulrich in #1)

Changes in RApiDatetime version 0.0.1 (2017-03-23)

  • Initial release with six accessible functions

Courtesy of CRANberries, there is a comparison to the previous release. More information is on the rapidatetime page.

For questions or comments please use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

25 March, 2017 10:38PM

Bits from Debian

Debian Project Leader elections 2017

It's that time of year again for the Debian Project: the elections of its Project Leader!

The Project Leader position is described in the Debian Constitution.

Two Debian Developers are running this year to become Project Leader: Mehdi Dogguy, who has held the office for the last year, and Chris Lamb.

We are in the middle of the campaigning period that will last until the end of April 1st. The candidates and Debian contributors are already engaging in debates and discussions on the debian-vote mailing list.

The voting period starts on April 2nd, and during the following two weeks, Debian Developers can vote to choose the person that will fit that role for one year.

The results will be published on April 16th, with the term for the new Project Leader starting the following day.

25 March, 2017 09:30PM by Laura Arjona Reina

Russ Allbery

Spring haul

Work has been hellishly busy lately, so that's pretty much all I've been doing. The major project I'm working on should be basically done in the next couple of weeks, though (fingers crossed), so maybe I'll be able to surface a bit more after that.

In the meantime, I'm still acquiring books I don't have time to read, since that's my life. In this case, two great Humble Book Bundles were too good of a bargain to pass up. There are a bunch of books in here that I already own in paperback (and hence showed up in previous haul posts), but I'm running low on shelf room, so some of those paper copies may go to the used bookstore to make more space.

Kelley Armstrong — Lost Souls (sff)
Clive Barker — Tortured Souls (horror)
Jim Butcher — Working for Bigfoot (sff collection)
Octavia E. Butler — Parable of the Sower (sff)
Octavia E. Butler — Parable of the Talents (sff)
Octavia E. Butler — Unexpected Stories (sff collection)
Octavia E. Butler — Wild Seed (sff)
Jacqueline Carey — One Hundred Ablutions (sff)
Richard Chizmar — A Long December (sff collection)
Jo Clayton — Skeen's Leap (sff)
Kate Elliott — Jaran (sff)
Harlan Ellison — Can & Can'tankerous (sff collection)
Diana Pharaoh Francis — Path of Fate (sff)
Mira Grant — Final Girls (sff)
Elizabeth Hand — Black Light (sff)
Elizabeth Hand — Saffron & Brimstone (sff collection)
Elizabeth Hand — Wylding Hall (sff)
Kevin Hearne — The Purloined Poodle (sff)
Nalo Hopkinson — Skin Folk (sff)
Katherine Kurtz — Camber of Culdi (sff)
Katherine Kurtz — Lammas Night (sff)
Joe R. Lansdale — Fender Lizards (mainstream)
Robert McCammon — The Border (sff)
Robin McKinley — Beauty (sff)
Robin McKinley — The Hero and the Crown (sff)
Robin McKinley — Sunshine (sff)
Tim Powers — Down and Out in Purgatory (sff)
Cherie Priest — Jacaranda (sff)
Alastair Reynolds — Deep Navigation (sff collection)
Pamela Sargent — The Shore of Women (sff)
John Scalzi — Miniatures (sff collection)
Lewis Shiner — Glimpses (sff)
Angie Thomas — The Hate U Give (mainstream)
Catherynne M. Valente — The Bread We Eat in Dreams (sff collection)
Connie Willis — The Winds of Marble Arch (sff collection)
M.K. Wren — Sword of the Lamb (sff)
M.K. Wren — Shadow of the Swan (sff)
M.K. Wren — House of the Wolf (sff)
Jane Yolen — Sister Light, Sister Dark (sff)

25 March, 2017 09:21PM

Eddy Petrișor

LVM: Converting root partition from linear to raid1 leads to boot failure... and how to recover

I have a system which has 3 distinct HDDs used as physical volumes for Linux LVM. One logical volume is the root partition and it was initially created as a linear LV (vg0/OS).
Since I have PV redundancy, I thought it might be a good idea to convert the root LV from linear to raid1 with 2 mirrors.

WARNING: It seems an LVM raid1 logical volume for / is not supported with grub2, at least not with Ubuntu's 2.02~beta2-9ubuntu1.6 (14.04LTS) or Debian Jessie's grub-pc 2.02~beta2-22+deb8u1!

So I did this:
lvconvert -m2 --type raid1 vg0/OS

Then I restarted to find myself at the 'grub rescue>' prompt.

The initial problem was seen on an Ubuntu 14.04 LTS (aka trusty) system, but I reproduced it on a VM with Debian Jessie.

I downloaded the Super Grub2 Disk and tried to boot the VM. After choosing the option to load the LVM and RAID support, I was able to boot my previous system.

I tried several times to reinstall GRUB, thinking that was the issue, but I always got this kind of error:


/usr/sbin/grub-probe: error: disk `lvmid/QtJiw0-wsDf-A2zh-2v2y-7JVA-NhPQ-TfjQlN/phCDlj-1XAM-VZnl-RzRy-g3kf-eeUB-dBcgmb' not found.

In the end, after digging for answers for more than 4 hours, I decided I might be able to revert the configuration back to linear from the (initramfs) prompt.

Initially the LV was inactive, so I activated it:

lvchange -a y /dev/vg0/OS

Then restored the LV to linear:

lvconvert -m0 vg0/OS

Then I tried to reboot without reinstalling GRUB, just for kicks, which succeeded.

In order to confirm this was the issue, I redid the whole thing, and indeed, with a raid1 root, I always got the lvmid error.

I'll have to check at work on Monday if I can revert the Ubuntu 14.04 system the same way, but I suspect I will have no issues.


Is it true that root on lvm-raid1 is not supported?

25 March, 2017 03:39PM by eddyp (noreply@blogger.com)

Urvika Gola

Speaking at FOSSASIA’17 | Seasons of Debian : Summer of Code & Winter of Outreachy

I got an amazing chance to speak at FOSSASIA 2017, held in Singapore, on “Seasons of Debian – Summer of Code and Winter of Outreachy“. I gave a combined talk with my co-speaker Pranav Jain, who contributed to Debian through GSoC. We talked about two major open source initiatives – Outreachy and Google Summer of Code – and the work we did on a common project, Lumicall, under Debian.

The excitement started even before the first day! On 16th March, there was a speakers meetup at the Microsoft office in Singapore. There, I got the chance to connect with other speakers and learn about their work. The meetup was concluded by a Microsoft office tour! As a student, it was very exciting to see first-hand the office of a company that I had only dreamt of being at.

On 17th March, i.e. the first day of the three-day-long conference, I met Hong Phuc Dang, founder of FOSSASIA. She is very kind, and just talking to her made me cheerful!
Meeting so many great developers from different organizations was exciting.

18th March was the day of our talk! I was a bit nervous about speaking in front of amazing developers, but that’s how you grow 🙂 Our talk was preceded by a lovely introduction by Mario Behling.

I talked about how the Outreachy programme has made a significant impact in increasing the participation of women in Open Source, with one such woman being me.

I also talked about the Android programming concepts which I used while adding new features to Lumicall. Pranav talked about the Debian organization and how to get started with GSoC by sharing his journey!

After our talk, students approached us asking questions about how to participate in Outreachy and GSoC. I felt that a lot more students were now receptive to this new opportunity.

Our own talk was part of the mini DebConf track. Under this track, there were two other amazing sessions, namely “Debian – The best Linux distribution” and “Open Build Service in Debian”.

The experiences I gained from FOSSASIA were very diverse. I learned how to speak on a huge platform, learned from other interesting talks, shared ideas with smart developers, and saw an exciting venue and a wonderful city!

I would not have been able to experience this without the continuous support of Debian and Outreachy! 🙂

25 March, 2017 09:10AM by urvikagola

March 24, 2017

Gunnar Wolf

Dear lazyweb: How would you visualize..?

Dear lazyweb,

I am trying to find a good way to present the categorization of several studied cases with a fitting graph. I am rating several vulnerabilities / failures according to James Cebula et al.'s paper, A Taxonomy of Operational Cyber Security Risks; this is a somewhat deep taxonomy, with 57 end items, organized in a three-level-deep hierarchy. Here is a table copied from the cited paper:

My categorization is binary: I care only whether it falls within a given category or not. My first stab at this was to represent each case using a star or radar graph. As an example:

As you can see, to a "bare" star graph I added a background color for each top-level category (blue for actions of people, green for systems and technology failures, red for failed internal processes, and gray for external events), and printed out only the labels for the second-level categories; for an accurate reading of the graphs, you have to refer to the table and count bars. And, yes, according to the Engineering Statistics Handbook:

Star plots are helpful for small-to-moderate-sized multivariate data sets. Their primary weakness is that their effectiveness is limited to data sets with less than a few hundred points. After that, they tend to be overwhelming.

I strongly agree with the above statement — and stating that "a few hundred points" can be understood is even an overstatement; 50 points are already too much. Now, trying to increase the usability of this graph, I came across the Sunburst diagram. One of the proponents of this diagram, John Stasko, has written quite a bit about it.

Now... How to create my beautiful Sunburst diagram? That's a tougher one. Even though the page I linked to in the (great!) Data visualization catalogue presents even some free-as-in-software tools to do this... They are Javascript projects that will render their beautiful plots (even including an animation)... To the browser. I need them for a static (i.e. to be printed) document. Yes, I can screenshot and all, but I want them to be automatically generated, so I can review and regenerate them all automatically. Oh, I could just write JSON and use SaaS sites such as Aculocity to do the heavy-lifting, but if you know me, you will understand why I don't want to.

So... I set out to find a Gunnar-approved way to display the information I need. Now, as the Protovis documentation says, an icicle is simply a sunburst transformed from polar to cartesian coordinates... But I came to a similar conclusion: The tools I found are not what I need. OK, but an icicle graph seems much simpler to produce — I fired up my Emacs, and started writing using Ruby, RMagick and RVG... I decided to try a different way. This is my result so far:

So... What do you think? Does this look right to you? Is it clearer than the previous one? Worse? Do you have any idea how I could make this better?

Oh... You want to tell me there is something odd about it? Well, yes, of course! I still need to tweak it quite a bit. Would you believe me if I told you this is not really a left-to-right icicle graph, but rather a strangely formatted Graphviz non-directed graph using the dot formatter?

I can assure you you don't want to look at my Graphviz sources... But in case you insist... Take them and laugh. Or cry. Of course, this file comes from a hand-crafted template, but has some autogenerated bits to it. I have still to tweak it quite a bit to correct several of its usability shortcomings, but at least it looks somewhat like what I want to achieve.

Anyway, I started out by making a "dear lazyweb" question. So, here it goes: Do you think I'm using the right visualization for my data? Do you have any better suggestions, either of a graph or of a graph-generating tool?

Thanks!

[update] Thanks for the first pointer, Lazyweb! I found a beautiful solution; we will see if it is what I need or not (it is too space-greedy to be readable... But I will check it out more thoroughly). It lays out much better than anything I can spew out by myself — Writing it as a mindmap using TikZ directly from within LaTeX, I get the following result:

24 March, 2017 08:46PM by gwolf

Jo Shields

Mono repository changes, beginning Mono vNext

Up to now, Linux packages on mono-project.com have come in two flavours – RPM built for CentOS 7 (and RHEL 7), and .deb built for Debian 7. Universal packages that work on the named distributions, and anything newer.

Except that’s not entirely true.

Firstly, there have been “compatibility repositories” users need to add, to deal with ABI changes in libtiff, libjpeg, and Apache, since Debian 7. Then there’s the packages for ARM64 and PPC64el – neither of those architectures is available in Debian 7, so they’re published in the 7 repo but actually built on 8.

A large reason for this is difficulty in our package publishing pipeline – apt only allows one version-architecture mix in the repository at once, so I can’t have, say, 4.8.0.520-0xamarin1 built on AMD64 on both Debian 7 and Ubuntu 16.04.

We’ve been working hard on a new package build/publish pipeline, which can properly support multiple distributions, based on Jenkins Pipeline. This new packaging system also resolves longstanding issues such as “can’t really build anything except Mono” and “Architecture: All packages still get built on Jo’s laptop, with no public build logs”.

So, here’s the old build matrix:

Distribution   Architectures
Debian 7       ARM hard float, ARM soft float, ARM64 (actually Debian 8), AMD64, i386, PPC64el (actually Debian 8)
CentOS 7       AMD64

And here’s the new one:

Distribution   Architectures
Debian 7       ARM hard float (v7), ARM soft float, AMD64, i386
Debian 8       ARM hard float (v7), ARM soft float, ARM64, AMD64, i386, PPC64el
Raspbian 8     ARM hard float (v6)
Ubuntu 14.04   ARM hard float (v7), ARM64, AMD64, i386, PPC64el
Ubuntu 16.04   ARM hard float (v7), ARM64, AMD64, i386, PPC64el
CentOS 6       AMD64, i386
CentOS 7       AMD64

The compatibility repositories will no longer be needed on recent Ubuntu or Debian – just use the right repository for your system. If your distribution isn’t listed… sorry, but we need to draw a line somewhere on support, and the distributions listed here are based on heavy analysis of our web server logs and bug requests.

You’ll want to change your package manager repositories to reflect your system more accurately, once Mono vNext is published. We’re debating some kind of automated handling of this, but I’m loath to touch users’ sources.list without their knowledge.

CentOS builds are going to be late – I’ve been doing all my prototyping against the Debian builds, as I have better command of the tooling. Hopefully no worse than a week or two.

24 March, 2017 10:06AM by directhex

Sylvain Beucler

Practical basics of reproducible builds

As GNU FreeDink upstream, I'd very much like to offer pre-built binaries: one (1) official, tested, current, distro-agnostic version of the game with its dependencies.
I'm actually already doing that for the Windows version.
One issue though: people have to trust me -- and my computer's integrity.
Reproducible builds could address that.
My release process is tightly controlled, but is my project reproducible? If not, what do I need? Let's check!

I quickly see that documentation is getting better, namely https://reproducible-builds.org/ :)
(The first docs I read on reproducibility looked more like a crazed date-o-phobic rant than an actual solution - plus now we have SOURCE_DATE_EPOCH implemented in gcc ;))

However I was left unsatisfied by the very high-level viewpoint and the lack of concrete examples.
The document points to various issues but is very vague about what tools are impacted.

So let's do some tests!


Let's start with a trivial program:

$ cat > hello.c
#include <stdio.h>
int main(void) {
    printf("Hello, world!\n");
}

OK, first does GCC compile this reproducibly?
I'm not sure because I heard of randomness in identifiers and such in the compilation process...

$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5
$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5

Cool, ELF compiler output is stable through time!
Now do 2 versions of GCC compile a hello world identically?

$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
$ hexcompare hello-5 hello-6
[lots of red]
...

Well let's not get our hopes too high ;)
Trivial build options change?

$ gcc-6 hello.c -lc -o hello-6
$ gcc-6 -lc hello.c -o hello-6b
$ md5sum hello-6 hello-6b
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
f73ee6d8c3789fd8f899f5762025420e  hello-6b
$ hexcompare hello-6 hello-6b
[lots of red]
...

OK, let's be very careful with build options then. What about 2 different build paths?

$ cd ..
$ cp -a repro/ repro2/
$ cd repro2/
$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6

Basic compilation is stable across directories.
Now I tried recompiling identically FreeDink on 2 different git clones.
Disappointment:

$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
839ccd9180c72343e23e5d9e2e65e237  freedink/native/src/freedink
6d5dc6aab321fab01b424ac44c568dcf  freedink2/native/src/freedink
$ hexcompare freedink2/native/src/freedink freedink/native/src/freedink
[lots of red]

Hmm, what about stripped versions?

$ strip freedink/native/src/freedink freedink2/native/src/freedink
$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
415e96bb54456f3f2a759f404f18c711  freedink/native/src/freedink
e0702d798807c83d21f728106c9261ad  freedink2/native/src/freedink
$ hexcompare freedink/native/src/freedink freedink2/native/src/freedink
[1 single red spot]

OK, what's happening? diffoscope to the rescue:

$ diffoscope freedink/native/src/freedink freedink2/native/src/freedink
--- freedink/native/src/freedink
+++ freedink2/native/src/freedink
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                 Data size  Description
│    GNU                  0x00000010  NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│  
│  Displaying notes found in: .note.gnu.build-id
│    Owner                 Data size  Description
│    GNU                  0x00000014  NT_GNU_BUILD_ID (unique build ID bitstring)-    Build ID: a689574d69072bb64b28ffb82547e126284713fa
│ +    Build ID: d7be191a61e84648a58c18e9c108b3f3ce500302

What on earth is Build ID and how is it computed?
After much digging, I find it's a 2008 plan with application in selecting matching detached debugging symbols.
https://fedoraproject.org/wiki/RolandMcGrath/BuildID is the most detailed overview/rationale I found.
It is supposed to be computed from parts of the binary. It's actually pretty resistant to changes, e.g. I could add the missing "return 0;" in my hello source and get the exact same Build ID!
On the other hand my FreeDink binaries do match except for the Build ID so there must be a catch.

Let's try our basic example with default ./configure CFLAGS:

$ (cd repro/ && gcc -g -O2 hello.c -o hello)
$ (cd repro/ && gcc -g -O2 hello.c -o hello-b)
$ md5sum repro/hello repro/hello-b
6b2cd79947d7c5ed2e505ddfce167116  repro/hello
6b2cd79947d7c5ed2e505ddfce167116  repro/hello-b
# => OK for now

$ (cd repro2/ && gcc -g -O2 hello.c -o hello)
$ md5sum repro2/hello
20b4d09d94de5840400be05bc76e4172  repro2/hello
$ strip repro/hello repro2/hello
$ diffoscope repro/hello repro2/hello
--- repro/hello
+++ repro2/hello2
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                 Data size  Description
│    GNU                  0x00000010  NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│  
│  Displaying notes found in: .note.gnu.build-id
│    Owner                 Data size  Description
│    GNU                  0x00000014  NT_GNU_BUILD_ID (unique build ID bitstring)-    Build ID: 462a3c613537bb57f20bd3ccbe6b7f6d2bdc72ba
│ +    Build ID: b4b448cf93e7b541ad995075d2b688ef296bd88b
# => issue reproduced with -g -O2 and different build directories

$ (cd repro/ && gcc -O2 hello.c -o hello)
$ (cd repro2/ && gcc -O2 hello.c -o hello)
$ md5sum repro/hello repro2/hello
1571d45eb5807f7a074210be17caa87b  repro/hello
1571d45eb5807f7a074210be17caa87b  repro2/hello
# => culprit is not -O2, so culprit is -g

Bummer. So the build ID must be computed also from the debug symbols, even if I strip them afterwards :(
OK, so when https://reproducible-builds.org/docs/build-path/ says "Some tools will record the path of the source files in their output", that means the compiler, and more importantly the stripped executable.

Conclusion: apparently to achieve reproducible builds I need identical full build paths and to keep track of them.

What about Windows/MinGW btw?

$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
e0fa685f6866029b8e03f9f2837dc263  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
df7566c0ac93ea4a0b53f4af83d7fbc9  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
bbf4ab22cbe2df1ddc21d6203e506eb5  hello.exe

PE compiler output is not stable through time.
(any clue?)

OK, there's still a long road ahead of us...


There are lots of other questions.
Is autoconf output reproducible?
Does it actually matter if autoconf is reproducible if upstream is providing a pre-generated ./configure?
If not, what about all the documentation on making tarballs reproducible, along with the strip-nondeterminism tool?
Where do we draw the line between build and build environment?
What are the legal issues of distributing a docker-based build environment without every single matching distro source package?

That was my modest contribution to practical reproducible builds documentation for developers; I'd very much like to hear about more of it.
Who knows, maybe in the near future we'll get reproducible official builds for Eclipse, ZAP, JetBrains, Krita, Android SDK/NDK... :)

24 March, 2017 08:40AM

Dirk Eddelbuettel

RApiDatetime 0.0.1

Very happy to announce a new package of mine is now up on the CRAN repository network: RApiDatetime.

It provides six entry points for C-level functions of the R API for Date and Datetime calculations: asPOSIXlt and asPOSIXct convert between long and compact datetime representation, formatPOSIXlt and Rstrptime convert to and from character strings, and POSIXlt2D and D2POSIXlt convert between Date and POSIXlt datetime. These six functions are all fairly essential and useful, but not one of them was previously exported by R. Hence the need to put them together in this package to complete the accessible API somewhat.

These should be helpful for fellow package authors, as many of us either have our own partial copies of some of this code, or farm back out into R to get this done.

As a simple (yet real!) illustration, here is an actual Rcpp function which we could now cover at the C level rather than having to go back up to R (via Rcpp::Function()):

    inline Datetime::Datetime(const std::string &s, const std::string &fmt) {
        Rcpp::Function strptime("strptime");    // we cheat and call strptime() from R
        Rcpp::Function asPOSIXct("as.POSIXct"); // and we need to convert to POSIXct
        m_dt = Rcpp::as<double>(asPOSIXct(strptime(s, fmt)));
        update_tm();
    }

I had taken a first brief stab at this about two years ago, but never finished. With the recent emphasis on C-level function registration, coupled with a possible use case from anytime, I more or less put this together last weekend.

It currently builds and tests fine on POSIX-alike operating systems. If someone with some skill and patience in working on Windows would like to help complete the Windows side of things then I would certainly welcome help and pull requests.

For questions or comments please use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

24 March, 2017 01:30AM

March 23, 2017

Simon McVittie

GTK hackfest 2017: D-Bus communication with containers

At the GTK hackfest in London (which accidentally became mostly a Flatpak hackfest) I've mainly been looking into how to make D-Bus work better for app container technologies like Flatpak and Snap.

The initial motivating use cases are:

  • Portals: Portal authors need to be able to identify whether the container is being contacted by an uncontained process (running with the user's full privileges), or whether it is being contacted by a contained process (in a container created by Flatpak or Snap).

  • dconf: Currently, a contained app either has full read/write access to dconf, or no access. It should have read/write access to its own subtree of dconf configuration space, and no access to the rest.

At the moment, Flatpak runs a D-Bus proxy for each app instance that has access to D-Bus, connects to the appropriate bus on the app's behalf, and passes messages through. That proxy is in a container similar to the actual app instance, but not actually the same container; it is trusted to not pass messages through that it shouldn't pass through. The app-identification mechanism works in practice, but is Flatpak-specific, and has a known race condition due to process ID reuse and limitations in the metadata that the Linux kernel maintains for AF_UNIX sockets. In practice the use of X11 rather than Wayland in current systems is a much larger loophole in the container than this race condition, but we want to do better in future.

Meanwhile, Snap does its sandboxing with AppArmor, on kernels where it is enabled both at compile-time (Ubuntu, openSUSE, Debian, Debian derivatives like Tails) and at runtime (Ubuntu, openSUSE and Tails, but not Debian by default). Ubuntu's kernel has extra AppArmor features that haven't yet gone upstream, some of which provide reliable app identification via LSM labels, which dbus-daemon can learn by querying its AF_UNIX socket. However, other kernels like the ones in openSUSE and Debian don't have those. The access-control (AppArmor mediation) is implemented in upstream dbus-daemon, but again doesn't work portably, and is not sufficiently fine-grained or flexible to do some of the things we'll likely want to do, particularly in dconf.

After a lot of discussion with dconf maintainer Allison Lortie and Flatpak maintainer Alexander Larsson, I think I have a plan for fixing this.

This is all subject to change: see fd.o #100344 for the latest ideas.

Identity model

Each user (uid) has some uncontained processes, plus 0 or more containers.

The uncontained processes include dbus-daemon itself, desktop environment components such as gnome-session and gnome-shell, the container managers like Flatpak and Snap, and so on. They have the user's full privileges, and in particular they are allowed to do privileged things on the user's session bus (like running dbus-monitor), and act with the user's full privileges on the system bus. In generic information security jargon, they are the trusted computing base; in AppArmor jargon, they are unconfined.

The containers are Flatpak apps, or Snap apps, or other app-container technologies like Firejail and AppImage (if they adopt this mechanism, which I hope they will), or even a mixture (different app-container technologies can coexist on a single system). They are containers (or container instances) and not "apps", because in principle, you could install com.example.MyApp 1.0, run it, and while it's still running, upgrade to com.example.MyApp 2.0 and run that; you'd have two containers for the same app, perhaps with different permissions.

Each container has a container type, which is a reversed DNS name like org.flatpak or io.snapcraft representing the container technology, and an app identifier, an arbitrary non-empty string whose meaning is defined by the container technology. For Flatpak, that string would be another reversed DNS name like com.example.MyGreatApp; for Snap, as far as I can tell it would look like example-my-great-app.

The container technology can also put arbitrary metadata on the D-Bus representation of a container, again defined and namespaced by the container technology. For instance, Flatpak would use some serialization of the same fields that go in the Flatpak metadata file at the moment.

Finally, the container has an opaque container identifier identifying a particular container instance. For example, launching com.example.MyApp twice (maybe different versions or with different command-line options to flatpak run) might result in two containers with different privileges, so they need to have different container identifiers.

Contained server sockets

App-container managers like Flatpak and Snap would create an AF_UNIX socket inside the container, bind() it to an address that will be made available to the contained processes, and listen(), but not accept() any new connections. Instead, they would fd-pass the new socket to the dbus-daemon by calling a new method, and the dbus-daemon would proceed to accept() connections after the app-container manager has signalled that it has called both bind() and listen(). (See fd.o #100344 for full details.)
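
As a very rough sketch of the socket mechanics only (the dbus-daemon method that would receive the socket does not exist yet, so the receiving side below is a stand-in, and the socket path is made up), the container manager's side could look something like this in Python:

    import array
    import os
    import socket

    SOCK_PATH = "/tmp/contained-bus-demo.sock"  # hypothetical address exposed inside the container
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)

    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCK_PATH)
    srv.listen(8)
    # Crucially, the container manager never calls srv.accept(); it hands the
    # listening socket to the dbus-daemon, which would accept() on its behalf.

    def send_fd(control_sock, fd):
        # SCM_RIGHTS ancillary data carries a file descriptor to another process.
        control_sock.sendmsg([b"\0"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                                        array.array("i", [fd]))])

    # "control_sock" would be a connection to the proposed dbus-daemon API
    # described above (see fd.o #100344); it is not implemented here.
    # send_fd(control_sock, srv.fileno())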

Processes inside the container must not be allowed to contact the AF_UNIX socket used by the wider, uncontained system - if they could, the dbus-daemon wouldn't be able to distinguish between them and uncontained processes and we'd be back where we started. Instead, they should have the new socket bind-mounted into their container's XDG_RUNTIME_DIR and connect to that, or have the new socket set as their DBUS_SESSION_BUS_ADDRESS and be prevented from connecting to the uncontained socket in some other way. Those familiar with the kdbus proposals a while ago might recognise this as being quite similar to kdbus' concept of endpoints, and I'm considering reusing that name.

Along with the socket, the container manager would pass in the container's identity and metadata, and the method would return a unique, opaque identifier for this particular container instance. The basic fields (container technology, technology-specific app ID, container ID) should probably be added to the result of GetConnectionCredentials(), and there should be a new API call to get all of those plus the arbitrary technology-specific metadata.

When a process from a container connects to the contained server socket, every message that it sends should also have the container instance ID in a new header field. This is OK even though dbus-daemon does not (in general) forbid sender-specified future header fields, because any dbus-daemon that supported this new feature would guarantee to set that header field correctly, the existing Flatpak D-Bus proxy already filters out unknown header fields, and adding this header field is only ever a reduction in privilege.

The reasoning for using the sender's container instance ID (as opposed to the sender's unique name) is for services like dconf to be able to treat multiple unique bus names as belonging to the same equivalence class of contained processes: instead of having to look up the container metadata once per unique name, dconf can look it up once per container instance the first time it sees a new identifier in a header field. For the second and subsequent unique names in the container, dconf can know that the container metadata and permissions are identical to the one it already saw.

Access control

In principle, we could have the new identification feature without adding any new access control, by keeping Flatpak's proxies. However, in the short term that would mean we'd be adding new API to set up a socket for a container without any access control, and having to keep the proxies anyway, which doesn't seem great; in the longer term, I think we'd find ourselves adding a second new API to set up a socket for a container with new access control. So we might as well bite the bullet and go for the version with access control immediately.

In principle, we could also avoid the need for new access control by ensuring that each service that will serve contained clients does its own. However, that makes it really hard to send broadcasts and not have them unintentionally leak information to contained clients - we would need to do something more like kdbus' approach to multicast, where services know who has subscribed to their multicast signals, and that is just not how dbus-daemon works at the moment. If we're going to have access control for broadcasts, it might as well also cover unicast.

The plan is that messages from containers to the outside world will be mediated by a new access control mechanism, in parallel with dbus-daemon's current support for firewall-style rules in the XML bus configuration, AppArmor mediation, and SELinux mediation. A message would only be allowed through if the XML configuration, the new container access control mechanism, and the LSM (if any) all agree it should be allowed.

By default, processes in a container can send broadcast signals, and send method calls and unicast signals to other processes in the same container. They can also receive method calls from outside the container (so that interfaces like org.freedesktop.Application can work), and send exactly one reply to each of those method calls. They cannot own bus names, communicate with other containers, or send file descriptors (which reduces the scope for denial of service).

Obviously, that's not going to be enough for a lot of contained apps, so we need a way to add more access. I'm intending this to be purely additive (start by denying everything except what is always allowed, then add new rules), not a mixture of adding and removing access like the current XML policy language.

There are two ways we've identified for rules to be added:

  • The container manager can pass a list of rules into the dbus-daemon at the time it attaches the contained server socket, and they'll be allowed. The obvious example is that an org.freedesktop.Application needs to be allowed to own its own bus name. Flatpak apps' implicit permission to talk to portals, and Flatpak metadata like org.gnome.SessionManager=talk, could also be added this way.

  • System or session services that are specifically designed to be used by untrusted clients, like the version of dconf that Allison is working on, could opt-in to having contained apps allowed to talk to them (effectively making them a generalization of Flatpak portals). The simplest such request, for something like a portal, is "allow connections from any container to contact this service"; but for dconf, we want to go a bit finer-grained, with all containers allowed to contact a single well-known rendezvous object path, and each container allowed to contact an additional object path subtree that is allocated by dconf on-demand for that app.

Initially, many contained apps would work in the first way (and in particular sockets=session-bus would add a rule that allows almost everything), while over time we'll probably want to head towards recommending more use of the second.

Related topics

Access control on the system bus

We talked about the possibility of using a very similar ruleset to control access to the system bus, as an alternative to the XML rules found in /etc/dbus-1/system.d and /usr/share/dbus-1/system.d. We didn't really come to a conclusion here.

Allison had the useful insight that the XML rules are acting like a firewall: they're something that is placed in front of potentially-broken services, and not part of the services themselves (which, as with firewalls like ufw, makes it seem rather odd when the services themselves install rules). D-Bus system services already have total control over what requests they will accept from D-Bus peers, and if they rely on the XML rules to mediate that access, they're essentially rejecting that responsibility and hoping the dbus-daemon will protect them. The D-Bus maintainers would much prefer it if system services took responsibility for their own access control (with or without using polkit), because fundamentally the system service is always going to understand its domain and its intended security model better than the dbus-daemon can.

Analogously, when a network service listens on all addresses and accepts requests from elsewhere on the LAN, we sometimes work around that by protecting it with a firewall, but the optimal resolution is to get that network service fixed to do proper authentication and access control instead.

For system services, we continue to recommend essentially this "firewall" configuration, filling in the ${} variables as appropriate:

<busconfig>
    <policy user="${the daemon uid under which the service runs}">
        <allow own="${the service's bus name}"/>
    </policy>
    <policy context="default">
        <allow send_destination="${the service's bus name}"/>
    </policy>
</busconfig>

We discussed the possibility of moving towards a model where the daemon uid to be allowed is written in the .service file, together with an opt-in to "modern D-Bus access control" that makes the "firewall" unnecessary; after some flag day when all significant system services follow that pattern, dbus-daemon would even have the option of no longer applying the "firewall" (moving to an allow-by-default model) and just refusing to activate system services that have not opted in to being safe to use without it. However, the "firewall" also protects system bus clients, and services like Avahi that are not bus-activatable, against unintended access, which is harder to solve via that approach; so this is going to take more thought.

For system services' clients that follow the "agent" pattern (BlueZ, polkit, NetworkManager, Geoclue), the correct "firewall" configuration is more complicated. At some point I'll try to write up a best-practice for these.

New header fields for the system bus

At the moment, it's harder than it needs to be to provide non-trivial access control on the system bus, because on receiving a method call, a service has to remember what was in the method call, then call GetConnectionCredentials() to find out who sent it, then only process the actual request when it has the information necessary to do access control.
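
For illustration, here is a minimal sketch of that extra round trip using the Python dbus bindings; the unique name ":1.42" is just a placeholder for the sender of a message the service has received:

    import dbus

    bus = dbus.SystemBus()
    bus_obj = bus.get_object("org.freedesktop.DBus", "/org/freedesktop/DBus")
    bus_iface = dbus.Interface(bus_obj, "org.freedesktop.DBus")

    # ":1.42" stands in for the sender of the method call we are holding on to.
    creds = bus_iface.GetConnectionCredentials(":1.42")
    # Typical keys are "UnixUserID", "ProcessID" and, on suitable kernels,
    # "LinuxSecurityLabel"; only after this extra call returns can the service
    # apply its access-control policy to the request it kept in memory.
    print(dict(creds))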

Allison and I had hoped to resolve this by adding new D-Bus message header fields with the user ID, the LSM label, and other interesting facts for access control. These could be "opt-in" to avoid increasing message sizes for no reason: in particular, it is not typically useful for session services to receive the user ID, because only one user ID is allowed to connect to the session bus anyway.

Unfortunately, the dbus-daemon currently lets unknown fields through without modification. With hindsight this seems an unwise design choice, because header fields are a finite resource (there are 255 possible header fields) and are defined by the D-Bus Specification. The only field that can currently be trusted is the sender's unique name, because the dbus-daemon sets that field, overwriting the value in the original message (if any).

To make it safe to rely on the new fields, we would have to make the dbus-daemon filter out all unknown header fields, and introduce a mechanism for the service to check (during connection to the bus) whether the dbus-daemon is sufficiently new that it does so. If connected to an older dbus-daemon, the service would not be able to rely on the new fields being true, so it would have to ignore the new fields and treat them as unset. The specification is sufficiently vague that making new dbus-daemons filter out unknown header fields is a valid change (it just says that "Header fields with an unknown or unexpected field code must be ignored", without specifying who must ignore them, so having the dbus-daemon delete those fields seems spec-compliant).

This all seemed fine when we discussed it in person; but GDBus already has accessors for arbitrary header fields by numeric ID, and I'm concerned that this might mean it's too easy for a system service to be accidentally insecure: It would be natural (but wrong!) for an implementor to assume that if g_dbus_message_get_header (message, G_DBUS_MESSAGE_HEADER_FIELD_SENDER_UID) returned non-NULL, then that was guaranteed to be the correct, valid sender uid. As a result, fd.o #100317 might have to be abandoned. I think more thought is needed on that one.

Unrelated topics

As happens at any good meeting, we took the opportunity of high-bandwidth discussion to cover many useful things and several useless ones. Other discussions that I got into during the hackfest included, in no particular order:

  • .desktop file categories and how to adapt them for AppStream, perhaps involving using the .desktop vocabulary but relaxing some of the hierarchy restrictions so they behave more like "tags"
  • how to build a recommended/reference "app store" around Flatpak, aiming to host upstream-supported builds of major projects like LibreOffice
  • how Endless do their content-presenting and content-consuming apps in GTK, with a lot of "tile"-based UIs with automatic resizing and reflowing (similar to responsive design), and the applicability of similar widgets to GNOME and upstream GTK
  • whether and how to switch GNOME developer documentation to Hotdoc
  • whether pies, fish and chips or scotch eggs were the most British lunch available from Borough Market
  • the distinction between stout, mild and porter

More notes are available from the GNOME wiki.

Acknowledgements

The GTK hackfest was organised by GNOME and hosted by Red Hat and Endless. My attendance was sponsored by Collabora. Thanks to all the sponsors and organisers, and the developers and organisations who attended.

23 March, 2017 06:07PM

Neil McGovern

GNOME ED Update – Week 12

New release!

In case you haven’t seen it yet, there’s a new GNOME release – 3.24! The release is the result of 6 months’ work by the GNOME community.

The new release is a major step forward for us, with new features and improvements, and some exciting developments in how we build applications. You can read more about it in the announcement and release notes.

As always, this release was made possible partially thanks to the Friends of GNOME project. In particular, it helped us provide a Core apps hackfest in Berlin last November, which had a direct impact on this release.

Conferences

GTK+ hackfest

I’ve just come back from the GTK+ hackfest in London – thanks to RedHat and Endless for sponsoring the venues! It was great to meet a load of people who are involved with GNOME and GTK, and some great discussions were had about Flatpak and the creation of a “FlatHub” – somewhere that people can get all their latest Flatpaks from.

LibrePlanet

As I’m writing this, I’m sitting on a train going to Heathrow, for my flight to LibrePlanet 2017! If you’re going to be there, come and say hi. I’ve a load of new stickers that have been produced as well so these can brighten up your laptop.

23 March, 2017 11:43AM by Neil McGovern

Mike Hommey

Why is the git-cinnabar master branch slower to clone?

Apart from the memory considerations, one thing shown by the data presented in the “When the memory allocator works against you” post, which I haven’t touched on in the followup posts, is that there is a large difference in the time it takes to clone mozilla-central with git-cinnabar 0.4.0 vs. the master branch.

One thing that was mentioned in the first followup is that reducing the amount of realloc and substring copies made the cloning more than 15 minutes faster on master. But the same code exists in 0.4.0, so this isn’t part of the difference.

So what’s going on? Looking at the CPU usage during the clone is enlightening.

On 0.4.0:

On master:

(Note: the data gathering is flawed in some ways, which explains why the git-remote-hg process goes above 100%, which is not possible for this python process. The data is however good enough for the high-level analysis that follows, so I didn’t bother to get something more accurate)

On 0.4.0, the git-cinnabar-helper process was saturating one CPU core during the File import phase, and the git-remote-hg process was saturating one CPU core during the Manifest import phase. Overall, the sum of both processes usually used more than one and a half core.

On master, however, the total of both processes barely uses more than one CPU core.

What happened?

This and that happened.

Essentially, before those changes, git-remote-hg would send instructions to git-fast-import (technically, git-cinnabar-helper, but in this case it’s only used as a wrapper for git-fast-import), and use marks to track the git objects that git-fast-import created.

After those changes, git-remote-hg asks git-fast-import the git object SHA1 of objects it just asked to be created. In other words, those changes replaced something asynchronous with something synchronous: while it used to be possible for git-remote-hg to work on the next file/manifest/changeset while git-fast-import was working on the previous one, it now waits.
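
To give a feel for the difference, here is a minimal, hypothetical Python sketch (not git-cinnabar’s actual code) of the two styles of driving git fast-import; it assumes it is run inside an existing git repository:

    import subprocess

    fi = subprocess.Popen(["git", "fast-import", "--quiet", "--cat-blob-fd=1"],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    # Asynchronous style: stream a blob, refer to it later by its mark,
    # and keep sending further commands without ever waiting for an answer.
    fi.stdin.write(b"blob\nmark :1\ndata 6\nhello\n\n")

    # Synchronous style: ask fast-import for the SHA1 of the object behind the
    # mark, which forces a round trip before any further work can be done.
    fi.stdin.write(b"get-mark :1\n")
    fi.stdin.flush()
    print(fi.stdout.readline().strip().decode())

    fi.stdin.close()
    fi.wait()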

The changes helped simplify the python code, but made the overall clone process much slower.

If I’m not mistaken, the only real use for that information is for the mapping of mercurial to git SHA1s, which is actually rarely used during the clone, except at the end, when storing it. So what I’m planning to do is to move that mapping to the git-cinnabar-helper process, which, incidentally, will kill not 2, but 3 birds with 1 stone:

  • It will restore the asynchronicity, obviously (at least, that’s the expected main outcome).
  • Storing the mapping in the git-cinnabar-helper process is very likely to take less memory than what it currently takes in the git-remote-hg process. Even if it doesn’t (which I doubt), that should still help stay under the 2GB limit of 32-bit processes.
  • The whole thing that spikes memory usage during the finalization phase, as seen in previous post, will just go away, because the git-cinnabar-helper process will just have prepared the git notes-like tree on its own.

So expect git-cinnabar 0.5 to get moar faster, and to use moar less memory.

23 March, 2017 07:38AM by glandium

Analyzing git-cinnabar memory use

In the previous post, I was looking at the allocations git-cinnabar makes. While I had the data, I figured I’d also look at how the memory use correlates with expectations based on repository data, to put things in perspective.

As a reminder, this is what the allocations look like (horizontal axis being the number of allocator function calls):

There are 7 different phases happening during a git clone using git-cinnabar, most of which can easily be identified on the graph above:

  • Negotiation.

    During this phase, git-cinnabar talks to the mercurial server to determine what needs to be pulled. Once that is done, a getbundle request is emitted, which response is read in the next three phases. This phase is essentially invisible on the graph.

  • Reading changeset data.

    The first thing that a mercurial server sends in the response for a getbundle request is changesets. They are sent in the RevChunk format. Translated to git, they become commit objects. But to create commit objects, we need the entire corresponding trees and files (blobs), which we don’t have yet. So we keep this data in memory.

    In the git clone analyzed here, there are 345643 changesets loaded in memory. Their raw size in RawChunk format is 237MB. I think by the end of this phase, we made 20 million allocator calls, have about 300MB of live data in about 840k allocations. (No certainty because I don’t actually have definite data that would allow to correlate between the phases and allocator calls, and the memory usage change between this phase and next is not as clear-cut as with other phases). This puts us at less than 3 live allocations per changeset, with “only” about 60MB overhead over the raw data.

  • Reading manifest data.

    In the stream we receive, manifests follow changesets. Each changeset points to one manifest ; several changesets can point to the same manifest. Manifests describe the content of the entire source code tree in a similar manner as git trees, except they are flat (there’s one manifest for the entire tree, where git trees would reference other git trees for sub directories). And like git trees, they only map file paths to file SHA1s. The way they are currently stored by git-cinnabar (which is planned to change) requires knowing the corresponding git SHA1s for those files, and we haven’t got those yet, so again, we keep everything in memory.

    In the git clone analyzed here, there are 345398 manifests loaded in memory. Their raw size in RawChunk format is 1.18GB. By the end of this phase, we made 23 million more allocator calls, and have about 1.52GB of live data in about 1.86M allocations. We’re still at less than 3 live allocations for each object (changeset or manifest) we’re keeping in memory, and barely over 100MB of overhead over the raw data, which, on average puts the overhead at 150 bytes per object.

    The three phases so far are relatively fast and account for a small part of the overall process, so the boundaries between them are not clear-cut, and they don’t take much space on the graph.

  • Reading and Importing files.

    After the manifests, we finally get file data, grouped by path, such that we get all the file revisions of e.g. .cargo/.gitignore, followed by all the file revisions of .cargo/config.in, .clang-format, and so on. The data here doesn’t depend on anything else, so we can finally import the data directly.

    This means that for each revision, we actually expand the RawChunk into the full file data (RawChunks contain patches against a previous revision), and don’t keep the RawChunk around. We also don’t keep the full data after it was sent to the git-cinnabar-helper process (as far as cloning is concerned, it’s essentially a wrapper for git-fast-import), except for the previous revision of the file, which is likely the patch base for the next revision.

    We do, however, keep one or two things in memory for each file revision: a mapping between its mercurial SHA1 and the git SHA1 of the imported data, and, when there is one, the file metadata (containing information about file copies/renames) that lives as a header in the file data in mercurial, but can’t be stored in the corresponding git blobs, otherwise we’d have irrelevant data in checkouts.

    On the graph, this is where there is a steady and rather long increase of both live allocations and memory usage, the latter in a staircase pattern.

    In the git clone analyzed here, there are 2.02M file revisions, 78k of which have copy/move metadata for a cumulated size of 8.5MB of metadata. The raw size of the file revisions in RawChunk format is 3.85GB. The expanded data size is 67GB. By the end of this phase, we made 622 million more allocator calls, and peaked at about 2.05GB of live data in about 6.9M allocations. Compared to the beginning of this phase, that added about 530MB in 5 million allocations.

    File metadata is stored in memory as python dicts, with 2 entries each, instead of in raw form, for convenience and future-proofing. That would be at least 3 allocations each: one for each value, one for the dict, and maybe one for the dict storage; their keys are all the same and are probably interned by python, so they wouldn’t count.

    As mentioned above, we store a mapping of mercurial to git SHA1s, so for each file that makes 2 allocations, 4.04M total. Plus the 230k or 310k from metadata. Let’s say 4.45M total. We’re short 550k allocations, but considering the numbers involved, it would take less than one allocation per file on average to go over this count.

    As for memory size, per this answer on stackoverflow, python strings have an overhead of 37 bytes, so each SHA1 (kept in hex form) will take 77 bytes (note, that’s partly why I didn’t particularly care about storing them in binary form; that would only save 25%, not 50%). That’s 311MB just for the SHA1s, to which the size of the mapping dict needs to be added. If it were a plain array of pointers to keys and values, it would take 2 * 8 bytes per file, or about 32MB. But that would be a hash table with no room for more items (by the way, I suspect the stairs that can be seen on the requested and in-use bytes are the hash table being realloc()ed). Plus at least 290 bytes per dict for each of the 78k metadata entries, which is an additional 22MB. All in all, 530MB doesn’t seem too much of a stretch.

  • Importing manifests.

    At this point, we’re done receiving data from the server, so we begin by dropping objects related to the bundle we got from the server. On the graph, I assume this is the big dip that can be observed after the initial increase in memory use, bringing us down to 5.6 million allocations and 1.92GB.

    Now begins the most time consuming process, as far as mozilla-central is concerned: transforming the manifests into git trees, while also storing enough data to be able to reconstruct manifests later (which is required to be able to pull from the mercurial server after the clone).

    So for each manifest, we expand the RawChunk into the full manifest data, and generate new git trees from that. The latter is mostly performed by the git-cinnabar-helper process. Once we’re done pushing data about a manifest to that process, we drop the corresponding data, except when we know it will be required later as the delta base for a subsequent RevChunk (which can happen in bundle2).

    As with file revisions, for each manifest, we keep track of the mapping of SHA1s between mercurial and git. We also keep a DAG of the manifest history (unlike git trees, mercurial manifests track their ancestry; files do too, but git-cinnabar doesn’t actually keep track of that separately; it just relies on the manifests data to infer file ancestry).

    On the graph, this is where the number of live allocations increases while both requested and in-use bytes decrease, noisily.

    By the end of this phase, we made about 1 billion more allocator calls. Requested allocations went down to 1.02GB, for close to 7 million live allocations. Compared to the end of the dip at the beginning of this phase, that added 1.4 million allocations, and released 900MB. By now, we expect everything from the “Reading manifests” phase to have been released, which means we allocated around 620MB (1.52GB – 900MB), for a total of 3.26M additional allocations (1.4M + 1.86M).

    We have a dict for the SHA1s mapping (345k * 77 * 2 for strings, plus the hash table with 345k items, so at least 60MB), and the DAG, which, now that I’m looking at memory usage, I figure has one of the worst possible structures, using 2 sets for each node (at least 232 bytes per set, that’s at least 160MB, plus 2 hash tables with 345k items). I think 250MB for those data structures would be largely underestimated. It’s not hard to imagine them taking 620MB, because really, that DAG implementation is awful. The number of allocations expected from them would be around 1.4M (4 * 345k), but I might be missing something. That’s way less than the actual number, so it would be interesting to take a closer look, but not before doing something about the DAG itself.

    Fun fact: the amount of data we’re dealing with in this phase (the expanded size of all the manifests) is close to 2.9TB (yes, terabytes). With about 4700 seconds spent on this phase on a real clone (less with the release branch), we’re still handling more than 615MB per second.

  • Importing changesets.

    This is where we finally create the git commits corresponding to the mercurial changesets. For each changeset, we expand its RawChunk, find the git tree we created in the previous phase that corresponds to the associated manifest, and create a git commit for that tree, with the right date, author, and commit message. For data that appears in the mercurial changeset that can’t be stored or doesn’t make sense to store in the git commit (e.g. the manifest SHA1, the list of changed files[*], or some extra metadata like the source of rebases), we keep some metadata we’ll store in git notes later on.

    [*] Fun fact: the list of changed files stored in mercurial changesets does not necessarily match the list of files in a `git diff` between the corresponding git commit and its parents, for essentially two reasons:

    • Old buggy versions of mercurial have generated erroneous lists that are now there forever (they are part of what makes the changeset SHA1).
    • Mercurial may create new revisions for files even when the file content is not modified, most notably during merges (but that also happened on non-merges due to, presumably, bugs).
    … so we keep it verbatim.

    On the graph, this is where both requested and in-use bytes are only slightly increasing.

    By the end of this phase, we made about half a billion more allocator calls. Requested allocations went up to 1.06GB, for close to 7.7 million live allocations. Compared to the end of the previous phase, that added 700k allocations, and 400MB. By now, we expect everything from the “Reading changesets” phase to have been released (at least the raw data we kept there), which means we may have allocated at most around 700MB (400MB + 300MB), for a total of 1.5M additional allocations (700k + 840k).

    All these are extra data we keep for the next and final phase. It’s hard to evaluate the exact size we’d expect here in memory, but if we divide by the number of changesets (345k), that’s less than 5 allocations per changeset and less than 2KB per changeset, which is low enough not to raise eyebrows, at least for now.

  • Finalizing the clone.

    The final phase is where we actually go ahead and store the mappings between mercurial and git SHA1s (all 2.7M of them), the git notes where we store the data necessary to recreate mercurial changesets from git commits, and a cache for mercurial tags.

    On the graph, this is where the requested and in-use bytes, as well as the number of live allocations peak like crazy (up to 21M allocations for 2.27GB requested).

    This is very much unwanted, but easily explained with the current state of the code. The way the mappings between mercurial and git SHA1s are stored is via a tree similar to how git notes are stored. So for each mercurial SHA1, we have a file that points to the corresponding git SHA1 through git links for commits or directly for blobs (look at the output of git ls-tree -r refs/cinnabar/metadata^3 if you’re curious about the details). If I remember correctly, it’s faster if the tree is created with an ordered list of paths, so the code created a list of paths, and then sorted it to send commands to create the tree. The former creates a new str of length 42 and a tuple of 3 elements for each and every one of the 2.7M mappings. With the 37-byte overhead per str instance and the 56 + 3 * 8 bytes per tuple, we have at least 429MB wasted. Creating the tree itself keeps the corresponding fast-import commands in a buffer, where each command is going to be a tuple of 2 elements: a pointer to a method, and a str of length between 90 and 93. That’s at least another 440MB wasted. (A quick way to double-check these per-object sizes with sys.getsizeof is sketched right after this list.)

    I already fixed the first half, but the second half still needs addressing.
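As a side note, the per-object sizes used in the estimates above are easy to double-check with sys.getsizeof. The snippet below is just a quick sanity check, not anything from git-cinnabar; the exact values depend on the CPython build (the figures quoted in these posts look like those of a 64-bit CPython 2.7):

import sys

print(sys.getsizeof(''))            # str overhead (37 bytes on a 64-bit CPython 2.7)
print(sys.getsizeof('0' * 40))      # a hex SHA1 kept as a str: overhead + 40 bytes
print(sys.getsizeof(('', '', '')))  # 3-element tuple: 56 + 3 * 8 bytes
print(sys.getsizeof({}))            # empty dict, as used for file metadata
print(sys.getsizeof(set()))         # empty set, as used per DAG node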

Overall, except for the stupid spike during the final phase, the manifest DAG and the glibc allocator runaway memory use described in previous posts, there is nothing terribly bad with the git-cinnabar memory usage, all things considered. Mozilla-central is just big.

The spike is already half addressed, and work is under way for the glibc allocator runaway memory use. The manifest DAG, interestingly, is actually mostly useless. It’s only used to track the heads of the DAG, and it’s very much possible to track heads of a DAG without actually storing the entire DAG. In fact, that’s what git-cinnabar already does for changeset heads… so we would only need to do the same for manifest heads.
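To illustrate that last point, tracking the heads of a DAG incrementally only requires removing a node’s parents from the head set as the node is added. The sketch below is illustrative only (the class name is made up, and it assumes nodes arrive parents-first, as they do when reading a bundle); it is not git-cinnabar’s actual code:

# Track the heads of a DAG without storing the DAG itself.
class HeadTracker(object):
    def __init__(self):
        self._heads = set()

    def add(self, node, parents):
        # A newly added node is a head until something uses it as a parent.
        self._heads.add(node)
        # Its parents, if they were heads so far, are not anymore.
        self._heads.difference_update(parents)

    def heads(self):
        return set(self._heads)

tracker = HeadTracker()
tracker.add('a', [])
tracker.add('b', ['a'])
tracker.add('c', ['a'])
print(tracker.heads())  # 'b' and 'c' are the current heads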

One could argue that the 1.4GB of raw RevChunk data we’re keeping in memory for later use could be kept on disk instead. I haven’t done this so far because I didn’t want to have to handle temporary files (and answer questions like “where to put them?”, “what if there isn’t enough disk space there?”, “what if disk access is slow?”, etc.). But the majority of this data is from manifests. I’m already planning changes in how git-cinnabar stores manifests data that will actually allow importing them directly, instead of keeping them in memory until files are imported. This would instantly remove 1.18GB of memory usage. The downside, however, is that this would be more CPU intensive: importing changesets will require creating the corresponding git trees, and getting the stored manifest data. I think it’s worth it, though.

Finally, one thing that isn’t obvious here, but that was found while analyzing why RSS would be going up despite memory usage going down, is that git-cinnabar is doing way too many reallocations and substring allocations.

So let’s look at two metrics that hopefully will highlight the problem:

  • The cumulated amount of requested memory. That is, the sum of all sizes ever given to malloc, realloc, calloc, etc.
  • The compensated cumulated amount of requested memory (naming is hard). That is, the sum of all sizes ever given to malloc, calloc, etc. except realloc. For realloc, we only count the delta in size between what the size was before and after the realloc.

Assuming all the requested memory is filled at some point, the former gives us an upper bound to the amount of memory that is ever filled or copied (the amount that would be filled if no realloc was ever in-place), while the latter gives us a lower bound (the amount that would be filled or copied if all reallocs were in-place).
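For what it’s worth, computing those two metrics from the allocation log boils down to very little code. The sketch below assumes the log has already been parsed into (function, size, old_size) tuples, which is not the actual log format, and it counts only growth for reallocs, since a shrinking realloc fills or copies nothing:

def requested_metrics(events):
    cumulated = 0    # upper bound: every size passed to malloc/calloc/realloc
    compensated = 0  # lower bound: reallocs only count the growth in size
    for func, size, old_size in events:
        if func == 'realloc':
            cumulated += size
            compensated += max(size - old_size, 0)
        elif func in ('malloc', 'calloc'):
            cumulated += size
            compensated += size
        # free() contributes to neither metric
    return cumulated, compensated

events = [('malloc', 100, 0), ('realloc', 150, 100), ('realloc', 150, 150)]
print(requested_metrics(events))  # (400, 150) for this toy log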

Ideally, we’d want the upper and lower bounds to be close to each other (indicating few realloc calls), and the total amount at the end of the process to be as close as possible to the amount of data we’re handling (which we’ve seen is around 3TB).

… and this is clearly bad. Like, really bad. But we already knew that from the previous post, although it’s nice to put numbers on it. The lower bound is about twice the amount of data we’re handling, and the upper bound is more than 10 times that amount. Clearly, we can do better.

We’ll see how things evolve after the necessary code changes happen. Stay tuned.

23 March, 2017 04:30AM by glandium

March 22, 2017

Arturo Borrero González

IPv6 and CGNAT

IPv6

Today I ended up reading an interesting article by the 4th Spanish ISP regarding IPv6 and CGNAT. The article is in Spanish, but I will translate the most important statements here.

Having a Spanish Internet operator talk about this subject is itself good news. We have been lacking any news regarding IPv6 in our country for years. I mean, no news from private operators. Public networks, like the one where I do my daily work, have been offering native IPv6 for almost a decade…

The title of the article is “What is CGNAT and why is it used”.

They start by admitting that this technique is used to address the issue of IPv4 exhaustion. Good. They move on to say that IPv6 was designed to address IPv4 exhaustion. Great. Then, they state that ‘‘the internet network is not ready for IPv6 support’’. Also that ‘‘IPv6 has the handicap of many websites not supporting it’’. Sorry?

That is not true. If they refer to the core of the internet (i.e., RIRs, internet exchange points, root DNS servers, core BGP routers, etc.), they have been working with IPv6 for ages now. If they refer to something else, for example Google, Wikipedia, Facebook, Twitter, Youtube, Netflix or any random hosting company, they do support IPv6 as well. Only a few hosting companies don’t support IPv6, at least here in Europe.

The traffic to/from these services clearly makes up the vast majority of the traffic traveling over the wires nowadays. And they support IPv6.

The article continues defending CGNAT. They refer to IPv6 as an alternative to CGNAT. No, sorry, CGNAT is an alternative to you not doing your IPv6 homework.

The article ends by insinuating that CGNAT is more secure and useful than IPv6. That’s the final joke. They mention some absurd example of IP cams being accessed from the internet by anyone.

Sure, by using CGNAT you are indeed making the network practically one-way only. There exists RFC7021, which describes the big issues of a CGNAT network. So, by using CGNAT you sacrifice a lot of usability in the name of security. This supposed security can be replicated by the simplest possible firewall, which could be deployed in Dual Stack IPv4/IPv6 using any modern firewalling system, like nftables.

(Here is a good blogpost of RFC7021 for spanish readers: Midiendo el impacto del Carrier-Grade NAT sobre las aplicaciones en red)

By the way, Google kindly provides some statistics regarding their IPv6 traffic. These stats clearly show an exponential growth:

Google IPv6 traffic

Other ISPs are giving IPv6 strong precedence over IPv4; that’s the case with Verizon in the USA: Verizon Static IP Changes IPv4 to Persistent Prefix IPv6.

My article seems a bit like a rant, but I couldn’t miss the opportunity to call for native IPv6. None of the major Spanish ISPs offer IPv6.

22 March, 2017 05:47PM

Michael Stapelberg

Debian stretch on the Raspberry Pi 3 (update)

I previously wrote about my Debian stretch preview image for the Raspberry Pi 3.

Now, I’m publishing an updated version, containing the following changes:

  • A new version of the upstream firmware makes the Ethernet MAC address persist across reboots.
  • Updated initramfs files (without updating the kernel) are now correctly copied to the VFAT boot partition.
  • The initramfs’s file system check now works as the required fsck binaries are now available.
  • The root file system is now resized to fill the available space of the SD card on first boot.
  • SSH access is now enabled, restricted via iptables to local network source addresses only.
  • The image uses the linux-image-4.9.0-2-arm64 4.9.13-1 kernel.

A couple of issues remain, notably the lack of HDMI, WiFi and Bluetooth support (see wiki:RaspberryPi3 for details). Any help with fixing these issues is very welcome!

As a preview version (i.e. unofficial, unsupported, etc.) until all the necessary bits and pieces are in place to build images in a proper place in Debian, I built and uploaded the resulting image. Find it at https://people.debian.org/~stapelberg/raspberrypi3/2017-03-22/. To install the image, insert the SD card into your computer (I’m assuming it’s available as /dev/sdb) and copy the image onto it:

$ wget https://people.debian.org/~stapelberg/raspberrypi3/2017-03-22/2017-03-22-raspberry-pi-3-stretch-PREVIEW.img
$ sudo dd if=2017-03-22-raspberry-pi-3-stretch-PREVIEW.img of=/dev/sdb bs=5M

If resolving client-supplied DHCP hostnames works in your network, you should be able to log into the Raspberry Pi 3 using SSH after booting it:

$ ssh root@rpi3
# Password is “raspberry”

22 March, 2017 04:36PM

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

Suggests != Depends

A number of packages on CRAN use Suggests: casually.

They list other packages as "not required" in Suggests: -- as opposed to absolutely required via Imports: or the older Depends: -- yet do not test for their use in either examples or, more commonly, unit tests.

So e.g. the unit tests are bound to fail because, well, Suggests != Depends.

This has been accommodated for many years by all parties involved by treating Suggests as a Depends and installing unconditionally. As I understand it, CRAN appears to flip a switch to automatically install all Suggests from major repositories, glossing over what I consider to be a packaging shortcoming. (As an aside, treatment of Additional_repositories: is indeed optional; Brooke Anderson and I have a fine paper under review on this.)

I spend a fair amount of time with reverse dependency ("revdep") checks of packages I maintain, and I will no longer accommodate these packages.

These revdep checks take long enough as it is, so I will now blacklist these packages that are guaranteed to fail when their "optional" dependencies are not present.

Writing R Extensions says in Section 1.1.3

All packages that are needed to successfully run R CMD check on the package must be listed in one of ‘Depends’ or ‘Suggests’ or ‘Imports’. Packages used to run examples or tests conditionally (e.g. via if(require(pkgname))) should be listed in ‘Suggests’ or ‘Enhances’. (This allows checkers to ensure that all the packages needed for a complete check are installed.)

In particular, packages providing “only” data for examples or vignettes should be listed in ‘Suggests’ rather than ‘Depends’ in order to make lean installations possible.

[...]

It used to be common practice to use require calls for packages listed in ‘Suggests’ in functions which used their functionality, but nowadays it is better to access such functionality via :: calls.

and continues in Section 1.1.3.1

Note that someone wanting to run the examples/tests/vignettes may not have a suggested package available (and it may not even be possible to install it for that platform). The recommendation used to be to make their use conditional via if(require("pkgname"))): this is fine if that conditioning is done in examples/tests/vignettes.

I will now exercise my option to use 'lean installations' as discussed here. If you want your package included in tests I run, please make sure it tests successfully when only its required packages are present.

22 March, 2017 03:16PM

Mike Hommey

When the memory allocator works against you, part 2

This is a followup to the “When the memory allocator works against you” post from a few days ago. You may want to read that one first if you haven’t, and come back. In case you don’t or didn’t read it, it was all about memory consumption during a git clone of the mozilla-central mercurial repository using git-cinnabar, and how the glibc memory allocator is using more than one would expect. This post is going to explore how/why it’s happening.

I happen to have written a basic memory allocation logger for Firefox, so I used it to log all the allocations happening during a git clone exhibiting the runaway memory increase behavior (using a python that doesn’t use its own allocator for small allocations).

The result was a 6.5 GB log file (compressed with zstd; 125 GB uncompressed!) with 2.7 billion calls to malloc, calloc, free, and realloc, recorded across (mostly) 2 processes (the python git-remote-hg process and the native git-cinnabar-helper process; there are other short-lived processes involved, but they do less than 5000 calls in total).

The vast majority of those 2.7 billion calls is done by the python git-remote-hg process: 2.34 billion calls. We’ll only focus on this process.

Replaying those 2.34 billion calls with a program that reads the log allowed me to reproduce the runaway memory increase behavior to some extent. I went the extra mile and modified glibc’s realloc code in memory so it doesn’t call memcpy, to make things faster. I also ran under setarch x86_64 -R to disable ASLR for reproducible results (two consecutive runs return the exact same numbers, which doesn’t happen with ASLR enabled).

I also modified the program to report the number of live allocations (allocations that haven’t been freed yet), and the cumulated size of the actually requested allocations (that is, the sum of all the sizes given to malloc, calloc, and realloc calls for live allocations, as opposed to what the memory allocator really allocated, which can be more, per malloc_usable_size).

RSS was not tracked because the allocations are never filled to make things faster, such that pages for large allocations are never dirty, and RSS doesn’t grow as much because of that.

Full disclosure: it turns out the “system bytes” and “in-use bytes” numbers I had been collecting in the previous post were smaller than what they should have been, and were excluding memory that the glibc memory allocator would have mmap()ed. That however doesn’t affect the trends that had been witnessed. The data below is corrected.

(Note that in the graph above and the graphs that follow, the horizontal axis represents the number of allocator function calls performed)

While I was here, I figured I’d check how mozjemalloc performs, and it has a better behavior (although it has more overhead).

What doesn’t appear on this graph, though, is that mozjemalloc also tells the OS to drop some pages even if it keeps them mapped (madvise(MADV_DONTNEED)), so in practice, it is possible the actual RSS decreases too.

And jemalloc 4.5:

(It looks like it has better memory usage than mozjemalloc for this use case, but its stats are being thrown off at some point, I’ll have to investigate)

Going back to the first graph, let’s get a closer look at what the allocations look like when the “system bytes” number is increasing a lot. The highlights in the following graphs indicate the range the next graph will be showing.

So what we have here is a bunch of small allocations (small enough that they don’t seem to move the “requested” line; most are under 512 bytes, so under normal circumstances they would be allocated by python, and a few are between 512 and 2048 bytes), and a few large allocations, one of which triggers a bump in memory use.

What can appear weird at first glance is that we have a large allocation not requiring more system memory, later followed by a smaller one that does. What the allocations actually look like is the following:


void *ptr0 = malloc(4850928); // #1391340418
void *ptr1 = realloc(some_old_ptr, 8000835); // #1391340419
free(ptr0); // #1391340420
ptr1 = realloc(ptr1, 8000925); // #1391340421
/* ... */
void *ptrn = malloc(879931); // #1391340465
ptr1 = realloc(ptr1, 8880819); // #1391340466
free(ptrn); // #1391340467

As it turns out, inspecting all the live allocations at that point, while there was a hole large enough to do the first two reallocs (the second actually happens in place), at the point of the third one, there wasn’t a large enough hole to fit 8.8MB.

What inspecting the live allocations also reveals, is that there is a large number of large holes between all the allocated memory ranges, presumably coming from previous similar patterns. There are, in fact, 91 holes larger than 1MB, 24 of which are larger than 8MB. It’s the accumulation of those holes that can’t be used to fulfil larger allocations that makes the memory use explode. And there aren’t enough small allocations happening to fill those holes. In fact, the global trend is for less and less memory to be allocated, so, smaller holes are also being poked all the time.
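Counting those holes from a snapshot of live allocations is straightforward; here is a rough sketch of the kind of accounting involved, assuming (address, size) pairs for every live allocation, which is the kind of data replaying the log can provide:

# Count the gaps between consecutive live allocations that exceed a threshold.
def count_holes(live_allocations, threshold=1024 * 1024):
    holes = []
    allocations = sorted(live_allocations)  # sort by address
    for (addr, size), (next_addr, _) in zip(allocations, allocations[1:]):
        gap = next_addr - (addr + size)
        if gap >= threshold:
            holes.append(gap)
    return holes

live = [(0x1000, 0x100), (0x200000, 0x50), (0x900000, 0x10)]
print(len(count_holes(live)))  # 2 holes larger than 1MB in this toy example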

Really, it’s all a straightforward case of memory fragmentation. The reason it tends not to happen with jemalloc is that jemalloc groups allocations by sizes, which the glibc allocator doesn’t seem to be doing. The following is how we got a hole that couldn’t fit the 8.8MB allocation in the first place:


ptr1 = realloc(ptr1, 8880467); // #1391324989; ptr1 is 0x5555de145600
/* ... */
void *ptrx = malloc(232); // #1391325001; ptrx is 0x5555de9bd760 ; that is 13 bytes after the end of ptr1.
/* ... */
free(ptr1); // #1391325728; this leaves a hole of 8880480 bytes at 0x5555de145600.

All would go well if ptrx was free()d, but it looks like it’s long-lived. At least, it’s still allocated by the time we reach the allocator call #1391340466. And since the hole is 339 bytes too short for the requested allocation, the allocator has no other choice than to request more memory from the system.

What’s bothersome, though, is that the allocator chose to allocate ptrx in the space following ptr1, when it had allocated similarly sized buffers in completely different places, both after allocating ptr1 and before allocating ptrx, and while there are plenty of holes in the allocated memory where it could fit.

Interestingly enough, ptrx is a 232 bytes allocation, which means under normal circumstances, python itself would be allocating it. In all likelihood, when the python allocator is enabled, it’s allocations larger than 512 bytes that become obstacles to the larger allocations. Another possibility is that the 256KB fragments that the python allocator itself allocates to hold its own allocations become the obstacles (my original hypothesis). I think the former is more likely, though, putting the blame back entirely on glibc’s shoulders.

Now, it looks like the allocation pattern we have here is suboptimal, so I re-ran a git clone under a debugger to catch when a realloc() for 8880819 bytes happens (the size is peculiar enough that it only happened once in the allocation log). But doing that with a conditional breakpoint is just too slow, so I injected a realloc wrapper with LD_PRELOAD that sends a SIGTRAP signal to the process, so that an attached debugger can catch it.

Thanks to the support for python in gdb, it was then possible to pinpoint the exact python instructions that made the realloc() call (it didn’t come as a surprise; in fact, that was one of the places I had in mind, but I wanted definite proof):


new = ''
end = 0
# ...
for diff in RevDiff(rev_patch):
    new += data[end:diff.start]
    new += diff.text_data
    end = diff.end
    # ...
new += data[end:]

What happens here is that we’re creating a mercurial manifest we got from the server in patch form against a previous manifest. So data contains the original manifest, and rev_patch the patch. The patch essentially contains instructions of the form “replace n bytes at offset o with the content c“.

The code here just does that in the most straightforward way, implementation-wise, but also, it turns out, possibly the worst way.

So let’s unroll this loop over a couple iterations:

new = ''

This allocates an empty str object. In fact, this doesn’t actually allocate anything, and only creates a pointer to an interned representation of an empty string.

new += data[0:diff.start]

This is where things start to get awful. data is a str, so data[0:diff.start] creates a new, separate, str for the substring. One allocation, one copy.

Then appends it to new. Fortunately, CPython is smart enough, and just assigns data[0:diff.start] to new. This can easily be verified with the CPython REPL:

>>> foo = ''
>>> bar = 'bar'
>>> foo += bar
>>> foo is bar
True

(and this is not happening just because the example string is small here; it also happens with larger strings, like 'bar' * 42000)

Back to our loop:

new += diff.text_data

Now, new is realloc()ated to have the right size to fit the appended text in it, and the contents of diff.text_data is copied. One realloc, one copy.

Let’s go for a second iteration:

new += data[diff.end:new_diff.start]

Here again, we’re doing an allocation for the substring, and one copy. Then new is realloc()ated again to append the substring, which is an additional copy.

new += new_diff.text_data

new is realloc()ated yet again to append the contents of new_diff.text_data.

We now finish with:

new += data[new_diff.end:]

which, again, creates a substring from the data, and then proceeds to realloc()ate new one more freaking time.

That’s a lot of malloc()s and realloc()s to be doing…

  • It is possible to limit the number of realloc()s by using new = bytearray() instead of new = ''. I haven’t looked in the CPython code to see what the growth strategy is, but, for example, appending a 4KB string to a 500KB bytearray makes it grow to 600KB instead of 504KB, like what happens when using str.
  • It is possible to avoid realloc()s completely by preallocating the right size for the bytearray (with bytearray(size)), but that requires looping over the patch once first to know the new size, or using an estimate (the new manifest can’t be larger than the size of the previous manifest + the size of the patch) and truncating later (although I’m not sure it’s possible to truncate a bytearray without a realloc()). As a downside, this initializes the buffer with null bytes, which is a waste of time.
  • Another possibility is to reuse bytearrays previously allocated for previous manifests.
  • Yet another possibility is to accumulate the strings to append and use ''.join(). CPython is smart enough to create a single allocation for the total size in that case. That would be the most convenient solution, but see below.
  • It is possible to avoid the intermediate allocations and copies for substrings from the original manifest by using memoryview (see the sketch after this list).
  • Unfortunately, you can’t use ''.join() on a list of memoryviews before Python 3.4.
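For illustration, here is roughly what combining the first and fifth items could look like. RevDiff and the diff fields are the same as in the snippet above; this is a sketch of the idea, not the exact code that landed:

# Apply a RevDiff patch using a bytearray and memoryview slices, so
# substrings of the original manifest are never materialized as separate
# str objects, and the output buffer is realloc()ated far less often.
def apply_patch(data, rev_patch):
    view = memoryview(data)  # zero-copy window over the original manifest
    new = bytearray()        # over-allocates on growth, so fewer realloc()s
    end = 0
    for diff in RevDiff(rev_patch):
        new += view[end:diff.start]  # no intermediate str for the substring
        new += diff.text_data
        end = diff.end
    new += view[end:]
    return new

Downstream code that expects a str would still need one final str() conversion, but that is a single copy instead of one per hunk.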

After modifying the code to implement the first and fifth items, memory usage during a git clone of mozilla-central looks like the following (with the python allocator enabled):

(Note this hasn’t actually landed on the master branch yet)

Compared to what it looked like before, this is largely better. But that’s not the only difference: the clone was also about 1000 seconds faster. That’s more than 15 minutes! But that’s not all so surprising when you know the volumes of data handled here. More insight about this coming in an upcoming post.

But while the changes outlined above make the glibc allocator behavior less likely to happen, they don’t eliminate it entirely. In fact, it seems it is still happening by the end of the manifest import phase. We’re still allocating increasingly large temporary buffers because the size of the imported manifests grows larger and larger, and every one of them is the result of patching a previous one.

The only way to avoid those large allocations creating holes would be to avoid doing them in the first place. My first attempt at doing that, keeping manifests as lists of lines instead of raw buffers, worked, but was terribly slow. So slow, in fact, that I had to stop a clone early and estimated the process would likely have taken a couple days. Iterating over multiple generators at the same time, a lot, kills performance, apparently. I’ll have to try with significantly less of that.

22 March, 2017 06:57AM by glandium

Elena 'valhalla' Grandi

XMPP VirtualHosts, SRV records and letsencrypt certificates

XMPP VirtualHosts, SRV records and letsencrypt certificates

When I set up my XMPP server, a friend of mine asked if I was willing to have a virtualhost with his domain on my server, using the same address as the email.

Setting up prosody and the SRV record on the DNS was quite easy, but then we stumbled on the issue of certificates: of course we would like to use letsencrypt, but as far as we know that means that we would have to set up something custom so that the certificate gets renewed on his server and then sent to mine, and that looks like more of a hassle than just him setting up his own prosody/ejabberd on his server.

So I was wondering: dear lazyweb, did any of you have the same issue and already came up with a solution that is easy to implement and trivial to maintain that we missed?

22 March, 2017 06:32AM by Elena ``of Valhalla''

hackergotchi for Clint Adams

Clint Adams

Then Moises claimed that T.G.I. Friday's was acceptable

“Itʼs really sad listening to a friend talk about how he doesnʼt care for his wife and doesnʼt find her attractive anymore,” he whined, “while at the same time talking about the kid she is pregnant with—obviously they havenʼt had sex in awhile—and how though he only wants one kid, she wants multiple so they will probably have more. He said he couldnʼt afford to have a divorce. He literally said that one morning, watching her get dressed he laughed and told her, ‘Your boobs look weird.’ She didnʼt like that. I reminded him that they will continue to age. That didnʼt make him feel good. He said that he realized before getting married that he thought he was a good person, but now heʼs realizing heʼs a bad person. He said he was a misogynist. I said, ‘Worse, youʼre the type of misogynist who pretends to be a feminist.’ He agreed. He lived in Park Slope, but he moved once they became pregnant.”

“Good luck finding a kid-friendly restaurant,” she said.

Posted on 2017-03-22
Tags: umismu

22 March, 2017 02:46AM

March 21, 2017

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

10-bit H.264 support

Following my previous tests about 10-bit H.264, I did some more practical tests; since media.xiph.org is up again, I did some tests with actual 10-bit input. The results were pretty similar, although of course 4K 60 fps organic content is going to be different at times from the partially rendered 1080p 24 fps clip I used.

But I also tested browser support, with good help from people on IRC. It was every bit as bad as I feared: Chrome on desktop (Windows, Linux, macOS) supports 10-bit H.264, although of course without hardware acceleration. Chrome on Android does not. Firefox does not (it tries on macOS, but playback is buggy). iOS does not. VLC does; I didn't try a lot of media players, but obviously ffmpeg-based players should do quite well. I haven't tried Chromecast, but I doubt it works.

So I guess that yes, it really is 8-bit H.264 or 10-bit HEVC—but I haven't tested the latter yet either :-)

21 March, 2017 11:41PM

hackergotchi for Matthew Garrett

Matthew Garrett

Announcing the Shim review process

Shim has been hugely successful, to the point of being used by the majority of significant Linux distributions and many other third party products (even, apparently, Solaris). The aim was to ensure that it would remain possible to install free operating systems on UEFI Secure Boot platforms while still allowing machine owners to replace their bootloaders and kernels, and it's achieved this goal.

However, a legitimate criticism has been that there's very little transparency in Microsoft's signing process. Some people have waited for significant periods of time before receiving a response. A large part of this is simply that demand has been greater than expected, and Microsoft aren't in the best position to review code that they didn't write in the first place.

To that end, we're adopting a new model. A mailing list has been created at shim-review@lists.freedesktop.org, and members of this list will review submissions and provide a recommendation to Microsoft on whether these should be signed or not. The current set of expectations around binaries to be signed is documented here and the current process here - it is expected that this will evolve slightly as we get used to the process, and we'll provide a more formal set of documentation once things have settled down.

This is a new initiative and one that will probably take a little while to get working smoothly, but we hope it'll make it much easier to get signed releases of Shim out without compromising security in the process.


21 March, 2017 08:29PM

Reproducible builds folks

Reproducible Builds: week 99 in Stretch cycle

Here's what happened in the Reproducible Builds effort between Sunday March 12 and Saturday March 18 2017:

Upcoming events

Reproducible Builds Hackathon Hamburg 2017

The Reproducible Builds Hamburg Hackathon 2017, or RB-HH-2017 for short, is a 3-day hacking event taking place May 5th-7th in the CCC Hamburg Hackerspace located inside Frappant, a collective art space located in a historical monument in Hamburg, Germany.

The aim of the hackathon is to spend some days working on Reproducible Builds in every distribution and project. The event is open to anybody interested in working on Reproducible Builds issues, with or without prior experience!

Accommodation is available and travel sponsorship may be available by agreement. Please register your interest as soon as possible.

Reproducible Builds Summit Berlin 2016

This is just a quick note, that all the pads we've written during the Berlin summit in December 2016 are now online (thanks to Holger), nicely complementing the report by Aspiration Tech.

Request For Comments for new specification: BUILD_PATH_PREFIX_MAP

Ximin Luo posted a draft version of our BUILD_PATH_PREFIX_MAP specification for passing build-time paths between high-level and low-level build tools. This is meant to help eliminate irreproducibility caused by different paths being used at build time. At the time of writing, this affects an estimated 15-20% of 25000 Debian packages.

This is a continuation of an older proposal SOURCE_PREFIX_MAP, which has been updated based on feedback on our patches from GCC upstream, attendees of our Berlin 2016 summit, and participants on our mailing list. Thanks to everyone that contributed!

The specification also contains runnable source code examples and test cases; see our git repo.

Please comment on this draft ASAP - we plan to release version 1.0 of this in a few weeks.

Toolchain changes

  • #857632 apt: ignore the currently running kernel if attempting a reproducible build (Chris Lamb)
  • #857803 shadow: Make the sp_lstchg shadow field reproducible. (Chris Lamb)
  • #857892 fontconfig: please make the cache files reproducible (Chris Lamb)

Packages reviewed and fixed, and bugs filed

Chris Lamb:

Reviews of unreproducible packages

5 package reviews have been added, 274 have been updated and 800 have been removed in this week, adding to our knowledge about identified issues.

1 issue type has been added:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (5)
  • Mattia Rizzolo (1)

diffoscope development

diffoscope 79 and 80 were uploaded to experimental by Chris Lamb. It included contributions from:

Chris Lamb:

  • Ensure that we really are using ImageMagick. (Closes: #857940)
  • Extract SquashFS images in one go rather than per-file, speeding up (e.g.) Tails ISO comparison by ~10x.
  • Support newer versions of cbfstool to avoid test failures. (Closes: #856446)
  • Skip icc test that varies on endian if the Debian-specific patch is not present. (Closes: #856447)
  • Compare GIF images using gifbuild. (Closes: #857610)
  • Various other code quality, build and UI improvements.

Maria Glukhova:

  • Improve AndroidManifest.xml comparison for APK files. (Closes: #850758)

strip-nondeterminism development

strip-nondeterminism 0.032-1 was uploaded to unstable by Chris Lamb. It included contributions from:

Chris Lamb:

  • Fix a possible endless loop while stripping ar files due to trusting the file's file size data. Thanks to Tobias Stoeckmann for the report, patch and testcase. (Closes: #857975)
  • Add support for testing files we should reject.

tests.reproducible-builds.org

Misc.

This week's edition was written by Ximin Luo, Holger Levsen and Chris Lamb & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

21 March, 2017 06:44PM

hackergotchi for Tanguy Ortolo

Tanguy Ortolo

Bad support of ZIP archives with extra fields

For sharing multiple files, it is often convenient to pack them into an archive, and the most widely supported format to do so is probably ZIP. Under *nix, you can archive a directory with Info-ZIP:

% zip -r something.zip something/

(When you have several files, it is recommended to archive them in a directory, to avoid cluttering the directory where people will extract them.)

Unsupported ZIP archive

Unfortunately, while we would expect ZIP files to be widely supported, I found out that this is not always the case, and I had many recipients fail to open them under operating systems such as iOS.

Avoid extra fields

That issue seems to be linked to the use of extra file attributes, which are enabled by default in order to store Unix file metadata. The field holding such extra attributes was designed from the beginning so that each implementation can take into account the attributes it supports and ignore any others, but some buggy ZIP implementations appear not to function at all with them.

Therefore, unless you actually need to preserve Unix file metadata, you should avoid using extra fields. With Info-ZIP, you would have to add the option -X:

% zip -rX something.zip something/
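If you are scripting this, Python's zipfile module is another option: as far as I know it does not emit Unix extra-field blocks at all (ZIP64 records aside), so the result should be comparable to what zip -X produces. A minimal sketch, where the directory name is just the example used above:

import os
import zipfile

def zip_directory(directory, archive_name):
    with zipfile.ZipFile(archive_name, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                # ZipFile.write() keeps permissions in external_attr only;
                # it does not add the Unix extra fields Info-ZIP uses.
                archive.write(path, arcname=path)

zip_directory('something', 'something.zip')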

21 March, 2017 06:33PM by Tanguy

March 20, 2017

hackergotchi for Matthew Garrett

Matthew Garrett

Buying a Utah teapot

The Utah teapot was one of the early 3D reference objects. It's canonically a Melitta but hasn't been part of their range in a long time, so I'd been watching Ebay in the hope of one turning up. Until last week, when I discovered that a company called Friesland had apparently bought a chunk of Melitta's range some years ago and sell the original teapot[1]. I've just ordered one, and am utterly unreasonably excited about this.

Update: Friesland have apparently always produced the Utah teapot, but were part of the Melitta group for some time - they didn't buy the range from Melitta.

[1] They have them in 0.35, 0.85 and 1.4 litre sizes. I believe (based on the measurements here) that the 1.4 litre one matches the Utah teapot.


20 March, 2017 08:45PM

hackergotchi for Shirish Agarwal

Shirish Agarwal

Tale of two countries, India and Canada

Apologies – the first blog post got sent out by mistake.

Weather comparisons between the two countries

Last year, I had come to know that this year’s debconf is happening in Canada, a cold country. Hence, a few weeks or months back, I started trying to find information online, when I stumbled across a few discussion boards where people were discussing innerwear and outerwear, and I couldn’t understand what that was all about. Then I somehow stumbled across this video, which is of a game called The Long Dark, and after seeing just a couple of episodes it became pretty clear to me why the people there were obsessing over getting the right clothes and everything about it. A couple of Debconf people were talking about the weather in Montreal, and surprise, surprise, it was snowing there; in fact, it was supposed to be near the storm of the century. I was amazed to see that they have a website to track how much snow has been removed.

If we compare that to Pune, India, weather-wise we are polar opposites. There used to be a time, when I was very young, maybe 5 yrs. old, when once the weather went above 30 degrees celsius, rain would fall, but now it’s gonna touch 40 degrees soon. And April and May, the two hottest months, are yet to come.

China Gate

Before I venture further: I was gifted the book ‘China Gate‘, written by an author named William Arnold. When I read the cover and the back cover, it seemed the story was set between China and Taiwan; later, when I started reading it, I found it shares the history of Taiwan going back 200 or so years. This became relevant as next year’s Debconf, Debconf 2018, will be in Taiwan, yes, in Asia, very much near to India. I am ashamed to say that except for the Tiananmen Square Massacre and the Chinese High-Speed Rail there wasn’t much that I knew. According to the book, and I’m paraphrasing here, the gist I got was that for a long time the Americans promised Taiwan it would be an independent country forever, but due to budgetary and other political constraints, the United States took the same stand as China from 1979. Interestingly, it now seems Mr. Trump wants to again recognize Taiwan as a separate entity from China itself, but as is usual with Mr. Trump, you can’t be sure of why he does what he does. Whether it is just a manoeuvre designed to outsmart the Chinese and start a trade war, or something else, only time will tell.

One thing which hasn’t been shared in the book, but which I came to know via the web, is that Taiwan calls itself the ‘Republic of China’. If Taiwan wants to be independent, then why the name ‘Republic of China’? Doesn’t that strengthen China’s claim that Taiwan is an integral part of China? I don’t understand it.

The book does seduce you into thinking that the events are happening in real-time, as in happening now.

That’s enough OT for now.


Population Density

Both in the game and in whatever I could find on the web, Canada seems to be on the lower side as far as population is concerned. IIRC, a few years back, Canadians invited Indian farmers and gave them large land-holdings for over 100 years for some small pittance. While the link I have shared is from 2006, I read about it online and in newspapers even as late as 2013/2014. The point being, there seems to be a lot of open space in Canada, whereas in India we literally fight for every inch, due to overpopulation. This reminded me of ‘Mark of Gideon‘. When I was young, I didn’t understand the political meaning of it, and I still struggle to understand whom the show was talking about. Was it India, Africa or some other continent they were talking about?

This also becomes obvious when you look at the surface area of the two countries. When I started to learn about Canada, I had no idea, not even a clue, that Canada is three times the size of India. And this is while knowing that India is a large country, but seeing that Canada is three times larger just boggled my mind. As a typical urbanite, I would probably go mad in a rural area in Canada. Montreal, however, seems to be something like Gwalior or Bangalore before IT stormed in: a place where people can work, play and have quite a few gardens as well.

Rail

This is one thing that is similar in both the great countries. India has Indian Railways, while the Canadians have their own mountain railway called Via Rail. India chugs along on its 68k-kilometre network; Canada is in fourth position with a 52k network. With three times the land area, it should have been somewhere around where Russia is, or even better. It would be interesting if Canadians commented on their railway network and why its reach is so limited.

As far as food is concerned, somebody shared this

Also, I have no idea if Canadian trains are as entertaining as Indian ones, in terms of the diverse groups of people as well as the variety of food to eat, as also shared a bit in the video. I am not aware whether Via Rail is the only network operator or whether there are other network operators, unlike Indian Railways, which has a monopoly on most of the operations.

Countries which have first past the post system - Wikipedia

Business houses, Political Families

This is again something that is similar in both countries: it seems (from afar) that only a few business houses and, more importantly, political families have governed for years. From what little I could understand, both India and Canada have a first-past-the-post system, which, as its critics point out, is unfair to new and small parties. It would be interesting to see if Canada does a re-think. For India, it would need a massive public education and outreach policy and implementation. We just had elections in 5 states of India, the biggest being U.P. (with respect to area size and population density), and for the last several years the EVMs (Electronic Voting Machines) have been used in a way that tries to make sure nobody can know which party got the most votes in which area. This is to make sure the winning party is not able to take revenge on people or areas which did not vote for them. Instead, you have general region-level counting of votes, with probably even the Election Commission not knowing which EVM went to which area and what the results there were, in a sort of double-blind methodology.

As far as business houses are concerned, I am guessing it’s the same the world over: only certain people hold the wealth, while the majority of us are in hard-working, non-wealthy status.


Northern Lights, Aurora Borealis

Apart from all the social activities that Montreal is famous for, somebody shared with me that it is possible to see the Northern Lights, the Aurora Borealis, in Canada. I don’t know how true that is; it probably isn’t possible to see them in Montreal itself due to light pollution, but maybe around 40-50 km from the city? Can people see them from Canada? If yes, how far would you have to go? Are there any companies or people who take people to see the Northern Lights?

I still have to apply for a bursary, and if that gets approved, then try getting the visa; but if all that goes through, then apart from debconf and the social activities happening in and around Montreal (museums, music etc.), this would be something I would like to experience if it’s possible. I certainly would have to be prepared for the cold that would involve, but, no offence to debconf or anybody else, it probably would be the highlight of the entire trip if it’s possible. This should be called/labelled the greatest show on earth TM.


Filed under: Miscellenous Tagged: # Population Density, #Area size, #Aurora Borealis, #Canada, #Trains, DebConf, India, politics

20 March, 2017 07:38PM by shirishag75


Bits from Debian

DebConf17 welcomes its first eighteen sponsors!

DebConf17 logo

DebConf17 will take place in Montreal, Canada in August 2017. We are working hard to provide fuel for hearts and minds, to make this conference once again fertile soil for the Debian Project to flourish. Please join us and support this landmark in the Free Software calendar.

Eighteen companies have already committed to sponsor DebConf17! With a warm welcome, we'd like to introduce them to you.

Our first Platinum sponsor is Savoir-faire Linux, a Montreal-based Free/Open-Source Software company which offers Linux and Free Software integration solutions and actively contributes to many free software projects. "We believe that it's an essential piece [Debian], in a social and political way, to the freedom of users using modern technological systems", said Cyrille Béraud, president of Savoir-faire Linux.

Our first Gold sponsor is Valve, a company developing games, a social entertainment platform, and game engine technologies. And our second Gold sponsor is Collabora, which offers a comprehensive range of services to help its clients navigate the ever-evolving world of Open Source.

As Silver sponsors we have credativ (a service-oriented company focusing on open-source software and also a Debian development partner), Mojatatu Networks (a Canadian company developing Software Defined Networking (SDN) solutions), the Bern University of Applied Sciences (with over 6,600 students enrolled, located in the Swiss capital), Microsoft (an American multinational technology company), Evolix (an IT managed services and support company located in Montreal), Ubuntu (the OS supported by Canonical) and Roche (a major international pharmaceutical provider and research company dedicated to personalized healthcare).

ISG.EE, IBM, Bluemosh, Univention and Skroutz are our Bronze sponsors so far.

And finally, The Linux Foundation, Réseau Koumbit and adte.ca are our supporter sponsors.

Become a sponsor too!

Would you like to become a sponsor? Do you know of or work in a company or organization that may consider sponsorship?

Please have a look at our sponsorship brochure (or a summarized flyer), in which we outline all the details and describe the sponsor benefits.

For further details, feel free to contact us through sponsors@debconf.org, and visit the DebConf17 website at https://debconf17.debconf.org.

20 March, 2017 02:15PM by Laura Arjona Reina and Tássia Camões Araújo

March 19, 2017

Petter Reinholdtsen

Free software archive system Nikita now able to store documents

The Nikita Noark 5 core project is implementing the Norwegian standard for keeping an electronic archive of government documents. The Noark 5 standard documents the requirements for data systems used by the archives in the Norwegian government, and the Noark 5 web interface specification documents a REST web service for storing, searching and retrieving documents and metadata in such an archive. I've been involved in the project since a few weeks before Christmas, when the Norwegian Unix User Group announced it supported the project. I believe this is an important project, and hope it can make it possible for the government archives in the future to use free software to keep the archives we citizens depend on. But as I do not hold such an archive myself, personally my first use case is to store and analyse public mail journal metadata published by the government. I find it useful to have a clear use case in mind when developing, to make sure the system scratches one of my itches.

If you would like to help make sure there is a free software alternative for the archives, please join our IRC channel (#nikita on irc.freenode.net) and the project mailing list.

When I got involved, the web service could store metadata about documents. But a few weeks ago, a new milestone was reached when it became possible to store full text documents too. Yesterday, I completed an implementation of a command line tool archive-pdf to upload a PDF file to the archive using this API. The tool is very simple at the moment: it finds existing fonds, series and files, asking the user to select which one to use if more than one exists. Once a file is identified, the PDF is associated with the file and uploaded, using the title extracted from the PDF itself. The process is fairly similar to visiting the archive, opening a cabinet, locating a file and storing a piece of paper in the archive. Here is a test run directly after populating the database with test data using our API tester:

~/src//noark5-tester$ ./archive-pdf mangelmelding/mangler.pdf
using arkiv: Title of the test fonds created 2017-03-18T23:49:32.103446
using arkivdel: Title of the test series created 2017-03-18T23:49:32.103446

 0 - Title of the test case file created 2017-03-18T23:49:32.103446
 1 - Title of the test file created 2017-03-18T23:49:32.103446
Select which mappe you want (or search term): 0
Uploading mangelmelding/mangler.pdf
  PDF title: Mangler i spesifikasjonsdokumentet for NOARK 5 Tjenestegrensesnitt
  File 2017/1: Title of the test case file created 2017-03-18T23:49:32.103446
~/src//noark5-tester$

You can see here how the fonds (arkiv) and series (arkivdel) only had one option, while the user needs to choose which file (mappe) to use among the two created by the API tester. The archive-pdf tool can be found in the git repository for the API tester.
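
For the curious, here is a rough sketch in Python of what such an upload boils down to when talking to the REST service directly. The base URL, link relation names and JSON fields below are placeholders picked for illustration, not the exact Noark 5 interface, so read it as an outline of the flow rather than working client code:

import requests

BASE = "http://localhost:8092/noark5v4/"  # hypothetical service root

def first(url):
    # Fetch a list resource and blindly take its first entry.
    return requests.get(url).json()["results"][0]

# Walk from the fonds (arkiv) via the series (arkivdel) down to a file (mappe),
# following the links exposed by the service (relation names are placeholders).
fonds = first(BASE + "arkiv/")
series = first(fonds["_links"]["arkivdel"]["href"])
mappe = first(series["_links"]["mappe"]["href"])

# Associate the PDF with the chosen file.
with open("mangelmelding/mangler.pdf", "rb") as pdf:
    requests.post(mappe["_links"]["ny-dokumentobjekt"]["href"],
                  data=pdf,
                  headers={"Content-Type": "application/pdf"})

The real archive-pdf tool does the same walk interactively, which is why it can ask the user which mappe to use when there is more than one.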

In the project, I have been mostly working on the API tester so far, while getting to know the code base. The API tester currently uses the HATEOAS links to traverse the entire exposed service API and verify that the exposed operations and objects match the specification, as well as trying to create objects holding metadata and uploading a simple XML file to store. The tester has proved very useful for finding flaws in our implementation, as well as flaws in the reference site and the specification.

The test document I uploaded is a summary of all the specification defects we have collected so far while implementing the web service. There are several unclear and conflicting parts of the specification, and we have started writing down the questions we get from implementing it. We use a format inspired by how The Austin Group collect defect reports for the POSIX standard with their instructions for the MANTIS defect tracker system, in the absence of an official way to structure defect reports for Noark 5 (our first submitted defect report was a request for a procedure for submitting defect reports :).

The Nikita project is implemented using Java and Spring, and is fairly easy to get up and running using Docker containers for those that want to test the current code base. The API tester is implemented in Python.

19 March, 2017 07:00AM

hackergotchi for Clint Adams

Clint Adams

Measure once, devein twice

Ophira lived in a wee house in University Square, Tampa. It had one floor, three bedrooms, two baths, a handful of family members, a couple pets, some plants, and an occasional staring contest.

Mauricio lived in Lowry Park North, but Ophira wasn’t allowed to go there because Mauricio was afraid that someone would tell his girlfriend. Ophira didn’t like Mauricio’s girlfriend and Mauricio’s girlfriend did not like Ophira.

Mauricio did not bring his girlfriend along when he and Ophira went to St. Pete Beach. They frolicked in the ocean water, and attempted to have sex. Mauricio and Ophira were big fans of science, so somewhat quickly they concluded that it is impossible to have sex underwater, and absconded to Ophira’s car to have sex therein.

“I hate Mauricio’s girlfriend,” Ophira told Amit on the telephone. “She’s not even pretty.”

“Hey, listen,” said Amit. “I’m going to a wedding on Captiva.”

“Oh, my family used to go to Captiva every year. There’s bioluminescent algae and little crabs and stuff.”

“Yeah? Do you want to come along? You could pick me up at the airport.”

“Why would I want to go to a wedding?”

“Well, it’s on the beach and they’re going to have a bouncy castle.”

“A bouncy castle‽ Are you serious?”

“Yes.”

“Well, okay.”

Amit prepared to go to the wedding and Ophira became terse then unresponsive. After he landed at RSW, he called Ophira, but instead of answering the phone she startled and fell out of her chair. Amit arranged for other transportation toward the Sanibel Causeway. Ophira bit her nails for a few hours, then went to her car and drove to Cape Coral.

Ophira cruised around Cape Coral for a while, until she spotted a teenager cleaning a minivan. She parked her car and approached him.

“Whatcha doing?” asked Ophira, pretending to chew on imaginary gum.

The youth slid the minivan door open. “I’m cleaning,” he said hesitantly.

“Didn’t your parents teach you not to talk to strangers? I could do all kinds of horrible things to you.”

They conversed for a bit. She recounted a story of her personal hero, a twelve-year-old girl who seduced and manipulated older men into ruin. She rehashed the mysteries of Mauricio’s girlfriend. She waxed poetic on her love of bouncy castles. The youth listened, hypnotized.

“What’s your name, kid?” Ophira yawned.

“Arjun,” he replied.

“How old are you?”

Arjun thought about it. “15,” he said.

“Hmm,” Ophira stroked her chin. “Can you sneak me into your room so that your parents never find out about it?”

Arjun’s eyes went wide.

MEANWHILE, on Captiva Island, Amit had learned that even though the Tenderly had multiple indoor jacuzzis, General Fitzpatrick and Mrs. Fitzpatrick had decided it prudent to have sex in the hot tub on the deck; that the execution of this plan had somehow necessitated a lengthy cleaning process before the hot tub could be used again; that that’s why workmen were cleaning the hot tub; and that the Fitzpatrick children had gotten General Fitzpatrick and Mrs. Fitzpatrick to agree to not do that again, with an added suggestion that they not be seen doing anything else naked in public.

A girl walked up to Amit. “Hey, I heard you lost your plus-one. Are you here alone? What a loser!” she giggled nervously, then stared.

“Leave me alone, Darlene,” sighed Amit.

Darlene’s face reddened as she spun on her heels and stormed over to Lisette. “Oh my god, did you see that? I practically threw myself at him and he was abusive toward me. He probably has all the classic signs of being an abuser. Did you hear about that girl he dated in Ohio? I bet I know why that ended.”

“Oh really?” said Lisette distractedly, looking Amit up and down. “So he’s single now?”

Darlene glared at Lisette as Amit wandered back outside to stare at the hot tub.

“Hey kid,” said Ophira, “bring me some snacks.”

“I don’t bring food into my room,” said Arjun. “It attracts pests.”

“Is that what your parents told you?” scoffed Ophira. “Don’t be such a wuss.”

Three minutes later, Ophira was finishing a bag of paprika puffs. “These are great, Arjun! Where do you get these?”

“My cousin sends them from Europe,” he explained.

“Now get me a diet soda.”

Amit strolled along the beach, then yelped. “What’s biting my legs?” he cried out.

“Those are sand fleas,” said Nessarose.

“What are sand fleas?” asked Amit incredulously.

Nessarose rolled her eyes. “Stop being a baby and have a drink.”

After the sun went down, Amit began to notice the crabs, and this made him drink more.

When everyone was soused, General Fitzpatrick announced that they were going for a swim in the Gulf, in direct contravention of safety guidelines. Most of the guests were wise enough to refuse, but an eightsome swam out, occasionally stopping to slap the algae, but continuing until they reached the sandbar that General Fitzpatrick correctly claimed was there.

Then screams echoed through the night as all the jellyfish attacked everyone invading their sandbar.

The crestfallen swimming party eventually made it back to shore.

“Pee on the jellyfish sting,” commanded Nessarose. “It’s the best cure.”

“No!” shouted General Fitzpatrick’s daughter. “Urine makes it worse.”

Things quickly escalated from Nessarose and General Fitzpatrick’s daughter screaming at each other to the beach dividing into three factions: those siding with Nessarose, those siding with General Fitzpatrick’s daughter, and those who had no idea what was going on. General Fitzpatrick had no interest in any of this, and went straight to bed.

“It’s getting late, kid,” said Ophira. “I’m taking your bed.”

“What?” squeaked Arjun.

“Look,” said Ophira, “your bed is small and there isn’t room for both of us. You may sleep on the floor if you’re quiet and don’t bother me.”

“What?” squeaked Arjun.

“Are you deaf, kid?” Ophira grunted and then went to bed.

Arjun blinked in confusion, then tried to fall asleep on the floor, without much success.

Ophira got up in the morning and said, “Before I go, I want to teach you a valuable lesson.”

“What?” groaned Arjun, getting to his feet.

“You should be careful talking to strangers. Now, I told you that I could do horrible things to you, so this is not my fault; it’s yours,” she announced, then sucker-punched him in the gut.

Ophira climbed out the window as Arjun doubled over.

As the ceremony began, only a small minority of the wedding party was visibly suffering from jellyfish stings, which may or may not have helped with ignoring the sand fleas.

The ceremony ended shortly thereafter, and now that marriage had been accomplished, everyone turned their attention to food and drink and swimming less irresponsibly than the night before. Guests that needed to return home sooner departed in waves and Amit started to appreciate the more peaceful environment.

He heard the deck door slide open behind him and turned his attention away from the hot tub.

“Hey, mofo,” Ophira shouted as she strode stylishly out onto the deck. “Where’s this bouncy castle?”

Amit blinked in surprise. “That was yesterday. You missed it.”

“Oh,” she frowned. “So I met this South Slav guy with a really sexy forehead, and I need some advice. I don’t know if I should call him or wait.”

Amit pointed to the hot tub and told her the story of General Fitzpatrick and Mrs. Fitzpatrick and the hot tub.

“What?” said Ophira. “How could they have sex underwater?”

“What do you mean?” asked Amit.

“Well, it’s impossible,” she replied.

Posted on 2017-03-19
Tags: mintings

19 March, 2017 04:38AM

March 18, 2017

Vincent Sanders

A rose by any other name would smell as sweet

Often I end up dealing with code that works but might not be of the highest quality. While quality is subjective, I like to use the idea of "code smell" to convey what I mean: a list of indicators that, in total, help to identify code that might benefit from some improvement.

Such smells may include:
  • Complex code lacking comments on intended operation
  • Code lacking API documentation comments especially for interfaces used outside the local module
  • Not following style guide
  • Inconsistent style
  • Inconsistent indentation
  • Poorly structured code
  • Overly long functions
  • Excessive use of pre-processor
  • Many nested loops and control flow clauses
  • Excessive numbers of parameters
I am most certainly not alone in using this approach and Fowler et al have covered this subject in the literature much better than I can here. One point I will raise though is some programmers dismiss code that exhibits these traits as "legacy" and immediately suggest a fresh implementation. There are varying opinions on when a rewrite is the appropriate solution from never to always but in my experience making the old working code smell nice is almost always less effort and risk than a re-write.

Tests

When I come across smelly code, and I decide it is worthwhile improving it, I often discover the biggest smell is lack of test coverage. Now do remember this is just one code smell and on its own might not be indicative; my experience is that smelly code seldom has effective test coverage while fresh code often does.

Test coverage is generally understood to be the percentage of source code lines and decision paths used when instrumented code is exercised by a set of tests. Like many metrics developer tools produce, "coverage percentage" is often misused by managers as a proxy for code quality. Both Fowler and Marick have written about this but sufficient to say that for a developer test coverage is a useful tool but should not be misapplied.

Although refactoring without tests is possible, the chances of unintended consequences are proportionally higher. I often approach such a refactor by enumerating all the callers and constructing a description of the used interface beforehand, then checking that that interface is not broken by the refactor, at which point it is probably worth writing a unit test to automate the checks.

Because of this I have changed my approach to such refactoring to start by ensuring there is at least basic API code coverage. This may not yield the fashionable 85% coverage target but is useful and may be extended later if desired.

It is widely known and equally widely ignored that for maximum effectiveness unit tests must be run frequently and developers take action to rectify failures promptly. A test that is not being run or acted upon is a waste of resources both to implement and maintain which might be better spent elsewhere.

For projects I contribute to frequently I try to ensure that the CI system is running the coverage target, and hence the unit tests, which automatically ensures any test breaking changes will be highlighted promptly. I believe the slight extra overhead of executing the instrumented tests is repaid by having the coverage metrics available to the developers to aid in spotting areas with inadequate tests.

Example

A short example will help illustrate my point. When a web browser receives an object over HTTP the server can supply a MIME type in a content-type header that helps the browser interpret the resource. However this meta-data is often problematic (sorry that should read "a misleading lie") so the actual content must be examined to get a better answer for the user. This is known as mime sniffing and of course there is a living specification.

The source code that provides this API (linked rather than included for brevity) has a few smells:
  • Very few comments of any type
  • The API are not all well documented in its header
  • A lot of global context
  • Local static strings which should be in the global string table
  • Pre-processor use
  • Several long functions
  • Exposed API has many parameters
  • Exposed API uses complex objects
  • The git log shows the code has not been significantly updated since its implementation in 2011 but the spec has.
  • No test coverage
While some of these are obvious, the non-use of the global string table and the API complexity needed detailed knowledge of the codebase, just to highlight how subjective the sniff test can be. There is also one huge air freshener in all of this which definitely comes from experience, and that is the module's author. Their name at the top of this would ordinarily be cause for me to move on, but I needed an example!

The first thing to check is the API use:

$ git grep -i -e mimesniff_compute_effective_type --or -e mimesniff_init --or -e mimesniff_fini
content/hlcache.c: error = mimesniff_compute_effective_type(handle, NULL, 0,
content/hlcache.c: error = mimesniff_compute_effective_type(handle,
content/hlcache.c: error = mimesniff_compute_effective_type(handle,
content/mimesniff.c:nserror mimesniff_init(void)
content/mimesniff.c:void mimesniff_fini(void)
content/mimesniff.c:nserror mimesniff_compute_effective_type(llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_compute_effective_type(struct llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_init(void);
content/mimesniff.h:void mimesniff_fini(void);
desktop/netsurf.c: ret = mimesniff_init();
desktop/netsurf.c: mimesniff_fini();

This immediately shows me that this API is used in only a very small area; this is often not the case, but the general approach still applies.

After a little investigation the usage is effectively that the mimesniff_init API must be called before the mimesniff_compute_effective_type API and the mimesniff_fini releases the initialised resources.

A simple test case was added to cover the API; this exercised the behaviour both when the init was called before the computation and when it was not, along with some simple tests for a limited number of well-behaved inputs.

By changing to using the global string table the initialisation and finalisation API can be removed altogether along with a large amount of global context and pre-processor macros. This single change removes a lot of smell from the module and raises test coverage both because the global string table already has good coverage and because there are now many fewer lines and conditionals to check in the mimesniff module.

I stopped the refactor at this point but were this more than an example I probably would have:
  • made the compute_effective_type interface simpler with fewer, simpler parameters
  • ensured a solid set of test inputs
  • examined using a fuzzer to get a better test corpus.
  • added documentation comments
  • updated the implementation to 2017 specification.

Conclusion

The approach examined here reduces the smell of code in an incremental, testable way to improve the codebase going forward. This is mainly necessary on larger, complex codebases where technical debt and bit-rot are real issues that can quickly overwhelm a codebase if not kept in check.

This technique is subjective but helps a programmer to quantify and examine a piece of code in a structured fashion. However it is only a tool and should not be over-applied nor used as a metric to proxy for code quality.

18 March, 2017 01:01PM by Vincent Sanders (noreply@blogger.com)

March 17, 2017

hackergotchi for Shirish Agarwal

Shirish Agarwal

Science Day at GMRT, Khodad 2017

The whole team posing at the end of day 2

The above picture is a blend of the two communities, the FOSS community and Mozilla India. And unless you were there, you wouldn’t know who is from which community, which is what FOSS is all about. But as always I’m getting a bit ahead of myself.

Akshat, who works at NCRA as a programmer (the standing guy on the left), shared with me in January this year that this year too we should have two stalls, the FOSS community and Mozilla India stalls, next to each other. While we had the banners, we were missing stickers and flyers. Funds were and are always an issue, and this year too it would have been emptier if we hadn’t had some money saved from the MiniDebConf 2016 that we had in Mumbai. Our major expenses included printing stickers, stationery and flyers, which came to around INR 5000/-, and a couple of LCD TV monitors, which came to around INR 2k/- as rent. All the labour was voluntary in nature, but both Akshat and I easily spent up to 100 hours each before the event. Next year, we want to raise around INR 10-15k so we can buy 1 or 2 LCD monitors and don’t have to think about funds for the next couple of years. How we will do that I have no idea atm.

Printing leaflets

Akshat and I did all the printing and stationery runs, and hence I had not been using my lappy for about 3-4 days.

Come the evening before the event and the laptop would not start. Coincidentally or not, a few months ago, and even at last year’s DebConf, people had commented on IBM/Lenovo’s obsession with proprietary power cords and adaptors. I hadn’t given it much thought, but when I got no power even after putting it on AC power for 3-4 hours, I looked it up on the web and saw that the power cords and power adaptors are all different, even within the T440 line and even among existing models. In fact I couldn’t find mine, hence sharing it via the pictures below.

thinkpad power cord male

thinkpad power adaptor female

I knew/suspected that ThinkPads would be rare where I was going, and it would be rarer still to find the exact power cord. I was also unsure whether it was the power cord at fault, or the adaptor, or whatever goes for an SMPS in a laptop, or the memory or the motherboard/CPU itself. I did look up the documentation at support.lenovo.com and was surprised at the extensive documentation that Lenovo has for remote troubleshooting.

I did the usual: take out the battery, put it back in, twiddle with the little hole in the bottom of the laptop, try to switch on without the battery on AC mains, try to switch on with battery power only, but nothing worked. A couple of hours had gone by, and with a resigned thought I went to bed, convincing myself that anyway it’s good I am not taking the lappy as it is extra-dusty there and who needs a dead laptop anyway.

Update – After the event was over, I did contact Lenovo support and within a week, with one visit from a service engineer, he was able to identify that the cable was at fault and not the other things which I was afraid of. Another week went by and Lenovo replaced the cable. Going by the service standards that I have seen from other companies, Lenovo deserves a gold star here for the prompt service they provided. I probably will end up subscribing to their extended 2-year warranty service when my existing 3-year warranty is about to be over.

The next day, I woke up early in the morning; two students from the COEP hostel were volunteering and we made our way to NCRA, Pune University Campus. Ironically, though we were under the impression that we would be the late arrivals, it turned out we were the early birds. 5-10 minutes passed by and soon enough we were joined by Aniket and we played catch-up for a while. We hadn’t met each other for a while so it was good to catch up. Then slowly other people started coming in and around 07:10-07:15 we started for GMRT, Khodad.

Now I had been curious, as I had been hearing for years that the Pune-Nashik NH-50 highway would be concreted and widened to a six-lane highway, but the experience was below par. I came back and realized the proposal has now been pushed back to 2020.

From the Mozilla team, only Aniket was with us; the rest of the group was coming straight from Nashik. Interestingly, all six people who came, came on bikes, which depending upon how you look at it was either brave or stupid. Travelling on bikes on Indian highways you either have to be brave or stupid or both; we have more than enough ‘accidents’ due to the quality of road construction, road design, lane-changing drivers and many other issues. This is probably not the place for it, hence I will use some other blog post to rant about that.

We reached around 10:00 hrs IST and hung around till lunch, as Akshat had all the marketing material, monitors etc. The only things we had were a couple of lappies and a couple of SBCs, an RPi 3 and a BBB.

Aarti Kashyap sharing something about SBC

Our find for the event was Aarti Kashyap, whom you can see above. She is a third-year student at COEP and one of the rare people who chose to interact with hardware rather than software. For the last several years, we had been trying, successfully and unsuccessfully, to get more Indian women and girls interested in technology. It is a vicious circle: until a girl/woman volunteers we are unable to share our knowledge to the extent we can, which leads them to not have much interest in FOSS or even technology in general.

While there are groups like Django Girls, PyLadies and Rails Girls, and even Outreachy, which try to motivate girls to get into computing, it’s a long road ahead.

We are short of both funds and ideas as to how to motivate more girls to get into computing and then to get into playing with hardware. I don’t know where to start and end for whoever wants to play with hardware; from SBCs and routers to blade servers, the sky is the limit. Again, this probably isn’t the place for it, hence we can probably chew on it more in some other blog post.

This year, we had a lowish turnout due to the fact that the first paper of the 12th board exams was on the day we opened. So instead of 20-25k, we probably had 5-7k fewer people pass through. There were two or three things that we were showing: we were showing Debian on one of the systems and the output from the SBCs on the other monitor, but the glare kept hitting the monitors.

The organizers had done exemplary work over last year. They had taped the carpets to the ground so there was hardly any dust moving around. However, I wished the organizers had taken the pains to have two cloth roofs over our head instead of just one; the other roof could be, say, 2 feet up. This would have done two things –

a. It probably would have cooled the place a bit more, and –

b. We could have got diffused sunlight, which would have lessened the glare and reflection the LCDs kept throwing back. At times we also got people to come over to our side, as can be seen in Aarti’s photo above.

If these improvements can be made for next year, everybody in our ‘pandal’ would benefit, not just us and Mozilla. This would benefit around 10-15 organizations which were within the same temporary structure.

Of course, it depends very much on the budget they are able to have and the people who are executing; we can just advise.

The other thing which had been missing last year and this year is writing about single-board computers in Marathi. If we are to promote them as something to replace a computer, or something for a younger brother or sister to learn computing on at a lower cost, we need leaflets written in their language to be more effective. And this needs to be in the language and mannerisms that people in that region understand. India, as people might have experienced, is a dialect-prone country, which means that every 2-5 km the way the language is spoken is different from anywhere else. The Marathi spoken by somebody who has lived in Ravivar Peth their whole life and by a person who has lived in, say, Kothrud is different. The same goes for any place, and this place, Khodad, Narayangaon, would have its own dialect, its own mini-codespeak.

Just to share, we did have one in English, but it would have been a vast improvement if we could do it in the local language. Maybe we can discuss this and ask for help from people.

Outside, Looking in

Mozillians helping FOSS community and vice-versa

What had been interesting about the whole journey were the new people who were bringing all their passion and creativity to the fore. From the Mozilla community, we had Akshay, who is supposed to be a wizard at graphics, animation and editing, anything to do with the visual medium. He shared some of the work he had done and also shared a bit about how Blender works with people who wanted to learn about that.

Mayur, whom you see in the picture pointing out something about FOSS, embodied the culture that we strove to have. I know and love and hate the browser, but haven’t been able to fathom the recklessness Mozilla has shown over the last few years, which has just been one misadventure after another.

For instance, mozstumbler was an effort which I thought would go places. From what little I understood, it served/serves as a user-friendly interface for a potential user while still sharing all the data with OSM. They (Mozilla) seem/seemed to have a fatalistic take on it, as they provided initial funding but then never fully committed to the project.

Later at night we had the whole ‘free software’ and ‘open source’ sharing, where I tried to emphasize that without free software, the term ‘open source’ would not have come into existence. We talked and talked and somewhere around 02:00 I slept; the next day was an extension of the first day itself, where we ribbed each other good-naturedly and still shared whatever we could with each other.

I do hope that we continue this tradition for a great many years to come and engage with more and more people every passing year.


Filed under: Miscellenous Tagged: #budget, #COEP, #volunteering, #debian, #Events, #Expenses, #mozstumbler, #printing, #SBC's, #Science Day 2017, #thinkpad cable issue, FOSS, mozilla

17 March, 2017 07:19PM by shirishag75

Antonio Terceiro

Patterns for Testing Debian Packages

At the end of 2016 I had the pleasure to attend the 11th Latin American Conference on Pattern Languages of Programs, a.k.a. SugarLoaf PLoP. PLoP is a series of conferences on Patterns (as in “Design Patterns”), a subject that I appreciate a lot. Each of the PLoP conferences but the original main “big” conference has a funny name. SugarLoaf PLoP is called that way because its very first edition was held in Rio de Janeiro, so the organizers named it after a very famous mountain in Rio. The name stuck even though a long time has passed since it was last held in Rio. 2016 was actually the first time SugarLoaf PLoP was held outside of Brazil, finally justifying the “Latin American” part of its name.

I was presenting a paper I wrote on patterns for testing Debian packages. The Debian project funded my travel expenses through the generous donations of its supporters. PLoPs are very fun conferences with a relaxed atmosphere, and it is amazing how many smart (and interesting!) people gather together for them.

My paper is titled “Patterns for Writing As-Installed Tests for Debian Packages”, and has the following abstract:

Large software ecosystems, such as GNU/Linux distributions, demand a large amount of effort to make sure all of their components work correctly individually, and also integrate correctly with each other to form a coherent system. Automated Quality Assurance techniques can prevent issues from reaching end users. This paper presents a pattern language originated in the Debian project for automated software testing in production-like environments. Such environments are closer in similarity to the environment where software will be actually deployed and used, as opposed to the development environment under which developers and regular Continuous Integration mechanisms usually test software products. The pattern language covers the handling of issues arising from the difference between development and production-like environments, as well as solutions for writing new, exclusive tests for as-installed functional tests. Even though the patterns are documented here in the context of the Debian project, they can also be generalized to other contexts.

In practical terms, the paper documents a set of patterns I have noticed in the last few years, while I have been pushing the Debian Continuous Integration project. It should be an interesting read for people interested in the testing of Debian packages in their installed form, as done with autopkgtest. It should also be useful for people from other distributions interested in the subject, as the issues are not really Debian-specific.
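
For readers who have not met autopkgtest before, the as-installed tests discussed in the paper are declared in a debian/tests/control file inside the source package and run against the installed binary packages. A minimal sketch (the package name and command are made up for illustration) looks like this:

Test-Command: python3 -c "import mypackage"
Depends: @, python3
Restrictions: allow-stderr

Here @ pulls in the binary packages built from the source, so the command exercises the package as users would actually install it, which is exactly the production-like setting the patterns are about.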

I have recently finished the final version of the paper, which should be published in the ACM Digital Library at any point now. You can download a copy of the paper in PDF. Source is also available, if you are into markdown, LaTeX, makefiles and this sort of thing.

If everything goes according to plan, I should be presenting a talk on this at the next Debconf in Montreal.

17 March, 2017 01:23AM

March 16, 2017

hackergotchi for Thorsten Glaser

Thorsten Glaser

Updates to the last two posts

Someone from the FSF’s licencing department posted an official-looking thing saying they don’t believe GitHub’s new ToS to be problematic with copyleft. Well, my lawyer (not my personal one, nor for The MirOS Project, but related to another association, informally) does agree with my reading of the new ToS, and I can point out at least one clause in the GPLv1 (I really don’t have time right now) which says the contrary (but does this mean the FSF generally waives the restrictions of the GPL for anything on GitHub?). I’ll eMail GitHub Legal directly and will try to continue getting this fixed (as soon as I have enough time for it) as I’ll otherwise be forced to force GitHub to remove stuff from me (but with someone else as original author) under GPL, such as… tinyirc and e3.

My dbconfig-common Debian packaging example got a rather hefty upgrade because dbconfig-common (unlike any other DB schema framework I know of) doesn’t apply the upgrades on a fresh install (and doesn’t automatically put the upgrades into a transaction either) but only upgrades between Debian package versions (which can be funny with backports, but AFAICT that part is handled correctly). I now append the upgrades to the initial-version-as-seen-in-the-source to generate the initial-version-as-shipped-in-the-binary-package (optionally, only if it’s named .in) removing all transaction stuff from the upgrade files and wrapping the whole shit in BEGIN; and COMMIT; after merging. (This should at least not break nōn-PostgreSQL databases and… well, database-like-ish things I cannot test for obvious (SQLite is illegal, at least in Germany, but potentially worldwide, and then PostgreSQL is the only remaining Open Source database left ;) reasons.)

Update: Yes, this does mean that maintainers of databases and webservers should send me patches to make this work with not-PostgreSQL (new install/name.in, upgrade files) and not-Apache-2.2/2.4 (new debian/*/*.conf snippets) to make this packaging example even more generally usable.
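
For illustration, a minimal Python sketch of the merge step described above; the file layout (install/pgsql.in plus upgrade/pgsql/*) and the naive sorting are assumptions for the example, not the actual packaging code:

import glob

# Assumed layout: install/pgsql.in holds the initial schema from the source,
# upgrade/pgsql/<version> the per-version upgrade scripts.
with open("install/pgsql.in") as f:
    parts = [f.read()]

# Note: a real implementation would sort by Debian version, not lexically.
for name in sorted(glob.glob("upgrade/pgsql/*")):
    with open(name) as f:
        # Strip per-file transaction statements; the merged result gets one of its own.
        kept = [line for line in f.read().splitlines()
                if line.strip().rstrip(";").upper() not in ("BEGIN", "COMMIT")]
    parts.append("\n".join(kept))

with open("install/pgsql", "w") as out:
    out.write("BEGIN;\n" + "\n".join(parts) + "\nCOMMIT;\n")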

Natureshadow already forked this and made a Python/Flask package from it, so I’ll prod him to provide a similarly versatile hello-python-world example package.

16 March, 2017 11:12PM by MirOS Developer tg (tg@mirbsd.org)

hackergotchi for Joey Hess

Joey Hess

end of an era

I'm at home downloading hundreds of megabytes of stuff. This is the first time I've been in the position of "at home" + "reasonably fast internet" since I moved here in 2012. It's weird!

Satellite internet dish with solar panels in foreground

While I was renting here, I didn't mind dialup much. In a way it helps to focus the mind and build interesting stuff. But since I bought the house, the prospect of only dialup at home ongoing became more painful.

While I hope to get on the fiber line that's only a few miles away eventually, I have not convinced that ISP to build out to me yet. Not enough neighbors. So, satellite internet for now.

9.1 dB SNR

speedtest results: 15 megabit down / 4.5 up with significant variation

Dish seems well aligned, speed varies a lot, but is easily hundreds of times faster than dialup. Latency is 2x dialup.

The equipment uses more power than my laptop, so with the current solar panels, I anticipate using it only 6-9 months of the year. So I may be back to dialup most days come winter, until I get around to adding more PV capacity.

It seems very cool that my house can capture sunlight and use it to beam signals 20 thousand miles into space. Who knows, perhaps there will even be running water one day.

Satellite dish

16 March, 2017 10:14PM

hackergotchi for Raphaël Hertzog

Raphaël Hertzog

Freexian’s report about Debian Long Term Support, February 2017

A Debian LTS logo

Like each month, here comes a report about the work of paid contributors to Debian LTS.

Individual reports

In February, about 154 work hours have been dispatched among 13 paid contributors. Their reports are available:

  • Antoine Beaupré did 3 hours (out of 13h allocated, thus keeping 10 extra hours for March).
  • Balint Reczey did 13 hours (out of 13 hours allocated + 1.25 hours remaining, thus keeping 1.25 hours for March).
  • Ben Hutchings did 19 hours (out of 13 hours allocated + 15.25 hours remaining, he gave back the remaining hours to the pool).
  • Chris Lamb did 13 hours.
  • Emilio Pozuelo Monfort did 12.5 hours (out of 13 hours allocated, thus keeping 0.5 hour for March).
  • Guido Günther did 8 hours.
  • Hugo Lefeuvre did nothing and gave back his 13 hours to the pool.
  • Jonas Meurer did 14.75 hours (out of 5 hours allocated + 9.75 hours remaining).
  • Markus Koschany did 13 hours.
  • Ola Lundqvist did 4 hours (out of 13h allocated, thus keeping 9 hours for March).
  • Raphaël Hertzog did 3.75 hours (out of 10 hours allocated, thus keeping 6.25 hours for March).
  • Roberto C. Sanchez did 5.5 hours (out of 13 hours allocated + 0.25 hours remaining, thus keeping 7.75 hours for March).
  • Thorsten Alteholz did 13 hours.

Evolution of the situation

The number of sponsored hours increased slightly thanks to Bearstech and LiHAS joining us.

The security tracker currently lists 45 packages with a known CVE and the dla-needed.txt file 39. The number of open issues continued its slight increase; this time it can be explained by the fact that many contributors did not spend all the hours allocated (for various reasons). There’s nothing worrisome at this point.

Thanks to our sponsors

New sponsors are in bold.


16 March, 2017 01:25PM by Raphaël Hertzog

Enrico Zini

Django signing signs, does not encrypt

As it says in the documentation, django.core.signing signs, and does not encrypt.

Even though signing.dumps creates obscure-looking tokens, they are not encrypted, and here's a proof:

>>> from django.core import signing
>>> a = signing.dumps({"action":"set-password", "username": "enrico", "password": "SECRET"})
>>> from django.utils.encoding import force_bytes
>>> print(signing.b64_decode(force_bytes(a.split(":",1)[0])))
b'{"action":"set-password","password":"SECRET","username":"enrico"}'

I'm writing it down so one day I won't be tempted to think otherwise.
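
To be fair to signing, the signature part does what it promises: signing.loads() verifies the token and rejects anything that was tampered with. A quick sketch along the same lines (exception output abbreviated):

>>> from django.core import signing
>>> token = signing.dumps({"action": "set-password", "username": "enrico"})
>>> signing.loads(token) == {"action": "set-password", "username": "enrico"}
True
>>> signing.loads(token + "x")  # any change to the token breaks the signature
Traceback (most recent call last):
  ...
django.core.signing.BadSignature: ...

So it is fine for detecting tampering, just not for hiding the payload; anything secret needs real encryption on top.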

16 March, 2017 11:01AM

hackergotchi for Wouter Verhelst

Wouter Verhelst

Codes of Conduct

These days, most large FLOSS communities have a "Code of Conduct"; a document that outlines the acceptable (and possibly not acceptable) behaviour that contributors to the community should or should not exhibit. By writing such a document, a community can arm itself more strongly in the fight against trolls, harassment, and other forms of antisocial behaviour that is rampant on the anonymous medium that the Internet still is.

Writing a good code of conduct is no easy matter, however. I should know -- I've been involved in such a process twice; once for Debian, and once for FOSDEM. While I was the primary author for the Debian code of conduct, the same is not true for the FOSDEM one; I was involved, and I did comment on a few early drafts, but the core of FOSDEM's current code was written by another author. I had wanted to write a draft myself, but then this one arrived and I didn't feel like I could improve it, so it remained.

While it's not easy to come up with a Code of Conduct, there (luckily) are others who walked this path before you. On the "geek feminism" wiki, there is an interesting overview of existing Open Source community and conference codes of conduct, and reading one or more of them can provide one with some inspiration as to things to put in one's own code of conduct. That wiki page also contains a paragraph "Effective codes of conduct", which says (amongst others) that a good code of conduct should include

Specific descriptions of common but unacceptable behaviour (sexist jokes, etc.)

The attentive reader will notice that such specific descriptions are noticeably absent from both the Debian and the FOSDEM codes of conduct. This is not because I hadn't seen the above recommendation (I had); it is because I disagree with it. I do not believe that adding a list of "don't"s to a code of conduct is a net positive to it.

Why, I hear you ask? Surely having a list of things that are not welcome behaviour is a good thing, which should be encouraged? Surely such a list clarifies the kind of things your community does not want to see? Having such a list will discourage that bad behaviour, right?

Well, no, I don't think so. And here's why.

Enumerating badness

A list of things not to do is like a virus scanner. For those not familiar with these: on some operating systems, there is a specific piece of software that everyone recommends you run, which checks if particular blobs of data appear in files on the disk. If they do, then these files are assumed to be bad, and are kicked out. If they do not, then these files are assumed to be not bad, and are left alone (for the most part).

This works if we know all the possible types of badness; but as soon as someone invents a new form of badness, suddenly your virus scanner is ineffective. Additionally, it also means you're bound to continually have to update your virus scanner (or, as the case may be, code of conduct) to a continually changing hostile world. For these (and other) reasons, enumerating badness is listed as number 2 in security expert Markus Ranum's "six dumbest ideas in computer security," which was written in 2005.

In short, a list of "things not to do" is bound to be incomplete; if the goal is to clarify the kind of behaviour that is not welcome in your community, it is usually much better to explain the behaviour that is wanted, so that people can infer (by their absence) the kind of behaviour that isn't welcome.

This neatly brings me to my next point...

Black vs White vs Gray.

The world isn't black-and-white. We could define a list of welcome behaviour -- let's call that the whitelist -- or a list of unwelcome behaviour -- the blacklist -- and assume that the work is done after doing so. However, that wouldn't be true. For every item on either the white or black list, there's going to be a number of things that fall somewhere in between. Let's call those things as being on the "gray" list. They're not the kind of outstanding behaviour that we would like to see -- they'd be on the white list if they were -- but they're not really obvious CoC violations, either. You'd prefer it if people don't do those things, but it'd be a stretch to say they're jerks if they do.

Let's clarify that with an example:

Is it a code of conduct violation if you post links to pornography websites on your community's main development mailinglist? What about jokes involving porn stars? Or jokes that denigrate women, or that explicitly involve some gender-specific part of the body? What about an earring joke? Or a remark about a user interacting with your software, where the women are depicted as not understanding things as well as men? Or a remark about users in general, that isn't written in a gender-neutral manner? What about a piece of self-deprecating humor? What about praising someone else for doing something outstanding?

I'm sure most people would agree that the first case in the above paragraph should be a code of conduct violation, whereas the last case should not be. Some of the items in the list in between are clearly on one or the other side of the argument, but for others the jury is out. Let's call those as being in the gray zone. (Note: no, I did not mean to imply that the list is ordered in any way ;-)

If you write a list of things not to do, then by implication (because you didn't mention them), the things in the gray area are okay. This is especially problematic when it comes to things that are borderline blacklisted behaviour (or that should be blacklisted but aren't, because your list is incomplete -- see above). In such a situation, you're dealing with people who are jerks but can argue about it because your definition of jerk didn't cover their behaviour. Because they're jerks, you can be sure they'll do everything in their power to waste your time about it, rather than improving their behaviour.

In contrast, if you write a list of things that you want people to do, then by implication (because you didn't mention it), the things in the gray area are not okay. If someone slips and does something in that gray area anyway, then that probably means they're doing something borderline not-whitelisted, which would be mildly annoying but doesn't make them jerks. If you point that out to them, they might go "oh, right, didn't think of it that way, sorry, will aspire to be better next time". Additionally, the actual jerks and trolls will have been given less tools to argue about borderline violations (because the border of your code of conduct is far, far away from jerky behaviour), so less time is wasted for those of your community who have to police it (yay!).

In theory, the result of a whitelist is a community of people who aspire to be nice people, rather than a community of people who simply aspire to be "not jerks". I know which kind of community I prefer.

Giving the wrong impression

During one of the BOFs that were held while I was drafting the Debian code of conduct, it was pointed out to me that a list of things not to do may give people the impression that all the things on that list do actually happen in the community the code applies to. If that is true, then a very long list may produce the impression that the given community is a community with a lot of problems.

Instead, a whitelist-based code of conduct will provide the impression that you're dealing with a healthy community. Whether that is the case obviously depends on more factors than just the code of conduct itself, but it will put people in the right mindset for this to become something of a self-fulfilling prophecy.

Conclusion

Given all of the above, I think a whitelist-based code of conduct is a better idea than a blacklist-based one. Additionally, in the few years since the Debian code of conduct was accepted, it is my impression that the general atmosphere in the Debian project has improved, which would seem to confirm that the method works (but YMMV, of course).

At any rate, I'm not saying that blacklist-based codes of conduct are useless. However, I do think that whitelist-based ones are better; and hopefully, you now agree, too ;-)

16 March, 2017 08:02AM

hackergotchi for Ben Hutchings

Ben Hutchings

Debian LTS work, February 2017

I was assigned 13 hours of work by Freexian's Debian LTS initiative and carried over 15.25 from January. I worked 19 hours and have returned the remaining 9.25 hours to the general pool.

I prepared a security update for the Linux kernel and issued DLA-833-1. However, I spent most of my time catching up with a backlog of fixes for the Linux 3.2 longterm stable branch. I issued two stable updates (3.2.85, 3.2.86).

16 March, 2017 04:44AM

March 15, 2017

hackergotchi for Michal Čihař

Michal Čihař

Life of free software project

During the last week I've noticed several interesting posts about the challenges of being a free software maintainer. After being active in open source for 16 years I share many of the feelings I've read about, and I can also share how I deal with these things.

First of all let me link some of the other posts on the topic:

I guess everybody involved in some popular free software project knows it - there is much more work to be done than the people behind the project can handle. It really doesn't matter whether those are bug reports, support requests, new features or technical debt; it's simply too much of it. If you are the only one behind the project it can feel even more pressing.

There are several approaches to deal with that, but you have to choose what you prefer and what is going to work for you and your project. I've used all of the approaches mentioned below on some of my projects, but I don't think there is a silver bullet.

Finding more people

Obviously if you can not cope with the work, find more people to do the work. Unfortunately it's not that easy. Sometimes people come by and contribute a few patches, but it's not that easy to turn them into regular contributors. You should encourage them to stay and to care about the part of the project they have touched.

You can try to attract completely new contributors through programs such as Google Summer of Code (GSoC) or Outreachy, but that has its own challenges as well.

With phpMyAdmin we participate regularly in GSoC (we've only missed last year as we were not chosen by Google that year) and it indeed helps to bring new people on board. Many of them even stay around the project (currently 3 of 5 phpMyAdmin team members are former GSoC students). But I think this approach really works only for bigger organizations.

You can also motivate people with money. It's a way which is not used much in free software projects, partly because of lack of funding (I'll get to that later) and partly because it doesn't necessarily bring long-term contributors, just cash hunters. I've been using Bountysource for some of my projects (Weblate and Gammu) and so far it mostly works the other way around - if somebody posts a bounty on an issue, it means it's quite important for them to get it fixed, so I use that as an indication for myself. For attracting new developers it never really worked well, even when I tried to post bounties on some easy-to-fix issues where newbies could learn our code base and get paid for it. These issues stayed open for months and in the end I fixed them myself because they annoyed me.

Don't care too much

I think this is the most important aspect - you simply can never fix all the problems. Let's face it and work accordingly. There can be various levels of not caring. I find it always better to try to encourage people to fix their problem, but you can't expect a big success rate in that, so you might find it not worth the time.

What I currently do:

  • I often ignore direct emails asking for fixing something. The project has a public issue tracker on purpose. Once you solve the issue there, others will have a chance to find it when they face a similar problem. Solving things privately in mails will probably make you look at similar problems again and again.
  • I try to batch process things. It is really easier to get focused when you work on one project and do not switch contexts. This means people will have to wait until you get to their request, but it also means that you will be able to deal with them much more effectively. This is why Free hosting requests for Hosted Weblate get processed once a month.
  • I don't care about the number of unread mails, notifications or whatever. Or actually I try to not get many of these at all. This is really related to the above: I might do some things once a month (or even less often) and that's still okay. Maybe you're just getting notifications for things you really don't need to be notified about? Do you really need a notification for new issues? Isn't it better just to look at the issue tracker once in a while than to constantly feel the pressure of unread notifications?
  • I don't have to fix every problem. When it seems like something that could just as well be fixed by the reporter, I just try to give them guidance on how to dig deeper into the issue. Obviously this can't work for all cases, but getting more people on board always helps.
  • I try to focus on things which can save time in the future. Many issues turn out to be just something unclear, and once you figure that out, spend a few more minutes improving your documentation to cover it. It's quite likely that this will save you time in the future.

If you still can't handle it, you should consider abandoning the project as well. Does it bring you anything other than the frustration of uncompleted work? I know it can be a hard decision, in the end it is your child, but sometimes it's the best thing you can do.

Get paid to do the work

Are you doing your fulltime job and then working on free software at nights or on weekends? It can probably work for some time, but unless you find some way to make the two match, you will lack free time to relax and spend with friends or family. There are several options to make them work together.

You can find a job where doing free software will be a natural part of it. This worked pretty well for me at SUSE, and I'm sure there are more companies where it will work. It can happen that the job will not cover all your free software activities, but this still helps.

You can also make your project become your employer. It can sometimes be challenging to have volunteers and paid contractors work on one project, but I think this can be handled. Such a setup currently works quite well for phpMyAdmin (we will announce a second contractor soon) and works quite well for me with Weblate as well.

Funding free software projects

Once your project is well funded, you can fix many problems with money. You can pay yourself to do the work, hire additional developers, get better infrastructure or travel to conferences to spread the word about it. But the question is how to get to the point of being well funded.

There are several crowdfunding platforms which can help you with that (Liberapay, Bountysource Salt, Gratipay or Snowdrift to mention some). You can also administer the funding yourself or use some legal entity such as the Software Freedom Conservancy, which handles this for phpMyAdmin.

But the most important thing is to persuade people and companies to give back. You know there are a lot of companies relying on your project, but how do you make them fund it? I really don't know; I still struggle with this, as I don't want to be too pushy in asking for money, but I'd really like to see them give back.

Something that does kind of work is giving your sponsors logo/link placement on your website. If your website is well ranked, you can expect to get quite a lot of SEO sponsors, and the question is where to draw the line on what you still find acceptable. Obviously the companies most willing to pay will have nothing to do with what you do; they just want to get the link. The industries you can expect are porn, gambling, binary options and various MFA sites. You will get some legitimate sponsors related to your project as well. We felt we had gone too far with phpMyAdmin last year and we've tightened the rules recently, but the outcome is not yet visible on our website (as we've just limited new sponsors, and existing contracts will be honored).

Another option is to monetize your project more directly. You can offer consulting services or provide it as a service (this is what I currently do with Weblate). Whether you can build a customer base on that really depends on the product, and certainly this is not something that would work well for all projects.

Thanks for reading this, and I hope it's not too chaotic; I've moved parts back and forth while writing and I'm afraid it got too long in the end.

Filed under: Debian English Gammu phpMyAdmin SUSE Weblate | 0 comments

15 March, 2017 11:00AM

Bits from Debian

Build Android apps with Debian: apt install android-sdk

In Debian stretch, the upcoming new release, it is now possible to build Android apps using only packages from Debian. This will provide all of the tools needed to build an Android app targeting the "platform" android-23 using the SDK build-tools 24.0.0. Those two are the only versions of "platform" and "build-tools" currently in Debian, but it is possible to use the Google binaries by installing them into /usr/lib/android-sdk.

This doesn't yet cover all of the libraries that are used in apps, like the Android Support libraries, or the myriad other libraries that are usually fetched from jCenter or Maven Central. One big question for us is whether and how libraries should be included in Debian. All the Java libraries in Debian can be used in an Android app, but including something like Android Support in Debian would be strange since they are only useful in an Android app, never for a Debian app.

Building apps with these packages

Here are the steps for building Android apps using Debian's Android SDK on Stretch.

  1. sudo apt install android-sdk android-sdk-platform-23
  2. export ANDROID_HOME=/usr/lib/android-sdk
  3. In build.gradle, set compileSdkVersion to 23 and buildToolsVersion to 24.0.0
  4. run gradle build

The Gradle Android Plugin is also packaged. Using the Debian package instead of the one from online Maven repositories requires a little configuration before running gradle. In the buildscript block:

  • add maven { url 'file:///usr/share/maven-repo' } to repositories
  • use classpath 'com.android.tools.build:gradle:debian' in the dependencies block to load the plugin

Currently only the API Level 23 target platform is packaged, so only apps targeting android-23 can be built with Debian packages alone. There are plans to add more API platform packages via backports. Only build-tools 24.0.0 is available, so build scripts need to be adjusted accordingly in order to use the SDK. Beware that the Lint in this version of the Gradle Android Plugin is still problematic, so running the :lint tasks might not work. They can be turned off with lintOptions.abortOnError in build.gradle. Google binaries can be combined with the Debian packages, for example to use a different version of the platform or build-tools.
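
Putting the pieces above together, a minimal build.gradle for this setup could look roughly like the following sketch (the plugin id and the block layout are just the usual Gradle Android Plugin conventions and may need adjusting for your project):

buildscript {
    repositories {
        // Debian's Maven repository, as described above
        maven { url 'file:///usr/share/maven-repo' }
    }
    dependencies {
        // the Gradle Android Plugin packaged in Debian
        classpath 'com.android.tools.build:gradle:debian'
    }
}

apply plugin: 'com.android.application'

android {
    compileSdkVersion 23          // android-23 is the only platform in Debian
    buildToolsVersion '24.0.0'    // the only build-tools version in Debian

    lintOptions {
        abortOnError false        // work around the problematic :lint tasks
    }
}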

Why include the Android SDK in Debian?

While Android developers could develop and ship apps right now using these Debian packages, this is not very flexible since only build-tools 24.0.0 and the android-23 platform are available. Currently, the Debian Android Tools Team is not aiming to cover the most common use cases. Those are pretty well covered by Google's binaries (except for the proprietary license on the Google binaries), and are probably the most work for the Android Tools Team to cover. The current focus is on use cases that are poorly covered by the Google binaries, for example, where only specific parts of the whole SDK are used. Here are some examples:

  • tools for security researchers, forensics, reverse engineering, etc. which can then be included in live CDs and distros like Kali Linux
  • a hardened APK signing server using apksigner that uses a standard, audited, public configuration of all reproducibly built packages
  • Replicant is a 100% free software Android distribution, so of course they want to have a 100% free software SDK
  • high security apps need a build environment that matches their level of security, the Debian Android Tools packages are reproducibly built only from publicly available sources
  • support architectures besides i386 and amd64; for example, the Linaro LAVA setup for testing ARM devices of all kinds uses the adb packages on ARM servers to keep their whole testing setup on the ARM architecture
  • dead simple install with strong trust path with mirrors all over the world

In the long run, the Android Tools Team aims to cover more use cases well, and also building the Android NDK. This all will happen more quickly if there are more contributors on the Android Tools team! Android is the most popular mobile OS, and can be 100% free software like Debian. Debian and its derivatives are one of the most popular platforms for Android development. This is an important combination that should grow only more integrated.

Last but not least, the Android Tools Team wants feedback on how this should all work, for example, ideas for how to nicely integrate Debian's Java libraries into the Android gradle workflow. And ideally, the Android Support libraries would also be reproducibly built and packaged somewhere that enforces only free software. Come find us on IRC and/or email! https://wiki.debian.org/AndroidTools#Communication_Channels

15 March, 2017 11:00AM by Hans-Christoph Steiner and Kai-Chung Yan (殷啟聰)

March 14, 2017

hackergotchi for Keith Packard

Keith Packard

Valve

Consulting for Valve in my spare time

Valve Software has asked me to help work on a couple of Linux graphics issues, so I'll be doing a bit of consulting for them in my spare time. It should be an interesting diversion from my day job working for Hewlett Packard Enterprise on Memory Driven Computing and other fun things.

First thing on my plate is helping support head-mounted displays better by getting the window system out of the way. I spent some time talking with Dave Airlie and Eric Anholt about how this might work and have started on the kernel side of that. A brief synopsis is that we'll split off some of the output resources from the window system and hand them to the HMD compositor to perform mode setting and page flips.

After that, I'll be working out how to improve frame timing reporting back to games from a composited desktop under X. Right now, a game running on X with a compositing manager can't tell when each frame was shown, nor accurately predict when a new frame will be shown. This makes smooth animation rather difficult.

14 March, 2017 07:10PM

John Goerzen

Parsing the GOP’s Health Insurance Statistics

There has been a lot of noise lately about the GOP health care plan (AHCA) and the differences to the current plan (ACA or Obamacare). A lot of statistics are being misinterpreted.

The New York Times has an excellent analysis of some of this. But to pick it apart, I want to highlight a few things:

Many Republicans are touting the CBO’s estimate that, some years out, premiums will be 10% lower under their plan than under the ACA. However, this carries with it a lot of misleading information.

First of all, many are spinning this as if costs would go down. That’s not the case. The premiums would still rise — they would just have risen less by the end of the period than under ACA. That also ignores the immediate spike, and the millions thrown out of the insurance marketplace altogether.

Now then, where does this 10% number come from? First of all, you have to understand that older people are substantially more expensive to the health system, and therefore more expensive to insure. ACA limited the price differential from the youngest to the oldest people, which meant that in effect some young people were subsidizing older ones on the individual market. The GOP plan removes that limit. Combined with other changes in subsidies and tax credits, this dramatically increases the cost to older people. For instance, the New York Times article cites a CBO estimate that “the price an average 64-year-old earning $26,500 would need to pay after using a subsidy would increase from $1,700 under Obamacare to $14,600 under the Republican plan.”

They further conclude that these exceptionally high rates would be so unaffordable that older people will simply stop buying insurance on the individual market. This means that the overall risk pool of people in that market is healthier, and therefore the average price is lower.

So, to sum up: the reason that insurance premiums under the GOP plan will rise at a slightly slower rate long-term is that the higher-risk people will be unable to afford insurance in the first place, leaving only the cheaper people to buy in.

14 March, 2017 03:35PM by John Goerzen

Reproducible builds folks

Reproducible Builds: week 98 in Stretch cycle

Here's what happened in the Reproducible Builds effort between Sunday March 5 and Saturday March 11 2017:

Upcoming events

Reproducible Builds Hackathon Hamburg

The Reproducible Builds Hamburg Hackathon 2017, or RB-HH-2017 for short, is a 3-day hacking event taking place in the CCC Hamburg hackerspace located inside the Frappant, which is a collective art space located in a historical monument in Hamburg, Germany.

The aim of the hackathon is to spend some days working on Reproducible Builds in every distribution and project. The event is open to anybody interested in working on Reproducible Builds issues in any distro or project, with or without prior experience!

Packages filed

Chris Lamb:

Toolchain development

  • Guillem Jover uploaded dpkg 1.18.23 to unstable, declaring .buildinfo format 1.0 as "stable".

  • James McCoy uploaded devscripts 2.17.2 to unstable, adding support for .buildinfo files to the debsign utility via patches from Ximin Luo and Guillem Jover.

  • Hans-Christoph Steiner noted that the first reproducibility-related patch in the Android SDK was marked as confirmed.

Reviews of unreproducible packages

39 package reviews have been added, 7 have been updated and 9 have been removed in this week, adding to our knowledge about identified issues.

2 issue types have been added:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (2)

buildinfo.debian.net development

reproducible-website development

tests.reproducible-builds.org

  • Hans-Christoph Steiner gave a progress report on testing F-Droid: we now have a complete vagrant workflow working in nested KVM! So we can provision a new KVM guest, then package it using vagrant box, all inside of a KVM guest (which is a profitbricks build node). So we finally have a working setup on jenkins.debian.net. Next up is fixing bugs in our libvirt snapshotting support.
  • Then Hans-Christoph was also able to enable building of all F-Droid apps in our setup, though this is still work in progress…
  • Daniel Shahaf spotted a subtle error in our FreeBSD sudoers configuration and as a result the FreeBSD reproducibility results are back.
  • Holger once again adjusted the Debian armhf scheduling frequency, to cope with the ever-increasing number of armhf builds.
  • Mattia spotted a refactoring error which resulted in no maintenance mails for a week.
  • Holger also spent some time on improving IRC notifications further, though there are still some improvements to be made.

Misc.

This week's edition was written by Chris Lamb, Holger Levsen, Vagrant Cascadian & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

14 March, 2017 06:41AM

March 13, 2017

hackergotchi for Sean Whitton

Sean Whitton

Initial views of 5th edition DnD

I’ve been playing in a 5e campaign for around two months now. In the past ten days or so I’ve been reading various source books and Internet threads regarding the design of 5th edition. I’d like to draw some comparisons and contrasts between 5th edition, and the 3rd edition family of games (DnD 3.5e and Paizo’s Pathfinder, which may be thought of as 3.75e).

The first thing I’d like to discuss is that wizards and clerics are no longer Vancian spellcasters. In rules terms, this is the idea that individual spells are pieces of ammunition. Spellcasters have a list of individual spells stored in their heads, and as they cast spells from that list, they cross off each item. Barring special rules about spontaneously converting prepared spells to healing spells, for clerics, the only way to add items back to the list is to take a night’s rest. Contrast this with spending points from a pool of energy in order to use an ability to cast a fireball. Then the limiting factor on using spells is having enough points in your mana pool, not having further castings of the spell waiting in memory.

One of the design goals of 5th edition was to reduce the dominance of spellcasters at higher levels of play. The article to which I linked in the previous paragraph argues that this rebalancing requires the removal of Vancian magic. The idea, to the extent that I’ve understood it, is that Vancian magic is not an effective restriction on spellcaster power levels, so it is to be replaced with other restrictions—adding new restrictions while retaining the restrictions inherent in Vancian magic would leave spellcasters crippled.

A further reason for removing Vancian magic was to defeat the so-called “five minute adventuring day”. The combat ability of a party that contains higher-level Vancian spellcasters drops significantly once they've fired off their most powerful combat spells. So adventuring groups would find themselves getting into a fight, and then immediately retreating to fully rest up in order to get their spells back. This removes interesting strategic and roleplaying possibilities involving the careful allocation of resources, and continuing to fight as hit points run low.

There are some other related changes. Spell components are no longer used up when casting a spell. So you can use one piece of bat guano for every fireball your character ever casts, instead of each casting requiring a new piece. Correspondingly, you can use a spell focus, such as a cool wand, instead of a pouch full of material components—since the pouch never runs out, there’s no mechanical change if a wizard uses an arcane focus instead. 0th level spells may now be cast at will (although Pathfinder had this too). And there are decent 0th level attack spells, so a spellcaster need not carry a crossbow or shortbow in order to have something to do on rounds when it would not be optimal to fire off one of their precious spells.

I am very much in favour of these design goals. The five minute adventuring day gets old fast, and I want it to be possible for the party to rely on the cool abilities of non-spellcasters to deal with the challenges they face. However, I am concerned about the flavour changes that result from the removal of Vancian magic. These affect wizards and clerics differently, so I’ll take each case in turn.

Firstly, consider wizards. In third edition, a wizard had to prepare and cast Read Magic (the only spell they could prepare without a spellbook), and then set about working through their spellbook. This involved casting the spells they wanted to prepare, up until the last few triggering words or gestures that would cause the effect of the spell to manifest. They would commit these final parts of the spell to memory. When it came to casting the spell, the wizard would say the final few words and make the required gestures, and bring out relevant material components from their component pouch. The completed spell would be ripped out of their mind, to manifest its effect in the world. We see that the casting of a spell is a highly mentally-draining activity—it rips the spell out of the caster’s memory!—not to be undertaken lightly. Thus it is natural that a wizard would learn to use a crossbow for basic damage-dealing. Magic is not something that comes very naturally to the wizard, to be deployed in combat as readily as the fighter swings their sword. They are not a superhero or video game character, “pew pew”ing their way to victory. This is a very cool starting point upon which to roleplay an academic spellcaster, not really available outside of tabletop games. I see it as a distinction between magical abilities and real magic.

Secondly, consider clerics. Most of the remarks in the previous paragraph apply, suitably reworked to be in terms of requesting certain abilities from the deity to whom the cleric is devoted. Additionally, there is the downgrading of the importance of the cleric’s healing magic in 5th edition. Characters can heal themselves by taking short and long rests. Previously, natural healing was very slow, so a cleric would need to convert all their remaining magic to healing spells at the end of the day, and hope that it was enough to bring the party up to fighting shape. Again, this made the party of adventurers seem less like superheroes or video game characters. Magic had a special, important and unique role, that couldn’t be replaced by the abilities of other classes.

There are some rules in the back of the DMG—“Slow Natural Healing”, “Healing Kit Dependency”, “Lingering Wounds”—which can be used to make healing magic more important. I’m not sure how well they would work without changes to the cleric class.

I would like to find ways to restore the feel and flavour of Vancian clerics and wizards to 5th edition, without sacrificing the improvements that have been made that let other party members do cool stuff too. I hope it is possible to keep magic cool and unique without making it dominate the game. It would be easy to forbid the use of arcane foci, and say that material component pouches run out if the party do not visit a suitable marketplace often enough. This would not have a significant mechanical effect, and could enhance roleplaying possibilities. I am not sure how I could deal with the other issues I’ve discussed without breaking the game.

The second thing I would like to discuss is bounded accuracy. Under this design principle, the modifiers to dice rolls grow much more slowly. The gain of hit points remains unbounded. Under third edition, it was mechanically impossible for a low-level monster to land a hit on a higher-level adventurer, rendering them totally useless even in overwhelming numbers. With bounded accuracy, it's always possible for a low-level monster to hit a PC, even if they do insignificant damage. That means that multiple low-level monsters pose a threat.

This change opens up many roleplaying opportunities by keeping low-level character abilities relevant, as well as monster types that can remain involved in stories without giving them implausible new abilities so they don't fall far behind the PCs. However, I'm a little worried that it might make high-level player characters feel a lot less powerful to play. I want to cease to be a fragile adventurer and become a world-changing hero at later levels, rather than forever remain vulnerable to the things that I was vulnerable to at the start of the game. This desire might just be the result of the video games which I played growing up. In the JRPGs I played and in Diablo II, enemies in earlier areas of the map were no threat at all once you'd levelled up by conquering higher-level areas. My concerns about bounded accuracy might just be that it clashes with my own expectations of how fantasy heroes work. A good DM might be able to avoid these worries entirely.

The final thing I’d like to discuss is the various simplifications to the rules of 5th edition, when it is compared with 3rd edition and Pathfinder. Attacks of opportunity are only provoked when leaving a threatened square; you can go ahead and cast a spell when in melee with someone. There is a very short list of skills, and party members are much closer to each other in skills, now that you can’t pump more and more ranks into one or two abilities. Feats as a whole are an optional rule.

At first I was worried about these simplifications. I thought that they might make character building and tactics in combat a lot less fun. However, I am now broadly in favour of all of these changes, for two reasons. Firstly, they make the game so much more accessible, and make it far more viable to play without relying on a computer program to fill in the boxes on your character sheet. In my 5th edition group, two of us have played 3rd edition games, and the other four have never played any tabletop games before. But nobody has any problems figuring out their modifiers because it is always simply your ability bonus or penalty, plus your proficiency bonus if relevant. And advantage and disadvantage is so much more fun than getting an additional plus or minus two. Secondly, these simplifications downplay the importance of the maths, which means it is far less likely to be broken. It is easier to ensure that a smaller core of rules is balanced than it is to keep in check a larger mass of rules, constantly being supplemented by more and more addon books containing more and more feats and prestige classes. That means that players make their characters cool by roleplaying them in interesting ways, not making them cool by coming up with ability combos and synergies in advance of actually sitting down to play. Similarly, DMs can focus on flavouring monsters, rather than writing up longer stat blocks.

I think that this last point reflects what I find most worthwhile about tabletop RPGs. I like characters to encounter cool NPCs and cool situations, and then react in cool ways. I don’t care that much about character creation. (I used to care more about this, but I think it was mainly because of interesting options for magic items, which hasn’t gone away.) The most important thing is exercising group creativity while actually playing the game, rather than players and DMs having to spend a lot of time preparing the maths in advance of playing. Fifth edition enables this by preventing the rules from getting in the way, because they’re broken or overly complex. I think this is why I love Exalted: stunting is vital, and there is social combat. I hope to be able to work out a way to restore Vancian magic, but even without that, on balance, fifth edition seems like a better way to do group storytelling about fantasy heroes. Hopefully I will have an opportunity to DM a 5th edition campaign. I am considering disallowing all homebrew and classes and races from supplemental books. Stick to the well-balanced core rules, and do everything else by means of roleplaying and flavour. This is far less gimmicky, if more work for unimaginative players (such as myself!).

Some further interesting reading:

13 March, 2017 11:37PM

hackergotchi for Ross Gammon

Ross Gammon

February 2017 – My Free Software activities summary

When I sat down to write this blog, I thought I hadn't got much done in February. But as it took me quite a while to write up, there must have actually been a little bit of progress. With my wife starting a new job, there have been some adjustments in family life, and I have struggled just to keep up with all the Debian and Ubuntu emails. Anyway……..

Debian

Ubuntu

  • Tested Ubuntu Studio 16.04.2 point release, marked as ready, and updated the Release Notes.
  • Started updating my previous Gramps backport in Ubuntu to Gramps 4.2.5. The package builds fine, and I have tested that it installs and works. I just need to update the bug.
  • Prepared updates to the ubuntustudio-default-settings & ubuntustudio-meta packages. There were some deferred changes from before Yakkety was released, including moving the final bit of configuration left in the ubuntustudio-lightdm-theme package to ubuntustudio-default-settings. Jeremy Bicha sponsored the uploads after suggesting moving away from some transitional ttf font packages in ubuntustudio-meta.
  • Tested the Ubuntu Studio 17.04 First Beta release, marked as ready, and prepared the Release Notes.
  • Upgraded my music studio Ubuntu Studio computer to Yakkety 16.10.
  • Got accepted as an Ubuntu Contributing Developer by the Developer Membership Board.

Other

  • After a merge of my Family Tree with the Family Tree of my wife in Gramps a long way back, I finally started working through the database merging duplicates and correcting import errors.
  • Worked some more on the model railway, connecting up the other end of the tunnel section with the rest of the railway.

Plan status from last month & update for next month

Debian

For the Debian Stretch release:

  • Keep an eye on the Release Critical bugs list, and see if I can help fix any. – In Progress

Generally:

  • Finish the Gramps 4.2.5 backport for Jessie. – Done
  • Package all the latest upstream versions of my Debian packages, and upload them to Experimental to keep them out of the way of the Stretch release.
  • Begin working again on all the new stuff I want packaged in Debian.

Ubuntu

  • Finish the ubuntustudio-lightdm-theme, ubuntustudio-default-settings transition including an update to the ubuntustudio-meta packages. – Done
  • Reapply to become a Contributing Developer. – Done
  • Start working on an Ubuntu Studio package tracker website so that we can keep an eye on the status of the packages we are interested in. – Started
  • Start testing & bug triaging Ubuntu Studio packages. – In progress
  • Test Len’s work on ubuntustudio-controls – In progress
  • Do the Ubuntu Studio Zesty 17.04 Final Beta release.

Other

  • Give JMRI a good try out and look at what it would take to package it. – In progress
  • Also look at OpenPLC for simulating the relay logic of real railway interlockings (i.e. a little bit of the day job at home involving free software – fun!). – In progress

13 March, 2017 09:06PM by Ross Gammon

hackergotchi for Michal Čihař

Michal Čihař

Weblate users survey

Weblate has been growing quite well in recent months, but sometimes its development is really driven by people who complain instead of following a roadmap with higher goals. I think it's time to change that, at least a little bit. In order to get broader feedback, I sent out a short survey to active project owners on Hosted Weblate a week ago.

I've decided to target a smaller audience for now, though a publicly open survey might follow later (it's always harder to evaluate feedback across different user groups, though).

Overall feelings were really positive: most people find Weblate better than other similar services they have used. This is really something I like to hear :-).

Weblate overall experience

Weblate compared with other tools

But the most important part for me was where users want to see improvements. This somehow matches my expectation that we really should improve the user interface.

Weblate future development

We have quite a lot of features which are really hidden in the user interface. Also, the interface for some of the features is far from intuitive. This probably all comes from the fact that we don't currently have anybody experienced in creating user interfaces. It's time to find somebody to help us. If you are able to help or know somebody who might be interested in helping, please get in touch. Weblate is free software, but this can still be a paid job.

The last part of the survey focused on some particular features, but the outcome was not as clear as I had hoped, as almost all feature groups attracted about the same attention (one exception being extending the API, which most users did not really want).

Overall I think doing a survey like this is useful and I will certainly repeat it (probably yearly or so), to see where we're moving and what our users want. Having feedback from users is important for every project and this seemed to work quite well. Anyway, if you have further feedback, don't hesitate to use our issue tracker at GitHub or contact me directly.

Filed under: Debian English phpMyAdmin SUSE Weblate | 0 comments

13 March, 2017 11:00AM

March 12, 2017

Iustin Pop

A recipe for success

It is said that with age comes wisdom. I would be happy for that to be true, because today I must have been very very young then.

For example, if you want to make a long bike ride in order to hit some milestone, like your first metric century, it is not indicated to follow ANY of the following points:

  • instead of doing this in the season, when you're fit, wait over the winter, during which you should indulge in food and drink with only an occasional short bike ride, so that most of your fitness is gone and replaced by a few extra kilograms;
  • instead of choosing a flat route that you've done before, extending it a bit to hit the target distance, think about taking the route from one of the people you follow on Strava (and I mean real cyclists here); bonus points if you choose one they mention was about training instead of a freeride and gave it a meaningful name like "The ride of 3 peaks", something with 1'500m+ altitude gain…
  • in order to not get bogged down by too much by extra weight (those winter kilograms are enough!), skimp on breakfast (just a very very light one); together with the energy bar you eat, something like 400 calories…
  • take the same amount of food you take for much shorter and flatter rides; bonus points if you don't check the actual calories in the food, and instead of the presumed 700+ calories you think you're carrying (which might be enough, if you space them correctly, given how much you can absorb per hour), take at most 300 calories with you, because hey, your body is definitely used to long efforts in which you convert fat to energy on the fly, right? especially after said winter pause!
  • since water is scarce in the Swiss outdoors (not!), especially when doing a road bike ride, carry lots of water with you (full hydro-pack, 3l) instead of an extra banana or energy bar, or a sandwich, or nuts, or a steak… mmmm, steak!
  • and finally and most importantly don't do the ride indoors on the trainer, even though it can pretty realistically simulate the effort, but instead do it for real outside, where you can't simply stop when you had enough, because you have to get back home…

For bonus points, if you somehow manage to reach the third peak in the above ride, and have mostly only flat/down to the destination, do the following: be so glad you're done with climbing, that you don't pay attention to the map and start a wrong descent, on a busy narrow road, so that you can't stop immediately as you realise you've lost the track; it will cost you only an extra ~80 meters of height towards the end of the ride. Which are pretty cheap, since all the food is gone and the water almost as well, so the backpack is light. Right.

However, if you do follow all the above, you're rewarded with a most wonderful thing for the second half of the ride: you will receive a +5 boost on your concentration skill. You will be able to focus on, and think about, a single thing for hours at a time, examining it (well, its contents) in minute detail.

Plus, when you get home and open that thing—I mean, of course, the FRIDGE with all the wonderful FOOD it contains—everything will taste MAGICAL! You can now recoup the roughly 1500 calories deficit on the ride, and finally no longer feel SO HUNGRY.

That's all. Strava said "EXTREME" suffer score, albeit less than 20% points in the red, which means I was just slugging through the ride (total time confirms it), like a very very very old man. But definitely not a wise one.

12 March, 2017 10:38PM

Mike Hommey

When the memory allocator works against you

Cloning mozilla-central with git-cinnabar requires a lot of memory. Actually too much memory to fit in a 32-bits address space.

I hadn’t optimized for memory use in the first place. For instance, git-cinnabar keeps sha-1s in memory as hex values (40 bytes) rather than raw values (20 bytes). When I wrote the initial prototype, it didn’t matter that much, and while close(ish) to the tipping point, it didn’t require more than 2GB of memory at the time.

Time passed, and mozilla-central grew. I suspect the recent addition of several thousands of commits and files has made things worse.

In order to come up with a plan to make things better (short or longer term), I needed data. So I added some basic memory resource tracking, and collected data while cloning mozilla-central.

I must admit, I was not ready for what I witnessed. Follow me for a tale of frustrations (plural).

I was expecting things to have gotten worse on the master branch (which I used for the data collection) because I am in the middle of some refactoring and did many changes that I was suspecting might have affected memory usage. I wasn’t, however, expecting to see the clone command using 10GB(!) memory at peak usage across all processes.

(Note, those memory sizes are RSS, minus “shared”)
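
(For reference, here is a minimal sketch of how such a number can be computed; this is an assumption about the measurement, not the actual collection code. /proc/<pid>/statm reports sizes in pages:)

import os

def rss_minus_shared(pid):
    # /proc/<pid>/statm fields (in pages): size resident shared text lib data dt
    with open("/proc/%d/statm" % pid) as statm:
        size, resident, shared = map(int, statm.read().split()[:3])
    return (resident - shared) * os.sysconf("SC_PAGE_SIZE")   # bytes

print(rss_minus_shared(os.getpid()))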

It was also taking an unexpectedly long time, but then, I hadn’t cloned a large repository like mozilla-central from scratch in a while, so I wasn’t sure whether it was just related to its recent growth in size or not. So I collected data on 0.4.0 as well.

Less time spent, less memory usage… ok. There’s definitely something wrong on master. But wait a minute, that slope from ~2GB to ~4GB on the git-remote-hg process doesn’t actually make any kind of sense. I mean, I’d understand it if it were starting and finishing with the “Import manifest” phase, but it starts in the middle of it, and ends long before it finishes. WTH?

First things first, since RSS can be a variety of things, I checked /proc/$pid/smaps and confirmed that most of it was, indeed, the heap.

That’s the point where you reach for Google, type something like “python memory profile” and find various tools. One from the results that I remembered having used in the past is guppy’s heapy.

Armed with pdb, I broke execution in the middle of the slope, and tried to get memory stats with heapy. SIGSEGV. Ouch.

Let’s try something else. I reached out to objgraph and pympler. SIGSEGV. Ouch again.

Tried working around the crashes for a while (too long a while, retrospectively; hindsight is 20/20), and was somehow successful at avoiding them by peeking at a smaller set of objects. But whatever I did, despite being attached to a process that had 2.6GB RSS, I wasn’t able to find more than 1.3GB of data. This wasn’t adding up.

It surely didn’t help that getting to that point took close to an hour each time. Retrospectively, I wish I had investigated using something like Checkpoint/Restore in Userspace.

Anyways, after a while, I decided that I really wanted to try to see the whole picture, not smaller peaks here and there that might be missing something. So I resolved myself to look at the SIGSEGV I was getting when using pympler, collecting a core dump when it happened.

Guess what? The Debian python-dbg package does not contain the debug symbols for the python package. The core dump was useless.

Since I was expecting I’d have to fix something in python, I just downloaded its source and built it. Ran the command again, waited, and finally got a backtrace. First Google hit for the crashing function? The exact (unfixed) crash reported on the python bug tracker. No patch.

Crashing code is doing:

((f)->f_builtins != (f)->f_tstate->interp->builtins)

And (f)->f_tstate is NULL. Classic NULL deref.

Added a guard (assessing it wouldn’t break anything). Ran the command again. Waited. Again. SIGSEGV.

Facedesk. Another crash on the same line. Did I really use the patched python? Yes. But this time (f)->f_tstate->interp is NULL. Sigh.

Same player, shoot again.

Finally, no crash… but still stuck on only 1.3GB accounted for. Ok, I know not all python memory profiling tools are entirely reliable, let’s try heapy again. SIGSEGV. Sigh. No debug info on the heapy module, where the crash happens. Sigh. Rebuild the module with debug info, try again. The backtrace looks like heapy is recursing a lot. Look at %rsp, compare with the address space from /proc/$pid/maps. Confirmed. A stack overflow. Let’s do ugly things and increase the stack size in brutal ways.

Woohoo! Now heapy tells me there’s even less memory used than the 1.3GB I found so far. Like, half less. Yeah, right.

I’m not clear on how I got there, but that’s when I found gdb-heap, a tool from Red Hat’s David Malcolm, and the associated talk “Dude, where’s my RAM?” A deep dive into how Python uses memory (slides).

With a gdb attached, I would finally be able to rip python’s guts out and find where all the memory went. Or so I thought. The gdb-heap tool only found about 600MB. About as much as heapy did, for that matter, but it could be coincidental. Oh. Kay.

I don’t remember exactly what went through my mind then, but, since I was attached to a running process with gdb, I typed the following on the gdb prompt:

gdb> call malloc_stats()

And that’s when the truth was finally unveiled: the memory allocator was just acting up the whole time. The output was something like:

Arena 0:
system bytes    =  some number above (but close to) 2GB
in use bytes    =  some number above (but close to) 600MB

Yes, the glibc allocator was just telling me it had 600MB of memory allocated, but was holding onto 2GB. I must have found a really bad allocation pattern that causes massive fragmentation.

One thing that David Malcolm’s talk taught me, though, is that python uses its own allocator for small sizes, so the glibc allocator doesn’t know about them. And, roughly, adding the difference between RSS and what glibc said it was holding on to, to the in-use bytes it reported, somehow matches the 1.3GB I had found so far.

So it was time to see how those things evolved in time, during the entire clone process. I grabbed some new data, tracking the evolution of “system bytes” and “in use bytes”.

There are two things of note on this data:

  • There is a relatively large gap between what the glibc allocator says it has gotten from the system and the RSS (minus “shared”) size; I expect this gap corresponds to the small allocations that python handles itself.
  • Actual memory use is going down during the “Import manifests” phase, contrary to what the evolution of RSS suggests.

In fact, the latter is exactly how git-cinnabar is supposed to work: It reads changesets and manifests chunks, and holds onto them while importing files. Then it throws away those manifests and changesets chunks one by one while it imports them. There is, however, some extra bookkeeping that requires some additional memory, but it’s expected to be less memory consuming than keeping all the changesets and manifests chunks in memory.

At this point, I thought a possible explanation is that since both python and glibc are mmap()ing their own arenas, they might be intertwined in a way that makes things not go well with the allocation pattern happening during the “Import manifest” phase (which, in fact, allocates and frees increasingly large buffers for each manifest, as manifests grow in size in the mozilla-central history).

To put the theory at work, I patched the python interpreter again, making it use malloc() instead of mmap() for its arenas.

“Aha!” I thought. That definitely looks much better. Less gap between what glibc says it requested from the system and the RSS size. And, more importantly, no runaway increase of memory usage in the middle of nowhere.

I was preparing myself to write a post about how mixing allocators could have unintended consequences. As a comparison point, I went ahead and ran another test, with the python allocator entirely disabled, this time.

Heh. It turns out glibc was acting up all alone. So much for my (plausible) theory. (I still think mixing allocators can have unintended consequences.)

(Note, however, that the reason why the python allocator exists is valid: without it, the overall clone took almost 10 more minutes)

And since I had been getting all this data with 0.4.0, I gathered new data without the python allocator with the master branch.

This paints a rather different picture than the original data on that branch, with much less memory use regression than one would think. In fact, there isn’t much difference, except for the spike at the end, which got worse, and some of the noise during the “Import manifests” phase that got bigger, implying larger amounts of temporary memory used. The latter may contribute to the allocation patterns that throw glibc’s memory allocator off.

It turns out tracking memory usage in python 2.7 is rather painful, and not all the tools paint a complete picture of it. I hear python 3.x is somewhat better in that regard, and I hope it’s true, but at the moment, I’m stuck with 2.7. The most reliable tool I’ve used here, it turns out, is pympler. Or rebuilding the python interpreter without its allocator, and asking the system allocator what is allocated.
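
For reference, the basic pympler pattern looks something like this minimal sketch (a generic illustration, not the actual instrumentation behind the measurements above; report_memory is just a name picked for the example):

from pympler import muppy, summary, tracker

tr = tracker.SummaryTracker()   # takes an initial snapshot to diff against

def report_memory(label):
    # Summarize all objects the Python interpreter currently knows about...
    print("=== %s ===" % label)
    summary.print_(summary.summarize(muppy.get_objects()))
    # ...and print what changed, per type, since the previous call.
    tr.print_diff()

Calling something like this at a few points during the clone gives per-type numbers that can be compared with what the allocator itself reports.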

With all this data, I now have some defined problems to tackle, some easy (the spike at the end of the clone), and some less easy (working around glibc allocator’s behavior). I have a few hunches as to what kind of allocations are causing the runaway increase of RSS. Coincidentally, I’m half-way through a refactor of the code dealing with manifests, and it should help dealing with the issue.

But that will be the subject of a subsequent post.

12 March, 2017 01:47AM by glandium

hackergotchi for Steve Kemp

Steve Kemp

How I started programming

I've written parts of this story in the past, but never in one place and never in much detail. So why not now?

In 1982 my family moved house, so one morning I went to school and at lunch-time I had to walk home to a completely different house.

We moved sometime towards the end of the year, and ended up spending lots of money replacing the windows of the new place. For people in York: I was born in Farrar Street, YO10 3BY, and we moved to a place on Thief Lane, YO1 3HS. Being named as it was I "ironically" stole at least two street-signs and hung them on my bedroom wall. I suspect my parents were disappointed.

Anyway the net result of this relocation, and the extra repairs meant that my sisters and I had a joint Christmas present that year, a ZX Spectrum 48k.

I tried to find pictures of what we received but unfortunately the web doesn't remember the precise bundle. Altogether, though, we received:

I know we also received Horace and the Spiders, and I have vague memories of some other things being included, including a Space Invaders clone. No doubt my parents bought them separately.

Highlights of my Spectrum-gaming memories include R-Type, Strider, and the various "Dizzy" games. Some of the latter I remember very fondly.

Unfortunately this Christmas was pretty underwhelming. We unpacked the machine, we cabled it up to the family TV-set - we only had the one, after all - and then proceeded to be very disappointed when nothing we did resulted in a successful game! It turns out our cassette-deck was not good enough. This being back in the 80s, the shops were closed over Christmas, and my memory is that it was around January before we received a working tape-player/recorder, such that we could load games.

Happily the computer came with manuals. I read one, skipping words and terms I didn't understand. I then read the other, which was the spiral-bound orange book. It contained enough examples and decent wording that I learned to write code in BASIC. Not bad for an 11/12 year old.

Later I discovered that my local library contained "computer books". These were colourful books that promised "The Mystery of Silver Mountain" or "Write your own ADVENTURE PROGRAMS", but were largely dry books that contained nothing but multi-page listings of BASIC programs to type in, often with adjustments that had to be made for your own computer-flavour (BASIC varying between different systems).

If you want to recapture the magic, scroll to the foot of this Usborne page and you can download them!

Later I taught myself Z80 Assembly Language, partly via the Spectrum manual and partly via such books as these two (which I still own 30ish years later):

  • Understanding your Spectrum, Basic & Machine Code Programming.
    • by Dr Ian Logan
  • An introduction to Z80 Machine Code.
    • R.A & J.W Penfold

Pretty much the only reason I continued down this path is because I wanted infinite/extra lives in the few games I owned. (Which were largely pirated via the schoolboy network of parents with cassette-copiers.)

Eventually I got some of my l33t POKES printed in magazines, and received free badges from the magazines of the day such as Your Sinclair & Sinclair User. For example I was "Hacker of the Month" in Your Sinclair issue 67, page 32, apparently because I "asked so nicely in my letter".

Terrible scan is terrible:

Anyway that takes me from 1980ish to 1984. The only computer I ever touched was a Spectrum. Friends had other things, and there were Sega consoles, but I have no memories of them. Suffice it to say that later when I first saw a PC (complete with Hercules graphics, hard drives, and similar sourcery, running GEM IIRC) I was pleased that Intel assembly was "similar" to Z80 assembly - and now I know the reason why.

Some time in the future I might document how I got my first computer job. It is hilarious. As was my naivete.

12 March, 2017 12:00AM

March 11, 2017

John Goerzen

Silent Data Corruption Is Real

Here’s something you never want to see:

ZFS has detected a checksum error:

   eid: 138
 class: checksum
  host: alexandria
  time: 2017-01-29 18:08:10-0600
 vtype: disk

This means there was a data error on the drive. But it’s worse than a typical data error — this is an error that was not detected by the hardware. Unlike most filesystems, ZFS and btrfs write a checksum with every block of data (both data and metadata) written to the drive, and the checksum is verified at read time. Most filesystems don’t do this, because theoretically the hardware should detect all errors. But in practice, it doesn’t always, which can lead to silent data corruption. That’s why I use ZFS wherever I possibly can.

As I looked into this issue, I saw that ZFS repaired about 400KB of data. I thought, “well, that was unlucky” and just ignored it.

Then a week later, it happened again. Pretty soon, I noticed it happened every Sunday, and always to the same drive in my pool. It so happens that the highest I/O load on the machine happens on Sundays, because I have a cron job that runs zpool scrub on Sundays. This operation forces ZFS to read and verify the checksums on every block of data on the drive, and is a nice way to guard against unreadable sectors in rarely-used data.

I finally swapped out the drive, but to my frustration, the new drive now exhibited the same issue. The SATA protocol does include a CRC32 checksum, so it seemed (to me, at least) that the problem was unlikely to be a cable or chassis issue. I suspected motherboard.

It so happened I had a 9211-8i SAS card. I had purchased it off eBay awhile back when I built the server, but could never get it to see the drives. I wound up not filling it up with as many drives as planned, so the on-board SATA did the trick. Until now.

As I poked at the 9211-8i, noticing that even its configuration utility didn’t see any devices, I finally started wondering if the SAS/SATA breakout cables were a problem. And sure enough – I realized I had a “reverse” cable and needed a “forward” one. $14 later, I had the correct cable and things are working properly now.

One other note: RAM errors can sometimes cause issues like this, but this system uses ECC DRAM and the errors would be unlikely to always manifest themselves on a particular drive.

So over the course of this, had I not been using ZFS, I would have had several megabytes of reads with undetected errors. Thanks to using ZFS, I know my data integrity is still good.

11 March, 2017 09:34PM by John Goerzen

Enrico Zini

On the meaning of "we"

Rather than as a word of endearment, I'm starting to see "we" as a word of entitlement.

In some moments of insecurity, I catch myself "wee"-ing over other people, to claim them as mine.

11 March, 2017 01:11PM

March 10, 2017

hackergotchi for Jonathan Dowland

Jonathan Dowland

Nintendo NES Classic Mini

After months of trying, I've finally got my hands on a Nintendo NES Classic Mini. It's everything I wish retropie was: simple, reliable, plug-and-play gaming. I didn't have a NES at the time, so the games are all mostly new to me (although I'm familiar with things like Super Mario Brothers).

NES classic and 8bitdo peripherals

The two main complaints about the NES classic are the very short controller cable and the need to press the "reset" button on the main unit to dip in and out of games. Both are addressed by the excellent 8bitdo Retro Receiver for NES Classic bundle. You get a bluetooth dongle that plugs into the classic and a separate wireless controller. The controller is a replica of the original NES controller. However, they've added another two buttons on the right-hand side alongside the original "A" and "B", and two discrete shoulder buttons which serve as turbo-repeat versions of "A" and "B". The extra red buttons make it look less authentic which is a bit of a shame, and are not immediately useful on the NES classic (but more on that in a minute).

With the 8bitdo controller, you can remotely activate the Reset button by pressing "Down" and "Select" at the same time. Therefore the whole thing can be played from the comfort of my sofa.

That's basically enough for me, for now, but in the future if I want to expand the functionality of the classic, it's possible to mod it. A hack called "Hakchi2" lets you install additional NES ROMs; install retroarch-based emulator cores and thus play SNES, Megadrive, N64 (etc. etc.) games; as well as other hacks like adding "down+select" Reset support to the wired controller. If you were playing non-NES games on the classic, then the extra buttons on the 8bitdo become useful.

10 March, 2017 11:45AM

Reproducible builds folks

Reproducible Builds: week 97 in Stretch cycle

Here's what happened in the Reproducible Builds effort between Sunday February 26 and Saturday March 4 2017:

Upcoming Events

Ed Maste will present Reproducible Builds in FreeBSD at AsiaBSDCon 2017.

Ximin Luo will present Reproducible builds, its uses and the future at Open Source Days in Copenhagen on March 18.

Holger Levsen will give a talk at the German Unix User Group's "Frühjahrsfachgespräch" in Darmstadt, Germany, about Reproducible Builds everywhere on March 23.

Verifying Software Freedom with Reproducible Builds will be presented by Vagrant Cascadian at Libreplanet2017 in Boston, March 25th-26th.

Media coverage

Aspiration Tech published a very detailed report on our Reproducible Builds World Summit 2016 in Berlin.

Reproducible work in other projects

Duncan published a very thorough post on the Rust Programming Language Forum about reproducible builds in the Rust compiler and toolchain.

In particular, he produced a table recording the reproducibility of different build products under different individual variations, totalling 187 build+variation combinations.

Packages reviewed and fixed, and bugs filed

Chris Lamb:

Dhole:

Reviews of unreproducible packages

60 package reviews have been added, 8 have been updated and 13 have been removed in this week, adding to our knowledge about identified issues.

1 issue type has been added:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (3)

diffoscope development

diffoscope 78 was uploaded to unstable and jessie-backports by Mattia Rizzolo. It included contributions from:

  • Chris Lamb:
    • Make tests that call xxd work on jessie again. (Closes: #855239)
    • tests: Move normalize_zeros to more generic utils.data module.
  • Brett Smith:
    • comparators.json: Catch bad JSON errors on Python pre-3.5. (Closes: #855233)
  • Ed Maste:
    • Use BSD-style stat(1) on FreeBSD. (Closes: #855169)

In addition, the following changes were made on the experimental branch:

  • Chris Lamb (4):
    • Tidy cbfs tests.
    • Correct "exercice" -> "exercise" typo.
    • Support newer versions of cbfstool to avoid test failure. (Closes: #856446)
    • Skip icc test that varies on endian if the (Debian-specific) patch is not present. (Closes: #856447)

reproducible-website development

  • anonmos1:
    • Replace root with 0 when giving UIDs/GIDs to GNU tar.
  • Holger Levsen and Chris Lamb:
    • Publish report by Aspiration Tech about RWS Berlin 2016.

tests.reproducible-builds.org

  • Ed Maste continued his work on testing FreeBSD for reproducibility but hasn't reached the magical 100% mark yet.
  • Holger Levsen adjusted the Debian builders scheduling frequency, mostly to adapt to armhf having become faster due to the two new nodes.

Misc.

This week's edition was written by Ximin Luo, Chris Lamb, Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

10 March, 2017 08:41AM

hackergotchi for Martín Ferrari

Martín Ferrari

SunCamp happening again this May!

As I announced in mailing lists a few days ago, the Debian SunCamp (DSC2017) is happening again this May.

SunCamp is different from most other Debian events. Instead of a busy schedule of talks, SunCamp focuses on the hacking and socialising aspect, without making it just a Debian party/vacation.

DSC2016 - Hacking and discussing

The idea is to have 4 very productive days, staying in a relaxing and comfy environment, working on your own projects, meeting with your team, or presenting to fellow Debianites your most recent pet project.

DSC2016 - Tincho talking about Prometheus

We have tried to make this event the simplest event possible, both for organisers and attendees. There will be no schedule, except for the meal times at the hotel. But these can be ignored too, there is a lovely bar that serves snacks all day long, and plenty of restaurants and cafés around the village.

DSC2016 - Hacking and discussing

The SunCamp is an event to get work done, but there will be time for relaxing and socialising too.

DSC2016 - Well deserved siesta
DSC2016 - Playing Pétanque

Do you fancy a hack-camp in a place like this?

Swimming pool

Café Café terrace

One of the things that makes the event simple is that we have negotiated a flat price for accommodation that includes usage of all the facilities in the hotel, and optionally food. We will give you a booking code, and then you arrange your accommodation as you please; you can even stay longer if you feel like it!

The rooms are simple but pretty, and everything has been renovated very recently.

Room Room view

We are not preparing a talks programme, but we will provide the space and resources for talks if you feel inclined to prepare one.

You will have a huge meeting room, divided in 4 areas to reduce noise, where you can hack, have team discussions, or present talks.

Hacklab Hacklab

Do you want to see more pictures? Check the full gallery


Debian SunCamp 2017

Hotel Anabel, LLoret de Mar, Province of Girona, Catalonia, Spain

May 18-21, 2017


Tempted already? Head to the wikipage and register now, it is only 2 months away!

Please try to reserve your room before the end of March. The hotel has reserved a number of rooms for us until that time. You can reserve a room after March, but we can't guarantee the hotel will still have free rooms.


10 March, 2017 07:36AM

March 09, 2017

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

Tired

To be honest, at this stage I'd actually prefer ads in Wikipedia to having ever more intrusive begging for donations. Please go away soon.

09 March, 2017 06:28PM

Petter Reinholdtsen

Detecting NFS hangs on Linux without hanging yourself...

Over the years, administrating thousands of NFS-mounting Linux computers at a time, I often needed a way to detect if a machine was experiencing an NFS hang. If you try to use df or look at a file or directory affected by the hang, the process (and possibly the shell) will hang too. So you want to be able to detect this without risking that the detection process gets stuck too. It has not been obvious how to do this. When the hang has lasted a while, it is possible to find messages like these in dmesg:

nfs: server nfsserver not responding, still trying
nfs: server nfsserver OK

It is hard to know if the hang is still going on, and it is hard to be sure that looking in dmesg is going to work. If there are lots of other messages in dmesg, the lines might have rotated out of sight before they are noticed.

While reading through the NFS client implementation in the Linux kernel code, I came across some statistics that seem to give a way to detect it. The om_timeouts sunrpc value in the kernel will increase every time the above log entry is inserted into dmesg. And after digging a bit further, I discovered that this value shows up in /proc/self/mountstats on Linux.

The mountstats content seems to be shared between files using the same file system context, so it is enough to check one of the mountstats files to get the state of the mount points for the machine. I assume this will not show lazily umounted NFS mount points, nor NFS mount points in a different process context (i.e. with a different filesystem view), but that does not worry me.

The content for an NFS mount point looks similar to this:

[...]
device /dev/mapper/Debian-var mounted on /var with fstype ext3
device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
        age:    7863311
        caps:   caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0 
        bytes:  166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809 
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
             SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
              LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
              ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
            READLINK: 125 125 0 20472 18620 0 1112 1118
                READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
               WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
              CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
               MKDIR: 3680 3680 0 773980 993920 26 23990 24245
             SYMLINK: 903 903 0 233428 245488 6 5865 5917
               MKNOD: 80 80 0 20148 21760 0 299 304
              REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
               RMDIR: 3367 3367 0 645112 484848 22 5782 6002
              RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
                LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
             READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
         READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
              FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
              FSINFO: 2 2 0 232 328 0 1 1
            PATHCONF: 1 1 0 116 140 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0

device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
[...]

The key number to look at is the third number in the per-op list. It is the number of NFS timeouts experienced per file system operation, here 22 WRITE timeouts and 5 ACCESS timeouts. If these numbers are increasing, I believe the machine is experiencing an NFS hang. Unfortunately the timeout values do not start to increase right away. The NFS operations need to time out first, and this can take a while. The exact timeout value depends on the setup. For example, the defaults for TCP and UDP mount points are quite different, and the timeout value is affected by the soft, hard, timeo and retrans NFS mount options.
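
A minimal sketch of this idea in Python, assuming the mountstats layout shown above (the script and its names are illustrative, not an existing tool), could look like this. It only reads the proc file, so it cannot get stuck on the hung mount itself:

#!/usr/bin/env python3
# Minimal sketch: sum the per-operation NFS timeout counters from
# /proc/self/mountstats.  Only the proc file is read, so this will not
# hang even if the NFS server is unreachable.  The parsing assumes the
# mountstats layout shown above.

def nfs_timeouts(path="/proc/self/mountstats"):
    timeouts = {}            # mount point -> sum of per-op timeout counts
    mountpoint = None
    in_per_op = False
    with open(path) as stats:
        for line in stats:
            words = line.split()
            if line.startswith("device "):
                # e.g. "device server:/export mounted on /mnt/x with fstype nfs ..."
                mountpoint = words[4] if "fstype nfs" in line else None
                if mountpoint:
                    timeouts[mountpoint] = 0
                in_per_op = False
            elif mountpoint and line.strip() == "per-op statistics":
                in_per_op = True
            elif mountpoint and in_per_op and len(words) >= 4 and words[0].endswith(":"):
                # The third number after the operation name is the timeout count.
                timeouts[mountpoint] += int(words[3])
    return timeouts

if __name__ == "__main__":
    for mount, count in sorted(nfs_timeouts().items()):
        print("%s: %d timeouts" % (mount, count))

To detect a hang, something like this would have to run periodically and raise an alarm when the counters increase between two samples.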

The only way I have been able to get working on Debian and Red Hat Enterprise Linux for getting the timeout count is to peek into /proc/. According to the Solaris 10 System Administration Guide: Network Services, the 'nfsstat -c' command can be used to get these timeout values, but this does not work on Linux, as far as I can tell. I asked Debian about this, but have not seen any replies yet.

Is there a better way to figure out if a Linux NFS client is experiencing NFS hangs? Is there a way to detect which processes are affected? Is there a way to get the NFS mount going quickly once the network problem causing the NFS hang has been cleared? I would very much welcome some clues, as we regularly run into NFS hangs.

09 March, 2017 02:20PM