> Our most important finding is that the reproducibility rate in nixpkgs has increased steadily from 69% in 2017 to about 91% in April 2023. The high reproducibility rate in our most recent revision is quite impressive, given both the size of the package set and the absence of systematic monitoring in nixpkgs. We knew that it was possible to achieve very good reproducibility rate in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners.
I think people in this thread are focusing on the wrong thing. Sure, not all packages are reproducible, but the project is systematically increasing the percentage of projects that are reproducible while ALSO adding new projects and demonstrating conclusively that what was considered infeasible is actually readily achievable.
> The interesting aspect of these causes is that they show that even if nixpkgs already achieves great reproducibility rates, there still exists some low hanging fruits towards improving reproducibility that could be tackled by the Nix community and the whole FOSS ecosystem.
This work is helpful I think for the community to tackle the sources of unreproducible builds to push the percentage up even further. I think it also highlights the need for automation to validate that there aren't systematic regressions or regressions in particularly popular packages (doing individual regressions for all packages is a futile effort unless a lot of people volunteer to be part of a distributed check effort).
sublimefire 3 hours ago [-]
Some interesting related stats from Debian also show good reproducibility progress:
https://tests.reproducible-builds.org/debian/reproducible.ht...
I think this debate comes down to exactly what "reproducible" means. Nix doesn't give bit-exact reproducibility, but it does give reproducible environments, by ensuring that the inputs are always bit-exact. It is closer to being fully reproducible than most other build systems (including Bazel) -- but because it can only reasonably ensure that the inputs are exact, it's still necessary for the build processes themselves to be fully deterministic to get end-to-end bit-exactness.
Nix on its own doesn't fully resolve supply chain concerns about binaries, but it can provide answers to a myriad of other problems. I think most people like Nix reproducibility, and it is marketed as such, for the sake of development: life is much easier when you know for sure you have the exact same version of each dependency, in the exact same configuration. A build on one machine may not be bit-exact to a build on another machine, but it will be exactly the same source code all the way down.
The quest to get every build process to be deterministic is definitely a bigger problem and it will never be solved for all of Nixpkgs. NixOS does have a reproducibility project[1], and some non-trivial amount of NixOS actually is properly reproducible, but the observation that Nixpkgs is too vast is definitely spot-on, especially because in most cases the real issues lie upstream. (And carrying patches for reproducibility is possible, but it adds even more maintainer burden.)
[1]: https://reproducible.nixos.org/
> The quest to get every build process to be deterministic [...] will never be solved for all of Nixpkgs.
Not least because of unfree and/or binary-blob packages that can't be reproducible because they don't even build anything. As much as Guix' strict FOSS and build-from-source policy can be an annoyance, it is a necessary precondition to achieve full reproducibility from source, i.e. the full-source bootstrap.
jchw 15 hours ago [-]
Nixpkgs provides license[1] and source provenance[2] information. For legal reasons, Nix also defaults to not evaluating unfree packages. Not packaging them at all, though, doesn't seem useful from any technical standpoint; I think that is purely ideological.
In any case, it's all a bit imperfect anyway, since it's from the perspective of the package manager, which can't be absolutely sure there's no blobs. Anyone who follows Linux-libre releases can see how hard it really is to find all of those needles in the haystack. (And yeah, it would be fantastic if we could have machines with zero unfree code and no blobs, but the majority of computers sold today can't meaningfully operate like that.)
I actually believe there's plenty of value in the builds still being reproducible even when blobs are present: you can still verify that the supply chain is not compromised outside of the blobs. For practical reasons, most users will need to stick to limiting the amount of blobs rather than fully eliminating them.
[1]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-license
[2]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-sourceProv...
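For reference, that metadata looks roughly like this in a package expression. The package itself is invented, but `meta.license`, `meta.sourceProvenance`, `lib.sourceTypes.binaryNativeCode`, and `lib.fakeHash` are the actual nixpkgs attributes being discussed:

```nix
{ lib, stdenv, fetchurl }:

stdenv.mkDerivation {
  pname = "example-blob";    # invented unfree, binary-only package
  version = "1.0";

  src = fetchurl {
    url = "https://example.com/example-blob-1.0.tar.gz";  # placeholder URL
    hash = lib.fakeHash;                                  # placeholder hash
  };

  # Evaluation is refused by default unless the user allows unfree packages,
  # and the provenance field records that this ships prebuilt native code.
  meta = {
    license = lib.licenses.unfree;
    sourceProvenance = [ lib.sourceTypes.binaryNativeCode ];
  };
}
```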
You can slap a hash on a binary distribution and it becomes "reproducible" in the same trivial sense as any source tarball. After that, the reproducibility of whatever "build process" takes place to extract archives and shuffle assets around is no more or less fraught than for any other package (probably less, considering how much compilers have historically had to be brought to heel, especially before reproducibility was fashionable enough to enter much into compiler authors' consideration!).
sa46 17 hours ago [-]
> It is closer to being fully reproducible than most other build systems (including Bazel).
How so? Bazel produces the same results for the same inputs.
jchw 17 hours ago [-]
Bazel doesn't guarantee bit-exact outputs, but also Bazel doesn't guarantee pure builds. It does have a sandbox that prevents some impurities, but for example it doesn't prevent things from going out to the network, or even accessing files from anywhere in the filesystem, if you use absolute paths. (Although, on Linux at least, Bazel does prevent you from modifying files outside of the sandbox directory.)
The Nix sandbox, by contrast, completely obscures the host filesystem and limits network access to only those processes that must produce a bit-exact output (fixed-output derivations).
(Bazel also obviously uses the system compilers and headers. Nix does not.)
gf000 7 hours ago [-]
I think talking about sandboxes is missing the point a bit.
It's an important constituent, but only complete OS-emulation with deterministic scheduling could (at a huge overhead) actually result in bit-by-bit reproducible artifacts with arbitrary build steps.
There are endless sources of impurities/randomness, and most compilers haven't historically cared much about this.
jchw 7 hours ago [-]
The point I'm making is that neither Bazel nor Nix do that. However, sandboxing is still relevant, because if you still have impurities leaking from outside the closure of the build, you have bigger fish to fry than non-deterministic builds.
That all said, in practice, many of the cases where Nixpkgs builds are not deterministic are actually fairly trivial. Despite not being a specific goal necessarily, compilers are more deterministic than not, and in practice the sources of non-determinism are fewer than you'd think. Case in point, I'm pretty sure the vast majority of Nixpkgs packages that are bit-for-bit reproducible just kind of are by accident, because nothing in the build is actually non-deterministic. Many of the cases of non-deterministic builds are fairly trivial, such as things just linking in different orders depending on scheduling.
Running everything under a deterministic VM would probably be too slow and/or cumbersome, so I think Nix is the best it's going to get.
gf000 7 hours ago [-]
Sandboxing is relevant, but nix does that by default, so no difference here.
Nonetheless, I agree that Nix does the optimum here; full-on emulation would be prohibitively expensive.
lmm 7 hours ago [-]
> There are endless sources of impurities/randomness, and most compilers haven't historically cared much about this.
The point is that Nix will catch a lot more of them than Bazel does, since Nix manages the toolchain used to build, whereas Bazel just runs the host system cc.
taurknaut 6 hours ago [-]
> It's an important constituent, but only complete OS-emulation with deterministic scheduling could (at a huge overhead)
This does actually exist; check out Antithesis's product. I'm not sure how much is public information, but their main service is a deterministic (...I'm not sure to what extent this is true, but that was the claim I heard) many-core VM on which otherwise difficult testing scenarios can be reproduced (clusters, databases, video games, maybe even kernels?) to observe bugs that only arise in extremely difficult-to-reproduce circumstances.
It does seem like overkill just to get a marginally more reproducible build system, though.
dijit 17 hours ago [-]
Uh, either my understanding of Bazel is wrong, or everything you wrote is wrong.
Bazel absolutely prevents network access and filesystem access (reads) from builds. (only permitting explicit network includes from the WORKSPACE file, and access to files explicitly depended on in the BUILD files).
Maybe you can write some “rules_” for languages that violate this, but it is designed purposely to be hermetic and bit-perfect reproducible.
EDIT:
From the FAQ[0]:
> Will Bazel make my builds reproducible automatically?
> For Java and C++ binaries, yes, assuming you do not change the toolchain.
The issues with Docker's style of "reproducible" (meaning a consistent environment) are also outlined in the same FAQ[1]:
> Doesn’t Docker solve the reproducibility problems?
> Docker does not address reproducibility with regard to changes in the source code. Running Make with an imperfectly written Makefile inside a Docker container can still yield unpredictable results.
[0]: https://bazel.build/about/faq#will_bazel_make_my_builds_repr...
[1]: https://bazel.build/about/faq#doesn’t_docker_solve_the_repro...
I think you're both right in a sense. Bazel doesn't (in general) prevent filesystem access, e.g. to library headers in /usr/include. If those headers change (maybe because a Debian package got upgraded or whatever), Bazel won't know it has to invalidate the build cache. I think the FAQ is still technically correct because upgrading the Debian package for a random library dependency counts as "chang[ing] the toolchain" in this context. But I don't think you'd call it hermetic by default.
Check out the previous discussion at https://news.ycombinator.com/item?id=23184843 and below:
> Under the hood there's a default auto-configured toolchain that finds whatever is installed locally in the system. Since it has no way of knowing what files an arbitrary "cc" might depend on, you lose hermeticity by using it.
jchw 14 hours ago [-]
I believe your understanding of Bazel is wrong. I don't see any documentation that suggests the Bazel sandbox prevents the toolchain from accessing the network.
https://bazel.build/docs/sandboxing
(Actually, it can: that documentation suggests it's optionally supported, at least on the Linux sandbox. That said, it's optional. There's definitely actions that use the network on purpose and can't participate in this.)
This may seem pointless, because in many situations this would only matter in somewhat convoluted cases. In C++ the toolchain probably won't connect to the network. This isn't the case for e.g. Rust, where proc macros can access the network. (In practical terms, I believe the sqlx crate does this, connecting to a local Postgres instance to do type inference.) Likewise, you could do an absolute file inclusion, but that would be very much on purpose and not an accident. So it's reasonable to say that you get a level of reproducibility when you use Bazel for C++ builds...
Kind of. It's not bit-for-bit because it uses the system toolchain, which is just an arbitrary choice. On Darwin it's even more annoying: with Xcode installed via the Mac App Store, the Xcode version can change transparently under Bazel in the background, entirely breaking the hermeticity and requiring you to purge the Bazel cache (because the dependency graph will be wrong and break the build. Usually.)
Nix is different. The toolchain itself is built by Nix and undergoes the same sandboxed build process with cryptographically verified inputs. Bazel does not do that.
paulddraper 14 hours ago [-]
It does.
There are mechanisms for opting out/breaking that, just as with Nix or any other system.
> macOS
What does nix do on these systems?
jchw 13 hours ago [-]
Opt-out would be one thing, but it's actually opt-in for network isolation, and a project can disable all sandboxing with just a .bazelrc. Nix does have ways to opt-out of sandboxing, but you can't do it inside a Nix expression: if you ran Nix with sandbox = true, anything being able to escape or bypass the sandbox restrictions would be a security vulnerability and assigned a CVE. Disabling the sandbox can only be done by a trusted user, and it's entirely out-of-band from the builder. For Bazel, the sandbox is mostly just there to prevent accidental impurities, but it's not water tight by any means.
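For illustration, on NixOS that knob lives in the system configuration, roughly like this (these are real `nix.settings` options, which just map onto nix.conf; nothing inside a derivation can change them):

```nix
{
  # System-level configuration, set by the admin; build expressions can't touch it.
  nix.settings = {
    sandbox = true;               # the default on Linux
    trusted-users = [ "root" ];   # only trusted users may relax options per-invocation
  };
}
```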
Ultimately, I still think that Nix provides a greater degree of isolation and reproducibility than Bazel overall, and especially out of the box, but I was definitely incorrect when I said that Bazel's sandbox doesn't/can't block the network. I did dive a little deeper into the nuances in another comment.[1]
> What does nix do on these systems?
On macOS, Nix is not exactly as solid as it is on Linux. It uses sandbox-exec for sandboxing, which achieves most of what the Nix sandbox does on Linux, except it disallows all networking rather than just isolated networking. (Some derivations need local network access, so you can opt-in to having local network access per-derivation. This still doesn't give Internet access, though: internet access still requires a fixed-output derivation.) There's definitely some room for improvement there but it will be hard to do too much better since xnu doesn't have anything similar to network namespaces afaik.
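For the unfamiliar, a fixed-output derivation is the only kind that gets network access inside the sandbox, precisely because its output hash is declared up front. A minimal sketch (the URL is a placeholder and `lib.fakeHash` is nixpkgs' placeholder hash constant):

```nix
{ lib, fetchurl }:

# fetchurl yields a fixed-output derivation: the builder may reach the network,
# but whatever it fetches must match the declared hash or the build fails.
fetchurl {
  url = "https://example.org/some-source-1.2.3.tar.gz";  # placeholder
  hash = lib.fakeHash;                                    # replace with the real sha256
}
```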
As for the toolchain, I'm not sure how the Nix bootstrap works on macOS. It seems like a lot of effort went into making it work and it can function without Xcode installed. (Can't find a source for this, but I was using it on a Mac Mini that I'm pretty sure didn't have Xcode installed. So it clearly has its own hermetic toolchain setup just like Linux.)
[1]: https://news.ycombinator.com/item?id=43032285
AFAIK Bazel does not use the sandbox by default. Last time I experimented with it, the sandbox had some problematic holes, but I don’t remember exactly what, and it’s been a few years.
The very doc you link hints at that, while also giving many caveats where the build will become non-reproducible. So it boils down to “yes, but only if you configure it correctly and do things right”.
jchw 12 hours ago [-]
Yeah, I think you are right: by default, there is no OS-level sandboxing going on. According to documentation, the default spawn strategy is `local`[1], whereas it would need to be `sandboxed` for sandboxing to take effect.
Meanwhile, if you want to forcibly block network access for a specific action, you can pass `block-network` as an execution requirement[2]. You can also explicitly block network access with flags, using --nosandbox_default_allow_network[3]. Interestingly though, an action can also `require-network` to bypass this, and I don't think there's any way to account for that.
Maybe more importantly, Bazel lacks the concept of a fixed-output action, so when an impure action needs `require-network` the potentially-impure results could impact downstream dependents of actions.
I was still ultimately incorrect to say that Bazel's sandbox can't sandbox the network. The actual reality is that it can. If you do enable the sandbox, while it's not exactly pervasive through the entire ecosystem, it does look like a fair number of projects at least set the `block-network` tag--about 700 as of writing this[4]. I think the broader point I was making (that Nix adheres to a stronger standard of "hermetic" than Bazel) is ultimately true, but I did miss on a bit of nuance initially.
[1]: https://bazel.build/docs/user-manual#spawn-strategy
[2]: https://bazel.build/reference/be/common-definitions#common.t...
[3]: https://bazel.build/reference/command-line-reference#flag--s...
[4]: https://github.com/search?q=language%3Abzl+%22block-network%...
I remember that a system nagged about non-reproducible outputs, that Blaze (not Bazel, but the internal thing) allowed looking into the outside world through bad Starlark rules, and that compile-time tricks could get you questioning why there's so much evil in the world.
Maybe Bazel forbids these things right away, and Googlers actually talking about Blaze are inadvertently lying, thinking the two are similar enough.
valcron1000 16 hours ago [-]
I'm not familiar with Bazel at all so this might be obvious, but does Bazel check that the files listed in the BUILD file are the "right ones" (ex. through a checksum), and if so, is this always enforced (that is, this behavior cannot be disabled)?
dijit 16 hours ago [-]
The contents of files are basically hashed; if the contents of a file listed for a target don't change, then no rebuild will happen, even if you modify the file's metadata (like the last-modified time via `touch` and so on).
Bazel is really sophisticated and I'd be lying if I said I understood it well, but I have spent time looking at it.
gf000 8 hours ago [-]
No, most compilers are not themselves reproducible, even within very restrictive sandboxes. For example, they may do some work concurrently and collect the results based on when each piece completes, then build on top of that; if they don't add a timing-insensitive sorting step, the resulting binary will (assuming no bugs) be functionally equivalent, but may not be bit-by-bit equal. A build tool can only do so much.
k__ 5 hours ago [-]
What are the common issues besides timestamps?
colejohnson66 1 hours ago [-]
A compiler executing internal work concurrently and merging the results at the end: thread scheduling changes will cause a different output ordering.
colordrops 16 hours ago [-]
I'm curious, why couldn't packages that are fully reproducible be marked with metadata, and in your config you set a flag to only allow reproducible packages? Similar to the nonfree tag.
Then you'd have a 100% reproducible OS if you have the flag set (assuming that the required base packages are reproducible).
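Something like this hypothetical sketch, loosely modeled on how the unfree check works; note that `meta.reproducible` and the wrapper do not exist in nixpkgs today:

```nix
# Hypothetical: refuse to evaluate any package not explicitly marked reproducible.
let
  pkgs = import <nixpkgs> { };

  onlyReproducible = pkg:
    if pkg.meta.reproducible or false   # hypothetical meta attribute
    then pkg
    else throw "${pkg.pname or pkg.name} is not marked as bit-for-bit reproducible";
in
  onlyReproducible pkgs.hello
```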
jchw 13 hours ago [-]
You could definitely do that, I think the main thing stopping anyone is simply lack of demand for that specific feature. That, and also it might be hard to keep track of what things are properly reproducible; you can kind of only ever prove for sure that a package is not reproducible. It could be non-deterministic but only produce differences on different CPUs or an infinitesimally small percentage of times. Actually being able to assure determinism would be pretty amazing although I don't know how that could be achieved.
colordrops 13 hours ago [-]
I assume it would be somewhat of a judgement call. I mean that is the case with nonfree packages as well - licenses and whatnot have to be evaluated. I assume that there are no cases of non-trivially large software packages in the wild that have been formally proven to be reproducible, but I could be wrong.
IHLayman 17 hours ago [-]
That this article discusses reproducibility in NixOS and declines to even mention the intensional model, or the efforts to implement it, is surprising to me, since it appears they have done a lot of research into the matter.
If you don’t know, the intensional model is an alternative way to structure the NixOS store so that components are content-addressable (store hash is based on the targets) as opposed to being addressed based on the build instructions and dependencies. IIUC, the entire purpose of the intensional model is to make Nix stores shareable so that you could just depend on Cachix and such without the worry of a supply-chain attack. This approach was an entire chapter in the Nix thesis paper (chapter 6) and has been worked on recently (see https://github.com/NixOS/rfcs/pull/62 and https://github.com/NixOS/rfcs/pull/17 for current progress).
rssoconnor 16 hours ago [-]
I'll repeat my comment from last time this came up.[0]
I could be wrong (and I probably am) but I feel like the term "reproducible build" has shifted/solidified since 2006 when Dolstra's thesis was first written (which itself doesn't really use that term all that much). As evidence, the first Wikipedia page on "Reproducible builds" seems to have appeared in 2016, a decade after Dolstra's thesis, and even that stub from 2016 appears to prefer the term "Deterministic compilation".
Anyhow, when the Nix project originally spoke about "reproducible builds", what I understood was meant by that term was "being able to repeat the same build steps with the same inputs". Because of the lack of deterministic compilation, this doesn't always yield bit-by-bit identical outputs; they are simply presumed to be "functionally identical". There is, of course, no reason to believe that they will necessarily be functionally identical, but it is what developers take for granted every day, and anything otherwise would be considered a bug somewhere in the package.
With Nix, when some software "doesn't work for me, but works for you", we can indeed recursively compare the Nix derivation files, locating and eliminating potential differences, a debugging process I have used on occasion.
I agree that "reproducible builds" now means something different, but that isn't exactly the fault of Nix advocates. I guess a new term for "being able to repeat the same build steps with the same inputs" is needed.
> Our most important finding is that the reproducibility rate in nixpkgs has increased steadily from 69% in 2017 to about 91% in April 2023. The high reproducibility rate in our most recent revision is quite impressive, given both the size of the package set and the absence of systematic monitoring in nixpkgs.
That's one way to read the statistic. Another way you could read the graph is that they still have about the same number (~5k) of non-reproducible builds, which has been pretty constant over the time period. Adding a bunch of easily reproducible additional builds maybe doesn't make me believe it's solving the original issues.
> We knew that it was possible to achieve very good reproducibility rate in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners.
Maybe I miss some nuance here, but why is Debian written off as being so much smaller scale? The top end of the graph here suggests a bit over 70k packages, Debian apparently also currently has 74k packages available (https://www.debian.org/doc/manuals/debian-reference/ch02.en....); I guess there's maybe a bit of time lag here but I'm not sure that is enough to claim Debian is somehow not "at scale".
gf000 7 hours ago [-]
This is not really a Nix-issue to begin with.
It's a bit like asking what percentage of Nix-packaged programs have Hungarian translation -- if Nix packages some more stuff the rate might decrease, but it's not Nix's task to add that to the programs that lack it.
Nix does everything in its power to provide a sandboxed environment in which builds can happen. Given the way hardware works, there are still sources of non-determinism that are impossible to prevent, most importantly timing.
Most programs depend on it, even compilers, and extra care should be taken by them to change that. The only way to prevent it would be to go full-on CPU and OS emulation, but that would be prohibitively expensive.
genewitch 4 hours ago [-]
> The only way to prevent it would be to go full-on CPU and OS emulation, but that would be prohibitively expensive.
How so?
For ref, I used a Gentoo distcc chroot inside a Devuan VM to bootstrap Gentoo on a 2009 netbook. It worked fine. I did this around Halloween.
orbital-decay 55 minutes ago [-]
OS scheduling is non-deterministic, and there are quite a few things that are sensitive to the order of operations (simplest example: floating point addition). If you want to guarantee determinism, not just provide it on a best-effort basis for things that are willing to cooperate, the only way to do that is to put everything into a fully deterministic emulator, which is terribly slow.
gf000 2 hours ago [-]
A compiler invoked twice on the same source file is not mandated to produce the same binary, but it should produce a binary with the same functionality.
There are an infinite number of binaries that do the same thing (e.g. just padding random zeros in certain places wouldn't cause a functional problem).
Nix is very good at doing functionally reproducible builds, that's its whole thing. But there are build steps which are simply not deterministic, and they might produce correct, but not always the same outputs.
Although I'm aware many distros care somewhat about reproducible builds these days, I tend to associate it primarily with Guix System; I never really considered it a feature of NixOS, having used both (though I've spent much more time on Guix System now).
For the record, even in the land of Guix I semi-regularly see reports on the bug-guix mailing list that some package isn't reproducible. It seems to get treated as a bug and fixed then. With that in mind, and personally considering Guix kind of the flagship of these efforts, it doesn't surprise me if anyone else doesn't have perfectly reproducible builds yet either. Especially Nix with the huge number of things in nixpkgs. It's probably easier for stuff to fall through the cracks with that many packages to manage.
est31 3 hours ago [-]
Note that NixOS's "build" step often actually doesn't do any compilation. Often it's just downloading a binary from github releases and runs NixOS's specific binary tools on it to make it look for libraries in the right places.
So if that process is reproducible, it's a different statement from a Debian package being reproducible, which requires build inputs in the preferred form of modification (source code).
lrvick 7 hours ago [-]
I would note that stagex is 100% reproducible, and full-source bootstrapped.
Every artifact is reproduced and signed by multiple maintainers on independently controlled hardware and this has been the case since our first release around this time last year.
https://codeberg.org/stagex/stagex
I work on a matching decomp project that has tooling to recompile C into binaries matching a 28-year-old game.
In the final binaries, compiled with gcc 2.6.3 and assembled with a custom assembler, there appears to be unused, uninitialized data that is whatever was in RAM when whoever compiled the game created the release build.
Since the goal is a matching (reproducible) binary, we have tools to restore that random data at specific offsets. Fortunately our targets are fixed.
fngjdflmdflg 16 hours ago [-]
What even causes this to happen? I.e., what dev tool would add random data from RAM to a binary? Is this likely a bug, or is there some reason for it, like needing to reach a specific file size somewhere?
aidenn0 14 hours ago [-]
Simply calling write() on a C struct can do that, if there is any padding in the struct. Then, of course, there are bugs.
dezgeg 14 hours ago [-]
By accidentally writing out uninitialized memory contents to the file, with the game still working. It's even worse in the DOS era, where there is no memory protection, so uninitialized memory can contain data used by other processes; for example, parts of source code can get leaked that way. There's a big list of those at https://tcrf.net/Category:Games_with_uncompiled_source_code
jonhohle 9 hours ago [-]
Yeah, this was originally all DOS and Windows 3.1 utilities for writing programs that would run on MIPS. The data is small enough that it isn’t relevant, just not reproducible through standard build tools because it was never meant to be bitwise reproducible.
tuananh 14 hours ago [-]
please do write more about it.
jonhohle 9 hours ago [-]
We use a tool named dirt-patcher[0], which was written for the project. It lets you write arbitrary bytes at specified offsets[1].
As far as we know at this time, they’re just uninitialized bytes that would have been padding for alignment or other reasons anyway. Maybe if we move to an official build tool chain we’ll find they are deterministic, but for now, we believe they are garbage that happened to make it into the final binary.
0 - https://github.com/Xeeynamo/sotn-decomp/blob/master/tools/di...
1 - https://github.com/Xeeynamo/sotn-decomp/blob/master/config/d...
Is anyone actually implementing the concept of checking hashes with trusted builders? This is all wasted effort if nobody actually does that.
I've seen it pointed out (by mjg59, perhaps?) that if you have a trusted builder, why don't you just use their build? That seems to be the actual model in practice.
Reproducibility seems only to be useful if you have a pool of mostly trustworthy builders and somehow want to build a consensus out of that. Which I suppose is useful for a distributed community but does seem like a stretch for the amount of work going in to reproducible builds.
sublimefire 3 hours ago [-]
> is useful for a distributed community but does seem like a stretch for the amount of work going in to reproducible builds
Good point, but even in the case of a larger monolithic system you want to be sure it is possible to forensically analyze your source, to audit it. Once you can trust that one hash relates to this specific thing you can sign it, etc. This can then be "sold" with some added value of trust downstream. Tracking of hashes also becomes easier once they are reproducible, because they mean much more than just a "version".
__MatrixMan__ 11 hours ago [-]
> if you have a trusted builder, why don't you just use their build
Pardon my tinfoil hat, but doing this would make them a high-value target. If I like them enough to trust their builds, I probably also like them enough to avoid focusing the attentions of the bad guys on them.
Better would be to have a lot of trusted builders all comparing hashes... like, every NixOS user you know (and also the ones they know) so that there's nobody in particular to target.
Timber-6539 3 hours ago [-]
That's no different from how NixOS does it. You are still comparing hashes from the first build done by the distribution. A purer approach would be to use the source code files (a simple sha256sum will suffice) as the first independent variable in the chain of trust.
__MatrixMan__ 44 minutes ago [-]
I'm not sure what you mean. It's your machine that calculates the hashes when it encounters the code.
If you build the directed graph made by the symlinks in the nix store, and walk it backwards, a sha256 of the source files is what you'll find, both in the form of a nix store path and possibly in a derivation that relies on a remote resource but provides a hash of that resource so we can know it's unchanged when downloaded later.
The missing piece is that they're not gossipped between users. So if I find some code in a dark alley somewhere and it has a nix flake to make building it easy, I've got no way to take the hashes and determine who else has experience with the same code and can help me decide if it's trustworthy.
There is also an additional benefit to reproducible builds, where getting the same output every time could help avoid certain regressions. For instance, if GitHub Actions performs extensive testing on a particular executable, then you want to be able to get the exact same executable in the future, not one that is slightly different.
c0balt 16 hours ago [-]
> is anyone actually implementing [..]
Not for NixOS as far as I can tell. You only have this for source derivations, where a hash is submitted (usually in a PR) and must be reproducible in CI. This specific example however has the problem that link rot can be hard to detect unless you regularly check upstream sources.
0x457 18 hours ago [-]
IIRC any package that uses Java isn't reproducible because of system time, and fixing it to the epoch permanently causes issues in some application builds.
* There are Maven and Gradle plugins to make builds reproducible.
yjftsjthsd-h 18 hours ago [-]
Can you force it to some time other than 0? Ex. I've seen some packages force timestamps to the git commit timestamp, which is nice but still fixed.
IME Erlang was like this ~8 years ago (the last time I touched it) but things may have changed since then.
arjvik 16 hours ago [-]
What issues? I'm not aware of any Java build process that checks timestamps.
paulddraper 14 hours ago [-]
JARs are archives, and archives have timestamps.
You can remove those with some extra work.
gf000 7 hours ago [-]
Just add a post-process step that sets the timestamps of the output artifacts (including those of their contents)?
Wouldn't that work?
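Something like this sketch, perhaps, assuming SOURCE_DATE_EPOCH is exported in the build environment (nixpkgs does this) and falling back to 1980-01-01; `pkgs.hello` is just a stand-in package:

```nix
# Sketch only: clamp every timestamp in one package's output as a post-processing step.
pkgs.hello.overrideAttrs (old: {
  postFixup = (old.postFixup or "") + ''
    find "$out" -exec touch -h -d "@''${SOURCE_DATE_EPOCH:-315532800}" {} +
  '';
})
```

Timestamps embedded inside archives (like JAR entries) would still need their own pass, e.g. repacking the archive with a fixed date.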
layer8 17 hours ago [-]
Can you elaborate on the root causes?
GeezNuts 12 hours ago [-]
Certainly not all packages are reproducible, but the project is systematically increasing the percentage of projects that are reproducible while ALSO adding new projects and demonstrating conclusively that what was considered infeasible is actually very readily achievable.
mhh__ 15 hours ago [-]
I guess this is a tangent, but Nix to me feels like the right idea with the wrong abstraction. I can't explain / it would take a serious bit of genius to come up with an alternative worth switching to.
Has anyone done any better?
zanecodes 14 hours ago [-]
I agree, I feel like Nix is kind of a hack to work around the fact that many build systems (especially for C and C++) aren't pure by default, so it tries to wrap them in a sandboxed environment that eliminates as many opportunities for impurity as it reasonably can.
It's not solving the underlying problem: that build systems are often impure and sometimes nondeterministic. It also tries to solve a bunch of adjacent problems, like providing a common interface for building many different types of package, providing a common configuration language for builds as well as system services and user applications in the case of NixOS and home-manager, and providing end-user CLI tools to manage the packages built with it. It's trying to be a build wrapper, a package manager, a package repository, a configuration language, and more.
throwawayqqq11 5 hours ago [-]
Purity becomes a hard goal whenever you hit the real world at build or run time. By definition, you have to bridge two domains.
Imagine constant-time compute and constant-memory constraints, required in cryptography, being applied to the Nix ecosystem.
Yes, this is an artificial example, but it shows that purity is harder to define and come by than some people think. Maybe someday these constraints actually do apply to Nix's goal of reproducibility.
With ever-changing hardware, that purity is a moving target, so Nix imo will always be an approximation of purity, and bundling so much tooling is to be expected. Still, you can legitimately call it a hack :)
gf000 7 hours ago [-]
> It's not solving the underlying problem: that build systems are often impure and sometimes nondeterministic
It's not Nix's job, imo. Those compilers should be fixed.
And all the other "features" come for free from Nix's fundamental abstractions, I don't feel it would overstep its boundaries anywhere.
genewitch 3 hours ago [-]
I have never checked if my C compilers are deterministic, but Gentoo has tinderbox, and since everything that has an emake or whatever has a SHA hash, if I use the exact same SHA-hashed source as the tinderbox binary, I should get a bitwise-equal binary output myself. This of course assumes all of the toolchain is built from SHA-verified source as well.
In Gentoo, `emerge -e <package name>` will do it; add binpkgs if you know what you're doing (I do, and I do).
Ericson2314 15 hours ago [-]
You can use the low level stuff without the language to forge your own journey.
> As part of my PhD, and under the supervision of Théo Zimmermann and Stefano Zacchiroli, I have empirically studied bitwise build reproducibility in nixpkgs over a time period of 6 years.
Why spend only 6 years on the most interesting topic of all mankind? I spent 10 years analyzing this.
__MatrixMan__ 11 hours ago [-]
> there exist no reproducibility monitoring at the scale of the Nix package set (nixpkgs)
I think it would be fairly easy to do this monitoring with a bit of community participation. At least I'd enable telemetry ¯\_(ツ)_/¯.
By default the pkz57 in /nix/store/pkz57...-nushell-0.97.1 is a hash of the build inputs for that package. If you hash the contents of that dir, you get an identifier for the build output.
If we then make a big list of such pairs as built by different people on different machines at different times, and capture the frequency of each pair, we'll either see every builder reporting the same output hash for a given input hash, or several different output hashes for it.
The former is an indicator that the build is reproducible, and the latter gives hints about why not (supposing these users are willing to share a bit more about the circumstances of that build). I'd call it a "build chromatograph". I expect that knowing whether you're one of the 100 or the odd-man-out could be relevant for certain scenarios.
gf000 7 hours ago [-]
I'm not sure the "distribution" would be all that helpful.
A single compiler that does some parallel work and collects the results of that work in a list in order of completion (and similar things) is probably the most common cause of non-determinism.
Given that, your chromatograph would be mostly "determined" by the source code's peculiarities, instead of by the offending compiler itself. (E.g. I have n classes a compiler would process in parallel, so given a single point of timing non-determinism, n! different combinations could possibly exist (assuming they all cause different outputs). The only information I could conclude from such a distribution is that there is a common sequence of completing tasks.)
But your idea is cool, and simply reporting back non-matching local builds would be helpful (I believe the binary property of whether a different output could be built is the only relevant fact) -- also, if we were to mark a package as non-reproducible, we could recursively mark everything else that has it as a (transitive) input.
jf 16 hours ago [-]
Aside from this being a great article with lots of interesting details, it's also a rare example of a headline that does NOT follow "Betteridge's law of headlines"
Ericson2314 15 hours ago [-]
I got scared and then I was unexpectedly relieved!
(-- A Nix maintainer)
tuananh 14 hours ago [-]
So it looks like the reproducibility rate of NixOS is not that high, roughly similar to Debian's?
In my case, I define "reproducible" to mean "immutable." After a few days of testing, I broke NixOS. A simple test of swapping between different desktop environments eventually broke Nix, so I'm not at the point where I'd agree with Nix being truly reproducible, at least not in that context :(
bsimpson 16 hours ago [-]
One problem is that the applications themselves are impure.
Just running KDE litters a bunch of dotfiles into your user folder, even for settings you didn't adjust. This is true for many applications.
If you had an empty home folder and passively tried a handful of desktops, you'd no longer have an empty home folder. Hopefully your environment is resilient to clutter being leaked into your home folder, but if your filesystem isn't truly immutable, rolling back to a particular Nix config might not get you the exact state your system was in when you first built that.
There's a project that wipes all local changes when you restart your machine, with the goal of making Nix systems more reproducible. I think it's called Impermanence.
alfiedotwtf 16 hours ago [-]
I do all my stuff in temporary docker containers, and when I’m done, the container gets blown away.
If the point of Nix is to keep a filesystem immutable as long as every app sticks to certain rules, is it actually the right tool for the job?
Sorry… I actually don’t know much about Nix given I’ve been using VMs and now containers for over a decade, so just trying to understand the problem that nix actually solves
tombert 13 hours ago [-]
I do something similar with Nix Flakes for a lot of my applications. I get my stuff working in a Flake, then I execute it with `nix run`; this is an ephemeral thing; once I kill the app then it's unlinked and can be garbage collected eventually.
It can still write to folders, so it's not completely silo'd off like a full-on Docker container, but I still really like it.
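A minimal sketch of the kind of flake I mean, following the standard flake schema (the package is an arbitrary example; `nix run .` builds it and executes its main binary):

```nix
{
  description = "Minimal flake for running an app ephemerally with `nix run`";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }: {
    # `nix run .` builds this package and runs its default program.
    packages.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.hello;
  };
}
```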
alfiedotwtf 16 hours ago [-]
Just chatgpt’d it. I see… what I’m thinking about more was NixOS. Ok, I think I see how it could work, but if apps aren’t really isolated, then couldn’t a system still get into a broken state if something spills out?
At the moment I’m using Ansible for the host and Docker for guests, but I see NixOS as combining these two layers so everything just runs on the host? Is that a fair way to describe how NixOS works? If I have that wrong, maybe I should check it out; maybe I’ve been sleeping on Nix all this time.
c0balt 16 hours ago [-]
A "normal" NixOS system will only give you a full sandboxed isolation for apps at build time and not a runtime. nixpkgs (the thing packaging the stuff for NixOS) provides packages for apps similar to Debian afterwards and not flatpak in terms of runtime isolation (if I understand your use case).
My recommendation would be to test it out and look at how it does things. Maybe check out the live installer with a GUI to get a feel for a desktop system.
alfiedotwtf 16 hours ago [-]
Hmm.. nix-shell and “nix develop” do look interesting!
Edit: ok I HAVE been sleeping on NixOS! I couldn’t understand how isolation worked with /etc files, but it turns out /etc is not modified directly; you do it all by modifying the Nix config and rebuilding the system, which generates /etc! Ok, super interesting.
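For example, a declaration like this generates a file under /etc on rebuild (a sketch; `environment.etc` is the real NixOS option, the file name and contents are made up):

```nix
{
  # Rebuilding the system regenerates /etc from declarations like this one;
  # the resulting file is a symlink into the read-only Nix store.
  environment.etc."myapp/config.ini".text = ''
    [core]
    option = value
  '';
}
```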
tmnvdb 16 hours ago [-]
Those things are not the same though. Reproducible just means it will break again if you configure your system in the same way.
I think people in this thread are focusing on the wrong thing. Sure, not all packages are reproducible, but the project is systematically increasing the percentage of projects that are reproducible while ALSO adding new projects and demonstrating conclusively that what was considered infeasible is actually readily achievable.
> The interesting aspect of these causes is that they show that even if nixpkgs already achieves great reproducibility rates, there still exists some low hanging fruits towards improving reproducibility that could be tackled by the Nix community and the whole FOSS ecosystem.
This work is helpful I think for the community to tackle the sources of unreproducible builds to push the percentage up even further. I think it also highlights the need for automation to validate that there aren't systematic regressions or regressions in particularly popular packages (doing individual regressions for all packages is a futile effort unless a lot of people volunteer to be part of a distributed check effort).
https://tests.reproducible-builds.org/debian/reproducible.ht...
Nix on its own doesn't fully resolve supply chain concerns about binaries, but it can provide answers to a myriad of other problems. I think most people like Nix reproducibility, and it is marketed as such, for the sake of development: life is much easier when you know for sure you have the exact same version of each dependency, in the exact same configuration. A build on one machine may not be bit-exact to a build on another machine, but it will be exactly the same source code all the way down.
The quest to get every build process to be deterministic is definitely a bigger problem and it will never be solved for all of Nixpkgs. NixOS does have a reproducibility project[1], and some non-trivial amount of NixOS actually is properly reproducible, but the observation that Nixpkgs is too vast is definitely spot-on, especially because in most cases the real issues lie upstream. (and carrying patches for reproducibility is possible, but it adds even more maintainer burden.)
[1]: https://reproducible.nixos.org/
Not least because of unfree and/or binary-blob packages that can't be reproducible because they don't even build anything. As much as Guix' strict FOSS and build-from-source policy can be an annoyance, it is a necessary precondition to achieve full reproducibility from source, i.e. the full-source bootstrap.
In any case, it's all a bit imperfect anyway, since it's from the perspective of the package manager, which can't be absolutely sure there's no blobs. Anyone who follows Linux-libre releases can see how hard it really is to find all of those needles in the haystack. (And yeah, it would be fantastic if we could have machines with zero unfree code and no blobs, but the majority of computers sold today can't meaningfully operate like that.)
I actually believe there's plenty of value in the builds still being reproducible even when blobs are present: you can still verify that the supply chain is not compromised outside of the blobs. For practical reasons, most users will need to stick to limiting the amount of blobs rather than fully eliminating them.
[1]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-license
[2]: https://nixos.org/manual/nixpkgs/stable/#sec-meta-sourceProv...
How so? Bazel produces the same results for the same inputs.
The Nix sandbox does completely obscure the host filesystem and limit network access to processes that can produce a bit-exact output only.
(Bazel also obviously uses the system compilers and headers. Nix does not.)
It's an important constituent, but only complete OS-emulation with deterministic scheduling could (at a huge overhead) actually result in bit-by-bit reproducible artifacts with arbitrary build steps.
There are an endless source of impurities/randomness and most compilers haven't historically cared much about this.
That all said, in practice, many of the cases where Nixpkgs builds are not deterministic are actually fairly trivial. Despite not being a specific goal necessarily, compilers are more deterministic than not, and in practice the sources of non-determinism are fewer than you'd think. Case in point, I'm pretty sure the vast majority of Nixpkgs packages that are bit-for-bit reproducible just kind of are by accident, because nothing in the build is actually non-deterministic. Many of the cases of non-deterministic builds are fairly trivial, such as things just linking in different orders depending on scheduling.
Running everything under a deterministic VM would probably be too slow and/or cumbersome, so I think Nix is the best it's going to get.
Nonetheless, I agree that Nix does the optimum here, full-on emulation would be prohibitively expensive.
The point is that Nix will catch a lot more of them than Bazel does, since Nix manages the toolchain used to build, whereas Bazel just runs the host system cc.
This does actually exist; check out antithesis's product. I'm not sure how much is public information but their main service is a deterministic (...I'm not sure to what extent this is true, but that was the claim I heard) many-core vm on which otherwise difficult testing scenarios can be reproduced (clusters, databases, video games, maybe even kernels?) to observe bugs that only arise in extremely difficult to reproduce circumstances.
It does seem like overkill just to get a marginally more reproducible build system, though.
Bazel absolutely prevents network access and filesystem access (reads) from builds. (only permitting explicit network includes from the WORKSPACE file, and access to files explicitly depended on in the BUILD files).
Maybe you can write some “rules_” for languages that violate this, but it is designed purposely to be hermetic and bit-perfect reproducible.
EDIT:
From the FAQ[0]:
> Will Bazel make my builds reproducible automatically?
> For Java and C++ binaries, yes, assuming you do not change the toolchain.
The issues with Docker's style of "reproducible" (meaning.. consistent environment; are also outlined in the same FAQ[1]
> Doesn’t Docker solve the reproducibility problems?
> Docker does not address reproducibility with regard to changes in the source code. Running Make with an imperfectly written Makefile inside a Docker container can still yield unpredictable results.
[0]: https://bazel.build/about/faq#will_bazel_make_my_builds_repr...
[1]: https://bazel.build/about/faq#doesn’t_docker_solve_the_repro...
Check out the previous discussion at https://news.ycombinator.com/item?id=23184843 and below:
> Under the hood there's a default auto-configured toolchain that finds whatever is installed locally in the system. Since it has no way of knowing what files an arbitrary "cc" might depend on, you lose hermeticity by using it.
https://bazel.build/docs/sandboxing
(Actually, it can: that documentation suggests it's optionally supported, at least on the Linux sandbox. That said, it's optional. There's definitely actions that use the network on purpose and can't participate in this.)
This may seem pointless, because in many situations this would only matter in somewhat convoluted cases. In C++ the toolchain probably won't connect to the network. This isn't the case for e.g. Rust, where proc macros can access the network. (In practical terms, I believe the sqlx crate does this, connecting to a local Postgres instance to do type inference.) Likewise, you could do an absolute file inclusion, but that would be very much on purpose and not an accident. So it's reasonable to say that you get a level of reproducibility when you use Bazel for C++ builds...
Kind of. It's not bit-for-bit because it uses the system toolchain, which is just an arbitrary choice. On Darwin it's even more annoying: with XCode installed via Mac App Store, the XCode version can change transparently under Bazel in the background, entirely breaking the hermeticity, and require you to purge the Bazel cache (because the dependency graph will be wrong and break the build. Usually.)
Nix is different. The toolchain is built by Nix and undergoes the same sandboxed build process with sandboxing and cryptographically verified inputs. Bazel does not do that.
There are mechanisms for opting out/breaking that, just as with Nix or any other system.
> macOS
What does nix do on these systems?
Ultimately, I still think that Nix provides a greater degree of isolation and reproducibility than Bazel overall, and especially out of the box, but I was definitely incorrect when I said that Bazel's sandbox doesn't/can't block the network. I did dive a little deeper into the nuances in another comment.[1]
> What does nix do on these systems?
On macOS, Nix is not exactly as solid as it is on Linux. It uses sandbox-exec for sandboxing, which achieves most of what the Nix sandbox does on Linux, except it disallows all networking rather than just isolated networking. (Some derivations need local network access, so you can opt-in to having local network access per-derivation. This still doesn't give Internet access, though: internet access still requires a fixed-output derivation.) There's definitely some room for improvement there but it will be hard to do too much better since xnu doesn't have anything similar to network namespaces afaik.
As for the toolchain, I'm not sure how the Nix bootstrap works on macOS. It seems like a lot of effort went in to making it work and it can function without XCode installed. (Can't find a source for this, but I was using it on a Mac Mini that I'm pretty sure didn't have XCode installed. So it clearly has its own hermetic toolchain setup just like Linux.)
[1]: https://news.ycombinator.com/item?id=43032285
The very doc you link hints at that, while also giving many caveats where the build will become non-reproducible. So it boils down to “yes, but only if you configure it correctly and do things right”.
Meanwhile, if you want to forcibly block network access for a specific action, you can pass `block-network` as an execution requirement[2]. You can also explicitly block network access with flags, using --nosandbox_default_allow_network[3]. Interestingly though, an action can also `require-network` to bypass this, and I don't think there's any way to account for that.
Maybe more importantly, Bazel lacks the concept of a fixed-output action, so when an impure action needs `require-network` the potentially-impure results could impact downstream dependents of actions.
I was still ultimately incorrect to say that Bazel's sandbox can't sandbox the network. The actual reality is that it can. If you do enable the sandbox, while it's not exactly pervasive through the entire ecosystem, it does look like a fair number of projects at least set the `block-network` tag--about 700 as of writing this[4]. I think the broader point I was making (that Nix adheres to a stronger standard of "hermetic" than Bazel) is ultimately true, but I did miss on a bit of nuance initially.
[1]: https://bazel.build/docs/user-manual#spawn-strategy
[2]: https://bazel.build/reference/be/common-definitions#common.t...
[3]: https://bazel.build/reference/command-line-reference#flag--s...
[4]: https://github.com/search?q=language%3Abzl+%22block-network%...
Maybe Bazel forbid these things right away and Googlers actually talking about Blaze will be inadvertently lying thinking they are similar enough.
Bazel is really sophisticated and I'd be lying if I said I understood it well, but I have spent time looking at it.
Then you'd have a 100% reproduceable OS if you have the flag set (assuming that required base packages are reproduceable)
If you don’t know, the intensional model is an alternative way to structure the NixOS store so that components are content-addressable (store hash is based on the targets) as opposed to being addressed based on the build instructions and dependencies. IIUC, the entire purpose of the intensional model is to make Nix stores shareable so that you could just depend on Cachix and such without the worry of a supply-chain attack. This approach was an entire chapter in the Nix thesis paper (chapter 6) and has been worked on recently (see https://github.com/NixOS/rfcs/pull/62 and https://github.com/NixOS/rfcs/pull/17 for current progress).
I could be wrong (and I probably am) but I feel like the term "reproducible build" has shifted/solidified since 2006 when Dolstra's thesis was first written (which itself doesn't really use that term all that much). As evidence the first wikipedia page on "Reproducible builds" seems to have appeared in 2016, a decade after Dolstra's thesis, and even that stub from 2016 appears to prefer to use the term "Deterministic compilation".
Anyhow, when the Nix project originally spoke about "reproducible builds", what I understood was meant by that term was "being able to repeat the same build steps with the same inputs". Because of the lack of determinstic compilation, this doesn't always yield bit-by-bit identical outputs, but are simply presumed to be "functionally identical". There is, of course, no reason to believe that they will necessarily be functionally identical, but it is what developers take for granted every day, and if otherwise would be considered a bug somewhere in the package.
With Nix, when some software "doesn't work for me, but works for you", we can indeed recursively compare the nix derivation files locating and eliminating potential differences, a debugging process I have used on occasion.
I agree that "reproducible builds" now means something different, but that isn't exactly the fault of Nix advocates. I guess a new term for "being able to repeat the same build steps with the same inputs" is needed.
[0]https://news.ycombinator.com/item?id=41953155
That's one way to read the statistic. Another way you could read the graph is that they still have about the same number (~5k) of non-reproducible builds, which has been pretty constant over the time period. Adding a bunch of easily reproducible additional builds maybe doesn't make me believe it's solving the original issues.
> We knew that it was possible to achieve very good reproducibility rate in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners.
Maybe I miss some nuance here, but why is Debian written off as being so much smaller scale? The top end of the graph here suggests a bit over 70k packages, Debian apparently also currently has 74k packages available (https://www.debian.org/doc/manuals/debian-reference/ch02.en....); I guess there's maybe a bit of time lag here but I'm not sure that is enough to claim Debian is somehow not "at scale".
It's a bit like asking what percentage of Nix-packaged programs have Hungarian translation -- if Nix packages some more stuff the rate might decrease, but it's not Nix's task to add that to the programs that lack it.
Nix does everything in its power to provide a sandboxed environment in which builds can happen. Given the way hardware works, there are still sources of non-determinism that are impossible to prevent, most importantly timing. Most programs depend on it, even compilers, and extra care should be taken by them to change that. The only way to prevent it would be to go full-on CPU and OS emulation, but that would be prohibitively expensive.
How so?
For ref I used a Gentoo distcc chroot inside a devuan VM to bootstrap gentoo on a 2009 netbook. It worked fine. I did this around Halloween.
There are infinite number of binaries that do the same thing (e.g. just padding random zeros in certain places wouldn't cause a functional problem).
Nix is very good at doing functionally reproducible builds; that's its whole thing. But there are build steps which are simply not deterministic, and they might produce correct, but not always identical, outputs.
That seems to be the crux of it.
For the record, even in the land of Guix I semi-regularly see reports on the bug-guix mailing list that some package isn't reproducible. It seems to get treated as a bug and fixed then. With that in mind, and personally considering Guix kind of the flagship of these efforts, it doesn't surprise me if anyone else doesn't have perfectly reproducible builds yet either. Especially Nix with the huge number of things in nixpkgs. It's probably easier for stuff to fall through the cracks with that many packages to manage.
So if that process is reproducible, it's a different statement from a Debian package being reproducible, which requires build inputs in the preferred form of modification (source code).
Every artifact is reproduced and signed by multiple maintainers on independently controlled hardware and this has been the case since our first release around this time last year.
https://codeberg.org/stagex/stagex
In the final binaries, compiled with gcc 2.6.3 and assembled with a custom assembler, there appears to be unused, uninitialized data, which is whatever was in RAM when whoever compiled the game created the release build.
Since the goal is a matching (reproducible) binary, we have tools to restore that random data at specific offsets. Fortunately our targets are fixed.
As far as we know at this time, they’re just uninitialized bytes that would have been padding for alignment or other reasons anyway. Maybe if we move to an official build tool chain we’ll find they are deterministic, but for now, we believe they are garbage that happened to make it into the final binary.
0 - https://github.com/Xeeynamo/sotn-decomp/blob/master/tools/di...
1 - https://github.com/Xeeynamo/sotn-decomp/blob/master/config/d...
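The restoring step itself is conceptually tiny; a rough sketch (hypothetical offsets, not the project's actual tooling, which is linked above) looks like this:

    def restore_garbage(rebuilt: bytes, reference: bytes,
                        regions: list[tuple[int, int]]) -> bytes:
        """Copy known-garbage (start, length) byte ranges from the original
        reference binary into the freshly rebuilt one, so the two can then be
        compared bit for bit."""
        out = bytearray(rebuilt)
        for start, length in regions:
            out[start:start + length] = reference[start:start + length]
        return bytes(out)

    # Hypothetical usage; the offsets would live in a per-target config file.
    # patched = restore_garbage(open("rebuilt.bin", "rb").read(),
    #                           open("original.bin", "rb").read(),
    #                           [(0x1F40, 16), (0x2A00, 8)])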
I've seen it pointed out (by mjg59, perhaps?) that if you have a trusted builder, why don't you just use their build? That seems to be the actual model in practice.
Reproducibility seems only to be useful if you have a pool of mostly trustworthy builders and somehow want to build a consensus out of that. Which I suppose is useful for a distributed community, but does seem like a stretch for the amount of work going into reproducible builds.
Good point, but even in the case of larger monolithic systems you want to be sure it is possible to forensically analyze your source, to audit it. Once you can trust that one hash relates to this specific thing, you can sign it, etc. This can then be "sold" with some added value of trust downstream. Tracking of hashes also becomes easier once they are reproducible, because they mean much more than just a "version".
Pardon my tinfoil hat, but doing this would make them a high-value target. If I like them enough to trust their builds, I probably also like them enough to avoid focusing the attentions of the bad guys on them.
Better would be to have a lot of trusted builders all comparing hashes... like, every NixOS user you know (and also the ones they know) so that there's nobody in particular to target.
If you build the directed graph made by the symlinks in the Nix store and walk it backwards, a sha256 of the source files is what you'll find, both in the form of a Nix store path and possibly in a derivation that relies on a remote resource but provides a hash of that resource, so we can know it's unchanged when downloaded later.
The missing piece is that they're not gossiped between users. So if I find some code in a dark alley somewhere and it has a Nix flake to make building it easy, I've got no way to take the hashes and determine who else has experience with the same code and can help me decide if it's trustworthy.
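A rough sketch of that backward walk, assuming the `nix-store` CLI is available (and noting that `nix-store --query --requisites` would give you the whole closure in one call anyway):

    import subprocess

    def references(path: str) -> list[str]:
        """Direct dependencies of a store path, via `nix-store --query --references`."""
        out = subprocess.run(
            ["nix-store", "--query", "--references", path],
            capture_output=True, text=True, check=True,
        ).stdout
        return out.split()

    def walk(path: str, seen=None) -> set[str]:
        """Follow references recursively to collect the dependency closure."""
        seen = set() if seen is None else seen
        for dep in references(path):
            if dep not in seen:
                seen.add(dep)
                walk(dep, seen)
        return seen

    # Hypothetical usage:
    # for p in sorted(walk("/nix/store/...-some-package")):
    #     print(p)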
maintainers build the packages, other people check: https://wiki.archlinux.org/title/Rebuilderd#Package_rebuilde...
Not for NixOS, as far as I can tell. You only have this for source derivations, where a hash is submitted (usually in a PR) and must be reproducible in CI. This specific example, however, has the problem that link rot can be hard to detect unless you regularly check upstream sources.
* There are Maven and Gradle plugins to make builds reproducible.
https://bugs.openjdk.org/browse/JDK-8264449 https://reproducible-builds.org/docs/source-date-epoch/
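For anyone unfamiliar, SOURCE_DATE_EPOCH (documented at the reproducible-builds.org link above) is just an environment variable the build reads instead of the wall clock; a build step that honours it looks roughly like this (Python used purely for illustration):

    import os, time

    def build_timestamp() -> int:
        """Use SOURCE_DATE_EPOCH when set, so timestamps embedded in the output
        are deterministic; fall back to the wall clock otherwise."""
        epoch = os.environ.get("SOURCE_DATE_EPOCH")
        return int(epoch) if epoch is not None else int(time.time())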
You can remove those with some extra work.
Wouldn't that work?
Has anyone done any better?
It's not solving the underlying problem: that build systems are often impure and sometimes nondeterministic. It also tries to solve a bunch of adjacent problems, like providing a common interface for building many different types of package, providing a common configuration language for builds as well as system services and user applications in the case of NixOS and home-manager, and providing end-user CLI tools to manage the packages built with it. It's trying to be a build wrapper, a package manager, a package repository, a configuration language, and more.
Imagine constant-time compute and constant-memory constraints, as required in cryptography, being applied to the Nix ecosystem.
Yes, this is an artificial example, but it shows that purity is harder to define and come by than some people think. Maybe someday these constraints will actually apply to Nix's goal of reproducibility.
With ever-changing hardware, that purity is a moving target, so Nix IMO will always be an approximation of purity, and bundling so much tooling is to be expected. Still, you can legitimately call it a hack :)
It's not Nix's job, imo. Those compilers should be fixed.
And all the other "features" come for free from Nix's fundamental abstractions; I don't feel it oversteps its boundaries anywhere.
In Gentoo, `emerge -e <package name>` will do it; add binpkgs if you know what you're doing (I do, and I do).
https://github.com/NixOS/nix/blob/master/doc/manual/source/s...
I am working on the docs for this as we speak.
Why spend only 6 years on the most interesting topic of all mankind? I spent 10 years analyzing this.
I think it would be fairly easy to do this monitoring with a bit of community participation. At least I'd enable telemetry ¯\_(ツ)_/¯.
By default the pkz57 in /nix/store/pkz57...-nushell-0.97.1 is a hash of the build inputs for that package. If you hash the contents of that dir, you get an identifier for the build output.
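A rough sketch of computing one such pair (plain Python; hashing the directory like this is a simplification of the NAR hashing Nix actually does):

    import hashlib, os

    def store_pair(store_path: str) -> tuple[str, str]:
        """Return an (input-hash, output-hash) pair for a store path such as
        /nix/store/pkz57...-nushell-0.97.1 (assumed to be a directory)."""
        # The input hash is the prefix Nix already baked into the path name.
        input_hash = os.path.basename(store_path).split("-", 1)[0]
        # Hash the directory contents in a stable order as an output identifier.
        h = hashlib.sha256()
        for root, _dirs, files in sorted(os.walk(store_path)):
            for name in sorted(files):
                full = os.path.join(root, name)
                h.update(os.path.relpath(full, store_path).encode())
                with open(full, "rb") as f:
                    h.update(f.read())
        return input_hash, h.hexdigest()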
If we then make a big list of such pairs as built by different people on different machines at different times, and capture the frequency of each pair, we'll either see one output hash shared by essentially every build of a given input hash, or we'll see a handful of builds whose output hash disagrees with everyone else's. The former is an indicator that the build is reproducible, and the latter gives hints about why not (supposing those users are willing to share a bit more about the circumstances of that build). I'd call it a "build chromatograph". I expect that knowing whether you're one of the 100 or the odd-man-out could be relevant for certain scenarios.

A single compiler that does some parallel work and collects the results of that work in a list in order of completion (and similar) is probably the most common cause of non-determinism.
Given that, your chromatograph would be mostly "determined" by the source code's peculiarities instead of by the offending compiler itself. (E.g. if I have n classes a compiler would process in parallel, then given a single point of timing non-determinism, n! different orderings could possibly exist (assuming they all produce different outputs). The only thing I could conclude from such a distribution is that there is a common sequence of completing tasks.)
But your idea is cool, and simply reporting back non-matching local builds would be helpful (I believe the binary property of whether a different output could be built is the only relevant fact) -- also, if we were to mark a package as non-reproducible, we could recursively mark everything that has it as a (transitive) input as non-reproducible too.
(-- A Nix maintainer)
https://wiki.debian.org/ReproducibleBuilds
But what did it cost? Usability.
Just running KDE litters a bunch of dotfiles into your user folder, even for settings you didn't adjust. This is true for many applications.
If you had an empty home folder and passively tried a handful of desktops, you'd no longer have an empty home folder. Hopefully your environment is resilient to clutter being leaked into your home folder, but if your filesystem isn't truly immutable, rolling back to a particular Nix config might not get you the exact state your system was in when you first built that.
There's a project that wipes all local changes when you restart your machine, with the goal of making Nix systems more reproducible. I think it's called Impermanence.
If the point of Nix is to keep a filesystem immutable as long as every app sticks to certain rules, is it actually the right tool for the job?
Sorry… I actually don't know much about Nix, given I've been using VMs and now containers for over a decade, so I'm just trying to understand the problem that Nix actually solves.
It can still write to folders, so it's not completely silo'd off like a full-on Docker container, but I still really like it.
At the moment I'm using Ansible for the host and Docker for guests, but I see NixOS as combining these two layers so everything just runs on the host? Is that a fair way to describe how NixOS works? If I have it wrong, maybe I should check it out, and I've been sleeping on Nix all this time.
My recommendation would be to test it out and look at how it does things. Maybe check out the live installer with a GUI to get a feel for a desktop system.
Edit: ok, I HAVE been sleeping on NixOS! I couldn't understand how isolation worked with /etc files, but it turns out /etc is not modified directly; you do it all by modifying the Nix config and rebuilding the system, which generates /etc! Ok, super interesting.