nixos/networking: don't add hostname / FQDN entries to /etc/hosts #380987

Open. Wants to merge 2 commits into base: master

lschuermann
Member

PR #362132 removed multiple /etc/hosts entries mapping to the same IPv6 loopback address (::1). However, a mapping from a machine's hostname and FQDN to a second IPv4 loopback address (127.0.0.2) still exists.

This causes issues when using systemd-resolved: when any /etc/hosts entry exists for a domain name being queried, systemd-resolved only returns entries from the /etc/hosts database, and no other records that could be obtained from DNS. In practice, this means that a host trying to resolve its own FQDN (which is configured in the system's NixOS configuration through the networking.hostName and networking.domain settings) will only get its IPv4 loopback IP (127.0.0.2), but no IPv6 address, regardless of whether an AAAA record exists.

[root@myserver:~]# cat /etc/hosts
127.0.0.1 localhost
::1 localhost
127.0.0.2 myserver.example.org

[root@myserver:~]# resolvectl query myserver.example.org
myserver.example.org: 127.0.0.2

-- Information acquired via protocol DNS in 6.1ms.
-- Data is authenticated: yes; Data was acquired via local or encrypted transport: yes
-- Data from: synthetic
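
To make the gap concrete, here is a minimal sketch (not how systemd-resolved is implemented) that lists which address families a hosts file covers for a given name. The helper name `hosts_families` is hypothetical; the hosts content is the one shown above.

```python
import ipaddress


def hosts_families(hosts_text, name):
    """Return the address families ("IPv4"/"IPv6") that a hosts file maps `name` to."""
    families = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        if name in names:
            families.add(f"IPv{ipaddress.ip_address(address).version}")
    return families


HOSTS = """\
127.0.0.1 localhost
::1 localhost
127.0.0.2 myserver.example.org
"""

print(sorted(hosts_families(HOSTS, "myserver.example.org")))  # ['IPv4']
print(sorted(hosts_families(HOSTS, "localhost")))             # ['IPv4', 'IPv6']
```

Since the FQDN only has an IPv4 entry here, and resolved answers from /etc/hosts alone once any entry matches, no IPv6 address is returned for it.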

This pull request removes such default hostname or FQDN hosts-entries entirely. This is similar to the default configurations on AlmaLinux or Fedora systems:

root@myserver:~# cat /etc/hosts
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.example.org foo
# 192.168.1.13 bar.example.org bar

It does not prevent users from defining such custom entries for the machine's public IPs, similar to the comment in the file above. We don't have a good way to determine what the proper public IPv4 and IPv6 addresses are for a machine's FQDN, and as such this decision and responsibility should be on the user.

In a separate commit, this PR adds a test that ensures that hosts can resolve their own v4 and v6 addresses as encoded in A and AAAA records of their upstream DNS server or resolver respectively.

This test fails prior to this PR with a regular NixOS install, which contains an /etc/hosts mapping for the FQDN to 127.0.0.2, but no such mapping for an IPv6 loopback IP (as there is only a single IPv6 loopback address, and each address must only be listed once, see PR #362132). While this test passes for NixOS test configurations, this is only because the test networking module adds additional entries to networking.extraHosts which are not present in a regular NixOS configuration.
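
The kind of check described here could look roughly like the following. This is a sketch only, not the test added by this PR: `non_loopback_addresses` and `hosts_only_resolver` are hypothetical names, and the resolver is injectable so the pre-PR behaviour can be simulated without a real network.

```python
import ipaddress
import socket


def non_loopback_addresses(name, getaddrinfo=socket.getaddrinfo):
    """Group the non-loopback addresses that `name` resolves to by family."""
    found = {"IPv4": [], "IPv6": []}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = getaddrinfo(name, None, family=family, type=socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # nothing resolvable for this family
        for *_rest, sockaddr in infos:
            address = sockaddr[0].split("%", 1)[0]  # strip an IPv6 zone id, if any
            if not ipaddress.ip_address(address).is_loopback:
                found[label].append(address)
    return found


def hosts_only_resolver(name, port, family=0, type=0):
    """Hypothetical stand-in for the pre-PR state: only 127.0.0.2 is returned."""
    if family == socket.AF_INET:
        return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("127.0.0.2", 0))]
    raise socket.gaierror(socket.EAI_NONAME, "no IPv6 address")


print(non_loopback_addresses("myserver.example.org", hosts_only_resolver))
# {'IPv4': [], 'IPv6': []}
```

With the simulated pre-PR resolver, neither family yields a usable address, which is the failure mode described above. Called without the second argument, the function uses the system resolver.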

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

Ping @alyssais @ElvishJerricco @primeos @blitz

fqdn_and_host_name
== machine.succeed("getent hosts 127.0.0.2 | awk '{print $2,$3}'").strip()
)

assert "${fqdn}" == machine.succeed("getent hosts ${hostName} | awk '{print $2}'").strip()
Member Author

It's questionable whether this line should remain. The only reason why this works is that the testing framework's base configuration includes an extraHosts setting that includes this machine's FQDN:

# Put the IP addresses of all VMs in this machine's
# /etc/hosts file. If a machine has multiple
# interfaces, use the IP address corresponding to
# the first interface (i.e. the first network in its
# virtualisation.vlans option).
networking.extraHosts = flip concatMapStrings (attrNames nodes) (
  m':
  let
    config = nodes.${m'};
    hostnames =
      optionalString (
        config.networking.domain != null
      ) "${config.networking.hostName}.${config.networking.domain} "
      + "${config.networking.hostName}\n";
  in
  optionalString (
    config.networking.primaryIPAddress != ""
  ) "${config.networking.primaryIPAddress} ${hostnames}"
  + optionalString (config.networking.primaryIPv6Address != "") (
    "${config.networking.primaryIPv6Address} ${hostnames}"
  )
);

So this is just testing the test framework itself, and not really the network configuration for regular hosts.

@alyssais
Member

I've been trying out this change, and the first thing I've noticed is that the following test, which I've been using to check that applications are able to look up their FQDN, now only passes because of the default virtualisation.vlans:

nixosTest {
  name = "getaddrinfo";

  nodes.machine = {
    environment.systemPackages = [ pkgs.python3 ];

    networking.hostName = "ahost";
    networking.domain = "adomain";
    # virtualisation.vlans = [];

    services.resolved.enable = true;
  };

  testScript = ''
    machine.wait_for_unit('nss-lookup.target')

    machine.succeed("""python3 -c '
    import socket
    for (_, _, _, name, _) in socket.getaddrinfo(socket.gethostname(), None, family=socket.AF_INET, proto=socket.IPPROTO_TCP, flags=socket.AI_CANONNAME):
      if name:
        exit(name != \"ahost.adomain\")
    exit(2)
    ' >&2""")
  '';
}

If I uncomment the virtualisation.vlans line, the test now fails. Previously, it passed regardless. Is this expected?

@lschuermann
Member Author

lschuermann commented Feb 11, 2025

If I uncomment the virtualisation.vlans line, the test now fails. Previously, it passed regardless. Is this expected?

I wouldn't go as far as to say it's expected, but it makes sense given the way the config and modules work right now.

Options walkthrough

The qemu-vm module defines the networking.primaryIP{,v6}Address options:

networking.primaryIPAddress = mkOption {
  type = types.str;
  default = "";
  internal = true;
  description = "Primary IP address used in /etc/hosts.";
};
networking.primaryIPv6Address = mkOption {
  type = types.str;
  default = "";
  internal = true;
  description = "Primary IPv6 address used in /etc/hosts.";
};

It also drives the virtualisation.vlans option by default:

virtualisation.vlans = mkOption {
  type = types.listOf types.ints.unsigned;
  default = if config.virtualisation.interfaces == { } then [ 1 ] else [ ];

That's then picked up by the testing framework's network config and converted into a set of networking.interfaces attributes:

# Convert legacy VLANs to named interfaces and merge with explicit interfaces.
vlansNumbered = forEach (zipLists config.virtualisation.vlans (range 1 255)) (v: {
  name = "eth${toString v.snd}";
  vlan = v.fst;
  assignIP = true;
});
explicitInterfaces = lib.mapAttrsToList (n: v: v // { name = n; }) config.virtualisation.interfaces;
interfaces = vlansNumbered ++ explicitInterfaces;
interfacesNumbered = zipLists interfaces (range 1 255);
# Automatically assign IP addresses to requested interfaces.
assignIPs = lib.filter (i: i.assignIP) interfaces;
ipInterfaces = forEach assignIPs (
  i:
  nameValuePair i.name {
    ipv4.addresses = [
      {
        address = "192.168.${toString i.vlan}.${toString config.virtualisation.test.nodeNumber}";
        prefixLength = 24;
      }
    ];
    ipv6.addresses = [
      {
        address = "2001:db8:${toString i.vlan}::${toString config.virtualisation.test.nodeNumber}";
        prefixLength = 64;
      }
    ];
  }
);

Additionally, this let binding is further used to then drive the primaryIP{,v6}Address options:

networking.primaryIPAddress =
  optionalString (ipInterfaces != [ ])
    (head (head ipInterfaces).value.ipv4.addresses).address;
networking.primaryIPv6Address =
  optionalString (ipInterfaces != [ ])
    (head (head ipInterfaces).value.ipv6.addresses).address;

And, finally, that's used to create an extraHosts entry:

networking.extraHosts = flip concatMapStrings (attrNames nodes) (
  m':
  let
    config = nodes.${m'};
    hostnames =
      optionalString (
        config.networking.domain != null
      ) "${config.networking.hostName}.${config.networking.domain} "
      + "${config.networking.hostName}\n";
  in
  optionalString (
    config.networking.primaryIPAddress != ""
  ) "${config.networking.primaryIPAddress} ${hostnames}"
  + optionalString (config.networking.primaryIPv6Address != "") (
    "${config.networking.primaryIPv6Address} ${hostnames}"
  )
);
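
The chain from vlans to extraHosts can be traced with a small Python model of the Nix expressions quoted in this walkthrough. It is a simplified sketch: it only considers virtualisation.vlans (explicit virtualisation.interfaces are ignored), the dictionary keys are made up for illustration, and node number 1 is assumed for the single machine.

```python
def primary_addresses(vlans, node_number):
    """Model primaryIPAddress/primaryIPv6Address: taken from the first VLAN, else empty."""
    if not vlans:
        return "", ""
    vlan = vlans[0]
    return f"192.168.{vlan}.{node_number}", f"2001:db8:{vlan}::{node_number}"


def extra_hosts(nodes):
    """Model the extraHosts expression: one line per node and address family."""
    out = ""
    for name in sorted(nodes):  # attrNames yields attribute names in sorted order
        node = nodes[name]
        hostnames = ""
        if node["domain"] is not None:
            hostnames += f'{node["hostName"]}.{node["domain"]} '
        hostnames += f'{node["hostName"]}\n'
        v4, v6 = primary_addresses(node["vlans"], node["nodeNumber"])
        if v4 != "":
            out += f"{v4} {hostnames}"
        if v6 != "":
            out += f"{v6} {hostnames}"
    return out


machine = {"hostName": "ahost", "domain": "adomain", "vlans": [1], "nodeNumber": 1}
print(repr(extra_hosts({"machine": machine})))
# '192.168.1.1 ahost.adomain ahost\n2001:db8:1::1 ahost.adomain ahost\n'
print(repr(extra_hosts({"machine": {**machine, "vlans": []}})))
# ''
```

The model only covers the string built by this expression; whatever else ends up in the final networking.extraHosts value (for example through option merging) is not modelled.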

We thus end up with an explicit extraHosts setting for the default vlans = [ 1 ]; configuration...

nix-repl> nixosTests.alyssais-network.nodes.machine.virtualisation.vlans  
[ 1 ]

nix-repl> nixosTests.alyssais-network.nodes.machine.networking.extraHosts
"192.168.1.1 ahost.adomain ahost\n2001:db8:1::1 ahost.adomain ahost\n\n"

...but no such entries when vlans is explicitly overridden:

nix-repl> nixosTests.alyssais-network.nodes.machine.virtualisation.vlans
[ ]

nix-repl> nixosTests.alyssais-network.nodes.machine.networking.extraHosts
"\n"

Now, for me there are two major takeaways from this:

  • I think that, while convenient for other tests, the way that the NixOS testing framework currently drives many parts of the networking.* attributes by default is actively harmful to tests of network-related subsystems. What your example above and my comment show is that we're often not testing the actual NixOS modules as would be used on a deployed system, but artifacts of the test environment itself.

  • Specifically, I think that the qemu-vm.nix module should not introduce the networking.primaryIP{,v6}Address options.

    It actually does seem like a neat concept, and could be used to set up /etc/hosts mappings for a machine's primary IPs outside the test environment (similar to the default hosts file on Debian systems).

    Having a hybrid of "option defined in the system config, but only in an optional module that's not indexed in search.nixos.org, and only driven and used by the test framework itself" just leads to confusion.

The changes of this PR themselves work as expected: a machine will be unable to resolve its own FQDN when neither an explicit hosts entry nor appropriate DNS records exist. Whether what the qemu-vm.nix module or test infrastructure does makes sense is another question.

@lschuermann
Member Author

If I uncomment the virtualisation.vlans line, the test now fails.

@alyssais Put a different way, with this change your test would always be expected to fail for a regular (non-test) NixOS system that does not have explicit hosts entries or DNS records for its FQDN set up. It only works because it's run in the test environment.

@alyssais
Member

Okay, as long as it works when there are DNS entries that seems fine to me.

@lschuermann
Member Author

I'm increasingly convinced that adding networking.primaryIP{,v6}Address options to the main networking module, and generating hosts entries based on that and the hostName + domain settings, is a good quality of life improvement.

But that seems separate from what this PR is trying to do, and can easily be achieved by just setting networking.hosts = { ... } manually for now.

@lschuermann
Member Author

Okay, as long as it works when there are DNS entries that seems fine to me.

😞 Double checked, it does not.

[root@myserver:~]# cat /etc/hosts
127.0.0.1 localhost
::1 localhost
[root@myserver:~]# python3
>>> import socket
>>> socket.getaddrinfo(socket.gethostname(), None, family=socket.AF_INET, proto=socket.IPPROTO_TCP, flags=socket.AI_CANONNAME)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, 'myserver', ('<public-v4>', 0))]

whereas

[nix-shell:~]# cat /etc/hosts 
127.0.0.1 localhost
::1 localhost
<public-v4> myserver.example.org myserver
<public-v6> myserver.example.org myserver
[nix-shell:~]# python3
>>> import socket
>>> socket.getaddrinfo(socket.gethostname(), None, family=socket.AF_INET, proto=socket.IPPROTO_TCP, flags=socket.AI_CANONNAME)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, 'myserver.example.org', ('<public-v4>', 0))]

If you want getaddrinfo to return the FQDN as the canonical hostname, it seems like an entry in /etc/hosts is required (as it needs to map gethostname() to the canonical name?).

This behavior is identical to that of a fresh, default installation of Fedora 41 Server Edition, with the hostname specified in the installer and public DNS records set up:

root@fed-etchosttest:~# cat /etc/hostname
fed-etchosttest
root@fed-etchosttest:~# cat /etc/hosts
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.example.org foo
# 192.168.1.13 bar.example.org bar
root@fed-etchosttest:~# python3
>>> import socket
>>> socket.getaddrinfo(socket.gethostname(), None, family=socket.AF_INET, proto=socket.IPPROTO_TCP, flags=socket.AI_CANONNAME)
[(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, 'fed-etchosttest', ('<public-v4>', 0))]
root@fed-etchosttest:~# host <public-v4>
<4v-cilbup>.in-addr.arpa domain name pointer fed-etchosttest.example.org
root@fed-etchosttest:~# host fed-etchosttest.example.org
fed-etchosttest.example.org has address <public-v4>
fed-etchosttest.example.org has IPv6 address <public-v6>
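
For reference, the check used in the sessions above can be wrapped in a small helper. This is only a sketch: `canonical_name` and the two fake resolvers are hypothetical, and 192.0.2.10 is a documentation address standing in for the elided public IPv4 address.

```python
import socket


def canonical_name(hostname=None, getaddrinfo=socket.getaddrinfo):
    """Return the first canonical name getaddrinfo reports for the local hostname."""
    if hostname is None:
        hostname = socket.gethostname()
    infos = getaddrinfo(
        hostname,
        None,
        family=socket.AF_INET,
        proto=socket.IPPROTO_TCP,
        flags=socket.AI_CANONNAME,
    )
    for _family, _type, _proto, canonname, _sockaddr in infos:
        if canonname:
            return canonname
    return None


def without_hosts_entry(host, port, family=0, proto=0, flags=0):
    """Fake resolver shaped like the output seen with only localhost in /etc/hosts."""
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "myserver", ("192.0.2.10", 0))]


def with_hosts_entry(host, port, family=0, proto=0, flags=0):
    """Fake resolver shaped like the output seen with an explicit FQDN hosts entry."""
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "myserver.example.org", ("192.0.2.10", 0))]


print(canonical_name("myserver", without_hosts_entry))  # myserver
print(canonical_name("myserver", with_hosts_entry))     # myserver.example.org
```

Called without arguments, the helper queries the system resolver for socket.gethostname(), which is what the interactive sessions do.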

@alyssais
Member

Yikes. That makes me a lot less confident this is the right thing. I wonder if we should ask the resolved developers how they expect this all to work…?
