SYD(7)

# NAME

Overview of sandboxing with Syd

# SANDBOXING

The list of available sandboxing categories is given below:

[< *stat*
:< Confine file metadata accesses. This sandboxing category may be used to
   effectively _hide files and directories_ from the sandbox process. List
   of filtered system calls are _access_(2), _faccessat_(2),
   _faccessat2_(2), _getdents64_(2), _readlink_(2), _readlinkat_(2),
   _stat_(2), _fstat_(2), _lstat_(2), _statx_(2), _newfstatat_(2),
   _getxattr_(2), _getxattrat_(2), _lgetxattr_(2), _fgetxattr_(2),
   _listxattr_(2), _listxattrat_(2), _flistxattr_(2), _llistxattr_(2),
   _statfs_(2), _statfs64_(2), _fstatfs_(2), _fstatfs64_(2),
   _fanotify_mark_(2), and _inotify_add_watch_(2). In addition, paths
   may be masked using the _mask_ command. In this case, all filtered
   system calls on the path will be executed on the character device
   /dev/null instead. See the description of the _mask_ command in
   _syd_(2) manual page for more information.
|< *walk*
:< Confine path traversals. This sandboxing category is used during path
   canonicalization to confine path traversals. As such, its arguments
   are not necessarily fully canonicalized paths but they're guaranteed
   to be absolute paths without any *.* (dot) or *..* (dotdot)
   components. It has been split from the _stat_ category as of version
   3.39.0. Together with the _stat_ category, path hiding provides a
   full implementation resilient against attempts to unhide otherwise
   hidden paths by passing through them during path canonicalization.
   Notably, OpenBSD's _unveil_(2) pioneered similar capabilities and
   remains a widely respected, mature reference implementation.
|< *read*
:< Confine file reads. List of filtered system calls are _open_(2),
   _openat_(2) and _openat2_(2) with the *O_RDONLY* or *O_RDWR* flags.
|< *write*
:< Confine file writes. List of filtered system calls are _open_(2),
   _openat_(2) and _openat2_(2) with the *O_WRONLY* or *O_RDWR* flags.
|< *exec*
:< Confine binary execution and dynamic library loading. The list of
   filtered system calls are _execve_(2), _execveat_(2), _mmap_(2),
   _mmap2_(2), and _memfd_create_(2). For scripts, the access check is
   done for both the script and the interpreter binary. As of version
   3.16.3, Syd checks the paths of the dynamic libraries an executable is
   linked against for exec access as well. This only works for ELF
   binaries. As of version 3.21.2, Syd seals memory file descriptors as
   non-executable by default, therefore memory file descriptors are not
   checked for exec access unless the option _trace/allow_unsafe_memfd:1_
   is set to lift this restriction. As of version 3.21.3, Syd hooks into
   _mmap_(2) and _mmap2_(2) system calls and checks the file descriptor for
   exec access when the memory protection mode includes *PROT_EXEC* and
   the flags do not include *MAP_ANONYMOUS*, which typically indicates a
   _dlopen_(3). Therefore, libraries dynamically loaded at runtime are
   checked for exec access as well. In addition, *SegvGuard* is used to
   deny execution if a binary crashes repeatedly, which is similar to the
   implementations in Grsecurity and HardenedBSD. See the *SegvGuard*
   section for more information.
|< *ioctl*
:< Confine _ioctl_(2) requests. Use *lock/ioctl* to confine the _ioctl_(2)
   system call for filesystem access. This feature may be used to
   safely access GPU, PTY, DRM, KVM and similar devices. _ioctl_(2)
   requests may be allowed or denied by adding them to the respective list
   using the options _allow/ioctl+_ and _deny/ioctl+_. As of version
   3.38.0, architecture-agnostic _ioctl_(2) decoding was introduced,
   allowing ioctls to be specified by name in addition to numeric values.
   See the _syd_(2) manual page for more information.
|< *create*
:< Confine creation of regular files and memory file descriptors. List
   of filtered system calls are _creat_(2), _mknod_(2), _mknodat_(2), and
   _memfd_create_(2). In addition, open system calls _open_(2),
   _openat_(2), and _openat2_(2) are filtered if the flag *O_CREAT* is set
   and the flag *O_TMPFILE* is not set in arguments. _memfd_create_(2)
   name argument is prepended with _!memfd:_ before access check. Use e.g.
   _deny/create+!memfd:\*\*_ to deny access to memory file descriptors
   regardless of name. As of version 3.37.0, _memfd_create_(2) name
   argument is prepended with _!memfd-hugetlb:_ before access check in
   case flags include *MFD_HUGETLB*.
|< *delete*
:< Confine file deletions. List of filtered system calls are
   _unlink_(2) and _unlinkat_(2). As of version 3.33.0, _unlinkat_(2) is
   confined by this category if and only if *AT_REMOVEDIR* is not set in
   flags, otherwise it's confined by the _rmdir_ category.
|< *rename*
:< Confine file renames and hard links. List of filtered system calls are
   _rename_(2), _renameat_(2), _renameat2_(2), _link_(2), and _linkat_(2).
|< *symlink*
:< Confine creation of symbolic links. List of filtered system calls are
   _symlink_(2) and _symlinkat_(2).
|< *truncate*
:< Confine file truncations. List of filtered system calls are
   _truncate_(2), _truncate64_(2), _ftruncate_(2), _ftruncate64_(2), and
   _fallocate_(2). In addition, open system calls _open_(2), _openat_(2),
   and _openat2_(2) are filtered if the flag *O_TRUNC* is set in arguments
   and the flags *O_TMPFILE* or *O_CREAT* are not set in arguments.
|< *chdir*
:< Confine directory changes. List of filtered system calls are
   _chdir_(2) and _fchdir_(2). Additional hardening may be achieved using
   the _trace/deny_dotdot:1_ option to deny parent directory traversals. It
   is possible to set this option at runtime before sandbox is locked. This
   allows for incremental confinement. See the *Path Resolution
   Restriction For Chdir and Open Calls* section for more information.
|< *readdir*
:< Confine directory listings. List of filtered system calls are
   _open_(2), _openat_(2), and _openat2_(2) when they're called on an
   existing directory regardless of the *O_DIRECTORY* flag.
|< *mkdir*
:< Confine creation of directories. List of filtered system calls are
   _mkdir_(2), _mkdirat_(2), _mknod_(2) and _mknodat_(2).
|< *rmdir*
:< Confine deletion of directories. List of filtered system calls are
   _rmdir_(2) and _unlinkat_(2). Note _unlinkat_(2) is confined by this
   category if and only if *AT_REMOVEDIR* is set in flags, otherwise it's
   confined by the _delete_ category. This category was split from the
   _delete_ category as of version 3.33.0.
|< *chown, chgrp*
:< Confine owner and group changes on files. List of filtered system calls
   are _chown_(2), _chown32_(2), _fchown_(2), _fchown32_(2), _lchown_(2),
   _lchown32_(2), and _fchownat_(2).
|< *chmod*
:< Confine mode changes on files. List of filtered system calls are
   _chmod_(2), _fchmod_(2), _fchmodat_(2), and _fchmodat2_(2). In addition,
   a _umask_(2) value may be set using the _trace/force_umask_ option which
   is enforced at _chmod_(2) boundary as well as during regular file
   creation, e.g. setting _trace/force_umask:7177_ effectively
   disallows setting s{u,g}id bits, all group+other bits and execute bit
   for the current user. This feature is useful in setting up W^X (Write
   XOR Execute) configuration for the sandbox.
|< *chattr*
:< Confine extended attribute changes on files. List of filtered system
   calls are _setxattr_(2), _setxattrat_(2), _fsetxattr_(2),
   _lsetxattr_(2), _removexattr_(2), _removexattrat_(2),
   _fremovexattr_(2), and _lremovexattr_(2). In addition, Syd ensures
   extended attributes whose names start with one of the prefixes
   _security._, _trusted._ and _user.syd._ cannot be listed or tampered
   with by the sandbox process unless the sandbox lock is _off_ for the
   respective process. This access can be permitted to the initial
   sandbox process with _lock:exec_ or to all sandbox processes with
   _lock:off_. As of version 3.37.0, this restriction may be lifted with
   _trace/allow_unsafe_xattr:1_.
|< *chroot*
:< Confine change of the root directory using the _chroot_(2) system call.
   This sandboxing category can be disabled with
   _trace/allow_unsafe_chroot:1_ at startup, when the _chroot_(2) system
   call becomes a no-op. Similarly the _pivot_root_(2) system call is
   denied with the _errno_(3) *EPERM* by default unless
   _trace/allow_unsafe_pivot_root:1_ is set at startup in which case it
   becomes a no-op like _chroot_(2). No actual change of
   root directory takes place either way. *Syd must share the root
   directory with the sandbox process to work correctly.* Instead, Syd
   will prevent all filesystem access after the first allowed
   _chroot_(2) attempt regardless of the root directory argument. The
   only exception to the prevention of filesystem access is the
   _chdir_(2) system call with the specific argument */*, aka the root
   directory, which is allowed. This ensures a TOCTOU-free way to
   support the common use-case of cutting all filesystem access by means
   of a _chroot_(2) call to /var/empty, which is common practice among
   UNIX daemons. This sandboxing category does not depend on the Linux
   capability *CAP_SYS_CHROOT* and can therefore be used in an
   unprivileged context. Syd drops the *CAP_SYS_CHROOT* Linux capability
   by default unless _trace/allow_unsafe_caps:1_ is passed at startup.
|< *utime*
:< Confine last access and modification time changes on files. List of
   filtered system calls are _utime_(2), _utimes_(2), _futimesat_(2),
   _utimensat_(2), and _utimensat_time64_(2).
|< *mkbdev*
:< Confine block device creation. List of filtered system calls are
   _mknod_(2) and _mknodat_(2). Block device creation is disabled by
   default to adhere to the principle of secure defaults with a kernel
   level seccomp-bpf filter which terminates the process on violation.
   This filter includes the Syd process, so a compromised Syd process
   will not be able to create block devices either. Therefore, the user
   must opt-in at startup using the _trace/allow_unsafe_mkbdev:1_ option
   to use this category for path-based access checks on block devices.
|< *mkcdev*
:< Confine character device creation. List of filtered system calls are
   _mknod_(2) and _mknodat_(2). Character device creation is disabled by
   default to adhere to the principle of secure defaults with a kernel
   level seccomp-bpf filter which terminates the process on violation.
   This filter includes the Syd process, so a compromised Syd process
   will not be able to create character devices either. Therefore, the
   user must opt-in at startup using the _trace/allow_unsafe_mkcdev:1_
   option to use this category for path-based access checks on character
   devices.
|< *mkfifo*
:< Confine named pipe (FIFO) creation. List of filtered system calls are
   _mknod_(2) and _mknodat_(2).
|< *mktemp*
:< Confine temporary file creation. List of filtered system calls are
   _open_(2), _openat_(2), and _openat2_(2) with the *O_TMPFILE* flag. A
   rule such as _allow/mktemp+/tmp_ permits the sandbox process to create
   _anonymous_ temporary files under the directory /tmp. The creation of
   regular files of temporary nature are confined by the *create*
   category instead.
|< *net*
:< Confine network access. Socket types UNIX, IPv4, IPv6,
   NetLink and KCAPI are supported, use the option
   _trace/allow_unsupp_socket:1_ to pass-through sockets of unsupported
   types. UNIX domain sockets are always matched on absolute path,
   therefore always start with the character */*. UNIX abstract sockets are
   prefixed with the *@* character before access check. Similarly unnamed
   UNIX sockets use the dummy path _!unnamed_ for access check. Finally,
   network sandboxing concentrates on confining the initial connection
   action and leaves out the system calls _recvfrom_(2), _recvmsg_(2) and
   _recvmmsg_(2) as out of scope for sandbox confinement for performance
   reasons and due to a lack of security implications noting the fact that
   recv\* system calls cannot specify target addresses.
|< *net/bind*
:< Confine binding network access. This category confines the _bind_(2)
   system call, UNIX domain socket file creation using the _mknod_(2) and
   _mknodat_(2) system calls, and UNIX socket-pair creation using the
   _socketpair_(2) system call. _socketpair_(2) system call uses the
   dummy path _!unnamed_ for access check. Unnamed UNIX sockets use the
   same dummy path.
|< *net/connect*
:< Confine connecting network access. List of filtered system calls are
   _connect_(2), _sendto_(2), _sendmsg_(2), and _sendmmsg_(2). For IPv4
   and IPv6 sockets, the target address of these system calls are also
   checked against the IP blocklist, see the description of the _block_
   command in _syd_(2) manual page for more information.
|< *net/sendfd*
:< Confine sending of file descriptors. The list of filtered system
   calls are _sendmsg_(2) and _sendmmsg_(2). As of version 3.31.0, file
   descriptors referring to block devices, directories and symbolic links
   may not be passed. The restriction on block devices can be lifted
   with _trace/allow_unsafe_mkbdev:1_. UNIX domain sockets are
   always matched on absolute path, therefore always start with the
   character */*. UNIX abstract sockets are prefixed with the _@_ (at
   sign) character before access check. Similarly unnamed UNIX sockets
   use the dummy path _!unnamed_ for access check.
|< *net/link*
:< Confine _netlink_(7) sockets used in communication between kernel and
   user space. This sandboxing category may be used to specify a list of
   _netlink_(7) families to allow for the sandbox process. Use e.g.
   _allow/net/link+route_ to allow the *NETLINK_ROUTE* family. See the
   _syd_(2) manual page for more information.
|< *lock/read*
:< Use _landlock_(7) to confine file read access.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_READ_FILE* and only applies to the content of the
   directory not the directory itself. As of version 3.33.0, _lock/exec_
   and _lock/readdir_ access rights are confined in their respective
   categories. Previously, this category included the access rights
   *LANDLOCK_ACCESS_FS_EXECUTE* and *LANDLOCK_ACCESS_FS_READ_DIR* as
   well.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/write*
:< Use _landlock_(7) to confine file write access.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_WRITE_FILE* and only applies to the content of
   the directory not the directory itself.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/exec*
:< Use _landlock_(7) to confine file execution.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_EXECUTE* and only applies to the content of the
   directory not the directory itself.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/ioctl*
:< Use _landlock_(7) to confine _ioctl_(2) operations.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_IOCTL_DEV* and only applies to the content of the
   directory not the directory itself. This access right is
   supported as of Landlock ABI version 5 which was introduced with
   Linux-6.10. This command has no effect when running on older Linux
   kernels. Use _syd-lock_(1) to check the latest Landlock ABI supported
   by the running Linux kernel.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/create*
:< Use _landlock_(7) to confine file creation, renames and links.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_MAKE_REG* and only applies to the content of the
   directory not the directory itself.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/delete*
:< Use _landlock_(7) to confine file unlinking, renames and links.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_REMOVE_FILE* and only applies to the content of
   the directory not the directory itself.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/rename*
:< Use _landlock_(7) to confine linking or renaming a file from or to a
   different directory (i.e. reparenting a file hierarchy). This category
   corresponds to the _landlock_(7) access right *LANDLOCK_ACCESS_FS_REFER*
   and only applies to the content of the directory not the directory
   itself. This access right is supported as of Landlock ABI version 2
   which was introduced with Linux-5.19. This command has no effect when
   running on older Linux kernels. Use _syd-lock_(1) to check the latest
   Landlock ABI supported by the running Linux kernel.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/symlink*
:< Use *Landlock LSM* to confine symbolic link creation, renames and links.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_MAKE_SYM* and only applies to the content of the
   directory not the directory itself.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/truncate*
:< Use *Landlock LSM* to confine file truncation with _truncate_(2),
   _ftruncate_(2), _creat_(2), or _open_(2) with *O_TRUNC*.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_TRUNCATE* and only applies to the content of the
   directory not the directory itself. This access right is
   supported as of Landlock ABI version 3 which was introduced with
   Linux-6.2. This command has no effect when running on older Linux
   kernels. Use _syd-lock_(1) to check the latest Landlock ABI supported
   by the running Linux kernel.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/readdir*
:< Use *Landlock LSM* to confine directory listings.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_READ_DIR* and applies to the given directory and
   the directories beneath it.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/mkdir*
:< Use *Landlock LSM* to confine directory creation and renames.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_MAKE_DIR* and only applies to the content of the
   directory not the directory itself.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/rmdir*
:< Use *Landlock LSM* to confine directory deletion and renames.
   This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_REMOVE_DIR* and only applies to the content of
   the directory not the directory itself.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/mkbdev*
:< Use *Landlock LSM* to confine block device creation, renames and
   links. This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_MAKE_BLOCK*.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/mkcdev*
:< Use *Landlock LSM* to confine character device creation, renames and
   links. This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_MAKE_CHAR*.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/mkfifo*
:< Use *Landlock LSM* to confine named pipe (FIFO) creation, renames and
   links. This category corresponds to the _landlock_(7) access right
   *LANDLOCK_ACCESS_FS_MAKE_FIFO*.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/bind*
:< Use *Landlock LSM* to confine network ports for _bind_(2) and UNIX
   domain socket creation, renames and links. This category corresponds to
   the Landlock access right *LANDLOCK_ACCESS_NET_BIND_TCP* for network
   ports, and *LANDLOCK_ACCESS_FS_MAKE_SOCK* for UNIX domain sockets. The
   latter access right only applies to the content of the directory not the
   directory itself. The access right *LANDLOCK_ACCESS_NET_BIND_TCP* is
   supported as of Landlock ABI version 4 which was introduced with
   Linux-6.7. This command has no effect when running on older Linux
   kernels. Use _syd-lock_(1) to check the latest Landlock ABI supported by
   the running Linux kernel.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *lock/connect*
:< Use *Landlock LSM* to confine network ports for _connect_(2).
   This category corresponds to the Landlock access right
   *LANDLOCK_ACCESS_NET_CONNECT_TCP*. This access right is supported as
   of Landlock ABI version 4 which was introduced with Linux-6.7. This
   command has no effect when running on older Linux kernels. Use
   _syd-lock_(1) to check the latest Landlock ABI supported by the
   running Linux kernel.
   This category is enforced completely in kernel-space so it can be
   used to construct a multi-layered sandbox.
   See the *Lock Sandboxing* section for more information.
|< *block*
:< Application firewall with capability to include _ipset_ and _netset_ files.
   List of filtered system calls are _accept_(2), _accept4_(2),
   _connect_(2), _sendto_(2), _sendmsg_(2), _sendmmsg_(2). IPv4 and
   IPv6 family sockets are supported. Source and target addresses are
   checked against the IP blocklist. Refer to the description of the
   *block* command in _syd_(2) manual page for more information.
|< *fs*
:< Confine file opens based on filesystem type. By default, no
   filesystem types are allowed. To make this sandboxing practical, the
   _fs_ profile included by the _linux_ profile allows all filesystem types
   except aafs, bpf_fs, securityfs, selinux, smack, debugfs, pstorefs,
   tracefs, cgroup, cgroup2, nsfs, pid_fd, rdtgroup, devmem, efivarfs,
   hostfs, mtd_inode_fs, openprom, daxfs, secretmem, bdevfs, binderfs,
   usbdevice, xenfs, and zonefs. Use _allow/fs+<fstype>_ to allow a
   filesystem type.
|< *force*
:< Verified Execution: Verify binary/library integrity at
   _exec_(3)/_mmap_(2) time which is similar to *Veriexec* (NetBSD) &
   *IntegriForce* (HardenedBSD). See the *Force Sandboxing* section for
   more information.
|< *tpe*
:< Trusted Path Execution: Execution only allowed from *Trusted
   directories* for *Trusted files* which are not writable by group or
   others and are optionally owned by root or current user. This feature is
   similar to the implementation of Grsecurity & HardenedBSD. See the *TPE
   Sandboxing* section for more information.
|< *crypt*
:< Transparent File Encryption with AES-CTR and HMAC-SHA256, see the
   *Crypt Sandboxing* section for more information.
|< *proxy*
:< SOCKS5 proxy forwarding with network namespace isolation. Defaults to
   TOR. See the *Proxy Sandboxing* section for more information.
|< *pty*
:< Run sandbox process inside a new pseudoterminal. See the *PTY
   Sandboxing* section for more information.
|< *mem, pid*
:< Memory and PID sandboxing: Simple, unprivileged alternatives to
   Control Groups. See the *Memory Sandboxing* and *PID Sandboxing*
   sections for more information.
|< *SafeSetID*
:< Safe user/group switching with predefined UID/GID transitions like
   *SafeSetID* of the *Linux* kernel. See the *SafeSetID* section for more
   information.
|< *Ghost mode*
:< Detach Syd from the sandbox process, similar to _seccomp_(2) Level 1, aka
   "Strict Mode". See the *Ghost mode* section for more information.
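
The _trace/force_umask:7177_ example given for the *chmod* category can be
checked with a little octal arithmetic. The following is a minimal Python
sketch, not Syd's implementation: it assumes the forced value is applied with
the usual _umask_(2) semantics (requested mode AND NOT mask), and the function
name is made up for illustration.

```
FORCE_UMASK = 0o7177  # value taken from the chmod category description


def masked_mode(mode, umask=FORCE_UMASK):
    """Clear every permission bit that is set in the mask (assumed semantics)."""
    return mode & ~umask & 0o7777


if __name__ == "__main__":
    # Setuid, setgid, group, other and owner-execute bits are all dropped.
    for requested in (0o4755, 0o2770, 0o0644, 0o0400):
        print(f"{requested:04o} -> {masked_mode(requested):04o}")
```

Under this assumption only the owner's read and write bits can survive the
mask, which is why the option is useful for a W^X setup.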

Sandboxing for a category may be _on_ or _off_: If sandboxing is off,
none of the relevant system calls are checked and all access is granted.
If, however, sandboxing is on, the action defaults to _deny_ and
allowlists and denylists can be used to refine access rights, e.g.
_allow/read+/etc/passwd_. The default action for a sandboxing category
may be changed with the respective option, e.g. _default/force:kill_.
See the _syd_(2) manual page for more information on how to configure
Syd sandbox policies. If the sandbox process invokes a system call that
violates access, this attempt is reported in system log and the system
call is denied from execution. There are two ways to customise this
behaviour. Syd may be configured to _allow_ some _glob_(3p) patterns. If
the path argument of the system call being checked matches a pattern in
the list of allowed _glob_(3p) patterns, the attempt is not denied. If,
however, it matches a pattern in the list of _deny_ _glob_(3p) patterns,
the attempt is denied. *If many rules match the same path or address,
the last matching pattern wins*. It is also
possible to use the actions _exit_, _kill_, _abort_, _stop_, _panic_,
and _warn_ instead of the _allow_ and _deny_ actions. The list of
available sandboxing actions is given below:

[< *allow*
:< Allow system call.
|< *warn*
:< Allow system call and warn.
|< *filter*
:< Deny system call silently.
|< *deny*
:< Deny system call and warn. This is the default.
|< *panic*
:< Deny system call, warn and panic the current Syd thread.
|< *stop*
:< Deny system call, warn and stop offending process.
|< *abort*
:< Deny system call, warn and abort offending process.
|< *kill*
:< Deny system call, warn and kill offending process.
|< *exit*
:< Warn, and exit Syd immediately with deny _errno_(3) as exit value.

_deny_ is default unless another default action is set using one of the
_default/<category>:<action>_ options. See _syd_(2) manual page for more
information. _exit_ causes Syd to exit immediately with all the sandbox
processes running under it. _kill_ makes Syd send the offending process
a *SIGKILL* signal and deny the system call. _stop_ makes Syd send the
offending process a *SIGSTOP* signal and deny the system call. _abort_
makes Syd send the offending process a *SIGABRT* signal and deny the
system call. Unlike with the _kill_ and _stop_ actions, sandbox processes
are able to catch the *SIGABRT* signal, therefore the _abort_ action
should only be used for debugging in trusted environments where a
_core_(5) dump file may provide invaluable information. _panic_ causes
the respective Syd
emulator thread to panic in which case the system call is denied by an
RAII guard. This behaviour of _panic_ action is currently functionally
equivalent to the _deny_ action, however it may be further extended in
the future where Syd emulator processes are fork+exec'ed and address
space is rerandomized by ASLR on each access violation. _warn_ makes Syd
allow the system call and print a warning about it which is used by
_pandora_(1) for learning mode. Additionally, Syd may be configured to
_filter_ some _glob_(3p) patterns. In this case a match will prevent Syd
from reporting a warning about the access violation, the system call is
still denied though. For _lock/\*_ categories the only available action
is _allow_, and these categories accept path names rather than
_glob_(3p) patterns as arguments. Relative paths are permitted for all
_lock/\*_ categories except _lock/bind_ which requires either an
absolute UNIX domain socket path or a port-range as argument.
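
The "last matching pattern wins" rule described above can be illustrated with
a short Python sketch. It is an approximation under stated assumptions, not
Syd's code: Python's _fnmatch_ module stands in for _glob_(3p) matching (its
*\** also matches the */* character), the rule list and the function name are
hypothetical, and only the _allow_ and _deny_ actions are modelled.

```
from fnmatch import fnmatchcase


def decide(rules, path, default="deny"):
    """Return the action of the last rule whose pattern matches the path."""
    action = default
    for rule_action, pattern in rules:
        if fnmatchcase(path, pattern):
            action = rule_action  # a later match overrides an earlier one
    return action


# Hypothetical rule list, in the order the rules were added.
RULES = [
    ("allow", "/etc/*"),
    ("deny", "/etc/shadow"),
]

if __name__ == "__main__":
    for path in ("/etc/passwd", "/etc/shadow", "/home/user/notes"):
        print(path, decide(RULES, path))
```

Reversing the order of the two rules would allow /etc/shadow, because the
broader allow pattern would then be the last one to match.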

## SANDBOX CATEGORY SETS

As of v3.38.0, multiple categories may be specified split by commas and
the following sets are defined to streamline sandbox profile
composition. Names are intentionally chosen to be consistent with
OpenBSD's _pledge_(2) and FreeBSD's capsicum _rights_(4freebsd):

[< *all*
:< All categories
|< *all-x*
:< All categories except *exec*
|< *lock/all*
:< All _landlock_(7) access rights
|< *lpath*
:< walk, stat, chdir
|< *rpath*
:< read, readdir
|< *lock/rpath*
:< lock/read, lock/readdir
|< *wpath*
:< write, truncate
|< *lock/wpath*
:< lock/write, lock/truncate
|< *cpath*
:< create, delete, rename
|< *lock/cpath*
:< lock/create, lock/delete, lock/rename
|< *dpath*
:< mkbdev, mkcdev
|< *lock/dpath*
:< lock/mkbdev, lock/mkcdev
|< *spath*
:< mkfifo, symlink
|< *lock/spath*
:< lock/mkfifo, lock/symlink
|< *tpath*
:< mkdir, rmdir
|< *lock/tpath*
:< lock/mkdir, lock/rmdir
|< *fown*
:< chown, chgrp
|< *fattr*
:< chmod, chattr, utime
|< *net*
:< net/bind, net/connect, net/sendfd
|< *lock/net*
:< lock/bind, lock/connect
|< *inet*
:< net/bind, net/connect
|< *lock/inet*
:< lock/bind, lock/connect
|< *bnet*
:< net/bind
|< *lock/bnet*
:< lock/bind
|< *cnet*
:< net/connect
|< *lock/cnet*
:< lock/connect
|< *snet*
:< net/sendfd

Some examples are given below:

```
default/all:kill
sandbox/inet:off
deny/cpath,rpath,wpath+${HOME}/.ssh/***
kill/spath+/tmp/***
allow/inet+loopback!1024-65535
kill/unix+/dev/log
```
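
To make the relation between sets and categories concrete, here is a small
Python sketch that expands a comma-separated list into plain category names
using the table above. How Syd performs this expansion internally is not
described here, so treat the function as a hypothetical helper; the *all*,
*all-x* and _lock/\*_ sets are left out for brevity.

```
# Set definitions copied from the table above (lock/* and all* omitted).
CATEGORY_SETS = {
    "lpath": ["walk", "stat", "chdir"],
    "rpath": ["read", "readdir"],
    "wpath": ["write", "truncate"],
    "cpath": ["create", "delete", "rename"],
    "dpath": ["mkbdev", "mkcdev"],
    "spath": ["mkfifo", "symlink"],
    "tpath": ["mkdir", "rmdir"],
    "fown": ["chown", "chgrp"],
    "fattr": ["chmod", "chattr", "utime"],
    "net": ["net/bind", "net/connect", "net/sendfd"],
    "inet": ["net/bind", "net/connect"],
    "bnet": ["net/bind"],
    "cnet": ["net/connect"],
    "snet": ["net/sendfd"],
}


def expand(spec):
    """Expand sets in a comma-separated list, keeping order, dropping repeats."""
    result = []
    for name in spec.split(","):
        # Names that are not sets are treated as plain categories.
        for category in CATEGORY_SETS.get(name, [name]):
            if category not in result:
                result.append(category)
    return result


if __name__ == "__main__":
    print(expand("cpath,rpath,wpath"))
```

With this table, the list _cpath,rpath,wpath_ used in the example above covers
the create, delete, rename, read, readdir, write and truncate categories.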

## SANDBOX RULE SHORTCUTS

Sandbox capabilities may be passed to sandbox actions either as a single
unit or as a comma-delimited list, e.g.:

```
allow/read,write,stat,exec+/***
allow/read,write,stat-/***
deny/read,write,stat+/***
deny/read,write-/***
filter/read,write,stat+/dev/mem
filter/read,write-/dev/mem
```

As of version 3.18.14, sandboxing modes may be specified as a single
unit or as a comma-delimited list, e.g.:

```
sandbox/read,write,stat,exec:on
sandbox/net,lock:off
```

As of version 3.19.0, namespace types may be specified as a single unit
or as a comma-delimited list, e.g.:

```
unshare/user,pid,mount:on
unshare/net,cgroup:off
```

As of version 3.35.0, default modes may be specified as a single unit
or as a comma-delimited list, e.g.:

```
default/write,truncate:kill
default/read,stat:allow
```

## SegvGuard

As of version 3.16.3, Syd has a simple implementation of SegvGuard. The
implementation is inspired by that of HardenedBSD with identical
defaults: If a sandbox process receives a signal that may produce a
_core_(5) dump file for _segvguard/maxcrashes_ times (defaults to 5), in
a period of _segvguard/expiry_ seconds (defaults to 2 minutes),
subsequent attempts to execute the same executable are denied for
_segvguard/suspension_ seconds (defaults to 10 minutes). SegvGuard can
be disabled by setting _segvguard/expiry:0_. SegvGuard support depends
on _ptrace_(2), therefore it may also be disabled by setting
_trace/allow_unsafe_ptrace:1_ at startup. The trigger signals for
SegvGuard are *SIGABRT*, *SIGBUS*, *SIGFPE*, *SIGILL*, *SIGIOT*,
*SIGKILL*, *SIGQUIT*, *SIGSEGV*, *SIGSYS*, *SIGTRAP*, *SIGXCPU*, and
*SIGXFSZ*. The signal *SIGKILL* is intentionally included in the list
even though it is not a _core_(5) dump file generating signal to make
_kill_ rules trigger SegvGuard, a design later mirrored in HardenedBSD's
work on PaX SEGVGUARD and Capsicum integration.

Check out the following links for further information on SegvGuard:

- http://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Deter_exploit_bruteforcing
- http://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Active_kernel_exploit_response
- http://phrack.org/archives/issues/59/9.txt
- http://phrack.org/archives/issues/58/4.txt
- https://github.com/HardenedBSD/hardenedBSD/wiki/segvguard2-ideas---brainstorm
- https://hardenedbsd.org/article/shawn-webb/2025-03-01/hardenedbsd-february-2025-status-report

## Force Sandboxing

Force Sandboxing enhances system security by scrutinizing the paths
provided to _execve_(2) and _execveat_(2) system calls, comparing them
against a predefined Integrity Force map -- a registry of
path-to-checksum correlations. Upon invocation of these calls, the
sandbox computes the checksum of the target binary and cross-references
it with the map. Discrepancies trigger rule-defined actions: execution
might proceed with a logged warning, or culminate in the termination of
the process in violation. This mechanism allows for rigorous enforcement
of binary integrity, echoing the preventative ethos of HardenedBSD's
Integriforce and NetBSD's Veriexec by proactively mitigating
unauthorised code execution, albeit with a unique emphasis on flexible,
user-defined consequence management ranging from permissive alerts to
stringent execution blocks.

Distinguishing itself through user-centric customization, Force Sandboxing
offers a versatile approach to execution integrity. Administrators can tailor
the sandbox's response to checksum mismatches -- kill, deny, or warn -- thereby
balancing security needs with operational flexibility. This adaptability,
combined with tools like _syd-sha_(1) for checksum calculation and _syd-path_(1)
for rule creation, positions Force Sandboxing as a powerful ally in the
preservation of system integrity. See the _force_ command in the
_syd_(2) manual page for how to add entries to and remove entries from
the Integrity Force map.
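
The lookup-and-compare step can be sketched in Python as follows. This
is only an illustration: the function name _force_check_ and the map
layout (path to a pair of expected checksum and action) are
hypothetical, SHA-256 is assumed as the checksum algorithm, and the
handling of paths that are not in the map is deliberately left out.

```python
import hashlib


def force_check(path, force_map):
    """Return 'ok' on a checksum match, else the action configured for
    the entry ('kill', 'deny' or 'warn'). Unlisted paths yield 'unlisted'."""
    entry = force_map.get(path)
    if entry is None:
        # What happens to unlisted paths is outside this sketch.
        return "unlisted"
    expected_hex, action = entry
    digest = hashlib.sha256()  # assumption: SHA-256 checksums
    with open(path, "rb") as f:
        # Hash the file in chunks to keep memory usage bounded.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return "ok" if digest.hexdigest() == expected_hex else action
```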

As of version 3.16.3, Syd checks the paths of the dynamic libraries an
executable is linked against for force access as well. This only works
for ELF files.

As of version 3.21.3, Syd hooks into the _mmap_(2) and _mmap2_(2) system
calls and checks the file descriptor for Force access when the memory
protection mode includes *PROT_EXEC* and the flags do not include
*MAP_ANONYMOUS*, a combination which typically indicates a _dlopen_(3).
Therefore, libraries dynamically loaded at runtime are checked for Force
access as well.
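
The condition on the _mmap_(2) arguments amounts to a two-bit test,
sketched below in Python. The function name is hypothetical, and the
constants are hard-coded to their Linux values on x86 and most other
architectures; a few architectures use a different value for
*MAP_ANONYMOUS*.

```python
PROT_EXEC = 0x4        # Linux value from <sys/mman.h>
MAP_ANONYMOUS = 0x20   # Linux value on x86 and most other architectures


def needs_force_check(prot, flags):
    """True for file-backed executable mappings, the dlopen(3)-like case."""
    return bool(prot & PROT_EXEC) and not (flags & MAP_ANONYMOUS)
```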

## TPE sandboxing

As of version 3.21.0, Syd introduces Trusted Path Execution (TPE)
sandboxing, which restricts the execution of binaries to ensure they
come from _trusted directories_. As of version 3.37.2, the binary file
must be _trusted_ as well as its parent directory. The intention is to
make privilege escalation harder when an account restricted by TPE is
compromised as the attacker won't be able to execute custom binaries
which are not in the trusted path. A binary is _trusted_ if the file and
its parent directory meet the following criteria:

- Not writable by group or others.
- Optionally owned by root, controlled by the _tpe/root_owned_ option.
- Optionally owned by the current user or root, controlled by the _tpe/user_owned_ option.
- Optionally part of the root filesystem, controlled by the _tpe/root_mount_ option.

If these criteria are not met, the execution is denied with an *EACCES*
_errno_(3), and optionally, the offending process can be terminated with the
*SIGKILL* signal using the _default/tpe:kill_ option. This mechanism
ensures that only binaries from secure, trusted paths can be executed,
enhancing security by preventing unauthorized code execution. TPE
sandboxing operates by checking the executables at three stages:

- During the system call entry of _execve_(2) and _execveat_(2) to check scripts.
- On _ptrace_(2) exec event to check the ELF executable and dynamic loader.
- On _mmap_(2) when dynamic libraries are mapped to memory, typically with _dlopen_(3).
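
The trust criteria listed earlier can be approximated with _stat_(2)
data alone. The Python sketch below is an illustration, not Syd's
implementation: the function name _is_trusted_ and its keyword arguments
are hypothetical stand-ins for the _tpe/root_owned_, _tpe/user_owned_,
and _tpe/root_mount_ options, and "part of the root filesystem" is
assumed to mean "on the same device as /".

```python
import os
import stat


def is_trusted(path, root_owned=False, user_owned=False, root_mount=False):
    """Apply the listed criteria to `path` and to its parent directory."""
    parent = os.path.dirname(os.path.abspath(path))
    root_dev = os.stat("/").st_dev
    for target in (path, parent):
        st = os.stat(target)
        # Not writable by group or others.
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            return False
        # Optionally owned by root.
        if root_owned and st.st_uid != 0:
            return False
        # Optionally owned by the current user or root.
        if user_owned and st.st_uid not in (0, os.getuid()):
            return False
        # Assumption: root filesystem membership == same device as "/".
        if root_mount and st.st_dev != root_dev:
            return False
    return True
```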

TPE can be configured to apply to a specific user group. By default, TPE
applies to all users. However, administrators can specify an untrusted
GID with the _tpe/gid_ setting, restricting TPE only to users in that
group. Additionally, TPE can negate GID logic with the _tpe/negate_
setting, making the specified group trusted and exempt from TPE.

Syd's TPE implementation is based on HardenedBSD's, which is inspired
by GrSecurity's TPE. Check out the following links for more information:

- http://phrack.org/issues/52/6.html#article
- http://phrack.org/issues/53/8.html#article
- https://wiki.gentoo.org/wiki/Hardened/Grsecurity_Trusted_Path_Execution

## Lock Sandboxing

Lock sandboxing utilises the *Landlock Linux Security Module* for simple
unprivileged access control. It is enforced completely in kernel-space
and the policy is also applied to the Syd process, such that a
compromised Syd process is still stuck inside the _landlock_(7) sandbox,
therefore Lock sandboxing can be used to construct a multi-layered
sandbox for added security. Lock sandboxing may be turned on with the
_sandbox/lock:on_ sandbox command at startup. Paths to files and file
hierarchies should be populated using the _lock/\*_ categories either
specifying them one at a time, e.g. _allow/lock/read+/usr_,
_allow/lock/write+/dev/null_ or by specifying them as a comma delimited
list, e.g. _allow/lock/read,write,ioctl+/dev/null_. The shorthand
_lock/all_ is provided to ease configuration and it stands for the union
of categories _lock/read_, _lock/write_, _lock/exec_, _lock/ioctl_,
_lock/create_, _lock/delete_, _lock/rename_, _lock/symlink_,
_lock/truncate_, _lock/readdir_, _lock/mkdir_, _lock/rmdir_,
_lock/mkdev_, _lock/mkfifo_, and _lock/bind_. As of
version 3.29.0, network confinement is supported and allowlisted
_bind_(2) and _connect_(2) ports can be specified using the commands
_allow/lock/bind+port_ and _allow/lock/connect+port_. A closed range in
format _port1-port2_ may also be specified instead of a single port
number. Use the _lock/bind_ category with an absolute path to confine
UNIX domain socket creation, renames and links, e.g.
_allow/lock/bind+/run/user/${SYD_UID}_. As of version 3.35.0, the default
compatibility level has been changed to _Hard Requirement_. Compared to
the old default _Best Effort_, this level ensures the sandbox is fully
enforced. Moreover, *ENOENT* ("No such file or directory") errors are
made fatal in this level. The compatibility level may be changed at
startup using the command _default/lock_. See the _syd_(2) manual page
for more information.

## Crypt Sandboxing

This sandboxing category provides transparent file encryption using
AES-CTR, with HMAC-SHA256 ensuring secure data handling without manual
encryption steps. When _sandbox/crypt:on_ is set, files matching the
_glob_(3) patterns specified by _crypt+_ are encrypted on write and
decrypted on read. Configuration includes specifying a 32-bit decimal
encryption key serial ID for the _keyrings_(7) interface using
_crypt/key/main_, and specifying a 32-bit decimal authentication key
serial ID for the _keyrings_(7) interface using _crypt/key/auth_.
Specifying the same key serial ID for both options is permitted and the
option _crypt/key_ may be used as a shorthand to set both key serial
IDs. The specified key serial IDs are used with the
*ALG_SET_KEY_BY_KEY_SERIAL* _setsockopt_(2) operation which is new in
Linux-6.2, therefore _Crypt sandboxing requires Linux-6.2 or newer_.
The keys must have _search_ permission -- i.e. have the
*KEY\_(POS|USR|GRP|OTH)\_SEARCH* permission bit(s) set so the kernel can
locate and copy the key data into the crypto API; otherwise the
operation will be denied (*EPERM*: "Operation not permitted"). Refer to
the following link for more information
https://lkml.org/lkml/2022/10/4/1014.

The utility _syd-key_(1) may be used to generate encryption keys and
save them to _keyrings_(7) for use with Crypt sandboxing. To avoid
including the key serial IDs in the configuration file, the user may
set the key serial IDs using an environment variable and then specify
this environment variable, e.g.: crypt/key:${SYD_KEY_ID}. The user
_must_ use an environment variable name that starts with the prefix
*SYD_* but does not start with the prefix *SYD_TEST_* as such
environment variables don't leak into the sandbox process. Similarly the
user _must_ refrain from using any environment variable specified under
the ENVIRONMENT section of the _syd_(1) manual page.

Encryption operates via Linux kernel cryptography API sockets, using
zero-copy techniques with _splice_(2) and _tee_(2) to avoid unencrypted
data in memory. To guarantee that zero-copy is used exclusively and to
respect the user's privacy by never reading plaintext into memory, the
_syd_aes_ threads responsible for encryption are confined with a
_seccomp_(2) filter to deny the _read_(2), _open_(2), and _socket_(2)
system calls (and many more) and allow the _write_(2) system call only
up to 32 bytes which is required to write the HMAC tag and the random IV
to the file. The setup sockets are created on startup, the key is
selected using the _keyrings_(7) interface without copying the key
material into userspace. IV uniqueness is ensured by generating a random
IV using _getrandom_(2) per file. In case of an error retrieving entropy
via _getrandom_(2) the random bytes in AT_RANDOM are used instead.
Per-file IV is prepended to encrypted files. This ensures security by
preventing IV reuse. Syd ensures that per-file IVs are securely zeroized
on drop.

A 32-byte HMAC (SHA256) message authentication tag is included between
the file magic header and the IV, and is authenticated on decrypt,
following the Encrypt-then-MAC approach. This provides integrity
checking and resistance against bit-flip attacks. By default, decryption
occurs in a memory file descriptor to prevent tampering, which limits
practicality for large files due to memory constraints. The user may
specify a secure temporary backing directory with _crypt/tmp_ to
work around this. Ideally this directory should be on encrypted storage
as Syd is going to write plaintext here. File locks are set before
attempting to encrypt files to ensure security and safe concurrent
access. Linux OFD locks are used for locking. Encrypted data is flushed
to disk only after all file descriptors that point to the encrypted open
file description are closed enabling safe and performant concurrent
access. File appends are handled efficiently with last block
reencryption. Only regular files will be encrypted. The file format
header *\\x7fSYD3* identifies encrypted files and the version in the
header must match the current Syd API which at the moment is *3*.
Compared to GSWTK's dbfencrypt, Crypt sandboxing avoids TOCTOU
vulnerabilities and encryption weaknesses by utilizing AES-CTR with
HMAC-SHA256 and robust setup steps, providing a more secure and
streamlined encryption process.

Crypt sandboxing employs the AES-CTR algorithm, a secure and efficient
symmetric key encryption method suitable for various applications. It
operates as a stream cipher (skcipher) with a block size of 1 byte,
allowing data to be encrypted in a byte-by-byte manner. The algorithm
uses a fixed key size of 32 bytes (256 bits) by default, providing
robust security, and a fixed initialization vector (IV) size of 16 bytes
to ensure randomness and uniqueness in each encryption operation.
Processing data in byte-sized chunks, the algorithm maintains a
consistent walk size of 16 bytes for traversal and operations, ensuring
seamless encryption and decryption processes. This configuration, with
its secure default key size, significantly enhances security, preventing
common encryption weaknesses and supporting efficient, transparent file
encryption within the sandbox environment. The inclusion of HMAC-SHA256
for integrity checking further enhances security by detecting any
unauthorized modifications or corruption of data. CTR is infinitely
parallelizable because each block in the stream can be encrypted
independently. This allows for encryption and decryption processes to
be split across multiple processors, significantly increasing
throughput. With hardware support such as AES-NI CPU instructions,
speeds can easily exceed a gigabyte per second.

As of version 3.21.2, Syd opens memory file descriptors with the flag
*MFD_NOEXEC_SEAL* during transparent decryption to ensure the memfds are
non-executable and can't ever be marked executable. This ensures
security as otherwise transparent decryption can be used to bypass Exec,
Force and TPE sandboxing. Notably, this flag requires Linux-6.3 or
newer. On older kernels, a backing directory must be specified with
_crypt/tmp_ for transparent decryption to work. Attempts to use
transparent decryption without a backing directory on older kernels will
fail with the _errno_(3) *EOPNOTSUPP* ("Operation not supported on
transport endpoint"). As of version 3.28.0, Syd allows this restriction
to be lifted with the option _trace/allow_unsafe_memfd:1_.

As of version 3.39.0, _keyrings_(7) interface is used for key management
and specifying keys as raw payload is no longer permitted. Moving key
material into the kernel _keyrings_(7) interface substantially reduces
the exposure of raw keys to userland, narrowing the attack surface for
memory-disclosure, core-dump, and accidental-persistence vulnerabilities
while enabling cryptographic operations to be performed without copying
key bytes into process memory. Because _keyrings_(7) enforce kernel-side
permissions and lifecycle semantics (search/view/revoke, expiries,
etc.), they provide a principled provenance and access-control model
that simplifies secure rotation, auditing, and least-privilege
enforcement. Together, these properties both harden the runtime security
posture and facilitate integration with hardware-backed or sealed key
types, improving operational compliance and reducing the likelihood of
application-level key-management errors.

*File Format*: Each file encrypted within the Crypt sandboxing framework
follows a structured format to ensure consistency, secure handling, and
clear identification. Each encrypted file starts with a five-byte magic
header, *\\x7fSYD3*, where *\\x7fSYD* indicates that the file is
encrypted by Syd, and *3* denotes the current API version. This header
is followed by a 32-byte HMAC (SHA256) message authentication tag,
providing integrity checking by authenticating the encrypted content.
The tag is followed by a 16-byte initialization vector (IV), which is
unique per file, ensuring strong cryptographic security. The
AES-CTR-encrypted ciphertext follows the IV, providing the file's
protected content. Syd will only process files that match this format
and have a compatible version; if a file does not have the correct file
format header or API version, or if it exists unencrypted, Syd will
leave it untouched. This approach prevents unintended operations on
incompatible or unencrypted files.

```
+----------------+-------------------------+-----------------------+--------------------+
| Magic Header   | HMAC Tag                | Initialization Vector | Encrypted Content  |
| "\\x7fSYD3"     | 32 bytes (SHA256 HMAC)  | 16 bytes              | AES-CTR Ciphertext |
+----------------+-------------------------+-----------------------+--------------------+
```
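
The layout above can be split into its parts with plain slicing. The
following Python sketch only separates the fields; it neither verifies
the HMAC tag nor decrypts anything, and the function and constant names
are hypothetical.

```python
MAGIC = bytes([0x7F]) + b"SYD3"   # five-byte magic header
TAG_LEN = 32                      # HMAC-SHA256 tag
IV_LEN = 16                       # per-file IV


def split_encrypted(blob):
    """Split a blob into (tag, iv, ciphertext), or return None if it
    does not start with the expected magic header."""
    header_len = len(MAGIC) + TAG_LEN + IV_LEN
    if len(blob) < header_len or not blob.startswith(MAGIC):
        return None
    tag_end = len(MAGIC) + TAG_LEN
    return blob[len(MAGIC):tag_end], blob[tag_end:header_len], blob[header_len:]
```

A file without the magic header yields _None_, mirroring the rule that
files in an unknown format are left untouched.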

*Limitations:*

- *Large files* are not handled efficiently during decryption by default
  due to the use of in-memory files. Specify a secure temporary backing
  directory with _crypt/tmp:/path_ to work around this. Ideally this
  directory should be on encrypted storage as Syd is going to write
  plaintext here.
- *Concurrent Access*: Encrypted file access utilises Linux OFD locks,
  which are now standardized in POSIX 2024. Ensure that the underlying
  filesystem fully supports OFD locks to enable effective advisory file
  locking. Modern filesystems and NFS implementations compliant with POSIX
  2024 typically provide this support, mitigating issues present in older
  versions. The multithreaded architecture of Syd relies on OFD locks to
  ensure safe and efficient concurrent access, eliminating the need for
  alternative locking mechanisms such as POSIX advisory locks. For further
  details, refer to the _fcntl_locking_(2) manual page.
- *Crash Safety*: Currently, encrypted data is flushed to disk only
  after all file descriptors are closed. In the event of a system or
  sandbox crash, this may result in incomplete writes or potential data
  loss, as in-flight data might not be persisted. Future enhancements will
  focus on implementing transactional flush mechanisms and crash recovery
  procedures to ensure atomicity and integrity of encrypted data, thereby
  improving resilience against unexpected terminations.

*Utilities*:

- _syd-aes_(1): Encrypt/decrypt files akin to _openssl-enc_(1ssl).
- _syd-key_(1)
    - Generate random AES-CTR keys using _getrandom_(2), and save to _keyrings_(7).
    - Read passphrases from TTY or STDIN, hash with SHA3-256, and save to _keyrings_(7).

## Proxy Sandboxing

As of version 3.22.0, Proxy sandboxing in Syd confines network
communication exclusively through a designated SOCKS proxy, enforced by
the helper utility _syd-tor_(1). Configured at startup with
_sandbox/proxy:on_, this type implies the use of _unshare/net:1_,
isolating network namespaces to prevent direct network access. Traffic
is forwarded from a specified local port (proxy/port:9050) to an
external address and port (proxy/ext/host:127.0.0.1,
proxy/ext/port:9050). As of version 3.34.1, you may also specify an
external UNIX domain socket using e.g.
proxy/ext/unix:/path/socks5.sock. This setup ensures all network
interactions route through the proxy, leveraging zero-copy data
transfers and edge-triggered _epoll_(7) for efficient event handling.
The implementation enhances security by employing seccomp and Landlock
for additional confinement, preventing unauthorized network access and
ensuring strict adherence to the defined network path. This approach
minimizes the risk of proxy bypasses and maintains the integrity of the
network isolation.

## PTY Sandboxing

As of version 3.36.0, PTY Sandboxing runs the target process inside a
dedicated pseudoterminal managed by the _syd-pty_(1) helper, isolating
all terminal I/O from the host TTY and preventing direct _ioctl_(2) or
control-sequence escapes. The PTY main is proxied via an edge-triggered
_epoll_(7) loop with non-blocking zero-copy _splice_(2), ensuring no
unencrypted data ever traverses user space. A minimal _seccomp_(2)
filter permits only the essential PTY syscalls (e.g. *TIOCGWINSZ*,
*TIOCSWINSZ*) and denies all others -- including injection via *TIOCSTI* --
while Landlock locks down access to the PTY device, filesystem, and
network. Combined with no-exec memory seals and namespace isolation,
this approach hardens against terminal-based attacks and preserves the
confidentiality and integrity of the sandboxed session.

## Memory Sandboxing

This sandboxing category handles the system calls _brk_(2), _mmap_(2),
_mmap2_(2), and _mremap_(2) and checks the per-process memory usage on
each memory allocation request. If the memory usage reaches the maximum
value defined by _mem/max_, the system call is denied with *ENOMEM*.
Moreover the virtual memory size can be limited using _mem/vm_max_. If
the limit is reached on the entry of any of the respective system calls,
the system call is denied with *ENOMEM* and the signal *SIGKILL* is
delivered to the offending process. Subsequent to the delivery of the
signal, the _process_mrelease_(2) system call is called on the process
to immediately release memory. The default action may be changed using
the _default/mem_ option. The per-process memory usage is a fair
estimate calculated using the file _proc_pid_smaps_(5) summing the
following fields together:

- _Pss (Proportional Set Size)_ is similar to _Rss_, but \
accounts for shared memory more accurately by dividing it among the \
processes that share it. _Rss (Resident Set Size)_ is the portion of \
memory occupied by a process that is held in RAM.
- _Private_Dirty_ represents the private memory that has \
been modified (dirty).
- _Shared_Dirty_ represents the shared memory that has \
been modified.

As of version 3.43.1, the memory sandboxing system has been updated to
improve memory usage tracking. Syd now enforces a strict memory limit
based on allocation granularity, meaning that programs cannot exceed the
defined memory limits, even by the amount they allocate at once. This
change aligns the limit with the allocation size rather than allowing
any overflow beyond the limit. Additionally, memory tracking has been
optimized by switching from iterating over _proc_pid_smaps_(5) to using
the more efficient _/proc/pid/smaps_rollup_, which consolidates memory
usage information for better performance and more accurate enforcement
of memory constraints.
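
As a rough illustration of the estimate, the Python sketch below sums
the three fields named above from text in the _smaps_rollup_ format.
The function names are hypothetical and the code shows only the
arithmetic, not how Syd reads or enforces the value.

```python
FIELDS = ("Pss", "Private_Dirty", "Shared_Dirty")


def memory_usage_kb(rollup_text):
    """Sum the Pss, Private_Dirty and Shared_Dirty fields, in kB."""
    total = 0
    for line in rollup_text.splitlines():
        name, sep, value = line.partition(":")
        # Exact name match, so fields such as Pss_Anon are not counted.
        if sep and name in FIELDS:
            # Values look like "   1234 kB"; the first token is the number.
            total += int(value.split()[0])
    return total


def read_memory_usage_kb(pid="self"):
    """Read and sum the fields for a process; `pid` defaults to the caller."""
    with open("/proc/{}/smaps_rollup".format(pid)) as f:
        return memory_usage_kb(f.read())
```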

_Memory sandboxing is not an alternative to cgroups(7)!_ You should use
_cgroups_(7) when you can instead. This sandboxing category is meant for
more constrained environments where _cgroups_(7) is not supported or not
available due to missing permissions or other similar restrictions.

## PID sandboxing

This sandboxing category handles the system calls _fork_(2), _vfork_(2),
_clone_(2), and _clone3_(2) and checks the total number of tasks running
on the system on each process creation request. If the count reaches the
maximum value defined by _pid/max_, the system call is denied with
*EAGAIN*. If _pid/kill_ is set to true, the signal *SIGKILL* is
delivered to the offending process. This sandboxing category is best
coupled with a pid namespace using _unshare/pid_. In this mode, Syd will
check the number of running tasks in the current namespace only.

As of version 3.40.0, with _unshare/pid:1_ the limit and accounting
apply per PID namespace; on Linux 6.14 and newer the namespaced
_kernel.pid_max_ _sysctl_(8) is set to _max(pid/max, 301)_ so the
kernel's 300 reserved PIDs do not reduce the configured headroom, and on
older kernels _kernel.pid_max_ _sysctl_(8) is not modified.
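
As a worked example of the rule above, the following Python sketch
computes the value chosen for the namespaced _kernel.pid_max_. The
function and constant names are hypothetical.

```python
RESERVED_PIDS = 300   # PIDs the kernel keeps reserved, per the text above


def namespaced_pid_max(pid_max):
    """Value for kernel.pid_max inside the PID namespace: max(pid/max, 301)."""
    return max(pid_max, RESERVED_PIDS + 1)
```

For instance, _pid/max:64_ results in a _kernel.pid_max_ of 301, whereas
_pid/max:1000_ is used unchanged.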

_PID sandboxing is not an alternative to cgroups(7)!_ You should use
_cgroups_(7) when you can instead. This is meant for more constrained
environments where _cgroups_(7) is not supported or not available due to
missing permissions or other similar restrictions.

## SafeSetID

*SafeSetID*, introduced in version 3.16.8, enhances the management of
UID/GID transitions. This feature enables finer-grained control by
allowing administrators to explicitly specify permissible transitions
for UID and GID changes, thus tightening security constraints around
process privilege management. It works by allowing predefined UID and
GID transitions that are explicitly configured using the
_setuid+<source_uid>:<target_uid>_ and
_setgid+<source_gid>:<target_gid>_ commands in the Syd configuration.
This ensures that transitions can only occur between specified user and
group IDs, and unauthorised privilege escalations are blocked. For
instance, a transition might be allowed from a higher-privileged user to
a less-privileged user but not vice versa, thereby preventing any
escalation of privileges through these system calls.

As of version 3.24.5, Syd applies a kernel-level _seccomp_(2) filter by
default to deny all set\*uid system calls with UID less than or equal to
11 which is typically the operator user, and all set\*gid system calls
with GID less than or equal to 14 which is typically the uucp group.
This means even a compromised Syd process cannot elevate privileges
using these system calls. Refer to the output of the command _syd-ls
setid_ to see the full list of system calls in this group.
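
The combination of the explicit allowlist and the fixed lower bounds can
be modelled as below. This Python sketch is a simplified model, not
Syd's implementation: the function names are hypothetical and the
allowlist is represented as a set of (source, target) pairs.

```python
MIN_UID = 11   # target UIDs <= 11 are always denied
MIN_GID = 14   # target GIDs <= 14 are always denied


def uid_transition_allowed(allowed, source, target):
    """`allowed` is a set of (source_uid, target_uid) pairs."""
    return target > MIN_UID and (source, target) in allowed


def gid_transition_allowed(allowed, source, target):
    """`allowed` is a set of (source_gid, target_gid) pairs."""
    return target > MIN_GID and (source, target) in allowed
```

In this model, a transition that is not explicitly listed is refused,
and listing a transition to a low UID or GID has no effect.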

When a UID or GID transition is defined, Syd will keep the *CAP_SETUID*
and *CAP_SETGID* capabilities respectively and the sandbox process will
inherit these capabilities from Syd. Since version 3.24.6, Syd drops the
*CAP_SETUID* capability after the first successful UID transition and
similarly the *CAP_SETGID* capability after the first successful GID
transition. This means Syd can only ever change its UID and GID once in
its lifetime. However, this does not completely lock the setid system
calls in the sandbox process: Transitions to Syd's current UID and GID
are continued in the sandbox process which means the first successful
UID and GID transition will continue to function as long as the sandbox
process keeps the respective *CAP_SETUID* and *CAP_SETGID*
capabilities. This allows containing daemons, such as _nginx_(1), which
spawn multiple unprivileged worker processes out of a single main
privileged process.

## Ghost mode

Ghost Mode, introduced in Syd version 3.20.0, is a one-way sandboxing
mode, closely resembling _seccomp_(2) Level 1, also known as *Strict
Mode*. This mode enhances security by allowing a process to transition
to a highly restrictive state after completing its initial setup. When a
sandboxed process is ready for this higher level of confinement, it
invokes Ghost Mode by executing the _stat_(2) system call with the
virtual path _/dev/syd/ghost_. Upon receiving this command, Syd closes
the _seccomp_unotify_(2) file descriptor. This action elevates all
previously hooked system calls to a kernel-level deny with the *ENOSYS*
("Function not implemented") _errno_(3), effectively making them
unavailable. The transition to Ghost Mode is irreversible; once the file
descriptor is closed, the process is locked into this restricted state.
This mechanism ensures that the sandboxed process can only perform a
very limited set of operations, akin to those allowed in Seccomp Level
1, thus significantly reducing its potential attack surface. Ghost Mode
provides a robust security measure by denying all but the most essential
system calls, which is crucial for applications that require maximum
isolation and security after their initial configuration phase.
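
From the sandboxed process's point of view, the request is a single
_stat_(2) call on the virtual path. A minimal Python sketch follows;
the function name is hypothetical, and treating any *OSError* as "not
entered" is an assumption of this example rather than documented
behaviour.

```python
import os


def enter_ghost_mode():
    """Ask Syd to enter Ghost Mode; return True if the stat(2) call succeeded."""
    try:
        os.stat("/dev/syd/ghost")
    except OSError:
        # Not running under Syd, or the request was refused.
        return False
    return True
```

Because the transition is irreversible and previously hooked system
calls become unavailable afterwards, such a call belongs at the very end
of a program's setup phase.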

The mode is aptly named ghost because, upon closing the
_seccomp_unotify_(2) file descriptor, the sandboxed process effectively
detaches from Syd and becomes independent, much like a ghost. Entering
ghost mode subsequently causes the _syd_mon_ monitor thread and all
_syd_emu_ emulator threads to exit, and the remaining _syd_main_ thread
merely waits for the sandbox process to exit without any further
intervention. This detachment underscores the finality and isolation of
the Ghost Mode, ensuring that the process operates in a secure, tightly
confined environment without further interaction from Syd. This
mechanism is particularly useful for processes that require maximum
security and minimal system call exposure after their initial
configuration phase, providing a robust layer of protection against
various exploits and vulnerabilities.

A process cannot enter Ghost mode once the sandbox lock is set.
Alternatively, though, a process can set its process dumpable attribute
to zero using the *PR_SET_DUMPABLE* _prctl_(2). Under Syd, this achieves
almost the same effect as Syd will not be able to emulate system calls
with the per-process directory inaccessible. This provides an
unprivileged way to enter Ghost mode.

# SECURITY

Syd stands out for its ability to operate without requiring elevated
privileges, eliminating the need for root access. This feature
significantly simplifies setup and usage. Users benefit from the
capability to dynamically configure the sandbox from within, with
options to secure it further as needed. Tip: To take a quick peek at the
seccomp filters applied by Syd under various different configurations,
use _syd <flags...> -Epfc_ where PFC stands for Pseudo Filter Code which
yields a human-readable textual dump of Syd's _seccomp_(2) filters. Syd
further enriches the output of this textual dump with *#* comments.

## Threat Model

_Syd strictly adheres to the current threat model of seccomp(2)_. The goal
is to restrict how untrusted userspace applications interact with the
shared OS kernel through system calls to protect the kernel from
userspace exploits (e.g., shellcode or ROP payload). The kernel is
trusted. Syd's threat model delineates the sandbox as the trusted
interceptor of system calls, while all user applications running within
the sandbox are considered untrusted. These untrusted applications can
manipulate their execution environment through syscalls, and attackers
are assumed to have the capability to execute arbitrary code within
these applications. Syd uses several mechanisms, including _seccomp_(2)
and _ptrace_(2) for syscall filtering, _landlock_(7) for filesystem
access restrictions, and _namespaces_(7) for process and device
isolation, to limit the impact of these potential attacks. The threat
model assumes that attackers have control over the untrusted user space
and may attempt reads, writes, or arbitrary code execution that could
influence the behavior of the trusted sandbox or exploit syscall
handling. The security of Syd relies on the correctness of its
implementation and the underlying Linux kernel features it utilises. It
is assumed that there are no vulnerabilities in Syd's interception and
handling of syscalls, nor in the enforcement mechanisms provided by
_landlock_(7) and _namespaces_(7). External attacks via network vectors
or physical access to hardware are considered out of scope for this
threat model.

"The sandbox lock" is an integral component of Syd's security
architecture, which governs the configurability and integrity of the
sandbox environment. By default, the sandbox lock is set to _on_,
effectively preventing any further sandbox commands after the initial
setup, thereby ensuring that once the sandbox is configured and the
primary process is executed, the security policies remain unaltered by
any untrusted processes within the sandbox. When the lock is set to
_exec_, only the initial sandbox process retains the authority to access
and modify the sandbox configuration, enabling a trusted process to
securely establish the sandbox parameters while maintaining a _pidfd_
(process ID file descriptor) to the initial process to safeguard against
PID recycling attacks. Conversely, if the lock is set to _off_, all
sandbox processes are permitted to access and modify the sandbox
configuration, allowing for broader configurability during the setup
phase. However, this state persists only until the sandbox is explicitly
locked, after which the lock becomes immutable and the sandbox policies
are fixed, preventing any subsequent processes from altering the
configuration. This layered locking mechanism, reinforced by the use of
_pidfd_ in _exec_ mode, effectively safeguards against untrusted
processes attempting to modify sandbox settings to escalate privileges
or circumvent restrictions, thereby maintaining a robust and secure
execution environment within Syd's framework. In _ipc_ mode, the sandbox
configuration is accessible through a UNIX socket which may or may not
be accessible from within the sandbox depending on sandbox ACL rules.
In _read_ mode, the sandbox configuration is accessible only to reads,
but NOT edits. Transition from lock modes _off_, _exec_, and _ipc_ into
one of _read_ and _on_ is one-way and idempotent: It results in the
sandbox policy getting sealed in memory using the _mseal_(2) system call
either immediately or simultaneously with sandbox process startup.
Transitions between lock modes _read_ and _on_ are not permitted.

"Crypt Sandboxing" in Syd ensures the confidentiality and integrity of
specified files by transparently encrypting them using AES-CTR with
HMAC-SHA256, even when adversaries fully control processes within the
sandbox (i.e., attackers can execute arbitrary code and perform any
allowed system calls). In this extended threat model, it is acknowledged
that while attackers may access plaintext data within the sandbox's
memory during process execution, they cannot extract encryption keys or
plaintext data from outside the controlled environment, nor can they
interfere with the encryption process to leak keys or plaintext to
persistent storage or external channels. Cryptographic operations are
performed via kernel-level cryptography API sockets using zero-copy
techniques to prevent plaintext from residing in user-space memory
buffers accessible to attackers. The _syd_aes_ threads responsible for
encryption are confined with strict _seccomp_(2) filters, denying them
critical system calls like _read_(2), _open_(2), and _socket_(2), and
allowing only minimal _write_(2) operations required for encryption
metadata (e.g., writing the HMAC tag and random IV to the file). This
confinement prevents exploitation that could leak sensitive data.
Encryption keys are handled using the kernel _keyrings_(7) interface and
the *ALG_SET_KEY_BY_KEY_SERIAL* _setsockopt_(2) option. The threat model
trusts the kernel and Syd's implementation, assuming attackers cannot
exploit kernel vulnerabilities to access keys or plaintext within kernel
memory or cryptographic operations. Additionally, file locks are
employed before attempting to encrypt files to ensure safe concurrent
access. In contrast to the general threat model, Crypt Sandboxing
acknowledges that untrusted processes within the sandbox have access to
plaintext data in memory during normal operation, as they need to read
or write the plaintext files. However, the goal is to prevent attackers
from accessing the plaintext outside the controlled environment or
tampering with the encryption process to compromise confidentiality and
integrity. This is achieved by ensuring that the encryption keys remain
secure and that the encryption and decryption processes are tightly
controlled and isolated from untrusted code.

## Accessing remote process memory

Syd denies various system calls which can access remote process memory, such
as _ptrace_(2) and _process_vm_writev_(2), and common sandboxing profiles such as
_paludis_ and _user_ disallow write access to the _/proc/pid/mem_ file. This
makes TOCTOU attack vectors harder to realise. Refer to the output
of the command _syd-ls deny_ to see the full list of denied system
calls.

## Enhanced Handling of PTRACE_TRACEME

As of version 3.16.3, Syd introduced a new feature for managing the
*PTRACE_TRACEME* operation, aimed at improving stealth against detection.
Traditionally, *PTRACE_TRACEME* is the only _ptrace_(2) operation allowed by a
tracee, which makes it a common target for detection of ptracers. By converting
*PTRACE_TRACEME* into a no-operation (no-op) that always succeeds, Syd aims to
subtly prevent simple detection methods that rely on this operation.
Additionally, other _ptrace_(2) operations are modified to return an
*EPERM* ("Operation not permitted") _errno_(3) instead of *ENOSYS*
("Function not implemented"), which helps reduce the likelihood of the
sandbox being detected through these errors. This approach enhances the
discreetness of Syd's operation by mitigating straightforward detection
tactics used by monitored processes.

As of version 3.19.0, Syd extends this mitigation and turns the system
call _ptrace_(2) into a no-op. Again, this provides a best-effort
mitigation against using requests such as *PTRACE_ATTACH* or
*PTRACE_SEIZE* to detect a ptracer.

As of version 3.47.0, Syd improves this mitigation and turns the
_prctl_(2) calls with *PR_SET_PTRACER* argument into a no-op.

As of version 3.47.0, Syd improves this mitigation to defend against
intelligent _ptrace_(2) detectors which utilize multiple _ptrace_(2)
requests to detect a ptracer. Refer to the following links for more
information on intelligent _ptrace_(2) detection:

- https://arxiv.org/pdf/2109.06127
- https://seblau.github.io/posts/linux-anti-debugging
- https://docs.rs/debugoff
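
The kind of probe these mitigations are aimed at can be sketched as
follows. This is a hedged illustration, not code from Syd or from the
links above: the helper _traceme_probe_ is hypothetical, and it assumes
a Linux libc that is reachable through _ctypes_. The child issues
*PTRACE_TRACEME* twice and reports which calls succeeded; a detector
compares the observed pattern with the one it expects from an untraced
process.

```python
import ctypes
import os

PTRACE_TRACEME = 0  # request number from <sys/ptrace.h>


def traceme_probe():
    """Fork a child that issues PTRACE_TRACEME twice.

    Returns (first_ok, second_ok) as observed by the child.
    """
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            libc.ptrace.restype = ctypes.c_long
            libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_int,
                                    ctypes.c_void_p, ctypes.c_void_p]
            first = libc.ptrace(PTRACE_TRACEME, 0, None, None) == 0
            second = libc.ptrace(PTRACE_TRACEME, 0, None, None) == 0
            # Encode both results in the exit status for the parent.
            code = (1 if first else 0) | (2 if second else 0)
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else 0
    return bool(code & 1), bool(code & 2)
```

On a plain Linux system without a tracer, the first request typically
succeeds and the second fails with *EPERM*, because the child is already
marked as traced by its parent. Probes of this kind motivate handling
more than a single request consistently.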

## Hardened procfs and devfs

To enhance system security and mitigate potential attack vectors, Syd
enforces restrictions on _procfs_(5) and devfs file systems by
implementing several key measures: denying both the listing and opening
of block devices and files of unknown types by omitting entries
corresponding to these file types (identified by *DT_BLK* and
*DT_UNKNOWN*) from directory listings and rejecting _open_(2) operations
on them. This prevents unauthorized enumeration and access to system
storage devices, thereby mitigating information disclosure and potential
tampering.
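
The listing and open rules above amount to one predicate on the file
type. A minimal sketch, assuming directory entries are already available
as (name, d_type) pairs; the function names _filter_dirents_ and
_open_allowed_ are hypothetical and not part of Syd:

```python
# d_type values from <dirent.h>.
DT_UNKNOWN = 0
DT_BLK = 6
HIDDEN_TYPES = {DT_UNKNOWN, DT_BLK}


def filter_dirents(entries):
    """Drop (name, d_type) pairs whose type is DT_BLK or DT_UNKNOWN."""
    return [(name, d_type) for name, d_type in entries
            if d_type not in HIDDEN_TYPES]


def open_allowed(d_type):
    """Apply the same rule to open(2): reject the hidden file types."""
    return d_type not in HIDDEN_TYPES
```

Using one predicate for both paths keeps the listing and the _open_(2)
decision consistent, so an entry that is hidden cannot be opened by
guessing its name.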

Syd also restricts visibility within the _/proc_ directory so that
processes can only see their own process IDs, effectively preventing
discovery and potential interaction with other running processes, which
reduces risks of information leakage, privilege escalation, and process
manipulation. Access to the _/proc_ entries of the Syd process itself is
explicitly denied, safeguarding the sandbox manager from inspection or
interference and preventing access to sensitive information about the
sandboxing mechanism that could be exploited to bypass security controls
or escape the sandbox.

Additionally, Syd addresses risks associated with magic symbolic links
in _/proc_ -- such as _/proc/[pid]/exe_ and _/proc/[pid]/fd/\*_ -- by
denying access to these links when they refer to processes other than
the calling process, thus preventing exposure of sensitive file
descriptors or executable paths of other processes and mitigating
unauthorized access or container escape scenarios; this mitigation can
be disabled with the _trace/allow_unsafe_magiclinks:1_ option if
necessary, though doing so is not recommended.
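
The ownership check described above can be sketched as a path test. This
is a simplified illustration under stated assumptions: it only covers
the two link kinds named above, it works on path strings rather than on
resolved file descriptors, and the function name _magiclink_allowed_ is
hypothetical.

```python
import re

# Matches /proc/<pid>/exe and /proc/<pid>/fd/<n>; other magic links are
# omitted to keep the sketch short.
_MAGIC = re.compile(r"^/proc/(self|\d+)/(exe|fd/\d+)$")


def magiclink_allowed(path, caller_pid, allow_unsafe_magiclinks=False):
    """Return False when `path` is a magic link of another process."""
    match = _MAGIC.match(path)
    if match is None:
        return True  # not a magic link covered by this sketch
    if allow_unsafe_magiclinks:
        return True  # models trace/allow_unsafe_magiclinks:1
    owner = match.group(1)
    # ASSUMPTION: /proc/self always refers to the caller.
    return owner == "self" or int(owner) == caller_pid
```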

Collectively, these hardened controls over procfs and devfs
significantly reduce the attack surface by preventing information
disclosure, unauthorized access, and potential privilege escalations,
ensuring that sandboxed applications operate within a tightly controlled
and secure environment that adheres to the principle of least privilege
and maintains system integrity. Refer to the following links for more
information:

- https://forums.whonix.org/t/proc-pid-sched-spy-on-keystrokes-proof-of-concept-spy-gksu/8225
- https://homes.luddy.indiana.edu/xw7/papers/zhou2013identity.pdf
- https://petsymposium.org/2016/files/papers/Don%E2%80%99t_Interrupt_Me_While_I_Type__Inferring_Text_Entered_Through_Gesture_Typing_on_Android_Keyboards.pdf
- https://staff.ie.cuhk.edu.hk/~khzhang/my-papers/2016-oakland-interrupt.pdf
- https://www.cs.ucr.edu/~zhiyunq/pub/sec14_android_activity_inference.pdf
- https://www.gruss.cc/files/procharvester.pdf
- https://www.kicksecure.com/wiki/Dev/Strong_Linux_User_Account_Isolation#/proc/pid/sched_spy_on_keystrokes
- https://www.openwall.com/lists/oss-security/2011/11/05/3
- https://www.usenix.org/legacy/event/sec09/tech/full_papers/zhang.pdf
- https://www.openwall.com/lists/oss-security/2025/11/05/3

## Hardened proc_pid_status(5)

As of version 3.38.0, Syd filters _proc_pid_status_(5) at the _open_(2)
boundary to defeat common sandbox-fingerprinting heuristics while
preserving compatibility with ordinary tooling. When a process (or its
threads) reads /proc/<pid>/status or /proc/<pid>/task/<tid>/status, Syd
normalizes only the security-critical fields -- zeroing _TracerPid_,
_NoNewPrivs_, _Seccomp_, and _Seccomp_filters_, and rewriting the
sandbox-revealing phrases in _Speculation_Store_Bypass_ and
_SpeculationIndirectBranch_. This targeted normalization breaks trivial
anti-analysis checks (ptracer presence, seccomp/no_new_privs probes,
speculative mitigation fingerprints) without altering process state.
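
The zeroing step can be illustrated on the text of a status file. This
is a minimal sketch, not Syd's filter: the function name
_normalize_status_ is hypothetical, and the two speculation fields are
left untouched here because the replacement wording is not given above.

```python
ZEROED_FIELDS = ("TracerPid", "NoNewPrivs", "Seccomp", "Seccomp_filters")


def normalize_status(text):
    """Zero the fields named above in a /proc/<pid>/status dump."""
    out = []
    for line in text.splitlines(keepends=True):
        key, sep, _ = line.partition(":")
        if sep and key in ZEROED_FIELDS:
            newline = "\n" if line.endswith("\n") else ""
            line = f"{key}:\t0{newline}"
        out.append(line)
    return "".join(out)
```

All other lines pass through byte for byte, which is what keeps ordinary
tooling that parses this file working.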

The security impact is twofold: untrusted code loses a low-cost oracle
for environment discovery, reducing the likelihood of logic bombs or
capability gating based on sandbox detection, and defenders retain
observability because the kernel's real enforcement still applies --
only the user-space view of these select fields is masked. For forensic
and debugging workflows that explicitly need the unfiltered view, this
mitigation can be temporarily relaxed per trace with
_trace/allow_unsafe_proc_pid_status:1_, after which toggling back to _:0_
restores the hardened, stealth-preserving default.

## Hardened uname(2)

As of version 3.15.1, Syd mediates _uname_(2) and returns a
policy-governed _utsname_ that suppresses host identification and constrains
kernel disclosure. The release string is synthesized to expose only the
Linux major and minor as observed on the host or, as of 3.36.1, as
supplied via *SYD_ASSUME_KERNEL* for controlled feature detection, while
the micro component is randomized per Syd run to limit patch-level
fingerprinting; reads of _/proc/version_ and
_/proc/sys/kernel/osrelease_ are hardened to present the same masked
view. As of 3.40.0, the nodename, domainname, and version fields are
sourced from the options _uts/host_, _uts/domain_, and _uts/version_
with defaults _localhost_, _(none)_, and a startup random value. As of
3.44.2, this restriction may be relaxed at startup with the option
_trace/allow_unsafe_uname:1_. Practical effects include disrupting
exploit and loader selection that depend on exact release matching,
reducing cross-host correlation via stable node and domain labels,
neutralizing sandbox and VM fingerprinting heuristics that key off
_uname_(2) and the corresponding _proc_(5) paths, and keeping build and
compatibility probes functional by retaining _major.minor_ semantics
while allowing explicit control through *SYD_ASSUME_KERNEL*. Workloads
that tie licensing, clustering, telemetry, or feature gates to the
precise host release or to the original nodename should use the _uts_
options to supply the required identity or opt out with the relaxation
flag.
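
The release synthesis can be sketched as follows. This is an
illustration under stated assumptions: the function name _mask_release_
is hypothetical, and both the exact output format and the range of the
random micro component are guesses rather than Syd's actual choices.

```python
import random
import re


def mask_release(host_release, rng):
    """Keep major.minor from `host_release` and randomize the micro part."""
    match = re.match(r"^(\d+)\.(\d+)", host_release)
    if match is None:
        raise ValueError(f"unrecognized release string: {host_release!r}")
    major, minor = match.groups()
    # ASSUMPTION: the range of the random micro component is arbitrary here.
    micro = rng.randrange(0, 256)
    return f"{major}.{minor}.{micro}"
```

Calling _mask_release_ once at startup and reusing the result would give
the per-run stability described above, so that _uname_(2) and the
_proc_(5) paths agree for the lifetime of the sandbox.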

## Denying TIOCLINUX ioctl

The limitation on the use of the *TIOCLINUX* _ioctl_(2) within secure
environments such as the Syd sandbox is an essential security measure
addressing vulnerabilities specific to Linux terminal operations. The
*TIOCLINUX* _ioctl_(2) command offers various functionalities, including but not
limited to manipulating console settings, changing keyboard modes, and
controlling screen output. While these capabilities can be leveraged for
legitimate system management tasks, they also introduce potential security
risks, particularly in multi-user environments or in the context of sandboxed
applications.

The security concerns surrounding *TIOCLINUX* stem from its ability to alter
terminal behaviors and settings in ways that could be exploited for unauthorised
information disclosure, terminal hijacking, or privilege escalation. For
instance, manipulating the console display could mislead users about the true
nature of the operations being executed, or altering keyboard settings could
capture or inject keystrokes.

In summary, the restriction on *TIOCLINUX* within secure environments is a vital
security strategy, addressing the complex risks associated with direct terminal
manipulation capabilities. This precaution is in keeping with the broader
security community's efforts to mitigate known vulnerabilities and enhance the
security posture of systems handling sensitive processes and data.

## Denying TIOCSTI ioctl

The restriction on the use of the *TIOCSTI* _ioctl_(2) within the Syd
sandbox addresses a significant security vulnerability associated with
terminal input injection. The *TIOCSTI* _ioctl_(2) allows a byte to be
inserted into the terminal input queue, effectively simulating keyboard
input. This capability, while potentially useful for legitimate
purposes, poses _a substantial security risk_, especially in scenarios
where a process might retain access to a terminal beyond its intended
lifespan. Malicious use of this _ioctl_(2) can lead to the injection of
commands that execute with the privileges of the terminal's owning
process, thereby breaching the security boundaries intended by user
permissions and process isolation mechanisms. The concern over *TIOCSTI*
is well-documented in the security community. For example, OpenBSD has
taken measures to mitigate the risk by disabling the *TIOCSTI*
_ioctl_(2), reflecting its stance on the _ioctl_(2) as _one of the most
dangerous_ due to its potential for abuse in command injection attacks.
The decision to disable or restrict *TIOCSTI* in various Unix-like
operating systems underscores the _ioctl_(2)'s inherent security
implications, particularly in the context of privilege escalation and
the execution of unauthorised commands within a secured environment.
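
To make the risk concrete, the sketch below shows how little code the
request needs. It is an illustration only: the helper _try_inject_ is
hypothetical, and the fallback request number is the common Linux value
and may differ on some architectures.

```python
import fcntl
import os
import termios

# Fall back to the common Linux value if the constant is not exported.
TIOCSTI = getattr(termios, "TIOCSTI", 0x5412)


def try_inject(fd, data):
    """Try to push `data` into the input queue of the terminal behind `fd`.

    Returns None on success, or the errno of the first failing ioctl(2).
    """
    for byte in data:
        try:
            # TIOCSTI takes a pointer to a single byte per call.
            fcntl.ioctl(fd, TIOCSTI, bytes([byte]))
        except OSError as err:
            return err.errno
    return None
```

When the request is permitted and _fd_ refers to a terminal, the bytes
are queued as if they had been typed by the user. When the request is
denied, the helper returns the error number instead.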

In summary, the restriction on *TIOCSTI* within Syd is a critical
security measure that prevents a class of vulnerabilities centered
around terminal input injection, safeguarding against unauthorised
command execution and privilege escalation. This precaution aligns with
broader security best practices and mitigations adopted by the security
community to address known risks associated with terminal handling and
process isolation.

## Denying FS_IOC_SETFLAGS ioctl

As of version 3.24.2, Syd denies the *FS_IOC_SETFLAGS* _ioctl_(2)
request by default, a critical security measure to ensure that once file
flags are set, they remain unchanged throughout the runtime of the
sandbox. This policy is particularly focused on the _immutable_ and
_append-only_ flags, which need to be configured by an administrator at
the start of the Syd process. Once these attributes are set on crucial
system and log files -- marking them either as immutable to prevent any
modification, or append-only to ensure that existing data cannot be
erased -- they are frozen. This means that no subsequent modifications
can be made to these attributes, effectively locking down the security
settings of the files against any changes. This approach ensures that,
even after a potential security breach, malicious entities are unable to
alter or delete important files, thus maintaining the integrity and
reliability of the system against tampering and ensuring that audit
trails are preserved.

## Denying PR_SET_MM prctl

The *PR_SET_MM* _prctl_(2) call allows processes with the *CAP_SYS_RESOURCE*
capability to adjust their memory map descriptors, facilitating operations like
self-modifying code by enabling dynamic changes to the process's memory layout.
For enhanced security, especially in constrained environments like Syd, this
capability is restricted to prevent unauthorised memory manipulations that could
lead to vulnerabilities such as code injection or unauthorised code execution.
Notably, Syd proactively drops *CAP_SYS_RESOURCE* among other capabilities at
startup to minimise security risks. This action is part of Syd's broader
security strategy to limit potential attack vectors by restricting process
capabilities.

## Restricting prctl option space and trace/allow_unsafe_prctl

Syd meticulously confines the scope of permissible _prctl_(2) operations to
enhance security within its sandbox environment. By limiting available
_prctl_(2) options to a specific set, including but not limited to
*PR_SET_PDEATHSIG*, *PR_GET_DUMPABLE*, *PR_SET_NO_NEW_PRIVS*, and
*PR_SET_SECCOMP*, Syd ensures that only necessary process control
functionalities are accessible, thereby reducing the risk of exploitation
through less scrutinised _prctl_(2) calls. This constraint is pivotal in
preventing potential security vulnerabilities associated with broader _prctl_(2)
access, such as unauthorised privilege escalations or manipulations of process
execution states. However, recognizing the need for flexibility in certain
scenarios, Syd offers the option to lift these restrictions through the
_trace/allow_unsafe_prctl:1_ setting. This capability allows for a tailored
security posture, where users can opt for a more permissive _prctl_(2)
environment if required by their specific use case, while still maintaining
awareness of the increased security risks involved.
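
The allowlist idea can be sketched as a membership test. This is a
simplified illustration: the set below contains only the options named
above, whereas the real list is longer, and the function name
_prctl_allowed_ is hypothetical.

```python
# Option numbers from <linux/prctl.h>.
PR_SET_PDEATHSIG = 1
PR_GET_DUMPABLE = 3
PR_SET_SECCOMP = 22
PR_SET_NO_NEW_PRIVS = 38

# Only the options named in the text; the real list is longer.
ALLOWED_PRCTL = frozenset({
    PR_SET_PDEATHSIG,
    PR_GET_DUMPABLE,
    PR_SET_SECCOMP,
    PR_SET_NO_NEW_PRIVS,
})


def prctl_allowed(option, allow_unsafe_prctl=False):
    """Return True if a prctl(2) call with `option` may proceed."""
    # allow_unsafe_prctl models trace/allow_unsafe_prctl:1.
    return allow_unsafe_prctl or option in ALLOWED_PRCTL
```

An allowlist fails closed: options added by future kernels stay denied
until they are reviewed and listed.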

## Restricting io_uring interface and trace/allow_unsafe_uring

The _io_uring_(7) interface can be used to _bypass path sandboxing_. By default,
Syd restricts _io_uring_(7) operations due to their ability to perform system
calls that could undermine the sandbox's security controls, particularly those
designed to limit file access and changes to file permissions. The setting
_trace/allow_unsafe_uring_, when enabled, relaxes these restrictions, allowing
_io_uring_(7) operations to proceed unimpeded. While this can significantly
enhance I/O performance for applications that rely on _io_uring_(7) for
efficient asynchronous operations, it requires careful consideration of the
security implications, ensuring that its use does not inadvertently compromise
the sandboxed application's security posture. Refer to the output of the
command _syd-ls uring_ to see the full list of system calls that belong
to the _io_uring_(7) interface.

## Restricting creation of device special files

Since version 3.1.12, Syd has enhanced its security model by disallowing
the creation of device special files through the _mknod_(2) and
_mknodat_(2) system calls. This decision is rooted in mitigating
potential security vulnerabilities, as device special files could be
exploited to circumvent established path-based access controls within
the sandbox environment. These files, which include character and block
devices, can provide direct access to hardware components or facilitate
interactions with kernel modules that could lead to unauthorised actions
or data exposure. By restricting their creation, Syd significantly
reduces the risk of such exploit paths, reinforcing the integrity and
security of the sandboxed applications. This measure ensures that only
predefined types of files -- such as FIFOs, regular files, and sockets --
are permissible, aligning with the principle of least privilege by
limiting file system operations to those deemed safe within the
sandbox's context.
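
The file type check can be sketched with the _stat_ constants. This is a
minimal illustration, not Syd's code; the function name _mknod_allowed_
is hypothetical.

```python
import stat

ALLOWED_TYPES = {stat.S_IFIFO, stat.S_IFREG, stat.S_IFSOCK}


def mknod_allowed(mode):
    """Return True if mknod(2) with this `mode` creates a permitted type."""
    file_type = stat.S_IFMT(mode)
    if file_type == 0:
        # mknod(2) treats a zero file type as S_IFREG.
        file_type = stat.S_IFREG
    return file_type in ALLOWED_TYPES
```

For example, _mknod_allowed(stat.S_IFIFO | 0o600)_ is true, while a mode
with *S_IFCHR* or *S_IFBLK* is rejected.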

## Sharing Pid namespace with signal protections

Since version 3.6.7, Syd has introduced a crucial security feature that
prevents sandboxed processes from sending signals to the Syd process or
any of its threads. This protection is implemented by hooking and
monitoring system calls related to signal operations, including
_kill_(2), _tkill_(2), _tgkill_(2), and _pidfd_open_(2). When a
sandboxed process attempts to send a signal to Syd or its threads, these
system calls are intercepted, and the operation is denied at the seccomp
level with an *EACCES* ("Permission denied") _errno_(3). This measure
ensures that Syd maintains control over the execution and management of
sandboxed processes, safeguarding against interruptions or unauthorised
interactions that could compromise the security or stability of the
sandbox environment. This security mechanism is part of Syd's broader
strategy to share the same root, private proc, and mount namespaces with
the sandboxed process, facilitating secure and simple system call
emulation. By making Syd and its threads immune to signals from
sandboxed processes, the integrity and isolation of the sandboxed
environment are significantly enhanced, preventing potential
exploitation scenarios where sandboxed processes could disrupt the
operation of the sandbox manager or interfere with other sandboxed
processes.
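
The decision made for each intercepted call can be sketched as a target
check. This is a simplified illustration: it ignores process-group
targets, and the function name _check_signal_target_ and its parameters
are hypothetical.

```python
import errno


def check_signal_target(target, syd_pid, syd_tids):
    """Return 0 to let the call through, or EACCES when it targets Syd."""
    if target == syd_pid or target in syd_tids:
        return errno.EACCES
    return 0
```

The same check applies to the PID argument of _kill_(2) and
_pidfd_open_(2) and to the thread ID argument of _tkill_(2) and
_tgkill_(2).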

As of version 3.35.2, Syd puts itself in a new process group using
_setpgid_(2) and releases the controlling terminal using the *TIOCNOTTY*
_ioctl_(2) request. Moreover, a scope-only Landlock sandbox is installed
unconditionally to further isolate the sandbox process from the Syd
process. This ensures that terminal-generated signals and I/O remain
confined to the sandbox's process group and cannot affect Syd or any
other processes, further strengthening the sandbox's isolation
guarantees alongside the existing seccomp-based PID namespace
protections.

## Process Priority and Resource Management

Since version 3.8.1, Syd has been implementing strategies to ensure the
smooth operation of the host system while managing security through its
sandboxing mechanism. It sets the _nice_(2) value of its system call
handler threads to _19_, ensuring these threads operate at _the lowest
priority_ to minimise CPU starvation for other critical processes. This
approach prioritises system stability and fair CPU resource
distribution, enabling Syd to handle numerous system calls without
compromising the host's performance and responsiveness.

Enhancing this strategy, Syd introduced further adjustments in versions
3.8.6 and 3.9.7 to address I/O and CPU resource management more
comprehensively. From version 3.8.6, it sets the I/O priority of the
system call handler threads to _idle_, ensuring that I/O operations do
not monopolise resources and lead to I/O starvation for other processes.
Similarly, from version 3.9.7, it adjusts the CPU scheduling priority of
these threads to _idle_, further safeguarding against CPU starvation.
These measures collectively ensure that Syd maintains optimal
performance and system responsiveness while securely sandboxing
applications, striking a balance between security enforcement and
efficient system resource utilization.
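
The per-thread priority adjustments can be sketched as follows. This is
an illustration rather than Syd's code: the helper _run_low_priority_ is
hypothetical, it relies on the Linux behaviour that nice values and
scheduling policies are per-thread attributes, and it omits the I/O
priority step because the Python standard library has no wrapper for
_ioprio_set_(2).

```python
import os
import threading


def run_low_priority(work):
    """Run `work()` in a thread with nice 19 and, if possible, SCHED_IDLE."""
    result = {}

    def handler():
        # On Linux, who=0 with PRIO_PROCESS applies to the calling thread.
        os.setpriority(os.PRIO_PROCESS, 0, 19)
        try:
            # pid 0 selects the calling thread here as well.
            os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
            result["idle"] = True
        except (OSError, AttributeError):
            result["idle"] = False  # policy not available here
        result["nice"] = os.getpriority(os.PRIO_PROCESS, 0)
        result["value"] = work()

    thread = threading.Thread(target=handler)
    thread.start()
    thread.join()
    return result
```

Only the worker thread is deprioritized; the thread that calls
_run_low_priority_ keeps its own nice value and scheduling policy.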

As of version 3.30.0, changes in process and I/O priorities are
inherited by sandbox processes as well and sandbox processes are
prevented from making any further changes. Moreover, the option
_trace/allow_unsafe_nice_ may be set at startup to prevent Syd from
making any changes and allow sandbox processes access to the system
calls that are used to make process and I/O priority changes.

## Streamlining File Synchronization Calls

As of version 3.8.8, Syd has rendered the _sync_(2) and _syncfs_(2)
system calls as no-operations (no-ops), ensuring they report success
without executing any underlying functionality. This adjustment is
designed to streamline operations within the sandboxed environment,
bypassing the need for these file synchronization actions that could
otherwise impact performance or complicate the sandbox's control over
file system interactions. By adopting this approach, Syd enhances its
compatibility with applications that issue these calls, without altering
the sandboxed process's behavior or the integrity of file system
management. As of version 3.28.0, this restriction can be disabled at
startup with the option _trace/allow_unsafe_sync:1_. This is useful in
scenarios where sync is actually expected to work such as when
sandboxing databases.

## Restricting Resource Limits, Core Dumps, and trace/allow_unsafe_prlimit

Since version 3.9.6, Syd has implemented restrictions on setting process
resource limits and generating core dumps for the sandboxed process,
enhancing the sandbox's security posture. This measure prevents the
sandboxed process from altering its own resource consumption boundaries
or producing core dumps, which could potentially leak sensitive
information or be exploited for bypassing sandbox restrictions. However,
recognizing the need for flexibility in certain use cases, Syd provides
the option to disable these restrictions at startup through the
_trace/allow_unsafe_prlimit:1_ setting. This allows administrators to
tailor the sandbox's behavior to specific requirements, balancing
security considerations with functional needs.

## Enhancing Sandbox Security with Landlock

Since version 3.0.1, Syd leverages _landlock_(7) to enforce advanced
filesystem sandboxing, significantly bolstering the security framework
within which sandboxed processes operate. By integrating Landlock, Syd
empowers even unprivileged processes to create secure sandboxes,
enabling fine-grained access control over filesystem operations without
requiring elevated permissions. This approach is instrumental in
mitigating the risk of security breaches stemming from bugs or malicious
behaviors in applications, offering a robust layer of protection by
restricting ambient rights, such as global filesystem or network access.
Landlock operates by allowing processes to self-impose restrictions on
their access to system resources, effectively creating a secure
environment that limits their operation to a specified set of files and
directories. This mechanism is particularly useful for running legacy
daemons or applications that require specific environmental setups, as
it allows for the precise tailoring of access rights, ensuring processes
can only interact with designated parts of the filesystem. For instance,
by setting Landlock rules, Syd can confine a process's filesystem
interactions to read-only or read-write operations on explicitly allowed
paths, thus preventing unauthorised access to sensitive areas of the
system.

Furthermore, the inclusion of the Syd process itself within the
Landlock-enforced sandbox adds an additional layer of security. This
design choice ensures that even if the Syd process were compromised, the
attacker's ability to manipulate the sandboxed environment or access
unauthorised resources would be significantly constrained. This
self-sandboxing feature underscores Syd's commitment to maintaining a
high security standard, offering peace of mind to users by ensuring
comprehensive containment of sandboxed processes.

## Namespace Isolation in Syd

Syd enhances sandbox isolation through meticulous namespace use,
starting from version 3.0.2. Version 3.9.10 marks a pivotal enhancement
by restricting user subnamespace creation, addressing a key path
sandboxing bypass vulnerability. This strategic limitation thwarts
sandboxed processes from altering their namespace environment to access
restricted filesystem areas. Furthermore, since version 3.11.2, Syd
maintains process capabilities within user namespaces, mirroring the
_unshare_(1) command's --keep-caps behavior. This ensures sandboxed
processes retain necessary operational capabilities, enhancing security
without compromising functionality. Additionally, Syd utilises the
powerful _bind_ command within the mount namespace to create secure,
isolated environments by allowing specific filesystem locations to be
remounted with custom attributes, such as _ro_, _noexec_, _nosuid_,
_nodev_, or _nosymfollow_, providing a flexible tool for further
restricting sandboxed processes' access to the filesystem.
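
The attribute names above correspond to kernel mount flags. The sketch
below shows that mapping; it assumes the classic _mount_(2) approach of
remounting a bind mount with *MS_REMOUNT* and *MS_BIND*, which may
differ from how Syd applies the attributes, and the function name
_remount_flags_ is hypothetical.

```python
# Flag values from <linux/mount.h>.
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8
MS_REMOUNT = 32
MS_NOSYMFOLLOW = 256
MS_BIND = 4096

OPTION_FLAGS = {
    "ro": MS_RDONLY,
    "nosuid": MS_NOSUID,
    "nodev": MS_NODEV,
    "noexec": MS_NOEXEC,
    "nosymfollow": MS_NOSYMFOLLOW,
}


def remount_flags(options):
    """Translate option names into the flag mask for a bind remount."""
    flags = MS_BIND | MS_REMOUNT
    for name in options:
        flags |= OPTION_FLAGS[name]  # KeyError on an unsupported option
    return flags
```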

Syd also introduces enhanced isolation within the mount namespace by
offering options to bind mount temporary directories over /dev/shm and
/tmp, ensuring that sandboxed processes have private instances of these
directories. This prevents inter-process communication through shared
memory and mitigates the risk of temporary file-based attacks, further
solidifying the sandbox's defence mechanisms. As of version 3.35.2, an
empty mount namespace may be built from scratch starting with the
_root:tmpfs_ command. As of version 3.11.2, Syd mounts the _procfs_(5)
filesystem privately with the _hidepid=2_ option, enhancing privacy by
concealing process information from unauthorised users. As of version
3.37.2, this option is changed to _hidepid=4_ which is new in Linux>=5.8
for added hardening. As of version 3.39.0, the option _subset=pid_ is
also supplied to the private _procfs_(5) mount for added hardening.
This option is also new in Linux>=5.8.

Syd's _container_ and _immutable_ profiles exemplify its adaptability,
offering environments that range from isolated to highly restrictive. The container
profile provides a general-purpose sandbox, while the immutable profile
enforces stricter controls, such as making essential system directories
read-only, to prevent tampering. This comprehensive approach underlines
Syd's adept use of kernel features for robust sandbox security, ensuring
a secure and controlled execution environment for sandboxed
applications. See _syd-cat -pcontainer_ and _syd-cat -pimmutable_ to
list the rules in these sandboxing profiles.

As of version 3.23.0, Syd has further strengthened its security with the
introduction of a time namespace. Enabled with the _unshare/time:1_
option, it allows Syd to reset the boot-time clock, ensuring that the
_uptime_(1) command reports container uptime instead of host uptime.
Moreover, the creation of namespaces, including mount, UTS, IPC, user,
PID, net, cgroup, and time is denied by default to prevent unauthorized
namespace manipulation that could undermine path sandboxing security. To
allow specific namespace types, administrators must explicitly enable
them via the _trace/allow_unsafe_namespace_ setting. Another restriction
to note is that the system calls _mount_(2), _mount_setattr_(2),
_umount_(2), and _umount2_(2) are denied by default unless the _mount_
namespace is allowed. This change ensures tighter control over process
capabilities and isolation, reinforcing the defense mechanisms against
potential security breaches.

## Restricting environment and trace/allow_unsafe_env

As of version 3.11.1, Syd has implemented measures to clear unsafe
environment variables, such as *LD_PRELOAD*, enhancing security by preventing
the manipulation of dynamic linker behavior by sandboxed processes. This action
mitigates risks associated with dynamic linker hijacking, where adversaries may
load malicious shared libraries to execute unauthorised code, potentially
leading to privilege escalation, persistence, or defence evasion. Variables like
*LD_PRELOAD* allow specifying additional shared objects to be loaded before any
others, which could be exploited to override legitimate functions with malicious
ones, thus hijacking the execution flow of a program. To accommodate scenarios
where developers might need to use these variables for legitimate purposes,
Syd allows this security feature to be disabled at startup with
_trace/allow_unsafe_env:1_, offering flexibility while maintaining a
strong security posture. This careful balance ensures that sandboxed
applications operate within a tightly controlled environment, significantly
reducing the attack surface and enhancing the overall security framework within
which these applications run. Refer to the output of the command _syd-ls
env_ to see the full list of environment variables that Syd clears from
the environment of the sandbox process. As of version 3.39.0, Syd
additionally clears *LANG* and the full set of *LC_\** locale variables
(e.g. *LC_CTYPE*, *LC_TIME*, *LC_ALL*, etc.) to avoid leaking locale settings
into the sandboxed process -- preventing subtle behavior differences or
information disclosure that could be abused. Similarly, the *TZ* variable
is cleared to prevent leaking timezone settings to the sandbox process.
The builtin _linux_ profile masks the file _/etc/localtime_ and the
_glob_(3p) pattern _/usr/share/zoneinfo/\*\*_ with the file
_/usr/share/zoneinfo/UTC_ preventing another vector of timezone settings
leaking into the environment of the sandbox process. For controlled
exceptions, the CLI -e flag provides fine-grained control: _-e var=val_
injects var=val into the child environment, _-e var_ removes var from
the child environment, and _-e var=_ explicitly passes through an
otherwise unsafe variable; any of these forms may be repeated as needed.
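
The clearing rules and the three _-e_ forms can be sketched as a
function over environment dictionaries. This is an illustration only:
the pattern list contains just the variables named above, the names
_is_unsafe_ and _build_child_env_ are hypothetical, and applying the
_-e_ arguments in order after the default clearing is an assumption.

```python
import fnmatch

# Only the variables named in the text; `syd-ls env` prints the full list.
UNSAFE_PATTERNS = ("LD_PRELOAD", "LANG", "LC_*", "TZ")


def is_unsafe(name):
    """Return True if `name` matches one of the unsafe patterns."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in UNSAFE_PATTERNS)


def build_child_env(parent_env, e_args=()):
    """Apply the default clearing and then the -e arguments in order."""
    env = {k: v for k, v in parent_env.items() if not is_unsafe(k)}
    for arg in e_args:
        name, sep, value = arg.partition("=")
        if not sep:
            env.pop(name, None)            # -e var: remove
        elif value:
            env[name] = value              # -e var=val: inject
        elif name in parent_env:
            env[name] = parent_env[name]   # -e var=: pass through
    return env
```

For example, with *TZ* set in the parent environment, passing _TZ=_ in
_e_args_ keeps the parent's value, while every other unsafe variable is
still cleared.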

## Managing Linux Capabilities for Enhanced Security

Since its 3.0.17 release, Syd strategically curtails specific Linux
_capabilities_(7) for sandboxed processes to bolster security. By revoking privileges
such as *CAP_SYS_ADMIN* among others, Syd significantly reduces the risk of
privilege escalation and system compromise. This proactive measure ensures that
even if a sandboxed process is compromised, its ability to perform sensitive
operations is severely limited. The comprehensive list of dropped capabilities,
including but not limited to *CAP_NET_ADMIN*, *CAP_SYS_MODULE*, and
*CAP_SYS_RAWIO*, reflects a meticulous approach to minimizing the attack surface.
Refer to the output of the command _syd-ls drop_ to see the full list of
_capabilities_(7) that Syd drops at startup.

Exceptions to this stringent policy, introduced in version 3.11.1, such
as retaining *CAP_NET_BIND_SERVICE* with _trace/allow_unsafe_bind:1_,
*CAP_NET_RAW* with _trace/allow_unsafe_socket:1_, *CAP_SYSLOG* with
_trace/allow_unsafe_syslog:1_ and *CAP_SYS_TIME* with
_trace/allow_unsafe_time:1_, offer a nuanced security model. These
exceptions allow for necessary network, syslog and time adjustments
within the sandbox, providing flexibility without significantly
compromising security.

Since version 3.12.5, Syd allows the user to prevent dropping capabilities at
startup using the command _trace/allow_unsafe_caps:1_. This command may be used to
construct privileged containers with Syd.

This balanced strategy of restricting _capabilities_(7), coupled with selective
permissions, exemplifies Syd's commitment to crafting a secure yet functional
sandbox environment. By leveraging the granularity of Linux _capabilities_(7),
Syd offers a robust framework for safeguarding applications against a variety
of threats, underscoring its role as a pivotal tool in the security arsenal of
Linux environments.

## Path Resolution Restriction For Chdir and Open Calls

As of version 3.15.1, Syd provides a configurable mitigation against
directory traversal attacks that rejects _.._ components in the path
arguments of the _chdir_(2), _open_(2), _openat_(2), _openat2_(2), and
_creat_(2) system calls. This feature is off by default to preserve
compatibility with a broad range of applications. It may be enabled with
the _trace/deny_dotdot:1_ command, and it mirrors the behaviour of
FreeBSD's _vfs.lookup_cap_dotdot_ sysctl. Administrators can thus tailor
the sandbox to specific security requirements or operational contexts.
The design draws on the security work of FreeBSD and HardenedBSD. See
the following links for more information:

- https://man.freebsd.org/cgi/man.cgi?open(2)
- https://cgit.freebsd.org/src/tree/sys/kern/vfs_lookup.c#n351
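
The check described in this section amounts to looking for a _.._
component in the raw path argument. A minimal sketch, where the helper
name is hypothetical and the real check runs inside Syd's system call
handlers:

```python
def has_dotdot(path: str) -> bool:
    """Return True if any component of the path is exactly '..'."""
    return ".." in path.split("/")
```

Note that only whole components count: a name such as _..hidden_ is not
a traversal and is left alone.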

## Enhanced Symbolic Link Validation

As of version 3.13.0, Syd enhances security by enforcing stricter
validation on symbolic links within _/proc/pid/fd_, _/proc/pid/cwd_,
_/proc/pid/exe_, and _/proc/pid/root_, addressing potential misuse in
container escape scenarios. Specifically, Syd returns an *EACCES*
("Permission denied") _errno_(3) for attempts to resolve these symbolic
links if they do not pertain to the _current process_, akin to
implementing *RESOLVE_NO_MAGICLINKS* behavior of the _openat2_(2) system
call. This measure effectively hardens the sandbox against attacks
exploiting these links to access resources outside the intended
confinement, bolstering the isolation provided by Syd and mitigating
common vectors for privilege escalation and sandbox escape. As of
version 3.14.5, Syd keeps intercepting path system calls even if
sandboxing is off, making this protection unconditional.
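
The rule can be pictured with a small sketch. The regular expression
and function name below are assumptions for illustration; they only
cover the four link kinds named above and ignore details such as
per-thread directories:

```python
import errno
import re

# Matches /proc/<pid>/fd/<n>, /proc/<pid>/cwd, /proc/<pid>/exe and
# /proc/<pid>/root; other paths are not treated as magic links here.
_MAGIC_LINK = re.compile(r"^/proc/(\d+)/(?:fd/\d+|cwd|exe|root)$")

def magiclink_errno(path: str, caller_pid: int) -> int:
    """Return 0 if resolution may proceed, errno.EACCES otherwise."""
    match = _MAGIC_LINK.match(path)
    if match is None:
        return 0  # not a magic link
    if int(match.group(1)) == caller_pid:
        return 0  # link belongs to the calling process
    return errno.EACCES
```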

## Trusted Symbolic Links

As of version 3.37.2, Syd implements a robust symbolic-link hardening
mechanism that intercepts every _symlink_(7) resolution within untrusted
directories -- those marked world-writable, group-writable, or carrying
the sticky bit -- and denies any follow operation, returning *EACCES*
("Permission denied"); this behavior mirrors GrSecurity's
*CONFIG_GRKERNSEC_LINK* and guarantees that symlink chains in shared or
temporary locations cannot be weaponized for TOCTOU or link-trick
exploits. Under the default policy, neither direct nor nested symlinks
in untrusted paths will be traversed, and the check is applied at the
_seccomp_(2) interception layer prior to any mutable state changes --
ensuring an early, fail-close enforcement. Administrators may relax this
restriction at startup or runtime by enabling the
_trace/allow_unsafe_symlinks:1_ option, which restores legacy symlink
behavior for compatibility at the cost of re-exposing potential
link-based race vulnerabilities. Refer to the following links for more
information:

- https://wiki.gentoo.org/wiki/Hardened/Grsecurity2_Quickstart
- https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Linking_restrictions
- https://xorl.wordpress.com/2010/11/11/grkernsec_link-linking-restrictions/
- https://man7.org/linux/man-pages/man5/proc_sys_fs.5.html
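
The notion of an untrusted directory used above can be expressed on the
mode bits of the directory containing the symbolic link. The following
is a sketch under that reading, with hypothetical helper names; it is
not Syd's code:

```python
import stat

def is_untrusted_dir(mode: int) -> bool:
    """World-writable, group-writable, or sticky directories."""
    return bool(mode & (stat.S_IWOTH | stat.S_IWGRP | stat.S_ISVTX))

def may_follow_symlink(parent_mode: int,
                       allow_unsafe_symlinks: bool = False) -> bool:
    """Decide whether a symlink inside the given directory is followed."""
    return allow_unsafe_symlinks or not is_untrusted_dir(parent_mode)
```

With these definitions, a link inside a directory with mode 1777, such
as a typical /tmp, is not followed unless the restriction is relaxed.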

## Trusted Hardlinks

As of version 3.37.4, Syd introduces a comprehensive _Trusted Hardlinks_
policy to mitigate a class of vulnerabilities stemming from unsafe
hardlink creation, particularly those enabling
time-of-check-to-time-of-use (TOCTOU) exploitation and privilege
escalation in shared filesystem environments. This mitigation enforces
strict constraints on which files may be linked, based on their
visibility, mutability, and privilege-related attributes. A file is
permitted as a hardlink target only if it is accessible for both reading
and writing by the caller, ensuring that immutable or opaque targets
cannot be leveraged in multi-stage attack chains. Furthermore, the file
must be a regular file and must not possess privilege-escalation
enablers such as the set-user-ID bit or a combination of set-group-ID
and group-executable permissions. These checks are performed
preemptively and unconditionally during syscall handling to eliminate
reliance on ambient filesystem state and to maintain integrity under
adversarial conditions. Administrators may relax this policy for
compatibility purposes using the _trace/allow_unsafe_hardlinks:1_ option,
though doing so reintroduces well-documented attack surfaces and
undermines the guarantees provided by Syd's secure execution model.
Refer to the following links for more information:

- https://wiki.gentoo.org/wiki/Hardened/Grsecurity2_Quickstart
- https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Linking_restrictions
- https://xorl.wordpress.com/2010/11/11/grkernsec_link-linking-restrictions/
- https://man7.org/linux/man-pages/man5/proc_sys_fs.5.html
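
The conditions listed in this section can be summarised as a predicate
over the target's mode and the caller's access rights. This is a sketch
with a hypothetical function name; the two booleans stand in for the
access checks Syd performs on behalf of the caller:

```python
import stat

def may_hardlink(mode: int, can_read: bool, can_write: bool) -> bool:
    """Return True if a file qualifies as a hardlink target."""
    if not stat.S_ISREG(mode):
        return False  # only regular files may be linked
    if mode & stat.S_ISUID:
        return False  # set-user-ID files are refused
    if (mode & stat.S_ISGID) and (mode & stat.S_IXGRP):
        return False  # set-group-ID plus group-executable is refused
    return can_read and can_write
```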

## Trusted File Creation

As of version 3.37.4, Syd enforces a strict _Trusted File Creation_
policy designed to mitigate longstanding race-condition vulnerabilities
associated with unprivileged use of *O_CREAT* in shared or adversarial
environments. Building upon the Linux kernel's _protected_fifos_ and
_protected_regular_ sysctls -- as well as the stricter semantics of
grsecurity's *CONFIG_GRKERNSEC_FIFO* -- this mitigation blocks all
*O_CREAT* operations targeting pre-existing FIFOs or regular files
unless the calling process is the file's owner and the file is neither
group-writable nor world-writable, irrespective of the parent
directory's ownership or permissions. Unlike upstream Linux, which
allows certain accesses if the file resides in a directory owned by the
caller, Syd eliminates this dependency to close subtle privilege
boundary gaps and ensure consistent, capability-centric enforcement even
in nested namespace or idmapped mount scenarios. This policy guarantees
that users cannot preempt or hijack file-based IPC or partial writes via
shared directories, while maintaining usability through precise
capability trimming. For compatibility with legacy workloads or
permissive setups, this restriction may be selectively disabled by
setting the _trace/allow_unsafe_create:1_ option, though doing so
reintroduces exposure to well-documented filesystem race attacks.
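
For a pre-existing FIFO or regular file, the ownership and mode test
described above can be sketched as below. The function name and the
treatment of other file types are assumptions made for the example:

```python
import stat

def may_open_creat(mode: int, file_uid: int, caller_uid: int) -> bool:
    """Decide whether O_CREAT may target an already existing file."""
    if not (stat.S_ISREG(mode) or stat.S_ISFIFO(mode)):
        return True  # the policy only covers FIFOs and regular files
    if file_uid != caller_uid:
        return False  # caller must own the file
    # The file must be neither group-writable nor world-writable.
    return not (mode & (stat.S_IWGRP | stat.S_IWOTH))
```

The ownership and permissions of the parent directory do not appear in
the predicate, which reflects the difference from upstream Linux noted
above.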

As of version 3.45.0, Syd extends this policy to deny file creation
through dangling symbolic links as part of its filesystem race
hardening. At the _open_(2) boundary, the presence of *O_CREAT* implicitly
adds *O_NOFOLLOW* unless *O_EXCL* is also specified, so attempts to create
or truncate a path whose final component is a symlink will fail rather
than resolving the link target. This behaviour directly addresses
classes of vulnerabilities where privileged components are tricked into
creating or modifying files behind attacker-controlled symlinks, such as
CVE-2021-28153 in GLib (file creation via dangling symlink replacement)
and repeated symlink- or mount-race attacks in container runtimes:
CVE-2018-15664 (docker cp path traversal via symlink and mount races),
CVE-2019-16884 (runc bind-mount escape through user-controlled symlinked
host paths), CVE-2021-30465 (runc container escape via crafted /proc and
mount races), CVE-2025-31133 (runc maskedPath abuse to obtain writable
procfs bindings), CVE-2025-52565 (runc /dev/console bind-mount symlink
races leading to writable procfs targets), and CVE-2025-52881 (runc
redirected writes bypassing LSM enforcement to arbitrary procfs files).
By enforcing fail-closed semantics for all *O_CREAT* operations that
encounter symlinks, Syd reduces the attack surface for these patterns
even when higher-level code assumes symbolic links cannot influence file
creation. Refer to the following links for more information:

- https://wiki.gentoo.org/wiki/Hardened/Grsecurity2_Quickstart
- https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#FIFO_restrictions
- https://xorl.wordpress.com/2010/11/24/grkernsec_fifo-named-pipe-restrictions/
- https://man7.org/linux/man-pages/man5/proc_sys_fs.5.html
- https://nvd.nist.gov/vuln/detail/CVE-2021-28153
- https://github.com/advisories/GHSA-9hh6-p5c5-mmmf
- https://nvd.nist.gov/vuln/detail/CVE-2018-15664
- https://nvd.nist.gov/vuln/detail/CVE-2019-16884
- https://nvd.nist.gov/vuln/detail/CVE-2021-30465
- https://nvd.nist.gov/vuln/detail/CVE-2025-31133
- https://nvd.nist.gov/vuln/detail/CVE-2025-52565
- https://nvd.nist.gov/vuln/detail/CVE-2025-52881
- https://www.openwall.com/lists/oss-security/2025/11/05/3
- https://github.com/opencontainers/runc/security
- https://www.starlab.io/blog/linux-symbolic-links-convenient-useful-and-a-whole-lot-of-trouble
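
The flag rewrite introduced in version 3.45.0 can be illustrated with
the constants from Python's _os_ module. The helper name is
hypothetical, and the sketch only shows the flag arithmetic:

```python
import os

def harden_open_flags(flags: int) -> int:
    """Add O_NOFOLLOW to O_CREAT opens unless O_EXCL is present."""
    if (flags & os.O_CREAT) and not (flags & os.O_EXCL):
        flags |= os.O_NOFOLLOW
    return flags
```

With *O_NOFOLLOW* set, an _open_(2) whose final path component is a
symbolic link fails instead of creating or truncating the link target.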

## Memory-Deny-Write-Execute Protections

Syd version 3.14.1 enhances its security framework by implementing
Memory-Deny-Write-Execute (MDWE) protections, aligning with the *PR_SET_MDWE*
and *PR_MDWE_REFUSE_EXEC_GAIN* functionality introduced in Linux kernel 6.3.
This feature establishes a stringent policy against creating memory mappings
that are _simultaneously writable and executable_, closely adhering to the
executable space protection mechanisms inspired by the PaX project. In addition,
Syd fortifies these MDWE protections by employing kernel-level seccomp filters
on critical system calls, including _mmap_(2), _mmap2_(2), _mprotect_(2),
_pkey_mprotect_(2), and _shmat_(2). These filters are designed to intercept and
restrict operations that could potentially contravene MDWE policies, such as
attempts to make non-executable memory mappings executable or to map shared
memory segments with executable permissions. By integrating *PR_SET_MDWE*
for preemptive kernel enforcement and utilizing seccomp filters for
granular, kernel-level control over system call execution, Syd provides
a robust defence mechanism against exploitation techniques that exploit
memory vulnerabilities, thereby ensuring a securely hardened execution
environment. This restriction may be relaxed using the
_trace/allow_unsafe_exec_memory:1_ sandbox command at startup. Even
with this restriction relaxed, Syd is going to call *PR_SET_MDWE*, but it
will use the *PR_MDWE_NO_INHERIT* flag to prevent propagation of the MDWE
protection to child processes on _fork_(2).
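
The two conditions the filters look for can be written as predicates
over the protection bits. This is a sketch with hypothetical function
names, using the Linux values of the *PROT_* constants; the actual
filters are seccomp rules, not Python:

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4  # Linux values

def mmap_violates_wx(prot: int) -> bool:
    """True if a new mapping would be writable and executable at once."""
    return bool(prot & PROT_WRITE) and bool(prot & PROT_EXEC)

def mprotect_gains_exec(old_prot: int, new_prot: int) -> bool:
    """True if a non-executable mapping would become executable."""
    return bool(new_prot & PROT_EXEC) and not (old_prot & PROT_EXEC)
```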

As of version 3.25.0, Syd kills the process on memory errors rather than
denying these system calls with *EACCES* ("Permission denied"). This
ensures the system administrator gets a notification via _dmesg_(1), and
has a higher chance to react soon to investigate potentially malicious
activity. In addition, repeated failures are going to trigger SegvGuard.

As of version 3.37.0, Syd addresses a fundamental architectural vulnerability in
the Linux kernel's Memory-Deny-Write-Execute (MDWE) implementation through
proactive file descriptor writability assessment during memory mapping
operations. This enhancement directly mitigates Linux kernel bug 219227, which
exposes a critical W^X enforcement bypass wherein adversaries can circumvent
memory protection mechanisms by exploiting the semantic disconnect between
file-backed memory mappings and their underlying file descriptors. The
vulnerability manifests when executable memory regions are mapped with
*PROT_READ|PROT_EXEC* permissions from file descriptors that retain _write
access_, enabling post-mapping modification of executable memory content
through standard file I/O operations -- effectively transforming read-only
executable mappings into mutable code regions that violate fundamental
W^X invariants. By implementing mandatory writability validation prior
to permitting any file-backed executable memory mapping, Syd enforces
strict temporal isolation between memory mapping permissions and
underlying file descriptor capabilities, thereby preventing the
exploitation of this kernel-level abstraction leakage that would
otherwise enable arbitrary code injection through seemingly benign file
operations. This defense mechanism operates at the syscall interception
layer, providing comprehensive protection against sophisticated memory
corruption attacks that leverage the incongruity between virtual memory
management and file system semantics to achieve unauthorized code
execution within ostensibly hardened environments. This restriction may
be relaxed using the _trace/allow_unsafe_exec_memory:1_ sandbox
command at startup.
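
The writability check added in version 3.37.0 can be sketched as
follows, assuming the open flags of the file descriptor are known (for
example from _fcntl_(2) with *F_GETFL*). The function name is
hypothetical:

```python
import os

PROT_EXEC = 0x4  # Linux value

def exec_mapping_allowed(prot: int, fd_open_flags: int) -> bool:
    """Permit executable file mappings only from read-only descriptors."""
    if not (prot & PROT_EXEC):
        return True  # non-executable mappings are not affected
    return (fd_open_flags & os.O_ACCMODE) == os.O_RDONLY
```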

## Advanced Memory Protection Mechanisms

Syd version 3.15.1 enhances its security framework by integrating a
sophisticated seccomp BPF hook to meticulously block
_executable+shared_ memory mappings, targeting a critical vulnerability
exploitation pathway. As of version 3.21.3, Syd also blocks
_executable+anonymous_ memory. These updates refine the sandbox's
defence against unauthorised memory access and arbitrary code execution
by inspecting and filtering system calls, notably _mmap_(2), and
_mmap2_(2), to enforce stringent policies against dangerous memory
mapping combinations. While this bolstered security measure
significantly reduces the attack surface for exploits like buffer
overflows and code injections, it acknowledges potential legitimate use
cases, such as Just-In-Time (JIT) compilation and plugin architectures,
that may require exceptions. To accommodate necessary exceptions without
compromising overall security, Syd allows these restrictions to be
relaxed with explicit configuration through the
_trace/allow_unsafe_exec_memory:1_ command, ensuring that users can
fine-tune the balance between security and functionality according to
specific requirements, with a keen eye on preventing the propagation of
relaxed security settings to child processes.
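
The combinations blocked by this hook can be sketched with the
constants from Python's _mmap_ module. The function name is
hypothetical, and the boolean parameter stands in for the
_trace/allow_unsafe_exec_memory:1_ setting:

```python
import mmap

def exec_mapping_denied(prot: int, flags: int,
                        allow_unsafe_exec_memory: bool = False) -> bool:
    """True for executable+shared or executable+anonymous mappings."""
    if allow_unsafe_exec_memory or not (prot & mmap.PROT_EXEC):
        return False
    return bool(flags & (mmap.MAP_SHARED | mmap.MAP_ANONYMOUS))
```

A private, file-backed executable mapping, as used for loading shared
libraries, passes this check.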

## Null Address Mapping Prevention

In our ongoing effort to enhance the security features of Syd, as of
version 3.15.1 we introduced a crucial update inspired by the practices
of HardenedBSD, specifically aimed at bolstering our sandbox's defences
against null pointer dereference vulnerabilities. Following the model
set by HardenedBSD, Syd now includes a new security measure that
completely prohibits the mapping of memory at the NULL address using the
_mmap_(2) and _mmap2_(2) system calls with the *MAP_FIXED* and
*MAP_FIXED_NOREPLACE* flags. This is implemented with seccomp filter
rules that block these mapping requests when the first argument (addr)
is zero: the respective system call is denied with *EACCES*
("Permission denied"), which makes attempts to exploit null pointer
dereferences non-viable. By disallowing the execution of
arbitrary code at the NULL address, Syd significantly reduces the attack
surface associated with such vulnerabilities, reinforcing the sandbox's
commitment to providing a robust security framework for Linux systems.
This technical enhancement reflects our dedication to leveraging
advanced security insights from the broader community, embodying our
proactive stance on safeguarding against evolving threats.

Linux has _vm/mmap_min_addr_ which guards against this already. Hence,
this acts as a second layer of defense. Unlike Syd, Linux allows
processes with the *CAP_SYS_RAWIO* capability to edit/override this
value. As of version 3.37.0, Syd caps this value at page size like
OpenBSD does for added hardening against such edits.

As of version 3.25.0, all addresses lower than the value of
_vm/mmap_min_addr_ at Syd startup are included in the seccomp filter,
and the action of the filter is set to kill the process rather than deny
with *EACCES*. This ensures the system administrator gets a notification via
_dmesg_(1), and has a higher chance to react soon to investigate
potentially malicious activity. In addition, repeated failures are going
to trigger SegvGuard.
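
The resulting decision can be sketched as below. The flag values are
the Linux x86-64 ones, the default of 65536 is only a common value of
_vm/mmap_min_addr_, and the assumption that the low-address rule still
applies to fixed mappings only is ours:

```python
MAP_FIXED = 0x10                # Linux value
MAP_FIXED_NOREPLACE = 0x100000  # Linux value

def mmap_addr_action(addr: int, flags: int, min_addr: int = 65536) -> str:
    """Return 'kill' for fixed mappings below min_addr, else 'allow'."""
    fixed = flags & (MAP_FIXED | MAP_FIXED_NOREPLACE)
    if fixed and addr < min_addr:
        return "kill"
    return "allow"
```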

## Default Memory Allocator Security Enhancement

As of version 3.46.0, Syd has transitioned to using the GrapheneOS
allocator as its default memory allocator. This new allocator leverages
modern hardware capabilities to provide substantial defenses against
common vulnerabilities like heap memory corruption, while reducing the
lifetime of sensitive data in memory. While the previously used mimalloc
with the secure option offered notable security improvements, the
GrapheneOS allocator goes further with features like out-of-line
metadata protection, fine-grained randomization, and aggressive
consistency checks. It incorporates advanced techniques such as hardware
memory tagging for probabilistic detection of use-after-free errors,
zero-on-free with write-after-free detection, and randomized quarantines
to mitigate use-after-free vulnerabilities. The allocator is designed to
prevent traditional exploitation methods by introducing high entropy,
random base allocations across multiple memory regions, and offers a
portable solution being adopted by other security-focused operating
systems like Secureblue. It also heavily influenced the next-generation
musl malloc implementation, improving security with minimal memory
usage. Refer to the following links for more information:

- https://grapheneos.org/features#exploit-mitigations
- https://github.com/GrapheneOS/hardened_malloc

## Enhanced Security for Memory File Descriptors

In version 3.21.1, Syd significantly enhanced its security posture by
introducing restrictions on memory file descriptors (memfds). The
_memfd_create_(2) system call is now sandboxed under Create sandboxing,
with the name argument prepended with _!memfd:_ before access checks.
This allows administrators to globally deny access to memfds using rules
like _deny/create+!memfd:\*_. Additionally, the _memfd_secret_(2) system
call, which requires the _secretmem.enable=1_ boot option and is seldom
used, was denied to prevent potential exploits. Despite file I/O being
restricted on secret memfds, they could be abused by attackers to write
payloads and map them as executable, thus bypassing denylisted code
execution controls.

Building on these changes, version 3.21.2 further fortifies security by
making memfds non-executable by default. This is achieved by removing
the *MFD_EXEC* flag and adding the *MFD_NOEXEC_SEAL* flag to
_memfd_create_(2), ensuring memfds cannot be made executable. Notably,
the *MFD_NOEXEC_SEAL* flag requires Linux 6.3 or newer to function.
These measures collectively mitigate the risk of memfd abuse, which can
involve executing malicious code within a sandbox, circumventing
security mechanisms like Exec, Force, and TPE sandboxing. For scenarios
where executable or secret memfds are genuinely required, the
_trace/allow_unsafe_memfd:1_ option allows for relaxing these
restrictions, though it introduces increased security risks. By default,
these enhancements enforce a robust security posture, preventing
attackers from leveraging memfds as a vector for unauthorized code
execution.
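
Both steps, the name prefix used for access checks and the flag
rewrite, can be sketched as follows. The helper names are hypothetical
and Python's _fnmatch_ only approximates Syd's glob matching; the flag
values are those of the Linux UAPI headers:

```python
import fnmatch

MFD_NOEXEC_SEAL = 0x0008  # Linux value
MFD_EXEC = 0x0010         # Linux value

def memfd_sandbox_path(name: str) -> str:
    """Name as seen by Create sandboxing: prefixed with '!memfd:'."""
    return "!memfd:" + name

def memfd_denied(name: str, deny_globs: list) -> bool:
    """True if any deny pattern matches the prefixed name."""
    path = memfd_sandbox_path(name)
    return any(fnmatch.fnmatchcase(path, glob) for glob in deny_globs)

def harden_memfd_flags(flags: int) -> int:
    """Drop MFD_EXEC and add MFD_NOEXEC_SEAL."""
    return (flags & ~MFD_EXEC) | MFD_NOEXEC_SEAL
```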

## Path Masking

Introduced in version 3.16.7, the _Path Masking_ feature in Syd enhances
security by enabling the obfuscation of file contents without denying
access to the file itself. This functionality is critical in scenarios
where compatibility requires file presence, but not file readability.
Path Masking works by redirecting any attempt to _open_(2) a specified
file to the character device _/dev/null_, effectively presenting an
empty file to the sandboxed process. The original file metadata remains
unchanged, which is essential for applications that perform operations
based on this data. Moreover, masked files can still be executed,
providing a seamless integration where executability is required but
content confidentiality must be preserved.

This feature leverages _glob_(3p) patterns to specify which files to
mask, allowing for flexible configuration tailored to diverse security
needs. By default, Syd masks sensitive paths such as _/proc/cmdline_ to
prevent the leakage of potentially sensitive boot parameters, aligning
with Syd's security-first design philosophy. Path Masking is a robust
security enhancement that minimises the risk of sensitive data exposure
while maintaining necessary system functionality and compliance with
expected application behaviors.
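
The redirection can be pictured with a short sketch. The function name
is hypothetical, and Python's _fnmatch_ is used as a stand-in for the
glob matching that Syd performs:

```python
import fnmatch

def open_target(path: str, mask_globs: list) -> str:
    """Return the path that is actually opened for a given request."""
    if any(fnmatch.fnmatchcase(path, glob) for glob in mask_globs):
        return "/dev/null"  # masked: contents appear empty
    return path
```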

## Refined Socket System Call Enforcement

In Syd version 3.16.12, we have strengthened the enforcement of socket system
call restrictions within the sandbox using kernel-level BPF filters. This
enhancement builds upon existing features by embedding these controls directly
into the Syd process, ensuring that even if Syd is compromised, it cannot
utilise or manipulate denied socket domains. This proactive measure restricts
socket creation strictly to permitted domains such as UNIX (*AF_UNIX*), IPv4
(*AF_INET*), and IPv6 (*AF_INET6*), significantly reducing the network attack
surface. The _trace/allow_unsupp_socket:1_ option allows for the extension of
permissible socket domains, catering to specific needs but potentially
increasing exposure risks. Additionally, _trace/allow_safe_kcapi:1_ enables access
to the Kernel Crypto API, facilitating necessary cryptographic operations
directly at the kernel level. These enhancements provide a more secure and
configurable environment, allowing administrators precise control over network
interactions and improving the overall security posture of the sandbox.
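
The domain allow-list can be sketched as a simple predicate. The
function name is hypothetical, the boolean parameters stand in for the
two options named above, and the sketch assumes that Kernel Crypto API
access corresponds to *AF_ALG* sockets; the real enforcement is a BPF
filter on the socket system call:

```python
import socket

AF_ALG = 38  # Linux value, the socket domain of the Kernel Crypto API

DEFAULT_DOMAINS = {socket.AF_UNIX, socket.AF_INET, socket.AF_INET6}

def socket_domain_allowed(domain: int,
                          allow_unsupp_socket: bool = False,
                          allow_safe_kcapi: bool = False) -> bool:
    """Decide whether a socket of the given domain may be created."""
    if domain in DEFAULT_DOMAINS or allow_unsupp_socket:
        return True
    return allow_safe_kcapi and domain == AF_ALG
```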

## Enhanced Execution Control (EEC)

The Enhanced Execution Control (EEC) feature, introduced in Syd version
3.17.0, represents a significant advancement in the sandbox's defence
mechanisms. This feature strategically disables the _execve_(2) and
_execveat_(2) system calls for the Syd process after they are no longer
required for executing the sandbox process, thus safeguarding against
their potential abuse by a compromised Syd process. The prohibition of
these critical system calls adds a robust layer to the existing
Memory-Deny-Write-Execute (MDWE) protections, intensifying the system's
defences against exploit techniques such as code injection or
return-oriented programming (ROP). Concurrently, EEC ensures that the
_ptrace_(2) syscall is limited following the initial use of the
*PTRACE_SEIZE* call for execution-related mitigations. This action
effectively prevents subsequent system trace operations, barring
unauthorised process attachments and further securing the system against
manipulation. Together, these measures enhance Syd's security
architecture, reflecting an ongoing commitment to implement rigorous,
state-of-the-art safeguards within the execution environment.

As of version 3.17.1, the Enhanced Execution Control (EEC) has been
further strengthened by integrating _mprotect_(2) hardening mechanisms
specifically targeting the prevention of the _ret2mprotect_ exploitation
technique. This enhancement blocks attempts to alter memory protections
to executable (using the *PROT_EXEC* flag) via the _mprotect_(2) and
_pkey_mprotect_(2) system calls. By adding these checks, EEC mitigates the
risk associated with compromised Syd processes by enforcing stringent
memory operation policies that prevent unauthorised memory from becoming
executable, thereby countering sophisticated memory corruption attacks
such as return-oriented programming (ROP) and other code injection
strategies. This proactive security measure is crucial for maintaining
the integrity of the sandbox environment, ensuring that Syd continues to
offer robust protection against evolving exploit techniques.

As of version 3.23.9, the Enhanced Execution Control (EEC) feature has
been expanded to mitigate Sigreturn Oriented Programming (SROP) attacks
by denying access to the system calls _sigreturn_(2) and
_rt_sigreturn_(2) for _syd_(1), _syd-oci_(1), and _syd-tor_(1). Given
the lack of signal handlers, these system calls have no legitimate use.
By preventing these calls, the system is better protected against SROP
attacks, which involve manipulating signal handler frames to control
program state, thus significantly enhancing the security of the
execution environment. For further reading, refer to section 2.4.4,
"Sigreturn-oriented programming", in the Low-Level Software Security
book
(https://llsoftsec.github.io/llsoftsecbook/#sigreturn-oriented-programming).
SROP (Bosman and Bos 2014) is a special case of ROP where the
attacker creates a fake signal handler frame and calls _sigreturn_(2), a
system call on many UNIX-type systems normally called upon return from a
signal handler, which restores the state of the process based on the
state saved on the signal handler's stack by the kernel previously. The
ability to fake a signal handler frame and call sigreturn gives an
attacker a simple way to control the state of the program.

## Enhanced execve and execveat Syscall Validation

As of version 3.24.2, security enhancements to _execve_(2) and
_execveat_(2) syscalls have been introduced to thwart simple
Return-Oriented Programming (ROP) attacks. Per the Linux _execve_(2)
manpage: "On Linux, argv and envp can be specified as NULL. In both
cases, this has the same effect as specifying the argument as a pointer
to a list containing a single null pointer. _Do not take advantage of
this nonstandard and nonportable misfeature!_ On many other UNIX
systems, specifying argv as NULL will result in an error (*EFAULT*: "Bad
address"). Some other UNIX systems treat the envp==NULL case the same as
Linux." Based on this guidance, Syd now rejects _execve_(2) and
_execveat_(2) with *EFAULT* when one of the pathname, argv and envp
arguments is NULL.  This mitigation targets basic ROP chains where NULL
pointers are used as placeholders to bypass argument validation checks,
a common tactic in exploiting buffer overflow vulnerabilities. For
example, a typical ROP chain trying to execute _execve_(2) with argv and
envp set to NULL would be intercepted and denied under these rules:

```
0x0000:         0x40ee2b pop rdx; ret
0x0008:              0x0 [arg2] rdx = 0
0x0010:         0x402885 pop rsi; ret
0x0018:              0x0 [arg1] rsi = 0
0x0020:         0x4013cc pop rdi; ret
0x0028:         0x460000 [arg0] rdi = 4587520
0x0030:         0x438780 execve
```

An attacker might circumvent this mitigation by ensuring that none of
the critical syscall arguments are NULL. This requires a more
sophisticated setup in the ROP chain, potentially increasing the
complexity of the exploit and reducing the number of vulnerable targets.
This focused security measure enhances system resilience against simple
ROP exploits while maintaining compliance with POSIX standards,
promoting robustness and cross-platform security.

As of version 3.25.0, Syd terminates the process upon entering these
system calls with NULL arguments rather than denying them with *EFAULT*.
This ensures the system administrator gets a notification via the kernel
audit log, i.e. _dmesg_(1), about potentially malicious activity. In
addition, repeated failures are going to trigger SegvGuard.
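
The check itself is small. The sketch below models the three pointer
arguments as integers and uses a hypothetical function name; the
returned strings merely label the outcome described above:

```python
def execve_action(pathname_ptr: int, argv_ptr: int, envp_ptr: int) -> str:
    """Return 'kill' if any of the three pointers is NULL, else 'continue'."""
    if 0 in (pathname_ptr, argv_ptr, envp_ptr):
        return "kill"
    return "continue"
```

Applied to the chain shown earlier, where rdi is 0x460000 and both rsi
and rdx are zero, the result is _kill_.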

We have verified the same issue is also present on HardenedBSD and
notified upstream:
- Issue: https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/issues/106
- Fix: https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/cd93be7afbcfd134b45b52961fc9c6907984c85f

## Securebits and Kernel-Assisted Executability

As of version 3.41.0, Syd initializes the per-thread securebits in a
kernel-cooperative manner. On Linux 6.14 and newer, which provide the
executability-check interface (_execveat_(2) with *AT_EXECVE_CHECK*) and
the corresponding interpreter self-restriction securebits, Syd first
attempts to install a comprehensive securebits configuration (with
locks) that hardens capability semantics and execution constraints. If
the kernel refuses the change due to missing privilege (e.g.
*CAP_SETPCAP* not present) and returns *EPERM* ("Operation not
permitted"), Syd deterministically degrades to the unprivileged,
interpreter-facing policy only, thereby enabling and locking a
file-descriptor-based executability check and prohibiting interactive
snippet execution unless the same kernel probe passes. On older kernels,
the secure-exec policy setup is treated as a no-op and startup proceeds
without altering executability behavior.

This initialization is inherited across forks and execs (with the kernel
rule that the _keep capabilities_ base flag is cleared on exec), is
orthogonal to the _no_new_privs_ attribute, and is designed to be
monotonic and predictable under mixed-privilege and mixed-kernel
deployments: unsupported features are ignored, permission failures do
not abort startup, and the resulting state is the strongest policy the
kernel will accept.

Users may opt out of these defaults per deployment by setting
_trace/allow_unsafe_exec_script:1_ to skip the script/file vetting
policy, _trace/allow_unsafe_exec_interactive:1_ to allow interactive
interpreter inputs again, _trace/allow_unsafe_exec_null:1_ to permit
legacy exec with NULL argv/envp as described in the previous subsection,
or _trace/allow_unsafe_cap_fixup:1_ to preserve traditional
UID/capability-fixup semantics. Refer to the following links for more
information:

- https://docs.kernel.org/userspace-api/check_exec.html
- https://man7.org/linux/man-pages/man2/execveat.2.html
- https://man7.org/linux/man-pages/man7/capabilities.7.html
- https://man7.org/linux/man-pages/man2/prctl.2.html
- https://man7.org/linux/man-pages/man2/pr_set_securebits.2const.html
- https://www.man7.org/linux/man-pages/man2/PR_SET_KEEPCAPS.2const.html

## Enhanced Path Integrity Measures

As of version 3.17.4, Syd incorporates crucial enhancements to maintain
the integrity of file system paths by systematically denying and masking
paths that contain control characters. These modifications are essential
for preventing the exploitation of terminal-based vulnerabilities and
for maintaining robustness in logging activities. Paths identified with
control characters are not only denied during sandbox access check but
are also sanitized when logged to ensure that potentially harmful data
does not compromise log integrity or facilitate inadvertent security
breaches. Such measures underscore Syd's ongoing commitment to
fortifying security by adhering to rigorous, up-to-date standards for
handling untrusted input efficiently.

As of version 3.18.6, this restriction can be relaxed by using the
setting _trace/allow_unsafe_filename:1_. This setting may be toggled
from within the sandbox during runtime prior to locking the sandbox.

As of version 3.28.0, Syd has enhanced its path integrity measures by
incorporating an implementation based on David A. Wheeler's Safename
Linux Security Module (LSM) patches. This update not only prevents the
creation of filenames containing potentially harmful characters but also
hides existing files with such names. Invalid filenames are now denied
with an *EILSEQ* ("Illegal byte sequence") _errno_(3) when necessary. In
alignment with Wheeler's recommendations on restricting dangerous
filenames, the validation now enforces stricter rules:

- *Control Characters*: Filenames containing control characters (bytes 0x00–0x1F and 0x7F) are denied.
- *UTF-8 Encoding*: Filenames must be valid UTF-8 sequences.
- *Forbidden Characters*: The following characters are disallowed in
  filenames as they may interfere with shell operations or be
  misinterpreted by programs: \*, ?, [, ], ", <, >, |, (, ), &, ', !, \\, ;, $, and `.
- *Leading Characters*: Filenames cannot start with a space ( ), dash (-), or tilde (~).
- *Trailing Characters*: Filenames cannot end with a space ( ).

As of version 3.37.9, space checks have been extended to cover UTF-8
whitespace, thanks to an idea by Jacob Bachmeyer. See
https://seclists.org/oss-sec/2025/q3/123 for more information.

As of version 3.38.0, the characters :, {, and } have been removed from
the forbidden set to improve usability and reduce false positives. The
colon is commonly used in paths under /dev and /proc, and braces are
used by _firefox_(1) for filenames under the profile directory.

As of version 3.48.0, deny _errno_(3) has been changed from *EINVAL*
("Invalid argument") to *EILSEQ* ("Illegal byte sequence") to match ZFS
behaviour.
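
The filename rules listed above can be sketched in a few lines of
Python. This is an illustrative model only, not Syd's implementation:
the function name _is_safe_filename_ is hypothetical, rejecting the
empty name is an assumption, and Python's _str.isspace()_ is used as an
approximation of the UTF-8 whitespace check, whose exact character set
is not specified here.

```python
# Characters listed under "Forbidden Characters" above.
FORBIDDEN = set('*?[]"<>|()&\'!\\;$`')
# Characters a filename must not start with.
LEADING = set(" -~")


def is_safe_filename(name: bytes) -> bool:
    """Return True if a single path component passes the listed rules."""
    if not name:
        return False  # assumption: the empty name is rejected in this sketch
    # Control characters: bytes 0x00-0x1F and 0x7F.
    if any(b < 0x20 or b == 0x7F for b in name):
        return False
    # Filenames must be valid UTF-8.
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    if any(ch in FORBIDDEN for ch in text):
        return False
    # Leading space, dash or tilde; isspace() approximates UTF-8 whitespace.
    if text[0] in LEADING or text[0].isspace():
        return False
    # Trailing whitespace.
    if text[-1].isspace():
        return False
    return True
```

In this model a caller would map a _False_ result to the *EILSEQ*
_errno_(3) described above.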

These measures mitigate security risks associated with malicious
filenames by ensuring that both new and existing filenames adhere to
stringent validation rules. This enhancement strengthens overall system
robustness by preventing potential exploitation through untrusted input
in file operations. For more information, refer to the following links:

- https://dwheeler.com/essays/fixing-unix-linux-filenames.html
- https://lwn.net/Articles/686021/
- https://lwn.net/Articles/686789/
- https://lwn.net/Articles/686792/

## Device Sidechannel Mitigations

As of version 3.21.0, Syd's device sidechannel mitigations align
closely with *GRKERNSEC_DEVICE_SIDECHANNEL* in Grsecurity, aiming to
prevent timing analyses on block or character devices via _stat_(2) or
_inotify_(7)/_fanotify_(7). For stat-family system calls, Syd, like
Grsecurity, matches the last access and modification times to the
creation time for devices, thwarting timing attacks by unprivileged
users. For notifications, instead of dropping events, Syd strips the
access and modify flags from _fanotify_(7)/_inotify_(7) requests at
syscall entry, so that the unsafe events are never generated. This
prevents unauthorized users from inferring sensitive information, such
as the length of the administrator password, without compromising
functionality. As of version 3.40.0, these mitigations can be disabled
individually using the options _trace/allow_unsafe_stat_bdev_,
_trace/allow_unsafe_stat_cdev_, _trace/allow_unsafe_notify_bdev_, and
_trace/allow_unsafe_notify_cdev_. Refer to the following links for more
information:

- https://web.archive.org/web/20130111093624/http://vladz.devzero.fr/013_ptmx-timing.php
- https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Eliminate_stat/notify-based_device_sidechannels
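
The two transformations described in this section can be modelled as
follows. This is a sketch under stated assumptions, not Syd's code: the
_Times_ record and both function names are hypothetical, the _ctime_
field stands in for what the text calls the creation time, and only the
_inotify_(7) bits *IN_ACCESS* and *IN_MODIFY* are shown. The check that
the watched path is a device is omitted from the mask helper.

```python
import stat
from dataclasses import dataclass, replace

IN_ACCESS = 0x00000001  # value from <sys/inotify.h>
IN_MODIFY = 0x00000002  # value from <sys/inotify.h>


@dataclass(frozen=True)
class Times:
    """Hypothetical subset of a stat result."""
    mode: int
    atime: float
    mtime: float
    ctime: float


def scrub_device_times(st: Times) -> Times:
    """For block/character devices, report ctime as atime and mtime."""
    if stat.S_ISBLK(st.mode) or stat.S_ISCHR(st.mode):
        return replace(st, atime=st.ctime, mtime=st.ctime)
    return st


def strip_notify_mask(mask: int) -> int:
    """Drop the access and modify bits from a watch mask."""
    return mask & ~(IN_ACCESS | IN_MODIFY)
```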

## Restricting CPU Emulation System Calls

As of version 3.22.1, Syd denies the _modify_ldt_(2), _subpage_prot_(2),
_switch_endian_(2), _vm86_(2), and _vm86old_(2) system calls by default,
which are associated with CPU emulation functionalities. These calls can
only be allowed if the _trace/allow_unsafe_cpu_ option is explicitly
set. This restriction helps mitigate potential vulnerabilities and
unauthorized access that can arise from modifying CPU state or memory
protections, thus strengthening the overall security posture of the
sandbox environment.

## Kernel Keyring Access Restriction

To enhance system security, access to the kernel's key management
facility via the _add_key_(2), _keyctl_(2), and _request_key_(2) system
calls is restricted by default as of version 3.22.1. These calls are
crucial for managing keys within the kernel, enabling operations such as
adding keys, manipulating keyrings, and requesting keys. The restriction
aims to prevent unauthorized or potentially harmful modifications to
keyrings, ensuring that only safe, controlled access is permitted.
However, administrators can relax this restriction by enabling the
_trace/allow_unsafe_keyring_ option, allowing these system calls to be
executed when necessary for legitimate purposes.

Because of this restriction, Syd is not affected by CVE-2024-42318
although we use Landlock. See here for more information:
https://www.openwall.com/lists/oss-security/2024/08/17/2

## Restricting Memory Protection Keys System Calls

As of version 3.22.1, Syd denies the system calls _pkey_alloc_(2),
_pkey_free_(2), and _pkey_mprotect_(2) by default. These system calls
are associated with managing memory protection keys, a feature that can
be leveraged to control memory access permissions dynamically. To allow
these system calls, administrators can enable the
_trace/allow_unsafe_pkey_ option. This restriction enhances security by
preventing unauthorized or potentially harmful manipulations of memory
access permissions within the sandbox environment, ensuring stricter
control over memory protection mechanisms.

## Restricting vmsplice System Call

As of version 3.23.5, Syd disables the _vmsplice_(2) system call by
default. This system call has been identified as a potential vector for
memory corruption and privilege escalation, which makes it risky in
sandboxed environments. Disabling _vmsplice_(2) by default reduces the
attack surface and matches the practice of other systems such as
Podman. Refer to the following links for more information:

- https://lore.kernel.org/linux-mm/X+PoXCizo392PBX7@redhat.com/
- https://lwn.net/Articles/268783/

As of version 3.41.3, the _vmsplice_(2) system call may be permitted at
startup using the _trace/allow_unsafe_vmsplice:1_ option.

## Enforcing Position-Independent Executables (PIE)

As of version 3.23.9, Syd mandates that all executables must be
Position-Independent Executables (PIE) to leverage Address Space Layout
Randomization (ASLR). PIE allows executables to be loaded at random
memory addresses, significantly enhancing security by making it more
difficult for attackers to predict the location of executable code. This
randomization thwarts various types of exploits, such as buffer overflow
attacks, which rely on predictable memory addresses to execute malicious
code. To accommodate scenarios where PIE is not feasible, users can
relax this restriction using the _trace/allow_unsafe_exec_nopie:1_
option. This ensures compatibility while maintaining a robust security
posture by default, aligning with Syd's overarching strategy of
employing advanced security measures to mitigate potential attack
vectors.
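
As a rough illustration of how an ELF image can be classified, the
sketch below reads the _e_type_ field of the ELF header. It is not
Syd's check, which may consider more than this field; the function names
are hypothetical. Note that *ET_DYN* is shared between PIE executables
and shared objects, so this test alone cannot tell them apart.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    # e_ident[EI_DATA] at offset 5: 1 = little endian, 2 = big endian.
    order = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(order + "H", header, 16)
    return e_type


def looks_like_pie(header: bytes) -> bool:
    """True when the image is of type ET_DYN."""
    return elf_type(header) == ET_DYN
```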

## Enforcing Non-Executable Stack

As of version 3.23.16, Syd mandates that all executables must have a
non-executable stack. A non-executable stack helps to prevent exploits
such as stack-based buffer overflow attacks by making it more difficult
for attackers to execute malicious code from the stack. This measure
complements the enforcement of Position-Independent Executables (PIE).
To accommodate scenarios where a non-executable stack is not feasible,
administrators can relax this restriction using the
_trace/allow_unsafe_exec_stack:1_ option.

As of version 3.23.19, Syd enforces this restriction at the _mmap_(2)
boundary as well, so it is no longer possible to _dlopen_(3) a library
with executable stack to change the stack permissions of the process to
executable. This is useful in mitigating attacks such as CVE-2023-38408.
Refer to the URL
https://www.qualys.com/2023/07/19/cve-2023-38408/rce-openssh-forwarded-ssh-agent.txt
for more information. As of version 3.25.0, Syd kills the process in
this case rather than denying the system call, to be consistent with
other memory related seccomp filters. This ensures the system
administrator is notified via the audit log and can promptly
investigate potentially malicious activity. In addition, repeated
failures trigger SegvGuard.
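
Whether an ELF image requests an executable stack is recorded in its
*PT_GNU_STACK* program header. The sketch below inspects that header for
64-bit little-endian images only. It is an illustration, not Syd's
implementation; the function name is hypothetical, and the result for an
image without *PT_GNU_STACK* is left as _None_ because the default in
that case depends on the architecture and kernel.

```python
import struct
from typing import Optional

PT_GNU_STACK = 0x6474E551
PF_X = 0x1  # segment is executable


def stack_is_executable(image: bytes) -> Optional[bool]:
    """Inspect PT_GNU_STACK of an ELF64 little-endian image."""
    if (len(image) < 64 or image[:4] != b"\x7fELF"
            or image[4] != 2 or image[5] != 1):
        raise ValueError("expected an ELF64 little-endian image")
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    for i in range(e_phnum):
        # Each Elf64_Phdr starts with p_type and p_flags (4 bytes each).
        p_type, p_flags = struct.unpack_from(
            "<II", image, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None  # no PT_GNU_STACK header present
```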

## Mitigation against Page Cache Attacks

As of version 3.25.0, Syd denies the _mincore_(2) system call by default,
which is typically not needed during normal operation and has been
successfully (ab)used for page cache attacks: https://arxiv.org/pdf/1901.01161

To quote the *Countermeasures* section of the article:

Our side-channel attack targets the operating system page cache via
operating system interfaces and behavior. Hence, it clearly can be
mitigated by modifying the operating system implementation. *Privileged
Access.* The _QueryWorkingSetEx_ and _mincore_ system calls are the core
of our side-channel attack. Requiring a higher privilege level for these
system calls stops our attack. The downside of restricting access to
these system calls is that existing programs which currently make use of
these system calls might break. Hence, we analyzed how frequently
_mincore_ is called by any of the software running on a typical Linux
installation. We used the Linux _perf_ tools to measure over a 5 hour
period whenever the _sys_enter_mincore_ system call is called by any
application. During these 5 hours a user performed regular operations on
the system, i.e., running various work-related tools like Libre Office,
gcc, Clion, Thunderbird, Firefox, Nautilus, and Evince, but also
non-work-related tools like Spotify. The system was also running regular
background tasks during this time frame. Surprisingly, the
_sys_enter_mincore_ system call was not called a single time. This
indicates that making the _mincore_ system call privileged is feasible
and would mitigate our attack at a very low implementation cost.

As of version 3.35.2, the new system call _cachestat_(2) is also denied
for the same reason as it is a scalable version of the _mincore_(2)
system call. Again, as of version 3.35.2, the option
_trace/allow_unsafe_page_cache_ has been added to relax this restriction
at startup. This may be needed to make direct rendering work with
Firefox family browsers.

## Enforcing AT_SECURE and UID/GID Verification

As of version 3.27.0, Syd enhances security by enforcing the *AT_SECURE*
flag in the auxiliary vector of executables at _ptrace_(2) boundary upon
receiving the *PTRACE_EVENT_EXEC* event to enforce secure-execution
mode. This event happens after the executable binary is loaded into
memory but before it starts executing. This enforcement ensures that
the C library operates in a secure mode, disabling unsafe behaviors like
loading untrusted dynamic libraries or accessing insecure environment
variables. Additionally, Syd performs strict UID and GID verification to
confirm that the process's user and group IDs match the expected values,
preventing unauthorized privilege escalation. If the verification fails
or the *AT_SECURE* flag cannot be set, Syd terminates the process to
prevent potential security breaches. This mitigation can be relaxed at
startup with the option _trace/allow_unsafe_exec_libc:1_, though
doing so is not recommended as it reduces the effectiveness of the
sandbox. Notably, secure-execution mode is enforced by _apparmor_(7) too
and it may also be enforced by other LSMs and eBPF. You may find some
implications of the secure-execution mode below. Refer to the _ld.so_(8)
and _getauxval_(3) manual pages for implications of secure-execution
mode on your system.

glibc dynamic linker strips/ignores dangerous LD_\* variables in
secure-execution mode, including *LD_LIBRARY_PATH*, *LD_PRELOAD* (only
standard dirs; paths with slashes ignored), *LD_AUDIT*, *LD_DEBUG*,
*LD_DEBUG_OUTPUT*, *LD_DYNAMIC_WEAK*, *LD_HWCAP_MASK*, *LD_ORIGIN_PATH*,
*LD_PROFILE*, *LD_SHOW_AUXV*, *LD_USE_LOAD_BIAS*, etc. glibc also treats some
non-LD_\* variables as unsafe in secure-execution mode: *GCONV_PATH*,
*GETCONF_DIR*, *HOSTALIASES*, *LOCALDOMAIN*, *LOCPATH*, *MALLOC_TRACE*,
*NIS_PATH*, *NLSPATH*, *RESOLV_HOST_CONF*, *RES_OPTIONS*, *TMPDIR*,
*TZDIR* (stripped/ignored). Refer to the _ld.so_(8) manual page for more
information. As of version 3.11.1, Syd also strips unsafe environment
variables before executing the sandbox process by default. This can be
disabled altogether with _trace/allow_unsafe_env:1_, or unsafe
environment variables can be selectively allowed using the _-e var=_
format, e.g. _-eLD_PRELOAD=_. See the *Restricting environment and
trace/allow_unsafe_env* section of this manual page for more
information.

glibc's *LD_PREFER_MAP_32BIT_EXEC* is always disabled in
secure-execution mode (mitigates ASLR-weakening). Historical bugs (e.g.,
CVE-2019-19126) fixed cases where this wasn't ignored after a security
transition. Refer to the _ld.so_(8) manual page and the following links
for more information:

- https://lists.gnu.org/archive/html/info-gnu/2020-02/msg00001.html
- https://alas.aws.amazon.com/ALAS-2021-1511.html

glibc *GLIBC_TUNABLES* environment variable handling under *AT_SECURE*:
tunables carry security levels (*SXID_ERASE*, *SXID_IGNORE*) so they're
ignored/erased for secure-execution mode; post-CVE-2023-4911 hardening
ensures secure-execution mode invocations with hostile GLIBC_TUNABLES
are blocked/terminated. Refer to the following links for more
information:

- https://lwn.net/Articles/947736/
- https://access.redhat.com/security/cve/cve-2023-4911
- https://nvd.nist.gov/vuln/detail/CVE-2023-4911

glibc _secure_getenv_(3) returns NULL when *AT_SECURE* is set; any glibc
subsystem that uses _secure_getenv_(3) (e.g., timezone, locale, iconv,
resolver paths) will ignore environment overrides in secure-execution
mode. Similarly, calling _getauxval_(3) with the type *AT_SECURE*
returns a nonzero value in secure-execution mode.

musl libc honors *AT_SECURE* and likewise ignores preload/library/locale
environment knobs in secure-execution mode; examples include *LD_PRELOAD*,
*LD_LIBRARY_PATH*, and *MUSL_LOCPATH*. Refer to the following links for more
information:

- https://musl.libc.org/manual.html
- https://wiki.musl-libc.org/environment-variables

Because the Linux host kernel is not aware of Syd setting the
*AT_SECURE* bit, the _proc_pid_auxv_(5) file will report the bit as
unset. On the contrary, when verbose logging is turned on using the
_log/verbose:1_ option, Syd will correctly log this bit as set after
parsing the _proc_pid_auxv_(5) file of the sandbox process.
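
To see what the kernel-side view contains, the auxiliary vector can be
decoded as a sequence of type/value pairs terminated by *AT_NULL*. The
following is a minimal sketch; the function name _parse_auxv_ is
hypothetical and the default word format assumes a 64-bit process (use
"I" for 32-bit).

```python
import struct

AT_NULL = 0     # terminates the vector
AT_SECURE = 23  # nonzero in secure-execution mode


def parse_auxv(data: bytes, word: str = "Q") -> dict:
    """Decode raw auxv bytes into a {type: value} mapping."""
    entry = struct.Struct("=" + word * 2)
    usable = len(data) - len(data) % entry.size
    result = {}
    for a_type, a_val in entry.iter_unpack(data[:usable]):
        if a_type == AT_NULL:
            break
        result[a_type] = a_val
    return result
```

On Linux, passing the contents of /proc/self/auxv to this function and
looking up *AT_SECURE* shows the value as the kernel recorded it, which
under Syd is the unset value for the reason given above.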

## Process Name Modification Restriction

As of version 3.28.0, Syd logs and denies attempts to set a process's
name using the *PR_SET_NAME* _prctl_(2) request. This mitigation
prevents malicious software from disguising itself under legitimate
process names such as _apache_ or other system daemons in order to
evade detection. By default, any invocation of *PR_SET_NAME* within the
sandboxed environment is intercepted: the action is logged for audit
purposes if verbose logging is on, and the request is not carried out
but reports success to the caller, essentially turning it into a no-op.
If there is a legitimate need to permit process name changes within the
sandbox, this restriction can be overridden by enabling the
_trace/allow_unsafe_prctl:1_ option, which allows *PR_SET_NAME* requests
to succeed without logging.

## Mitigation against Sigreturn Oriented Programming (SROP)

As of version 3.30.0, Syd employs a robust, multi-layered mitigation
strategy against Sigreturn Oriented Programming (SROP), a sophisticated
exploit technique that manipulates the state restoration behavior of the
_sigreturn_(2) system call to hijack process execution. This approach
addresses SROP's ability to bypass critical memory protections such as
ASLR, NX, and partial RELRO by setting up a fake stack frame to redirect
control flow upon signal return. Inspired by Erik Bosman's proposal in
May 2014 (LKML PATCH 3/4), Syd incorporates a signal counting mechanism
to track the number of signals delivered to a thread group, ensuring
that each _sigreturn_(2) invocation corresponds to an actual,
in-progress signal handler. A stray _sigreturn_(2) call violating this
rule causes the process to be terminated with the signal *SIGKILL*. This
method provides more precise protection than _sigreturn_(2) frame
canaries, which are susceptible to circumvention under certain
conditions, and it significantly enhances the integrity of sandboxed
environments by blocking a critical class of attacks.
Administrators can disable these mitigations via the
_trace/allow_unsafe_sigreturn:1_ option, though doing so exposes systems
to exploitation and undermines security. For more information, refer to
the following links:

- http://www.cs.vu.nl/~herbertb/papers/srop_sp14.pdf
- https://web.archive.org/web/20221002135950/https://lkml.org/lkml/2014/5/15/660
- https://web.archive.org/web/20221002123657/https://lkml.org/lkml/2014/5/15/661
- https://web.archive.org/web/20221002130349/https://lkml.org/lkml/2014/5/15/657
- https://web.archive.org/web/20221002135459/https://lkml.org/lkml/2014/5/15/858
- https://lwn.net/Articles/674861
- https://lore.kernel.org/all/1454801964-50385-1-git-send-email-sbauer@eng.utah.edu/
- https://lore.kernel.org/all/1454801964-50385-2-git-send-email-sbauer@eng.utah.edu/
- https://lore.kernel.org/all/1454801964-50385-3-git-send-email-sbauer@eng.utah.edu/
- https://marc.info/?l=openbsd-tech&m=146281531025185
- https://isopenbsdsecu.re/mitigations/srop/

## Speculative Execution Mitigation

As of version 3.30.0, Syd uses the _prctl_(2) system call to enforce
speculative execution controls that harden the sandbox against
speculative execution vulnerabilities such as *Spectre* and related
side-channel attacks. Upon initialization, Syd attempts to apply the
*PR_SPEC_FORCE_DISABLE* setting for the speculative execution features
*PR_SPEC_STORE_BYPASS*, *PR_SPEC_INDIRECT_BRANCH*, and
*PR_SPEC_L1D_FLUSH*, thereby irrevocably disabling these CPU-level
misfeatures when permissible. Where supported by the underlying kernel
and hardware, this constrains speculative execution to reduce avenues
for data leakage and privilege escalation across privilege domains. The
mitigation is conditionally enforced based on the availability of
per-task control via _prctl_(2). Any inability to apply these settings
due to architectural constraints or insufficient permissions results in
logged informational messages without disrupting sandbox operations.
Administrators can override this behavior with the
_trace/allow_unsafe_exec_speculative:1_ option in environments where
speculative execution controls need to be relaxed for compatibility or
performance reasons. Refer to the links below for more information:

- https://docs.kernel.org/admin-guide/hw-vuln/spectre.html
- https://docs.kernel.org/userspace-api/spec_ctrl.html

As of version 3.35.2, Syd disables Speculative Store Bypass mitigations
for _seccomp_(2) filters when _trace/allow_unsafe_exec_speculative:1_ is
set at startup.
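
For orientation, the _prctl_(2) arguments involved and the meaning of
the status bits can be written down as below. The constants are taken
from the Linux _prctl_(2) interface; the helper names are hypothetical,
and the sketch only builds and decodes values without issuing any system
call, so it does not change the state of the calling process.

```python
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53

PR_SPEC_STORE_BYPASS = 0
PR_SPEC_INDIRECT_BRANCH = 1
PR_SPEC_L1D_FLUSH = 2

PR_SPEC_PRCTL = 1 << 0          # per-task control is available
PR_SPEC_ENABLE = 1 << 1         # misfeature enabled, mitigation off
PR_SPEC_DISABLE = 1 << 2        # misfeature disabled, mitigation on
PR_SPEC_FORCE_DISABLE = 1 << 3  # like DISABLE, but cannot be undone
PR_SPEC_DISABLE_NOEXEC = 1 << 4

FLAGS = [
    (PR_SPEC_PRCTL, "PRCTL"),
    (PR_SPEC_ENABLE, "ENABLE"),
    (PR_SPEC_DISABLE, "DISABLE"),
    (PR_SPEC_FORCE_DISABLE, "FORCE_DISABLE"),
    (PR_SPEC_DISABLE_NOEXEC, "DISABLE_NOEXEC"),
]


def force_disable_requests() -> list:
    """Argument tuples for prctl(2), one per feature named above."""
    features = (PR_SPEC_STORE_BYPASS, PR_SPEC_INDIRECT_BRANCH,
                PR_SPEC_L1D_FLUSH)
    return [(PR_SET_SPECULATION_CTRL, f, PR_SPEC_FORCE_DISABLE, 0, 0)
            for f in features]


def decode_status(value: int) -> list:
    """Name the bits of a PR_GET_SPECULATION_CTRL return value."""
    if value == 0:
        return ["NOT_AFFECTED"]
    return [name for bit, name in FLAGS if value & bit]
```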

## Cryptographically Randomized Sysinfo

Since Syd 3.28.0, the _sysinfo_(2) system call has been
cryptographically obfuscated by applying high-entropy offsets to memory
fields (e.g., total RAM, free RAM) and constraining them to plausible
power-of-two boundaries, frustrating trivial attempts at system
fingerprinting. Specifically, uptime and idle counters each incorporate
a distinct offset up to 0xFF_FFFF (~194 days), unless _unshare/time:1_
is set, in which case time starts from zero. Load averages are
randomized in fixed-point format and clamped to realistic upper limits.
Administrators seeking genuine system metrics may disable these
transformations via _trace/allow_unsafe_sysinfo:1_, albeit at the cost
of enabling straightforward correlation and potential data leakage.
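
The arithmetic behind the figures above can be illustrated as follows.
This is a hypothetical model, not Syd's algorithm: the helper names are
invented, and how offsets and power-of-two rounding are combined for the
memory fields is an assumption of this sketch.

```python
import secrets

MAX_OFFSET = 0xFF_FFFF  # seconds; about 194 days


def random_offset() -> int:
    """Draw an offset in [0, MAX_OFFSET] from the OS random source."""
    return secrets.randbelow(MAX_OFFSET + 1)


def obfuscate_uptime(real_uptime: int, offset: int) -> int:
    """Shift the reported uptime by a per-instance offset."""
    return real_uptime + offset


def floor_pow2(n: int) -> int:
    """Largest power of two not exceeding n (0 for n <= 0)."""
    return 1 << (n.bit_length() - 1) if n > 0 else 0
```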

## Memory Sealing of Sandbox Policy Regions on Lock

Beginning with version 3.33.1, Syd applies Linux's _mseal_(2) syscall to
enforce immutability of policy-critical memory regions at the moment the
sandbox is locked with _lock:on_. At this point, all mutable structures
influencing access control -- such as ACLs, action filters, and syscall
mediation rules -- are sealed at the virtual memory level. Unlike
traditional permission schemes (e.g., W^X or _mprotect_(2)), _mseal_(2)
protects against structural manipulation of memory mappings themselves,
preventing _mmap_(2), _mremap_(2), _mprotect_(2), _munmap_(2), and
destructive _madvise_(2) operations from altering sealed VMAs. This
eliminates attacker primitives that rely on reclaiming, remapping, or
changing permissions on enforcement data, thereby closing off advanced
data-oriented exploitation paths such as policy subversion through
remapped ACLs or revocation of constraints via memory permission resets.
Syd permits legitimate late-stage policy configuration during startup
and defers sealing until _lock:on_ is called, after which mutation of
enforcement state is structurally frozen. The process is one-way and
idempotent; sealed memory cannot be unsealed, ensuring strong guarantees
once lockdown is complete. For diagnostic or non-hardened environments,
this mechanism may be disabled explicitly via the startup toggle
_trace/allow_unsafe_nomseal:1_, which should only be used with full
awareness of the resulting relaxation in protection. When enabled,
sealing substantially raises the integrity threshold of the sandbox,
ensuring that post-lock policy enforcement is immune to both direct and
indirect memory-level tampering.

## Force Close-on-Exec File Descriptors

The _trace/force_cloexec_ option, introduced in Syd version 3.35.2,
ensures that all _creat_(2), _open_(2), _openat_(2), _openat2_(2),
_memfd_create_(2), _socket_(2), _accept_(2), and _accept4_(2) system
calls made by the sandbox process include the *O_CLOEXEC* flag. This
feature can be toggled at runtime via Syd's virtual stat API, enabling
dynamic adjustment of confinement levels as needed. The *O_CLOEXEC*
flag, when set on file descriptors, ensures they are automatically
closed when executing a new program via _execve_(2) or similar system
calls. This automatic closure of file descriptors is critical for
enhancing security and safety, as it prevents file descriptors from
being unintentionally inherited by newly executed programs, which could
otherwise lead to unauthorized access to sensitive files or resources.
By enforcing the *O_CLOEXEC* flag across all of the system calls listed
above, Syd mitigates the risk of file descriptor leakage and ensures a
clean execution context for newly spawned processes.
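
The effect of the flag can be observed from user space. The sketch below
forces *O_CLOEXEC* into a set of open flags and then checks the
resulting descriptor with _fcntl_(2). The helper names are hypothetical
and this only models the flag handling, not Syd's system call emulation.

```python
import fcntl
import os


def force_cloexec(flags: int) -> int:
    """Return the open flags with O_CLOEXEC added."""
    return flags | os.O_CLOEXEC


def is_cloexec(fd: int) -> bool:
    """True if the descriptor is closed automatically on exec."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


# Note: CPython's os.open() already creates non-inheritable descriptors
# (PEP 446); the explicit flag mirrors what a C-level open(2) would need.
fd = os.open(os.devnull, force_cloexec(os.O_RDONLY))
try:
    cloexec_set = is_cloexec(fd)
finally:
    os.close(fd)
```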

## Force Randomized File Descriptors

The _trace/force_rand_fd_ option, introduced in Syd version 3.35.2,
ensures that all _creat_(2), _open_(2), _openat_(2), _openat2_(2),
_memfd_create_(2), _socket_(2), _accept_(2), and _accept4_(2) system
calls made by the sandbox process allocate file descriptors at random
available slots rather than the lowest-numbered one. When this feature
is enabled, Syd specifies a random available slot (rather than the
lowest-numbered one) to the *SECCOMP_IOCTL_NOTIF_ADDFD* operation which
is used to install a file descriptor to the sandbox process.
Randomizing file descriptor numbers makes it significantly harder for an
attacker to predict or deliberately reuse critical descriptors, thereby
raising the bar against file-descriptor reuse and collision attacks.
Note that enabling this may break programs which rely on the POSIX
guarantee that _open_(2) returns the lowest available descriptor. This
behavior can be toggled at runtime via Syd's virtual stat API, allowing
operators to enable or disable descriptor randomization without
restarting or recompiling the sandboxed process. We're also cooperating
with the HardenedBSD project to implement a similar feature in the BSD
kernel. Refer to the following link for more information:
https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/issues/117
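
The slot selection can be approximated in user space. The sketch below
moves an open descriptor to a random free number using _dup2_(2) as a
stand-in for the *SECCOMP_IOCTL_NOTIF_ADDFD* operation that Syd uses.
The function names, the retry count, and the cap of 4096 on the search
range are assumptions of this sketch.

```python
import fcntl
import os
import resource
import secrets


def is_free(fd: int) -> bool:
    """True if no open file occupies this descriptor number."""
    try:
        fcntl.fcntl(fd, fcntl.F_GETFD)
    except OSError:
        return True
    return False


def move_to_random_slot(fd: int, attempts: int = 64) -> int:
    """Move fd to a random free slot; return the new number.

    Falls back to the original descriptor if no free slot is found.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    limit = soft if 3 < soft < 4096 else 4096
    for _ in range(attempts):
        target = 3 + secrets.randbelow(limit - 3)
        if is_free(target):
            os.dup2(fd, target, inheritable=False)
            os.close(fd)
            return target
    return fd
```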

## Syscall Argument Cookies

To further harden the _seccomp_(2) boundary, as of version 3.35.2 Syd
embeds cryptographically-strong, per-instance "cookies" into unused
architecture-defined syscall argument slots (e.g., the 5th and 6th
arguments of _openat2_(2)). These cookies are generated at startup via
the OS random number generator using _getrandom_(2), and are checked
in the BPF filter so that only calls bearing the correct 32- or 64-bit
values will be allowed. By requiring this unpredictable token, Syd
raises the bar against arbitrary or forged syscalls: Attackers must
first discover or leak the randomized cookies despite Address Space
Layout Randomization (ASLR) before mounting a successful path or network
operation. This approach effectively transforms unused syscall
parameters into an application-level authorization mechanism, preventing
trivial reuse of legitimate code paths and mitigating
time-of-check-to-time-of-use (TOCTTOU) and ROP payloads that rely on
guessing or omitting optional arguments. In combination with absolute
path enforcement and the denial of relative descriptors (e.g. *AT_FDCWD*),
syscall argument cookies form a lightweight, zero-cost integrity check
that elevates syscall hardening without kernel modifications or
performance penalties. As an example, here is how the filters look in
pseudo filter code for the system calls _openat2_(2) and _socket_(2) on
x86-64. _openat2_(2) uses two unused arguments as cookies and
_socket_(2) uses three. In addition, _openat2_(2) denies negative file
descriptor arguments such as *AT_FDCWD*:

```
# filter for syscall "openat2" (437) [priority: 65528]
if ($syscall == 437)
	if ($a0.hi32 > 0)
	else
		if ($a0.hi32 == 0)
			if ($a0.lo32 > 2147483647)
			else
				if ($a4.hi32 == 2047080271)
					if ($a4.lo32 == 419766579)
						if ($a5.hi32 == 2863373132)
							if ($a5.lo32 == 396738706)
								action ALLOW;
		else
			if ($a4.hi32 == 2047080271)
				if ($a4.lo32 == 419766579)
					if ($a5.hi32 == 2863373132)
						if ($a5.lo32 == 396738706)
							action ALLOW;

# filter for syscall "socket" (41) [priority: 65529]
if ($syscall == 41)
	if ($a3.hi32 == 3378530982)
		if ($a3.lo32 == 4160747949)
			if ($a4.hi32 == 2899982880)
				if ($a4.lo32 == 990920938)
					if ($a5.hi32 == 3611760485)
						if ($a5.lo32 == 1163305215)
							action ALLOW;
```
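
The filter above compares each 64-bit argument as two 32-bit halves. The
following sketch shows how a cookie maps onto those halves and models
the _openat2_(2) rule as read from the excerpt: the first argument must
not exceed 2147483647 and the fifth and sixth arguments must equal the
cookies. The helper names are hypothetical, arguments are treated as
unsigned 64-bit register values, and _secrets.randbits_ stands in for
the _getrandom_(2) call.

```python
import secrets

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
AT_FDCWD = -100


def new_cookie() -> int:
    """Draw a fresh 64-bit cookie from the OS random source."""
    return secrets.randbits(64)


def halves(value: int) -> tuple:
    """Split a 64-bit value into (hi32, lo32) as the filter sees it."""
    return (value >> 32) & MASK32, value & MASK32


def openat2_allowed(args: list, cookie4: int, cookie5: int) -> bool:
    """Model of the openat2 rule: sane dirfd and both cookies present."""
    return (args[0] <= 0x7FFFFFFF
            and args[4] == cookie4
            and args[5] == cookie5)


# Cookies reconstructed from the halves shown in the excerpt.
C4 = (2047080271 << 32) | 419766579
C5 = (2863373132 << 32) | 396738706
```

A negative descriptor such as *AT_FDCWD* appears in the register as a
value with all upper bits set, which is why the check on the first
argument rejects it.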

Another example is how the critical _seccomp_(2) notify _ioctl_(2)
requests *SECCOMP_IOCTL_NOTIF_SEND* and *SECCOMP_IOCTL_NOTIF_ADDFD* are
confined for the Syd emulator threads. *SECCOMP_IOCTL_NOTIF_SEND* is
critical because it allows pass-through of system calls to the host
Linux kernel with the *SECCOMP_USER_NOTIF_FLAG_CONTINUE* flag in the
_seccomp_(2) response data structure. This flag must be used with utmost
care and in the hands of an attacker it can be a tool for further
exploitation. *SECCOMP_IOCTL_NOTIF_ADDFD* is critical because it allows
file descriptor transfer between the Syd process and the sandbox process
and in the hands of an attacker it can be a tool for file descriptor
stealing. As part of this mitigation three syscall cookies are enforced
for _ioctl_(2) system calls with the *SECCOMP_IOCTL_NOTIF_SEND* and
*SECCOMP_IOCTL_NOTIF_ADDFD* requests. Coupled with the startup
randomization of the _seccomp_(2) notify file descriptor, this
mitigation raises the bar for an attacker trying to call arbitrary or
forged syscalls within a compromised Syd emulator thread. Excerpt from
the seccomp filter in pseudo filter code is given below:

```
# Syd monitor rules with seccomp fd 626
#
# pseudo filter code start
#
# filter for arch x86_64 (3221225534)
...
# filter for syscall "ioctl" (16) [priority: 65497]
if ($syscall == 16)
	if ($a0.hi32 == 0)
		if ($a0.lo32 == 626)
			if ($a1.hi32 == 4294967295)
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_RECV)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SEND)
					if ($a3.hi32 == 4195042482)
						if ($a3.lo32 == 329284685)
							if ($a4.hi32 == 3163914537)
								if ($a4.lo32 == 2000745976)
									if ($a5.hi32 == 3932715328)
										if ($a5.lo32 == 2409429749)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ADDFD)
					if ($a3.hi32 == 2387882717)
						if ($a3.lo32 == 529632567)
							if ($a4.hi32 == 2017338540)
								if ($a4.lo32 == 3732042218)
									if ($a5.hi32 == 4202049614)
										if ($a5.lo32 == 546113052)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SET_FLAGS)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ID_VALID)
					action ALLOW;
			if ($a1.hi32 == 0)
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_RECV)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SEND)
					if ($a3.hi32 == 4195042482)
						if ($a3.lo32 == 329284685)
							if ($a4.hi32 == 3163914537)
								if ($a4.lo32 == 2000745976)
									if ($a5.hi32 == 3932715328)
										if ($a5.lo32 == 2409429749)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ADDFD)
					if ($a3.hi32 == 2387882717)
						if ($a3.lo32 == 529632567)
							if ($a4.hi32 == 2017338540)
								if ($a4.lo32 == 3732042218)
									if ($a5.hi32 == 4202049614)
										if ($a5.lo32 == 546113052)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SET_FLAGS)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ID_VALID)
					action ALLOW;
...
	# default action
	action KILL_PROCESS;
# invalid architecture action
action KILL_PROCESS;
```

The list of system calls protected by cookies is given below. The list may
be further extended in the future to cover more system calls used by
Syd:

- _ioctl_(2)
    - *PROCMAP_QUERY*
    - *SECCOMP_IOCTL_NOTIF_SEND*
    - *SECCOMP_IOCTL_NOTIF_ADDFD*
- _linkat_(2), _renameat2_(2), _unlinkat_(2)
- _memfd_create_(2)
- _openat2_(2)
- _pipe2_(2)
- _socket_(2), _bind_(2), _connect_(2), _accept4_(2) (*64-bit only*)
- _truncate_(2), _truncate64_(2), _ftruncate_(2)
- _uname_(2)
- _fchdir_(2), _umask_(2)

As of version 3.36.0, this mitigation may be disabled at startup using
the _trace/allow_unsafe_nocookie:1_ option.

As of version 3.48.2, all cookies reside in a single contiguous memory
region hardened with guard pages, read-only protection, and _mseal_(2)
where available. This consolidation reduces entropy consumption to one
_getrandom_(2) call and eliminates per-cookie allocation overhead while
preserving the cryptographic unpredictability of each token.

## Shared Memory Hardening

As of version 3.48.0, Syd denies access to _sysvipc_(7) and
_mq_overview_(7) system calls by default to enforce a strict
shared-nothing architecture. This hardening eliminates an entire class
of inter-process communication (IPC) vulnerabilities, including "memory
squatting" attacks where malicious actors preemptively allocate shared
memory keys to hijack or disrupt legitimate applications, as detailed in
the research by Portcullis. By blocking the creation and usage of System
V shared memory, semaphores, message queues, and POSIX message queues,
Syd closes complex kernel attack surfaces that have historically
harbored privilege escalation and information leakage bugs. This strict
isolation aligns with modern container security best practices, ensuring
that sandboxed processes cannot interfere with the host or other
containers via shared global namespaces. If legacy application
compatibility is required, these subsystems can be selectively
re-enabled using the _trace/allow_unsafe_shm:1_ and
_trace/allow_unsafe_msgqueue:1_ options, partially exposing the sandbox
to the aforementioned risks. Refer to the following links for more
information:

- https://man7.org/linux/man-pages/man7/sysvipc.7.html
- https://man7.org/linux/man-pages/man7/mq_overview.7.html
- https://labs.portcullis.co.uk/whitepapers/memory-squatting-attacks-on-system-v-shared-memory/
- https://labs.portcullis.co.uk/presentations/i-miss-lsd/
- https://www.cve.org/CVERecord?id=CVE-2013-0254

## Shared Memory Permissions Hardening

As of version 3.37.0, Syd introduces a kernel-enforced mitigation
against System V shared memory squatting by conditioning allow rules on
strict permission masks. By inspecting the mode bits passed to the
_shmget_(2), _msgget_(2), _semget_(2), and _mq_open_(2) system calls,
the sandbox admits creation only when the mode grants nothing beyond
owner read and write, i.e. no bits are set in mask 0o177 (owner execute
plus all group and other permission bits). This measure prevents
untrusted processes from elevating
permissions after creation or exploiting legacy IPC segments with
permissive ACLs, which could lead to disclosure or corruption of shared
pages. Based on the attack taxonomy described in *Memory Squatting:
Attacks on System V Shared Memory* (Portcullis, 2013), mode checks take
place within the _seccomp_(2) BPF filter before any mapping. The
*IPC_SET* operations of the _shmctl_(2), _msgctl_(2), and _semctl_(2)
system calls are also denied, preventing permission changes after
creation. Additionally, any attempt to attach a shared memory segment
with the *SHM_EXEC* flag via _shmat_(2) is denied to enforce W^X
policies, blocking executable mappings through shared memory. The
_seccomp_(2) filter also blocks the *MSG_STAT_ANY*, *SEM_STAT_ANY*, and
*SHM_STAT_ANY* operations (Linux 4.17+), which would otherwise return
segment metadata without verifying its mode, mitigating unintended
information leaks. This mitigation is applied in the parent _seccomp_(2)
filter, ensuring that the Syd process itself is subject to these
restrictions. Administrators may relax this policy at startup using the
_trace/allow_unsafe_perm_msgqueue:1_ and _trace/allow_unsafe_perm_shm:1_
options, but doing so reintroduces the classic squatting vulnerabilities
documented in CVE-2013-0254 and related research. For more information
refer to the following links:

- https://labs.portcullis.co.uk/whitepapers/memory-squatting-attacks-on-system-v-shared-memory/
- https://labs.portcullis.co.uk/presentations/i-miss-lsd/
- https://www.cve.org/CVERecord?id=CVE-2013-0254
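
The checks described in this section can be modelled in a few lines. The
following is a minimal sketch and not Syd's implementation: the function
names _create_allowed_, _shmat_allowed_, and _ctl_allowed_ are
hypothetical, and the real checks run inside the _seccomp_(2) BPF filter.
The constant values are those of the Linux UAPI headers.

```python
# Hypothetical model of the mode and flag checks; not Syd's actual code.
UNSAFE_MODE_MASK = 0o177  # owner execute plus all group/other bits
IPC_CREAT = 0o1000        # creation flag of shmget/msgget/semget
SHM_EXEC = 0o100000       # shmat(2) flag requesting an executable mapping
IPC_SET = 1               # shmctl/msgctl/semctl command changing the mode


def create_allowed(flags: int) -> bool:
    """True if the mode bits in a *get(2) flag word are within 0o600."""
    return (flags & UNSAFE_MODE_MASK) == 0


def shmat_allowed(shmflg: int) -> bool:
    """True unless the attach asks for an executable mapping."""
    return (shmflg & SHM_EXEC) == 0


def ctl_allowed(cmd: int) -> bool:
    """True unless the control command would change permissions."""
    return cmd != IPC_SET
```

Under these assumptions a request such as _IPC_CREAT | 0o600_ passes,
while _IPC_CREAT | 0o666_ is refused because it sets group and other
bits.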

## Mitigation Against Heap Spraying

As of version 3.23.18, Syd mitigates kernel heap-spraying attacks by
restricting the _msgsnd_(2) system call. This call is part of System V
message queues, which allow processes to send and receive messages
asynchronously. It is also frequently exploited for heap spraying, a
technique that places specific byte sequences at predictable memory
locations and thereby increases the reliability of exploiting memory
corruption vulnerabilities. The _msgsnd_(2) system call is commonly
used for this purpose due to its ability to allocate large, contiguous
blocks of memory in the kernel heap. Notably, exploits such as
CVE-2016-6187, CVE-2021-22555, and CVE-2021-26708 have leveraged this
system call for kernel heap spraying to achieve privilege escalation and
kernel code execution. To counter this, Syd disables the _msgsnd_(2)
system call by default, preventing attackers from using it to bypass
security mitigations. Administrators can re-enable this call using the
_trace/allow_unsafe_shm:1_ option if required for legitimate
inter-process communication needs. Refer to the following links for more
information:

- https://en.wikipedia.org/wiki/Heap_spraying
- https://grsecurity.net/how_autoslab_changes_the_memory_unsafety_game
- https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
- https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html
- https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html

## Denying Restartable Sequences

As of version 3.37.0, Syd denies the _rseq_(2) system call, and with it
access to restartable sequences, by default, substantially elevating the
security baseline of the sandbox. The restartable sequences interface
enables user space to register per-thread critical regions with
kernel-enforced atomicity guarantees, but critically, also exposes a
user-controlled abort handler address. In adversarial scenarios, this
facility can be abused: attackers with the ability to manipulate process
memory or _rseq_(2) registration can redirect execution to arbitrary,
attacker-chosen code locations on preemption or CPU migration, bypassing
intra-process isolation boundaries and subverting mechanisms such as
memory protection keys or control-flow integrity. By prohibiting
_rseq_(2), Syd eliminates this kernel-facilitated control-flow transfer
primitive, foreclosing a sophisticated class of attacks that leverage
restartable sequence state for privilege escalation, sandbox escape, or
bypass of compartmentalization. This mitigation exemplifies a
least-privilege syscall surface and strong adherence to modern threat
models, allowing only strictly necessary system calls and neutralizing
emergent attack vectors rooted in nuanced kernel-user collaboration.
Administrators may explicitly re-enable this system call if required for
compatibility using the _trace/allow_unsafe_rseq:1_ startup option, with
the understanding that doing so weakens this critical security boundary.
For more information, refer to the following links:

- https://arxiv.org/abs/2108.03705
- https://arxiv.org/abs/2406.07429
- https://www.usenix.org/system/files/usenixsecurity24-yang-fangfei.pdf

## Personality Syscall Restrictions

As of version 3.37.0, Syd restricts the _personality_(2) system call to
mitigate security vulnerabilities associated with unsafe
_personality_(2) flags. Two flags are of particular concern:
*ADDR_NO_RANDOMIZE* can disable Address Space Layout Randomization
(ASLR), a fundamental memory protection mechanism that prevents reliable
exploitation of memory corruption vulnerabilities by randomizing memory
layout, and *READ_IMPLIES_EXEC* can bypass memory protections provided
by Memory-Deny-Write-Execute, aka W^X. This restriction aligns Syd with
industry-standard container runtimes including Docker and Podman, which
employ identical restrictions to balance security with application
compatibility by maintaining an allowlist of safe personality values:
*PER_LINUX* for the standard Linux execution domain, *PER_LINUX32* for
32-bit compatibility, *UNAME26* for legacy kernel version reporting,
*PER_LINUX32|UNAME26* for combined 32-bit and legacy compatibility, and
*GET_PERSONALITY* for querying the current _personality_(2) without
modification. The implementation follows the principle of least
privilege by denying all potentially dangerous _personality_(2)
modifications while permitting only essential compatibility
requirements. This prevents malicious actors from leveraging
_personality_(2) flags to make exploits more predictable and reliable, a
behavior specifically monitored by security detection systems.
Administrators requiring unrestricted _personality_(2) access can
disable these restrictions using _trace/allow_unsafe_personality:1_.
This should be done with careful consideration, as it exposes the
sandbox to personality-based bypasses that could compromise the
isolation guarantees provided by Syd's broader hardening strategy of
system call filtering, capability restrictions, and resource access
controls.

As of version 3.47.0, Syd extends these protections by adding
*ADDR_COMPAT_LAYOUT* -- which forces a legacy, more predictable memory
layout -- and *MMAP_PAGE_ZERO* -- which allows mapping page 0 and can
turn NULL-pointer dereferences into code execution -- to the
_personality_(2) "kill list", so that any attempt within the sandbox to
enable *READ_IMPLIES_EXEC*, *ADDR_NO_RANDOMIZE*, *ADDR_COMPAT_LAYOUT*,
or *MMAP_PAGE_ZERO* results in immediate termination of the offending
process. During sandbox setup, Syd also proactively clears all four of
these flags from the inherited _personality_(2) so that untrusted
workloads always start with ASLR-friendly layouts and without the
ability to rely on legacy low-entropy address layouts or exploit
NULL-pointer mappings.
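
The allowlist and the kill list can be illustrated as follows. This is a
sketch under stated assumptions rather than Syd's code: the names
_classify_ and _clear_unsafe_ are hypothetical, and the "deny" verdict
for values that are neither allowlisted nor on the kill list is an
assumption. The flag values are those of the Linux _personality_(2)
header.

```python
# Hypothetical model of the personality(2) policy; not Syd's actual code.
PER_LINUX = 0x0000
PER_LINUX32 = 0x0008
UNAME26 = 0x0020000
ADDR_NO_RANDOMIZE = 0x0040000
MMAP_PAGE_ZERO = 0x0100000
ADDR_COMPAT_LAYOUT = 0x0200000
READ_IMPLIES_EXEC = 0x0400000
GET_PERSONALITY = 0xFFFFFFFF  # query only, changes nothing

ALLOWED = {PER_LINUX, PER_LINUX32, UNAME26, PER_LINUX32 | UNAME26,
           GET_PERSONALITY}
KILL_MASK = (READ_IMPLIES_EXEC | ADDR_NO_RANDOMIZE |
             ADDR_COMPAT_LAYOUT | MMAP_PAGE_ZERO)


def classify(persona: int) -> str:
    """Return 'allow', 'kill', or 'deny' for a personality(2) argument."""
    # The allowlist is checked first: GET_PERSONALITY has every bit set
    # and would otherwise match the kill mask.
    if persona in ALLOWED:
        return "allow"
    if persona & KILL_MASK:
        return "kill"
    return "deny"  # assumption: anything else is refused without a kill


def clear_unsafe(persona: int) -> int:
    """Strip the four kill-list flags from an inherited personality."""
    return persona & ~KILL_MASK
```

With this model, _clear_unsafe(PER_LINUX32 | READ_IMPLIES_EXEC)_ yields
plain *PER_LINUX32*, mirroring the clearing of inherited flags during
sandbox setup.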

## Thread-Level Filesystem and File-Descriptor Namespace Isolation

As of version 3.37.2, Syd's interrupt, IPC and emulator worker threads
are each placed into their own filesystem and file-descriptor namespace
by _unshare_(2)'ing both *CLONE_FS* and *CLONE_FILES*. This per-thread
isolation ensures that working directory, _umask_(2) and open-file table
changes in one thread cannot leak into -- or be influenced by -- any
other, closing subtle attack vectors such as TOCTOU races on shared
_procfs_(5) or fd entries, descriptor reuse across threads, and
cwd-based side channels. By scoping thread-local filesystem state and
descriptor tables, this enhancement hardens Syd's sandbox manager
against advanced multithreading exploits and preserves strict separation
between the monitoring and emulation components.

## Denying MSG_OOB Flag in send/recv System Calls

As of version 3.37.5, Syd unconditionally denies the use of the
*MSG_OOB* flag in all _send_(2), _sendto_(2), _sendmsg_(2), and
_sendmmsg_(2) calls -- regardless of socket family -- by returning the
*EOPNOTSUPP* ("Operation not supported on transport endpoint")
_errno_(3). As of version 3.41.1, the restriction includes the system
calls _recv_(2), _recvfrom_(2), _recvmsg_(2), and _recvmmsg_(2). This
measure addresses long-standing security concerns with out-of-band
messaging semantics in stream sockets, where urgent data bypasses normal
in-order delivery rules and is handled via separate kernel paths. Such
semantics are rarely required by modern software but introduce
complexity and subtle state transitions inside the kernel's networking
stack, which have historically led to memory safety bugs and race
conditions exploitable from unprivileged code. Removing *MSG_OOB*
support by default reduces the kernel attack surface for sandboxed
processes without impacting typical application behavior. For controlled
environments where *MSG_OOB* is explicitly required, Syd provides the
opt-in _trace/allow_unsafe_oob:1_ flag to restore legacy behavior,
though enabling it reintroduces the inherent risks associated with
out-of-band data handling. This mitigation is enabled by default on all
architectures without the _socketcall_(2) multiplexer, namely aarch64,
arm, loongarch64, mips64, mipsel64, parisc, parisc64, riscv64, x32, and
x86_64. It is not supported on the architectures x86, m68k, mips,
mipsel, ppc, ppc64, ppc64le, s390, s390x, sheb, and sh. For more
information refer to the following links:

- https://googleprojectzero.blogspot.com/2025/08/from-chrome-renderer-code-exec-to-kernel.html
- https://chromium-review.googlesource.com/c/chromium/src/+/6711812
- https://u1f383.github.io/linux/2025/10/03/analyze-linux-kernel-1-day-0aeb54ac.html
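
The flag check itself is simple. The sketch below models it with a
hypothetical function _check_msg_flags_ that returns the _errno_(3)
value the caller would see, or 0 if the call may proceed. It only
illustrates the decision and does not reflect how Syd implements it.

```python
import errno
import socket


def check_msg_flags(flags: int, allow_unsafe_oob: bool = False) -> int:
    """Return EOPNOTSUPP if MSG_OOB is requested and not permitted."""
    if flags & socket.MSG_OOB and not allow_unsafe_oob:
        return errno.EOPNOTSUPP
    return 0  # no objection from this check
```

The same test applies to the flags argument of both the send and the
receive family of system calls.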

## Denying O_NOTIFICATION_PIPE Flag in pipe2

As of version 3.37.5, Syd unconditionally denies the use of the
*O_NOTIFICATION_PIPE* flag in _pipe2_(2) by returning the *ENOPKG* ("Package
not installed") _errno_(3), unless the _trace/allow_unsafe_pipe:1_
option is provided at startup. This restriction addresses the security
risks associated with notification pipes -- a specialized and
seldom-used mechanism designed for delivering kernel event notifications
(currently only from the keys subsystem) to userspace when the kernel is
built with *CONFIG_WATCH_QUEUE*. Unlike normal pipes, notification pipes
operate with distinct semantics and are tightly integrated with kernel
internals, creating a more complex and less widely audited code path.
Historically, vulnerabilities in notification pipe handling have
demonstrated that exposing this functionality to unprivileged, sandboxed
code can create exploitable kernel attack surface. Because typical
sandboxed applications, including high-risk workloads such as browser
renderers, have no legitimate need for notification pipes, Syd disables
this flag by default, thereby eliminating an entire class of low-value
yet high-risk kernel interfaces. The _trace/allow_unsafe_pipe:1_ flag
can be used to re-enable this capability for controlled testing or
compatibility purposes, but doing so reintroduces the underlying
security concerns. Refer to the following links for more information:

- https://chromium-review.googlesource.com/c/chromium/src/+/4128252
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=watch_queue
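
As an illustration, the decision can be modelled as below. This is a
sketch with a hypothetical function name, _check_pipe2_flags_, and it
relies on the fact that Linux defines *O_NOTIFICATION_PIPE* as an alias
of *O_EXCL*.

```python
import errno
import os

# Linux defines O_NOTIFICATION_PIPE as O_EXCL.
O_NOTIFICATION_PIPE = os.O_EXCL


def check_pipe2_flags(flags: int, allow_unsafe_pipe: bool = False) -> int:
    """Return ENOPKG if a notification pipe is requested and not permitted."""
    if flags & O_NOTIFICATION_PIPE and not allow_unsafe_pipe:
        return errno.ENOPKG
    return 0  # ordinary pipe flags are unaffected
```

Ordinary flags such as *O_CLOEXEC* and *O_NONBLOCK* are unaffected by
this check.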

## madvise(2) Hardening

As of version 3.41.3, Syd tightens its _seccomp_(2) BPF policy by
argument-filtering _madvise_(2) to an allow-list that is safe for
untrusted workloads and has well-understood locality: *MADV_SEQUENTIAL*,
*MADV_DONTNEED*, *MADV_REMOVE*, *MADV_HUGEPAGE*, *MADV_NOHUGEPAGE*,
*MADV_DONTDUMP*, *MADV_COLLAPSE*, *MADV_POPULATE_READ*, *MADV_POPULATE_WRITE*,
and (since Linux 6.13) the lightweight guard operations
*MADV_GUARD_INSTALL*/*MADV_GUARD_REMOVE* (page-table-level red zones that
fault on access without VMA churn). The advice *MADV_HWPOISON* is denied
and all other advice values are treated as no-ops because they enable
cross-domain information leaks or system-wide pressure channels with no
isolation benefit. For example, *MADV_MERGEABLE* drives KSM
deduplication, which has been repeatedly shown to enable
cross-VM/process side channels and targeted bit-flip exploitation (Flip
Feng Shui) as well as newer remote and timing channels.
*MADV_WILLNEED*/*MADV_RANDOM* manipulate page-cache residency and
prefetch behavior that underpin page-cache side-channel attacks. Reclaim
steering like *MADV_FREE*/*MADV_COLD*/*MADV_PAGEOUT* introduces
externally observable memory-pressure/timing signals and accounting
ambiguity that sandboxes should not expose. Privileged page state
changes *MADV_SOFT_OFFLINE*/*MADV_HWPOISON* are unnecessary in
least-authority contexts and remain outside the sandbox contract even if
capability checks would reject them. This design follows the strict
syscall-and-argument allow-listing discipline also employed by Google's
Sandbox2/Sandboxed-API while remaining specific to Syd's threat model.
To temporarily relax this mitigation for tracing/compatibility, set
_trace/allow_unsafe_madvise:1_ at startup; otherwise unsafe advice
remains blocked by default. Refer to the following links for more
information:

- https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_razavi.pdf
- https://www.ndss-symposium.org/wp-content/uploads/2022-81-paper.pdf
- https://svs.informatik.uni-hamburg.de/publications/2024/Lindemann_ACSAC2024_FakeDD.pdf
- https://arxiv.org/pdf/1901.01161
- https://lwn.net/Articles/790123/
- https://lwn.net/Articles/1011366/
- https://developers.google.com/code-sandboxing/sandbox2/explained
- https://developers.google.com/code-sandboxing/sandboxed-api/explained

## setsockopt(2) Hardening

As of version 3.46.1, Syd introduces a fine-grained _setsockopt_(2)
hardening layer that denies a curated set of historically fragile or
highly privileged _socket_(2) options by matching on the (level,
optname) pair in a dedicated _seccomp_(2) filter, covering netfilter
rule programming (iptables, ip6tables, arptables, ebtables), multicast
routing control, IPv4/IPv6 multicast group management, IPv6 header
manipulation, TCP repair and upper-layer protocol hooks, congestion
control selection, UDP corking, AF_PACKET ring/fanout configuration,
BPF-based socket filters, and VSOCK buffer sizing. Syd converts these
dangerous combinations into success-returning no-ops emulating a
successful _setsockopt_(2) while silently discarding the request, which
preserves compatibility with applications that merely probe for these
features but never rely on their semantics, and at the same time removes
a substantial kernel attack surface reachable from unprivileged code.
This mitigation is enabled by default on all architectures without the
_socketcall_(2) multiplexer, namely aarch64, arm, loongarch64, mips64,
mipsel64, parisc, parisc64, riscv64, x32, and x86_64. It is not
supported on the architectures x86, m68k, mips, mipsel, ppc, ppc64,
ppc64le, s390, s390x, sheb, and sh. The mitigation may be relaxed at
startup using the option _trace/allow_unsafe_setsockopt:1_. Refer to the
following links for more information:

- https://nvd.nist.gov/vuln/detail/CVE-2016-9793
- https://www.cve.org/CVERecord?id=CVE-2016-9793
- https://security-tracker.debian.org/tracker/CVE-2016-9793
- https://ubuntu.com/security/CVE-2016-9793
- https://www.exploit-db.com/exploits/41995
- https://nvd.nist.gov/vuln/detail/CVE-2017-6346
- https://www.cve.org/CVERecord?id=CVE-2017-6346
- https://security-tracker.debian.org/tracker/CVE-2017-6346
- https://ubuntu.com/security/CVE-2017-6346
- https://www.cvedetails.com/cve/CVE-2017-6346/
- https://nvd.nist.gov/vuln/detail/CVE-2018-18559
- https://www.cve.org/CVERecord?id=CVE-2018-18559
- https://security-tracker.debian.org/tracker/CVE-2018-18559
- https://ubuntu.com/security/CVE-2018-18559
- https://www.cvedetails.com/cve/CVE-2018-18559/
- https://nvd.nist.gov/vuln/detail/CVE-2020-14386
- https://www.openwall.com/lists/oss-security/2020/09/03/3
- https://unit42.paloaltonetworks.com/cve-2020-14386/
- https://sysdig.com/blog/cve-2020-14386-falco
- https://gvisor.dev/blog/2020/09/18/containing-a-real-vulnerability/
- https://www.cve.org/CVERecord?id=CVE-2007-1353
- https://nvd.nist.gov/vuln/detail/CVE-2007-1353
- https://security-tracker.debian.org/tracker/CVE-2007-1353
- https://ubuntu.com/security/CVE-2007-1353
- https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2007-1353
- https://ssd-disclosure.com/ssd-advisory-linux-kernel-af_packet-use-after-free-2/

## Hardening against kernel pointer misuse

As of version 3.48.0, Syd hardens against kernel pointer misuse by
default. This mitigation deploys a _seccomp_(2) BPF filter to inspect
system call arguments known to accept pointers. If a user-supplied
argument is detected to point into kernel memory, the _seccomp_(2)
filter returns *EFAULT* ("Bad address") without passing it on to the host
kernel. This defense-in-depth measure effectively neutralizes a class of
critical vulnerabilities where the kernel fails to validate that a
user-supplied pointer resides in user-space memory (e.g. missing
_access_ok()_ checks), typically leading to arbitrary kernel memory
corruption. A seminal example of such a vulnerability is CVE-2017-5123,
where the _waitid_(2) system call failed to validate the _infop_
argument, allowing unprivileged users to trigger arbitrary kernel
writes. To disable this mitigation, set the configuration option
_trace/allow_unsafe_kptr:1_ at startup. Refer to the following links for
more information:

- https://lwn.net/Articles/736348/
- https://www.cvedetails.com/cve/CVE-2017-5123/
- https://salls.github.io/Linux-Kernel-CVE-2017-5123/
- https://github.com/salls/kernel-exploits/blob/master/CVE-2017-5123/exploit_smap_bypass.c
- https://www.cvedetails.com/cve/CVE-2018-1000199
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f67b15037a7a
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=27747f8bc355

## Hardening executable mappings

As of version 3.48.0, Syd performs self-hardening by enforcing
immutability and Execute-Only Memory (XOM) protections on its own
executable mappings during initialization. This mitigation, inspired by
the OpenBSD _mimmutable_(2) system call introduced by Theo de Raadt,
aims to protect the sandbox monitor itself from compromise and
code-reuse attacks like Return-Oriented Programming (ROP) by iterating
over its executable Virtual Memory Areas (VMAs) and applying
_mprotect_(2) to limit permissions to *PROT_EXEC* (blocking *PROT_READ*)
and _mseal_(2) to render them immutable. These operations prevent
attackers from scanning the text segment for gadgets or remapping memory
to bypass W^X (Write XOR Execute) policies. Note that this hardening is
applied on a best-effort basis; specifically, _mseal_(2) is only
available on 64-bit Linux kernels (version 6.10+), and _mprotect_(2) XOM
support depends on the underlying architecture and kernel configuration.
The hardening may be disabled at startup using the option
_trace/allow_unsafe_noxom:1_. Refer to the following links for more
information:

- https://lwn.net/Articles/779478/
- https://lwn.net/Articles/948129/
- https://lwn.net/Articles/958438/
- https://lwn.net/Articles/978010/
- https://lwn.net/Articles/1006375/
- https://man.openbsd.org/mimmutable.2
- https://www.openbsd.org/papers/csw2023.pdf

# HISTORY & DESIGN

- *sydbox-0* https://git.sr.ht/~alip/syd/tree/sydbox-0 is a _ptrace_(2) based sandbox.
- *sydbox-1* https://git.sr.ht/~alip/syd/tree/sydbox-1 is a _ptrace_(2) and _seccomp_(2) based sandbox.
- *sydbox-2* https://git.sr.ht/~alip/syd/tree/sydbox-1 is a _seccomp_(2) and _seccomp-notify_ based sandbox.
- *sydbox-3* is a rewrite of *sydbox-2* in Rust and it's what you are looking at.

This codebase has a history of a bit over 15 years and up to this point
we have used C11 as our implementation language for various reasons.
With *sydbox-3* we are moving forwards one step and writing the sandbox
from scratch using the Rust programming language with the only non-Rust
dependency being libseccomp.  Although we inherit many ideas and design
decisions from the old codebase, we also don't shy away from radically
changing the internal implementation making it much simpler, idiomatic,
and less prone to bugs. We have _proper multiarch support_ since release
3.0.11, e.g. on x86-64, you can run your x32 or x86 binaries just fine
under Syd.

This version takes advantage of multithreading and handles system calls
using a thread pool whose size is equal to the number of CPUs on the
running machine. It utilises globsets to match a list of patterns at
once, and thus continues to perform reasonably well even with very long
rulesets. This version also comes with five new sandboxing categories
called *Lock Sandboxing*, *Memory Sandboxing*, *PID Sandboxing*, *Stat
Sandboxing*, and *Force Sandboxing*: *Lock Sandboxing* utilises the Landlock
Linux Security Module (LSM), *Memory Sandboxing* allows the user to
define a per-process memory limit, *PID Sandboxing* allows the user to
define a limit on the maximum number of running tasks under the sandbox,
*Stat Sandboxing* can be used to effectively _hide files and
directories_ from the sandboxed process whereas *Force Sandboxing* can
be used to verify file checksums prior to exec, similar to HardenedBSD's
Integriforce and NetBSD's Veriexec.

Finally, the new Syd has support for namespaces. Use e.g. _syd -munshare/user:1_
to create a user namespace. You may use _mount_, _uts_, _ipc_, _pid_, _net_, and
_cgroup_ instead of _user_ to create various namespaces. You may use the _container_
profile as a shorthand to create namespaces with _syd -pcontainer_.

You may use Syd as your login shell because it is very practical to have a
restricted user. To do this simply add _/path/to/syd_ to the file _/etc/shells_
and do _chsh -s /path/to/syd username_ as root. In this mode the sandbox may be
configured using the files _/etc/user.syd-3_ and _~/.user.syd-3_. If you want to
restrict user configuration of the sandbox, lock the sandbox using _lock:on_ at
the end of the site-wide configuration file.

# EXHERBO

Syd is the default sandbox of *Exherbo Linux*. We use it to provide a restricted
environment under which package builds run with controlled access to file system
and network resources. _exheres-0_ has a function called _esandbox_ to interact
with Syd.

# SEE ALSO

_syd_(1), _syd_(2), _syd_(5), _seccomp_(2), _pidfd_getfd_(2),
_pidfd_send_signal_(2), _ioctl_(2), _ioctl_tty_(2), _prctl_(2), _namespaces_(7),
_cgroup_namespaces_(7), _ipc_namespaces_(7), _mount_namespaces_(7),
_network_namespaces_(7), _pid_namespaces_(7), _user_namespaces_(7),
_uts_namespaces_(7)

https://exherbo.org/docs/eapi/exheres-for-smarties.html#sandboxing

# AUTHORS

Maintained by Ali Polatel. Up-to-date sources can be found at
https://gitlab.exherbo.org/sydbox/sydbox.git and bugs/patches can be
submitted to https://gitlab.exherbo.org/groups/sydbox/-/issues. Discuss
in #sydbox on Libera Chat or in #sydbox:mailstation.de on Matrix.
