Environment, Send, and Sync blog post

These are some notes for a blog post I will eventually write.

NOTE: Want to discuss this post? Disagree with some of my claims? Chat about it on the tracking issue, or open a Pull Request to this markdown file.

Terms

  • Environment - All the resources available to a program. This environment could be provided by hardware, or managed by other software, including an operating system. Could include:
    • Address Space/Mapped Memory
    • Hardware peripherals (e.g. UART, FIFOs)
    • System Calls, other ABI-ish things
    • Access control to resources (managed by hardware like an MMU/MPU or silicon support such as Arm's TrustZone)
  • Program - A single compiled binary/application. May include statically or dynamically linked dependencies. Examples include:
    • A Rust OR C binary executed on a Linux hosted platform
    • A Rust OR C binary executed on a bare-metal platform
  • Process - A single program (2) executing within a single environment. Examples include:
    • A single "process" on an operating system (todo: define 'operating system process' better)
    • A program executing on a bare-metal CPU
  • Thread - A single execution flow within a process. Every process contains at least one thread, and a single process may contain multiple threads, which are cooperatively or pre-emptively scheduled (1). Examples include:
    • A Rust thread spawned by std::thread::spawn
    • The main function of a Rust program (bare metal or hosted)
    • An Interrupt Service Routine running on a bare metal CPU
  • Send - A Rust term denoting that data may be TRANSFERRED between one thread and another. This is checked at compile time. Examples include:
    • Fully owned data
  • Sync - A Rust term denoting that data may be SIMULTANEOUSLY SHARED between one thread and another. This is checked at compile time. Examples include:
    • Data behind &'static references (interior mutability, etc.)

(1): Note: cooperative userspace scheduling, such as async/await in Rust, is not considered separately here, and only the single/multi-threaded nature of the executor is relevant.

(2): Note: In this doc, I only discuss "native" applications, and do not consider applications with an active language runtime or interpreter. These act as their own "environment" that may differ from that of the host.
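
A minimal sketch of the Send and Sync terms above, assuming a hosted platform with std available. The helpers `assert_send` and `assert_sync` and the two worker functions are hypothetical names made up for this illustration; they only show that the checks happen at compile time, not how any particular project does it.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helpers: a call compiles only if `T` has the marker trait.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

/// TRANSFERS a fully owned value to another thread (requires `Send`).
fn sum_on_other_thread(values: Vec<u32>) -> u32 {
    thread::spawn(move || values.iter().sum::<u32>())
        .join()
        .unwrap()
}

/// SHARES one counter between several threads at once (requires `Sync`).
fn count_from_threads(threads: u32) -> u32 {
    let counter = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                counter.fetch_add(1, Ordering::SeqCst);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    assert_send::<Vec<u32>>(); // fully owned data may be transferred
    assert_sync::<AtomicU32>(); // atomics may be shared simultaneously
    assert_send::<Cell<u32>>(); // `Cell` may be transferred...
    // assert_sync::<Cell<u32>>(); // ...but not shared: this line would not compile

    println!("sum = {}", sum_on_other_thread(vec![1, 2, 3]));
    println!("count = {}", count_from_threads(4));
}
```

Uncommenting the `Cell` line turns a would-be data race into a compile error, which is the sense in which these properties are "checked at compile time".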

Background Reading

Programming Languages vs Systems

TODO: More than just ideas

  • PLs serve as a tool to help address some system-domain concerns
  • PLs cannot (reasonably) understand all system-domain concerns, particularly WRT portability. This may be due to HW or OS differences
    • Ex: Mutexes and Deadlocks - Spinlocks vs Critical Sections
  • PLs (and compilers) can only act on concepts they understand, and that can be observed at compile time
  • Operating Systems are programs that manage the real-world environment, and provide abstract environments for the programs they run.
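
As a rough illustration of the mutex/deadlock point above: the type system knows a `Mutex` protects its data, but it has no concept of "this thread already holds the lock". The sketch below assumes std's `Mutex` on a hosted platform; `relock_would_block` is a hypothetical helper, and it uses `try_lock` so that the example reports the problem instead of hanging.

```rust
use std::sync::{Mutex, TryLockError};

/// Returns true if taking `lock` a second time, while this thread already
/// holds it, would have to wait.
fn relock_would_block(lock: &Mutex<u32>) -> bool {
    let _held = lock.lock().unwrap();
    // Calling `lock.lock()` again here would compile without complaint, but at
    // runtime it would deadlock or panic. The compiler cannot observe that.
    let blocked = matches!(lock.try_lock(), Err(TryLockError::WouldBlock));
    blocked
}

fn main() {
    let lock = Mutex::new(0);
    println!("second lock would block: {}", relock_would_block(&lock));
}
```

Whether the right primitive is a spinlock, a critical section, or an OS mutex depends on the hardware and OS, which is exactly the kind of system-domain knowledge the language does not have.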

Embedded Metaphor claims

  • In a bare-metal environment on a single core with a single application, we would model execution as a single process, with the main task (non-interrupt context) acting as one thread, and each interrupt acting as a separate thread. However, some exceptions:
    • With an underlying OS, such as Tock-OS, which may partition the environment with an MPU/MMU, we would then consider the OS and applications to be separate processes
    • With an underlying hardware support environment, such as Arm's TrustZone, the environment may be partitioned between secure/non-secure, and we would therefore consider the Secure and Non-Secure contexts as separate processes.
  • In a bare-metal multi-core environment, we must generally assume the code executing on each core is its own PROCESS, UNLESS the environment is truly 100% shared.
    • TODO: Do caches and registers mean environment never truly converges?
  • Although multi-threaded programs (on the desktop) may run on multiple cores, this is ONLY possible because the OS (in this case the "environment-giver") manages to provide a consistent environment/abstraction to code running on all cores, including a single memory address space, etc.
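
The "interrupt as a thread" model in the first claim can be sketched on a hosted platform. This is only a stand-in under stated assumptions: `timer_isr`, `TICKS`, and `simulate_interrupts` are hypothetical names, and a `std::thread` plays the role of the interrupt context, since real interrupt registration is target-specific and not shown here.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Data shared between the "main task" and the "interrupt". Rust requires the
// type of a `static` to be `Sync`, which is the same rule that applies to
// data shared between ordinary threads.
static TICKS: AtomicU32 = AtomicU32::new(0);

/// Hypothetical interrupt handler: on real hardware this would be invoked by
/// the CPU, pre-empting the main task.
fn timer_isr() {
    TICKS.fetch_add(1, Ordering::Relaxed);
}

/// Fires the stand-in interrupt `times` times from a second thread and
/// returns how many ticks the main task observed being added.
fn simulate_interrupts(times: u32) -> u32 {
    let before = TICKS.load(Ordering::Relaxed);
    let isr_context = thread::spawn(move || {
        for _ in 0..times {
            timer_isr();
        }
    });
    // Joining guarantees the increments are visible to the main task.
    isr_context.join().unwrap();
    TICKS.load(Ordering::Relaxed) - before
}

fn main() {
    println!("ticks observed: {}", simulate_interrupts(5));
}
```

The point of the sketch is that main and the handler live in one environment (one address space, one set of statics), so the thread-level guarantees apply to them in the same way.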

General Claims

  1. PLs should not attempt to abstract/address all systems concerns
  2. PLs are generally only "natively" capable of addressing concerns within a single environment
  3. PLs may have tools for crossing environmental boundaries, but will not generally be able to make compile-time assumptions of correctness across these boundaries
  4. Processes entail (potentially) divergent/partitioned environments, and therefore cannot be handled "natively" by programming languages
  5. Libraries may be used to abstract over platform/environment-specific behaviors (ex: Rust's stdlib)
  6. Send and Sync are guarantees over threads, i.e. within a single environment, and therefore cannot help us with interprocess comms
  7. Use of FFI code (including OS interfaces, if in C, or even in Rust today) entails an environment-crossing, unless the languages provide compatible guarantees, OR the compiler used understands both 'programs' that are stitched together.
    • TODO: This is still one environment, but the compiler can't necessarily help us. The developer must guarantee safety or other guarantees are upheld. This claim may need to be split/refactored.
  8. Although Rust and rustc cannot be used to enforce correctness across multiple environments, we can still provide safe interfaces
    • Subclaim: This is due to runtime non-determinism, which can't be avoided as programs/hardware may behave unpredictably outside of Rust's view
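
One way to read claim 8 is that a safe interface treats everything arriving from another environment as untrusted bytes and validates it at runtime. The sketch below assumes a made-up wire format; `Command`, `DecodeError`, and `decode` are hypothetical and not taken from any real IPC protocol.

```rust
/// Hypothetical message that another process or core might send us.
#[derive(Debug, PartialEq)]
enum Command {
    SetLed(bool),
    Sleep { millis: u16 },
}

/// Everything that can be wrong with bytes from outside our environment.
#[derive(Debug, PartialEq)]
enum DecodeError {
    WrongLength,
    UnknownTag(u8),
    InvalidBool(u8),
}

/// Decodes one frame. The other side is outside the compiler's view, so no
/// invariant is assumed: every byte is checked before a typed value exists.
fn decode(frame: &[u8]) -> Result<Command, DecodeError> {
    match frame {
        [0x01, 0] => Ok(Command::SetLed(false)),
        [0x01, 1] => Ok(Command::SetLed(true)),
        [0x01, other] => Err(DecodeError::InvalidBool(*other)),
        [0x02, lo, hi] => Ok(Command::Sleep {
            millis: u16::from_le_bytes([*lo, *hi]),
        }),
        // Known tag, but the payload has the wrong size.
        [0x01 | 0x02, ..] => Err(DecodeError::WrongLength),
        [tag, ..] => Err(DecodeError::UnknownTag(*tag)),
        [] => Err(DecodeError::WrongLength),
    }
}

fn main() {
    println!("{:?}", decode(&[0x01, 1]));
    println!("{:?}", decode(&[0x01, 7]));
}
```

Nothing here proves the sender is correct. The interface is safe only because a misbehaving sender produces an `Err` rather than undefined behavior.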

Twitter/Matrix Quotes from me:

As a succinct example, while it may be safe to pass a shared reference from one thread to another, it may NOT be safe to pass a shared reference from one process to another, for example if that memory is not mapped in the other process. Dereferencing it would cause a segfault.
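
A small sketch of the thread half of that example, assuming std's scoped threads (`std::thread::scope`). The function name `total_len_on_workers` is hypothetical. Each worker receives a shared reference into the same slice, which is sound only because all threads see the same address space.

```rust
use std::thread;

/// Sums the lengths of `words`, handing shared references to worker threads.
fn total_len_on_workers(words: &[String]) -> usize {
    thread::scope(|scope| {
        let handles: Vec<_> = words
            .chunks(2)
            .map(|chunk| {
                // `chunk` is a `&[String]`: an address that is valid in this
                // process. The same number in another process could point at
                // unmapped memory.
                scope.spawn(move || chunk.iter().map(String::len).sum::<usize>())
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<usize>()
    })
}

fn main() {
    let words = vec!["send".to_string(), "sync".to_string(), "env".to_string()];
    println!("total length: {}", total_len_on_workers(&words));
}
```

There is no equivalent of `scope.spawn` for a second process: the borrow checker has nothing to say about whether the memory behind the reference exists on the other side.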

But Rust's safety guarantees can really only extend to the bounds of "one environment", which means that at the OS seam (or wherever you draw that for a multicore MCU), there are certain not-so-documented rules for environment traversal and management.

You see this when you have things like libc re-mapping memory in one space (violating Rust's environmental rules), or when you have assets that can't be sent from one process to another (without the OS "fudging" the environment to make it work)

That being said: You have to draw the line somewhere, and I think this is a fair place to say "this is a systems/OS/hardware problem/detail, not a programming language problem", at least for the foreseeable future.

To investigate

  • How do Tock-OS/Redox OS handle IPC, or multi-core kernels?
    • Thread Local variables?
  • This touches on things like ABIs/ABI stability, but I don't really address them outside of FFI concerns. I guess today we treat different versions of Rust/rustc as separate languages.
  • How do we talk about multiple langs in one binary? The compiler can't help us, but this is still a single program/environment. This gets back to "system concerns", where the developer must understand the interactions between languages.
  • Note that libcore makes very few environmental assumptions, but libstd has additional assumptions.