Sans-IO: The secret to effective Rust for network services

232 points by wh33zle 5 days ago

This is billed as something revolutionary and forward progress but that’s exactly how we used to do async in $lang - including Rust - before language support for async/await landed.

The biggest productivity boost to my Rust embedded firmware development was when I could stop manually implementing state machines and marshalling all local variables into custom state after custom state between each I/O operation and let Rust do that for me by using async/await syntax!

That’s, after all, what async desugars to in rust: an automatic state machine that saves values across I/O (await) points for you.
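To make the "marshalling local variables into custom state" point concrete, here is a minimal hand-written sketch. The length-prefixed framing (4-byte big-endian length, then payload) and the names `FrameReader` and `State` are assumptions chosen for illustration, not something the comment specifies. Each enum variant holds exactly the locals that must survive until the next chunk of input arrives, which is the bookkeeping async/await would otherwise generate.

```rust
/// Each variant stores the values that must survive between I/O operations.
enum State {
    ReadingLen { have: Vec<u8> },
    ReadingBody { len: usize, have: Vec<u8> },
}

pub struct FrameReader {
    state: State,
}

impl FrameReader {
    pub fn new() -> Self {
        FrameReader { state: State::ReadingLen { have: Vec::new() } }
    }

    /// Feed whatever bytes arrived; returns every frame they completed.
    pub fn push(&mut self, mut chunk: &[u8]) -> Vec<Vec<u8>> {
        let mut frames = Vec::new();
        loop {
            match &mut self.state {
                State::ReadingLen { have } => {
                    // Take only as many bytes as the header still needs.
                    let take = (4 - have.len()).min(chunk.len());
                    have.extend_from_slice(&chunk[..take]);
                    chunk = &chunk[take..];
                    if have.len() < 4 {
                        return frames; // Out of input mid-header.
                    }
                    // Assumption: big-endian length prefix.
                    let len = u32::from_be_bytes([have[0], have[1], have[2], have[3]]) as usize;
                    self.state = State::ReadingBody { len, have: Vec::new() };
                }
                State::ReadingBody { len, have } => {
                    let take = (*len - have.len()).min(chunk.len());
                    have.extend_from_slice(&chunk[..take]);
                    chunk = &chunk[take..];
                    if have.len() < *len {
                        return frames; // Out of input mid-body.
                    }
                    frames.push(std::mem::take(have));
                    self.state = State::ReadingLen { have: Vec::new() };
                }
            }
        }
    }
}
```

A caller would feed it chunks as they arrive, for example `reader.push(&bytes_from_socket)`, and handle the returned frames.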

wh33zle 5 days ago

I tried to address this at the end of the post: If what you are implementing is mostly _sequential_ IO operations, then this model becomes a bit painful.
That isn't always the case though. In more packet-oriented use cases (QUIC, WebRTC & IP), doing the actual IO bit is easy: send & receive individual packets / datagrams.
There isn't really much the compiler can generate for you because you don't end up with many `.await` points. At the same time, the state management across all these futures becomes spaghetti code because many of these aspects should run concurrently and thus need to be in their own future / task.
tel 4 days ago

I don't think that's quite true. The lift here is that the state machine does not do any IO on its own. It always delegates that work to the event loop that's hosting it, which allows it to be interpreted in different contexts. That makes it more testable and more composable as it makes fewer assumptions about the runtime environment.
Theoretically, you could do the same thing with async/await constructing the state machines for you, although in practice it's pretty painful and most async/await code is impure.
There are lots of more experimental languages with exceptional support for this style of programming (Eff, Koka, Frank). Underlying all of Haskell's IO discourse is a very deep investment into several breeds of this kind of technology (free monads and their variants).
Lately, Unison has been a really interesting language which explores lots of new concepts but also has at its core an extensible effects system that provides excellent language-level support for this kind of coding.
- sriram_malhar 4 days ago
  > I don't think that's quite true. The lift here is that the state machine does not do any IO on its own.
  Here is a simple counter example. Suppose you have to process a packet that contains many sequences (strings/binary blobs) prefixed by 4 bytes of length.
  You are not always guaranteed to get the length bytes or the string all in one go. In a sequential system you'd accumulate the string as follows
  handle_input(...)
      while not received 4 bytes
          accumulate in buf
      len = toInt(buf[0..4])
      while not received len bytes
          accumulate in buf
  If implemented as a state machine, these would require two await points to assemble the string. Flattening this out into a state machine manually is a pain.
  - Arnavion 4 days ago
    
    I'm not sure what part of that is supposed to be a pain. The sans-io equivalent would be:
    handle_input(buf) -> Result {
        if len(buf) < 4 { return Error::IncompletePacket }
        len = toInt(buf[0..4])
        if len(buf) < 4 + len { return Error::IncompletePacket }
        packet = buf[4..(4 + len)]
        return Ok { packet: packet, consumed: 4 + len }
    }
    where the semantics of `Error::IncompletePacket` are that the caller reads more into the buffer from its actual IO layer and then calls handle_input() again. So your "while not received required bytes: accumulate in buf" simply becomes "if len < required: return Error::IncompletePacket".
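A minimal Rust sketch of the pseudocode above, for readers who want something compilable. It assumes a big-endian length prefix (the thread does not say), and the names `Frame` and `Incomplete` are hypothetical stand-ins for the pseudocode's `Ok { ... }` and `Error::IncompletePacket`.

```rust
/// Returned when the buffer does not yet hold a complete frame.
#[derive(Debug, PartialEq)]
pub struct Incomplete;

/// A parsed frame plus the number of bytes the caller may drop from its buffer.
#[derive(Debug, PartialEq)]
pub struct Frame<'a> {
    pub payload: &'a [u8],
    pub consumed: usize,
}

/// Pure function: no reads, no waiting. The caller owns the buffer and the IO.
pub fn handle_input(buf: &[u8]) -> Result<Frame<'_>, Incomplete> {
    if buf.len() < 4 {
        return Err(Incomplete);
    }
    // Assumption: the 4-byte length prefix is big-endian.
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if buf.len() - 4 < len {
        return Err(Incomplete);
    }
    let end = 4 + len;
    Ok(Frame { payload: &buf[4..end], consumed: end })
}
```

On `Err(Incomplete)` the caller reads more bytes into the same buffer and calls `handle_input` again; on `Ok` it drops `consumed` bytes from the front.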
    
    tel 4 days ago
    
    I don't think that implementation is particularly good, although this is a big trick with Sans-IO: is the event loop responsible for buffering the input bytes? Or are the state machines?
    In effect, you have to be thoughtful (and explicit!) about the event loop semantics demanded by each state machine and, as the event loop implementer, you have to satisfy all of those semantics faithfully.
    A few alternatives include your version; one where `handle_input` returns something like `Result<Option<Packet>>`, covering both error cases and successful partial consumption cases; and one where `handle_input` tells the event loop how much additional input it knows it needs whenever it finishes parsing a length field, and requires that the event loop not call it again until it can hand it exactly that many bytes.
    This can all be pretty non-trivial. And then you'd want to compose state machines with different anticipated semantics. It's not obvious how to do this well.
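One of the alternatives listed above, sketched under assumptions: the parser reports how many more bytes it needs, and the event loop owns the buffer. The `Step` enum, the `drain_frames` helper and the big-endian 4-byte prefix are all hypothetical choices for illustration. This sketch only reports the shortfall; it does not enforce that the loop waits for exactly that many bytes.

```rust
#[derive(Debug, PartialEq)]
pub enum Step<'a> {
    /// A whole frame was available; the caller may drop `consumed` bytes.
    Frame { payload: &'a [u8], consumed: usize },
    /// Not enough data; at least this many additional bytes are required.
    NeedMore(usize),
}

pub fn handle_input(buf: &[u8]) -> Step<'_> {
    const HEADER: usize = 4;
    if buf.len() < HEADER {
        return Step::NeedMore(HEADER - buf.len());
    }
    // Assumption: big-endian length prefix.
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let available = buf.len() - HEADER;
    if available < len {
        return Step::NeedMore(len - available);
    }
    Step::Frame { payload: &buf[HEADER..HEADER + len], consumed: HEADER + len }
}

/// Event-loop side: owns the buffer, extracts complete frames, and returns
/// how many more bytes the parser asked for.
pub fn drain_frames(buf: &mut Vec<u8>) -> (Vec<Vec<u8>>, usize) {
    let mut frames = Vec::new();
    loop {
        let (frame, consumed) = match handle_input(buf.as_slice()) {
            Step::Frame { payload, consumed } => (payload.to_vec(), consumed),
            Step::NeedMore(n) => return (frames, n),
        };
        buf.drain(..consumed);
        frames.push(frame);
    }
}
```

Which side buffers is then explicit in the types: the parser borrows, the loop owns.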
    
    sriram_malhar 4 days ago
    
    Fair enough. So let's complicate it a little. If you have hierarchical variable sized structures within structures (e.g. java class file), then you need a stack of work in progress (pointers plus length) at every level. In fact, the moment you need a stack to simulate what would otherwise have been a series of function calls, it becomes a pain.
    Or let's say you have a loop ("retry three times before giving up"), then you have to store the index in a recoverable struct. Put this inside a nested loop, and you know what I mean.
    I have run into these situations enough that a flat state machine becomes a pain to deal with.
    These are nicely solved using coroutines. That way you can have function related temporary state, IO-related state and stacks all taken care of simply.
  - tel 4 days ago
    
    I agree totally, it wasn't my intention to say that there aren't protocols which require non-trivial state machines to implement their behavior.
    To be more clear, I'm contesting that the only thing being discussed in the article is this convenience around writing state machines. I think whether or not you have to write non-trivial state machines by hand or have them generated by some convenient syntax is orthogonal to the bigger insight of what Sans-IO is going after.
    I think the most important part here is that you write these state machines such that they perform no impure calculation on their own. In other words, you write state machines that must be driven by an event loop which is responsible for interpreting commands from those state machines and that all IO (and more generally, all impure calculation) is performed exclusively by that event loop.
    It's much more possible to compose machines like this because they don't make as many assumptions on the runtime. It's not that they're reading from a blocking or non-blocking socket. It's that they process some chunk of bytes and _possibly_ want to send some chunk of bytes back. The event loop, constructed by the user of the state machine, is responsible for deciding how to read/write those bytes.
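A small sketch of that split, assuming a toy protocol invented for illustration (answer every `PING` datagram with `PONG`). `PingPong`, `handle_input`, `poll_transmit` and `drive` are hypothetical names. The state machine never touches a socket; the loop decides where bytes come from and where they go.

```rust
use std::collections::VecDeque;
use std::io::{self, Write};

/// Hypothetical toy protocol: answer every `PING` datagram with `PONG`.
#[derive(Default)]
pub struct PingPong {
    outbox: VecDeque<Vec<u8>>,
    pub ignored: usize,
}

impl PingPong {
    /// Feed one received datagram. Pure state update, no IO.
    pub fn handle_input(&mut self, datagram: &[u8]) {
        if datagram == b"PING" {
            self.outbox.push_back(b"PONG".to_vec());
        } else {
            self.ignored += 1;
        }
    }

    /// Next datagram the machine would like sent, if any.
    pub fn poll_transmit(&mut self) -> Option<Vec<u8>> {
        self.outbox.pop_front()
    }
}

/// One possible event loop. Inputs could come from a socket, a file or a
/// test vector; outputs go to anything that implements `Write`.
pub fn drive<I, W>(machine: &mut PingPong, inputs: I, sink: &mut W) -> io::Result<()>
where
    I: IntoIterator<Item = Vec<u8>>,
    W: Write,
{
    for datagram in inputs {
        machine.handle_input(&datagram);
        while let Some(out) = machine.poll_transmit() {
            sink.write_all(&out)?;
        }
    }
    Ok(())
}
```

Swapping the `Vec<u8>` sink used in a test for a real socket wrapper requires no change to `PingPong`.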
PaulHoule 4 days ago

It was how we did I/O in assembly language in the 1980s. How else would you write an interrupt-driven YMODEM implementation?
k_bx 5 days ago

Yep. The only thing about async that bothers me is the need to write ".await" everywhere. I wish there'd be a way to invert this, and actually just run ".await" by default, while having a special construct not to.
- vlmutolo 4 days ago
  
  It’s important to be able to see where the async function might pause execution.
  For example, if you’re holding a mutex lock, you probably want to avoid holding it “across” an await point so that it’s not locked for longer than necessary if the function is paused.
  - Arnavion 4 days ago
    
    I agree that the explicit yield syntax is good.
    To play devil's advocate though, the case of Mutex specifically has it going for it that MutexGuard is !Send, so it's a compiler error if a MutexGuard is held across an await point in a Send Future. But yes if your Future is also !Send then the compiler will allow it. In that case, your only recourse is that clippy has lints for holding Mutex and RefCell guards across await points, as long as you're running it and paying attention to it of course.
  - k_bx 3 days ago
    
    I disagree. It should be a compiler warning, maybe a "clippy" one, in such cases.
    Btw the problem of sync code blocking async code is very real and also needs to be resolved, adding explicit `.blocking` to every blocking call is just as bad as explicit .await at every line.
    Also, I like Haskell's approach of being able to introduce syntax extensions at the file level, so that for code that'd benefit from explicit await, I'd rather let the author have it explicit.
  - PenguinCoder 3 days ago
    
    > It’s important to be able to see where the async function might pause execution.
    Why? Your example does not prove the point.
- mdtusz 5 days ago
  
  You mean `.await`, I assume?
  - k_bx 5 days ago
    
    Thanks, didn't have my coffee yet :)
  - sirdvd 5 days ago
    
    /s https://xkcd.com/2954/
Aissen 4 days ago

It is not billed as revolutionary. From the article:
> This pattern isn't something that we invented! The Python world even has a dedicated website about it.
And yet it is too common to find protocol libraries doing I/O in the wild :-(

zamalek 5 days ago

I had been mulling over this problem space in my head, and this is a seriously great approach to the direction I have been thinking (though still needs work, footnote 3 in the article).

What got me thinking about this was the whole fn coloring discussion, and a happy accident on my part. I had been writing a VT100 library and was doing my head in trying to unit test it. The problem was that I was essentially doing `parser::new(stdin())`. During the 3rd or 4th rewrite I changed the parser to `parser::push(data)` without really thinking about what I was doing. I then realized that Rust was punishing me for using an enterprise OOPism anti-pattern I have since been calling "encapsulation infatuation." I now see it everywhere (not just in I/O) and the havoc it wreaks.

The irony is that this solution is taught pre-tertiary education (and again early tertiary). The simplest description of a computer is a machine that takes input, processes/transforms data, and produces output. This is relevant to the fn coloring discussion because only input and output need to be concerned with it, and the meat-and-potatoes is usually data transformation.

Again, this is patently obvious - but if you consider the size of the fn coloring "controversy;" we've clearly all been missing/forgetting it because many of us have become hard-wired to start solving problems by encapsulation first (the functional folks probably feel mighty smug at this point).

Rust has seriously been a journey of more unlearning than learning for me. Great pattern, I am going to adopt it.

Edit: code in question: https://codeberg.org/jcdickinson/termkit/src/branch/main/src...

j1elo 4 days ago

> I changed the parser to `parser::push(data)` without really thinking about what I was doing. I then realized that Rust was punishing me for using an enterprise OOPism anti-pattern
Could you please elaborate more on this? I feel you're talking about an obvious problem with that pattern but I don't see how Rust punishes you for using it (as a very novice Rust learner)
- zamalek 4 days ago
  It's been a while, so I'm a bit hazy on the details. Every iteration of the code had the scanner+parser approach. The real problems started when testing the parser (because that was the double-encapsulated `Parser<Scanner<IO>>`). This means that in order to test the parser I had to mock complete VT100 data streams (`&mut &[u8]` - `&mut b"foo"` - is fortunately a Reader, so that was one saving grace). They were by all standards integration tests, which are annoying to write. Past experiences with fighting the borrow-checker taught me that severe friction (and lack of enjoyment) is a signal that you might be doing something wrong even if you can still get it to work, which is why I kept iterating.
  My first few parser designs also took a handler trait (the Alacritty VT100 stuff does this if you want a living example). Because, you know, encapsulate and DI all the things! Async traits weren't a thing at the time (at least without async-trait/boxing/allocation in a hot loop), so fn coloring was a very real problem for me.
  The new approach (partial, I haven't started the parser) is:
      input.read(&mut data);
      tokens = scanner.push(&data);
      ops = parser.push(&tokens);
  Maybe you can see from that how much simpler it would be to unit test the parser, I can pass it mock token streams instead of bytes. I can also assert the incremental results of it without having to have some mock handler trait impl that remembers what fns were invoked.
  I'm not sure if that really answers your question, but as I mentioned: it's been a while. And good luck with the learning!
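A rough sketch of what such a push-based scanner and parser could look like. This is not the termkit code: the `Token` and `Op` types are hypothetical, and the escape handling is drastically simplified (only `ESC [ params final` sequences are recognized, and any other byte after `ESC` is passed through as a plain character).

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Char(u8),
    /// `ESC [ params end`, e.g. `ESC [ 3 1 m`.
    Csi { params: Vec<u8>, end: u8 },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Print(char),
    SetGraphics(String),
    Other,
}

enum State {
    Ground,
    Escape,
    Csi(Vec<u8>),
}

pub struct Scanner {
    state: State,
}

impl Scanner {
    pub fn new() -> Self {
        Scanner { state: State::Ground }
    }

    /// Accepts any chunk of bytes; partial sequences are kept in `state`.
    pub fn push(&mut self, data: &[u8]) -> Vec<Token> {
        let mut tokens = Vec::new();
        for &byte in data {
            let state = std::mem::replace(&mut self.state, State::Ground);
            self.state = match state {
                State::Ground if byte == 0x1b => State::Escape,
                State::Ground => {
                    tokens.push(Token::Char(byte));
                    State::Ground
                }
                State::Escape if byte == b'[' => State::Csi(Vec::new()),
                // Simplification: other escape sequences are not modelled.
                State::Escape => {
                    tokens.push(Token::Char(byte));
                    State::Ground
                }
                // A byte in 0x40..=0x7e terminates the sequence.
                State::Csi(params) if (0x40..=0x7e).contains(&byte) => {
                    tokens.push(Token::Csi { params, end: byte });
                    State::Ground
                }
                State::Csi(mut params) => {
                    params.push(byte);
                    State::Csi(params)
                }
            };
        }
        tokens
    }
}

pub struct Parser;

impl Parser {
    /// Works on tokens only, so tests can hand it mock token streams.
    pub fn push(&mut self, tokens: &[Token]) -> Vec<Op> {
        tokens
            .iter()
            .map(|token| match token {
                Token::Char(b) => Op::Print(*b as char),
                Token::Csi { params, end: b'm' } => {
                    Op::SetGraphics(String::from_utf8_lossy(params).into_owned())
                }
                Token::Csi { .. } => Op::Other,
            })
            .collect()
    }
}
```

Because neither type holds a reader, a unit test can split an escape sequence across two `push` calls, or skip the scanner entirely and feed the parser hand-built tokens.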
  - binary132 4 days ago
    
    Correct me if I’m wrong, but would another way of saying this be: write parsers in terms of a buffer, not in terms of IO?
    
    zamalek 4 days ago
    
    Yup, that makes sense.
  - j1elo 4 days ago
    
    Thanks a lot! Yeah I see how the simpler design that has non-stacked implementations one on top of another is easier to understand and test. Yours is not only a lesson in design for Rust, but in general for any technology! This latter idea is just to compose parts together, but those parts are able to work independently just fine (given properly formatted inputs). A simpler and more robust way of designing components that have to work together.
    
    zamalek 4 days ago
    
    Totally, Rust has substantially affected how I write code at my day job (C#). If I ever get round to learning a functional language, I'm sure that would have a much bigger effect.
wh33zle 4 days ago

I too came from the OOP world to Rust (6 years ago now) and in my first 2-3 years I produced horrible code!
Type parameters and traits everywhere. Structs being (ab-)used as class-like structures that provide functionality.
Rust works better if you avoid type parameters and defining your own traits for as much as possible.
Encapsulation is good if we talk about ensuring invariants are maintained. This blog post about parse, don't validate comes to my mind: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

ziziman 5 days ago

How does this design compare to using channels to send data to dedicated handlers? When using channels I've found multiple issues:
(1) Web-shaped code that is often hard to follow
(2) Requires manually implementing message types that can then be converted to network-sendable messages
(3) Requires explicitly giving a transmitter to interested/allowed entities
(4) You get a result if your channel message failed to transmit but NOT if your message failed to transmit over the network

But besides that it's pretty convenient. Let's say you have a ws_handler channel, you just send your data through that and there is a dedicated handler somewhere that may or may not send that message if it's able to.

K0nserv 5 days ago

For 4 you can implement that with a channel passed along with the message to send a result back. You can then block the sending side all the way to the callsite if you wish.
My feeling is that sans-IO is particularly useful for libraries, although it can be used for applications too. In a library it means you don't force decisions about how I/O happens on your consumer, making it strictly more useful. This is important for Rust because there's already a bunch of ecosystem fragmentation between sync and async IO (not to mention different async runtimes).
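The reply-channel idea for point 4 could look roughly like this with `std::sync::mpsc`. The `Outgoing` type and the `spawn_handler` and `send_and_wait` helpers are hypothetical, and the `send` closure stands in for whatever actually writes to the network. The queue is bounded here (capacity 16, an arbitrary choice) so that senders block when the handler falls behind.

```rust
use std::sync::mpsc;
use std::thread;

pub struct Outgoing {
    pub payload: Vec<u8>,
    /// The handler reports the outcome of the actual send here.
    pub reply: mpsc::Sender<Result<(), String>>,
}

/// Spawns the dedicated handler. `send` stands in for the real network write.
pub fn spawn_handler<F>(mut send: F) -> mpsc::SyncSender<Outgoing>
where
    F: FnMut(&[u8]) -> Result<(), String> + Send + 'static,
{
    // Bounded queue: senders block once 16 messages are waiting.
    let (tx, rx) = mpsc::sync_channel::<Outgoing>(16);
    thread::spawn(move || {
        for msg in rx {
            // The caller may have stopped waiting; ignore a closed reply channel.
            let _ = msg.reply.send(send(&msg.payload));
        }
    });
    tx
}

/// Blocks the call site until the handler reports what happened on the wire.
pub fn send_and_wait(
    handler: &mpsc::SyncSender<Outgoing>,
    payload: Vec<u8>,
) -> Result<(), String> {
    let (reply, outcome) = mpsc::channel();
    handler
        .send(Outgoing { payload, reply })
        .map_err(|_| "handler is gone".to_string())?;
    outcome
        .recv()
        .map_err(|_| "handler dropped the reply".to_string())?
}
```

A caller that does not care about the network result can simply drop the receiving end instead of calling `recv`.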
- wh33zle 4 days ago
  
  The line between applications and libraries is fairly blurry, isn't it? In my experience, most applications grow to the point where you have internal libraries or could at least split out one or more crates.
  I would go as far as saying that whatever functionality your application provides, there is a core that can be modelled without depending on IO primitives.
  - binary132 4 days ago
    
    In my eyes an ideal library should not contain state, internally allocate (unless very obviously), or manage processes. The application should do that, or provide primitives for doing it which the library can make use of. That makes applications and libraries very very different in my mind.
    
    K0nserv 4 days ago
    
    The thing about state is a good point. With the sans-IO pattern we have inversion of IO and Time, but adding memory to that would be a nice improvement too.
    
    binary132 4 days ago
    
    Those C libraries that have initializers which take ** and do the allocation for you drive me nuts! I’m sure there’s some good reason, but can’t you trust me to allocate for myself, you know?
  - K0nserv 4 days ago
    
    Yes true, the one difference might be that you don't expect other consumers with a different approach to IO to use your internal libraries, although it does help you if you want to change that in the future and the testability is still useful
wh33zle 5 days ago

Channels work fine if you are happy for your software to have an actor-like design.
But as you say, it comes with problems: Actors / channels can be disconnected for example. You also want to make sure they are bounded otherwise you don't have backpressure. Plus, they require copying so achieving high-throughput may be tricky.

hardwaresofton 4 days ago

See also: monads and in particular the Free(r) monad, and effects systems[0].

The idea of separating logic from execution is a whole thing, well trodden by the Haskell ecosystem.

[EDIT] Also, they didn't mention how they encapsulated the `tokio::select!` call that shows up when they need to do time-related things -- are they just carrying around a `tokio::Runtime` that they use to make the loop code async without requiring the outside code to be async?

[EDIT2] Maybe they weren't trying to show an encapsulated library doing that, but rather to show that the outside application can use the binding in an async context...

I would have been more interested in seeing how they could implement an encapsulated function in the sans-IO style that had to do something like wait on an action or a timer -- or maybe the answer they're expecting there is just busy-waiting, or carrying your own async runtime instance (that can essentially do the busy waiting for you, with something like block_in_place).

[0]: https://okmij.org/ftp/Computation/free-monad.html

wh33zle 4 days ago

> I would have been more interested in seeing how they could implement an encapsulated function in the sans-IO style that had to do something like wait on an action or a timer
The "encapsulated function" is the `StunBinding` struct. It represents the functionality of a STUN binding. It isn't a single function you can just call, instead it requires an eventloop.
The point though is, that `StunBinding` could live in a library and you would be able to use it in your application by composing it into your program's state machine (assuming you are also structuring it in a sans-IO style).
The linked `snownet` library does exactly this. Its domain is to combine ICE + WireGuard (without doing IO) which is then used by the `connlib` library that composes ACLs on top of it.
Does that make sense?
EDIT: There is no busy-waiting. Instead, `StunBinding` exposes what it is waiting for via `poll_timeout`. How the caller (i.e. eventloop) makes that happen is up to them. The appropriate action will happen once `handle_timeout` gets called with the corresponding `Instant`.
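As a rough illustration of that contract (this is not the `StunBinding` code), here is a hypothetical `KeepAlive` machine. It only stores an `Instant` and compares it against the `now` it is handed; it never sleeps or reads a clock itself. The method names follow the ones mentioned above; the struct and its fields are assumptions.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sans-IO timer: wants a keep-alive sent every `interval`.
pub struct KeepAlive {
    interval: Duration,
    next_due: Instant,
    pending: bool,
}

impl KeepAlive {
    pub fn new(now: Instant, interval: Duration) -> Self {
        KeepAlive { interval, next_due: now + interval, pending: false }
    }

    /// When does the machine next want to be woken?
    pub fn poll_timeout(&self) -> Option<Instant> {
        Some(self.next_due)
    }

    /// The event loop reports the current time; the machine updates its state.
    pub fn handle_timeout(&mut self, now: Instant) {
        if now >= self.next_due {
            self.pending = true;
            self.next_due = now + self.interval;
        }
    }

    /// Payload to send, if the last timeout made one due.
    pub fn poll_transmit(&mut self) -> Option<&'static [u8]> {
        if std::mem::replace(&mut self.pending, false) {
            Some(&b"keep-alive"[..])
        } else {
            None
        }
    }
}
```

Since time is just a parameter, a test can advance it by adding a `Duration` to a starting `Instant` instead of actually waiting.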
- hardwaresofton 4 days ago
  
  > The "encapsulated function" is the `StunBinding` struct. It represents the functionality of a STUN binding. It isn't a single function you can just call, instead it requires an eventloop.
  >
  > The point though is, that `StunBinding` could live in a library and you would be able to use it in your application by composing it into your program's state machine (assuming you are also structuring it in a sans-IO style).
  What I was thinking was that the functionality being executed in main could just as easily be in a library function -- that's what I meant by encapsulated function, maybe I should have said "encapsulated functionality".
  If the thing I want to do is the incredibly common read or timeout pattern, how do I do that in a sans-IO way? This is why I was quite surprised to see the inclusion of tokio::select -- that's not very sans IO, but is absolutely the domain of a random library function that you might want to expose.
  It's a bit jarring to introduce the concept as not requiring choices like async vs not, then immediately require the use of async in the event loop (required to drive the state machine to completion).
  Or is the point that the event loop should be async? That's a reasonable expectation for event loops that are I/O bound -- it's the whole point of a event loop/reactor pattern. Maybe I'm missing some example where you show an event loop that is not async, to show that you can drive this no matter whether you want or don't want async?
  So if I try to condense & rephrase:
  If I want to write a function that listens or times out in sans-IO style, should I use tokio::select? If so, where is the async runtime coming from, and how will the caller of the function be able to avoid caring?
  - wh33zle 4 days ago
    
    > If I want to write a function that listens or times out in sans-IO style, should I use tokio::select? If so, where is the async runtime coming from, and how will the caller of the function be able to avoid caring?
    To "time-out" in sans-IO style means that your state machine has an `Instant` internally and, once called at a specific point in the future, compares the provided `now` parameter with the internal timeout and changes its state accordingly. See [0] for an example.
    > but is absolutely the domain of a random library function that you might want to expose.
    That entire `main` function is _not_ what you would expose as a library. The event loop should always live as high up in the stack as possible, thereby deferring the use of blocking or non-blocking IO and allowing composition with other sans-IO components.
    You can absolutely write an event loop without async. You can set the read-timeout of the socket to the value of `poll_timeout() - Instant::now` and call `handle_timeout` in case your `UdpSocket::recv` call errors with a timeout. str0m has an example [1] like that in their repository.
    > It's a bit jarring to introduce the concept as not requiring choices like async vs not, then immediately require the use of async in the event loop (required to drive the state machine to completion).
    All the event loops you see in the post are solely there to ensure we have a working program but are otherwise irrelevant, esp. implementation details like using `tokio::select` and the like. Perhaps I should have made that clearer.
    [0]: https://github.com/firezone/firezone/blob/1e7d3a40d213c9524a...
    [1]: https://github.com/algesten/str0m/blob/5b100e8a675cd8838cdd8...
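A sketch of one iteration of such a blocking event loop, under assumptions: the `Machine` and `Datagrams` traits are hypothetical (they are not taken from str0m or the article), and the socket is hidden behind `Datagrams` only so the loop can be exercised without a network. The `UdpSocket` implementation uses `set_read_timeout`, as described above, and treats a timed-out `recv` as the signal to call `handle_timeout`.

```rust
use std::io::{self, ErrorKind};
use std::net::UdpSocket;
use std::time::{Duration, Instant};

/// The minimal surface this loop needs from a sans-IO machine (hypothetical).
pub trait Machine {
    fn poll_timeout(&self) -> Option<Instant>;
    fn handle_timeout(&mut self, now: Instant);
    fn handle_input(&mut self, datagram: &[u8], now: Instant);
}

/// Anything that can block for a datagram for at most `wait` (hypothetical).
pub trait Datagrams {
    fn recv_timeout(&mut self, wait: Option<Duration>, buf: &mut [u8]) -> io::Result<usize>;
}

impl Datagrams for UdpSocket {
    fn recv_timeout(&mut self, wait: Option<Duration>, buf: &mut [u8]) -> io::Result<usize> {
        // `set_read_timeout` rejects a zero duration, so clamp to 1 ms.
        self.set_read_timeout(wait.map(|d| d.max(Duration::from_millis(1))))?;
        self.recv(buf)
    }
}

/// One iteration of a blocking event loop: no async runtime involved.
pub fn step<M: Machine, D: Datagrams>(machine: &mut M, source: &mut D) -> io::Result<()> {
    // Block no longer than the machine's next deadline.
    let wait = machine
        .poll_timeout()
        .map(|deadline| deadline.saturating_duration_since(Instant::now()));
    let mut buf = [0u8; 1500];
    match source.recv_timeout(wait, &mut buf) {
        Ok(n) => machine.handle_input(&buf[..n], Instant::now()),
        // A timed-out read surfaces as WouldBlock or TimedOut depending on the platform.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            machine.handle_timeout(Instant::now())
        }
        Err(e) => return Err(e),
    }
    Ok(())
}
```

Calling `step` in a `loop` with a bound `UdpSocket` gives a synchronous driver; the same `Machine` could equally be driven from an async loop.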
    
    hardwaresofton 4 days ago
    
    > To "time-out" in sans-IO style means that your state machine has an `Instant` internally and, once called at a specific point in the future, compares the provided `now` parameter with the internal timeout and changes its state accordingly. See [0] for an example.
    This part of the post was clear -- I didn't ask any clarifications about that, my point was about what I see as "read or timeout", a reasonable functionality to expose as a external facing function.
    The question is still "If I want to read or timeout, from inside a function I expose in a library that uses sans-IO style, how do I do that?".
    It seems like the answer is "if you want to accomplish read or timeout at the library function level, you either busy wait or pull in an async runtime, but whatever calls your state machine has to take care of that at a higher level".
    You see how this doesn't really work for me? Now I have to decide if my read_or_timeout() function exposed is either the default sync (and I have to figure out how long to wait, etc), or async.
    It seems in sans-IO style read_or_timeout() would be sync, and do the necessary synchronous waiting internally, without the benefit of being able to run other tasks from unrelated state machines in the meantime.
    > That entire `main` function is _not_ what you would expose as a library.
    Disagree -- it's entirely reasonable to expose "read your public IP via STUN" as a library function. I think we can agree to disagree here.
    > The event loop should always live as high up in the stack as possible, thereby deferring the use of blocking or non-blocking IO and allowing composition with other sans-IO components.
    Sure... but that means the code you showed me should never be made into a library (we can agree to disagree there), and I think it's reasonable functionality for a library...
    What am I missing here? From unrelated code, I want to call `get_ip_via_stun_or_timeout(hostnames: &[String], timeout: Duration) -> Option<String>`, is what I'm missing that I need to wrap this state machine in another to pass it up to the level above? That I need to essentially move the who-must-implement-the-event-loop one level up?
    > You can absolutely write an event loop without async. You can set the read-timeout of the socket to the value of `poll_timeout() - Instant::now` and call `handle_timeout` in case your `UdpSocket::recv` call errors with a timeout. str0m has an example [1] like that in their repository.
    Didn't say you couldn't!
    What you've described is looping with an operation-supported timeout, which requires timeout integration at the function call level below you to return control. I get that this is a potential solution (I mentioned it in my edits on the first comment), but not mentioning it in the article was surprising to me.
    The code I was expecting to find in that example is like the bit in str0m:
    https://github.com/algesten/str0m/blob/5b100e8a675cd8838cdd8...
    Clearly (IMO evidenced by the article using this method), the most ergonomic way to do that is with a tokio::select, and that's what I would reach for as well -- but I thought a major point was to do it sans IO (where "IO" here basically means "async runtime").
    Want to note again, this is not to do with the state machine (it's clear how you would use a passed in Instant to short circuit), but more about the implications of abstracting the use of the state machine.
    > All the event loops you see in the post are solely there to ensure we have a working program but are otherwise irrelevant, esp. implementation details like using `tokio::select` and the like. Perhaps I should have made that clearer.
    I personally think it exposes a downside of this method -- while I'm not a fan of simply opting in to either async (and whichever runtime smol/tokio/async-std/etc) or sync, it seems like this pattern will force me to do one of the following:
    - Write all code as sync
    - Write sync code that does its waiting based on operations that yield back control early
    - Hold my own tokio runtime so I can do concurrent things (this, you argue against)
    Async certainly can be hard to use and have many footguns, but this approach is certainly not free either.
    At this point if I think I want to write a library that supports both sync and async use cases it feels like feature flags & separate implementations might produce an easier to understand outcome for me -- the sync version can even start as mostly `tokio::Runtime::block_on`s, and graduate to a more performant version with better custom-tailored efficiency (i.e. busy waiting).
    Of course, I'm not disparaging the type state pattern here/using state machines -- just that I'd probably just use that from inside an async/sync-gated modules (and be able to share that code between two impls).
    
    wh33zle 4 days ago
    
    > What am I missing here? From unrelated code, I want to call `get_ip_via_stun_or_timeout(hostnames: &[String], timeout: Duration) -> Option<String>`, is what I'm missing that I need to wrap this state machine in another to pass it up to the level above? That I need to essentially move the who-must-implement-the-event-loop one level up?
    Essentially yes! For such a simple example as STUN, it may appear silly because the code that is abstracted away in a state machine is almost shorter than the event loop itself.
    That very quickly changes as the complexity of your protocol increases though. The event loop is always roughly the same size yet the protocol can be almost arbitrarily nested and still reduces down to an API of `handle/poll_timeout`, `handle_input` & `handle_transmit`.
    For example, we've been considering adding a QUIC stack next to the WireGuard tunnels as a control protocol in `snownet`. By using a sans-IO QUIC implementation like quinn, I can do that entirely as an implementation detail because it just slots into the existing state machine, next to ICE & WireGuard.
    > At this point if I think I want to write a library that supports both sync and async use cases it feels like feature flags & separate implementations might produce an easier to understand outcome for me -- the sync version can even start as mostly `tokio::Runtime::block_on`s, and graduate to a more performant version with better custom-tailored efficiency (i.e. busy waiting).
    > Of course, I'm not disparaging the type state pattern here/using state machines -- just that I'd probably just use that from inside an async/sync-gated modules (and be able to share that code between two impls).
    This is what quinn does: It uses tokio + async to expose an API that uses `AsyncRead` and `AsyncWrite` and thus fully buys into the async ecosystem. The actual protocol implementation however - quinn-proto - is sans-IO.
    The way I see this is that you can always build more convenience layers; whether or not they live in the same crate doesn't really matter for that. The key thing is that they should be optional. The problems of function colouring only exist if you don't focus on building the right thing: an IO-free implementation of your protocol. The protocol implementation is usually the hard bit, the one that needs to be correct and well-tested. Integration with blocking or non-blocking IO is just plumbing work that isn't difficult to write.
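The earlier point, that nested protocols still reduce to the same small API, can be sketched for the timeout half of that API. Everything here is hypothetical: the `SansIo` trait, the toy `Ticker` component and the `Both` combinator are illustrations, not how `snownet` is structured. The combined machine wakes for whichever child is due first and forwards the timeout to both.

```rust
use std::time::{Duration, Instant};

pub trait SansIo {
    fn poll_timeout(&self) -> Option<Instant>;
    fn handle_timeout(&mut self, now: Instant);
}

/// Toy component: fires once per `period` and counts how often it fired.
pub struct Ticker {
    period: Duration,
    next: Instant,
    pub fired: u32,
}

impl Ticker {
    pub fn new(now: Instant, period: Duration) -> Self {
        Ticker { period, next: now + period, fired: 0 }
    }
}

impl SansIo for Ticker {
    fn poll_timeout(&self) -> Option<Instant> {
        Some(self.next)
    }

    fn handle_timeout(&mut self, now: Instant) {
        if now >= self.next {
            self.fired += 1;
            self.next = now + self.period;
        }
    }
}

/// Two machines behind the same interface; the event loop sees a single one.
pub struct Both<A, B>(pub A, pub B);

impl<A: SansIo, B: SansIo> SansIo for Both<A, B> {
    fn poll_timeout(&self) -> Option<Instant> {
        // Wake up for whichever child is due first.
        match (self.0.poll_timeout(), self.1.poll_timeout()) {
            (Some(a), Some(b)) => Some(a.min(b)),
            (a, b) => a.or(b),
        }
    }

    fn handle_timeout(&mut self, now: Instant) {
        // Each child decides for itself whether `now` is relevant.
        self.0.handle_timeout(now);
        self.1.handle_timeout(now);
    }
}
```

Because `Both` itself implements `SansIo`, it can be nested again, and the event loop that drives it does not grow.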
    
    hardwaresofton 4 days ago
    
    Ahh thanks for clarifying this! Makes a ton of sense now -- I need to try writing some of these style of programs (in the high perf Rust style) to see how they feel.
    > For example, we've been considering adding a QUIC stack next to the WireGuard tunnels as a control protocol in `snownet`. By using a sans-IO QUIC implementation like quinn, I can do that entirely as an implementation detail because it just slots into the existing state machine, next to ICE & WireGuard.
    Have you found that this introduces a learning curve for new contributors? Being able to easily stand up another transport is pretty important, and I feel like I can whip together an async-required interface for a new protocol very easily (given I did a decent job with the required Traits and used the typestate pattern) whereas sans-IO might be harder to reason about.
    Thanks for pointing out quinn-proto (numerous times at this point) as well -- I'll take a look at the codebase and see what I can learn from it (as well as str0m).
    [EDIT]
    > The problems of function colouring only exist if you don't focus on building the right thing: an IO-free implementation of your protocol. The protocol implementation is usually the hard bit, the one that needs to be correct and well-tested.
    The post, in a couple lines!
    [EDIT2] Any good recommendations of a tiny protocol that might be a good walk through intro to this?
    Something even simpler than Gopher or SMTP? Would be nice to have a really small thing to do a tiny project in.
    
    wh33zle 4 days ago
    
    > [EDIT2] Any good recommendations of a tiny protocol that might be a good walk through intro to this?
    >
    > Something even simpler than Gopher or SMTP? Would be nice to have a really small thing to do a tiny project in.
    I only have experience in packet-oriented ones so I'd suggest sticking to that. Perhaps WireGuard could be simple enough? It has a handshake and timers so some complexity but nothing too crazy.
    DNS could be interesting too, because you may need to contact upstream resolvers if you don't have something cached.
    
    joshka 4 days ago
    
    If you want a web protocol, try oauth2. There's complexity in the number of things you can support, but in essence there's a state machine that can be modeled.
    
    hardwaresofton 4 days ago
    
    Ahh didn't even think of that level of the stack… It is true that the OAuth2 tango can be represented by a state machine…
    I’d probably do CAS instead, it’s simpler IMO.
    
    joshka 4 days ago
    
    The Stun protocol is surprisingly easy to implement, but see my comment at https://news.ycombinator.com/item?id=40879547 about why I'd just use async instead of making my own event loop system.
    https://gist.github.com/joshka/af299be87dbd1f64060e47227b577...
    
    hardwaresofton 4 days ago
    
    Thanks for the code! Going to pore over this.
    I read the comment and I definitely agree (though it took me a while to get to where you landed), I think there are some benefits:
    - More controllable/easy to reason about cancel safety (though this gets pushed up the stack somewhat). You just can't cancel a thread, but it turns out a ton of places in an async function are cancel points (everywhere you or some function calls .await, most obviously), and that can cause surprising problems.
    - Ability to easily slap on both sync and async shells (I personally think it's not unforgivable to smuggle a tokio current thread runtime in as a dep and use block_on for async things internally, since callers are none the wiser)
    Great comment though, very succinctly explained what I was getting at... I personally land on the "just make everything async" side of things. Not necessarily everything should be Send + Sync, but similar to Option/Result, I'd rather just start using async everywhere than try to make a sync world work.
    There's also libraries like agnostic[0] that make it somewhat easier to support multiple runtimes (though I've done it in the past with feature flags).
    > The problem with the approach suggested in the article is that it splits the flow (event loop) and logic (statemachine) from places where the flow is the logic (send a stun binding request, get an answer).
    Very concisely put -- If I'm understanding OP's point of view, the answer to this might be "don't make the flow the logic"? Basically, encode the flow as a state machine instead and pass that up to an upper event loop (essentially requiring the upper layer to run it).
    Feels like there are at least 3 points in this design space:
    - Sync only state machines (event loop must be at the outermost layer)
    - Sync state machines with possibly internal async (event loops could be anywhere)
    - Async everything (event loops are everywhere)
    [0]: https://crates.io/crates/agnostic
    
    hardwaresofton 4 days ago
    
    also a bit late, but you've seen anyhow & miette right? noticed the color_eyre usage and was just wondering
    
    joshka 4 days ago
    
    yep - color_eyre is a better anyhow (and there's plans afoot to merge them into just one at some point[1]). Miette occupies a space that I generally don't need (except when processing data), while color-eyre is in the goldilocks zone.
    [1]: https://github.com/eyre-rs/eyre/issues/177
    
    hardwaresofton 4 days ago
    
    Thanks for the pointer! Rustaceans are spoiled for choice with good error handling and libraries, great to have so many great choices.

r3trohack3r 5 days ago

Oh hey thomaseizinger!

I got half way through this article feeling like this pattern was extremely familiar after spending time down inside rust-libp2p. Seems like that wasn't a coincidence!

Firezone looks amazing, connect all the things!

wh33zle 5 days ago

Haha thank you!
Yes there are indeed similarities to rust-libp2p! Over there, things are more interleaved though because the actual streams and connections are still within `Future`-like constructs and not strictly split like in the sans-IO case here.

amluto 5 days ago

> Also, sequential workflows require more code to be written. In Rust, async functions compile down to state machines, with each .await point representing a transition to a different state. This makes it easy for developers to write sequential code together with non-blocking IO. Without async, we need to write our own state machines for expressing the various steps.

Has anyone tried to combine async and sans-io? At least morally, I ought to be able to write an async function that awaits sans-io-aware helpers, and the whole thing should be able to be compiled down to a state machine inside a struct with a nice sans-io interface that is easily callable by non-async code.

I’ve never tried this, but the main issues I would foresee would be getting decent ergonomics and dealing with Pin.

10000truths 5 days ago

Rust has generators/coroutines that can somewhat address the use case you're describing, but they're an extra-unstable feature at the moment. Unfortunately, in its current incarnation, coroutines have the annoying limitation of only being exposed via the std::ops::Coroutine trait, so the underlying state machine generated by the compiler can't be manually allocated, even though the size of the state machine is ostensibly a compile-time constant.
It's not an issue for a single coroutine whose lifetime is contained within the function that defines it, since the compiler can figure that out and stack-allocate the state machine. But arguably the most useful application of coroutines is as elements in a queue for event loop machinery. But implementing that is made impossible unless you box the coroutines. Vec<Box<dyn Coroutine>> is not a cache friendly data structure, and you'll feel the pain if you're doing extremely high concurrency I/O and need a million elements in your Vec.
wh33zle 5 days ago
If Rust ever gets a native generator syntax, this might become achievable because one would be able to say: `yield transmit` to "write" data whilst staying within the context of your async operation. In other words, every `socket.write` would turn into a `yield transmit`.
To read data, the generator would suspend (.await) and wait to be resumed with incoming data. I am not sure if there is nightly syntax for this but it would have to look something like:
```
  // Made up `gen` syntax: gen(yield_type, resume_type)
  gen(Transmit, &[u8]) fn stun_binding(server: SocketAddr) -> SocketAddr {
   let req = make_stun_request();

   yield Transmit {
      server,
      payload: req
   };

   let res = .await; // Made up "suspend and resume with argument"-syntax.
   
   let addr = parse_stun_response(res);

   addr
 }
```
- Arnavion 4 days ago
  
  Rust has had native generator syntax for a few years FYI. It's what async-await is built on. It's just gated behind a nightly feature.
  https://doc.rust-lang.org/stable/std/ops/trait.Coroutine.htm... and the syntax you're looking for for resuming with a value is `let res = yield ...`
  Alternatively there is a proc macro crate that transforms generator blocks into async blocks so that they work on stable, which is of course a round-about way of doing it, but it certainly works.
wh33zle 5 days ago

They actually play together fairly well higher up the stack. Non-blocking IO (i.e async) makes it easy to concurrently wait for socket IO and time. You can do it with blocking IO too by setting a read-timeout on the socket but using async primitives makes it a bit easier.
But I've also been mulling over the idea how they could be combined! One thing I've arrived at is the issue that async functions compile into opaque types. That makes it hard / impossible to use the compiler's facility of code-generating the state machine because you can't interact with it once it has been created. This also breaks the borrow-checker in some way.
For example, if I have an async operation with multiple steps (i.e. `await` points) but only one section of those needs a mutable reference to some shared data structure. As soon as I express this using an `async` function, the mutable reference is captured in the generated `Future` type which spans across all steps. As a result, Rust doesn't allow me to run more than one of those concurrently.
Normally, the advice for these situations is "only capture the mutable reference for as short as possible" but in the case of async, I can't do that. And splitting the async function into multiple also gets messy and kind of defeats the point of wanting to express everything in a single function again.
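For contrast, here is a minimal sketch of how a hand-written state machine sidesteps the capture problem. All names (`Shared`, `Machine`, `Step`, `handle_input`) are hypothetical and not taken from Firezone's code; the assumption is simply that each step receives the shared state as an argument instead of holding on to it.

```rust
use std::collections::HashMap;

/// Hypothetical shared state that several state machines need to update.
#[derive(Default)]
pub struct Shared {
    pub completed: HashMap<u32, u64>,
}

/// The steps of a two-step operation. No variant stores a reference to `Shared`.
pub enum Step {
    AwaitingFirst,
    AwaitingSecond { first: u64 },
    Done,
}

/// A hand-written state machine (hypothetical, for illustration only).
pub struct Machine {
    pub id: u32,
    pub step: Step,
}

impl Machine {
    pub fn new(id: u32) -> Self {
        Machine { id, step: Step::AwaitingFirst }
    }

    /// Advance by one input. `shared` is borrowed only for the duration of
    /// this call, so other machines can borrow it between our steps.
    pub fn handle_input(&mut self, input: u64, shared: &mut Shared) {
        self.step = match std::mem::replace(&mut self.step, Step::Done) {
            Step::AwaitingFirst => Step::AwaitingSecond { first: input },
            Step::AwaitingSecond { first } => {
                // Only this step touches the shared state.
                shared.completed.insert(self.id, first + input);
                Step::Done
            }
            Step::Done => Step::Done,
        };
    }
}
```

Two `Machine` values can be advanced in an interleaved order against the same `&mut Shared`, which is the part an `async fn` holding the reference across its `.await` points would not allow.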
algesten 5 days ago

One thing I toyed with, but didn't get very far, was to encode the HTTP/1.1 protocol as a Sans-IO state machine with .await points for the IO, but rather than the IO registering Wakers with an async runtime, it relinquished control back to the user to perform the IO manually. One can think of it as .await releasing "up" instead of "down".
In the context of HTTP/1.1 the async code became a kind of "blueprint" for how the user wants the call to behave. At the time I was dead set on making it work for a no_std (no allocator) environment, and I gave up because I couldn't find a way around the need for dynamic dispatch via Box<dyn X> (which requires an allocator).

joshka 4 days ago

https://news.ycombinator.com/item?id=40879547

    async fn stun(
        server: SocketAddr,
        mut socket: impl Sink<(BindingRequest, SocketAddr), Error = color_eyre::Report>
            + Stream<Item = Result<(BindingResponse, SocketAddr), color_eyre::Report>>
            + Unpin
            + Send
            + 'static,
    ) -> Result<SocketAddr> {
        socket.send((BindingRequest, server)).await?;
        let (message, _server) = socket.next().await.ok_or_eyre("No response")??;
        Ok(message.address)
    }

Fully working code at https://gist.github.com/joshka/af299be87dbd1f64060e47227b577...

bionhoward 4 days ago

I wrote a library which I didn’t release yet, where the natural async approach seems impossible to compile in Rust if async I/O is tied too tightly to main logic. (In b4 skill issue)
Sans I/O mostly means, write pure functions and move I/O out of your main functionality as much as possible. Then you can deal with each part independently and this makes the compiler happy.
80-96% of your work on a sans io rust project is still going to be I/O, but it’s not complected with your main logic, so you can unit test the main logic more easily
kmac_ 5 days ago

This is another take on defunctionalization. You create a model of execution but do not execute it. I.e., return or queue a value of type Send, and do not execute "send". The execution is separate and actually deals with "real-world" side effects. The execution can be done by sync/async/transformed to monads, it doesn't matter.
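A minimal sketch of that idea in Rust, assuming hypothetical names (`Effect`, `Protocol`, `poll_effect`, `run_recording`) that do not come from the article: the protocol only queues a description of the send, and a separate interpreter decides what to do with it.

```rust
use std::collections::VecDeque;
use std::net::SocketAddr;

/// A description of a side effect. Creating one executes nothing.
#[derive(Debug, PartialEq)]
pub enum Effect {
    Send { dst: SocketAddr, payload: Vec<u8> },
}

/// Hypothetical protocol logic that only queues effects.
#[derive(Default)]
pub struct Protocol {
    pending: VecDeque<Effect>,
}

impl Protocol {
    /// Request a send; the effect is queued, not performed.
    pub fn ping(&mut self, dst: SocketAddr) {
        self.pending.push_back(Effect::Send { dst, payload: b"ping".to_vec() });
    }

    /// Hand the next queued effect to whoever is driving the protocol.
    pub fn poll_effect(&mut self) -> Option<Effect> {
        self.pending.pop_front()
    }
}

/// One possible interpreter: it records the sends instead of touching a socket.
/// A blocking or async interpreter would write `payload` to `dst` here instead.
pub fn run_recording(protocol: &mut Protocol) -> Vec<(SocketAddr, Vec<u8>)> {
    let mut log = Vec::new();
    while let Some(Effect::Send { dst, payload }) = protocol.poll_effect() {
        log.push((dst, payload));
    }
    log
}
```

Swapping `run_recording` for an interpreter that owns a real socket changes nothing in `Protocol`, which is the separation being described.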
ithkuil 5 days ago

A long time ago I had "fun" implementing all sorts of network protocols with such an event based library in C: https://github.com/cesanta/mongoose

ethegwo 5 days ago

Good job! Exposing state could make any async function 'pure'. All the user needs to do is push the state machine to the next state. I have tried to bind OpenSSL to async Rust before, its async API follows a similar design.

wh33zle 5 days ago

I did some quick research and found that there is an "async job" API in OpenSSL. That one appears to do IO though, it even says that creating a job is a very expensive operation and thus jobs should be reused.
Is the similarity you are seeing that the work itself that gets scheduled via a job is agnostic over how it is executed?
From this example [0] it looks more like that async API is very similar to Rust's futures:
- Within a job you can access a "wait context"
- You can suspend on some condition
- You can trigger a wake-up to continue executing
[0]: https://www.openssl.org/docs/man1.1.1/man3/ASYNC_is_capable....
- ethegwo 5 days ago
  
  Yes, you're right. It's not entirely similar, it's not IO-less. But in async Rust (or any other stackless coroutine runtime), IO should be bound to the scheduler. This allows IO event callbacks to reach the scheduler and wake the task they are bound to. Exposing and manually pushing state is a good way to decouple IO from the scheduler.
  - wh33zle 5 days ago
    
    Yes! Decoupling is the goal of this! Using non-blocking IO is still useful in this case because it means we can wait on two conditions at once (i.e. socket IO and time), see [0].
    It is possible to do the same with blocking IO but it feels a little less natural: You have to set the read-timeout on the socket to the time when you need to wake up the state machine.
    [0]: https://github.com/firezone/sans-io-blog-example/blob/99df77...
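    A rough sketch of the blocking variant, assuming a hypothetical `StateMachine` trait (`handle_input`, `handle_timeout`, `poll_timeout`) that is not the API from the linked example. The `UdpSocket` calls are the standard library ones; note that `set_read_timeout` rejects a zero duration, so an already-expired deadline is clamped to 1 ms here.

    ```rust
    use std::io::ErrorKind;
    use std::net::UdpSocket;
    use std::time::{Duration, Instant};

    /// Hypothetical sans-IO interface, named here for illustration only.
    pub trait StateMachine {
        fn handle_input(&mut self, packet: &[u8], now: Instant);
        fn handle_timeout(&mut self, now: Instant);
        /// The next point in time at which the machine wants to be woken.
        fn poll_timeout(&self) -> Option<Instant>;
    }

    /// Turn the machine's deadline into a socket read timeout.
    /// `None` means "block until a packet arrives".
    pub fn read_timeout(deadline: Option<Instant>, now: Instant) -> Option<Duration> {
        deadline.map(|d| d.saturating_duration_since(now).max(Duration::from_millis(1)))
    }

    /// One iteration of a blocking event loop: wait for a packet or the deadline.
    pub fn drive_once(socket: &UdpSocket, machine: &mut impl StateMachine) -> std::io::Result<()> {
        socket.set_read_timeout(read_timeout(machine.poll_timeout(), Instant::now()))?;
        let mut buf = [0u8; 1500];
        match socket.recv_from(&mut buf) {
            Ok((len, _from)) => machine.handle_input(&buf[..len], Instant::now()),
            // A read timeout surfaces as WouldBlock or TimedOut depending on the platform.
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
                machine.handle_timeout(Instant::now())
            }
            Err(e) => return Err(e),
        }
        Ok(())
    }
    ```

    Calling `drive_once` in a loop gives the "wait on socket IO and time" behaviour without an async runtime, at the cost of one `set_read_timeout` call per iteration.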

mpweiher 4 days ago

Reading the article and some of the comments, it sounds like they reinvented the hexagonal or ports/adapters architectural style?

mgaunard 5 days ago

This is just normal asynchronous I/O with callbacks instead of coroutines.

Uptrenda 5 days ago

I don't know what the take away is supposed to be here. Everything spoken about here is already basic network programming. It seems to focus on higher level plumbing and geeks out on state management even though this is just a matter of preference and has nothing to do with networking.

The most interesting thing I learned from the article is that cloudflare runs a public stun server. But even that isn't helpful because the 'good' and 'useful' version of the STUN protocol is the first version of the protocol which supports 'change requests' -- a feature that allows for NAT enumeration. Later versions of the STUN protocol removed that feature thanks to the 'helpful suggestions' of Cisco engineers who contributed to the spec.

K0nserv 5 days ago

The big thing, in the context of Rust, I think is how this solves function colouring, but it also makes testing really simple as outlined in the post.
The current situation in Rust is that if you implement a library, say one that does WebRTC, on top of the Tokio async runtime, then it's very cumbersome for folks to use it if they are doing sync IO, using a different runtime (smol, async-std etc.), using io_uring directly, etc. With this approach you don't force the IO choice on consumers and make the library useful to more people.
- solidninja 4 days ago
  
  The parallels with abstracting over the effect type and Free(r) monads are really apparent if you've had exposure to that style of programming. As you said, the benefit is that you can separate the business logic (what you want to do) from the execution model (how you do it) and this is very much an ongoing theme in programming language development.

screcth 4 days ago

It would be better if the compiler could take the async code and transform it automatically to its sans io equivalent. Doing it manually seems error prone and makes it much harder to understand what the code is doing.

ibotty 4 days ago

That's just an initial encoding as described e.g. here: https://peddie.github.io/encodings/encodings-text.html

Am I missing something?

tmd83 5 days ago

Does the actual traffic go through the gateway, or is the gateway only used for setting up the connection?

wh33zle 5 days ago

Yes, traffic is routed to the gateway through a WireGuard tunnel. Broadly speaking, what happens is:
- Client and gateway perform ICE to agree on a socket pair (this is where hole-punching happens or if that fails, a relay is used)
- The socket pair determined by ICE is used to set up a WireGuard tunnel (i.e. a noise handshake using ephemeral keys).
- IP traffic is read from the TUN device and sent via the WireGuard tunnel to the gateway.
- Gateway decrypts it and emits it as a packet from its TUN device, thereby forwarding it to the actual destination.
It is worth noting that a WireGuard tunnel in this case is "just" the Noise Protocol [0] layered on top of UDP. This ensures the traffic is end-to-end encrypted.
[0]: https://noiseprotocol.org

Animats 5 days ago

"... be it from your Android phone, MacOS computer or Linux server. "

Why would you want this in a client? It's not like a client needs to manage tens of thousands of connections. Unless it's doing a DDOS job.

wh33zle 5 days ago

In Firezone's case, things are built on top of UDP so technically there aren't any (kernel-managed) connections and only a single file descriptor is allocated for the UDP socket.
The main benefit is being able to use `&mut` everywhere: At the time when we read an IP packet from the TUN device, we don't yet know, which gateway (exit node), it needs to go to. We first have to look at the user's policies and then encrypt and send it via a WireGuard tunnel.
Similarly, we need to concurrently receive on all of these tunnels. The tunnels are just a user-space concept though. All we do is receive on the UDP socket and index into the corresponding data structure based on the sending socket.
If all of these "connections" used their own task and UDP socket, we'd have to use channels (and thus copying) to dispatch them. Additionally, the policy state would have to be in an `Arc<Mutex>` because it is shared among all connections.
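To illustrate the shape of this (not Firezone's actual code), here is a small sketch with hypothetical `Node` and `Tunnel` types: every datagram read from the single UDP socket is handed to one struct, which looks up the per-peer state by the sender's address and updates it through plain `&mut self`.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical per-peer state; a real tunnel would hold session keys and timers.
#[derive(Debug, Default)]
pub struct Tunnel {
    pub packets: u64,
    pub bytes: usize,
}

/// All tunnels live in one struct, so no channels, `Arc` or `Mutex` are needed.
#[derive(Default)]
pub struct Node {
    tunnels: HashMap<SocketAddr, Tunnel>,
}

impl Node {
    pub fn add_tunnel(&mut self, peer: SocketAddr) {
        self.tunnels.entry(peer).or_default();
    }

    /// Called by the event loop for every datagram read from the one UDP socket.
    /// Returns `false` if the sender does not belong to a known tunnel.
    pub fn handle_datagram(&mut self, from: SocketAddr, packet: &[u8]) -> bool {
        match self.tunnels.get_mut(&from) {
            Some(tunnel) => {
                tunnel.packets += 1;
                tunnel.bytes += packet.len();
                true
            }
            None => false,
        }
    }

    pub fn tunnel(&self, peer: SocketAddr) -> Option<&Tunnel> {
        self.tunnels.get(&peer)
    }
}
```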

cryptonector 3 days ago

> This pattern isn't something that we invented! The Python world even has a dedicated website about it.

I mean, this is basically what the IO monad and monadic programming in Haskell end up pushing Haskell programmers to do.

Arnavion 4 days ago

See also this discussion from a few months ago about sans-io in Rust: https://news.ycombinator.com/item?id=39957617

joshka 4 days ago

This article / idea really refactors two things out of some IO code

- the event loop

- the state machine of data states that occur

But async rust is already a state machine, so the stun binding could be expressed as a 3 line async function that is fairly close to sans-io (if you don't consider relying on abstractions like Stream and Sink to be IO).

    async fn stun(
        server: SocketAddr,
        mut socket: impl Sink<(BindingRequest, SocketAddr), Error = color_eyre::Report>
            + Stream<Item = Result<(BindingResponse, SocketAddr), color_eyre::Report>>
            + Unpin
            + Send
            + 'static,
    ) -> Result<SocketAddr> {
        socket.send((BindingRequest, server)).await?;
        let (message, _server) = socket.next().await.ok_or_eyre("No response")??;
        Ok(message.address)
    }

If you look at how the underlying async primitives are implemented, they look pretty similar to what you've implemented. sink.send is just a future for Option<SomeMessage>, a future is just something that can be polled at some later point, which is exactly equivalent to your event loop constructing the StunBinding and then calling poll_transmit to get the next message. And the same goes with the stream.next call, it's the same as setting up a state machine that only proceeds when there is a next item that is being fed to it. The Tokio runtime is your event loop, but just generalized.

Restated simply: the stun function above returns a future that combines the same methods you have with a contract about how that interacts with a standard async event loop.

The above is testable without hitting the network. Just construct the test Stream / Sink yourself. It also easily composes to add timeouts etc. To make it work with the network instead pass in a UdpFramed (and implement codecs to convert the messages to / from bytes).

Adding timeout can be either composed from the outside caller if it's a timeout imposed by the application, or inside the function if it's a timeout you want to configure on the call. This can be tested using tokio test-utils and pausing / advancing the time in your tests.

---

The problem with the approach suggested in the article is that it splits the flow (event loop) and logic (statemachine) from places where the flow is the logic (send a stun binding request, get an answer).

Yes, there's arguments to be made about not wanting to use async await, but when you effectively create your own custom copy of async await, just without the syntactic sugar, and without the various benefits (threading, composability, ...), it's worth considering whether you could use async instead.

wh33zle 18 hours ago

This isn't an adequate comparison in my eyes. Your example moves a very critical component out of the "protocol logic": message parsing.
Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.
In addition, you want to be able to parse messages without copying, which will introduce lifetimes into this that will make it difficult to use an executor like tokio to run this future. You can use a structured-concurrency approach instead. I linked to an article from withoutboats about this at the end.
Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.
EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines.
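To make the `&[u8]` point concrete, here is a small sketch of a borrowing parser for the fixed 20-byte STUN header. The names (`parse_header`, `StunHeader`, `ParseError`) are hypothetical; the layout (16-bit type, 16-bit length, magic cookie `0x2112A442`, 12-byte transaction ID) is the one from the STUN RFCs. Attribute parsing is left out.

```rust
/// Reasons a datagram is rejected before it reaches the state machine.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    TooShort,
    BadMagicCookie,
    LengthMismatch,
}

/// Fixed value carried in bytes 4..8 of every STUN message.
pub const MAGIC_COOKIE: [u8; 4] = [0x21, 0x12, 0xA4, 0x42];

/// A view into the received datagram; nothing is copied.
#[derive(Debug, PartialEq)]
pub struct StunHeader<'a> {
    pub message_type: u16,
    pub transaction_id: &'a [u8],
    pub attributes: &'a [u8],
}

/// Validate and split a raw datagram (hypothetical helper, header only).
pub fn parse_header(packet: &[u8]) -> Result<StunHeader<'_>, ParseError> {
    if packet.len() < 20 {
        return Err(ParseError::TooShort);
    }
    if packet[4..8] != MAGIC_COOKIE[..] {
        return Err(ParseError::BadMagicCookie);
    }
    // The length field counts the bytes after the 20-byte header.
    let declared = u16::from_be_bytes([packet[2], packet[3]]) as usize;
    if declared != packet.len() - 20 {
        return Err(ParseError::LengthMismatch);
    }
    Ok(StunHeader {
        message_type: u16::from_be_bytes([packet[0], packet[1]]),
        transaction_id: &packet[8..20],
        attributes: &packet[20..],
    })
}
```

Because `StunHeader` borrows from the input buffer, it cannot be moved into a `'static` spawned task, which is the lifetime friction described above.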