Hi KRAW,
It's basically shared memory and lock-free queues. That combination already helps a lot with latency, but we have been working on these topics for almost eight years, and there are a ton of things one can get wrong. For comparison: the first incarnation of iceoryx has a latency of around 1 microsecond in polling mode, while with iceoryx2 we achieve 100 nanoseconds on some systems.
The queue payload is always 8 bytes, because we only push offsets into a shared-memory segment through the queues; the actual user data never leaves that segment.
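
To make that concrete, here is a rough sketch of such a queue (not our actual code, and it skips everything needed to actually place the queue itself in shared memory): a single-producer/single-consumer ring buffer whose slots are just the 8-byte offsets. The capacity `N` is assumed to be a power of two so the wrapping index arithmetic stays valid.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Single-producer/single-consumer ring buffer whose payloads are plain
/// 8-byte values, e.g. offsets into a shared-memory segment. In a real
/// IPC setup the queue itself would have to live in shared memory; this
/// in-process version only illustrates the lock-free mechanics.
/// N is assumed to be a power of two so the wrapping indices stay valid.
pub struct SpscQueue<const N: usize> {
    slots: [AtomicU64; N],
    head: AtomicUsize, // next index the consumer reads
    tail: AtomicUsize, // next index the producer writes
}

impl<const N: usize> SpscQueue<N> {
    pub fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| AtomicU64::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side only. Returns false if the queue is full.
    pub fn push(&self, offset: u64) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail.wrapping_sub(self.head.load(Ordering::Acquire)) == N {
            return false; // full
        }
        self.slots[tail % N].store(offset, Ordering::Relaxed);
        // Publish the slot; pairs with the consumer's Acquire load of `tail`.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side only. Called in a loop -- that is the "polling mode".
    pub fn pop(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let offset = self.slots[head % N].load(Ordering::Relaxed);
        // Release the slot back to the producer.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(offset)
    }
}
```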
The trick is to reduce contention as much as possible and to preserve cache locality. With iceoryx classic, we used MPMC queues to support multiple publishers on the same topic, and reference counting across process boundaries to free the memory chunks once they were no longer in use. With iceoryx2, we moved to SPSC queues, mainly to improve robustness, and solved the multi-publisher problem differently. Instead of reference counting across process boundaries for the lifetime handling of the memory, we use SPSC completion queues to send the freed chunks back to the producer process. This massively reduced memory contention and made the whole transport mechanism simpler. There is a ton of other stuff going on to make all of this safe and to be able to recover memory from crashed applications.
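
Here is a rough sketch of the resulting chunk flow, reusing the `SpscQueue` from above. Again, this is illustrative only, not iceoryx2's actual API, and threads stand in for processes: the producer publishes offsets over a data queue and reclaims them from a completion queue once the consumer is done, so no cross-process reference counting is needed.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // One connection = two SPSC queues: offsets flow to the consumer via
    // `data` and come back via `completion` once the sample was consumed.
    let data: Arc<SpscQueue<64>> = Arc::new(SpscQueue::new());
    let completion: Arc<SpscQueue<64>> = Arc::new(SpscQueue::new());

    let (d, c) = (Arc::clone(&data), Arc::clone(&completion));
    let consumer = thread::spawn(move || {
        let mut received = 0;
        while received < 8 {
            if let Some(offset) = d.pop() {
                // ... read the payload at `offset` in the shared segment ...
                while !c.push(offset) {} // return the chunk, no refcounting
                received += 1;
            }
        }
    });

    // Producer: publish eight chunk offsets and reclaim freed ones.
    for offset in 0..8u64 {
        while !data.push(offset) {}
        if let Some(freed) = completion.pop() {
            let _ = freed; // this chunk can now back the next sample
        }
    }

    consumer.join().unwrap();
}
```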
Thanks for the details! I have done MPI work in the past, so I was curious how an MPI implementation and iceoryx2 compare when it comes to local IPC transfers. It'd be interesting to do a detailed review of the two and see whether they can benefit from each other.