You've clearly thought about the problem, so the solutions should be relatively obvious. Some less obvious ones:
- It is impossible to make TCP reliable no matter how hard you try, because anybody can inject an RST flag at any time and cut off your connections (this isn't theoretical, it's actually quite common for long-lived gaming connections). That leaves UDP, for which there are several reliability layers, but most of them are not battle-tested - remember, TCP is most notable for congestion control! HTTP/3 is probably the only viable choice at scale, but beware that many implementations are very bad (e.g. not even supporting `recvmmsg`/`sendmmsg`, which, unlike with TCP, are critical for performance; note the extra `m`).
- If you don't encrypt all your packets, you will have random middleware mess with their data. Think at least a little about key rotation.
- To avoid application-centric DoS, make sure the client always does "more" than the server; this extends to e.g. packet sizes.
- Prefer to ultimately define things in data, not code (e.g. network packet layouts). Don't be afraid to write several bespoke code-generators; many real-world serialization formats in particular have unacceptable tradeoffs. Make sure the core code doesn't care about the details (e.g. make every packet physically variable-length even if logically it is always fixed-length; you can also normalize zero-padding at this level for future compatibility. I advise against delta-compression at this level because that's extra processing you don't need).
- Make sure the client only has to connect to a single server. If you have multiple servers internally, have a thin bouncer/proxy that forwards packets appropriately. This also has benefits for the inevitable DDoS attacks.
- Latency is a bitch and has far-ranging effects, though this is highly dependent on not just genre but also UI. For example "hold down a key to move continuously through the world" is problematic whereas "click to move to a location" is not.
- Beware quadratic complexity, e.g. if every player must send a location update to every player.
- Think not only about the database, but how to back up the database and how to roll back in case of catastrophe or exploit. An append-only flat file has a lot going for it: only periodic repacking is needed, and you can keep the old file around for a while with a guarantee that replaying it yields the same state as the initial version of the new file. Of course, the best state is no state at all. You will need to consider the notion of "transaction" at many levels, including scripting (you must give me 20 bear asses for me to give you something in return), trading between players, etc.
- You will have abuse in chat. You will also have cybersex. It's possible to deal with this in a privacy-preserving way by merely signing chat, not logging it, so the player can present evidence only if they wish, but there are a lot of concerns about e.g. replays, selective message subsets, etc.
- There will be bots, especially if the official client isn't good enough.
- It's $CURRENTYEAR; write code for IPv6 exclusively. There are sockopts for transparently handling legacy IPv4 clients.
- Client IP address is private information. It is also the only way to deal with certain kinds of abuse. Sometimes, you just have to block all of Poland.
- Note that routing in parts of the world is really bad. Sometimes setting up your own dedicated connection chain between datacenters can improve performance by orders of magnitude, rather than letting clients use whatever their ISP says. If you nest proxies, be sure to validate IPs correctly at every hop.
- Life is simpler if internal stuff listens on a different port from external stuff, but still verify your peer. IP whitelisting is useless except for localhost (which, mind, is all of 127.0.0.0/8 for IPv4 - about the only time IPv4 is actually useful rather than a mere mirage).
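
To make the "define packets in data, keep them physically variable-length" point a bit more concrete, here is a rough sketch. The `LAYOUTS` table, the field choices, and the 2-byte length header are all made up for illustration; a real frame would also carry a type tag, and a real project would probably generate this from a schema file.

```python
import struct

# Hypothetical packet layouts, defined as data rather than hand-written code.
LAYOUTS = {
    "move": struct.Struct("<HiiB"),  # entity id, x, y, flags
    "chat": struct.Struct("<H32s"),  # entity id, fixed 32-byte text field
}

# Assumed framing: a 2-byte little-endian payload length, then the payload.
HEADER = struct.Struct("<H")


def encode(kind, *fields):
    """Pack a logically fixed-length packet, then drop trailing zero bytes."""
    body = LAYOUTS[kind].pack(*fields).rstrip(b"\x00")
    return HEADER.pack(len(body)) + body


def decode(kind, frame):
    """Unpack a frame, normalizing its length to what this build expects."""
    layout = LAYOUTS[kind]
    (length,) = HEADER.unpack_from(frame)
    body = frame[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated frame")
    # Shorter than expected: the missing tail is zeros (old sender, or padding
    # was stripped). Longer than expected: ignore fields added by a newer sender.
    body = body[:layout.size].ljust(layout.size, b"\x00")
    return layout.unpack(body)
```

The core code only ever sees "length plus bytes", so adding a field at the end of a layout does not break older peers as long as zero means "absent".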
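
For the append-only flat file idea, a minimal sketch of "append, replay, repack" might look like the following. The JSON-lines format and the `set`/`del` event schema are my own assumptions, not anything prescribed above; the point is only that the repacked file replays to the same state as the old one.

```python
import json
import os


def apply(state, event):
    """Apply one event to the in-memory state (hypothetical event schema)."""
    op = event["op"]
    if op == "set":
        state[event["key"]] = event["value"]
    elif op == "del":
        state.pop(event["key"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return state


def append(path, event):
    """Append one event as a JSON line and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay(path):
    """Rebuild the state by replaying every event in order."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            apply(state, json.loads(line))
    return state


def repack(old_path, new_path):
    """Write a compact file that replays to the same state as the old one."""
    state = replay(old_path)
    # Mode "x" refuses to overwrite an existing file.
    with open(new_path, "x", encoding="utf-8") as f:
        for key in sorted(state):
            event = {"op": "set", "key": key, "value": state[key]}
            f.write(json.dumps(event, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())
    return state
```

Rolling back is then a matter of replaying the old file up to the last event you trust; nothing in it was ever overwritten.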
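
One possible shape for "sign chat, don't log it" is below. This is a sketch under the assumption that the operator is the only party who ever needs to verify a report, so a server-side HMAC key is enough; the record fields and function names are hypothetical. Including the channel and a sequence number in the signed bytes is one way to make replays and cherry-picked subsets at least detectable, since gaps in `seq` show up.

```python
import hashlib
import hmac
import json


def _payload(record):
    """Canonical byte encoding of a chat record (sorted keys, no whitespace)."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_chat(server_key, channel, seq, sender, text):
    """Return the record plus a tag; the server forwards both and stores neither."""
    record = {"channel": channel, "seq": seq, "sender": sender, "text": text}
    tag = hmac.new(server_key, _payload(record), hashlib.sha256).hexdigest()
    return record, tag


def verify_chat(server_key, record, tag):
    """Check a record that a player chose to submit as evidence."""
    expected = hmac.new(server_key, _payload(record), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```

If third parties must be able to verify evidence too, an HMAC will not do and you would need an asymmetric signature instead.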
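
On the IPv6-only and localhost points: the sockopt in question is `IPV6_V6ONLY`, and legacy clients then show up as IPv4-mapped addresses such as `::ffff:203.0.113.9`. A small sketch of handling that, with helper names of my own choosing:

```python
import ipaddress
import socket


def make_dual_stack_listener(port):
    """IPv6 UDP socket that also accepts IPv4 clients (as ::ffff:a.b.c.d)."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # 0 = not IPv6-only. The default differs between operating systems,
    # so set it explicitly.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    return sock


def normalize_peer(host):
    """Turn an IPv4-mapped IPv6 address back into the IPv4 address it wraps."""
    addr = ipaddress.ip_address(host)
    if addr.version == 6 and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr


def is_local_peer(host):
    """True for ::1 and for anything in 127.0.0.0/8, mapped or not."""
    return normalize_peer(host).is_loopback
```

Normalizing before any ban-list or allow-list check avoids the classic mistake of blocking `203.0.113.9` while letting `::ffff:203.0.113.9` straight through.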