Ok, so I have some code that crawls the postings of a community and compares two servers for missing comments. It looks bad today. Both of these servers are on version 0.18.0 and were upgraded several days ago.
missing 0 unequal 0 11 on https://lemmy.ml/ vs. 11 on https://sh.itjust.works/
missing 35 unequal 1 48 on https://lemmy.ml/ vs. 14 on https://sh.itjust.works/
missing 4 unequal 0 9 on https://lemmy.ml/ vs. 5 on https://sh.itjust.works/
missing 6 unequal 0 9 on https://lemmy.ml/ vs. 3 on https://sh.itjust.works/
missing 1 unequal 0 1 on https://lemmy.ml/ vs. 0 on https://sh.itjust.works/
missing 6 unequal 0 12 on https://lemmy.ml/ vs. 6 on https://sh.itjust.works/
missing 3 unequal 0 8 on https://lemmy.ml/ vs. 5 on https://sh.itjust.works/
missing 3 unequal 0 6 on https://lemmy.ml/ vs. 4 on https://sh.itjust.works/
missing 22 unequal 0 42 on https://lemmy.ml/ vs. 20 on https://sh.itjust.works/
missing 5 unequal 0 15 on https://lemmy.ml/ vs. 10 on https://sh.itjust.works/
missing 8 unequal 2 17 on https://lemmy.ml/ vs. 9 on https://sh.itjust.works/
missing 3 unequal 0 3 on https://lemmy.ml/ vs. 0 on https://sh.itjust.works/
missing 0 unequal 0 10 on https://lemmy.ml/ vs. 10 on https://sh.itjust.works/
missing 11 unequal 0 24 on https://lemmy.ml/ vs. 13 on https://sh.itjust.works/
missing 1 unequal 0 2 on https://lemmy.ml/ vs. 1 on https://sh.itjust.works/
missing 13 unequal 0 37 on https://lemmy.ml/ vs. 24 on https://sh.itjust.works/
missing 3 unequal 0 7 on https://lemmy.ml/ vs. 4 on https://sh.itjust.works/
missing 0 unequal 0 10 on https://lemmy.ml/ vs. 10 on https://sh.itjust.works/
missing 60 unequal 2 186 on https://lemmy.ml/ vs. 126 on https://sh.itjust.works/
missing 10 unequal 2 51 on https://lemmy.ml/ vs. 41 on https://sh.itjust.works/
missing 16 unequal 0 51 on https://lemmy.ml/ vs. 36 on https://sh.itjust.works/
missing 31 unequal 3 128 on https://lemmy.ml/ vs. 97 on https://sh.itjust.works/
missing 0 unequal 0 4 on https://lemmy.ml/ vs. 4 on https://sh.itjust.works/
missing 2 unequal 0 5 on https://lemmy.ml/ vs. 3 on https://sh.itjust.works/
missing 15 unequal 1 67 on https://lemmy.ml/ vs. 52 on https://sh.itjust.works/
missing 4 unequal 0 53 on https://lemmy.ml/ vs. 49 on https://sh.itjust.works/
missing 0 unequal 0 5 on https://lemmy.ml/ vs. 5 on https://sh.itjust.works/
missing 0 unequal 0 0 on https://lemmy.ml/ vs. 0 on https://sh.itjust.works/
missing 1 unequal 0 19 on https://lemmy.ml/ vs. 18 on https://sh.itjust.works/
missing 0 unequal 0 2 on https://lemmy.ml/ vs. 2 on https://sh.itjust.works/
missing 0 unequal 0 22 on https://lemmy.ml/ vs. 22 on https://sh.itjust.works/
missing 0 unequal 0 16 on https://lemmy.ml/ vs. 18 on https://sh.itjust.works/
missing 0 unequal 0 7 on https://lemmy.ml/ vs. 7 on https://sh.itjust.works/
missing 3 unequal 0 27 on https://lemmy.ml/ vs. 24 on https://sh.itjust.works/
missing 2 unequal 0 32 on https://lemmy.ml/ vs. 30 on https://sh.itjust.works/
missing 3 unequal 0 21 on https://lemmy.ml/ vs. 18 on https://sh.itjust.works/
missing 3 unequal 1 16 on https://lemmy.ml/ vs. 13 on https://sh.itjust.works/
missing 3 unequal 1 47 on https://lemmy.ml/ vs. 44 on https://sh.itjust.works/
missing 1 unequal 0 24 on https://lemmy.ml/ vs. 23 on https://sh.itjust.works/
The number of comments is based on actually loading the comments, not on the counts shown at the top of the posting.
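For anyone who wants to try the same kind of check, here is a rough sketch of the comparison step only. It is not my actual crawler. It assumes each server's comments have already been loaded into a list of dicts, and the field names `ap_id` and `content` plus the function names are made up for illustration. Comments are matched on a federation-wide identifier because local ids differ between servers.

```python
from typing import Dict, List, Tuple

Comment = Dict[str, str]  # hypothetical shape: {"ap_id": ..., "content": ...}


def index_by_ap_id(comments: List[Comment]) -> Dict[str, str]:
    """Map each comment's federation-wide id to its text."""
    return {c["ap_id"]: c["content"] for c in comments}


def compare_comments(
    home: List[Comment], remote: List[Comment]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unequal) ids, as seen from the home server.

    missing: present on home but absent on remote.
    unequal: present on both, but the text differs (e.g. an edit that
             did not propagate).
    """
    home_index = index_by_ap_id(home)
    remote_index = index_by_ap_id(remote)
    missing = [k for k in home_index if k not in remote_index]
    unequal = [
        k
        for k in home_index
        if k in remote_index and home_index[k] != remote_index[k]
    ]
    return missing, unequal


def format_line(
    home: List[Comment], remote: List[Comment], home_url: str, remote_url: str
) -> str:
    """Build one report line in the same layout as the output above."""
    missing, unequal = compare_comments(home, remote)
    return (
        f"missing {len(missing)} unequal {len(unequal)} "
        f"{len(home)} on {home_url} vs. {len(remote)} on {remote_url}"
    )
```

Note that this only looks in one direction (home to remote), so a comment that exists on the remote server but not on the home server is not counted as missing.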
I think I saw one of your earlier posts and I really appreciate you chasing this down and raising awareness. As a relatively savvy user this is definitely something I've noticed, and I share your concern that it will slowly erode users' trust in the concept of federation.
Technically, can you trace where the comments are dropped? Does the target receive the response but fail to process it, or does it break somewhere at the network layer? If so, is there no receiver "ack" built into the protocol? Sorry for asking a bunch of questions and feel free to ignore (I'm an engineer but I don't know much about the federation protocol...)
There are multiple timing and resource issues with the way content is sent. Every single vote and comment carries a lot of overhead, and the Lemmy servers are slowing each other down with the overhead of it all. There are even very tight security timings that have been hit, causing rejection. And the system has no automated way to repair missing content; it just tries to keep up with each new posting, comment, and vote.
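To make the timing point concrete, here is a toy sketch of a receiver-side freshness check. It is not Lemmy's code. The function name and the 10-second window are placeholders chosen for the example. The idea is that if the sender is backed up and delivers late, the receiver treats the signed request as stale and drops it, and nothing later goes back to fill the hole.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(
    signed_at: datetime,
    received_at: datetime,
    max_age: timedelta = timedelta(seconds=10),  # placeholder window
) -> bool:
    """Accept a signed request only if it arrives within max_age of signing."""
    return received_at - signed_at <= max_age


signed = datetime(2023, 7, 1, 12, 0, 0, tzinfo=timezone.utc)

# Delivered promptly: accepted.
print(is_fresh(signed, signed + timedelta(seconds=3)))

# Sender's queue was backed up for 30 seconds: rejected, comment never lands.
print(is_fresh(signed, signed + timedelta(seconds=30)))
```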
How much of this is just the nature of ActivityPub and federation over it?
Putting aside whether Lemmy is doing a good or bad job, it seems like an issue at the protocol level? For instance, if Lemmy were to implement some additional procedures as you hint at, would they work for federation outside of Lemmy, or would they even cause problems or bugs?
I’m pretty sure I’ve seen things get dropped in similar ways on other microblog platforms. It could be that the community/group based structure just surfaces the issues more, because you can compare whole communities instead of individual reply threads in microblogging. Maybe ActivityPub isn’t appropriate for federated group activity?
Have you looked at similar issues with kbin?
Also, thanks for this work. High level end-to-end testing like this is probably invaluable!!
Just to pose these in a similar thread, I have a few questions as a casual observer, some of which I'm unclear if they're handled at the protocol or Lemmy level.
Sorry I don’t know the answers to these questions. Also, I don’t think the OP will get a notification for your content (?) unless you reply to them directly, just in case you want to ask them.