this post was submitted on 12 Jun 2023
12 points (80.0% liked)

Selfhosted


Correct me if I'm wrong. I read the ActivityPub standards and dug a little into the Lemmy sources to understand how federation works, and I'm a bit disappointed. Every server just has a cache and the ability to fetch something from another known server. So if you start your own instance, there is no benefit to the whole network until you have a significant audience (e.g. private instances or servers with no users). Are there any "balancers" to utilize these empty instances? Should we promote (or create in the first place) a way to passively help Lemmy with such fast growth?

top 50 comments
[–] [email protected] 17 points 1 year ago (3 children)

You are right. On the one hand, it's kind of bad, naive distributed architecture (my day job), it could have been done much better. On the other hand, the more important point is that it demonstrates an alternative to centralized. We'll learn a lot about usage patterns here, get new ideas, and either improve Lemmy or build something better from the ground up. Big thanks to Reddit for driving users this way to test scalability and get much better knowledge of usage.

[–] [email protected] 6 points 1 year ago (1 children)

It's not a distributed architecture as you'd normally think of it; it's a decentralised federation. That's an important distinction from your typical distributed-architecture app.

[–] [email protected] 1 points 1 year ago (2 children)

Can you explain what's the difference?

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

A distributed architecture generally refers to a single application or service designed to be resilient to individual data center failures. For example, Reddit, a centralized application controlled by Reddit itself, operates data centers around the world to process user transactions. In the event of an outage in a specific location, such as California, Reddit would still be able to function because its infrastructure for handling user requests and serving data would automatically switch to other functioning data centers elsewhere, like Nevada, Arizona, or Washington. This is an example of a distributed architecture.

On the other hand, a decentralized federation does not consist of a single application. Instead, it involves a software platform like Lemmy, which is hosted on multiple individual hosts. When a user signs up with one host, they can interact with users from other hosts, but each host manages its own infrastructure. For instance, someone could host a Lemmy instance on an old laptop they found in their closet and name it ballsuckers.com, while another person could host a Lemmy instance in the cloud with a properly designed distributed architecture and name it bingbong.com. Each host is responsible for managing its own instance. Users from both instances can interact with each other, but if, for example, the hard drive of ballsuckers.com were to fail, the entire ballsuckers.com instance would go down. However, this would not affect bingbong.com because its infrastructure is separate and managed independently.

I hope this helps!

[–] [email protected] 2 points 1 year ago (1 children)

What makes a distributed system good that Lemmy hasn't done? Seems like a pretty robust system to me; the scaling issues seem to be on the instance hosts themselves. With Reddit's experience, I don't see how there are issues

[–] [email protected] 1 points 1 year ago

If there was an easy solution that balanced decent UX and performance, we'd have it by now!

[–] [email protected] 1 points 1 year ago (1 children)

"it could have been done much better."

Care to expand on this point?

[–] [email protected] 3 points 1 year ago (2 children)

Disclaimer: I've only looked a bit at the protocols and high levels descriptions of how it works, and this is just my understanding of it. But it seems to track.

Let's take [email protected] for example. Right now lemmy.world is the Source of Truth for it, which means that if you subscribe to it from a different host, let's say myawersomeinstance.com, that host first contacts lemmy.world, copies over posts, and then subscribes to new posts. I'm actually not 100% sure whether lemmy.world contacts myawersomeinstance.com when there's a new post, or myawersomeinstance.com polls lemmy.world. But anyway, the point is that lemmy.world is the authority on it. myawersomeinstance.com also has the [email protected] data, but only as a copy; lemmy.world is the only authority. So if you post something, your server sends it to lemmy.world and waits for a reply. Then lemmy.world contacts all instances that have at least one user following the community to tell them about the new post. And that new post now exists in a few hundred databases.

The problem is that the scaling is whack. Okay, you can have 5000 federated servers with users subscribing to [email protected], but that means lemmy.world needs to update 5000 servers per post, there'll be 5000x the storage used for that post, and ALL 5000 servers contact lemmy.world to get the new good stuff.
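
To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes the hub-and-spoke model described above, where the origin alone pushes every post to every subscribed instance; the function and field names are made up for illustration and are not Lemmy's API.

```python
from dataclasses import dataclass


@dataclass
class FanOutCost:
    deliveries_by_origin: int  # outbound requests the origin has to make
    replica_copies: int        # copies stored on subscribed instances


def hub_and_spoke_cost(posts: int, subscribed_instances: int) -> FanOutCost:
    """Cost when the origin alone pushes every post to every subscriber."""
    deliveries = posts * subscribed_instances
    replicas = posts * subscribed_instances
    return FanOutCost(deliveries, replicas)


# One post, 5000 subscribed instances (the figure used in the comment above).
cost = hub_and_spoke_cost(posts=1, subscribed_instances=5000)
print(cost)
```

Both numbers grow linearly with the number of subscribed instances, and all of the delivery work lands on a single server.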

Frankly, it's a scaling nightmare. As for a different approach, you could use private/public keys: lemmy.world signs its updates, and the other instances are allowed to fetch the new data from each other. That would also allow more relaxed caching, since it would generally be lower cost to re-fetch the data. Right now you need aggressive caching because you don't want lemmy.world to keel over and die from every server on the planet wanting to hear the latest and greatest posts all the time.
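
A minimal sketch of the signing idea, so it's clear why a copy fetched from a peer can still be trusted. Big assumption: this uses a toy, unpadded textbook-RSA key built from two known Mersenne primes, purely so it runs on the Python standard library. It is not secure, and a real system would use a proper signature scheme from a vetted crypto library. The payload and names are hypothetical.

```python
import hashlib

# Toy textbook-RSA key from two known Mersenne primes. Far too weak and
# unpadded for real use; it only keeps the example free of third-party packages.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the origin


def digest(payload: bytes) -> int:
    """Hash the payload and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def sign(payload: bytes) -> int:
    """Origin side: sign the payload digest with the private exponent."""
    return pow(digest(payload), D, N)


def verify(payload: bytes, signature: int) -> bool:
    """Any instance: check a payload against the origin's public key (N, E)."""
    return pow(signature, E, N) == digest(payload)


# The origin signs once; the signed post can then travel through any peer.
post = b'{"community": "example", "title": "hello"}'
signed = (post, sign(post))

relayed_post, relayed_sig = signed                   # fetched from a peer
tampered = relayed_post.replace(b"hello", b"spam")   # altered along the way
print(verify(relayed_post, relayed_sig))
print(verify(tampered, relayed_sig))
```

The point is only that verification needs the origin's public key, not a connection to the origin, so who handed you the copy stops mattering.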

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

Thanks for the in-depth write-up! I haven't looked too far into the docs or the subscription model, but is this a fault on Lemmy's end, or is this a function of how ActivityPub handles federated communication? (I'm very new to ActivityPub/federation, just now reading through the ActivityPub docs)

I do like your idea of distributed replication via keys, much better than what I had brainstormed

Edit: yeah, it does look like it's a function of ActivityPub. I wonder if there's a more scalable federation protocol out there

[–] [email protected] 1 points 1 year ago (1 children)

Could lemmy.world put a load balancer in front and use that to direct requests to different instances of lemmy.world? Not sure if that question is dumb I'm not a technical guy.

[–] [email protected] 1 points 1 year ago

It's not dumb at all, and it's a common scaling technique. But the software needs to support it, and I have no idea if lemmy has support for running multiple instances for one server.

[–] [email protected] 4 points 1 year ago

Are all these thousands of lemmy servers useless?

Almost. It's actually worse than that: when you subscribe to a community from your server, it will fetch something like 20 posts and that's it. You'll only get new stuff after that, so there's no way to do a full mirror of selfhosted, for example, if you started your instance today and didn't fetch posts and comments manually.

ActivityPub per se is just a spec on s2s/s2c communication, which is not a great thing since in many cases it assumes single source of truth, which potentially puts huge load on more popular instances.

I think a quick and dirty hack for this could be the following: each linked instance maintains a cache of announces (there would be a benefit in just forwarding the original HTTP-signed requests, without having to worry about malicious actors), which your instance could pull. This way you could populate your mirror without overloading the original source.
Distributed activity propagation, though... Let's say there are some design steps involved to make this truly distributed, but I feel like it's possible.
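
A rough sketch of that backfill hack, just to show where the load ends up. Everything here is hypothetical: the `Instance` class, the in-memory "cache of announces", and the request counter are stand-ins for illustration, not how Lemmy stores or serves activities.

```python
from collections import Counter
from typing import Dict, List

requests_served = Counter()  # how many cache fetches each instance answered


class Instance:
    def __init__(self, name: str):
        self.name = name
        self.announces: Dict[str, str] = {}  # activity id -> signed payload

    def receive(self, activity_id: str, payload: str) -> None:
        """Store an announce pushed by the origin (normal federation)."""
        self.announces[activity_id] = payload

    def serve_cache(self) -> Dict[str, str]:
        """Hand the cached announces to whoever asks."""
        requests_served[self.name] += 1
        return dict(self.announces)

    def backfill_from(self, peers: List["Instance"]) -> None:
        """Populate a fresh mirror from peers, keeping what we already hold."""
        for peer in peers:
            for activity_id, payload in peer.serve_cache().items():
                # A real implementation would verify the origin's signature here.
                self.announces.setdefault(activity_id, payload)


origin = Instance("origin")
old_peer = Instance("old-peer")
for i in range(100):
    origin.receive(f"post-{i}", f"payload {i}")
    old_peer.receive(f"post-{i}", f"payload {i}")

newcomer = Instance("newcomer")
newcomer.backfill_from([old_peer])
print(len(newcomer.announces), requests_served["origin"])
```

The newcomer ends up with the full history while the origin answers zero extra requests; the cost moves to whichever peer served its cache.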

[–] [email protected] 4 points 1 year ago (1 children)

What's the alternative? You go full-banana decentralised or mega-site Reddit. I think Lemmy is a nice middle ground

[–] [email protected] 9 points 1 year ago (1 children)

A proper data model would be a start, i.e. public-key based identities instead of just the old name@server. That way you could hop from server to server and still be the same account. It would make the whole thing a hell of a lot more robust, since in case of a server failure you could just continue on another server as if nothing happened.
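
A minimal sketch of what key-based identity could look like, assuming the account ID is simply a fingerprint of the user's public key. The `Server` class, the domains, and the placeholder key bytes are all made up for illustration; proving ownership of the key (e.g. by signing a challenge) is left out.

```python
import hashlib


def account_id(public_key: bytes) -> str:
    """Derive a server-independent account ID from the public key."""
    return hashlib.sha256(public_key).hexdigest()[:16]


class Server:
    def __init__(self, domain: str):
        self.domain = domain
        self.accounts = {}  # account id -> display name

    def register(self, public_key: bytes, display_name: str) -> str:
        """Register an account keyed by its fingerprint, not by name@server."""
        key_id = account_id(public_key)
        self.accounts[key_id] = display_name
        return key_id


# Placeholder bytes standing in for a real public key.
alice_key = b"alice-public-key-placeholder"

home = Server("first.example")
backup = Server("second.example")

id_at_home = home.register(alice_key, "alice")
id_at_backup = backup.register(alice_key, "alice")
print(id_at_home == id_at_backup)
```

Because the ID depends only on the key, the same account exists on both servers, and losing `first.example` doesn't change who "alice" is.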

[–] [email protected] 2 points 1 year ago

That's a pretty cool idea! Keybase and SIWE were getting there, but haven't really taken off in a big way yet

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

A network of (“thousands of”) servers has — like most things — pros and cons.

Some of the pros are:

  • The network is more resilient against outages. If lemmy.ml is down, all other users can still access the network.
  • It's hard to take legal action against the network or to buy it out (like Big Players™ like to do to get rid of potential competitors).
  • It allows various similar or even conflicting moderation policies. The network, i.e. the infrastructure doesn't allow or prohibit any specific opinion (the communities do).
  • It allows for different ways to pay the bills: goodwill of the admin, donations, ads, fees or selfhosting. The latter also allows great control over the data so you control your privacy.

Some of the cons are:

  • Content is replicated across servers, which increases the total amount of data stored.
  • Latency and speed suffer.
  • Interoperability with the wider Fediverse is less than 100%, which can create confusion and frustration.
  • Discovery is more difficult.
[–] [email protected] 3 points 1 year ago

I'm quite worried about how well this federation system will work in the long run, especially as more people come over from Rexxit. As people make more posts/comments, every federated instance will have to cache more redundant content from the others, which will also use more storage and thus increase the costs of every instance host. There's also another problem: visibility in search engines. Because Lemmy/Kbin can be hosted by anyone, searching on a specific domain is impossible, unlike how I can just add "reddit" to the search query. Also, since there are multiple Lemmy/Kbin instances, there's a chance that similar communities will be spread across them, fragmenting the communities even further. Until they can find a way to fix those problems, I don't think federation is suited for large-scale communities.

As for the fragmentation problem, maybe adding a global search for communities like this will help reduce fragmentation. Users can still make their own community on their instance, while other people who don't need to can easily find the community they want.

[–] [email protected] 3 points 1 year ago

They are not useless, if the users would actually spread out among them. Each server has its limits.

[–] [email protected] 3 points 1 year ago (1 children)

I just commented on this in another thread: https://lemmy.world/comment/76011

TL;DR: The server-to-client interactions on Lemmy are a lot heavier than the server-to-server interactions, so even if you're just using your own server to interact with communities on other servers, it should still take load off of the servers you would have been using directly.

[–] [email protected] 3 points 1 year ago

That's news to me. I thought server-to-server interactions would be heavier, since other instances will keep fetching content from your instance once they start federating. I guess it's better to join less populated instances instead of crowding onto a single instance.

[–] [email protected] 2 points 1 year ago

Based on the bit of research I have done, along with creating https://lemmyonline.com/

It seems you are correct. A small handful of servers contains roughly 95% of the user-base.

I think the intended way for this to work is that certain communities can be hosted on their own servers. However, it appears that most of the popular communities migrating away from reddit all flocked to lemmy.world, which is likely contributing to it being overloaded.

[–] [email protected] 2 points 1 year ago (2 children)

This has definitely been a problem with communities being created on the bigger instances and not utilising smaller instances. Happy for someone to say I'm wrong etc, but I think there would be merit in capping instances to x number of users or communities, to force the user base to spread out.

Also, the way signups work, (ie you find a community you like then click sign up but that signs you up to that instance), further exacerbates the issue and the confusion around how federation works. The sign up links on each instance should lead either to a page with an instance finder, or to a random instance that matches the profile of, and is already federated with, the instance you were on. Otherwise the larger instances have a monopoly and are just going to lead to a bad user experience when they can't cope with the traffic.
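
A minimal sketch of what such an instance finder could do, under loud assumptions: the per-instance user cap, the `InstanceInfo` fields, and the example domains are all hypothetical, and "matches the profile" is reduced to "is federated with the instance you were on".

```python
import random
from dataclasses import dataclass
from typing import List, Set


@dataclass
class InstanceInfo:
    domain: str
    users: int
    user_cap: int             # hypothetical per-instance cap
    federated_with: Set[str]  # domains this instance federates with


def signup_candidates(current: str, known: List[InstanceInfo]) -> List[InstanceInfo]:
    """Instances federated with the current one that still have room."""
    return [
        inst for inst in known
        if current in inst.federated_with and inst.users < inst.user_cap
    ]


def pick_signup_instance(current: str, known: List[InstanceInfo],
                         rng: random.Random) -> InstanceInfo:
    """Pick a random eligible instance, weighted toward the emptiest ones."""
    candidates = signup_candidates(current, known)
    if not candidates:
        raise LookupError("no federated instance has free capacity")
    weights = [inst.user_cap - inst.users for inst in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


known = [
    InstanceInfo("big.example", 10_000, 10_000, {"big.example"}),
    InstanceInfo("small-a.example", 120, 1_000, {"big.example"}),
    InstanceInfo("small-b.example", 40, 1_000, {"big.example"}),
    InstanceInfo("island.example", 5, 1_000, set()),
]
choice = pick_signup_instance("big.example", known, random.Random(0))
print(choice.domain)
```

The full instance and the unfederated one are never offered, so a sign-up click on the big instance always lands somewhere that can still see its communities.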

It's a self defeating prophecy if users only want to sign up to the instances with the big communities, because then everyone is going to keep creating communities there and nobody is going to want to join a smaller instance.

I might be talking nonsense and am happy to be told why that is all wrong :)

[–] [email protected] 1 points 1 year ago

If that cap idea were to exist, it would make sense to base it on the balance of users across the federated servers, so if there are enough servers with a similar number of users, the cap gets raised

[–] [email protected] 1 points 1 year ago

Yes, there should be instance caps, and they should be visible to users.

That way users can scale, choose, without much thinking.

This same technique works everywhere, for example in MMO games. You have availability visible and choose servers according to it.

This would fix scaling partially without much technical changes.

[–] [email protected] 1 points 1 year ago (1 children)

I just spun up my own instance as well and it does feel a bit like I'm just pulling from the biggest instances and feeding my own without really being able to give much back.

[–] [email protected] 2 points 1 year ago

You're reducing load on the bigger instances by not using them directly, which is giving something back

[–] [email protected] 1 points 1 year ago

I don't think that there are thousands. The fediverse stats show about 300 servers, 200 or so made in the last week.

At that rate, it is not too bad. I expect there will be a plateau at some point, relatively soon, where the need for new ones stops and the experimental ones disappear.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

I've suggested a routing protocol to the lemmy devs - to use federated instances to route all the messages to other federated instances. The idea was received with some interest, but it seems that people believe that there's still a ton of performance that can be squeezed out from the current architecture through optimisations.
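
To illustrate the general idea (this is not the actual proposal, just a sketch under my own assumptions): arrange the federated instances in a relay tree where every instance forwards each message to at most `fan_out` others. The function name and the breadth-first layout are made up for illustration.

```python
from collections import deque
from typing import Dict, List


def relay_plan(instances: List[str], fan_out: int) -> Dict[str, List[str]]:
    """Assign each instance up to `fan_out` others to forward to.

    instances[0] is the origin; the rest are arranged breadth-first into a tree.
    """
    if fan_out < 1:
        raise ValueError("fan_out must be at least 1")
    plan: Dict[str, List[str]] = {name: [] for name in instances}
    waiting = deque(instances[1:])    # instances that still need the message
    senders = deque([instances[0]])   # instances that already have it
    while waiting:
        sender = senders.popleft()
        for _ in range(min(fan_out, len(waiting))):
            receiver = waiting.popleft()
            plan[sender].append(receiver)
            senders.append(receiver)
    return plan


instances = ["origin"] + [f"peer-{i}" for i in range(5000)]
plan = relay_plan(instances, fan_out=10)

busiest = max(len(targets) for targets in plan.values())
delivered = sum(len(targets) for targets in plan.values())
print(busiest, delivered)
```

With 5000 subscribers and a fan-out of 10, every instance still receives the message exactly once (5000 deliveries in total), but no single server, the origin included, sends more than 10. The trade-off is extra hops and the need to verify relayed messages.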
