submitted 3 months ago* (last edited 3 months ago) by supakaity@lemmy.blahaj.zone to c/main@lemmy.blahaj.zone

Hey all! You asked for it, so here it is. I'ma drop a diagram here and will refer to it throughout the rest of this post:

Our infrastructure

In simple terms, the way it all works is we have our protected servers in the backend.

PERSISTENCE

We run our postgres database defensively on 3 dedicated (bare-metal) servers; the main servers each have 2× 1.92TB enterprise-grade NVMe drives mirrored (RAID-1).

We use Patroni + Percona postgres to manage the cluster. Each instance has a software watchdog that trips on failure.

Scripts automate promotion to make sure that, under normal circumstances (a healthy cluster), the main server, our big EPYC 7401P with 128GiB DDR4 (ECC) and an i350 NIC, is the primary.

We have a second, less powerful server, an i9-12900K with 128GiB DDR4 (non-ECC) and an i219-V NIC, that acts as a read-only secondary and gets promoted to primary on failover.

Then there's the little server that replicates for backups and potential dual-failure disaster scenarios: a meager Ryzen 7 3700X with 64GiB DDR4 (ECC) and an i210 NIC, with 2× 8TB enterprise-grade spinny disks striped (RAID-0), to keep the DB and backups safe.

It will become primary if the other 2 servers cannot.

None of our services connect to the database servers directly; instead, a load balancer in front targets the 3 servers' HAProxy instances, which expose one port directed at the current primary and another port for the current secondary.
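For anyone building something similar, the usual Patroni + HAProxy pairing leans on Patroni's REST API for health checks (port 8008, where /primary and /replica return 200 only on the matching role). A minimal sketch, with made-up addresses and ports, might look like:

```
listen postgres_primary
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server db1 10.0.0.1:5432 check port 8008
    server db2 10.0.0.2:5432 check port 8008
    server db3 10.0.0.3:5432 check port 8008

listen postgres_replica
    bind *:5001
    option httpchk GET /replica
    http-check expect status 200
    server db1 10.0.0.1:5432 check port 8008
    server db2 10.0.0.2:5432 check port 8008
    server db3 10.0.0.3:5432 check port 8008
```

The nice property is that promotion needs no HAProxy reconfiguration: whichever node Patroni promotes simply starts answering 200 on /primary.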

This database infrastructure is the biggest, costliest and most engineered part of our setup (which is very frugal for what it is). It's taken me through a near-catastrophic situation (a double server failure). You can ask Ada... I was freaking the fuck out.

Additionally, we have a single Redis instance which runs on a quad-core ARM (Ampere) box with 8GiB RAM and an 80GB SSD.

  • Primary server: €83.70/m
  • Secondary server: €61.70/m
  • Backup server: €40.70/m
  • Load balancer: €5.39/m
  • Redis: €6.49/m
  • Total: €197.98/m (AUD ~$350) for the persistence layer.

APPLICATION

We then have a prometheus/grafana monitoring box at another €6.49/m.

Then we have the services, which run just like everyone else's on various pieces of hardware, mostly in docker containers, though some run standalone on their own hosts. Sharkey costs €24.49/m, Lemmy €12.49/m, Synapse €12.49/m, and everything else smaller (pyfedi, pixelfed, friendica, the photon frontend, various static frontends, Ada's latest project of the week) runs on a single dedicated docker host costing €38.70/m.

That brings us to another €101.15/m for the application layer.

EDGE

Then under that we now have the edge nodes, which are completely standalone and don't sit in internal VPCs etc; it's TLS in, TLS out for these machines.

Each node is generally around 2× vCPU 1GB RAM 25GB SSD 2TB bandwidth for around €6/m. We have 4, so that's around €24/m all up.

End user flow

So when you visit the site, you will typically first hit our DNS server. Our DNS server uses the requesting IP, or the EDNS Client Subnet if provided, to work out which healthy node is closest to your region, and gives you back that node's IP address.

Side note: It's important to realise that up until this point, what domain you requested is known to yourself and the DNS provider you used. If you care about privacy, you should install a Pi-hole (you can install it in a docker container, you don't need a Raspberry Pi), and set its upstream DNS to Quad9 with ECS and DNSSEC enabled (or, if you're really adventurous, install your own unbound server and bypass intermediaries completely).
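For the adventurous option, a minimal unbound setup that recurses directly from the roots (no forwarders at all) really is only a few lines; this is a sketch to show the shape, not a hardened production config:

```
server:
    interface: 127.0.0.1
    do-ip4: yes
    do-ip6: yes
    qname-minimisation: yes
    harden-dnssec-stripped: yes
```

With no forward-zone defined, unbound resolves every query itself, so no single upstream provider sees your full browsing history.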

We run a dual-stack IPv4/IPv6 network all the way through so if you're on IPv6, you'll get our v6 IPs, and hopefully enjoy a less NATted, wider MTU'd, packet un-fragmented journey through the internet.

Once you have the IP address from the DNS server, you will connect to the caching caddy server on the closest healthy node to your geographical location and it will terminate your TLS session, decrypting your request.

At this point we'll see if we have the asset you requested cached locally; if so, we can send it straight back to you, super quickly!

If not, it will connect to the upstream server in our core location over a (presumably) much larger, lower-latency and more resilient trunk pipe than most consumer-grade bandwidth will provide. (Unless you run your own redundant-path dark fibre into your house, which I'm not discounting entirely... I know people.)

The returned response will then be evaluated for cacheability and sent back to you.

The build

We wanted to make sure that if one of these endpoints degrades, gets attacked, is shut down or dies, we can spin up a new one really quickly. So the deployment and configuration are completely managed by an Ansible configuration. Deploying a single node, or even completely replacing all nodes, takes about 10 minutes.
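The overall shape of such a playbook is roughly as follows; the role and variable names here are illustrative inventions, not the actual Blåhaj configuration:

```yaml
# Hypothetical edge-node deployment play. Roles and vars are
# made up to show the structure, not the real setup.
- hosts: edge_nodes
  become: true
  roles:
    - base_hardening   # users, ssh, firewall
    - gdnsd            # geo-DNS daemon, templated zone + health checks
    - caddy            # custom xcaddy binary + templated Caddyfile
  vars:
    upstream_origin: core.example.internal
```

Because every node is built from the same play, replacing a dead or attacked node is just "provision VM, add to inventory, run playbook".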

Our nodes run 2 pieces of third party software, and a few scripts to manage things.

DNS

For the DNS resolution we run gdnsd with 2 plugins, http_status and geoip.

The http_status plugin monitors the health of the other nodes to make sure it's not sending people to nodes that don't respond.

The geoip plugin uses the requesting IP to determine which region/country you're from, and selects a priority list of the nodes closest to that region. The first healthy node in that list is the one selected.
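The selection logic those two plugins combine into can be sketched roughly like this (the region lists and node names here are invented for illustration):

```python
# Rough sketch of geo-DNS node selection: map the client's region to
# a priority-ordered node list, then return the first node that is
# currently passing health checks. Names/regions are hypothetical.

REGION_PRIORITY = {
    "oceania":  ["sydney", "toronto", "amsterdam", "frankfurt"],
    "europe":   ["amsterdam", "frankfurt", "toronto", "sydney"],
    "americas": ["toronto", "amsterdam", "frankfurt", "sydney"],
}

def pick_node(region, healthy):
    """Return the first healthy node for a region, or None if all are down."""
    # Unknown regions fall back to the European list here (arbitrary choice).
    for node in REGION_PRIORITY.get(region, REGION_PRIORITY["europe"]):
        if node in healthy:
            return node
    return None
```

So a client in Australia gets Sydney while it's healthy, fails over to the next-closest node if it isn't, and the answer changes automatically as health checks flap.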

CACHING REVERSE-PROXY

For the web-serving component, we run a custom xcaddy-compiled Caddy server with a few modules included: cache-handler, otter storage, Coraza WAF, ratelimit, layer4, maxmind and crowdsec.

At the moment only the first 3 are in use, but the other 4 are included in case we need to mitigate attacks or other edge cases in the future.
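For the three in use, a node's Caddyfile might look roughly like this. The directive names come from the cache-handler and coraza-caddy module docs, but the hostnames, TTL and rule set are invented placeholders:

```
{
    order coraza_waf first
    order cache before rewrite
    cache {
        ttl 120s
    }
}

example.blahaj.zone {
    coraza_waf {
        directives `
            Include @coraza.conf-recommended
            SecRuleEngine On
        `
    }
    cache
    reverse_proxy https://origin.example.internal
}
```

The WAF runs first so hostile requests never touch the cache, and anything cacheable is served from the otter store without a round trip to the origin.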

And that's pretty much it!

If anyone wants any help with setting up their own version of this, or needs more details, let me know. I'd be happy to help.

If a lot of people are interested (which I doubt at this stage, but who knows?) I'd even be willing to create a project or make it dockerisable etc., but I suspect it's something most people would just use Cloudflare et al. for, if the privacy aspect weren't such a concern.

submitted 3 months ago* (last edited 3 months ago) by supakaity@lemmy.blahaj.zone to c/main@lemmy.blahaj.zone

I've recently found myself without much to do (short version: the company that my company was contracting to went into voluntary administration just before Christmas, while Ada and I were away in Melbourne), so I've had a little bit of time on my hands to do some work on the site infrastructure, free from meetings and corporate wankery. YAY!

One of the things I've wanted to do for a while now is set up some form of edge-node caching and geo-DNS to bring the various sites we host closer to you folks who use our instances.

And yes, there's Cloudflare... and Akamai... and Bunny.net... However, as a safe haven for vulnerable minorities, and with the geo-political situation the way it is these days, we really need to be super careful about where we terminate your connections. Who are the intermediaries who can see and collect your data? Who can switch our servers off at a moment's notice, suspend the domain names, shut us down?

Until recently we've known that we are slow on the edge, but we controlled all our own hardware, and we've not had the capacity to do much about it.

So over the last few days, I've taken the time to set up a bunch of edge nodes, migrate DNS away from third-party providers, and move domain name registrars.

The end result is that (with a few minor site interruptions) we now have our own CDN that we control all the way from DNS resolution until you hit the database on our dedicated servers. Your traffic is encrypted all the way through, our core infrastructure isn't exposed to people who sniff around to see whom they can report us to and get us shut down, nobody else can see your browsing in transit, and for people not in or around Finland, it's noticeably faster to load the site and click around.

To make sure you're all fully informed, I'll carefully disclose our decisions and new structure.

Firstly, our edge servers are on Vultr and DigitalOcean. From our research, these 2 providers seem quite neutral and not politically aligned; neither one by itself can take us entirely down, and neither of them is where our core infrastructure is located.

Secondly our edge locations have been carefully chosen to be regions that are outside jurisdictions where we can currently see political turmoil, overly zealous conservatism and fascist activity. We've chosen Toronto Canada, Sydney Australia, Amsterdam Netherlands and Frankfurt Germany as our edge node and DNS server locations.

Thirdly, we've moved our domains to the EuroDNS registrar to minimize the chance that the USA pressures companies to take action against our domains. EuroDNS is a large company headquartered in Luxembourg with no ties to the US, whether directly, through its parent company or through any sibling companies, which gives us comfort that it can resist any political pressure that may be applied.

If there's any interest in how we set up the infrastructure, let me know and I can make a separate technical post about it.

EDIT - here it is: https://lemmyverse.link/lemmy.blahaj.zone/post/36690717

[-] supakaity@lemmy.blahaj.zone 28 points 2 years ago* (last edited 2 years ago)

The pict-rs upgrade is ongoing.

From what I can tell it'll be about another 5 hours. I'm going to have to go to bed and check on it in the morning.

Unfortunately the stock-standard lemmy-ui doesn't cope with pict-rs migrating to a new database version and not serving images, so it's stubbornly just not working at all.

Lemmy updated to v0.19.5 (lemmy.blahaj.zone)

Hey all!

Our lemmy.blahaj.zone has been updated to v0.19.5.

Let us know if you notice any issues with the upgrade!

[-] supakaity@lemmy.blahaj.zone 24 points 2 years ago

Our best haj, Shonky (they/them) is available now, over at their own Github repository for use when referring to Blåhaj Lemmy.

I'm guessing we'll need them to make an appearance for the Canvas template.

Test upload image (lemmy.blahaj.zone)

Just testing image upload.

Alternative frontends (lemmy.blahaj.zone)

Hi all our lovely users,

Just a quick post to let you all know that alongside the upgrade to 0.19.3, we've also added a couple of alternate UIs to the Blåhaj lemmy for you.

Obviously the default lemmy-UI at https://lemmy.blahaj.zone still exists and has been updated to 0.19.3 alongside the lemmy server update.

There's also now an Alexandrite UI at https://alx.lemmy.blahaj.zone which is a more modern, smoother UI, written in svelte, by sheodox.

And then for those who are nostalgic for reddit days of yore, and memories of when PHP websites last ruled the earth, there's MLMYM (courtesy of rystaf) at https://mlmym.lemmy.blahaj.zone.

Please enjoy, and I hope the upgrades work well for you.

The butterfly, very pretty! (lemmy.blahaj.zone)

This is a test

Testing image upload (lemmy.blahaj.zone)

Test

[-] supakaity@lemmy.blahaj.zone 30 points 2 years ago

Migration has been completed!

submitted 2 years ago* (last edited 2 years ago) by supakaity@lemmy.blahaj.zone to c/main@lemmy.blahaj.zone

We're currently in the process of migrating our pict-rs service (the thing responsible for storing media/images/uploads etc) to the new infrastructure.

This involves an additional step of moving our existing file-based storage to object storage, so this process will take a little time.

New images/uploads may not work properly during this migration, however existing images should continue to load. We expect this migration to take about an hour.

[EDIT]

Migration has completed.

685,271 files / 153.38 GB were migrated. Copying to object storage took about 1.5 hours. Starting service back up on new server and debugging took another 30 minutes.

Timeline:

  • Migration started at 2023-10-01 22:43 UTC.
  • [+1h32m] Objects finished uploading to object storage at 2023-10-02T00:15 UTC.
  • [+2h03m] Migration was completed at 2023-10-02 00:46 UTC.
[-] supakaity@lemmy.blahaj.zone 22 points 2 years ago

You are super welcome, lovely.

Running these instances brings Ada and me a whole heap of pleasure, and it's largely knowing that we're making a difference to our users, and providing a safe space for you all to grow and flourish, that makes it all worth it for us.


submitted 2 years ago* (last edited 2 years ago) by supakaity@lemmy.blahaj.zone to c/main@lemmy.blahaj.zone

Blåhaj Lemmy will be down for database migration to the new servers in approximately 1.5 hours from now (06:00 UTC).

Downtime is estimated at under an hour.

I will have more details on the maintenance page during the migration and update the status as the migration progresses.

[-] supakaity@lemmy.blahaj.zone 70 points 2 years ago

I have been watching my love tie herself in knots over the last several days, having to deal with the drama that has been brought on, trying her best to bring everyone back together.

There's been bad behaviour from both sides, and I'm really disappointed to see that some of the worst of it came from our users, who didn't keep to the moral high ground, disregarded our instance rules and stooped to levels of behaviour worse than those leveled against them.

There have been accusations against us (or Ada specifically) that we are a safe harbour for bad behaviour and cause harm to trans people through our inaction.

This is perhaps the cruelest accusation they could have leveled at Ada, as she works tirelessly to maintain a safe space for our community. While I was hoping, for all the effort she was investing in this issue, that she could make it work despite my own reservations, this last attack on her impeccable morality has made me very angry.

I'm sorry for those that wanted to remain federated, sorry that it came to this, but I am glad it's over now, purely for the mental health of my precious beloved.

[-] supakaity@lemmy.blahaj.zone 25 points 2 years ago

Okay, so that was way more painful than expected... /sigh


The server will be briefly down while we install a new updated version of lemmy and restart it.

The maintenance window is 15 minutes, but should be much shorter.

State of the shork! (lemmy.blahaj.zone)

So it's been a few days, where are we now?

I also thought given the technical inclination of a lot of our users that you all might be somewhat interested in the what, how and why of our decisions here, so I've included a bit of the more techy side of things in my update.

Bandwidth

So one of the big issues we had was heavy bandwidth usage caused by a massive amount of downloaded content (not in terms of storage space, but in terms of multiple people downloading the same content).

In terms of bandwidth, we were seeing the top 10 single images resulting in around 600GB+ of downloads in a 24 hour period.

This has been resolved by setting up a frontline caching server at pictrs.blahaj.zone. It sits on a small, unlimited 400Mbps connection, running a tiny Caddy cache that reverse-proxies to the actual lemmy server and caches the images locally in a file store on its 10TB drive. The nginx in front of lemmy 301-redirects internet-facing static image requests to the new caching server.
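The redirect side of that needs only a couple of lines of nginx. This fragment is illustrative; the location prefix is an assumption based on lemmy's standard /pictrs paths:

```nginx
# Send public image traffic to the caching front-end instead of
# serving it (and paying egress) from the origin.
location /pictrs/image/ {
    return 301 https://pictrs.blahaj.zone$request_uri;
}
```

A 301 means clients and caches remember the new location, so repeat fetches never hit the origin at all.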

This one step alone is saving over $1,500/month.
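The saving is easy to sanity-check. Assuming AWS-style egress pricing of roughly $0.09/GB (an assumption on my part, not a quoted rate), the stated 600GB+/day from the top images alone works out to:

```python
# Back-of-envelope check of the monthly egress cost avoided by the
# caching server. The per-GB rate is an assumed AWS-style figure.
GB_PER_DAY = 600            # top-10 images alone, per the post
EGRESS_USD_PER_GB = 0.09    # assumed egress rate
DAYS_PER_MONTH = 30

monthly_cost = GB_PER_DAY * DAYS_PER_MONTH * EGRESS_USD_PER_GB
# roughly $1,620/month, consistent with "over $1,500/month" saved
```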

Alternate hosting

The second step is to get away from RDS and our current fixed-instance hosting to stand-alone, self-healing infrastructure. This is what I've been doing over the last few days: setting up the new servers and configuring the new cluster.

We could be doing this cheaper with a lower-cost hosting provider and a less resilient configuration, but I'm pretty risk-averse and I'm comfortable that this will be a safe configuration.

I wouldn't normally recommend this setup to anyone hosting a small or single-user instance, as it's a bit overkill for us at this stage, but in this case I have decided to spin up a full production-grade kubernetes cluster with stacked etcd inside a dedicated HA control plane.

We have rented two bigger dedicated servers (64GB RAM, 8 CPUs, 2TB RAID-1, 1Gbps bandwidth) to run our 2 databases (main/standby), redis, etc. Then the control plane is running on 3 smaller instances (2GB RAM, 2 CPUs each).
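A stacked-etcd HA control plane like this is typically bootstrapped with kubeadm; a minimal config sketch might look like the following, where the endpoint name is an invented placeholder for a load balancer in front of the three control-plane instances:

```yaml
# Hypothetical kubeadm config for a stacked-etcd HA control plane.
# "cp-lb.internal" stands in for whatever fronts the API servers.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "cp-lb.internal:6443"
```

After `kubeadm init --upload-certs` on the first instance, the other two join with `kubeadm join --control-plane`, each carrying its own etcd member.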

All up this new infrastructure will cost around $9.20/day ($275/m).

Current infrastructure

The current AWS infrastructure is still running at full spec and (minus the excess bandwidth charges) is still costing around $50/day ($1500/m).

Migration

Apart from setting up kubernetes, nothing has been migrated yet. This will be next.

The first step will be to get the databases off the AWS infrastructure, which will give the biggest bang for buck, as RDS is costing around $34/day ($1,000/m).

The second step will be the next biggest machine which is our Hajkey instance at Blåhaj zone, currently costing around $8/day ($240/m).

Then the pictrs installation, and lemmy itself.

And finally everything else will come off and we'll shut the AWS account down.

[-] supakaity@lemmy.blahaj.zone 38 points 2 years ago

So, one thing I'd mention is the systems and admin work involved in running an instance.

This is on top of the community moderation, and involves networking with other instance admins, maintaining good relations, deciding whom to defederate from, dealing with unhappy users, etc.

Then there's the setup and maintenance of the servers, security, hacks, DDoSing, backups, redundancy, monitoring, downtime, diagnosis, fixing performance issues, patching, coding, upgrades etc.

I wouldn't be here doing this without @ada. We make a formidable team, and without any false modesty, we are both at the top of our respective roles with decades of experience.

Big communities also magnify the amount of work involved. We're almost at the point where we are starting to consider getting additional people involved.

Moreover we're both here for the long haul, with the willingness and ability to personally cover the shortfall in hosting costs.

I'm not trying to convince you to stay here. But in addition to free hardware, you're going to need a small staff to do these things for you, so my advice is to work out whether you have reliable AND trustworthy people (because these people will have access to confidential user data) who are committed to doing this work long-term with you. Where will you be in 3 years, 5, 10?

[-] supakaity@lemmy.blahaj.zone 81 points 2 years ago

To be clear, $3k is an accurate, but unacceptable amount.

As in that's what it's actually costing us, but it's not what it should be costing. I'd imagine more like $250 is what we should be paying if I wasn't using AWS in the silly way I am.

I'm admitting up front that I've been more focused on developing rather than optimising operating costs because I could afford to be a little frivolous with the cost in exchange for not having to worry about doing server stuff.

Even when the Reddit thing happened I was wilfully ignoring it, trying to solve the scaling issues instead of focusing on the increased costs.

And so I didn't notice when Lemmy was pushing a terabyte of data out of the ELB a day. And that's what got me.

About half that $3k is just data transfer costs.

Anyhow the notice was just to let our users know what is going on and that there'll be some maintenance windows in their future so it doesn't surprise anyone.

We have a plan and it will all work out.

Don't panic or have any kneejerk reactions, it's just an FYI.

[-] supakaity@lemmy.blahaj.zone 49 points 2 years ago

Just want to say, I don't blame anyone else but myself.

I certainly don't blame anyone at 196.

I hope I'm really clear about that. It's one of the reasons I specifically didn't name 196 in my announcement.

We've got a solution planned, we've already started to implement it and have the image transfer issue solved already.

We can afford to cover this ridiculous AWS bill, I just need to do some maintenance work so this doesn't continue because I can't continue to line Jeff Bezos' pockets like this indefinitely.


Discussion of the current situation with the Blåhaj instances, and upcoming maintenance.

[-] supakaity@lemmy.blahaj.zone 28 points 2 years ago

Migration complete.

[-] supakaity@lemmy.blahaj.zone 28 points 2 years ago

It wasn't an actual emojo. The script processed the SQL header column names as an emojo and tried to add them. Unfortunately publicUrl is not a valid URL, so lemmy's /api/v3/site metadata endpoint started returning an error ("relative URL without a base") instead of the JSON the website was expecting, and the site just stopped working for everyone the next time it tried to load that URL.

[-] supakaity@lemmy.blahaj.zone 30 points 2 years ago* (last edited 2 years ago)

How do you know I haven't always been the hacker who's in control? :D
