[–] [email protected] 1 points 1 year ago (1 children)
[–] [email protected] 2 points 1 year ago

Heya,

I still need to create some tools that make it easy to add new subreddits to the bot. I'll probably get around to that this weekend, and then I'll add /r/theyknew and notify you. As far as I'm concerned, it's a great contender for synchronisation/archiving.

[–] [email protected] 3 points 1 year ago

Can't blame you for that. Personally, I still think it excels at content where communication with OP is irrelevant, like [email protected], [email protected] or [email protected]. And by far the best example of this, if you look at the subscriber count, is NSFW content.

[–] [email protected] 2 points 1 year ago

Nope. That would be very hard to implement, and probably very confusing and disliked by other lemmy users.

[–] [email protected] 4 points 1 year ago

I don’t know how the karma thresholds work behind the scenes, but might I suggest that the bot do a “top for” sort instead? That is, it would only repost the top content from the past 6 hours. This would also help surface more quality content and avoid reposting low-effort, low-quality posts.

This is effectively already kinda how it works. For each subreddit, it periodically (anywhere from every 30 minutes to every 12 hours, based on subscriber count and posts per day) requests the "hot" content feed. It then checks whether each post has at least 20 upvotes and an 80% upvote-to-downvote ratio. Those numbers are configurable, but that's what they're currently set to. I believe they strike a good balance between filtering out the complete garbage and making sure the bot doesn't miss good content.
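A minimal sketch of that threshold check, assuming each post from the "hot" feed has already been reduced to an upvote count and an upvote ratio. The `Post` class and the function names are hypothetical and not taken from the bot's source; only the two threshold values come from the comment above.

```python
from dataclasses import dataclass

# Threshold values quoted in the comment above; described there as configurable.
MIN_UPVOTES = 20
MIN_UPVOTE_RATIO = 0.80


@dataclass
class Post:
    """Hypothetical, simplified view of one post from the "hot" feed."""
    title: str
    upvotes: int
    upvote_ratio: float  # fraction of all votes that are upvotes, 0.0 to 1.0


def passes_thresholds(post, min_upvotes=MIN_UPVOTES, min_ratio=MIN_UPVOTE_RATIO):
    """Return True when the post meets both the upvote and the ratio minimum."""
    return post.upvotes >= min_upvotes and post.upvote_ratio >= min_ratio


def filter_hot_feed(posts):
    """Keep only the posts that clear both thresholds, preserving feed order."""
    return [post for post in posts if passes_thresholds(post)]
```

Both conditions have to hold, so a heavily upvoted but controversial post is dropped just like a well-liked post that has not gathered enough votes yet.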

 

A few months ago, I launched the Lemmit instance and bot (@[email protected]). Primarily, this was to help me stay up to date with some of the content I'd leave behind on Reddit. Additionally, I wanted to give back to the community, so I made it possible for anyone to request the archiving of subreddits to the Lemmit instance.

However, this came with some unintended consequences. Notably, the most subscribed community on the instance has been [email protected], even though it should have been obvious that there is no way to communicate with the Original Poster, given they're on Reddit.

The pushback against the bot and the instance has increased over time. A recent post, This bot is bad for Lemmy, highlighted these concerns. I've also received similar feedback from admins of major Lemmy Instances and through direct PMs.

As a response, last week I stopped accepting requests for archiving new subreddits. This weekend, I went a step further by discontinuing the archiving of a large number of "interactive subreddits": communities primarily centered around Q&A or communication with the Original Poster. This includes subs like [email protected] and [email protected], as well as niche and support communities. Such discussions are better hosted on Reddit or Lemmy's equivalent spaces.

I've also adjusted the post karma thresholds to curb spam posts. While this probably won't appease everyone, it should reduce the bot's posting frequency.

Perhaps this might prompt some admins to rethink their choice to defederate from the Lemmit instance, or the banning of the bot. I'm not expecting anyone to, and won't take it personally if you don't, but I wanted to give the community this update nonetheless.

In [email protected] there's a sticky post listing all the actively archived communities on the server (including NSFW ones, since those are not public without logging in), as well as the list of communities for which archiving is now disabled.

Cheers!

[–] [email protected] 2 points 1 year ago (4 children)

What.

You want to mirror a Lemmy community onto Lemmit? :s

Also, see sidebar.

 

I think there's enough for now. If anything, there's going to be some heavy pruning of the number of subs that are being maintained.

[–] [email protected] 1 points 1 year ago
[–] [email protected] 3 points 1 year ago (1 children)

Oh hey, it's you!

Thanks for the .rocks universe :)

[–] [email protected] 2 points 1 year ago

Good news, everyone! The plexsubs subreddit was banned, and this caused the Lemmit bot to get stuck in an infinite loop.

Wait, that's not good news at all!

Well, at least the bug has now been fixed.

 

As discussed here, I have implemented a minimum level of upvotes that a post needs to have on reddit, as well as a minimum ratio of upvotes to downvotes.

Right now I have those configured to require at least 5 upvotes, and more upvotes than downvotes (0.51). At first glance this already seems to be a great improvement. There might be some tweaking later.

As a side note, I have now switched from using the Reddit RSS feed to using the JSON feed. This was required in order to get easy access to the upvote/ratio properties. So there might be some new and interesting bugs introduced because of that. It's a brave new world.
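To illustrate why the JSON feed makes those properties easy to reach, here is a small sketch that pulls the score and ratio out of a listing. The field names (`data`, `children`, `title`, `permalink`, `score`, `upvote_ratio`) reflect my understanding of the layout of Reddit's listing JSON and should be treated as assumptions; the sample payload is made up, and `extract_posts` is a hypothetical helper, not the bot's code.

```python
import json


def extract_posts(listing_json):
    """Parse a listing document and return the vote-related fields per post."""
    listing = json.loads(listing_json)
    posts = []
    for child in listing["data"]["children"]:
        data = child["data"]
        posts.append(
            {
                "title": data["title"],
                "permalink": data["permalink"],
                "score": data["score"],
                "upvote_ratio": data["upvote_ratio"],
            }
        )
    return posts


# Made-up sample payload in the assumed listing shape, for illustration only.
SAMPLE_LISTING = json.dumps(
    {
        "kind": "Listing",
        "data": {
            "children": [
                {
                    "kind": "t3",
                    "data": {
                        "title": "Example post",
                        "permalink": "/r/example/comments/abc123/example_post/",
                        "score": 42,
                        "upvote_ratio": 0.93,
                    },
                }
            ]
        },
    }
)

if __name__ == "__main__":
    for post in extract_posts(SAMPLE_LISTING):
        print(post["title"], post["score"], post["upvote_ratio"])
```

With an RSS feed the same numbers would have to be scraped from each post's page, which is the extra work the switch avoids.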

Needless to say, the first thing I'll do after releasing this, is plop down on the couch with a beer, and hope this doesn't crash. Fingers crossed!

[–] [email protected] 1 points 1 year ago

Because it already exists, you dolt.

[–] [email protected] 1 points 1 year ago

[email protected] is already a thing brah.

[–] [email protected] 6 points 1 year ago

Personally I'd be fine with allowing it in bios only. If people want to see more, they'll check out the bio, and see the link there. In other cases someone will just be like "... Nice." without feeling advertised to.

In the end, it's all about the rules the community itself puts up. Personally, I get more enjoyment out of fewer "real" (imperfect/amateur) out-of-love quality, than more perfect/fitgirl for-profit quantity. But I'm aware this is generally a minority opinion.

 

I'd like to hear some feedback on this, or approach vectors.

Right now the bot is rather spammy. I was hoping that by using Reddit's HOT feed, it would have some level of quality control (I know, right?). Unfortunately, it seems that in most cases, it will just return anything that's new. The downside of this is that a lot of garbage gets through, and the bot spends a lot of time scraping the underlying page to get the details.

I propose to only archive reddit posts that have a karma score of 5 or higher. In case of subs that hide the karma scores of posts for a certain time, they'd have to be at least 2 hours old, so that the Reddit moderators can weed out garbage on our behalf.
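The proposed rule could be sketched like this. The function and parameter names are hypothetical, and the sketch assumes the caller already knows whether the subreddit is hiding the score for a given post; only the "5 or higher" and "at least 2 hours old" values come from the proposal.

```python
from datetime import datetime, timedelta, timezone

MIN_KARMA = 5                           # from the proposal above
MIN_AGE_IF_HIDDEN = timedelta(hours=2)  # from the proposal above


def should_archive(score, score_hidden, created_at, now):
    """Decide whether a post qualifies for archiving under the proposed rule.

    score        -- the post's karma score (ignored while it is hidden)
    score_hidden -- True when the subreddit currently hides the score
    created_at   -- timezone-aware creation time of the post
    now          -- timezone-aware current time
    """
    if score_hidden:
        # No usable score yet: rely on age, so moderators have had time to act.
        return now - created_at >= MIN_AGE_IF_HIDDEN
    return score >= MIN_KARMA


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_archive(7, False, now - timedelta(minutes=10), now))
    print(should_archive(0, True, now - timedelta(minutes=10), now))
```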

Do you folks have any thoughts on this?

Secondly, I want to put sticky comments on each community, with links to native Lemmy communities that cover the same subject. For this I would need some kind of API, or a master list of... oh, I see sub.rehab has just the thing I need. So expect that somewhere this week :).

 

See you on the other side!


So the update is done, but the bot was offline for 6 hours, and needed to catch up.

Unfortunately, another update slipped through, which switched the default feed from www.reddit.com to old.reddit.com, which has the side effect of changing all the urls in the posts as well. On one hand this is great, because new reddit sucks. On the other hand, this is terrible, because for every post the bot encounters, it checks if it already exists on lemmit... based on the url.

So for every post the bot encountered, it went like "old.reddit.com/r/blabla/123? Haven't seen that one yet, there's a www.reddit.com/r/blabla/123, but that must be something completely different, let's post it again!"

This also meant that the bot took over a minute and a half to update each community, because it takes a couple of seconds per post. When I went to bed last night, I figured it was just posting a lot of content because it had so much catching up to do. But this morning I figured something was off because it still hadn't caught up.

Anyway, the fix is out now. Sorry for all the duplicates. I need coffee now.
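One way to make a URL-based duplicate check immune to this kind of host switch is to compare a normalised key instead of the raw URL. This is only a sketch of that idea and not necessarily how the actual fix works; `canonical_post_key` and the host list are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hosts treated as the same site for duplicate detection (assumed list).
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com"}


def canonical_post_key(url):
    """Return a key that is identical for www/old variants of one Reddit post.

    The URL must include a scheme (for example https://), otherwise urlsplit
    does not recognise the host part.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")
    if host in REDDIT_HOSTS:
        # Collapse all Reddit front ends onto one host and ignore path casing.
        return "reddit.com" + path.lower()
    return host + path


if __name__ == "__main__":
    seen = {canonical_post_key("https://www.reddit.com/r/blabla/123")}
    candidate = canonical_post_key("https://old.reddit.com/r/blabla/123")
    print("duplicate" if candidate in seen else "new post")
```

Storing and comparing such a key means the feed's hostname can change again without every existing post looking new.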

 

ChatGPT, write a post for the stuff that I have in my head and want to get out as an update.

Hmm. No brain implant yet. Guess I'll have to write this the hard way.

Syncing update

It has been an eventful week. I successfully deployed the initial version of smarter content syncing, and have made some adjustments to the algorithm since then. Most notably, communities with only 1 subscriber (the bot) will no longer receive updates, and communities with fewer than 5 subscribers or with a low posting frequency will only be updated twice a day. Furthermore, for the highest update priority (every 10 minutes), a community must have a minimum of 50 subscribers. Implementation details can be found in the decide_interval() method over here.
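The rules in that paragraph can be sketched roughly as follows. This is not the real decide_interval() method: the function name, the posting-frequency cut-offs and the middle tiers are all assumptions made up for illustration. Only three things come from the text: a community with just the bot subscribed gets no updates, fewer than 5 subscribers or a low posting frequency means twice a day, and the 10-minute tier needs at least 50 subscribers.

```python
# Hypothetical cut-offs: the post does not say what counts as "low" or "busy".
LOW_POSTS_PER_DAY = 1
BUSY_POSTS_PER_DAY = 10


def decide_interval_sketch(subscribers, posts_per_day):
    """Return the minutes between syncs, or None when syncing should stop."""
    if subscribers <= 1:
        return None  # only the bot is subscribed: no more updates
    if subscribers < 5 or posts_per_day < LOW_POSTS_PER_DAY:
        return 720   # twice a day
    if subscribers >= 50 and posts_per_day >= BUSY_POSTS_PER_DAY:
        return 10    # highest priority needs at least 50 subscribers
    # Assumed middle tiers: the post does not specify how these are chosen.
    if subscribers >= 20:
        return 30
    if subscribers >= 10:
        return 60
    return 120


if __name__ == "__main__":
    for subs, ppd in [(1, 40), (3, 40), (60, 40), (60, 0), (12, 3)]:
        print(subs, ppd, decide_interval_sketch(subs, ppd))
```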

Being a developer is fun

Meanwhile... Damnit, bot is stuck again.

2023-07-08 10:13:39,945 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:48 ago, interval 120 minutes
2023-07-08 10:13:40,653 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:45,324 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
2023-07-08 10:13:46,333 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:54 ago, interval 120 minutes
2023-07-08 10:13:48,581 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:51,227 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
...

1 bugfix and deployment later:

2023-07-08 10:46:42,836 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  3:03:51 ago, interval 120 minutes
2023-07-08 10:46:43,573 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:46:48,327 - utils.syncer - ERROR - Couldn't find post on https://old.reddit.com/r/BustyNaturals/comments/14told8/latina_bodies_are_the_best/, skipping.
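The first log shows the same post being retried forever; the second shows it being skipped. A minimal sketch of that "give up and move on" pattern is below. It is not the bot's actual fix: `sync_posts`, `PostDetailError`, `fetch_details` and the retry limit are hypothetical names and values chosen for illustration.

```python
import logging

logger = logging.getLogger("syncer_sketch")


class PostDetailError(Exception):
    """Raised by fetch_details when a post's detail page can't be retrieved."""


def sync_posts(post_urls, fetch_details, max_attempts=3):
    """Fetch details for each post, skipping posts that keep failing.

    fetch_details is any callable that takes a URL and returns the details,
    or raises PostDetailError. A failing post is retried at most max_attempts
    times and then skipped, so one bad post cannot stall the whole run.
    """
    synced = []
    for url in post_urls:
        for attempt in range(1, max_attempts + 1):
            try:
                synced.append(fetch_details(url))
                break
            except PostDetailError:
                logger.warning("Attempt %d failed for %s", attempt, url)
        else:
            # Runs only when every attempt failed (the loop never hit break).
            logger.error("Couldn't find post on %s, skipping.", url)
    return synced
```

The key property is that the outer loop always advances to the next post, whatever happens to the current one.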

Defederation

Meanwhile, the folks at https://lemmy.world reached out to me to tell me they're defederating Lemmit. They are not fond of the high volume of posts made by the bot, and the fact that there are now (quick check) 462 communities on this server all being moderated by a single person. They have already received a couple of complaints about spam, and it didn't help that some requests for NSFW subreddits were not marked as NSFW. Occasionally, those subreddits had explicit thumbnails that appeared in the 'All feed' without warning.

I had a good talk with the LemmyWorld admin, wherein they explained their point of view, and I explained mine. I understand their decision to disassociate from Lemmit, and appreciate their attempt to contact me. Other instances, like Beehaw and some smaller ones, have also reached the same decision.

This does mean that you will no longer be able to get new community updates on those servers. So make sure to check the blocked instances list on your home server if you were subscribed to Lemmit. At the same time, I have removed all the subscriptions of users from those servers, in order to not affect the sync priority mentioned above. This does mean that if LemmyWorld, Beehaw, etc. ever decide to connect to Lemmit again (however unlikely), you will need to un- and re-subscribe from there.

Meanwhile, I've added a feature in the bot that will remove request posts for NSFW subreddits, if the post itself is not marked for NSFW. This should prevent explicit thumbnails showing up where they are not wanted.

Server growth

Last night I got an alert from my server monitoring that the disk is 80% full. Unfortunately, the disk is only 60 GB, so that doesn't leave much room for expansion. On the bright side, a good chunk of that is from Lemmy's very verbose logging (like, 4 GB a day, which gets cleaned up daily), so it should last throughout the weekend if I tune that down. Furthermore, most of the storage growth is from pictrs, the image upload part of Lemmy, and that can utilize an S3 bucket, rather than using the VM's storage like it is now. Using an S3 bucket offers a cost-efficient solution for expanding storage. Initial estimates indicate a monthly cost of around $5 for 1000 GB of storage, which should be sufficient for a while *fingers crossed*.

In the early days of Lemmit (literally, as the server is less than a month old) image uploads were limited to a default setting, which was something around 40 megabytes. That did add up quickly (thanks to half-minute porn gifs), and so I had to limit the max filesize to 1 MB, and later 0.5 MB. Once the server has switched to S3 storage, I can probably up that limit a little, although not too much.

Finally, Lemmy v0.18.1 has been released, and it contains even more performance boosts compared to v0.18.0, so if there's time left this weekend (and I can verify the Lemmit Bot is compatible), I will probably perform the upgrade.

 

You know, on account of me upping that one setting in the admin which I should have thought of long ago.

 

Okay, this one took me a bit longer than I planned (mostly due to sql fun and trying to use integers as minutes, WEEEE!).

Backdrop: Last week I disabled the mirroring of a couple of subreddits to the database, because they were initially requested but nobody subscribed to them. At the same time, the bot was just crawling in a loop, starting at todayilearned, ending at latestsubreddit. As more subreddits were requested, this loop took longer and longer (21 minutes before I rolled out this update). This wasn't sustainable.

So here's the new situation. The more popular a community is, the more often it will be updated. In this case popular means a mixture between number of subscribers and the amount of posts it receives per day (Link to relevant snippet of source code).

In short, the most popular subs will be synced every 10 minutes, the next tiers every 30, 60 or 120 minutes, and communities with either no posts per day or no subscribers (other than the bot) will only be synced every 12 hours. I hope this will hit a good distribution of updates vs popularity, but it will most likely be refined at some point in the future.
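Replacing the fixed crawl loop with per-community intervals means each pass only has to visit the communities that are due. A minimal sketch of that selection is below, assuming every community records when it was last scraped and its interval in minutes. The tuple layout and function names are hypothetical, and ordering by "most overdue first" is my own assumption rather than something the post states.

```python
from datetime import datetime, timedelta, timezone


def is_due(last_scraped, interval_minutes, now):
    """A community is due once its interval has elapsed since the last scrape."""
    return now - last_scraped >= timedelta(minutes=interval_minutes)


def due_communities(communities, now):
    """Return the names of due communities, most overdue first.

    communities -- iterable of (name, last_scraped, interval_minutes) tuples
    """
    overdue = []
    for name, last_scraped, interval_minutes in communities:
        if is_due(last_scraped, interval_minutes, now):
            lateness = now - last_scraped - timedelta(minutes=interval_minutes)
            overdue.append((lateness, name))
    overdue.sort(reverse=True)
    return [name for _, name in overdue]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        ("popular", now - timedelta(minutes=12), 10),
        ("quiet", now - timedelta(hours=3), 720),
    ]
    print(due_communities(sample, now))
```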

Speaking of distribution, we now have over 300 communities on this server 🥳, and their update intervals are spread out as such:

  • Every 10 minutes: 22
  • Every 30 minutes: 39
  • Every 60 minutes: 55
  • Every 120 minutes: 143
  • Every 720 minutes: 44

With this update running live (I started typing after I deployed it, and it has now gotten through the backlog of 'abandoned' subs), I'm going to step back from feature development for a few days. Any bugs that cause the bot to crash will of course continue to be addressed.

Have a blast!

 

Before, the VM was running on the cheapest model (1 core / 1 GB mem / 30 GB storage) at $12/month. The machine was running pretty low on memory, causing it to start swapping, which in turn caused the CPU to get too busy, and everything to slow down.

Now it has a whopping 2GB of memory, and things seem to have calmed down - cpu is back to around 10-15% usage, and swap is down to 0. Happy times all around.

Because of the amount of subs being archived, it now takes about 15 minutes between updates for each sub (was 18 before I updated the VM).

I'm planning to build some kind of scoring system, based on the number of posts per subreddit (per day?) and the number of subscribers on the Lemmy community. That way, communities with few subscribers or that don't see many posts per day will only be updated once per hour.

At the same time, I feel that subs like AskReddit, OutOfTheLoop and other "question-based" subreddits shouldn't be archived by Lemmit. In my opinion, those kinds of posts are useless without the answers, but please let me know if you disagree.

 
  • Fixed a bug where posts would not be submitted because the title didn't contain long enough words.
  • Fixed a bug where posts would not be submitted because the url was too long.
  • Fixed a bug where posts would not be submitted when it was linking to a /user subreddit.
  • Fixed a bug where the bot would think Every Post Everywhere was a subreddit request, and would reply to it.
  • Fixed a bug where the bot would crash without recovering whenever something went wrong during new subreddit requests.

A fruitful day all in all, I'd say.
