this post was submitted on 30 Apr 2024
1437 points (98.9% liked)

Reddit

News and Discussions about Reddit


For the threads with the older one on the left: https://lemmy.world/post/14859950

(Thank you @[email protected] )

[–] [email protected] 8 points 6 months ago* (last edited 6 months ago) (1 children)
  1. To compare every comment on reddit to every other comment in reddit's entire history would require an index, and if you want to find similar comments instead of exact matches, it becomes a lot harder to do that efficiently. ElasticSearch might be able to do it, but then you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much when people are leaving new comments, and that would probably be expensive.
  2. Comparing combinations of comments is probably impossible. Reddit has a massive number of comments to begin with, and the number of possible subtrees of those comments would just be absurd. If you only care about comparing entire threads and not subtrees, then this doesn't apply, but I don't know how useful that will be.
  3. Programmers just do what they're told. If the managers don't care about something, the programmers won't work on it.
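For what it's worth, the "similar instead of exact" lookup in point 1 is commonly approached with some form of locality-sensitive hashing rather than pairwise comparison. Below is a minimal Python sketch of one such technique, MinHash with banding. It is not anything Reddit is known to use, and `shingles`, `minhash`, `LshIndex`, and the band/row sizes are all made up for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles in a comment (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(shingle_set, num_hashes=32):
    """For each salted hash function, keep the minimum hash over all shingles."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return signature


class LshIndex:
    """Toy in-memory index: comments sharing any full band become candidates."""

    def __init__(self, bands=16, rows=2):
        self.bands, self.rows = bands, rows
        self.buckets = defaultdict(set)
        self.signatures = {}

    def _band_keys(self, sig):
        for b in range(self.bands):
            yield (b, tuple(sig[b * self.rows:(b + 1) * self.rows]))

    def add(self, comment_id, text):
        sig = minhash(shingles(text), self.bands * self.rows)
        self.signatures[comment_id] = sig
        for key in self._band_keys(sig):
            self.buckets[key].add(comment_id)

    def candidates(self, text):
        """Return ids of stored comments that collide with `text` in any band."""
        sig = minhash(shingles(text), self.bands * self.rows)
        found = set()
        for key in self._band_keys(sig):
            found |= self.buckets.get(key, set())
        return found


index = LshIndex()
index.add("c1", "the quick brown fox jumps over the lazy dog")
index.add("c2", "completely unrelated words about database replication lag today")
matches = index.candidates("the quick brown fox jumps over the lazy dog")
```

A lookup only touches the buckets the new comment hashes into, so it avoids scanning the full history. The trade-off is that near-duplicates are found with some probability that depends on the band and row counts, and the signatures are extra data that has to be stored and kept up to date.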
[–] [email protected] 0 points 6 months ago* (last edited 6 months ago) (1 children)

> To compare every comment on reddit to every other comment in reddit's entire history would require an index

You think in Reddit's 20 year history no one has thought of indexing comments for data science workloads? A cursory glance at their engineering blog indicates they already perform much more computationally demanding tasks on comment data for content filtering.

> you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much

Analytics workflows are never run on the production database, always on read replicas, which are updated asynchronously from the transaction logs so as not to affect read/write performance on the production database.
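To make the replica point concrete, here is a toy Python sketch of a replica that is rebuilt by replaying the primary's append-only log at its own pace. `Primary`, `Replica`, and `catch_up` are hypothetical names, and a real database's replication involves far more than this. The sketch only illustrates that the replica falls behind by some number of log entries instead of adding work to each write.

```python
class Primary:
    """Stand-in for the production database: applies writes and logs them."""

    def __init__(self):
        self.rows = {}
        self.log = []  # append-only list of (comment_id, body) entries

    def write(self, comment_id, body):
        self.rows[comment_id] = body
        self.log.append((comment_id, body))


class Replica:
    """Stand-in for a read replica: replays the log from its last position."""

    def __init__(self):
        self.rows = {}
        self.applied = 0  # number of log entries replayed so far

    def catch_up(self, log, max_entries=None):
        """Replay up to max_entries new entries; return remaining lag in entries."""
        end = len(log)
        if max_entries is not None:
            end = min(end, self.applied + max_entries)
        for comment_id, body in log[self.applied:end]:
            self.rows[comment_id] = body
        self.applied = end
        return len(log) - self.applied


primary = Primary()
replica = Replica()
for i in range(3):
    primary.write(f"c{i}", f"comment body {i}")

lag_after_partial = replica.catch_up(primary.log, max_entries=2)
lag_after_full = replica.catch_up(primary.log)
```

Analytics queries would read `replica.rows`, accepting that it can be a few entries stale until the next `catch_up`.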

> Programmers just do what they're told. If the managers don't care about something, the programmers won't work on it.

Reddit's entire monetization strategy is collecting user data and selling it to advertisers. It's incredibly naive to think that they don't have a vested interest in identifying organic engagement.

[–] [email protected] 4 points 6 months ago* (last edited 6 months ago)

> You think in Reddit’s 20 year history no one has thought of indexing comments for data science workloads?

I'm sure they have, but an index doesn't have anything to do with the Python library you mentioned.

> Analytics workflows are never run on the production database, always on read replicas

Sure, either that or aggregating live streams of data, but either way it doesn't have anything to do with ElasticSearch.

It's still totally possible to sync things to ElasticSearch in a way that won't affect performance on the production servers, but I'm just saying it's not entirely trivial, especially at the scale reddit operates at, and there's a cost for those extra servers and storage to consider as well.
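As a rough illustration of what "won't affect performance" can look like, here is a minimal Python sketch of a bounded buffer between the write path and a search indexer. `SearchSyncBuffer` and the `bulk_index` callable are hypothetical; `bulk_index` stands in for whatever call actually ships a batch to the search cluster, and the sizes are arbitrary.

```python
import queue


class SearchSyncBuffer:
    """Decouples comment writes from indexing with a bounded in-memory queue."""

    def __init__(self, bulk_index, batch_size=500, max_pending=10_000):
        self.bulk_index = bulk_index  # hypothetical: sends one batch of docs
        self.batch_size = batch_size
        self.pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # docs skipped because the buffer was full

    def enqueue(self, doc):
        """Called on the write path; returns immediately, never blocks."""
        try:
            self.pending.put_nowait(doc)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def flush_once(self):
        """Called by a background worker; sends at most one batch."""
        batch = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            self.bulk_index(batch)
        return len(batch)


sent_batches = []
buffer = SearchSyncBuffer(sent_batches.append, batch_size=2, max_pending=3)
accepted = [buffer.enqueue({"id": i}) for i in range(4)]
while buffer.flush_once():
    pass
```

The write path stays fast because `enqueue` never waits, but that is exactly where the non-trivial part shows up: anything counted in `dropped` has to be backfilled some other way, and the indexer still needs its own servers and storage.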

It's hard for us to say if that math works out.

> It’s incredibly naive to think that they don’t have a vested interest in identifying organic engagement

You would think, but you could say the same about Facebook and I know from experience that they don't give a fuck about bots. If anything they actually like the bots because it looks like they have more users.