this post was submitted on 23 Jun 2023
12 points (100.0% liked)

Reddit Migration

### About Community

Tracking and helping #redditmigration to Kbin and the Fediverse. Say hello to the decentralized and open future. To see the latest reddit blackout info, see here: https://reddark.untone.uk/

founded 1 year ago
 

I didn't believe that reddit would "undelete" comments. Well, now I know that it's true.

I deleted everything (posts, comments) using Shreddit a few days ago (Shreddit edits each comment to Lorem ipsum text, then deletes it). Yesterday I noticed some comments were not deleted. Running Shreddit again didn't work because it didn't detect them, so I thought of it as a small bug.

So I deleted every comment left manually, editing and deleting them. This was yesterday.

Now it's the next day and some comments are back online. But this time Shreddit was able to delete them.

Feel free to check my profile today and come back in a few days to see if comments are coming back (I'll monitor this closely):

https://www.reddit.com/user/Kraftingg/comments/

I'm glad we can have a thriving platform on the fediverse now, it feels more like home here!

Edit: some comments might be from subreddits that reopened after the blackout. So maybe I'm not totally right after all.

top 7 comments
[–] [email protected] 4 points 1 year ago

Looks like it may be a combo of subs going dark then coming back plus the 1000 indexing limit as mentioned in https://kbin.social/m/RedditMigration/t/47320/PSA-If-you-have-more-than-1000-posts-more-than

[–] [email protected] 2 points 1 year ago (2 children)

I like the solution someone gave: instead of deleting the comments, we rewrite everything with garbage, because garbage data is worse than no data.

Here's the link: https://lemmy.world/comment/286942
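To make the overwrite idea concrete, here is a minimal sketch of the "edit to garbage, optionally delete afterwards" loop. It runs against an in-memory stand-in, not Reddit. `FakeCommentStore`, `garbage`, and `overwrite_all` are hypothetical names made up for illustration, and none of this is Shreddit's actual code or a real Reddit API call.

```python
import random
import string


class FakeCommentStore:
    """In-memory stand-in for a comment API (hypothetical, not Reddit's)."""

    def __init__(self, comments):
        self.comments = dict(comments)  # comment id -> current body
        self.deleted = set()            # ids that were deleted

    def edit(self, comment_id, new_body):
        self.comments[comment_id] = new_body

    def delete(self, comment_id):
        self.deleted.add(comment_id)


def garbage(length=40):
    """Return random lowercase letters and spaces to use as filler text."""
    alphabet = string.ascii_lowercase + " "
    return "".join(random.choice(alphabet) for _ in range(length))


def overwrite_all(store, then_delete=False):
    """Overwrite every comment body; optionally delete it afterwards."""
    for comment_id in list(store.comments):
        store.edit(comment_id, garbage())
        if then_delete:
            store.delete(comment_id)


store = FakeCommentStore({"c1": "original text one", "c2": "original text two"})
overwrite_all(store)  # overwrite only, leave the garbage in place
```

Calling `overwrite_all(store, then_delete=True)` instead gives the edit-then-delete order described in the post. Against a real site you would also need authentication and rate limiting, which this sketch leaves out.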

[–] [email protected] 2 points 1 year ago (1 children)

Remember, though, that most automated solutions can't overcome the 1000 index limit, even when overwriting. Even doing it manually may not do the job.

[–] [email protected] 0 points 1 year ago (1 children)

Is this a daily limit, or a request limit? Because we could just split the changes up if necessary.

[–] [email protected] 1 points 1 year ago

Neither. It's an indexing limit. Basically you can only see the 1000 most recent posts, the 1000 most upvoted posts, and the 1000 most downvoted posts in a sub at best. (But there may also be overlap, so the total number of unique posts through all three methods is less than 3000.) So you could do part of the job on different days, have others help you in splitting the requests up, etc. None of it would help bypass that limit. It's like a limit on what you can see in the table of contents, except the book also has no page numbers, so you can't get to a specific page unless you either find it in the TOC or have memorized its 19 digit access number.

I wrote about how to overcome it (see https://kbin.social/m/RedditMigration/t/65260/PSA-Here-s-exactly-what-to-do-if-you-hit-the ) but this only works for the comments and posts of the past. Now that pushshift was shut down we won't have access to such data going forward.
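A toy simulation of the indexing limit described above, in case it helps to see why splitting the work up doesn't get around it. The post data is randomly generated, and `reachable_ids` and `LISTING_CAP` are hypothetical names. This models only the "three capped listings" idea from the comment, not how reddit actually stores or serves anything.

```python
import random

LISTING_CAP = 1000  # the per-listing cap described in the comment above


def reachable_ids(posts, cap=LISTING_CAP):
    """Return the ids visible through the three capped listings."""
    newest = sorted(posts, key=lambda p: p["created"], reverse=True)[:cap]
    most_upvoted = sorted(posts, key=lambda p: p["score"], reverse=True)[:cap]
    most_downvoted = sorted(posts, key=lambda p: p["score"])[:cap]
    # A set drops the overlap between the three listings.
    return {p["id"] for p in newest + most_upvoted + most_downvoted}


random.seed(0)
# 5000 made-up posts: id doubles as creation order, score is random.
posts = [
    {"id": i, "created": i, "score": random.randint(-50, 500)}
    for i in range(5000)
]

visible = reachable_ids(posts)
unreachable = len(posts) - len(visible)
print(len(visible), "reachable,", unreachable, "unreachable")
```

However many times you rerun the listings, the same posts come back, so anything outside them stays out of reach unless you already know its id.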

[–] [email protected] 0 points 1 year ago (1 children)

Editing comments doesn't devalue the data for reddit, since they still have all the original data. It's only problematic for people trying to scrape the data from the public UI, who are the exact people reddit wants to charge big bucks for API access. So idk if this is hurting them in the way people seem to think.

[–] [email protected] 1 points 1 year ago

The word is that reddit deletes are soft-deletes and a copy is still saved in reddit's database (but not publicly accessible to anyone outside of reddit). However, the same word is that overwriting does in fact destroy reddit's copy of the original data.

As for people who want to get my posts and comments - this is exactly why I saved a copy of everything before overwriting. They can still find it, they will just have to scrape lemmy/kbin - or use the lemmy or kbin API - to get at it.

Also, the internet archive (who is registered as an actual library iirc) has a copy of the pushshift torrents covering all reddit posts and comments from 2005 to March 2023. So the librarians and historians who want to research this stuff will be fine.