this post was submitted on 03 Jan 2024

I've noticed a rise in people sharing links to YouTube, Instagram, Twitter, TikTok, and reddit that include tracking parameters in the URL.

It might largely be harmless for now, but it's not good to let companies build a web of links between users of this site, and to link the usernames of users on this site to their off-site accounts, which may include sensitive info.

| Site | URL part | Appearance in URL | Filtration technique |
| --- | --- | --- | --- |
| YouTube | Query | ?si=* | Remove query string |
| Instagram | Query | ?igshid=* | Remove query string |
| Twitter | Query | ?t= | Remove query string |
| TikTok | Subdomain and path | (vm/vt).tiktok.com/(random_string) | Block |
| reddit | Path | /(sub_name)/s/(random_string) | Block |

This site should only allow canonical links to the content to limit the information exposed.
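
A rough sketch of what that filtering could look like, using only the parameter names and link shapes from the table above. The function names, the host list (including `youtu.be`), and the `www.`/`m.` prefix handling are my own assumptions, not anything the site actually runs. One deliberate difference from the table: this drops only the named tracking parameter rather than the whole query string, since a link like youtube.com/watch?v=... needs `v` to keep working.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Tracking parameters per host, taken from the table in the post.
# The exact host list is an assumption.
TRACKING_PARAMS = {
    "youtube.com": {"si"},
    "youtu.be": {"si"},
    "instagram.com": {"igshid"},
    "twitter.com": {"t"},
}

# Short share links hide the canonical URL, so they can only be blocked.
BLOCKED_HOSTS = {"vm.tiktok.com", "vt.tiktok.com"}
REDDIT_SHARE_PATH = re.compile(r"^/(r/)?[^/]+/s/[^/]+/?$")


def bare_host(host):
    """Lowercase the host and drop a leading 'www.' or 'm.' (assumed normalisation)."""
    host = host.lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            return host[len(prefix):]
    return host


def clean_link(url):
    """Return the URL without known tracking parameters, or None if it should be blocked."""
    parts = urlsplit(url)
    host = bare_host(parts.hostname or "")
    if host in BLOCKED_HOSTS:
        return None
    if host == "reddit.com" and REDDIT_SHARE_PATH.match(parts.path):
        return None
    tracked = TRACKING_PARAMS.get(host, set())
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in tracked
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `clean_link("https://youtu.be/abc123?si=XYZ")` gives back `https://youtu.be/abc123`, while a `vm.tiktok.com` link comes back as `None` so the caller can reject the post or ask for the full link.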

[–] [email protected] 41 points 10 months ago* (last edited 10 months ago) (1 children)

There's a URL, say peepee.com. So far this is just the routing portion of the URL, the part that says how to find the web server. It basically says "ask .com how to find peepee", and the answer is the IP address of the server.

Everything that comes after that is information for the server itself. So to get to a resource that lives on the server, say poopoo, you would navigate to peepee.com/poopoo.

But sometimes you want to navigate to that resource and also pass some bit of information to the server, say a login token so the server knows who is accessing the resource. This is done with a URL parameter, which looks like ?userid=abcd1234, or in the full URL: peepee.com/poopoo?userid=abcd1234. The user is still accessing the same resource, but has given the server some additional metadata.
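
If it helps to see the pieces pulled apart, here is a small sketch using Python's standard `urllib.parse`. The `https://` scheme is my addition (the parser needs one to tell the host from the path); the rest is the example URL from above.

```python
from urllib.parse import parse_qs, urlsplit

# The https:// scheme is assumed; without a scheme the parser
# would treat the whole string as a path.
parts = urlsplit("https://peepee.com/poopoo?userid=abcd1234")

host = parts.hostname           # "peepee.com": looked up to find the server
path = parts.path               # "/poopoo": the resource on that server
params = parse_qs(parts.query)  # {"userid": ["abcd1234"]}: extra info for the server
```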

These parameters can be abused to work out who knows whom and who communicates with whom. The site attaches a tracking ID parameter to the URL, so when you share a link, the link includes that parameter. When anyone clicks on it, the server knows that the originator of the tracking ID (well, the first person to be assigned it) shared the link with that other person. This can be combined with other collected info to build a social graph of actual people. For example: we know Dave is at this IP and Jane is at this other IP, we put a tracking parameter in Dave's URL, and then we saw Jane use that same tracking parameter in her URL, so we know that Dave shared this URL with Jane.
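
To make the Dave and Jane example concrete, here is a toy sketch of the inference a server could run over its request log. The log format, the names, and the `infer_shares` function are all hypothetical; a real tracker would key on IPs, cookies, or accounts and combine far more signals.

```python
from collections import defaultdict


def infer_shares(requests):
    """Guess who shared links with whom from (tracking_id, visitor) pairs.

    The pairs are assumed to be in time order, and the first visitor seen
    with a tracking ID is treated as the person it was assigned to.
    """
    originator = {}
    shares = defaultdict(set)
    for tracking_id, visitor in requests:
        owner = originator.setdefault(tracking_id, visitor)
        if visitor != owner:
            # Someone else showed up with the owner's ID: the owner shared the link.
            shares[owner].add(visitor)
    return dict(shares)


# Hypothetical log: dave is assigned trk1, then jane opens a link carrying trk1.
log = [("trk1", "dave"), ("trk1", "jane"), ("trk1", "dave"), ("trk2", "jane")]
graph = infer_shares(log)  # {"dave": {"jane"}}
```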

So to answer your question, a canonical link is a link to a resource without the unneeded url parameters.