Hello! I have a server that runs 24/7, and I've recently started doing some stuff that requires scraping the web. The websites are detecting that the server's IP isn't residential, though, and it's causing issues.

I'd like to host a proxy server on the small server I have running 24/7 in my house, so that everything for that one page can be proxied through it. Does anyone have any idea how I'd set up a server like that? Thanks.

[-] Max_P@lemmy.max-p.me 1 points 2 years ago

Cloudflare tunnels won't work as Cloudflare won't tunnel HTTP proxy traffic, at least as far as I know.

What you can do, however, is have your home server VPN into your remote server; the remote server will then have no problem connecting to Squid over the VPN link. WireGuard is very simple to configure for this, probably 5-10 lines of config on each end. You don't need any routing or forwarding, just a plain VPN with 2 peers that can ping each other, so none of the ip_forward or iptables -j MASQUERADE steps that most guides include are needed. You can also use something like Tailscale, or anything else that lets the two machines talk to each other.
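
For illustration, a two-peer WireGuard setup of the kind described above might look roughly like this in wg-quick format. The 10.8.0.x addresses, port 51820, and the wg0.conf file name are assumptions, and the keys and endpoint are placeholders to fill in; treat it as a sketch rather than a tested config.

```ini
# /etc/wireguard/wg0.conf on the remote server (assumed path)
[Interface]
Address = 10.8.0.1/24            # assumed VPN address of the remote server
ListenPort = 51820               # assumed UDP port, must be reachable from home
PrivateKey = <remote-private-key>

[Peer]
# the home server
PublicKey = <home-public-key>
AllowedIPs = 10.8.0.2/32         # only the home server's VPN address
```

```ini
# /etc/wireguard/wg0.conf on the home server (assumed path)
[Interface]
Address = 10.8.0.2/24            # assumed VPN address of the home server
PrivateKey = <home-private-key>

[Peer]
# the remote server
PublicKey = <remote-public-key>
Endpoint = <remote-server-address>:51820
AllowedIPs = 10.8.0.1/32         # only the remote server's VPN address
PersistentKeepalive = 25         # keeps the NAT mapping open from the home side
```

With both ends up (for example via wg-quick up wg0), the remote server would reach the proxy at 10.8.0.2 on whatever port Squid listens on, provided Squid is set to accept connections from 10.8.0.1.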

Depending on your performance and reliability needs, you could even just forward a port with SSH. Connect to your remote server from the home server with something like ssh -N -R localhost:8088:localhost:8080 $remoteServer and port 8088 on the remote will forward to port 8080 on the home server for as long as that SSH connection is up. -N simply makes SSH not open a shell on the remote, dedicating the SSH session to the forwarding. Nice and easy, especially for prototyping.
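
As a rough sketch of that SSH approach, the command can be wrapped so it reconnects when the link drops. The function names, the retry loop, and the keepalive options are additions for illustration and not part of the suggestion above; the ports are the ones from the example command, and 8080 assumes the proxy at home listens there (Squid's default is 3128).

```shell
#!/bin/sh
# Sketch only: names and values below are illustrative assumptions.
REMOTE_PORT=8088   # port opened on the remote server's loopback interface
HOME_PORT=8080     # port the proxy listens on at home

# Build the argument for ssh -R: remote bind address and port, then the local target.
forward_spec() {
    printf 'localhost:%s:localhost:%s' "$REMOTE_PORT" "$HOME_PORT"
}

# Run on the home server. $1 is the SSH destination, e.g. user@remote.example.
# Loops forever, reconnecting 5 seconds after the SSH session ends.
open_tunnel() {
    while true; do
        ssh -N \
            -o ExitOnForwardFailure=yes \
            -o ServerAliveInterval=30 \
            -R "$(forward_spec)" "$1"
        sleep 5
    done
}

# On the home server:    open_tunnel user@remote.example
# On the remote server:  curl -x http://localhost:8088 https://example.com/
```

ExitOnForwardFailure makes ssh exit if port 8088 cannot be bound on the remote, so the loop retries instead of sitting on a connection that forwards nothing.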

[-] neoney@lemmy.neoney.dev 1 points 2 years ago

That seems overcomplicated for my needs, honestly, but it just occurred to me that I can actually host the scraper on the home server, since the scraper itself only scrapes simple data and the downloads are handled by a separate program.

[-] neoney@lemmy.neoney.dev 1 points 2 years ago

The downloader talks to the scraper through HTTP, which I can publish through Cloudflare Tunnels, so it's perfect.

this post was submitted on 31 Jul 2023
27 points (93.5% liked)

Selfhosted
