submitted 1 week ago by [email protected] to c/[email protected]

Hello! I'm evaluating tools to track changes in:

  • Government/legal PDFs (new regulations, court rulings)
  • News sites without reliable RSS
  • Tender portals
  • Property management messages (e.g. service notices)
  • Bank terms and policy updates

Current options I've tried:
  • Huginn: powerful, but requires significant setup and has no unified feed
  • changedetection.io: good for HTML, limited for documents

Key needs:
✓ Local processing (no cloud dependencies)
✓ Multi-page PDF support
✓ Customizable alert rules
✓ Less manual monitoring overhead (robust, offline-first approaches preferred)
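
To make "customizable alert rules" concrete, here is a minimal Python sketch of what I have in mind, not a feature of any tool mentioned here. The `AlertRule` class, the `matching_rules` helper, and the example patterns are all hypothetical names chosen for illustration, and the sketch assumes the changed lines have already been extracted as plain text.

```python
import re
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule: a label plus a regular expression to look for."""
    name: str
    pattern: str  # matched case-insensitively against each changed line


def matching_rules(changed_lines, rules):
    """Return the names of rules whose pattern matches at least one changed line."""
    hits = []
    for rule in rules:
        regex = re.compile(rule.pattern, re.IGNORECASE)
        if any(regex.search(line) for line in changed_lines):
            hits.append(rule.name)
    return hits


# Example rules (assumptions, adjust to your own documents).
rules = [
    AlertRule("fees", r"\bfees?\b|\bcharges?\b"),
    AlertRule("deadline", r"\bdeadline\b|\bdue date\b"),
]

changed = [
    "The monthly account fee rises to 5 EUR.",
    "Opening hours are unchanged.",
]
print(matching_rules(changed, rules))
```

Keeping rules as plain data like this would let them live in a local config file, so nothing has to leave the machine.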

What's working well for others? Especially interested in:

  1. Solutions combining OCR + text analysis
  2. Experience with local LLMs for this (NLP, not just diff)
  3. Creative workarounds you've built

(P.S. I'm testing a deep scraping + LLM pipeline; if the results look promising, I'll share them.)
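
For context, the plain "diff" baseline I'd compare any LLM approach against looks roughly like the sketch below. It uses only the Python standard library and assumes the text has already been pulled out of the PDF or page by a separate extraction or OCR step, which is not shown. The function names `fingerprint` and `added_and_removed` are my own, purely for illustration.

```python
import difflib
import hashlib


def _clean_lines(text):
    """Strip whitespace and drop empty lines so layout noise is ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fingerprint(text):
    """SHA-256 of the normalized text; cheap way to tell whether anything changed."""
    normalized = "\n".join(_clean_lines(text))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def added_and_removed(old_text, new_text):
    """Return (added, removed) lines between two versions of a document."""
    added, removed = [], []
    for line in difflib.ndiff(_clean_lines(old_text), _clean_lines(new_text)):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return added, removed


old_version = "Section 1\nFee: 3 EUR\n"
new_version = "Section 1\n\nFee: 5 EUR\nNew clause\n"

# Only run the more expensive diff when the fingerprint differs.
if fingerprint(old_version) != fingerprint(new_version):
    added, removed = added_and_removed(old_version, new_version)
    print("added:", added)
    print("removed:", removed)
```

The fingerprint only needs to be stored once per document, so the previous full text is needed just for the documents that actually changed.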

[-] [email protected] 5 points 1 week ago* (last edited 1 week ago)

I've started testing changedetection (https://github.com/dgtlmoon/changedetection.io) for similar use cases (monitoring government grant webpages). It can also detect changes in PDFs, but I haven't tested that feature much. It has worked fine for me so far.

[-] [email protected] 1 points 1 week ago

Can you point me to a tutorial on how to set that up properly for websites? I tried it a while ago and could not get it to work...

[-] [email protected] 2 points 1 week ago

Hello! For changedetection.io there are setup instructions using a pip install: https://github.com/dgtlmoon/changedetection.io/wiki/Microsoft-Windows. What is your use case?

this post was submitted on 04 Aug 2025
33 points (100.0% liked)
