LLM Assistant for Markdown Documents
(quokk.au)
Welcome to Free Open-Source Artificial Intelligence!
We are a community dedicated to advancing the availability of, and access to:
Free Open Source Artificial Intelligence (F.O.S.A.I.)
RAG is an outdated mechanism; a full agentic workflow is much better imo. I've written my own custom thing that uses a Matrix account, pi, a vector embedder via local Ollama, and Chroma as the vector store. The agent has custom tools to query the vector store, run bash, etc. My Logseq notes sync to my server via Syncthing, and a file watcher updates the vector store as my notes change. The agent can edit notes like any other file. I then simply have a Matrix client I can use to communicate with the agent. The file watcher looks for "/sydney" (that's my agent's name) and sends a message via Matrix to get the agent to go look at that file/note and make the changes requested in the command. It's kinda OpenClaw-ish, but a lot less context heavy, and it doesn't run forever unless triggered.
This sounds really cool. I hadn't heard of a vector embedder / vector store before. Definitely need to look into those.
Do you have a big GPU to run Ollama locally?
So I do inference over API via OpenRouter. I wish I had the GPU, or an Apple machine, to run a decent LLM locally. Embedding is very cheap comparatively; I use a CLIP embedding model so I can have images and text in the same vector space.
I'm out of my depth here but trying to piece this together.
If I understand correctly, the first component of this workflow is to use an inference API (like Hugging Face or so) to convert each file from your notes into semantic vectors and store them in ChromaDB, ready to be used in future prompts.
Are you using any software to do that or have you written some code to load the files from disk, call the API, and store the response?
So my notes are just a directory of thousands of MD files. I wrote some code that watches the files in this dir to see when anything changes, and when it does it re-embeds the changed file and updates the vector store.
My AI agent is a separate component (just another Docker container, with the notes dir mounted as a volume) using pi, which uses an LLM via a remote API (OpenRouter). I have a custom tool for that agent where it can run a text search that returns the top-n most semantically similar chunks of text (along with some metadata, notably the filename and line numbers the chunk came from). The vectors are never seen by the LLM; they exist purely for the search ranking. The agent also has file-editing capabilities, so it can then go read or modify that file like any coding agent. The agent also has a tool to send messages via Matrix.
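The search tool is, in spirit, just a cosine-similarity ranking over stored chunk vectors plus metadata. A toy sketch (the real version queries Chroma, and the vectors would come from an embedding model rather than being hand-written):

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    filename: str    # metadata the agent uses to go open the file
    start_line: int
    vector: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_n(query_vec: list[float], store: list[Chunk], n: int = 3) -> list[Chunk]:
    """Rank stored chunks by similarity to the query and return the best n."""
    return sorted(store, key=lambda c: cosine(query_vec, c.vector), reverse=True)[:n]
```

Only the `text`, `filename`, and `start_line` of the winners go back to the LLM; the vectors stay on the search side, exactly as described above.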
I have a service that watches a specific Matrix chat, and if a message is received it does one of two things. Option 1: if an agent is already running, it passes the message into the existing agent as a user message. Option 2: if no agent is running, it starts a new agent instance and passes the message in as the user message. This agent manager service runs from the same Docker image as the agent, and when the agent finishes running, it takes the final agent output and sends it to the Matrix chat as the agent's Matrix user.
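That two-option dispatch fits in a tiny manager class. A sketch only: the Matrix I/O and the real agent process are replaced by plain callables here, and the names are made up.

```python
from typing import Callable

class AgentManager:
    """Route incoming chat messages to a running agent, or start one.

    Option 1: an agent is running -> feed the message in as a user turn.
    Option 2: no agent is running -> spawn one with the message as the prompt.
    """

    def __init__(self, start_agent: Callable[[str], None],
                 send_to_agent: Callable[[str], None]):
        self._start = start_agent    # stub for launching a new agent instance
        self._send = send_to_agent   # stub for injecting a user message
        self.running = False

    def on_message(self, text: str) -> str:
        if self.running:
            self._send(text)         # Option 1: existing agent gets the message
            return "forwarded"
        self.running = True
        self._start(text)            # Option 2: fresh agent, message as prompt
        return "started"

    def on_agent_finished(self) -> None:
        """Called when the agent exits; its final output would be posted to Matrix here."""
        self.running = False         # the next message spawns a fresh agent
```

Because the manager and the agent share one Docker image, "start a new agent instance" can be as simple as forking the same entrypoint with a different argument.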
I got an agent to write all this code, so it's probably dodgy as shit with all sorts of security holes, hence I haven't published it on GitHub (security through obscurity etc etc lol).
I also have a SearXNG instance running, accessible to the agent via MCP. And I have a Chrome MCP allowing the agent to do things from inside a virtual Chrome browser.