cross-posted from: https://piefed.zip/c/commandline/p/1389995/cli-based-bookmark-manager-based-on-indexing-visited-sites-and-search-engine-like-queries
Funny thing: I just discovered Piefed doesn't implement cross-posting. So, cross-posting from an alt.
In most of the cases where people use bookmarks, what I actually want is a search engine over only the sites I've visited. I don't know whether I'm dramatically different from other people, but by the time I'm looking for a site, I've forgotten its most unique attributes, and even my own tagging often ends up capturing the wrong sorts of attributes. Tagging is still better for me than hierarchical organization, but what I really want is a sort of command-line search engine that searches only sites I've visited before.
I've frequently thought about building such a thing, but every time I do I think, "someone must have already built this." So:
Does anyone know of a tool like bmm or buku, but which indexes the URL's main page, and has a command-line tool for keyword querying the DB like a search engine? As in, performing stemming and lemmatization? It'd be like bmm/buku's tag search, only the tags would be a search engine index of the page.
What I do not want is:
- a self-hosted, web-based UI search engine
- a self-hosted bookmark manager; buku and bmm are already both fine tools, and I'm not trying to solve "access all my bookmarks from everywhere". The latter I can do with rsync or syncthing.
- a command-line bookmark manager... unless it conforms to the constraints above: queries should function on a full-text index of selected web pages. Again, buku and bmm would be fine if my tagging skills were better.
- a crawler-based search engine
I do want:
- the convenience of giving the tool a URL and having it auto-tag. buku does this, except that IME the resulting tags correlate even less well with how I remember things when I want to search than my manual tagging does.
- some fuzziness in the search; my current problem is how constrained the searches are. This isn't their failing; I simply have obtuse recollection skills. I tag "dog,pet,animal", but when I'm looking for it, what I remember is "it's got four legs".
- local, command-line
- indexing the page at a given URL. Recursive indexing is optional; I probably wouldn't use it, but if it's there, that's fine. I just want to be able to limit the indexing to a single page.
This is my last-ditch effort to find an existing tool; otherwise, I'm going to build it, because it's not a hard problem. Which is in part why I'm having trouble believing someone hasn't already built it.
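To illustrate why it doesn't seem hard: SQLite ships with FTS5 and a built-in porter stemmer, which already covers the "index a page, query it like a search engine with stemming" part. This is only a hypothetical sketch (the table name, column names, and sample page are made up, and a real tool would fetch the URL and strip HTML before inserting):

```python
import sqlite3

# Hypothetical sketch of the core index: an FTS5 table with the porter
# stemmer, so a query for "dog" also matches "dogs", "dogged", etc.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE VIRTUAL TABLE pages USING fts5(url, body, tokenize='porter')"
)

# In the real tool, body would be the fetched page's extracted text;
# here it's a hardcoded stand-in.
db.execute(
    "INSERT INTO pages VALUES (?, ?)",
    ("https://example.com/dogs",
     "Caring for dogs and other four-legged pets."),
)

# Stemming lets 'dog' match 'dogs'; bm25() gives search-engine-style ranking.
rows = db.execute(
    "SELECT url FROM pages WHERE pages MATCH ? ORDER BY bm25(pages)",
    ("dog",),
).fetchall()
print(rows[0][0])  # https://example.com/dogs
```

That doesn't get the "four legs" kind of fuzziness, of course — that would need something more semantic than stemming — but the keyword-index half really is a few dozen lines.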