I have yet to meet an LLM that works decently locally. Wizard Uncensored is the closest, but the context length is too short; it starts repeating itself after a while.
Have you seen The Great Gatsby with Wizard too? That's what always comes up when mine goes on too long. I'm working on compiling llama.cpp from source today. I think that's all I need to be able to use some of the other models, like the Llama2-70B derivatives.
The part of llama.cpp I've been reading is only an ~850 line Python file (not exactly sure how Python relates to the CPP part yet, but YOLO I guess; I just started reading the code from my phone last night). This file is where all of the prompt magic happens. I think all of the easy checkpoint-model stuff that works in Oobabooga uses llama-cpp-python from pip. That package hasn't had any GitHub repo updates in three months, so it doesn't work with a lot of the newer and larger models. I'm not super proficient with Python. It's one of the things I had hoped to use AI to help me learn better, but I can read and usually modify someone else's code to some extent. It looks like a lot of the functionality (likely) built into the more complex chat systems like TavernAI is just mixing the chat, notebook, and instruct prompt techniques into one 'context injection' (if that term makes any sense).
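For what it's worth, that 'context injection' idea is easy to sketch in plain Python. This is just an illustration of the concept; the function and the instruct-style prompt format here are made up for the example, not taken from TavernAI or llama.cpp:

```python
def build_prompt(system, history, instruction):
    """Combine persistent notes, chat history, and an instruct-style
    request into one context block -- the 'context injection' idea."""
    chat = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        f"{system}\n\n"        # persistent character/world notes ("notebook")
        f"{chat}\n\n"          # rolling chat history
        f"### Instruction:\n{instruction}\n### Response:\n"  # instruct scaffold
    )

prompt = build_prompt(
    "You are a helpful assistant.",
    [("User", "Hi"), ("Assistant", "Hello!")],
    "Summarize the conversation so far.",
)
print(prompt)
```

The front-ends mostly differ in how they trim and prioritize these pieces when the combined text approaches the model's context limit.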
The most information I have seen someone work with independently offline was using langchain with a 300-page book, so I know at least that much is possible. I have also come across a few examples of people using langchain with up to 3 PDF files at the same time. There is also the MPT model with up to a 32k-token context, but it looks like it needs hundreds of GB of server-grade RAM to function.
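The rough idea behind feeding a whole book through langchain is to split it into overlapping chunks and only hand the model a few chunks at a time. A minimal stdlib-only sketch of that splitting step (this is my own toy splitter for illustration, not langchain's actual implementation):

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Naive fixed-size splitter with overlap, similar in spirit to
    langchain's character-based text splitters."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

book = "x" * 5000  # stand-in for a 300-page book's extracted text
pieces = chunk_text(book)
print(len(pieces))  # 5000 chars at step 900 -> 6 chunks
```

The overlap is what keeps a sentence that straddles a chunk boundary from being lost to both chunks.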
I'm having trouble with distrobox/conda/Nvidia on Fedora Workstation. I think I may start over with Nix soon, or I'm going to need to look into Proxmox and virtualization, or go back to an immutable base so I can fall back effectively. I simply can't track down where some dependencies are getting stashed, and I only have 6 distrobox containers so far. I'm only barely knowledgeable enough in Linux to manage something like this well enough for it to function. Suggestions welcome.
I would caution against using an LLM alone for individualized curriculum. It can be a tool to assist with learning, but it's unreliable enough that you may find yourself being taught incorrect information, or stuck in a situation where the AI is unable to help you understand a concept due to being incapable of understanding you (or anything for that matter, LLMs don't "understand" anything).
If you're looking for a simulated experience, you won't be able to provide all the learning materials from a university as context; it's just too much information (and, as far as I know, currently infeasible). Instead, you'd want to provide only relevant snippets of information and use those for generation. How you determine which snippets are relevant is up to you, but it will most likely require an understanding of the subjects you want it to teach you. Maybe in the process of building this AI, you'll end up learning the materials you wanted it to teach you anyway.
Continuing from my PC: if you wanted a simulated experience of watching a lecture and answering quizzes and such, it might be that watching the lecture is more than enough, especially if you have the quiz and test answers. Strategies like this are not new and not AI-powered, and they have been decently successful without needing to pay for any courses directly.
However, if you wanted a way to ask questions to a Q&A bot while the lecture is running, you could use a combination of some sort of semantic retrieval (where you're retrieving any relevant learning materials that are expected of you to explore as a student of the course) and providing the most recent lecture contents as context to the LLM.
For the retrieval part, I'd recommend looking at a vector database like Weaviate (potentially offline) or something like Azure Cognitive Search (online/cloud) to store snippets of the learning material - maybe sections of chapters or such - along with their embeddings (other options exist, but these are two that I've personally used). Note that the embeddings these databases use often come from an LLM, so for example with Weaviate, you'll need access to something for embedding generation. Then, you'd use the question to query the database (either keyphrases, or possibly directly as is) for the relevant snippets, and have some number of those as one part of your context. You can use a transcription of the lecture to provide the second part of the context. Then finally the third part of your context could be the actual question, along with the format you want it to respond in. This way you can limit the amount of context you need to provide to the LLM (instead of needing to provide the entire set of learning materials as context).
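To make the retrieval-plus-three-part-context idea concrete, here's a tiny self-contained sketch. The "embedding" here is just a toy bag-of-words counter standing in for a real embedding model (Weaviate or an LLM would supply actual vectors), and the snippets, transcript, and question are all invented for the example:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' -- a placeholder for a real model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stand-ins for stored course-material snippets and their embeddings.
snippets = [
    "Gradient descent minimizes a loss function by stepping along the negative gradient.",
    "A stack is a last-in first-out data structure.",
    "Backpropagation computes gradients layer by layer using the chain rule.",
]
index = [(s, embed(s)) for s in snippets]

# Part 3 of the context: the student's question.
question = "how does gradient descent minimize loss"
q_vec = embed(question)

# Part 1: the most relevant stored snippet, found by similarity search.
top = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# Part 2: the recent lecture transcript (placeholder text).
lecture_transcript = "...today we covered optimization basics..."

context = (
    f"Relevant material:\n{top}\n\n"
    f"Lecture so far:\n{lecture_transcript}\n\n"
    f"Question: {question}\nAnswer concisely."
)
print(top)
```

A real setup would swap `embed` for calls to an actual embedding model and `index` for a vector database query, but the shape of the final prompt — retrieved snippets, transcript, question — stays the same.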
This would be a pretty complicated project though. It's not as simple as going on character.ai or ChatGPT and creating a carefully-crafted prompt :)
Edit: for limiting the knowledge of the LLM, this might just come down to selecting the right prompt, and even then it seems like it'd be a difficult challenge. I'm not sure that you'll have much success here with current LLMs to be honest, but play around and see if you can get it to avoid generating answers off of materials you shouldn't have learned yet.