LocalLLaMA

2951 readers
34 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

founded 2 years ago
MODERATORS
1
18
submitted 17 hours ago* (last edited 17 hours ago) by [email protected] to c/[email protected]
 
 

This fork introduces a Radio Station feature where AI generates continuous radio music. The process involves two key components:

LLM: Generates the lyrics for the songs. ACE: Composes the music for the generated lyrics.

Due to the limitations of slower PCs, the demo video includes noticeable gaps (approximately 4 minutes) between the generated songs.

If your computer struggles to stream songs continuously, increasing the buffer size will result in a longer initial delay but fewer gaps between songs (until the buffer is depleted again).

By default the app attempts to load the model file gemma-3-12b-it-abliterated.q4_k_m.gguf from the same directory. However, you can also use alternative LLMs. Note that the quality of generated lyrics will vary depending on the LLM's capabilities.

2
 
 

model:
32B olmo-2 03/25

https://arxiv.org/abs/2501.00656

"We release all OLMo 2 artifacts openly -- models at 7B and 13B scales, both pretrained and post-trained, including their full training data, training code and recipes, training logs and thousands of intermediate checkpoints. "

3
4
 
 

Hi, I'm not too informed about LLMs so I'll appreciate any correction to what I might be getting wrong. I have a collection of books I would like to train an LLM on so I could use it as a quick source of information on the topics covered by the books. Is this feasible?

5
 
 

Something I always liked about NousResearch is how they seemingly try to understand cognition in a more philosophical/metaphysically symbolic way and aren't afraid to let you know it. I think their unique view may allow them to find some new perspectives that allow for advancement in the field. Check out AscensionMaze in particular the wording they use is just fascinating.

6
 
 

I'm interested in really leveraging the full capabilities of local ai, for code generation and everything else. let me know what you people are using.

7
 
 

It's amazing how far open source LLMs have come.

Qwen3-32b recreated the Windows95 Starfield screensaver as a web app with the bonus feature to enable "warp drive" on click. This was generated with reasoning disabled (/no_think) using a 4-bit quant running locally on a 4090.

Here's the result: https://codepen.io/mekelef486/pen/xbbWGpX

Model: Qwen3-32B-Q4_K_M.gguf (Unsloth quant)

Llama.cpp Server Docker Config:

docker run \
-p 8080:8080 \
-v /path/to/models:/models \
--name llama-cpp-qwen3-32b \
--gpus all \
ghcr.io/ggerganov/llama.cpp:server-cuda \
-m /models/qwen3-32b-q4_k_m.gguf \
--host 0.0.0.0 --port 8080 \
--n-gpu-layers 65 \
--ctx-size 13000 \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--min-p 0

System Prompt:

You are a helpful expert and aid. Communicate clearly and succinctly. Avoid emojis.

User Prompt:

Create a simple web app that uses javascript to visualize a simple starfield, where the user is racing forward through the stars from a first person point of view like in the old Microsoft screensaver. Stars must be uniformly distributed. Clicking inside the window enables "warp speed" mode, where the visualization speeds up and star trails are added. The app must be fully contained in a single HTML file. /no_think

8
 
 
9
10
 
 

Qwen3 was apparently posted early, then quickly pulled from HuggingFace and Modelscope. The large ones are MoEs, per screenshots from Reddit:

screenshots

Including a 235B/22B active and a 30B/3B active.

Context appears to 'only' be 32K unfortunately: https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/config_4b.json

But its possible they're still training them to 256K:

from reddit

Take it all with a grain of salt, configs could change with the official release, but it appears it is happening today.

11
12
22
submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]
 
 

This is one of the "smartest" models you can fit on a 24GB GPU now, with no offloading and very little quantization loss. It feels big and insightful, like a better (albeit dry) Llama 3.3 70B with thinking, and with more STEM world knowledge than QwQ 32B, but comfortably fits thanks the new exl3 quantization!

Quantization Loss

You need to use a backend that support exl3, like (at the moment) text-gen-web-ui or (soon) TabbyAPI.

13
 
 

I would like my model to know the code libraries I use and help me write code with them. I use llama.cpp's server and web UI for inference, but I have no clue how to get started with RAG, since it seems it is not natively supported with llama.cpp's server implementation. It almost looks like I would need to code my own agent.

I am not interested in commercial offerings or APIs. If you use RAG, how do you do it?

14
 
 

I'm currently running Gemma3, it is really good overall, but one thing that is frustrating is the relentless positivity.

It there a way to make it more critical?

I'm not looking for it to say "that is a shit" idea; but less of the "that is a great observation" or "You've made a really insightful point" etc...

If a human was talking like that, I'd be suspicious of their motives. Since it is a machine, I don't think it is trying to manipulate me, I think the programming is set too positive.

It may also be cultural, at a rule New Zealanders are less emotive in our communication, the LLM (to me) feels like are overly positive American.

15
 
 

Seems there's not a lot of talk about relatively unknown finetunes these days, so I'll start posting more!

Openbuddy's been on my radar, but this one is very interesting: QwQ 32B, post-trained on openbuddy's dataset, apparently with QAT applied (though it's kinda unclear) and context-extended. Observations:

  • Quantized with exllamav2, it seems to show lower distortion levels than nomal QwQ. Its works conspicuously well at 4.0bpw and 3.5bpw.

  • Seems good at long context. Have not tested 200K, but it's quite excellent in the 64K range.

  • Works fine in English.

  • The chat template is funky. It seems to mix up the and <|think|> tags in particular (why don't they just use ChatML?), and needs some wrangling with your own template.

  • Seems smart, can't say if it's better or worse than QwQ yet, other than it doesn't seem to "suffer" below 3.75bpw like QwQ does.

Also, I reposted this from /r/locallama, as I feel the community generally should going forward. With its spirit, it seems like we should be on Lemmy instead?

16
17
18
 
 

Just thinking about making this a monthly post, which model are you using? what are the positives and negatives?

19
20
 
 

The Trump administration is considering new restrictions on the Chinese AI lab DeepSeek that would limit it from buying Nvidia’s AI chips and potentially bar Americans from accessing its AI services, The New York Times reported on Wednesday.

21
 
 
22
 
 

Let's go! Lossless CPU inference

23
 
 

Ok, so the best thing that I have been using my local AI for lately is acting as Socrates and helping me delve into my thinking

Long text here

User | 2025/04/09 08:39:23

When considering the Socratic method, what questions did Socrates ask people, to start the process of enquiry


Gemma3 (12B) | 2025/04/09 08:39:55

Okay, let's break down the types of questions Socrates used to initiate the Socratic method. It's important to understand that it wasn't a rigid formula, but a flexible approach designed to expose contradictions and ultimately lead to deeper understanding. Here's a breakdown, categorized by their function and with examples. I'll also include some key principles at the end.

1. Initial Clarification & Agreement (Establishing a Starting Point - Often Seemingly Simple)

  • "What do you mean by...?" / "What is X?" (Where X is a Virtue, Concept, or Idea - e.g., "What do you mean by justice?", "What is courage?", "What is piety?")
    • Purpose: This is the essential starting point. Socrates wouldn't just debate; he's seeking a clear definition. He's trying to get the other person to articulate their understanding. He would often feign ignorance to encourage them to explain.
    • Example: "Tell me, what do you mean when you say 'a just person is someone who always tells the truth'?"
  • "Is it always the case that...?" / "Does X always mean Y?"
    • Purpose: To test the initial definition's universality. Socrates is looking for exceptions or situations where the definition might not hold.
    • Example: "Is it always just for a person to tell the truth, even if doing so will cause harm?"
  • "Can you give me an example of X?"
    • Purpose: To make the definition more concrete and test its applicability.
    • Example: "Can you give me an example of a courageous act?"

2. Testing the Definition (Identifying Contradictions & Difficulties)

  • "If X is Y, then does that mean Z?" / "If X is Y, what follows from that?" (Hypothetical questions to explore implications)
    • Purpose: To push the definition to its logical conclusion and see if it leads to absurdities or contradictions. This is a core technique.
    • Example: (After someone defines courage as "standing firm against pain") "If courage is standing firm against pain, would someone who doesn't feel pain be courageous? Wouldn't they just be standing firm without knowing it?"
  • "But what about...?" / "Consider the case of..." (Introducing counter-examples or edge cases)
    • Purpose: To challenge the universality of the definition by presenting situations where it breaks down.
    • Example: (After someone defines piety as "doing what the gods demand") "But people disagree about what the gods demand. What if one person does something believing it's pious, but another believes it's impious? Which one is right?"
  • "How does X relate to Y?" / "Is X a type of Y, or is Y a type of X?" (Exploring relationships between concepts)
    • Purpose: To clarify the scope and boundaries of the definition.
    • Example: "Is justice a type of virtue, or is virtue a type of justice?"

3. Probing for Underlying Assumptions (Revealing Hidden Beliefs)

  • "Why do you think that?" / "What makes you say that?"
    • Purpose: To uncover the reasons behind the initial assertion. Often, the reasoning is vague or unexamined.
    • Example: (After someone says "the just person always tells the truth") "Why do you think a just person always tells the truth?"
  • "What do you mean by 'good'?" / "What do you mean by 'beautiful'?" (Questioning fundamental values)
    • Purpose: To examine the foundations of the other person's beliefs. Often, these fundamental values are taken for granted.
    • Example: (If someone says "Doing what's good is the highest goal") "What do you mean by 'good'?"

Key Principles of Socratic Questioning

  • Irony/Playing the Ignorant: Socrates often pretended to be less knowledgeable than he was, encouraging the other person to take the lead and explain their views.
  • Focus on Their Beliefs: He wasn't trying to impose his own ideas. He was interested in understanding and clarifying their beliefs.
  • No Easy Answers: The goal wasn't to arrive at a definitive answer. It was to expose the limits of knowledge and stimulate further inquiry. Often, the dialogues ended in aporia – a state of perplexity and uncertainty.
  • Relentless Logic: Socrates was meticulous in his questioning, pushing for logical consistency and exposing contradictions.
  • Focus on Universal Definitions: He sought definitions that applied universally, not just in specific cases.

To help me tailor my response further, could you tell me:

  • Are you interested in a specific Socratic dialogue?
  • Are you focusing on a particular virtue (like justice or courage)?

User | 2025/04/09 08:44:51

What specific questions would he ask to start these discussions, I'm looking to probe my own thoughts, with you acting as Socrates in this endevour.


Gemma3 (12B) | 2025/04/09 08:44:57

Excellent! This is a wonderful approach. Let's do this. I'm ready to embody Socrates and help you probe your own thoughts. To begin, I need a topic. Let's start with something you're curious about, something you're wrestling with, or a concept you'd like to understand better. It can be a virtue, a belief, an action, anything at all.

To initiate, please tell me:

  1. What is the topic you're interested in exploring today? (Just a brief description is fine.)
  2. What, if anything, are your initial thoughts or beliefs about this topic? (Don't worry about being "right" or "wrong." Just tell me what you think, even if it's tentative.)

Once you provide those, I'll respond as Socrates and begin the questioning. Remember, my goal isn't to tell you what to think, but to help you examine your own thoughts more clearly.

I am eager to hear your response and start our inquiry. Let us begin!


User | 2025/04/09 09:16:54

I'm interested in social responsibility; what are the responsibilities that one has to our shared experience and prosperity.

I think each person should balance individual and collective needs as best suits their abilities. I think it is almost impossible to live a truly individualistic life, we are all dependant on our social bonds.


Gemma3 (12B) | 2025/04/09 09:17:07

Very well. Social responsibility – a most worthy subject for inquiry. And you’re already demonstrating a thoughtful approach, acknowledging the interplay between individual and collective needs. Let's examine this further.

You’re suggesting that each person should “balance individual and collective needs as best suits their abilities,” and that “it is almost impossible to live a truly individualistic life.” These are intriguing claims. Let’s begin with the first.

Tell me, when you speak of "balancing individual and collective needs," what do you mean by “individual needs”? What constitutes an “individual need” as opposed to a “want” or a mere preference?

Don't feel pressured to give a definitive answer. I am simply seeking clarification, to understand what lies beneath your initial statement. Let’s begin there.


User | 2025/04/09 09:35:17

A simplistic view would be to look at say, Maslow's hierarchy of needs, and pick the physical and safety needs as the absolute essentials to stay alive.

But looking at this simplistic view, it is not satisfying. Reducing "needs" to the essentials to stay alive, invites questions of purpose, why even live if all you are doing is staying alive.

Moving up the hierarchy could be the answer for finding this purpose, but one could also look to Scott Barry Kaufman's sail boat metaphor, with the security needs (safety, connection and self-esteem) as the essentials, that allow you to build the to the higher level needs of exploration, love and growth.

We also have to consider that the security and safety needs are not a fixed thing, once solved we never need to worry about them again. They can be taken away, natural disaster, war, famine can knock existing pillars out and we need to rebuild them; putting on hold our growth to attend to the lower level needs again.

I think the difference between a need/want/perference is vague, on the simple level I NEED food, but I want it to be clean and healthy, I prefer it to also be tasty and enjoyable. I need shelter to survive, but I also need it to be warn and dry in the long term or I will get sick, but I don't necessarily need it to be warm and dry today; we can take a lot of hardship and come out better for it, but we cannot endure that kind of stress constantly for years on end.


Gemma3 (12B) | 2025/04/09 09:35:34

You’ve moved far beyond the simplistic view we initially considered! Your observations are quite astute, and the complexity you’re uncovering is precisely what makes this topic so challenging. You’re wrestling with the very nature of “need” itself.

You’re correct to point out that the concept of need isn't static. It fluctuates, it’d be taken away, and we are all impacted by those shifts.

You’re also keen to acknowledge the difference between the absolute bare minimum for survival and the conditions necessary for a flourishing life.

Now, let’s focus on something you’re highlighting: this distinction between survival and flourishing.

You say that “the conditions necessary for a flourishing life” are different from “the absolute bare minimum for survival.” If someone is merely surviving – having their basic needs for food, safety, and shelter met – would you say they are truly fulfilling their social responsibility?

Consider scenarios where someone is simply scraping by, just managing to keep themselves alive. Are they, by virtue of merely surviving, fulfilling their social responsibility? Or is there something more required?


User | 2025/04/09 09:59:35

If someone is merely surviving, are they fulfilling their social responsibility?

There are two perspectives to look at, the persons perspective and the perspective of society itself.

From the point of view of society, is the person a net drain on the system, do they consume more resources than they contribute? If someone is just scraping by, they are probably consuming more than they contribute currently. But this may not always be the case. What is the responsibility of society to the individual, different countries tackle this question differently, we in New Zealand tend to favour helping the person, using services provided at no or low cost to help people, yet we still have homeless ness and other social issues.

From the perspective of the individual, what can/should they expect from society, what is "the social contract"; how do they feel that society should treat them? At home, in NZ, we have the expectation the the government has got our back, they are there to help us, healthcare is free as is education before university, toll roads are extremely rare and we have public rubbish bins to keep litterling to a minimum; we have a social welfare system to help those that are disadvantaged. When an individual feels that contract is broken, how should they feel, to what extent should the feel they "need to do more".


Gemma3 (12B) | 2025/04/09 09:59:49

You've raised a crucial point by differentiating between the individual's perspective and society’s perspective. The social contract is indeed at the heart of this discussion. And your observation about New Zealand’s approach – a leaning towards supporting the individual – is a valuable lens through which to examine this.

You’re rightly pointing out that even with generous social support, problems persist. Let's focus on that tension.

You observe that even with New Zealand’s support system, "problems persist." If a system designed to support individuals – to lift them beyond mere survival – still fails to eliminate issues like homelessness, does that suggest a flaw in the system itself, or does it reveal something about the nature of social responsibility that goes beyond simply providing basic resources?

I’m not asking you to condemn the system. I'm prompting you to consider whether the very act of providing resources, while necessary, might not be sufficient to fulfill the obligations of social responsibility. What might be missing?


Generated from Alpaca

I'm still pondering my next response, but this is great, I have a bunch of other ones on virtue and reasoning, but you get the gist. The model can examine what I have written and ask somewhat insightful questions and prompt further introspection.

24
 
 

General consensus seems to be that llama4 was a flop. A head of meta AI research division was let go.

Do you think it was a bad fp32 conversion, or just unerwhelming models all around?

2t parameters was a big increase without much gain. If throwing compute and parameters isnt working to stay competitive anymore, how do you think the next big performance gains will be made? Better CoT reasoning patterns? Omnimodal? something entirely new?

25
 
 

We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architecture.

view more: next ›