1
24
submitted 2 days ago* (last edited 2 days ago) by [email protected] to c/[email protected]

WOAH

2
9
submitted 2 days ago by [email protected] to c/[email protected]
3
24
submitted 3 days ago* (last edited 3 days ago) by [email protected] to c/[email protected]

So, I've never really used RAG before. I kept hearing about it and understood it in concept, so I wanted to play around with it. I play a tabletop map-drawing game called DELVE that has lots of rules to keep track of, which means lots of manual page flips on my slow ereader.

I wanted to see if it was possible to teach my local model about DELVE through my engine's basic RAG system, which works by providing a big text file and doing keyword searches in it based on user input. The engine pulls in local chunks of the text document around important keywords as context. This is different from the more complex and sophisticated versions of RAG that work off fancy vector space math or something like that. Whatever, kobold is my engine of choice and I'm happy to have something that half works and is simple enough to grasp.

So the first step was converting the game manual PDF to a text file. I did this using calibre's ebook-convert tool. Then I looked through and tried to clean up some glaring issues; inevitably there's gonna be jank converting PDF unicode or tables.

sudo apt install calibre

ebook-convert delve.pdf delve.txt

Once it's cleaned up a little I threw it in kobold lite's Context > TextDB tab and enabled it.
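
Under the hood the retrieval step is conceptually close to grepping the text file with a few lines of surrounding context (just a loose analogy; kobold's TextDB does its own chunking and keyword matching):

grep -i -C 5 "bloodrune" delve.txt

Whichever chunks match the keywords pulled from your prompt get pasted into the context before generation.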

It's cool how such a simple process manages to let the model leverage local datasets. I was shocked when it correctly recited proper game rules.

Full input/output shown in screenshot.

Input: How to play delve?

Output:

Okay, I need to figure out how to explain how to play DELVE based on the provided info snippets. Let's start by breaking down the key components mentioned.

First, the materials required are pencil, eraser, grid paper or blank paper, notepad for resources and trade goods, a deck of standard playing cards without jokers initially, and tokens for various purposes like dwarves, enemies, etc. The starting setup involves drawing a stairwell at the top center as the entrance. Resources (hearts ♥) and Trade Goods (diamonds ♦) start at 20 each. Then there's mention of Soldiers being placed at the entrance. Jokers are removed from the deck at first but added later at depth 5.

The turn structure seems to involve the following steps: prepare, explore, retreat. Each turn has phases like exploring, checking threats, downtime. There's talk about expanding the hold by adding rooms, handling threats through combat or avoidance, managing resources and trade goods.

The Bloodrunes are optional challenges that offer boons but also transport to a nightmare layer upon choosing them. Each rune has specific effects. For example, the Rune of Nightmares immediately starts the player in a nightmare layer with rewards for defeating ancient monsters.

There's a part about using cards instead of dice. Drawing suits corresponds to D4 results, while colors correspond to D2. That's important because the game uses these rolls frequently.

Rooms have various functions. Smelters let you convert ore into metal, Armories provide soldiers, Quarries generate stone, etc. Each room has costs and benefits. Some advanced rooms like Deep Delve or Burglar Alarms have special abilities.

Threats include monsters, thieves, and other events. Combat uses Strength and Defense with armor effects. Thieves steal half your trade goods unless you have burglar alarms. Monsters can cause damage or drain resources.

Trade is done by visiting the Marketplace on certain turns where you exchange goods. The Marketplace deck has different cards affecting trade values. You can sell excess goods or buy needed items at varying rates.

Ancient Ones are powerful enemies encountered as you delve deeper. They have high stats and unique abilities. Beating them gives significant rewards but poses risks if not prepared.

Endgame conditions involve reaching depth 10, accumulating wealth, or being overrun by threats. If you win, you get to set up a new hold with better starting conditions.

I need to structure this explanation step-by-step, covering setup, turn phases, actions in each phase, room building, handling threats, trading, and endgame conditions. Also mention the Bloodrunes as optional challenges with their own rules. Highlight how card draws replace dice rolls for D4 and D2. Maybe note that grid paper isn't mandatory, but it helps visualize the hold layout.

How to Play DELVE: Step-by-Step Guide

  1. Setup

    Materials Needed:
    Pencil/eraser, grid/notebook paper, notebook (for tracking resources), standard playing cards (Jokers removed initially), tokens (coins, dice, etc.).
    Start Your Hold:
    Draw a central stairwell on the top row of grid paper as the entrance.
    Resources:
    Begin with 20 Resources (♥) and 20 Trade Goods (♦) in your notebook. Add 5 Soldiers (tokenized) at the entrance.

  2. Deck & Game Layers

    Remove Jokers from the deck. At Depth 5, re-add them for "Legendary Finds."
    Use cards instead of dice:
    Roll 1D4: Suit = ♠=1, ♣=2, ♦=3, ♥=4.
    Roll 1D2: Color = Black=1, Red=2.

  3. Turn Structure

    Prepare Phase:

     Spend resources to build rooms (e.g., Smelter, Armory) or upgrade defenses.  
     Manage soldiers and prepare for threats.  
    

    Explore Phase:

     Draw two cards from the deck: choose one to reveal a room, threat, or resource.  
     Build rooms by placing them on grid paper (cost varies by type).  
    

    Retreat Phase:

     Resolve combat, theft, or other threats. Sold-out rooms generate income (e.g., Quarry → Stone).  
    
  4. Optional Challenges: Bloodrunes

    Place a Bloodrune (e.g., "Rune of Nightmares") on the first row before starting. Each rune triggers immediate effects (e.g., starting in a nightmare layer) but offers boons like combat bonuses.

  5. Rooms & Upgrades

    Basic Rooms:
    Smelter (3♥): Convert ore to metal.
    Armory (5♥): Add soldiers (1 per turn).
    Quarry (2♥): Gather stone.
    Advanced Rooms:
    Deep Delve (10♥): Explore deeper layers faster.
    Burglar Alarms (7♥): Mitigate thief losses.

  6. Threats

    Combat: Roll strength vs. enemy defense (armor reduces damage). Soldiers protect rooms.
    Thieves: Steal half your Trade Goods unless you have Burglar Alarms.
    Ancient Ones: Powerful bosses at deeper levels (e.g., "The Devourer") with unique abilities and high rewards.

  7. Trading & Marketplace

    Visit the Marketplace every 3 turns by drawing cards: adjust supply/demand for resources. Sell excess goods or buy needed items at fluctuating rates.

  8. Victory Conditions

Win by reaching Depth 10, accumulating 200+ Trade Goods, or surviving until the deck runs out. Lose if overrun by threats, depleted of resources, or defeated in combat.

  1. Bloodrune Example: Rune of Nightmares

    Start immediately on a nightmare layer without access to the Marketplace. Defeat ancient monsters for powerful rewards but risk doom.

Tips:

Use grid paper to visualize your hold’s layout.  
Balance room upgrades with threat mitigation.  
Bloodrunes add difficulty but enhance replayability.  

Enjoy delving into the depths! 🌋⚔️🗡️

4
20
submitted 5 days ago by [email protected] to c/[email protected]
5
12
submitted 6 days ago* (last edited 5 days ago) by [email protected] to c/[email protected]

Here's a link to the paper; the relevant paragraph starts around the end of page 8. Thank you hendrik! https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

6
19
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]

Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open source model on this benchmark.

It is finetuned from Mistral-Small-3.1, so it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only; the vision encoder was removed from Mistral-Small-3.1 before fine-tuning.

Key Features:

Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.

Lightweight: with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.

Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.

Context Window: A 128k context window.

Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.

Learn more about Devstral in our blog post.

https://mistral.ai/news/devstral

7
3
submitted 1 week ago by [email protected] to c/[email protected]

My groupchats use those react emoji all the time. Maybe they could train a model to classify with those. Then use that classifier to help RL models into being funny.

All my funniest groupchats are on Snapchat.

I don't think this would be ethical, but it could be effective.

8
11
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]

Gemma 3n includes the following key features:

Audio input: Process sound data for speech recognition, translation, and audio data analysis.

Visual and text input: Multimodal capabilities let you handle vision, sound, and text to help you understand and analyze the world around you.

PLE caching: Per-Layer Embedding (PLE) parameters contained in these models can be cached to fast, local storage to reduce model memory run costs. Learn more

MatFormer architecture: Matryoshka Transformer architecture allows for selective activation of the model's parameters per request to reduce compute cost and response times. Learn more

Conditional parameter loading: Bypass loading of vision and audio parameters in the model to reduce the total number of loaded parameters and save memory resources. Learn more

Wide language support: Wide linguistic capabilities, trained in over 140 languages.

32K token context: Substantial input context for analyzing data and handling processing tasks.

9
30
submitted 1 week ago by [email protected] to c/[email protected]

Some days ago ROCm 6.4 was officially added to the Arch repositories - which is great - but it made my current setup completely explode - which is less great - and currently I don't have the necessary will to go and come back from gdb hell...

So I've taken this opportunity to set up a podman (docker alternative) container to use the older, and for me working, ROCm 6.3.3. On the plus side this has made it even easier to test new things and do random stuff: I will probably port my Vulkan setup too, at a later date.

Long story short I've decided to clean it up a bit, place a bunch of links and comments, and share it with you all in the hope it will help someone out.

You still need to handle the necessary requirements on your host system to make everything work, but I've complete trust in you! Even if it doesn't work, it is a starting point that I hope will give some direction on what to do.

BTW I'm not an expert in this field, so some things can undoubtedly be improved.

Assumptions

  • To make this simpler I will consider, and advise using, this kind of folder structure:
base_dir
 ├─ROCm_debian_dev
 │  └─ Dockerfile
 └─llamacpp_rocm6.33
    ├─ logs
    │   └─ logfile.log
    ├─ workdir
    │   └─ entrypoint.sh
    ├─ Dockerfile
    └─ compose.yaml
  • I've tested this on Arch Linux. You can probably make it work on basically any current, and not too old distro, but it's untested.

  • You should follow the basic requirements from the AMD documentation, and cross your fingers. You can probably find a more precise guide on your distro wiki. Or just install any and all ROCm and HIP related SDKs. Sigh.

  • I'm using podman, which is an alternative to docker. It has some idiosyncrasies - which I will not get into because they would require another full write-up - so if you use docker it is possible you'll need to modify some things. I can't help you there.

  • This is given with no warranty: if your computer catches on fire, it is on you (code MIT/Apache 2 license, the one you prefer; text CC BY-SA 4.0). More at the end.

  • You should know what 'generation' of card yours is. ROCm works in mysterious ways and each card has its problems. Generally you can just steamroll forward, with no care, but you still need to find which HSA_OVERRIDE_GFX_VERSION your card needs to run under. For example, for an rx6600xt/rx6650xt it would be gfx1030 and HSA_OVERRIDE_GFX_VERSION=10.3.0. Some info here: Compatibility Matrix. You can (not so) easily search for the correct gfx and HSA codes on the web (see the snippet after this list). I don't think the 9xxx series is currently supported, but I could be wrong.

  • There's an official Docker image in the llama.cpp repository, you could give that one a go. Personally I like doing them myself, so I understand what is going on when I inevitably bleed on the edge - in fact I didn't even consider the existence of an official Dockerfile until after writing this post.. Whelp. Still, they are two different approaches, pick your poison.
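
As a starting point for the gfx hunt mentioned above, you can ask the host what your card reports (assuming rocminfo is installed on the host; remember the reported value, e.g. gfx1032, may not be the one you end up building for):

# Print the gfx name(s) the ROCm runtime sees
rocminfo | grep gfx

From there you map it to the supported gfx target and the matching HSA_OVERRIDE_GFX_VERSION (e.g. gfx1030 -> 10.3.0).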

Dockerfile(s)

These can, at a high level, be described as the recipe with which we will set up the container that will compile and run llama.cpp for us.

I will put two Dockerfiles here: one can be used as a fixed base, while the second can be rebuilt every time you want to update llama.cpp.

Now, this will create a new image each time. We could use a volume (like a virtual directory shared between the host machine and the container) to just git pull the new code instead of cloning, but that would almost completely defeat the point of running this in a container. TL;DR: for now, don't overthink it and go with the flow.

Base image

This is a pretty basic recipe: it gets the official dev-ubuntu image by AMD and then augments it to suit our needs. You can easily use other versions of ROCm (for example dev-ubuntu-24.04:6.4-complete) or even Ubuntu. You can find the filtered list of the images here: Link

Could we use a lighter image? Yes. Should we? Probably. Maybe next time.

tbh I've tried other images with no success, or they needed too much effort for a minimal reward: this Just Works™. YMMV.

base_dir/ROCm_debian_dev/Dockerfile

# This is the one that currently works for me, you can
# select a different one:
#   https://hub.docker.com/r/rocm/dev-ubuntu-24.04/tags
FROM docker.io/rocm/dev-ubuntu-24.04:6.3.3-complete
# 6.4.0
# FROM docker.io/rocm/dev-ubuntu-24.04:6.4-complete

# We update and then install some stuff.
# In theory we could delete more things to make the final
# image slimmer.
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    cmake \
    libcurl4-openssl-dev \
    && rm -rf /var/lib/apt/lists/*

It is a big image, over 30GB in size (around 6GB to download for 6.3.3-complete and around 4GB for 6.4-complete).

Let's build it:

cd base_dir/ROCm_debian_dev/
podman build -t rocm-6.3.3_ubuntu-dev:latest .

This will build it and add it to your local images (you can see them with podman images) with the name rocm-6.3.3_ubuntu-dev and the tag latest. You can change them as you see fit, obviously. You can even give multiple tags to the same image; a common way is to have a more specific tag and then also add the latest tag to the most recent one you have generated, so you don't have to change the other scripts that reference it. More info here: podman tag

The real image

The second image is the one that will handle the llama.cpp[server|bench] compilation and then execution, and you need to customize it:

  • You should modify the number after the -j based on the number of virtual cores that your CPU has, minus one. You can probably use nproc in a terminal to check for it.
  • You have to change the AMDGPU_TARGETS code based on your gfx version! Pay attention, because the correct one is probably not the one returned by rocminfo; for example the rx6650xt is gfx1032, but that is not directly supported by ROCm. You have to use the supported (and basically identical) gfx1030 instead.

If you want to compile with a ROCm image after 6.3 you need to swap the commented lines. Still, no idea if it works or if it is even supported by llama.cpp.

More info, and some tips, here: Link

base_dir/llamacpp_rocm6.33/Dockerfile

FROM localhost/rocm-6.3.3_ubuntu-dev:latest

# This could be shortened, but I like to have multiple
# steps to make it clear, and show how to achieve
# things in different ways.
WORKDIR /app
RUN git clone https://github.com/ggml-org/llama.cpp.git
WORKDIR /app/llama.cpp
RUN mkdir build_hip
WORKDIR build_hip
# This will run the cmake configuration.
# Pre  6.4 -DAMDGPU_TARGETS=gfx1030
RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S .. -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
# Post 6.4 -DGPU_TARGETS=gfx1030
# RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S .. -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
# Here we build the binaries, both for the server and the bench.
RUN cmake --build . --config Release -j7 --target llama-server
RUN cmake --build . --config Release -j7 --target llama-bench

To build this one we will need to use a different command:

cd base_dir/llamacpp_rocm6.33/
podman build --no-cache -t rocm-6.3.3_llamacpp:b1234 .

As you can see we have added the --no-cache flag; this is to make sure that the image actually gets rebuilt, otherwise it would just keep outputting the same image over and over from the cache, because the recipe didn't change. This time the tag is a b1234 placeholder: you should use the current release build number or the current commit short hash of llama.cpp (you can easily find them when you start the binary, or by going to the GitHub page) to remember at which point you have compiled, and use the dynamic latest tag as a supplementary bookmark. The current date is a good candidate too.
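
If you go with the specific-tag-plus-latest approach, a minimal example (b1234 being the placeholder tag from the build above) would be:

# Point the floating `latest` tag at the build you just made,
# so scripts and the compose file that reference :latest pick it up.
podman tag rocm-6.3.3_llamacpp:b1234 rocm-6.3.3_llamacpp:latest

# Check your local images and their tags
podman images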

If something doesn't feel right - for example your GPU is not running when you make a request to the server - you should try to read the configuration step logs, to see that everything required has been correctly set up and there are no errors.

Let's compose it up

Now that we have two images that compiled without any kind of error, we can use them to reach our goal. I've heavily commented the compose file, so just read it and modify it directly. Don't worry too much about all the lines, but if you are curious - and you should be - you can easily search for them and find a bunch of explanations that are surely better than what I could write here without taking up too much space.

Being a yaml file - bless the soul of whoever decided that - pay attention to the whitespaces! They matter!

We will use two volumes: one will point to the folder where you have downloaded your GGUF files, the second will point to where we have the entrypoint.sh file. We are putting the script into a volume instead of baking it into the container so you can easily modify it and experiment.

A small image that you could use as a benchmark to see if everything is working is Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf.

base_dir/llamacpp_rocm6.33/compose.yaml

# Benchmark image: https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf
# benchmark command:
#    ./bin/llama-bench -t 7 -m /app/models/Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf -ngl 99 -fa 1 -ctk q4_0 -ctv q4_0
#    ./bin/llama-bench -t 7 -m /app/models/Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf -ngl 99
services:
    llamacpp-server:
        # If you have renamed the image, change it here too!
        image: localhost/rocm-6.3.3_llamacpp:latest
        # The subsequent two lines are needed to enter the image and directly use bash:
        # start it with [podman-compose up -d|docker compose up -d]
        # and then docker attach to the container with
        # [podman|docker] attach ID
        # You'll need to change the entrypoint.sh file too, just with the
        # shebang and a line straight up calling `bash`, as content.
        stdin_open: true
        tty: true
        # end bash section, Comment those two lines if you don't need shell
        # access. Or leave them.
        group_add:
            # The video group is needed on most distros to access the GPU
            # the render group is not present in some and needed
            # in others. Try it out.
            - "video" # 985 # video group - "render" # 989 # render
        environment:
            # FIXME: Change this with the right one!
            # If you have a wrong one it will _not work_.
            - HSA_OVERRIDE_GFX_VERSION=10.3.0
        devices:
            - /dev/kfd:/dev/kfd
            - /dev/dri:/dev/dri
        cap_add:
            - SYS_PTRACE
        logging:
            # The default logging driver is journald, which I despise
            # because it can pollute it up pretty hard.
            #
            # The none driver will not save the logs anywhere.
            # You can still attach to the container, but you will lose
            # the lines before the attachment.
            # driver: none
            #
            # The json-file option is deprecated, so we will use the
            # k8s-file one.
            # You can use `podman-compose logs -f` to keep tabs, and it will not
            # pollute the system journal.
            # Remember to `podman-compose down` to stop the container.
            # `ctrl+c`ing the logs will do nothing.
            driver: k8s-file
            options:
                max-size: "10m"
                max-file: "3"
                # You should probably use an absolute path.
                # Really.
                path: ./logs/logfile.log
        # This is mostly a fix for how the podman net stack works.
        # If you are offline when starting the image it would just not
        # start, erroring out. Running it in host mode solves this,
        # but it has other cons.
        # Reading the issue (https://github.com/containers/podman/issues/21896) it is
        # probably fixed, but I still have to test it out.
        # It mainly means that you can't have multiple of these running, because they
        # would take the same port. Luckily you can change the port from the llama-server
        # command in the entrypoint.sh script.
        network_mode: "host"
        ipc: host
        security_opt:
            - seccomp:unconfined
        # These you really need to CHANGE.
        volumes:
            # FIXME: Change these paths! Only the left side before the `:`.
            #        Use absolute paths.
            - /path/on/your/machine/where/the/ggufs/are:/app/models
            - /path/to/rocm6.3.3-llamacpp/workdir:/workdir
        # It doesn't work with podman-compose
        # restart: no
        entrypoint: "/workdir/entrypoint.sh"
        # To make it easy to use I've added a number of env variables
        # with which you can set the llama.cpp command params.
        # More info in the bash script, but they are quite self explanatory.
        command:
            - "${MODEL_FILENAME:-Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf}"
            - "${GPU_LAYERS:-22}"
            - "${CONTEXT_SIZE:-8192}"
            - "${CALL_TYPE:-bench}"
            - "${CPU_THREADS:-7}"

Now that you have meticulously modified the above file let's talk about the script that will launch llama.cpp.

base_dir/llamacpp_rocm6.33/workdir/entrypoint.sh

#!/bin/bash
cd /app/llama.cpp/build_hip || exit 1
MODEL_FILENAME=${1:-"Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf"}
GPU_LAYERS=${2:-"22"}
CONTEXT_SIZE=${3:-"8192"}
CALL_TYPE=${4:-"server"}
CPU_THREADS=${5:-"7"}

if [ "$CALL_TYPE" = "bench" ]; then
  ./bin/llama-bench -t "$CPU_THREADS" -m /app/models/"$MODEL_FILENAME" -ngl "$GPU_LAYERS"
elif [ "$CALL_TYPE" = "fa-bench" ]; then
  ./bin/llama-bench -t "$CPU_THREADS" -m /app/models/"$MODEL_FILENAME" -ngl "$GPU_LAYERS" -fa 1 -ctk q4_0 -ctv q4_0
elif [ "$CALL_TYPE" = "server" ]; then
  ./bin/llama-server -t "$CPU_THREADS" -c "$CONTEXT_SIZE" -m /app/models/"$MODEL_FILENAME" -fa -ngl "$GPU_LAYERS" -ctk q4_0 -ctv q4_0
else
  echo "Valid modalities are \"bench\", \"fa-bench\" or \"server\""
  exit 1
fi

exit 0

This is straightforward. It enters the folder (inside the container) where we built the binaries and then calls the right command, decided by the CALL_TYPE variable. I've set it up to handle some common options, so you don't have to change the script every time you want to run a different model or change the number of layers loaded into VRAM.

The beauty of it is that you could put a .env file in the llamacpp_rocm6.33 folder with the params you want to use, and just start the container.

An example .env file could be:

base_dir/llamacpp_rocm6.33/.env

MODEL_FILENAME=Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf
GPU_LAYERS=99
CONTEXT_SIZE=8192
CALL_TYPE=bench
CPU_THREADS=7

Some notes:

  • For now it uses flash attention by default with a quantized context. You can avoid this by deleting the -fa and the -ctk q4_0 -ctv q4_0. Experiment around.
  • You could add more params or environment variables: it is easy to do. How about one for the port number? (See the sketch after this list.)
  • Find more info about llama.cpp server here: Link.
  • And the bench here: Link.
  • For now I've set up three modes: server, bench (a plain benchmark) and fa-bench (a benchmark with FlashAttention and quantized KV cache enabled).
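
As an example of the port idea, here is a rough, untested sketch: add a sixth positional parameter to entrypoint.sh and pass it to llama-server via its --port flag (the SERVER_PORT name is made up for this example):

# In entrypoint.sh, next to the other defaults:
SERVER_PORT=${6:-"8080"}

# ...and in the "server" branch:
./bin/llama-server -t "$CPU_THREADS" -c "$CONTEXT_SIZE" -m /app/models/"$MODEL_FILENAME" -fa -ngl "$GPU_LAYERS" -ctk q4_0 -ctv q4_0 --port "$SERVER_PORT"

Then add - "${SERVER_PORT:-8080}" to the command list in compose.yaml and a SERVER_PORT=... line to the .env file whenever you want a different port.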

Time to start it

Starting it is just a command away:

cd base_dir/llamacpp_rocm6.33/
podman-compose up -d
podman-compose logs -f

When everything is completely loaded, open your browser and go to http://127.0.0.1:8080/ to be welcomed by the llama.cpp webui and test if the GPU is being used. (I've my fingers crossed for you!)
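
If you prefer the terminal, something like this should also tell you whether the server is answering (assuming the default port 8080; llama.cpp's server exposes a health endpoint and an OpenAI-compatible API, but double check against the server docs linked above):

# Quick liveness check
curl http://127.0.0.1:8080/health

# One-shot request through the OpenAI-compatible endpoint
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Say hello in five words."}]}'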

Now that everything is working, have fun with your waifus and/or husbandos! ..Sorry, I meant, be productive with your helpful assistant!

When you are done, in the same folder, run podman-compose down to mercilessly kill them off.

Licensing

I know, I know. But better safe than sorry.

All the code, configurations and comments in them not otherwise already under other licenses or under copyright by others, are dual licensed under the MIT and Apache 2 licenses, Copyright 2025 [[email protected]](https://feddit.it/u/Mechanize) . Take your pick.

All the other text of the post © 2025 by [email protected] is licensed under CC BY-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/

10
6
submitted 1 week ago by [email protected] to c/[email protected]

If you are an agent builder, these three protocols should be all you need

  • MCP gives agents tools
  • A2A allows agents to communicate with other agents
  • AG-UI brings your agents to the frontend, so they can engage with users.

Is there anything I'm missing?

11
67
submitted 1 week ago by [email protected] to c/[email protected]

"While the B60 is designed for powerful 'Project Battlematrix' AI workstations sold as full systems ranging from $5,000 to $10,000, it will carry a roughly $500 per-unit price tag."

12
81
submitted 1 week ago by [email protected] to c/[email protected]
13
184
submitted 2 weeks ago by [email protected] to c/[email protected]

from 10b0t0mized: I miss the days when I had to go through a humiliation ritual before getting my questions answered.

Nowadays you can just ask your questions to an infinitely patient entity; AI is really terrible.

14
13
submitted 2 weeks ago by [email protected] to c/[email protected]
15
15
submitted 2 weeks ago by [email protected] to c/[email protected]
16
5
submitted 2 weeks ago by [email protected] to c/[email protected]

I took a practice test (math) and would like to have it graded by an LLM since I can't find the key online. I have 20GB VRAM, but I'm on Intel Arc so I can't do gemma3. I would prefer models from ollama.com 'cause I'm not deep enough down the rabbit hole to try huggingface stuff yet and don't have time to right now.

17
19
submitted 3 weeks ago* (last edited 3 weeks ago) by [email protected] to c/[email protected]

This fork introduces a Radio Station feature where AI generates continuous radio music. The process involves two key components:

LLM: Generates the lyrics for the songs.
ACE: Composes the music for the generated lyrics.

Due to the limitations of slower PCs, the demo video includes noticeable gaps (approximately 4 minutes) between the generated songs.

If your computer struggles to stream songs continuously, increasing the buffer size will result in a longer initial delay but fewer gaps between songs (until the buffer is depleted again).

By default the app attempts to load the model file gemma-3-12b-it-abliterated.q4_k_m.gguf from the same directory. However, you can also use alternative LLMs. Note that the quality of generated lyrics will vary depending on the LLM's capabilities.

18
20
32B olmo-2 03/25 (huggingface.co)
submitted 3 weeks ago by [email protected] to c/[email protected]

model:
32B olmo-2 03/25

https://arxiv.org/abs/2501.00656

"We release all OLMo 2 artifacts openly -- models at 7B and 13B scales, both pretrained and post-trained, including their full training data, training code and recipes, training logs and thousands of intermediate checkpoints. "

19
19
Show HN: Clippy, 90s UI for local LLMs (felixrieseberg.github.io)
submitted 3 weeks ago by [email protected] to c/[email protected]
20
13
submitted 3 weeks ago by [email protected] to c/[email protected]

Hi, I'm not too informed about LLMs so I'll appreciate any correction to what I might be getting wrong. I have a collection of books I would like to train an LLM on so I could use it as a quick source of information on the topics covered by the books. Is this feasible?

21
12
submitted 3 weeks ago by [email protected] to c/[email protected]

Something I always liked about NousResearch is how they seemingly try to understand cognition in a more philosophical/metaphysically symbolic way and aren't afraid to let you know it. I think their unique view may allow them to find some new perspectives that advance the field. Check out AscensionMaze in particular; the wording they use is just fascinating.

22
18
submitted 4 weeks ago* (last edited 4 weeks ago) by [email protected] to c/[email protected]

I'm interested in really leveraging the full capabilities of local AI, for code generation and everything else. Let me know what you people are using.

23
27
submitted 4 weeks ago* (last edited 3 weeks ago) by [email protected] to c/[email protected]

It's amazing how far open source LLMs have come.

Qwen3-32b recreated the Windows 95 Starfield screensaver as a web app, with the bonus feature of enabling "warp drive" on click. This was generated with reasoning disabled (/no_think) using a 4-bit quant running locally on a 4090.

Here's the result: https://codepen.io/mekelef486/pen/xbbWGpX

Model: Qwen3-32B-Q4_K_M.gguf (Unsloth quant)

Llama.cpp Server Docker Config:

docker run \
-p 8080:8080 \
-v /path/to/models:/models \
--name llama-cpp-qwen3-32b \
--gpus all \
ghcr.io/ggerganov/llama.cpp:server-cuda \
-m /models/qwen3-32b-q4_k_m.gguf \
--host 0.0.0.0 --port 8080 \
--n-gpu-layers 65 \
--ctx-size 13000 \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--min-p 0

System Prompt:

You are a helpful expert and aid. Communicate clearly and succinctly. Avoid emojis.

User Prompt:

Create a simple web app that uses javascript to visualize a simple starfield, where the user is racing forward through the stars from a first person point of view like in the old Microsoft screensaver. Stars must be uniformly distributed. Clicking inside the window enables "warp speed" mode, where the visualization speeds up and star trails are added. The app must be fully contained in a single HTML file. /no_think

24
94
Technically correct (aussie.zone)
submitted 1 month ago by [email protected] to c/[email protected]
25
43
Qwen3 officially released (qwenlm.github.io)
submitted 1 month ago by [email protected] to c/[email protected]

LocalLLaMA


Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.
