What model to grade practice test? (sh.itjust.works)

submitted 1 month ago by [email protected] to c/[email protected]

13 comments

I took a practice test (math) and would like to have it graded by an LLM since I can't find the key online. I have 20GB VRAM, but I'm on Intel Arc so I can't do gemma3. I would prefer models from ollama.com 'cause I'm not deep enough down the rabbit hole to try Hugging Face stuff yet and don't have time to right now.


[-] [email protected] 1 points 1 month ago

What is your GPU? To be blunt, there is no Arc card with 20GB of VRAM, so that may actually be your IGP.

[-] [email protected] 1 points 1 month ago

B580 + A750. They do work together.

[-] [email protected] 1 points 1 month ago* (last edited 1 month ago)

Oh yeah, presumably through SYCL or Vulkan splitting.

I'd try Qwen3 30B, maybe a custom quantization if it doesn't quite fit in your VRAM pool (as it should be very close). It should be very fast and quite smart.

Qwen3 32B would fit too (a fully dense model), but you would definitely need to tweak the settings to keep it from being really slow.
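As a rough way to check whether a given quantization fits in a VRAM pool, here is a minimal back-of-the-envelope sketch. The parameter count (30.5 billion), the bits-per-weight values, and the 2 GiB overhead for KV cache and buffers are all my assumptions for illustration, not measured numbers; real usage depends on the backend and context length.

```python
def estimate_weights_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(n_params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the VRAM pool.

    overhead_gib is a hypothetical allowance for KV cache and buffers.
    """
    needed = estimate_weights_gib(n_params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Assumed values: ~30.5B parameters, 20 GiB pool, a few illustrative quant sizes.
    for bpw in (3.5, 4.0, 4.5, 5.0, 6.0):
        size = estimate_weights_gib(30.5, bpw)
        verdict = "fits" if fits(30.5, bpw, vram_gib=20.0) else "too big"
        print(f"{bpw:.1f} bits/weight: ~{size:.1f} GiB weights, {verdict}")
```

When the estimate lands close to the limit, that is where a slightly smaller custom quantization or a shorter context would be the thing to try.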

[-] [email protected] 1 points 1 month ago

Qwen3 also doesn't work because I'm using the IPEX-LLM Docker container, which has ollama 5.8 or something. It doesn't matter now because I have taken the test I was practicing for since posting this. Playing with Qwen3 on CPU, it seems good, but the reasoning feels like most open reasoning models, where it gets the right answer and then goes "wait, that's not right..."

[-] [email protected] 2 points 1 month ago* (last edited 1 month ago)

Yeah it does that, heh.

The Qwen team recommends a fairly high temperature, but I find it's better with modified sampling (lower temperature, 0.1 MinP, a bit of rep penalty or DRY). Then it tends not to "second guess" itself by taking the lower-probability choice of continuing to reason.
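To show what the MinP part does, here is a minimal sketch of temperature scaling followed by min-p filtering on a toy set of logits. The function names, the default temperature of 0.3, and the order of operations (temperature first, then min-p) are my assumptions for illustration; real backends may apply samplers in a different order, and this leaves out rep penalty and DRY entirely.

```python
import math
import random


def min_p_filter(logits, temperature=0.3, min_p=0.1):
    """Return {token_index: probability} after temperature scaling and min-p.

    Min-p keeps only tokens whose probability is at least min_p times the
    probability of the most likely token, then renormalizes the survivors.
    """
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


def min_p_sample(logits, temperature=0.3, min_p=0.1, rng=None):
    """Sample one token index from the min-p filtered distribution."""
    rng = rng or random.Random()
    kept = min_p_filter(logits, temperature, min_p)
    indices = list(kept.keys())
    weights = list(kept.values())
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [5.0, 4.0, 0.0]  # made-up scores for three candidate tokens
    print(min_p_filter(toy_logits, temperature=1.0, min_p=0.1))
    print(min_p_sample(toy_logits, temperature=1.0, min_p=0.1))
```

The low-probability tail gets cut off relative to the top token, so an unlikely continuation such as restarting the reasoning is less likely to be picked at all.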

If you're looking for alternatives, KoboldCpp does support Vulkan. It may not be as fast as the (SYCL?) Docker container, but it supports new models and more features. It's also precompiled as a one-click exe: https://github.com/LostRuins/koboldcpp

this post was submitted on 11 May 2025

5 points (69.2% liked)

LocalLLaMA
