projectmoon

joined 1 year ago
[–] [email protected] 23 points 1 week ago (1 children)

https://agnos.is/posts/tech-recruitment-is-out-of-control.html

This was my experience at the beginning of 2024. It was bad enough that I had to write a blog post about it.

[–] [email protected] 2 points 1 week ago

Have you tried Matrix?

[–] [email protected] 5 points 1 week ago (1 children)

LLMs are statistical word association machines. Or tokens more accurately. So if you tell it to not make mistakes, it'll likely weight the output towards having validation, checks, etc. It might still produce silly output saying no mistakes were made despite having bugs or logic errors. But LLMs are just a tool! So use them for what they're good at and can actually do, not what they themselves claim they can do lol.

[–] [email protected] 1 points 2 weeks ago

OpenWebUI connected tabbyUI's OpenAI endpoint. I will try reducing temperature and seeing if that makes it more accurate.

[–] [email protected] 1 points 2 weeks ago (2 children)

Context was set to anywhere between 8k and 16k. It was responding in English properly, and then about halfway to 3/4s of the way through a response, it would start outputting tokens in either a foreign language (Russian/Chinese in the case of Qwen 2.5) or things that don't make sense (random code snippets, improperly formatted text). Sometimes the text was repeating as well. But I thought that might have been a template problem, because it seemed to be answering the question twice.

Otherwise, all settings are the defaults.

[–] [email protected] 1 points 2 weeks ago (4 children)

I tried it with both Qwen 14b and Llama 3.1. Both were exl2 quants produced by bartowski.

[–] [email protected] 3 points 2 weeks ago

Perplexica works. It can understand ollama and custom OpenAI providers.

[–] [email protected] 1 points 2 weeks ago (6 children)

Super useful guide. However after playing around with TabbyAPI, the responses from models quickly become jibberish, usually halfway through or towards the end. I'm using exl2 models off of HuggingFace, with Q4, Q6, and FP16 cache. Any tips? Also, how do I control context length on a per-model basis? max_seq_len in config.json?

[–] [email protected] 2 points 3 weeks ago

Seems to be the only necessary thing in my case! Thanks.

[–] [email protected] 2 points 4 weeks ago (2 children)

Yeah I definitely have the default GTK chooser. Guess I have some config playing to do later.

[–] [email protected] 1 points 4 weeks ago (6 children)

Can you explain a bit more about this and how to configure it? When I use FF on gnome, the save dialogue just looks like other dialogues?

[–] [email protected] 23 points 1 month ago (1 children)

Not necessarily. While of course in many many cases, open source is a volunteer effort, there's usually some implicit transaction going on. Whether that's improving the software for yourself and passing that on to others, being a business and improving a library or something you use that helps your project generate revenue, or even a straight up commercial transaction.

But in all these cases, the open source project can be taken by you (or others) and you can do whatever you want with it. In the case of Winamp here, you cannot do any of that. It would be different if they were paying for contributions. But they're not, so.

 

Over the weekend (this past Saturday specifically), GPT-4o seems to have gone from capable and rather free for generating creative writing to not being able to generate basically anything due to alleged content policy violations. It'll just say "can't assist with that" or "can't continue." But 80% of the time, if you regenerate the response, it'll happily continue on its way.

It's like someone updated some policy configuration over the weekend and accidentally put an extra 0 in a field for censorship.

GPT-4 and GPT 3.5 seem unaffected by this, which makes it even weirder. Switching to GPT 4 will have none of the issues that 4o is having.

I noticed this happening literally in the middle of generating text.

See also: https://old.reddit.com/r/ChatGPT/comments/1droujl/ladies_gentlemen_this_is_how_annoying_kiddie/

https://old.reddit.com/r/ChatGPT/comments/1dr3axv/anyone_elses_ai_refusing_to_do_literally_anything/

 

Current situation: I've got a desktop with 16 GB of DDR4 RAM, a 1st gen Ryzen CPU from 2017, and an AMD RX 6800 XT GPU with 16 GB VRAM. I can 7 - 13b models extremely quickly using ollama with ROCm (19+ tokens/sec). I can run Beyonder 4x7b Q6 at around 3 tokens/second.

I want to get to a point where I can run Mixtral 8x7b at Q4 quant at an acceptable token speed (5+/sec). I can run Mixtral Q3 quant at about 2 to 3 tokens per second. Q4 takes an hour to load, and assuming I don't run out of memory, it also runs at about 2 tokens per second.

What's the easiest/cheapest way to get my system to be able to run the higher quants of Mixtral effectively? I know that I need more RAM Another 16 GB should help. Should I upgrade the CPU?

As an aside, I also have an older Nvidia GTX 970 lying around that I might be able to stick in the machine. Not sure if ollama can split across different brand GPUs yet, but I know this capability is in llama.cpp now.

Thanks for any pointers!

 

Not sure if this has been asked before or not. I tried searching and couldn't find anything. I have an issue where any pictures from startrek.website do not show up on the homepage. It seems to only affect startrek.website. Going to the link directly loads the image just fine. Is this something wrong with lemm.ee?

2
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]
 

For the past few days, the android app has been very slow. The app itself loads fine and is responsive, but it takes many seconds to load messages, sometimes up to 30 seconds. At first I thought it was a blip, but it's been going on for a few days now. Anyone else have this problem?

Edit: clearing cache in the app settings (not system settings) fixed it.

 

This has probably already been asked before, but:

The magazines of kbin federate as Lemmy communities, but is the microblog section of a kbin magazine accessible via Lemmy?

view more: next ›