micheal65536

joined 1 year ago
[–] [email protected] 5 points 1 year ago (1 children)

IMO, local LLMs lack the capabilities or depth of understanding to be useful for most practical tasks (e.g. writing code, automation, language analysis). This will heavily skew any local LLM "usage statistics" further towards RP/storytelling (a significant proportion of which will always be NSFW in nature).

[–] [email protected] 4 points 1 year ago

The Stable Diffusion 2 base model was trained on what we would today call a "censored" dataset. The Stable Diffusion 1 dataset included NSFW images; its base model doesn't seem particularly biased towards or away from them, and it can be further trained in either direction because it has a foundational understanding of what those things are.

[–] [email protected] 1 points 1 year ago

This sounds like a timing issue to me. The thread bunching up may be due to the hook not grabbing the thread or the take-up lever not taking up the slack at the correct time. If it's missing stitches in zig-zag mode then that would also be due to either hook timing or possibly needle bar alignment.

Simple things to check:

  • Make sure that the needle is installed correctly, especially that it is oriented the right way and inserted all the way in

  • Make sure that the take-up lever is threaded correctly

Assuming these are both correct, you can try the following:

  • If possible, insert a fresh needle (at a minimum, you will need a needle that is undamaged and not bent anywhere from the shank to the eye)

  • Remove the plate, leave the machine unthreaded

  • On the straight stitch setting, turn the hand wheel slowly and check that the eye of the needle is exactly level with the hook as they pass each other (this should happen close to the bottom of the needle's stroke but may not be exactly at the bottom)

  • On the widest zig-zag stitch setting, again turn the hand wheel slowly and check that the eye of the needle passes closely to the hook (it won't be exact because the needle has moved, but it should be just slightly early on one side and just slightly late on the other, not noticeably early or late on one side) and also check that the needle is not colliding with any solid parts of the machine on either side

If the eye and the hook are not aligned as they pass each other, then you have either a timing or a needle height alignment issue. If they pass correctly on the straight stitch but the needle is noticeably early or late on one side of the zig-zag stitch (and fine on the other side) then you have an issue with the horizontal alignment of the zig-zag stitch.

[–] [email protected] 1 points 1 year ago

I haven't come across any significant discussion surrounding this before and I wouldn't recommend choosing a machine on this basis.

A front-loading bobbin is only an advantage for changing mid-task if you catch it before the thread runs out; otherwise you'll be backtracking and starting again anyway once you've replaced it. I suppose if there is a viewing window and you can see when it is about to run out, then this is an advantage; otherwise you won't know when to stop and change it until you notice that it has already run out.

In terms of speed, I doubt you will find any typical sewing machine "too slow" unless you plan to sew a lot and want it finished quickly. For a few repairs or alterations and the occasional custom piece, speed is not a priority; most of the time you will want to go slower anyway for more control/accuracy.

I think you need to put less thought into which machine to get and more thought into just getting a machine and starting to sew, without worrying so much about details like how the bobbin is loaded. As a beginner these things don't matter, and by the time you are experienced enough for them to matter, you will know which aspects are important to you and whether you want to upgrade. As it is, you can't really make "expert-level" choices because you don't yet have the experience to know, for example, whether speed is even a priority for you.

[–] [email protected] 1 points 1 year ago (4 children)

So... If this doesn't actually increase the context window or otherwise increase the amount of text that the LLM is actually able to see/process, then how is it fundamentally different to just "manually" truncating the input to fit in the context size like everyone's already been doing?
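To illustrate what I mean by "manual" truncation — you just drop the oldest tokens so that the prompt plus the expected output fits in the window. A minimal sketch (the function name and the fake token-ID list are mine; in practice the list would come from your tokenizer):

```python
def truncate_to_context(tokens, context_size, reserve_for_output=256):
    """Keep only the most recent tokens so prompt + generation fits the window."""
    budget = context_size - reserve_for_output
    if budget <= 0:
        raise ValueError("context too small for the requested output length")
    return tokens[-budget:]  # discard the oldest tokens

history = list(range(5000))  # pretend token IDs from a long conversation
prompt = truncate_to_context(history, context_size=2048)
print(len(prompt))  # 1792 tokens survive; the first 3208 are simply gone
```

The model never sees the discarded tokens either way — which is exactly why I'm asking what the new approach adds.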

[–] [email protected] 1 points 1 year ago

I tried getting it to write out a simple melody using MIDI note numbers once. I didn't think of asking for LilyPond format; I couldn't think of a text-based music notation format at the time.

It was able to produce a mostly accurate output for a few popular children's songs. It was also able to "improvise" a short blues riff (mostly keeping to the correct scale, and showing some awareness of/reference to common blues themes), and write an "answer" phrase (which was suitable and made musical sense) to a prompt phrase that I provided.
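For anyone unfamiliar: MIDI note numbers are just integers where 60 is middle C and each step is a semitone, so a melody is representable as a plain list of numbers. A quick sketch of the kind of output I was asking for, using the opening of "Twinkle, Twinkle, Little Star" (the helper function is mine, just to show the mapping):

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(n):
    """Convert a MIDI note number to scientific pitch notation (60 -> C4)."""
    return f"{NOTE_NAMES[n % 12]}{n // 12 - 1}"

twinkle = [60, 60, 67, 67, 69, 69, 67]  # C C G G A A G
print([midi_to_name(n) for n in twinkle])
# ['C4', 'C4', 'G4', 'G4', 'A4', 'A4', 'G4']
```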

[–] [email protected] 3 points 1 year ago (3 children)

To be honest, the same could be said of LLaMa/Facebook (which doesn't particularly claim to be "open", but I don't see many people criticising Facebook for doing a potential future marketing "bait and switch" with their LLMs).

They're only giving these away for free because they aren't commercially viable. If anyone actually develops a leading-edge LLM, I doubt they will be giving it away for free regardless of their prior "ethics".

And the chance of a leading-edge LLM being developed by someone other than a company with prior plans to market it commercially is quite small, as they wouldn't attract the same funding to cover the development costs.

[–] [email protected] 1 points 1 year ago (5 children)

IMO the availability of the dataset is less important than the model, especially if the model is under a license that allows fairly unrestricted use.

Datasets aren't useful to most people, and they carry more risk of a lawsuit, or of being ripped off by a competitor, than the model does. Publishing a dataset containing copyrighted content is legally grey at best, while the verdict is still out on a model trained on that dataset; the model also carries some short-term plausible deniability.

[–] [email protected] 3 points 1 year ago (2 children)

So you know how on motorcycles people have those hard "cases" at the back which close with a flap and keep the contents fully contained?

Yeah, I wish someone made those but for a bicycle. It annoys me having to worry about my stuff falling out of the basket into the road every time I go over a bump.

[–] [email protected] 2 points 1 year ago

Yeah, I think you need to set the contextsize and ropeconfig explicitly. The documentation isn't completely clear, and in places it implies that these should be auto-detected from the model when using a recent version, but the first thing I would try is setting them explicitly, as this definitely looks like an encoding issue.
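For example, assuming you're running koboldcpp and a SuperHOT-style 8k model (which conventionally wants a linear RoPE scale of 0.25 at the default 10000 base frequency) — the model filename here is a placeholder, and adjust the values to your actual model:

```shell
# Explicitly set the context size and RoPE scaling instead of relying on auto-detection.
# --ropeconfig takes the frequency scale followed by the frequency base.
python koboldcpp.py --model mymodel-superhot-8k.ggmlv3.q4_K_M.bin \
    --contextsize 8192 \
    --ropeconfig 0.25 10000
```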

[–] [email protected] 2 points 1 year ago

I would guess that this is possibly an issue due to the model being a "SuperHOT" model. This affects the way that the context is encoded and if the software that uses the model isn't set up correctly for it you will get issues such as repeated output or incoherent rambling with words that are only vaguely related to the topic.

Unfortunately I haven't used these models myself so I don't have any personal experience here but hopefully this is a starting point for your searches. Check out the contextsize and ropeconfig parameters. If you are using the wrong context size or scaling factor then you will get incorrect results.

It might help if you posted a screenshot of your model settings (the screenshot that you posted is of your sampler settings). I'm not sure if you configured this in the GUI or if the only model settings that you have are the command-line ones (which are all defaults and probably not correct for an 8k model).

[–] [email protected] 4 points 1 year ago (1 children)

TBH my experience with SillyTavern was that it merely added another layer of complexity/confusion to the prompt formatting/template experience, as it runs on top of text-generation-webui anyway. It was easy for me to end up with configurations where e.g. the SillyTavern turn template would be wrapped inside the text-generation-webui one, and it is very difficult to verify what the prompt actually looks like by the time it reaches the model as this is not displayed in any UI or logs anywhere.

For most purposes I have given up on any UI/frontend and I just work with llama-cpp-python directly. I don't even trust text-generation-webui's "notebook" mode to use my configured sampling settings or to not insert extra end-of-text tokens or whatever.
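To illustrate what "working with it directly" looks like: you assemble the prompt string yourself, so there is no hidden template layer and you can print exactly what the model will see. The wording below is an Alpaca-style template, just as an example — use whatever your particular model actually expects:

```python
def build_prompt(instruction):
    """Write the turn template out explicitly so nothing is wrapped or hidden."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

prompt = build_prompt("List three uses for a paperclip.")
print(prompt)  # this exact string is what reaches the model

# With llama-cpp-python, you would then pass it straight through, e.g.:
#   from llama_cpp import Llama
#   llm = Llama(model_path="model.gguf", n_ctx=2048)
#   out = llm(prompt, max_tokens=128)
```

No frontend in the middle means no surprise extra tokens and no doubly-wrapped templates.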


You are probably familiar with the long list of various benchmarks that new models are tested on and compared against. These benchmarks are supposedly designed to assess the model's ability to perform in various aspects of language understanding, logical reasoning, information recall, and so on.

However, while I understand the need for an objective and scientific measurement scale, I have long felt that these benchmarks are not particularly representative of the actual experience of using the models. For example, people will claim that a model performs at "some percentage of GPT-3" and yet not one of these models has ever been able to produce correctly-functioning code for any non-trivial task or follow a line of argument/reasoning. Talking to GPT-3 I have felt that the model has an actual in-depth understanding of the text, question, or argument, whereas other models that I have tried always feel as though they have only a superficial/surface-level understanding regardless of what the benchmarks claim.

My most recent frustration, and the one that prompted this post, is regarding the newly-released OpenOrca preview 2 model. The benchmark numbers claim that it performs better than other 13B models at the time of writing, supposedly outperforms Microsoft's own published benchmark results for their yet-unreleased model, and scores an "average" result of 74.0% against GPT-3's 75.7% while the LLaMa model that I was using previously apparently scores merely 63%.

I've used GPT-3 (text-davinci-003), and this model does not "come within comparison" of it. Even giving it as much leeway and benefit of the doubt as I can, not only can it still not write correct code (or even valid code, in a lot of cases), it is significantly worse at it than LLaMa 13B (which is also pretty bad). It fails at even basic reasoning tasks: it will write a long step-by-step explanation of what it claims it will do, but the answer itself contradicts the provided steps, or the steps themselves are wrong/illogical. The model has only learnt to produce "step-by-step reasoning" as an output format, and it has a worse grasp of what that actually means than any other model I have tried has when asked to "explain your reasoning" (for those models, at least, asking them to explain their reasoning produces a marginal improvement in coherence).

There is something wrong with these benchmarks. They do not relate to real-world performance. They do not appear to measure a model's ability to actually understand the prompt/task, only its ability to produce output that "looks correct" according to some format. They are not a reliable way to compare model performance, and as long as we keep using them we will keep producing models that score higher on benchmarks and claim to perform "almost as well as GPT-3", yet fail spectacularly at any task/prompt I can think of to throw at them.

(I keep using coding as an example, but I have also tried other tasks besides code, as I realise code is a particularly challenging task due to requirements like exact syntax. My interpretation of the various models' level of understanding is based on experience across a variety of tasks.)
