
What is the most difficult problem that you have fixed in linux? (lemmy.world)

submitted 1 year ago by [email protected] to c/[email protected]

158 comments


[-] [email protected] 73 points 1 year ago* (last edited 1 year ago)

I manage a machine that runs both media transcodes and some video game servers.

The video game servers have to run in real-time, or very close to it. Otherwise players using them suffer noticeable lag.

Achieving this while an ffmpeg process was running turned out to be impossible, no matter what I did to limit ffmpeg's use of CPU time. Even running at the lowest priority, it impacted the game server processes running at top priority. Even limited to a single thread, it was affecting things.

I couldn't understand the problem. There was enough CPU time to go around to do both things, and the transcode wasn't even time sensitive, while the game server was, so why couldn't the Linux kernel just figure it out and schedule things in a way that made sense?

So, for the first time I read up on how computers actually handle processes, multi-tasking and CPU scheduling.

Since ffmpeg is an application that uses all available CPU time until a task is done, I came to the conclusion that context switching was the culprit: a CPU core can only do one thing at a time and just switches between tasks really fast, but each switch takes time too. With zero processing headroom, that was causing the system to fall behind on the game server processes. The scheduler wasn't smart enough to keep a real-time process on schedule in the face of ffmpeg, which would occupy every available cycle.

I learned the solution was core pinning: manually setting processes to run on certain cores of the CPU. I set ffmpeg to use only one core, since it doesn't matter how fast it completes, and I set the game processes to use all but that one core, so they don't accidentally end up queueing for CPU time on a core that doesn't have the headroom to run the task within a reasonable time.

This has completely solved the problem, as the game processes and ffmpeg no longer wait for CPU cycles in the same queue.
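For anyone wanting to try the same split, here is a minimal sketch in Python, assuming a Linux machine. The helper names `split_cores` and `pin` and the choice of which core to reserve are made up for illustration and are not what was actually run on the machine described above; `os.sched_getaffinity` and `os.sched_setaffinity` are the real standard-library calls.

```python
import os


def split_cores(cores, reserved=1):
    """Split a set of CPU ids into (transcode_cores, game_cores).

    Hypothetical helper: reserves the highest-numbered cores for the
    transcode. Which cores get reserved is an arbitrary choice here.
    """
    ordered = sorted(cores)
    if not 0 < reserved < len(ordered):
        raise ValueError("need at least one core on each side of the split")
    transcode = set(ordered[-reserved:])
    game = set(ordered[:-reserved])
    return transcode, game


def pin(pid, cores):
    """Restrict the task with this id to the given cores (Linux only)."""
    os.sched_setaffinity(pid, cores)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    available = os.sched_getaffinity(0)  # cores this process may run on
    if len(available) > 1:
        transcode_cores, game_cores = split_cores(available)
        print("transcode cores:", sorted(transcode_cores))
        print("game server cores:", sorted(game_cores))
        # The PIDs below are placeholders, not values from the post:
        # pin(FFMPEG_PID, transcode_cores)
        # pin(GAME_SERVER_PID, game_cores)
```

One caveat: the underlying system call works per thread, so pinning an already-running multi-threaded process this way only moves the thread whose id you pass. Child processes and threads created afterwards inherit the mask, which is why launching the job under `taskset -c <core>` (or using `taskset -a -p` on a running process) is often the simpler route.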

[-] [email protected] 7 points 1 year ago

Well that's interesting.... I'd have thought, possibly naively, that as long as a thread had work to do it would essentially behave like ffmpeg does?

Perhaps there's something about the type of work though, that it's very CPU-bound or something?

[-] [email protected] 11 points 1 year ago* (last edited 1 year ago)

I think the difference is simply that most processes only have a certain amount that needs accomplishing in a given unit of time. As long as they can get enough CPU time, and do so soon enough after getting in line for it, they can maintain real-time execution.

Very few workloads have that much to do for that long. But I would expect other similar workloads to present the same problem.

There is a useful stat which Linux tracks in addition to a simple CPU usage percentage. The "load average" represents the average number of processes that are either running on a CPU or queued up waiting for one. (On Linux it also counts processes stuck in uninterruptible waits, such as disk I/O.)

As long as the number is lower than the available number of cores, this essentially means that whenever one process is done running a task, the next in line can get right on with theirs.

If the load average is less than the number of cores available, that means the cores have idle time where they are essentially just waiting for a process to need them for something. Good for time-sensitive processes.

If the load average is above the number of cores, that means some processes are having to wait for several cycles of other processes having their turn, before they can execute their tasks. Interestingly, the load average can go beyond this threshold way before the CPU hits 100% usage.

I found that I can let my system get up to a load average of about 1.5 times the number of available cores before players start noticing it on one of the servers I run.

And whenever ffmpeg was running, the load average would spike to 10-20 times the number of cores. Not good.
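If you want to watch this ratio yourself, here is a rough sketch in Python. The function names and the labels are invented for illustration, and the 1.5 cutoff is just the number observed on this one machine, not a general rule. `os.getloadavg()` and `os.cpu_count()` are real standard-library calls (the former is available on Unix-like systems).

```python
import os


def load_per_core(load, cores):
    """Load average divided by core count; 1.0 means fully busy on average."""
    return load / cores


def classify(load, cores, tolerable=1.5):
    """Label a load average using the rule of thumb from this thread.

    `tolerable` is an assumption taken from one machine's experience.
    """
    ratio = load_per_core(load, cores)
    if ratio < 1.0:
        return "headroom"
    if ratio <= tolerable:
        return "queueing, probably tolerable"
    return "overloaded"


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-minute load {one_min:.2f} on {cores} cores:",
          classify(one_min, cores))
```

The same three numbers show up in `uptime` and in `/proc/loadavg`, so this is only a convenience for scripting an alert around them.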

[-] [email protected] 5 points 1 year ago

That makes complete sense - if you've got something 'needy', as soon as it's queuing up, I imagine it snowballs, too...

10-20 times the core count is crazy, but I guess a lot of development effort has gone into parallelizing its execution, which of course goes against your use case :)

[-] [email protected] 7 points 1 year ago* (last edited 1 year ago)

Theoretically a load average could be as high as it likes; it's essentially just the length of the task queue, after all.

Processes having to queue to get executed is no problem at all for lots of workloads. If you're not running anything latency-sensitive, a huge load average isn't a problem.

Also it's not really a matter of parallelization. Like I mentioned, ffmpeg impacted other processes even when restricted to running in a single thread.

That's because most other processes do their work in small chunks that complete within microseconds or a few milliseconds: send a network request, parse some data, decode an image, poll a HID device, etc.

A transcode meanwhile can easily have a CPU running full tilt for well over a second, working on just that one thing. Most processes will show up and go "I need X amount of CPU time" while ffmpeg will show up and go "give me all available CPU time" which is something the scheduler can't actually quantify.

It's like if someone showed up at a buffet and asked for all the food that no-one else is going to eat. How do you determine exactly how much that is, and thereby how much it is safe to give this person without giving away food someone else might've needed?

You don't. Without CPU headroom it becomes very difficult for the task scheduler to maintain low system latency. It'll do a pretty good job, but inevitably some CPU time that should have gone to other stuff will go to the process asking for as much as it can get.

[-] [email protected] 4 points 1 year ago

This reminded me of how I disabled processor cores in Process Lasso for programs

this post was submitted on 26 Mar 2024

638 points (96.4% liked)

linuxmemes
