CPU and O3AS and X3D CPUs

seewo
Joined: 7 Oct 22
Posts: 14
Credit: 103362746
RAC: 268153
Topic 230430

As you can see, O3AS spends almost half of its run time purely on the CPU. Though there are some tricks like staggering multiple tasks, I don't think that's a good enough solution (many GPUs do not have enough VRAM for it).

First, on my laptop it takes about 1min to load the data into the VRAM. That's about 100MB/s. What is the bottleneck, CPU, PCIe bandwidth or SSD bandwidth? If the latter, I should consider buying a better one.

Second, the final phase spends a long time sorting the candidates, which should not need to take that long. Sorting can be parallelized (please refer to https://github.com/axel92b/multithreaded_sort). There is also no need to sort exactly: the final few levels of recursion can be skipped to save a lot of function-call overhead (if quicksort is used rather than mergesort). Besides, many candidates are bad and can simply be discarded without sorting (which I believe you are already doing).
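To make the "discard bad candidates and don't sort exactly" idea concrete, here is a minimal Python sketch. It is not how O3AS does its post-processing (that is not known from this thread), and it uses a bounded-heap selection rather than a truncated quicksort. The function name `top_candidates`, the parameters `keep` and `threshold`, and the random scores are all made up for illustration.

```python
import heapq
import random


def top_candidates(scores, keep, threshold):
    """Return up to `keep` highest scores that are >= `threshold`, best first."""
    # Drop clearly bad candidates first so they never reach the selection step.
    survivors = (s for s in scores if s >= threshold)
    # heapq.nlargest maintains a heap of at most `keep` items, so the cost is
    # roughly O(n log keep) instead of O(n log n) for a full sort.
    return heapq.nlargest(keep, survivors)


# Hypothetical data: 100,000 random scores standing in for candidates.
random.seed(0)
scores = [random.random() for _ in range(100_000)]
best = top_candidates(scores, keep=10, threshold=0.5)

# Sanity check against a full sort of the same data.
assert best == sorted(scores, reverse=True)[:10]
```

If only the best few candidates are reported, a selection like this avoids ordering the large majority that will be thrown away anyway.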

One final question: is it better to use an X3D CPU? It runs a few hundred MHz slower but has a larger cache than the non-X3D versions.

B.I.G
Joined: 26 Oct 07
Posts: 111
Credit: 987257296
RAC: 875600

Thank you for these questions. I've been asking myself some of them and have already asked whether it's possible to involve more CPU cores in the GPU crunching.

However, since you mentioned a laptop: I am crunching on a very modern desktop with a 7800 X3D CPU, and on an HP Z-Book Fury, which has really good cooling capacity for a notebook. While the laptop gets more credits with the GW app than with the BRP7 one, I realised that it's at its thermal limit. There is no point in making it more efficient, as laptops in general can't handle the thermals of both GPU and CPU being at 100%.

And that is with an additional big 30cm Noctua fan blowing cold air onto the bottom of the laptop.

 

With the desktop it's a different story. I have no comparison to non-X3D CPUs, but while the gap in which the GPU has less to do is still there and not insignificant, it's much smaller than on the notebook. And since the new tasks ending in -2 use less VRAM, by running 3 at a time I managed to get an extra 60k credits a day compared to running 2 at a time.

I don't know where the bottleneck is, but I noticed that the more modern the hardware, the faster the load times. My HP laptop is from 2021 and takes about 10 seconds to load the VRAM; my desktop system I bought this year, and its load times are near instant.

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8461777586
RAC: 2377857

seewo wrote:

First, on my laptop it takes about 1min to load the data into the VRAM. That's about 100MB/s. What is the bottleneck, CPU, PCIe bandwidth or SSD bandwidth? If the latter, I should consider buying a better one.

 

If you are willing, make your computer(s) visible in your account so the actual specs can be seen. It could be the CPU's speed (thermal throttling is very real with laptops and this type of work), or it could be the memory bandwidth and/or speed of the GPU. It is probably not PCIe bandwidth or SSD bandwidth.

seewo
Joined: 7 Oct 22
Posts: 14
Credit: 103362746
RAC: 268153

B.I.G wrote:

Thank you for these questions. I've been asking myself some of them and have already asked whether it's possible to involve more CPU cores in the GPU crunching.

However, since you mentioned a laptop: I am crunching on a very modern desktop with a 7800 X3D CPU, and on an HP Z-Book Fury, which has really good cooling capacity for a notebook. While the laptop gets more credits with the GW app than with the BRP7 one, I realised that it's at its thermal limit. There is no point in making it more efficient, as laptops in general can't handle the thermals of both GPU and CPU being at 100%.

And that is with an additional big 30cm Noctua fan blowing cold air onto the bottom of the laptop.

 

With the desktop it's a different story. I have no comparison to non-X3D CPUs, but while the gap in which the GPU has less to do is still there and not insignificant, it's much smaller than on the notebook. And since the new tasks ending in -2 use less VRAM, by running 3 at a time I managed to get an extra 60k credits a day compared to running 2 at a time.

I don't know where the bottleneck is, but I noticed that the more modern the hardware, the faster the load times. My HP laptop is from 2021 and takes about 10 seconds to load the VRAM; my desktop system I bought this year, and its load times are near instant.

Well, my laptop has a 13500H and a 4060 Laptop GPU. It's cold here in winter and I hit no thermal limits. The GPU even stays below 65 ℃ when working on Einstein with the fan at 3000 rpm. It does not even reach 50 W when it hits the voltage limit, even with a 237 MHz overclock. This load is far too easy for a laptop.

In power config I set the PCIe setting to maximum battery life (minimum performance). I wonder whether that's relevant. In GPU-Z the bus interface load is always low. I don't know exactly what that means, but I guess the PCIe link is not the bottleneck in loading anyway. I still guess the SSD may be the problem: its single-thread random read is only 64 MB/s.

When you say "but while the gap in which the GPU has less to do is still there and not insignificant, it's much smaller than on the notebook", do you take CPU frequency into account? Anyway, I'm gonna ask the other projects and then decide.

seewo
Joined: 7 Oct 22
Posts: 14
Credit: 103362746
RAC: 268153

Boca Raton Community HS wrote:

seewo wrote:

First, on my laptop it takes about 1min to load the data into the VRAM. That's about 100MB/s. What is the bottleneck, CPU, PCIe bandwidth or SSD bandwidth? If the latter, I should consider buying a better one.

 

If you are willing, make your computer(s) visible in your account so the actual specs can be seen. It could be the CPU's speed (thermal throttling is very real with laptops and this type of work), or it could be the memory bandwidth and/or speed of the GPU. It is probably not PCIe bandwidth or SSD bandwidth.

I don't think it's either the thermal limit or memory bandwidth. The CPU is a 13500H fixed at 3 GHz, which keeps it at about 75 ℃. CPU speed may slow down the postprocessing, but I don't think it matters much for loading the data. The disk's sequential read is 2000 MB/s but its random read is only 65 MB/s. If the loading is a sequential read, the disk is not the problem. As for memory, the system RAM is 2x16 GB DDR4-3200 and the VRAM is 8 GB GDDR6 set to 1875 MHz (7500 MT/s perhaps?).
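A rough back-of-the-envelope check of the disk figures, as a small Python sketch. It assumes the "about 1 min at about 100 MB/s" estimate from the first post is accurate, which implies roughly 6000 MB loaded, and that the two benchmark rates quoted above hold for the whole load. The helper names `implied_size_mb` and `load_seconds` are made up for illustration.

```python
def implied_size_mb(seconds, rate_mb_s):
    """Data volume implied by an observed duration and average rate."""
    return seconds * rate_mb_s


def load_seconds(size_mb, rate_mb_s):
    """Time needed to read `size_mb` at a constant `rate_mb_s`."""
    return size_mb / rate_mb_s


# Assumption: ~60 s at ~100 MB/s, as estimated in the first post.
size_mb = implied_size_mb(60, 100)  # 6000 MB

for label, rate in [("sequential read", 2000), ("random read", 65)]:
    print(f"{label}: {load_seconds(size_mb, rate):.0f} s")
# prints:
# sequential read: 3 s
# random read: 92 s
```

Under these assumptions the observed 60 s sits far from the sequential case and closer to the random-read case. That is compatible with a mostly random access pattern, but it does not prove the disk is the bottleneck, since a slow CPU-side step would produce the same wall-clock time.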

B.I.G
Joined: 26 Oct 07
Posts: 111
Credit: 987257296
RAC: 875600

seewo wrote:

In power config I set the PCIe setting to maximum battery life (minimum performance). I wonder whether that's relevant.

Yes, that is relevant. Set it to maximum performance and see if there is a difference.

seewo wrote:

When you say "but while the gap in which the GPU has less to do is still there and not insignificant, it's much smaller than on the notebook", do you take CPU frequency into account? Anyway, I'm gonna ask the other projects and then decide.

CPU frequency alone doesn't explain it, but a CPU is much more than its frequency. The notebook, with an Intel 10750H, boosts up to 4 GHz on a single core; the desktop runs at 4.8 GHz, and because of the better cooling I use 7 of its 8 cores. But there is no doubt that Intel chips stand no chance against AMD chips currently. Even back then the 10750 stood no chance against its AMD counterpart. And maybe your i5 chip lacks important features that the i7 has.

Also, RAM speed is significantly higher on the desktop (4800 MHz vs 2667 MHz). In addition, my observation outside of Einstein@Home, in my daily work, is that latency can have a huge impact on system performance. And maybe there is also a difference between NVIDIA and AMD.

In theory your notebook should be much more potent than mine. What is the run time of a single task on your notebook?

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8461777586
RAC: 2377857

seewo wrote:

Boca Raton Community HS wrote:

seewo wrote:

First, on my laptop it takes about 1min to load the data into the VRAM. That's about 100MB/s. What is the bottleneck, CPU, PCIe bandwidth or SSD bandwidth? If the latter, I should consider buying a better one.

 

If you are willing, make your computer(s) visible in your account so the actual specs can be seen. It could be the CPU's speed (thermal throttling is very real with laptops and this type of work), or it could be the memory bandwidth and/or speed of the GPU. It is probably not PCIe bandwidth or SSD bandwidth.

I don't think it's either the thermal limit or memory bandwidth. The CPU is a 13500H fixed at 3 GHz, which keeps it at about 75 ℃. CPU speed may slow down the postprocessing, but I don't think it matters much for loading the data. The disk's sequential read is 2000 MB/s but its random read is only 65 MB/s. If the loading is a sequential read, the disk is not the problem. As for memory, the system RAM is 2x16 GB DDR4-3200 and the VRAM is 8 GB GDDR6 set to 1875 MHz (7500 MT/s perhaps?).

 

I still do not think it would be bottlenecked at the disk. What about the GPU memory bandwidth? That is actually what I was referring to. Also, are you running other tasks at the same time (for E@H or other projects)?

seewo
Joined: 7 Oct 22
Posts: 14
Credit: 103362746
RAC: 268153

Boca Raton Community HS wrote:

I still do not think it would be bottlenecked at the disk. What about the GPU memory bandwidth? That is actually what I was referring to. Also, are you running other tasks at the same time (for E@H or other projects)?

I only run 1 project at a time. The memory bandwidth is 240GB/s.

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8461777586
RAC: 2377857

seewo wrote:

Boca Raton Community HS wrote:

I still do not think it would be bottlenecked at the disk. What about the GPU memory bandwidth? That is actually what I was referring to. Also, are you running other tasks at the same time (for E@H or other projects)?

I only run 1 project at a time. The memory bandwidth is 240GB/s.

 

That VRAM bandwidth might be the issue. That is just my suggestion though. 

B.I.G
Joined: 26 Oct 07
Posts: 111
Credit: 987257296
RAC: 875600

Boca Raton Community HS wrote:

That VRAM bandwidth might be the issue. That is just my suggestion though. 

I doubt that. My W5500M has a memory bandwidth of 224 GB/s, yet it loads into the VRAM much faster.

I can't imagine a 4060 performing worse than a W5500.

Such things can be a pain to figure out. I wonder if it has to do with the i5 CPU maybe performing worse with certain calculations?
