[overture] P++ code scaling on a large processor count

  • From: Viacheslav Merkin <vgm@xxxxxx>
  • To: overture@xxxxxxxxxxxxx
  • Date: Fri, 1 May 2009 11:31:30 -0400

Hi,

I have been running my MHD code, which uses P++, on Kraken, the new Cray XT5 machine at NICS. I am getting really poor scaling results beyond 128 processors. I could give the actual performance numbers, but the bottom line is that the P++ array operations are clearly what slows the code down. Surprisingly, even operations that should not require any communication at all (for instance, X = 0., where X is a distributed array) take a lot of time. I can see this by turning off all P++ functions, then turning them back on one by one and checking how each affects the wall-clock time per time step. To try to remedy the problem, I grabbed Bill's copy function from ParallelUtility.C and used it for all assignment operations like the one above. Doing so eliminates the problem completely.
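
For concreteness, here is a minimal sketch of the kind of measurement described above. It does not use P++ or the copy function from ParallelUtility.C; LocalBlock, fillLocal and secondsPerStep are hypothetical names chosen only for illustration. It assumes that an assignment such as X = 0. needs to touch only the values owned by the local process, and it times a step by averaging wall-clock time over repeated calls.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the part of a distributed array owned by one process.
struct LocalBlock {
    std::vector<double> data;
    explicit LocalBlock(std::size_t n) : data(n, 1.0) {}
};

// Assign a constant to the locally owned values only.
// No communication is involved, so the cost should not grow with the process count.
void fillLocal(LocalBlock& x, double value) {
    std::fill(x.data.begin(), x.data.end(), value);
}

// Average wall-clock seconds per call of `step`, measured over `steps` calls.
double secondsPerStep(const std::function<void()>& step, int steps) {
    assert(steps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < steps; ++i) {
        step();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / steps;
}
```

Calling secondsPerStep once with the operation under test enabled and once with it disabled, then comparing the two averages, reproduces the one-by-one approach in a small setting.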

Is this the behavior one expects from P++ at large core counts, or is it perhaps indicative of a corrupted P++ installation on the machine? I heard that P++ has a hardwired limit on the number of processors, so I checked our installation: the limit is 1024, so that should not be a problem. If the problem really is in the P++ code itself, how is it solved in parallel Overture?

Thanks very much in advance,
Slava Merkin


---------------------------------------------------------------
Viacheslav Merkin
---------------------------------------------------------------
Senior Research Associate
Astronomy Department and
Center for Integrated Space Weather Modeling
Boston University

e-mail: vgm at bu.edu
phone: (617) 358-3441
fax: (617) 358-3242
---------------------------------------------------------------





