Hi, I have been running my MHD code, which uses P++, on Kraken, the new Cray XT5 machine at NICS. I am getting really poor scaling beyond 128 processors. I could give the actual performance numbers, but the bottom line is that the P++ array operations are clearly what slow the code down. Surprisingly, even operations that should not require any communication at all (for instance, X = 0., where X is a distributed array) take a lot of time. I can see this by turning off all P++ functions, then turning them back on one by one and watching how each affects the wall-clock time per time step. To try to remedy the problem, I grabbed Bill's copy function from ParallelUtility.C and used it for all assignment operations like the one above. Doing so eliminates the problem completely.
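For reference, the kind of per-operation timing I mean can be sketched roughly as follows. This is only an illustration, not my actual code: secondsPerStep and fillLocal are hypothetical helper names, a plain std::vector stands in for the processor-local part of a distributed array, and no P++ or MPI calls are used.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of P++ or Overture): average wall-clock
// seconds per call of `op`, measured over `steps` repetitions.
double secondsPerStep(const std::function<void()>& op, int steps) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < steps; ++i) {
        op();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / steps;
}

// Stand-in for the processor-local part of a distributed array (assumption:
// a plain std::vector). Filling it touches only local memory, so it needs
// no communication; this is the cost one would expect for X = 0.
void fillLocal(std::vector<double>& local, double value) {
    for (std::size_t i = 0; i < local.size(); ++i) {
        local[i] = value;
    }
}
```

Timing a purely local fill this way, e.g. secondsPerStep([&] { fillLocal(x, 0.0); }, 100), gives a baseline against which the cost of the corresponding distributed-array assignment can be compared at each processor count.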
Is this the behavior one expects from P++ at large core counts, or does it perhaps indicate a corrupted P++ installation on the machine? I heard that P++ has a hardwired limit on the number of processors, so I checked it in our installation: it is 1024, so that should not be the problem. If this really is a problem with the P++ code itself, how is it solved in the parallel Overture?
Thanks very much in advance,
Slava Merkin

---------------------------------------------------------------
Viacheslav Merkin
---------------------------------------------------------------
Senior Research Associate
Astronomy Department and Center for Integrated Space Weather Modeling
Boston University
e-mail: vgm at bu.edu
phone: (617) 358-3441
fax: (617) 358-3242
---------------------------------------------------------------