[pskmail] Re: Testing of faster modes

  • From: "remi.chateauneu@xxxxxxxxx" <remi.chateauneu@xxxxxxxxx>
  • To: pskmail@xxxxxxxxxxxxx
  • Date: Tue, 28 Feb 2012 04:51:20 +0000

Hi John;

Funny, because the Navtex modem was translated from Java to C++ :). About the filters, I just noticed the following, though I am not sure whether it applies to Java:

This is slow:
659,570,688  src/./include/filters.h:C_FIR_filter::run(complex&, complex&)

Considering this:
http://stackoverflow.com/questions/6106295/optimizing-array-loop-in-c

Here is the result: for filters with fewer than 10 components the speedup is small, but it is significant for bigger vectors (about twice as fast for 30 elements).

File src/include/filters.h:

inline double mac(const double *a, const double *b, unsigned int size) {
        double sum = 0.0;
#ifdef PREVIOUS_mac
        // Previous version: a single accumulator, so every addition
        // has to wait for the one before it.
        for (unsigned int i = 0; i < size; i++)
                sum += (*a++) * (*b++);
        return sum;
#else
        double sum2 = 0.0;
        double sum3 = 0.0;
        double sum4 = 0.0;
        // Four independent accumulators shorten the dependency chain,
        // so the CPU can overlap the multiply-adds.
        for (; size > 3; size -= 4, a += 4, b += 4)
        {
                sum  += a[0] * b[0];
                sum2 += a[1] * b[1];
                sum3 += a[2] * b[2];
                sum4 += a[3] * b[3];
        }
        // Remaining 0 to 3 elements.
        for (; size; --size)
                sum += (*a++) * (*b++);
        return sum + sum2 + sum3 + sum4;
#endif
}
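
Splitting the sum over four accumulators changes the order of the floating-point additions, so the two versions can differ in the last bits. Here is a minimal sketch for checking that they still agree within a tolerance. The names mac_simple, mac_unrolled and self_test_mac are only for illustration and are not part of fldigi:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Reference version: one accumulator, each add depends on the previous one.
inline double mac_simple(const double *a, const double *b, unsigned int size) {
    double sum = 0.0;
    for (unsigned int i = 0; i < size; i++)
        sum += a[i] * b[i];
    return sum;
}

// Unrolled version: four independent accumulators, then a tail loop.
inline double mac_unrolled(const double *a, const double *b, unsigned int size) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (; size > 3; size -= 4, a += 4, b += 4) {
        s0 += a[0] * b[0];
        s1 += a[1] * b[1];
        s2 += a[2] * b[2];
        s3 += a[3] * b[3];
    }
    for (; size; --size)          // remaining 0 to 3 elements
        s0 += (*a++) * (*b++);
    return s0 + s1 + s2 + s3;
}

// Hypothetical self-check: compares both versions for every length from
// 0 to 64, so that all four tail lengths are exercised.
inline void self_test_mac() {
    for (unsigned int size = 0; size <= 64; ++size) {
        std::vector<double> a(size), b(size);
        for (unsigned int i = 0; i < size; ++i) {
            a[i] = std::sin(0.1 * i);    // arbitrary test data
            b[i] = std::cos(0.05 * i);
        }
        const double r = mac_simple(a.data(), b.data(), size);
        const double u = mac_unrolled(a.data(), b.data(), size);
        // Relative tolerance, because the summation order differs.
        assert(std::fabs(r - u) <= 1e-12 * (1.0 + std::fabs(r)));
    }
}
```

Calling self_test_mac() once at startup (or from a unit test) is enough. The check only covers correctness; the speedup itself still has to be measured on the target CPU.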



On 27.02.2012 23:07, John Douyere wrote:
Hello Remi,

Yes, I had to do a fair bit of optimisation to fit all the modes into a
phone CPU's capabilities.

For information Rein and I (and another OM before us) translated some of
the Fldigi modems into Java.

I had two options when developing on Android: Java code or Native
development in C++ with hooks back and forth between the Java part and
the C++ part.

Development-time-wise the C++ path was probably the fastest, but then
each CPU version needs to have its own compiled version, and I didn't
want to be in the position of spending a lot of my time playing catch-up
with the latest device/CPU version.

Also I knew that the JIT compiler in Java was reported to bring good
speed improvements over the raw code as it dynamically optimises the
code based on real-time CPU load. And it certainly did bring well over
2x speed improvements on the same device for CPU-intensive tasks
like the modem processing, especially from Android version 2.3 onwards.

So in the end I will stay with the Java overhead, especially since the
new dual and quad core CPUs found in these devices are now very capable.

But at the same time the new modes I am working on will most likely
bring extra CPU load as well...so it is a never ending story..hihi

In that regard the speed improvements you are talking about are always
welcome. It is not a major task to translate them back from C++ to Java.

On the Android version there is a basic profiler, and the results show
that most of the processing time is spent in the FIR filter run for PSK
modes. For the PSKR modes it is first the (de)interleaver, then the
FIR filters.

So any improvement on the FIR filters run would be a plus. In the "slow
CPU" option I have in the software, I reduced the number of taps of the
FIR filter by half among other things. That produces a slightly larger
passband but it is not really an issue for Pskmail in practice (not like
when using PSK31 in a crowded band).
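
To illustrate why halving the number of taps widens the passband, here is a minimal sketch. It assumes a Hamming-windowed sinc low-pass design, which is not necessarily how the fldigi or AndPskmail filters are built, and the function names, tap counts and frequencies are made up for the example:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: Hamming-windowed sinc low-pass filter.
// fc is the cutoff in cycles per sample (0 < fc < 0.5).
std::vector<double> design_lowpass(int ntaps, double fc) {
    assert(ntaps >= 2 && fc > 0.0 && fc < 0.5);
    const double pi = std::acos(-1.0);
    std::vector<double> h(ntaps);
    double sum = 0.0;
    for (int n = 0; n < ntaps; ++n) {
        const double t = n - (ntaps - 1) / 2.0;   // centre the impulse response
        const double sinc = (t == 0.0) ? 2.0 * fc
                                       : std::sin(2.0 * pi * fc * t) / (pi * t);
        const double w = 0.54 - 0.46 * std::cos(2.0 * pi * n / (ntaps - 1));
        h[n] = sinc * w;
        sum += h[n];
    }
    for (double &v : h)
        v /= sum;                                 // unity gain at DC
    return h;
}

// Magnitude of the frequency response at f (cycles per sample).
double magnitude_at(const std::vector<double> &h, double f) {
    const double pi = std::acos(-1.0);
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t n = 0; n < h.size(); ++n)
        acc += h[n] * std::polar(1.0, -2.0 * pi * f * static_cast<double>(n));
    return std::abs(acc);
}
```

For example, with design_lowpass(64, 0.05) and design_lowpass(32, 0.05), comparing magnitude_at(h, 0.08) for both shows that the 32-tap filter lets through noticeably more signal just above the cutoff: the transition band of a windowed design gets wider roughly in proportion to 1/ntaps, while the cost per output sample is halved.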

I also have made the waterfall a temporary feature in the sense that it
has to be called on and then it disappears when moving away from the
modem screen in order to save processing power.

I also reused the FFT processing of the RSID RX modem for the waterfall
so that I don't double up on FFTs (we always have RSID RX ON in the
Pskmail client).

The other point is that there are no pointers in Java, which means
that array processing is slower (unless maybe the JIT compiler takes
care of this, which would explain some of the processing gains).

So I would welcome any ideas for speeding up the FIR filters and the
interleavers used in the PSKR modes.

Thanks.

73, John

On Tue, Feb 28, 2012 at 9:00 AM, remi.chateauneu@xxxxxxxxx wrote:

    Hi John,

    Thanks for the answer.
    I asked because there are a couple of speedups (i.e. lower
    power for the same task) which are possible in fldigi. They are not
    really worthwhile on a desktop but might make a difference on a
    portable computer.

    Here are the hungriest functions of a run I am working on at the
    moment. I removed everything related to my specific modem (the first
    column is the number of non-cumulated calls):

      893,482,434  src/waterfall/waterfall.cxx:WFdisp::update_waterfall()
    [/home/rchateau/RadioAmateurs/__FlDigiMaster/fldigi/src/__fldigi]
      848,167,248  src/fft/fft.cxx:Cfft::cftmdl(int, int, double*)
      659,570,688  src/./include/filters.h:C_FIR_filter::run(complex&, complex&)
      346,500,805  src/waterfall/waterfall.cxx:WFdisp::processFFT()
      285,200,070  src/fft/fft.cxx:Cfft::cft1st(int, double*)
      256,435,761  ???:memmove [/lib/i686/libc-2.7.so]
    ...etc...

    I believe there is room for improvement, apart from the Cfft
    functions, which seem to be really well tuned (but compiler flags
    might help?). Ideally we would need a profiler report (gprof,
    callgrind) of a typical AndPskmail execution. How often do you merge
    your version with the main branch?

    Cheers


