Hi John,

Funny, because the Navtex modem was translated from Java to C++ :). About filters, I just noticed this, but I'm not sure whether it applies to Java:
This is slow:

    659,570,688 src/./include/filters.h:C_FIR_filter::run(complex&, complex&)

Considering this: http://stackoverflow.com/questions/6106295/optimizing-array-loop-in-c

Here is the result: for filters with fewer than 10 coefficients the speedup is small, but it becomes significant for longer vectors (about twice as fast for 30 elements).
File src/include/filters.h:

    inline double mac(const double *a, const double *b, unsigned int size)
    {
        double sum = 0.0;
    #ifdef PREVIOUS_mac
        // Original version: a single accumulator, so each addition waits for the previous one.
        for (unsigned int i = 0; i < size; i++)
            sum += (*a++) * (*b++);
        return sum;
    #else
        double sum2 = 0.0;
        double sum3 = 0.0;
        double sum4 = 0.0;
        /// Four independent accumulators reduce the dependency chain on a single sum.
        for (; size > 3; size -= 4, a += 4, b += 4) {
            sum  += a[0] * b[0];
            sum2 += a[1] * b[1];
            sum3 += a[2] * b[2];
            sum4 += a[3] * b[3];
        }
        // Remaining 0 to 3 elements.
        for (; size; --size)
            sum += (*a++) * (*b++);
        // The summation order differs from the original loop, so the result may differ in the last bits.
        return sum + sum2 + sum3 + sum4;
    #endif
    }

On 27.02.2012 23:07, John Douyere wrote:
Hello Remi,

Yes, I had to do a fair bit of optimisation to fit all the modes within a phone's CPU capabilities. For information, Rein and I (and another OM before us) translated some of the Fldigi modems into Java.

I had two options when developing on Android: Java code, or native development in C++ with hooks back and forth between the Java part and the C++ part. Development-time-wise the C++ path was probably the fastest, but then each CPU version needs its own compiled build, and I didn't want to be in the position of spending a lot of my time playing catch-up with the latest device/CPU version. Also, the Java JIT compiler was reported to bring good speed improvements over the raw code, as it dynamically optimises the code based on real-time CPU load. And it certainly did bring well over a 2x speed improvement on the same device for CPU-intensive tasks like the modem processing, especially from Android 2.3 onwards. So in the end I will stay with the Java overhead, especially since the new dual- and quad-core CPUs found in these devices are now very capable. But at the same time the new modes I am working on will most likely bring extra CPU load as well... so it is a never-ending story..hihi

In that regard, the speed improvements you are talking about are always welcome. It is not a major task to translate them back from C++ to Java. On the Android version there is a basic profiler, and the results show that most of the processing is done in the FIR filter run for the PSK modes. For the PSKR modes it is first the (de)interleaver, then the FIR filters. So any improvement to the FIR filter run would be a plus. In the "slow CPU" option I have in the software, I halved the number of taps of the FIR filter, among other things. That produces a slightly wider passband, but it is not really an issue for Pskmail in practice (not like when using PSK31 in a crowded band).
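A minimal sketch of one way to combine the four-accumulator idea with a FIR run that takes complex samples and real taps. It assumes nothing about the real C_FIR_filter internals: the class name FirSketch and the helper dot4 are hypothetical. The delay line is stored twice over (a "doubled" buffer), so the newest N samples are always contiguous and no shifting or modulo per tap is needed. It uses array indices with an offset instead of pointer arithmetic, so it should map almost line for line onto Java double[] arrays.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: dot product of h[0..n) with x[off..off+n) using four
// independent accumulators, written with indices rather than pointers.
static double dot4(const std::vector<double>& h, const std::vector<double>& x,
                   std::size_t off, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += h[i]     * x[off + i];
        s1 += h[i + 1] * x[off + i + 1];
        s2 += h[i + 2] * x[off + i + 2];
        s3 += h[i + 3] * x[off + i + 3];
    }
    for (; i < n; ++i)  // remaining 0 to 3 taps
        s0 += h[i] * x[off + i];
    return (s0 + s1) + (s2 + s3);
}

// Hypothetical FIR filter: real taps applied to complex input, one sample in,
// one sample out.
class FirSketch {
public:
    explicit FirSketch(std::vector<double> taps)
        : taps_(std::move(taps)),
          re_(2 * taps_.size(), 0.0),
          im_(2 * taps_.size(), 0.0),
          pos_(0) {
        assert(!taps_.empty());
    }

    std::complex<double> run(std::complex<double> in) {
        const std::size_t n = taps_.size();
        // Step the write position backwards; the newest sample sits at pos_.
        pos_ = (pos_ == 0) ? n - 1 : pos_ - 1;
        // Write each sample twice so x[t-k] is always at index pos_ + k.
        re_[pos_] = re_[pos_ + n] = in.real();
        im_[pos_] = im_[pos_ + n] = in.imag();
        return {dot4(taps_, re_, pos_, n), dot4(taps_, im_, pos_, n)};
    }

private:
    std::vector<double> taps_;
    std::vector<double> re_;  // doubled delay line, real parts
    std::vector<double> im_;  // doubled delay line, imaginary parts
    std::size_t pos_;
};
```

Feeding a single impulse followed by zeros returns the taps in order, scaled by the impulse. The price of this layout is twice the delay-line memory and two stores per sample, in exchange for a contiguous inner loop.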
I have also made the waterfall a temporary feature, in the sense that it has to be called up and then disappears when moving away from the modem screen, in order to save processing power. I also reused the FFT processing of the RSID RX modem for the waterfall so that I don't double up on FFTs (we always have RSID RX ON in the Pskmail client). The other point is that there are no pointers in Java, which means that array processing is slower (unless the JIT compiler takes care of this, which might explain some of the processing gains). So I would welcome any ideas for speeding up the FIR filters and the interleavers used in the PSKR modes.

Thanks.

73, John

On Tue, Feb 28, 2012 at 9:00 AM, remi.chateauneu@xxxxxxxxx wrote:

Hi John,

Thanks for the answer. I ask because there are a couple of speedups (i.e. lower power for the same task) that are possible in fldigi. They are not really worth it on a desktop but might make a difference on a portable computer. Here are the hungriest functions of a run I am working on at the moment. I removed everything related to my specific modem (the first column is the non-cumulated count per function):

    893,482,434  src/waterfall/waterfall.cxx:WFdisp::update_waterfall() [/home/rchateau/RadioAmateurs/FlDigiMaster/fldigi/src/fldigi]
    848,167,248  src/fft/fft.cxx:Cfft::cftmdl(int, int, double*)
    659,570,688  src/./include/filters.h:C_FIR_filter::run(complex&, complex&)
    346,500,805  src/waterfall/waterfall.cxx:WFdisp::processFFT()
    285,200,070  src/fft/fft.cxx:Cfft::cft1st(int, double*)
    256,435,761  ???:memmove [/lib/i686/libc-2.7.so]
    ...etc...

I believe there is room for improvement, apart from the Cfft functions, which seem to be really well tuned (but compiler flags might help?). Ideally we would need a profiler report (gprof, callgrind) of a typical AndPskmail execution.
How often do you merge your version with the main branch?

Cheers
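On the interleaver side, which John reports as the largest cost for the PSKR modes, here is a sketch of a generic convolutional interleaver. It assumes nothing about the layout of fldigi's actual PSKR interleaver: ConvInterleaver, the branch count B and the delay step M are hypothetical. Branch j delays its symbols by j*M visits of the commutator; each branch is a ring buffer with its own head index, so a symbol costs one read, one write and one index update regardless of the interleaver depth, and nothing is shifted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic convolutional interleaver built from per-branch ring
// buffers. reverse == false: branch j delays by j*M visits (interleaver).
// reverse == true: branch j delays by (B-1-j)*M visits (matching deinterleaver).
class ConvInterleaver {
public:
    ConvInterleaver(std::size_t branches, std::size_t m, bool reverse)
        : rings_(branches), heads_(branches, 0), branch_(0) {
        assert(branches > 0);
        for (std::size_t j = 0; j < branches; ++j) {
            std::size_t delay = (reverse ? branches - 1 - j : j) * m;
            rings_[j].assign(delay, 0);  // fill value 0 until data arrives
        }
    }

    // Push one symbol into the current branch and return the symbol leaving it.
    int push(int in) {
        std::vector<int>& ring = rings_[branch_];
        int out = in;  // a zero-delay branch passes the symbol straight through
        if (!ring.empty()) {
            std::size_t& h = heads_[branch_];
            out = ring[h];  // oldest symbol in this branch
            ring[h] = in;   // overwrite it with the newest one
            h = (h + 1 == ring.size()) ? 0 : h + 1;
        }
        // Advance the commutator to the next branch.
        branch_ = (branch_ + 1 == rings_.size()) ? 0 : branch_ + 1;
        return out;
    }

private:
    std::vector<std::vector<int>> rings_;
    std::vector<std::size_t> heads_;
    std::size_t branch_;
};
```

Chaining an interleaver with the matching deinterleaver (same B and M, both starting at branch 0) reproduces the input delayed by B*(B-1)*M symbols. If the existing code shifts whole tables on every symbol, replacing the shift with moving indices in this way is the kind of change that should carry over directly to Java arrays.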