I think the speedup offered by pythran in this simple case is
mostly due to avoiding unnecessary copies.
But this is very typical of ML graphs, where you have layers upon layers
of computations, each taking the output of the previous one...
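To make the point about copies concrete, here is a minimal NumPy sketch (not the model from this thread; the function names and shapes are made up for illustration). The first version allocates a new temporary array at every step, while the second writes each intermediate result into one preallocated buffer:

```python
import numpy as np

def layer_naive(x, w, b):
    # x @ w, then (+ b), then maximum: each step allocates a fresh array.
    return np.maximum(x @ w + b, 0.0)

def layer_inplace(x, w, b, out):
    # Same computation, but every intermediate result is written into `out`.
    np.matmul(x, w, out=out)
    np.add(out, b, out=out)
    np.maximum(out, 0.0, out=out)
    return out

# Hypothetical shapes, chosen only for the demonstration.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 3))
b = rng.standard_normal(3)
buf = np.empty((4, 3))

same = np.allclose(layer_naive(x, w, b), layer_inplace(x, w, b, buf))
```

With many layers chained, the naive form repeats those allocations at every layer, which is presumably where most of the avoidable copying comes from.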
J.
On 3/22/19 12:26 PM, Jean Laroche wrote:
Well, I still tried the pythran way, and from my point of view, it's still interesting.
I was able to compile one of the models derived from tensorflow after a few modifications, and these are my results:
Pythran:
444 ms ± 9.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Python:
717 ms ± 6.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I don't know why, but on my Mac, using -march=native makes it slower. Using -DUSE_XSIMD -march=native results in a compiler error,
and -fopenmp results in: clang: error: unsupported option '-fopenmp'
But I'm already seeing a benefit from using pythran, relative to using python...
Jean
On 3/22/19 8:14 AM, Mehdi AMINI wrote:
On Fri, Mar 22, 2019 at 2:55 AM Serge Guelton <serge.guelton@xxxxxxxxxxxxxxxxxxx> wrote:
On Thu, Mar 21, 2019 at 08:56:23AM -0700, Mehdi AMINI wrote:
>
>
> On Wed, Mar 20, 2019 at 11:29 AM Jean Laroche <ripngo@xxxxxxxxx> wrote:
>
> I use tensorflow a lot for training models for machine learning, that
> part works really well.
> What's more of a pain in the butt is when it comes to deploying your
> models for inference (i.e., no longer train, but use the models to
> detect/classify etc), especially if you're trying to get a small
> footprint and fast execution
>
> Tensorflow has a few ways of doing that:
> 1) Keep using the models in python using the tensorflow module.
> 2) Use the serving mechanism offered by tensorflow, this creates a web
> server which you query by sending your input features and getting the
> output of the model back. https://www.tensorflow.org/tfx/guide/serving
> 3) Use Tensorflow lite which targets mobile deployment (android and ios)
> https://www.tensorflow.org/lite/guide
> 4) Use the tensorflow C++ API (https://www.tensorflow.org/guide/extend/cc)
>
> None of these are satisfactory to me:
> in 1) you must deploy the enormous tensorflow module with your app.
> in 2) you're relying on a local server, which isn't great for me either.
> in 3) you're dead in the water if you're not android or ios
> 4) should work but from all accounts it's a bit of a pain to get to
> work. In particular the C++ API relies on libraries that must be
> compiled using Bazel and the final footprint is not small at all.
>
>
> Have you looked into tfcompile?
> It generates a self-contained binary that does not depend on the TensorFlow
> runtime.
> The interface is C++, but it may be possible to improve this to get a Python
> interface to the generated module.
Interesting. I wonder if they have some cross-call optimizations or not?
It is using XLA, so it depends on the XLA target, but in general yes they do.
Either way, as showcased in this post:
http://serge-sans-paille.github.io/pythran-stories/an-incursion-into-basic-ml-gradient-descent-compiled-with-pythran.html
The only current gain of using pythran when chaining tf calls is
likely to be vectorization (which is nice), plus the removal of a few
temporary arrays.
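As a rough illustration of the kind of kernel this applies to, here is a minimal sketch of a Pythran-annotated function. The function name, the layer it computes, and the argument types are hypothetical and are not taken from Jean's generated script:

```python
import numpy as np

# Pythran reads the next comment as the export signature; plain Python ignores it.
#pythran export dense_sigmoid(float64[:,:], float64[:,:], float64[:])
def dense_sigmoid(x, w, b):
    # Hypothetical layer: matrix product, bias, then an elementwise sigmoid.
    # The elementwise chain after the dot product is where vectorization and
    # the removal of temporaries would be expected to help.
    z = np.dot(x, w) + b
    return 1.0 / (1.0 + np.exp(-z))
```

The same file runs unmodified under plain Python, so a compiled build can be compared against the interpreted one on the same inputs.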
That being said, there are probably optimization opportunities there.
Jean, can you share the Python script generated by your hack?