[pythran] Re: Use pythran to deploy tensorflow models.

  • From: Jean Laroche <ripngo@xxxxxxxxx>
  • To: pythran@xxxxxxxxxxxxx
  • Date: Fri, 22 Mar 2019 13:13:18 -0700

I think that the speedup offered by pythran in this simple case is mostly due to avoiding unnecessary copies.
But this is very typical of ML graphs, where you have layers upon layers of computations, each taking the output of the previous one...
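
(The converted model itself isn't shown here; the sketch below is only an
illustration, with made-up names and shapes, of that kind of layer chaining:
each layer allocates a fresh output array that feeds the next one, which is
exactly where skipping extra copies pays off.)

# Illustrative sketch only -- hypothetical names, not the generated script.
# pythran export forward(float64[:,:], float64[:,:] list, float64[:] list)
import numpy as np

def forward(x, weights, biases):
    h = x
    for w, b in zip(weights, biases):
        # dense layer + ReLU: each step would normally produce new arrays
        h = np.maximum(np.dot(h, w) + b, 0.0)
    return h
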
J.

On 3/22/19 12:26 PM, Jean Laroche wrote:

Well, I tried the pythran way anyway, and from my point of view, it's still interesting.
I was able to compile one of the models derived from tensorflow after a few modifications, and these are my results:
Pythran:
444 ms ± 9.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Python:
717 ms ± 6.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I don't know why, but on my Mac, using -march=native makes it slower. Using -DUSE_XSIMD -march=native results in a compiler error,
and -fopenmp results in clang: error: unsupported option '-fopenmp'.
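
(For reference, the kind of invocations being compared; the module name here
is made up:)

pythran model_graph.py                              # baseline build
pythran -march=native model_graph.py                # slower on this mac
pythran -DUSE_XSIMD -march=native model_graph.py    # compiler error
pythran -fopenmp model_graph.py                     # clang: unsupported option '-fopenmp'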

But I'm already seeing a benefit from using pythran, relative to using python...

Jean

On 3/22/19 8:14 AM, Mehdi AMINI wrote:


On Fri, Mar 22, 2019 at 2:55 AM Serge Guelton <serge.guelton@xxxxxxxxxxxxxxxxxxx> wrote:

    On Thu, Mar 21, 2019 at 08:56:23AM -0700, Mehdi AMINI wrote:
     >
     >
     > On Wed, Mar 20, 2019 at 11:29 AM Jean Laroche <ripngo@xxxxxxxxx> wrote:
     >
     >     I use tensorflow a lot for training models for machine
     >     learning; that part works really well.
     >     What's more of a pain in the butt is when it comes to deploying
     >     your models for inference (i.e., no longer training, but using
     >     the models to detect/classify, etc.), especially if you're
     >     trying to get a small footprint and fast execution.
     >
     >     Tensorflow has a few ways of doing that:
     >     1) Keep using the models in python using the tensorflow module.
     >     2) Use the serving mechanism offered by tensorflow; this creates
     >     a web server which you query by sending your input features and
     >     getting the output of the model back.
     >     https://www.tensorflow.org/tfx/guide/serving
     >     3) Use Tensorflow Lite, which targets mobile deployment (Android
     >     and iOS). https://www.tensorflow.org/lite/guide
     >     4) Use the tensorflow C++ API
     >     (https://www.tensorflow.org/guide/extend/cc)
     >
     >     None of these are satisfactory to me:
     >     in 1) you must deploy the enormous tensorflow module with your app.
     >     in 2) you're relying on a local server, which isn't great for me
     >     either.
     >     in 3) you're dead in the water if you're not on Android or iOS.
     >     4) should work, but by all accounts it's a bit of a pain to get
     >     working. In particular the C++ API relies on libraries that must
     >     be compiled using Bazel, and the final footprint is not small at
     >     all.
     >
     >
     > Have you looked into tfcompile?
     > It generates a self-contained binary that does not depend on the
     > TensorFlow runtime.
     > The interface is C++, but it may be possible to improve this to get
     > a Python interface to the generated module.

    Interesting. I wonder if they have some cross-call optimizations or not?


It is using XLA, so it depends on the XLA target, but in general yes they do.


    Either way, as showcased in this post:

http://serge-sans-paille.github.io/pythran-stories/an-incursion-into-basic-ml-gradient-descent-compiled-with-pythran.html

    The only current gain of using pythran when chaining tf calls is
    likely to be vectorization (which is nice), plus the removal of a few
    temporary arrays.
    That being said, there are probably optimization opportunities there.
    Jean, can you share the Python script generated by your hack?
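
(A toy illustration of the "temporary array removal" point above, with made-up
names: in plain numpy, each element-wise operation below allocates its own
intermediate array, whereas pythran's expression templates can fuse them into
a single vectorized loop.)

# Toy sketch only (hypothetical function), not taken from the post above.
# pythran export sigmoid_layer(float64[:,:], float64[:,:], float64[:])
import numpy as np

def sigmoid_layer(x, w, b):
    # the negation, exp, +, and / each allocate a temporary in plain numpy
    return 1.0 / (1.0 + np.exp(-(np.dot(x, w) + b)))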


