[pythran] Use pythran to deploy tensorflow models.

  • From: Jean Laroche <ripngo@xxxxxxxxx>
  • To: pythran@xxxxxxxxxxxxx
  • Date: Wed, 20 Mar 2019 11:29:19 -0700

I use tensorflow a lot for training machine-learning models, and that part works really well.
What's more of a pain in the butt is deploying your models for inference (i.e., no longer training, but using the models to detect/classify, etc.), especially if you're trying to get a small footprint and fast execution.

Tensorflow has a few ways of doing that:
1) Keep using the models in python using the tensorflow module.
2) Use the serving mechanism offered by tensorflow; this creates a web server that you query by sending your input features and getting the model's output back. https://www.tensorflow.org/tfx/guide/serving
3) Use Tensorflow lite, which targets mobile deployment (Android and iOS) https://www.tensorflow.org/lite/guide
4) Use the tensorflow C++ API (https://www.tensorflow.org/guide/extend/cc)

None of these are satisfactory to me:
in 1) you must deploy the enormous tensorflow module with your app.
in 2) you're relying on a local server, which isn't great for me either.
in 3) you're dead in the water if you're not on Android or iOS.
in 4) it should work, but by all accounts it's a bit of a pain to get working. In particular, the C++ API relies on libraries that must be compiled with Bazel, and the final footprint is not small at all.

My current solution is to use the python module tfdeploy (https://github.com/riga/tfdeploy), which translates a tensorflow graph into a pure numpy graph that no longer needs the tensorflow library to execute. This is similar to 1), but you avoid deploying the huge tensorflow module.
tfdeploy no longer seems to be maintained, but so far it's worked very well for me. It has a low footprint and is easy to use, but it's slow-ish because it relies on numpy.
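
For reference, inference with tfdeploy looks roughly like this (a minimal sketch based on its README; the model file and tensor names are placeholders):

    import numpy as np
    import tfdeploy as td

    # load a model previously exported with td.Model().save(...)
    model = td.Model("model.pkl")
    # fetch tensors by their graph names ("input"/"output" are placeholders)
    x, y = model.get("input", "output")
    # evaluation is pure numpy, no tensorflow needed at this point
    batch = np.random.rand(10, 784).astype(np.float32)
    result = y.eval({x: batch})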


So my idea is as follows: how about modifying tfdeploy to analyze the tensorflow graph, convert it into a pure numpy execution graph (which it already does), and then convert that graph into a flattened, sequential series of calls to numpy operations, suitable for compilation by pythran.
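
To make that concrete, here is a hypothetical example of what the flattened file could look like, for a small two-layer dense network with the weights passed in as arguments (the file name, the signature, and the pythran export line are all my own invention, not actual tfdeploy output):

    # flat_model.py
    # pythran export infer(float64[:,:], float64[:,:], float64[:], float64[:,:], float64[:])
    import numpy as np

    def infer(x, w1, b1, w2, b2):
        # MatMul + BiasAdd + Relu, one former graph op per line
        t0 = np.dot(x, w1)
        t1 = t0 + b1
        t2 = np.maximum(t1, 0.0)
        # MatMul + BiasAdd
        t3 = np.dot(t2, w2)
        t4 = t3 + b2
        # Softmax over the last axis
        e = np.exp(t4 - t4.max(axis=1).reshape((-1, 1)))
        return e / e.sum(axis=1).reshape((-1, 1))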

The result:
- A very fast python implementation of your inference, with a very small footprint (thanks to pythran)
- OR: a pure C++ implementation that you can then integrate into your code (using the pythran -e option).
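
(Assuming the hypothetical flat_model.py above, that would be something like "pythran flat_model.py" to get a native extension module, or "pythran -e flat_model.py" to get C++ source you can integrate elsewhere.)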

The difficulty is that tfdeploy is class-based and uses decorators heavily, so it's not straightforward to feed to pythran.

It seems to me that between tfdeploy and fluidpythran (now transonic, https://pypi.org/project/transonic/) we have everything we need...
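
For instance, with transonic's @boost decorator (as I understand its API; the function below is only an illustration) you can keep decorator-style code and still have pythran compile it:

    import numpy as np
    from transonic import boost

    @boost
    def relu_layer(x: "float64[:,:]", w: "float64[:,:]", b: "float64[:]"):
        # one flattened layer; transonic hands this to pythran behind the scenes
        return np.maximum(np.dot(x, w) + b, 0.0)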

So what do you think?
Doable?

Jean
