I use tensorflow a lot for training machine-learning models, and that
part works really well.
What's more of a pain in the butt is deploying your models for
inference (i.e., no longer training, but using the models to
detect/classify, etc.), especially if you're trying to get a small
footprint and fast execution.
Tensorflow has a few ways of doing that:
1) Keep using the models in python with the tensorflow module.
2) Use the serving mechanism offered by tensorflow, which creates a web
server that you query by sending your input features and getting the
model's output back. https://www.tensorflow.org/tfx/guide/serving
3) Use Tensorflow Lite, which targets mobile deployment (Android and iOS)
https://www.tensorflow.org/lite/guide
4) Use the tensorflow C++ API (https://www.tensorflow.org/guide/extend/cc)
None of these are satisfactory to me:
in 1) you must deploy the enormous tensorflow module with your app.
in 2) you're relying on a local server, which isn't great for me either.
in 3) you're dead in the water if you're not on Android or iOS.
4) should work, but by all accounts it's a bit of a pain to get
working. In particular, the C++ API relies on libraries that must be
compiled with Bazel, and the final footprint is not small at all.
My current solution is to use the python module tfdeploy
(https://github.com/riga/tfdeploy) which translates a tensorflow graph
into a purely numpy graph that no longer needs the tensorflow library to
execute. This is similar to 1), but you avoid deploying the huge
tensorflow module.
tfdeploy no longer seems to be maintained, but so far it's worked very
well for me. It has a small footprint and is easy to use, but it's
slow-ish, because it relies on numpy.
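To make the discussion concrete, here is a toy sketch (hypothetical, not tfdeploy's actual classes, just the shape of the idea) of a graph evaluated recursively with nothing but numpy:

```python
import numpy as np

# Toy sketch of the kind of structure tfdeploy builds: each node
# evaluates its inputs recursively, using only numpy -- no tensorflow
# needed at inference time.
class Op:
    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs

    def eval(self, feed):
        if self in feed:                    # placeholder fed by the caller
            return feed[self]
        args = [i.eval(feed) for i in self.inputs]
        return self.func(*args)

# Build a tiny graph: softmax(x @ W + b), with fixed toy weights.
x = Op(None)                                # input placeholder
W = Op(lambda: np.ones((3, 2)))             # constant weight matrix
b = Op(lambda: np.zeros(2))                 # constant bias
z = Op(np.add, Op(np.matmul, x, W), b)
out = Op(lambda v: np.exp(v) / np.exp(v).sum(axis=1, keepdims=True), z)

result = out.eval({x: np.array([[1.0, 2.0, 3.0]])})
```

The point is that everything above runs with numpy alone; the downside is that this recursive, object-based evaluation is exactly what pythran can't digest.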
So my idea is as follows: how about modifying tfdeploy to analyze the
tensorflow graph, convert it into a pure numpy execution graph (which it
already does), and then convert that graph into a flattened, sequential
series of calls to numpy operations, suitable for compilation by pythran?
The result:
- A very fast python implementation of your inference, with a very small
footprint (thanks to pythran)
- OR: a pure C++ implementation that you can then integrate into your
code (using the pythran -e option).
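For illustration, the flattened output I'm imagining would look something like this (hypothetical layer shapes and op names; the pythran export line is just a comment, so the function runs unchanged under plain CPython):

```python
import numpy as np

# pythran export predict(float64[:,:], float64[:,:], float64[:])
def predict(x, W, b):
    # straight-line sequence of numpy calls extracted from the graph
    y0 = np.dot(x, W)                                # MatMul
    y1 = y0 + b                                      # BiasAdd
    y2 = np.maximum(y1, 0.0)                         # Relu
    e = np.exp(y2 - y2.max(axis=1, keepdims=True))   # stable Softmax
    return e / e.sum(axis=1, keepdims=True)
```

Running `pythran module.py` on such a file would give a native extension, while `pythran -e module.py` would emit the C++ instead.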
The difficulty is that tfdeploy is class-based and uses decorators
heavily, so it's not straightforward to feed to pythran.
It seems to me that between tfdeploy and fluidpythran (now transonic
https://pypi.org/project/transonic/) we have everything we need...
So what do you think?
Doable?
Jean