[qudi-dev] Re: Managing saved measurement data

From: Sami Koho <sami.koho@xxxxxxxxx>
To: qudi-dev@xxxxxxxxxxxxx
Date: Tue, 5 May 2020 13:01:57 +0200

Hi everyone,

Just a thought: it might make sense to save the data in HDF5 files. That way
you would get a single file for all the data in an experiment. You can also
attach the specific metadata that you are talking about as “attributes” to
each dataset, group, etc. The good thing about HDF5 is that it can support
datasets of arbitrary size, because you only load into memory the section of
data that you are currently working on, so you do not run into trouble with
memory.
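
As a minimal sketch of this idea, assuming the h5py and numpy packages are
available (neither is named in this thread), the following writes one
measurement with metadata attributes and then reads back only a small slice.
The file name, group name and attribute keys are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name; one HDF5 file holds a whole experiment.
path = os.path.join(tempfile.mkdtemp(), "experiment.h5")

# Write: one group per measurement, metadata stored as attributes.
with h5py.File(path, "w") as f:
    grp = f.create_group("odmr_scan_001")   # hypothetical measurement name
    grp.attrs["sample"] = "diamond_A"       # hypothetical metadata keys
    grp.attrs["center"] = "NV_12"
    dset = grp.create_dataset(
        "counts", data=np.arange(1_000_000, dtype=np.float64)
    )
    dset.attrs["unit"] = "counts/s"

# Read: slicing a dataset pulls only that section from disk into memory.
with h5py.File(path, "r") as f:
    dset = f["odmr_scan_001/counts"]
    sample = f["odmr_scan_001"].attrs["sample"]
    total_points = dset.shape[0]            # known without loading the data
    window = dset[1000:1010]                # only these 10 values are read
```

Here `window` is an ordinary numpy array holding just the requested values,
while the rest of the dataset stays on disk.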

Best,

S

On 5. May 2020, at 12.31, Dr. Kay Jahnke <kay.jahnke@xxxxxxxxxxxxxxxxx> wrote:

Hi Alrik,

thanks for your reply.

The GUI module you are talking about would essentially be the GUI to your
own save logic. The GUI should have no functionality by itself, so you will
need an additional logic module anyway that defines your "required"
additional parameters and handles them. This could, for example, be done as
an Interfuse onto the existing save logic.
Nevertheless, any parameters you define will be quite specific to your
experiment, so the logic will only be valid for you. E.g. you might work with
a single diamond sample and single centers, while others might work with
different materials or diamond batches and look for regions defined by
position and size, just to take one example. Therefore I would not define a
general parameter framework that everybody has to use, because qudi is quite
flexible in its use.
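
One way to picture such a wrapper logic is the sketch below. It is plain
Python, not actual qudi code: DummySaveLogic, RequiredParameterSaver, the
save_data signature and the parameter names are all hypothetical stand-ins,
and a real Interfuse would have to implement whatever interface the existing
save logic exposes.

```python
class DummySaveLogic:
    """Hypothetical stand-in for an existing save logic; it only records calls."""

    def __init__(self):
        self.saved = []

    def save_data(self, data, parameters):
        self.saved.append((data, dict(parameters)))


class RequiredParameterSaver:
    """Wraps a save logic and refuses to save unless lab-specific keys are set."""

    def __init__(self, save_logic, required):
        self._save_logic = save_logic
        self._required = tuple(required)
        self._parameters = {}

    def set_parameter(self, key, value):
        self._parameters[key] = value

    def save_data(self, data, parameters=None):
        # Per-call parameters override the stored lab-wide ones.
        merged = {**self._parameters, **(parameters or {})}
        missing = [key for key in self._required if key not in merged]
        if missing:
            raise ValueError(f"missing required parameters: {missing}")
        self._save_logic.save_data(data, merged)


backend = DummySaveLogic()
saver = RequiredParameterSaver(backend, required=("sample", "center"))
saver.set_parameter("sample", "diamond_A")

try:
    saver.save_data([1, 2, 3])        # "center" is not set yet
    rejected = False
except ValueError:
    rejected = True

saver.set_parameter("center", "NV_12")
saver.save_data([1, 2, 3], parameters={"laser_power_mW": 0.5})
```

The list of required keys lives in the lab's own wrapper, so each group can
choose its own set without forcing it on everybody else.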

We could put your analysis notebooks into the qudi code, e.g. in the
notebooks folder. Your code would then need to be able to at least analyse
data created by the default config, which generates dummy data. This way other
people can test it and then adapt it for their own purposes. Also, someone
(Dan maybe) should please review the code and make sure that it is
understandable, has a good structure and works for them. I would leave it at
the notebook level and not include the analysis scripts in qudi itself, as
this is again very specific to the user.

These are just my thoughts which I discussed a bit with Jan and Niko, but
maybe there are better ideas out there.

Cheers,
Kay

On 04.05.20 19:55, Alrik Durand wrote:

Hi,

We have been using Qudi on most of the setups in our lab in Montpellier
(France) for about a year.
We use additional parameters a lot via notebook scripts, and it works
well for us. One problem that I have seen with it comes from changes in the
names of the parameters over time. Maybe the typical fields used by people
(e.g. 'sample') could be fixed by a new GUI that would interact with
savelogic.

I personally like the fact that Qudi uses a centralized saving scheme; it
helps when you are trying to explore data from someone else or share analysis
code.

Concerning the analysis, we also developed some basic tools to load data
files into pandas DataFrames, and some other tools to do some very general
things with the pandas objects. Maybe such code could be shared as part of
the Qudi project in some way. For example, we have a method that looks at
the parameters before parsing the files, which is nice to have in the
"Load all" strategy.
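
A rough sketch of that "check the parameters first" idea, using only the
Python standard library, is shown below. It assumes a file layout that is not
confirmed anywhere in this thread: header lines starting with '#' that carry
"key: value" pairs, followed by tab-separated numeric columns, in files
ending in .dat. The function names and folder names are hypothetical.

```python
import os
import tempfile


def read_header_parameters(path):
    """Read only the leading '#' lines and return their 'key: value' pairs."""
    params = {}
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # header is over; the data block is never read here
            key, sep, value = line[1:].partition(":")
            if sep:
                params[key.strip()] = value.strip()
    return params


def load_matching(root, **wanted):
    """Walk root and fully parse only files whose header matches `wanted`."""
    results = {}
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            if not name.endswith(".dat"):
                continue
            path = os.path.join(folder, name)
            params = read_header_parameters(path)
            if any(params.get(key) != value for key, value in wanted.items()):
                continue  # skip without parsing the data block
            rows = []
            with open(path, "r", encoding="utf-8") as handle:
                for line in handle:
                    if line.startswith("#") or not line.strip():
                        continue
                    rows.append([float(x) for x in line.split("\t")])
            results[path] = (params, rows)
    return results


# Build two tiny example files in a made-up date/module folder structure.
root = tempfile.mkdtemp()
folder = os.path.join(root, "20200504", "ODMR")
os.makedirs(folder, exist_ok=True)
for name, sample in [("a.dat", "diamond_A"), ("b.dat", "diamond_B")]:
    with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
        handle.write(f"# sample: {sample}\n# center: NV_12\n1.0\t2.0\n3.0\t4.0\n")

matches = load_matching(root, sample="diamond_A")
```

Only the matching file is parsed in full; the parsed rows could then be
handed to pandas if that is what the analysis uses.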

In the end, all this does not prevent the use of a notebook. We are
experimenting with a digital notebook in Markdown. This has a few advantages
and looks quite good in the Typora editor.

I would love feedback on the two ideas mentioned, as we have some free time
lately...

Best,
Alrik Durand
PhD student @L2C Montpellier

On Mon, 4 May 2020 at 19:20, Dr. Kay Jahnke <kay.jahnke@xxxxxxxxxxxxxxxxx>
wrote:

Hi Dan,

you are correct, the current save logic only saves hierarchically, first by
date and then by the module in which the saved data was created.
It was designed that way because qudi does not dictate what modules there
are, which parameters they have, or which data is saved and how. So going
by date was the best way to get any structure.
There are some ways to enhance this structure and give you more convenient
access to your data: the current save_logic has additional_parameters
(https://github.com/Ulm-IQO/qudi/blob/master/logic/save_logic.py#L633).
These are global and can be set from any module or even from a notebook
script. So here you could save sample, center or any other parameters you
like into the data files. It is best not to save parameters in file
names, as this can lead to massive problems afterwards.

Some of my colleagues then query the whole data directory automatically with
a script and load ALL the data files. These data files can then be pushed
into, for example, a "pandas" data set, and this lets you query the specific
parameters per experiment.
The disadvantage is that querying all the data takes a while and you
might run out of RAM, as the data might get big (just imagine you want to
look at all the data from one PhD). Also, in principle this has nothing
really to do with qudi, because at this point you are writing your analysis
scripts and should not need the qudi core functionality.

Therefore the much more elegant way of solving your problem would be to
write your own save-logic. The current module is just the default qudi logic
module and can be replaced at any time. You just need to support the
same functions, but you can freely change what happens in the background.
For your case it would probably be best to set up an (elastic) database
and write a new save-logic that connects to that database and saves the data
in a clever way. A word of caution: be sure to put a lot of thought into the
design of the database beforehand and define explicitly what you want to
save and how (e.g. which parameters, which modules, pictures or only data
sets, dimensionality of the data).
I tried to write a save-logic for a database connection once, but the users
could not agree on any standardized parameters and structure. So the
database became maximally flexible and was therefore extremely complicated
to query. It never got used productively. Therefore, define what you want
beforehand, put thought into it and then keep to your structure.
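
To illustrate the "define it beforehand" advice, here is a small sketch that
uses Python's built-in sqlite3 module as a stand-in for whatever database is
actually chosen (it is not the elastic database mentioned above). The table
layout, column names and example values are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A fixed schema: the agreed-upon parameters are mandatory columns.
conn.execute(
    """
    CREATE TABLE measurement (
        id         INTEGER PRIMARY KEY,
        taken_at   TEXT NOT NULL,   -- ISO 8601 timestamp
        module     TEXT NOT NULL,   -- module that produced the data
        sample     TEXT NOT NULL,   -- agreed-upon, mandatory parameter
        center     TEXT NOT NULL,   -- agreed-upon, mandatory parameter
        data_path  TEXT NOT NULL    -- raw data stays on disk, path only
    )
    """
)
conn.execute("CREATE INDEX idx_sample_center ON measurement (sample, center)")

rows = [
    ("2020-05-04T17:59:00", "odmr", "diamond_A", "NV_12", "20200504/odmr_001.dat"),
    ("2020-05-04T19:20:00", "pulsed", "diamond_A", "NV_12", "20200504/pulsed_001.dat"),
    ("2020-05-05T12:31:00", "odmr", "diamond_B", "NV_03", "20200505/odmr_001.dat"),
]
conn.executemany(
    "INSERT INTO measurement (taken_at, module, sample, center, data_path) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Querying by sample and center is a single, simple statement.
found = conn.execute(
    "SELECT module, data_path FROM measurement "
    "WHERE sample = ? AND center = ? ORDER BY taken_at",
    ("diamond_A", "NV_12"),
).fetchall()
```

Because every row must carry the same mandatory columns, queries stay simple;
the price is that everyone has to agree on those columns up front.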

And finally, if something good comes out, other people might also want to
use something similar. So please then open a Pull Request and contribute back
to upstream. This also has the advantage that other people might fix bugs for
you or enhance the project.

An additional general remark: it is always very useful to keep some kind of
notebook, even if the data is saved automatically. We have had very good
experience going fully digital with the lab notebooks and using a wiki
system for that (https://www.dokuwiki.org/dokuwiki). The most used plug-in,
it turns out, is the one that lets you paste screenshots directly into the
wiki page.

If there are more questions, please don't hesitate to ask.

Cheers,
Kay

On 04.05.2020 at 17:59, Dan Yudilevich wrote:

Hi everyone,

We are a new group out of the Weizmann Institute (Israel), and we are
slowly but surely getting to know qudi, with most of the important features
running smoothly. The software is impressive, so kudos to the developers.

One thing I am struggling with is data management. Although we are only
beginning to acquire data, I already feel it is getting quite cluttered.
The apparent organization hierarchy of date/modules makes it challenging to
trace specific data later on. I would like, for example, a convenient way
to find data related to a specific sample (or defect); data from specific
types of pulse sequences, etc.

By managing a lab notebook I can, of course, refer to the specific dates
and files, but I feel it somewhat defeats the purpose.

So, I wanted to ask if someone has any recommendation –

Does anyone have a particularly elegant way of organizing the information?

Am I missing something in the save logic, so that I’m under-utilizing this
feature?

Thank you all, and stay healthy,

Dan Yudilevich

Finkler Group | Dept. of Chemical and Biological Physics

Weizmann Institute of Science

--
Dr. Kay Daniel Jahnke

Küfergasse 1
89073 Ulm
[T] +49 176 444 346 51
[@] kay.jahnke@xxxxxxxxxxxxxxxxx
