[mysql-dde] Re: Documentation of handler changes (fwd)

  • From: "Peter B. Volk" <PeterB.Volk@xxxxxxx>
  • To: <mysql-dde@xxxxxxxxxxxxx>
  • Date: Thu, 15 Jun 2006 18:42:21 +0200

Hey,

Well, once again a change. I love distributed code changes... We've been
working with the 5.2 tree, so we have had changes for the past 2 weeks anyway.

Peter


----- Original Message ----- 
From: "Lenz Grimmer" <lenz@xxxxxxxxx>
To: <mysql-dde@xxxxxxxxxxxxx>
Sent: Thursday, June 15, 2006 2:56 PM
Subject: [mysql-dde] Documentation of handler changes (fwd)


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Guys,
>
> Monty just recently pushed several changes to the MySQL 5.1 handler
> interface - this might be relevant for your work...
>
> Bye,
> LenZ
> - -- 
>  Lenz Grimmer <lenz@xxxxxxxxx>
>  Community Relations Manager, EMEA
>  MySQL GmbH, http://www.mysql.de/, Hamburg, Germany
>  Visit the MySQL Forge at http://forge.mysql.com/
>
> - ---------- Forwarded message ----------
> Date: Wed, 7 Jun 2006 02:08:53 +0200
> From: Michael Widenius <monty@xxxxxxxxx>
> To:  <dev-private@xxxxxxxxx>
> Subject: Documentation of handler changes
>
>
> Hi!
>
> The following only has to be read by people interested in the handler
> interface.  All changes are pushed in the 5.1 tree and will be
> available as soon as we have a working 5.1 tree again (should be tomorrow
> evening, EET time).
>
> - -------------
>
> We have now done some critical cleanups in the 5.1 handler interface
> to fix some of the most critical problems that have escalated during
> 5.1 development. The idea is to fix some burning issues to allow
> us to have a reasonably stable interface until at least 5.2 or 5.3.
>
> All handlers included in the MySQL 5.1 source have been updated
> to work (but not necessarily to take advantage of all benefits of the
> new interfaces).
>
> We plan to establish some communication channels so that all
> storage engine developers can participate in and influence future handler
> interface changes, to make this process smoother and less unexpected in
> the future. More about this in the near future.
>
> The changes that have been done lately have largely been done in 4
> different sets of changesets. (This document intends to explain all of
> them):
>
> - - Purge handler extension, to allow MySQL to deal with any log files
>   generated by the handler that are no longer needed (automatic
>   removal, archiving etc). Done by Oleksandr Byelkin.
> - - New easier to use handlerton interface (to get rid of compiler
>   warnings every time the handlerton interface changes). Done by
>   Sergei Golubchik.
> - - New auto_increment interface, to allow the handler to inform MySQL
>   in more detail how the auto_increments are generated (for statement
>   based logging).  With this the handler can generate several
>   ranges of auto_increment keys for a statement, instead of the one
>   sequence that was the case before, and still be able to use
>   statement based replication. The new auto_increment interface
>   will also provide hints to the engine of how many auto_increment
>   values it expects will have to be generated. Done by Guilhem Bichot.
> - - Some overall handler cleanups to remove things that should have
>   been fixed a long time ago. Done by Michael Widenius.
>
>
> The following document will describe all of the above changes in more
> detail.
>
> 1) Purge handler (Note: You can ignore this if your handler doesn't
>    need external log rotation)
>
> - - With the purge handler, we have introduced a 'create_iterator'
>   handlerton call that can be used to create different kinds of
>   iterators. For now, we have only created one iterator, of type
>   HA_TRANSACTLOG_ITERATOR, that can be used to get information about
>   all logs used by the handler.
> - - The function create_iterator should fill in slots in an iterator
>   object that has function pointers to 'next' and 'destroy' methods.
>   The 'next' method needs to fill in data in a struct whose layout
>   depends on the iterator type.
> - - An example of iterator usage can be found in
>   handler.cc::example_of_iterator_using_for_logs_cleanup() and
>   handler.cc::fl_create_iterator()
> - - We also have a new function, handler.cc::signal_log_not_needed(),
>   that a handler should call if it wants to inform MySQL that a log
>   file is not in use anymore (so that MySQL can decide if it should
>   delete or archive it).
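
A minimal sketch of the 'next'/'destroy' slot pattern described above. It
does not use the real server types: every name here (iterator_sketch,
log_info_sketch, create_log_iterator, count_unused_logs) is a hypothetical
stand-in, and the return convention of 'next' (0 while entries remain) is an
assumption made for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the struct that 'next' fills in.
struct log_info_sketch {
  const char *filename;
  bool in_use;
};

// Hypothetical iterator object with 'next' and 'destroy' slots.
struct iterator_sketch {
  // Assumed convention: returns 0 and fills *out while entries remain.
  int (*next)(iterator_sketch *it, log_info_sketch *out);
  void (*destroy)(iterator_sketch *it);
  void *state;  // engine-private cursor
};

struct log_cursor_sketch {
  const log_info_sketch *logs;
  std::size_t count;
  std::size_t pos;
};

static int log_next(iterator_sketch *it, log_info_sketch *out) {
  log_cursor_sketch *c = static_cast<log_cursor_sketch *>(it->state);
  if (c->pos >= c->count) return 1;  // end of iteration
  *out = c->logs[c->pos++];
  return 0;
}

static void log_destroy(iterator_sketch *it) {
  delete static_cast<log_cursor_sketch *>(it->state);
  it->state = nullptr;
}

// Plays the role of the engine's create_iterator call: fill in the slots.
void create_log_iterator(const log_info_sketch *logs, std::size_t count,
                         iterator_sketch *it) {
  assert(it != nullptr);
  it->state = new log_cursor_sketch{logs, count, 0};
  it->next = log_next;
  it->destroy = log_destroy;
}

// Consumer side: walk all logs and count those no longer in use.
std::size_t count_unused_logs(const log_info_sketch *logs, std::size_t count) {
  iterator_sketch it;
  create_log_iterator(logs, count, &it);
  std::size_t unused = 0;
  log_info_sketch info;
  while (it.next(&it, &info) == 0)
    if (!info.in_use) ++unused;
  it.destroy(&it);
  return unused;
}
```

The consumer only ever touches the two function pointers, which is what lets
each engine keep its own cursor layout behind 'state'.
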
>
>
> 2) handlerton interface:
>
> - - st_mysql_plugin does not store a pointer to handlerton directly,
>   but to a st_mysql_storage_engine structure, which in turn has a
>   pointer to the handlerton.
> - - handlerton no longer has the MYSQL_HANDLERTON_INTERFACE_VERSION,
>   name, comment, and init() fields. The version is in
>   st_mysql_storage_engine, the other three - only in st_mysql_plugin.
> - - handlerton now _can_ (but need not) be initialized in the plugin init
>   function. Doing it this way will avoid compiler warnings that
>  "statical initializer has less values than the structure has fields".
> - - The handlerton init() function used to return 1 (failure) if the
>   storage engine was disabled from the command line. The plugin init()
>   function should return 0 (success) in this case.
>
>
> 3) New auto_increment interface (WL#3146)
>
> - - The new auto_increment interface is now:
>
>   virtual void get_auto_increment(ulonglong offset, ulonglong increment,
>                                   ulonglong nb_desired_values,
>                                   ulonglong *first_value,
>                                   ulonglong *nb_reserved_values);
>   virtual void release_auto_increment() { return; };
>   virtual void restore_auto_increment();             (old)
>   virtual int reset_auto_increment(ulonglong value); (old)
>
> - - get_auto_increment() is used to tell the engine that it should reserve
>   a set of values for the table. The handler is free to reserve a
>   smaller number of values than what is requested.
>
>   Here follows the documentation for get_auto_increment:
>
>    SYNOPSIS
>     get_auto_increment()
>     offset
>     increment
>     nb_desired_values   how many values we want
>     first_value         (OUT) the first value reserved by the handler
>     nb_reserved_values  (OUT) how many values the handler reserved
>
>   offset and increment mean that we want values to be of the form
>   offset + N * increment, where N>=0 is an integer.
>   If the function sets *first_value to ~(ulonglong)0 it means an error.
>   If the function sets *nb_reserved_values to ULONGLONG_MAX it means it
>   has reserved to "positive infinite".
>
> - - release_auto_increment() is called to tell the engine that we are done
>   with the query and that it's free to release any unused auto_increment
>   values for other usage (if it can do that).
>
> NOTE: The handler interface for auto_increment is fully implemented in
>       5.1, but we have not yet updated the statement logging code to
>       log a set of intervals. This will be done shortly (in good time
>       before 5.1 is released as release candidate).
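
A minimal sketch of how an engine might honour the get_auto_increment()
contract above, under stated assumptions: counter_engine_sketch, its
next_free_ counter and its max_batch_ cap are hypothetical, the typedef
stands in for the server's ulonglong, and overflow near the top of the value
range is deliberately not handled.

```cpp
#include <cassert>

typedef unsigned long long ulonglong;  // stand-in for the server typedef

class counter_engine_sketch {
  ulonglong next_free_;  // smallest value not yet handed out
  ulonglong max_batch_;  // the engine may reserve fewer values than requested

public:
  counter_engine_sketch(ulonglong start, ulonglong max_batch)
      : next_free_(start), max_batch_(max_batch) {}

  void get_auto_increment(ulonglong offset, ulonglong increment,
                          ulonglong nb_desired_values,
                          ulonglong *first_value,
                          ulonglong *nb_reserved_values) {
    assert(increment > 0);
    // Smallest value >= next_free_ of the form offset + N * increment.
    ulonglong first = offset;
    if (next_free_ > offset) {
      ulonglong n = (next_free_ - offset + increment - 1) / increment;
      first = offset + n * increment;
    }
    // Reserve at most max_batch_ values, possibly fewer than desired.
    ulonglong count =
        nb_desired_values < max_batch_ ? nb_desired_values : max_batch_;
    *first_value = first;
    *nb_reserved_values = count;
    // Everything up to the last reserved value is now taken.
    if (count > 0) next_free_ = first + (count - 1) * increment + 1;
  }
};
```

Because the caller is told both the first value and how many were reserved,
it can ask again for a further range within the same statement when the
engine hands out fewer values than desired.
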
>
>
> 4) Overall cleanup
>
> Changes that require code changes in other storage engines:
> (Note that all changes are very straightforward and one should find all
> issues by compiling a --debug build and fixing all compiler errors and all
> asserts in field.cc while running the test suite.)
>
> - - New optional handler function introduced: reset()
>   This is called after every DML statement to make it easy for a handler
>   to do statement specific cleanups.
>   (The only case it's not called is if we force the file to be closed)
>
> - - handler::extra(HA_EXTRA_RESET) is removed. Code that was there before
>   should be moved to handler::reset()
>
> - - table->read_set contains a bitmap over all columns that are needed
>   in the query.  read_row() and similar functions only need to read these
>   columns.
> - - table->write_set contains a bitmap over all columns that will be
>   updated in the query. write_row() and update_row() only need to update
>   these columns.
>   The above bitmaps should now be up to date in all contexts
>   (including ALTER TABLE, filesort()).
>
>   The handler is informed of any changes to the bitmap after
>   fix_fields() by calling the virtual function
>   handler::column_bitmaps_signal(). If the handler does caching of
>   these bitmaps (instead of using table->read_set, table->write_set),
>   it should redo the caching in this code. As the signal() may be sent
>   several times, it's probably best to set a variable in the signal
>   and redo the caching on read_row() / write_row() if the variable was
>   set.
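
A minimal sketch of the "set a variable in the signal, redo the caching on
read" advice above. The types are hypothetical stand-ins (a std::vector<bool>
replaces the real column bitmap, and columns_to_read() stands in for the read
path); only the name column_bitmaps_signal() comes from the text.

```cpp
#include <cassert>
#include <vector>

struct table_sketch {
  std::vector<bool> read_set;  // one bit per column; stand-in for the bitmap
};

class caching_handler_sketch {
  table_sketch *table_;
  std::vector<unsigned> cached_columns_;  // engine-private copy of read_set
  bool bitmaps_changed_;
  unsigned rebuilds_;

public:
  explicit caching_handler_sketch(table_sketch *t)
      : table_(t), bitmaps_changed_(true), rebuilds_(0) {
    assert(t != nullptr);
  }

  // May be called several times per statement, so only note the change.
  void column_bitmaps_signal() { bitmaps_changed_ = true; }

  // Stand-in for the read path: rebuild the cache at most once per change.
  const std::vector<unsigned> &columns_to_read() {
    if (bitmaps_changed_) {
      cached_columns_.clear();
      for (unsigned i = 0; i < table_->read_set.size(); ++i)
        if (table_->read_set[i]) cached_columns_.push_back(i);
      bitmaps_changed_ = false;
      ++rebuilds_;
    }
    return cached_columns_;
  }

  unsigned rebuilds() const { return rebuilds_; }
};
```

Repeated signals cost one assignment each; the comparatively expensive
rebuild happens once, on the next row access.
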
>
> - - Removed the read_set and write_set bitmap objects from the handler
>   class
>
> - - Removed all column bit handling functions from the handler class.
>   (One now uses the normal bitmap functions in my_bitmap.c instead
>   of handler dedicated bitmap functions)
>
> - - field->query_id is removed. One should instead check
>   table->read_set and table->write_set if a field is used in the query.
>
> - - handler::extra(HA_EXTRA_RETRIEVE_ALL_COLS) and
>   handler::extra(HA_EXTRA_RETRIEVE_PRIMARY_KEY) are removed. One should
>   now instead use table->read_set to check for which columns to retrieve.
>
> - - If a handler needs to call Field->val() or Field->store() on columns
>   that are not used in the query, one should install a temporary
>   all-columns-used map while doing so. For this, we provide the following
>   functions:
>
>   my_bitmap_map *old_map= dbug_tmp_use_all_columns(table, table->read_set);
>   field->val();
>   dbug_tmp_restore_column_map(table->read_set, old_map);
>
>   and similar for the write map:
>
>   my_bitmap_map *old_map= dbug_tmp_use_all_columns(table, table->write_set);
>   field->val();
>   dbug_tmp_restore_column_map(table->write_set, old_map);
>
>   If this is not done, you will sooner or later hit a DBUG_ASSERT
>   in the field store() / val() functions.
>   (For non-DBUG binaries, dbug_tmp_use_all_columns() and
>   dbug_tmp_restore_column_map() are inline dummy functions and should
>   be optimized away by the compiler).
>
> - - If one needs to temporarily set the column map for all binaries (and
>   not just to avoid the DBUG_ASSERT() in the Field::store() / Field::val()
>   methods) one should use the functions tmp_use_all_columns() and
>   tmp_restore_column_map() instead of the above dbug_ variants.
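
A minimal sketch of the save / install-all-columns / restore pattern that
the functions above follow. This is a model, not the server code: the types
and the *_sketch helpers are hypothetical, a single machine word stands in
for the bitmap buffer, and read_column() imitates a Field accessor that
asserts on the read set.

```cpp
#include <cassert>

typedef unsigned long bitmap_word;  // stand-in for my_bitmap_map

struct bitmap_sketch {
  bitmap_word *bits;  // buffer currently installed for this map
};

struct tbl_sketch {
  bitmap_word all_set_bits;  // every column bit set
  bitmap_word read_bits;     // columns the statement actually reads
  bitmap_sketch read_set;
};

// Install the all-columns buffer and hand back the previous one.
inline bitmap_word *use_all_columns_sketch(tbl_sketch *t, bitmap_sketch *map) {
  bitmap_word *old = map->bits;
  map->bits = &t->all_set_bits;
  return old;
}

inline void restore_column_map_sketch(bitmap_sketch *map, bitmap_word *old) {
  map->bits = old;
}

inline bool column_is_set(const bitmap_sketch &map, unsigned col) {
  return ((*map.bits >> col) & 1UL) != 0;
}

// Imitates an accessor that asserts the column is marked for read.
int read_column(const tbl_sketch &t, const int *row, unsigned col) {
  assert(column_is_set(t.read_set, col));
  return row[col];
}

// Reads a column the statement did not mark, without tripping the assert.
int read_unmarked_column(tbl_sketch *t, const int *row, unsigned col) {
  bitmap_word *old = use_all_columns_sketch(t, &t->read_set);
  int value = read_column(*t, row, col);
  restore_column_map_sketch(&t->read_set, old);
  return value;
}
```

After the restore, the statement's own read set is back in place, so later
code still sees only the columns the query asked for.
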
>
> - - All 'status' fields in the handler base class (like records,
>   data_file_length etc) are now stored in a 'stats' struct. This makes
>   it easier to know what status variables are provided by the base
>   handler.  This requires some trivial variable name changes in the
>   extra() function.
>
> - - New virtual function handler::records().  This is called to optimize
>   COUNT(*) if (handler::table_flags() & HA_HAS_RECORDS) is true.
>   (stats.records is not supposed to be an exact value. It only has to
>   be 'reasonable enough' for the optimizer to be able to choose a good
>   optimization path).
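
A minimal sketch of the COUNT(*) decision this implies, also using the
HA_STATS_RECORDS_IS_EXACT flag listed further down in this message. The
handler_sketch type, the SKETCH_ constants and their bit values are
hypothetical; the real flag values are defined by the server.

```cpp
#include <cassert>

typedef unsigned long long ulonglong;  // stand-in for the server typedef

// Illustrative bit values only.
const ulonglong SKETCH_HA_HAS_RECORDS = 1ULL << 0;
const ulonglong SKETCH_HA_STATS_RECORDS_IS_EXACT = 1ULL << 1;

struct handler_sketch {
  ulonglong flags;
  ulonglong stats_records;  // an estimate unless the exact flag is set
  ulonglong exact_rows;     // what an exact count would return
  unsigned records_calls;   // how often records() was invoked

  ulonglong table_flags() const { return flags; }
  ulonglong records() {
    ++records_calls;
    return exact_rows;
  }
};

// Returns true and sets *count when COUNT(*) can be answered without a scan.
bool count_without_scan(handler_sketch *h, ulonglong *count) {
  assert(h != nullptr && count != nullptr);
  if (h->table_flags() & SKETCH_HA_STATS_RECORDS_IS_EXACT) {
    *count = h->stats_records;  // exact already, no records() call needed
    return true;
  }
  if (h->table_flags() & SKETCH_HA_HAS_RECORDS) {
    *count = h->records();
    return true;
  }
  return false;  // only an estimate is available; the rows must be counted
}
```
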
>
> - - Non-virtual handler::init() function added for caching of virtual
>   constants from the engine.
>
> - - Removed has_transactions() virtual method. Now one should instead
>   return HA_NO_TRANSACTIONS in table_flags() if the table handler DOES NOT
>   support transactions.
>
> - - The 'xxxx_create_handler()' function now has a MEM_ROOT argument
>   that is to be used with 'new handler_name()' to allocate the handler
>   in the right area.  The xxxx_create_handler() function is also
>   responsible for any initialization of the object before returning.
>
>   For example, one should change:
>
>   static handler *myisam_create_handler(TABLE_SHARE *table)
>   {
>     return new ha_myisam(table);
>   }
>
>   ->
>
>   static handler *myisam_create_handler(TABLE_SHARE *table,
>                                         MEM_ROOT *mem_root)
>   {
>     return new (mem_root) ha_myisam(table);
>   }
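
A minimal, self-contained sketch of what makes 'new (mem_root) ha_xxx(...)'
work: a class-level placement operator new that draws memory from an arena.
Everything here is a hypothetical stand-in (mem_root_sketch is a fixed
buffer, not the server's MEM_ROOT, and an int replaces TABLE_SHARE).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size arena standing in for MEM_ROOT.
struct mem_root_sketch {
  alignas(std::max_align_t) unsigned char buf[1024];
  std::size_t used;
};

void *alloc_root_sketch(mem_root_sketch *root, std::size_t size) {
  assert(root != nullptr);
  const std::size_t align = alignof(std::max_align_t);
  std::size_t start = (root->used + align - 1) / align * align;
  if (start + size > sizeof(root->buf)) return nullptr;  // arena exhausted
  root->used = start + size;
  return root->buf + start;
}

struct handler_base_sketch {
  virtual ~handler_base_sketch() {}
  virtual const char *engine_name() const = 0;

  // Lets `new (mem_root) T(...)` take memory from the arena. Declared
  // noexcept so that a null return yields a null pointer from `new`.
  static void *operator new(std::size_t size, mem_root_sketch *root) noexcept {
    return alloc_root_sketch(root, size);
  }
  // Matching placement delete, used only if a constructor throws.
  static void operator delete(void *, mem_root_sketch *) noexcept {}
  // Arena memory is released with the arena, never object by object.
  static void operator delete(void *) noexcept {}
};

struct ha_example_sketch : handler_base_sketch {
  int table_id;
  explicit ha_example_sketch(int id) : table_id(id) {}
  const char *engine_name() const override { return "example"; }
};

// Same shape as the xxxx_create_handler() function shown above.
handler_base_sketch *example_create_handler(int table_id,
                                            mem_root_sketch *mem_root) {
  return new (mem_root) ha_example_sketch(table_id);
}
```

The handler object then lives and dies with the arena it was allocated in,
which is why the create function must be handed the right one.
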
>
> - - New optional virtual function: use_hidden_primary_key().
>   This is called in case of an update/delete when
>   (table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_DELETE) is set
>   but we don't have a primary key. This allows the handler to take
>   precautions in remembering any hidden primary key to be able to
>   update/delete any found row. The default handler marks all columns
>   to be read.
>
> - - handler::table_flags() now returns a ulonglong (to allow for more
>   flags).
>
> - - New/changed table_flags()
>   - HA_HAS_RECORDS
>       Set if ::records() is supported.
>   - HA_NO_TRANSACTIONS
>       Set if the engine doesn't support transactions.
>   - HA_PRIMARY_KEY_REQUIRED_FOR_DELETE
>       Set if we should mark all primary key columns for read when
>       reading rows as part of a DELETE statement. If there is no
>       primary key, all columns are marked for read.
>   - HA_PARTIAL_COLUMN_READ
>       Set if the engine will not read all columns in some cases
>       (based on table->read_set).
>   - HA_PRIMARY_KEY_ALLOW_RANDOM_ACCESS
>       Renamed to HA_PRIMARY_KEY_REQUIRED_FOR_POSITION.
>   - HA_DUPP_POS
>       Renamed to HA_DUPLICATE_POS.
>   - HA_REQUIRES_KEY_COLUMNS_FOR_DELETE
>       Set this if we should mark ALL key columns for read when
>       reading rows as part of a DELETE statement. In case of an
>       update we will mark all keys for read for which a key part
>       changed value.
>   - HA_STATS_RECORDS_IS_EXACT
>       Set this if stats.records is exact. (This saves us some extra
>       records() calls when optimizing COUNT(*).)
>
>
> - - Removed table_flags()
>   - HA_NOT_EXACT_COUNT
>       Now one should instead use HA_HAS_RECORDS if
>       handler::records() gives an exact count() and
>       HA_STATS_RECORDS_IS_EXACT if stats.records is exact.
>   - HA_READ_RND_SAME
>       Removed (no one supported this one).
>
> - - Removed the no longer needed functions ha_retrieve_all_cols() and
>   ha_retrieve_all_pk()
>
> - - Renamed handler::dupp_pos to handler::dup_pos
>
> - - Removed the unused variable handler::sortkey
>
>
> Upper level handler changes:
>
> - - ha_reset() now does some overall checks and calls ::reset()
> - - ha_table_flags() added. This is a cached version of table_flags(). The
>   cache is set at engine creation time and updated on open.
>
>
> MySQL level changes (not obvious from the above):
>
> - - DBUG_ASSERT() added to check that column usage matches what is set
>   in the column usage bit maps. (This found a LOT of bugs in current
>   column marking code).
>
> - - Previously in 5.1, all used columns were marked in read_set and only
>   updated columns were marked in write_set. Now we only mark columns for
>   which we need a value in read_set.
>
> - - Column bitmaps are created in open_binary_frm() and
>   open_table_from_share(). (Before this was in table.cc)
>
> - - handler::table_flags() calls are replaced with
>   handler::ha_table_flags()
>
> - - For calling field->val() you must have the corresponding bit set in
>   table->read_set. For calling field->store() you must have the
>   corresponding bit set in table->write_set. (There are asserts in
>   all store()/val() functions to catch wrong usage)
>
> - - thd->set_query_id is renamed to thd->mark_used_columns and instead
>   of being set to an integer value, it now takes the values:
>   MARK_COLUMNS_NONE, MARK_COLUMNS_READ, MARK_COLUMNS_WRITE
>   Also changed all variables named 'set_query_id' to mark_used_columns.
>
> - - In filesort() we now inform the handler of exactly which columns are
>   needed for doing the sort and choosing the rows.
>
> - - The TABLE_SHARE object has an 'all_set' column bitmap one can use
>   when one needs a column bitmap with all columns set.
>   (This is used for table->use_all_columns() and other places)
>
> - - The TABLE object has 3 column bitmaps:
>   - def_read_set     Default bitmap for columns to be read
>   - def_write_set    Default bitmap for columns to be written
>   - tmp_set          Can be used as a temporary bitmap when needed.
>   The table object also has two pointers to bitmaps, read_set and
>   write_set, that the handler should use to find out which columns are
>   used in which way.
>
> - - count() optimization now calls handler::records() instead of using
>   handler->stats.records (if (table_flags() & HA_HAS_RECORDS) is true).
>
> - - Added extra argument to Item::walk() to indicate if we should also
>   traverse sub queries.
>
> - - Added TABLE parameter to cp_buffer_from_ref()
>
> - - Don't close tables created with CREATE ... SELECT but keep them in
>   the table cache. (Faster usage of newly created tables).
>
>
> New interfaces:
>
> - - table->clear_column_bitmaps() to initialize the bitmaps for tables
>   at start of new statements.
>
> - - table->column_bitmaps_set() to set up new column bitmaps and signal
>   the handler about this.
>
> - - table->column_bitmaps_set_no_signal() for the few cases where we need
>   to set up new column bitmaps but don't signal the handler (as the handler
>   has already been signaled about these before). Used for the moment
>   only in opt_range.cc when doing ROR scans.
>
> - - table->use_all_columns() to install a bitmap where all columns are
>   marked as used in the read and the write set.
>
> - - table->default_column_bitmaps() to install the normal read and write
>   column bitmaps, but not signaling the handler about this.
>   This is mainly used when creating TABLE instances.
>
> - - table->mark_columns_needed_for_delete(),
>   table->mark_columns_needed_for_update() and
>   table->mark_columns_needed_for_insert() to allow us to put additional
>   columns in column usage maps if the handler so requires.
>   (The handler indicates what it needs in handler->table_flags())
>
> - - table->prepare_for_position() to allow us to tell handler that it
>   needs to read primary key parts to be able to store them in
>   future table->position() calls.
>   (This replaces the table->file->ha_retrieve_all_pk function)
>
> - - table->mark_auto_increment_column() to tell the handler that we are
>   going to update columns that are part of any auto_increment key.
>
> - - table->mark_columns_used_by_index() to mark all columns that are part
>   of an index.  It will also send extra(HA_EXTRA_KEYREAD) to the handler to
>   allow it to quickly know that it only needs to read columns that are
>   part of the key.  (The handler can also use the column map for detecting
>   this, but a simpler/faster handler can just monitor the extra() call).
>
> - - table->mark_columns_used_by_index_no_reset() to, in addition to other
>   columns, also mark all columns that are used by the given key.
>
> - - table->restore_column_maps_after_mark_index() to restore to default
>   column maps after a call to table->mark_columns_used_by_index().
>
> - - New item function register_field_in_read_map(), for marking used
>   columns in table->read_set. Used by filesort() to mark all used columns.
>
> - - Maintain in TABLE->merge_keys the set of all keys that are used in the
>   query. (Simplifies some optimization loops)
>
> - - Maintain Field->part_of_key_not_clustered, which is like
>   Field->part_of_key except that a field in the clustered key is not
>   assumed to be part of all indexes.
>   (used in opt_range.cc for faster loops)
>
> - - dbug_tmp_use_all_columns(), dbug_tmp_restore_column_map(),
>   tmp_use_all_columns() and tmp_restore_column_map() functions to
>   temporarily mark all columns as usable.  The 'dbug_' version is
>   primarily intended for use inside a handler when it just wants to call
>   the Field::store() & Field::val() functions, but doesn't need the column
>   maps set for any other usage.
>   (i.e. bitmap_is_set() is never called)
>
> - - We can't use compare_records() to skip updates for handlers that
>   return a partial column set and the read_set doesn't cover all columns
>   in the write set. The reason for this is that if we have a column
>   marked only for write, we can't know at the MySQL level whether the
>   value changed or not. The reason this worked before was that MySQL
>   marked all to-be-written columns as also to be read. The new 'optimal'
>   bitmaps exposed this 'hidden bug'.
>
> - - open_table_from_share() no longer sets up a temporary MEM_ROOT
>   object as a thread specific variable for the handler. Instead we
>   send the to-be-used MEM_ROOT to get_new_handler().
>   (Simpler, faster code)
>
>
>
> Bugs fixed:
>
> - - Column marking was not done correctly in a lot of cases
>   (ALTER TABLE, when using triggers, auto_increment fields etc).
>   (This could potentially result in wrong values being inserted in table
>   handlers relying on the old column maps or field->set_query_id being
>   correct.)
>   Especially when it comes to triggers, there may be cases where the
>   old code would cause lost/wrong values for NDB and/or InnoDB tables.
>
> - - Split thd->options flag OPTION_STATUS_NO_TRANS_UPDATE to two flags:
>   OPTION_STATUS_NO_TRANS_UPDATE and OPTION_KEEP_LOG.
>   This allowed me to remove some wrong warnings about:
>   "Some non-transactional changed tables couldn't be rolled back"
>
> - - Fixed handling of INSERT .. SELECT and CREATE ... SELECT that wrongly
>   reset (thd->options & OPTION_STATUS_NO_TRANS_UPDATE), which caused us to
>   lose some warnings about
>   "Some non-transactional changed tables couldn't be rolled back"
>
> - - Fixed use of uninitialized memory in ha_ndbcluster.cc::delete_table()
>   which could cause delete_table to report random failures.
>
> - - Fixed core dumps for some tests when running with --debug
>
> - - Added missing FN_LIBCHAR in mysql_rm_tmp_tables()
>   (This has probably caused us to not properly remove temporary files
>   after crash)
>
> - - slow_logs was not properly initialized, which could maybe cause
>   extra/lost entries in slow log.
>
> - - If we get a duplicate row on insert, change column map to read and
>   write all columns while retrying the operation. This is required by
>   the definition of REPLACE and also ensures that fields that are only
>   part of UPDATE are properly handled.  This fixed a bug in NDB and
>   REPLACE where REPLACE wrongly copied some column values from the
>   replaced row.
>
> - - For table handlers that don't support NULL in keys, we would give an
>   error when creating a primary key with NULL fields, even after the
>   fields had been automatically converted to NOT NULL.
>
> - - Creating a primary key on a SPATIAL key would fail if the field was not
>   declared as NOT NULL.
>
>
> Cleanups:
>
> - - Removed the unused condition argument to setup_tables
>
> - - Removed the no longer needed item function reset_query_id_processor().
>
> - - Field->add_index is removed. Now this is instead maintained in
>   (field->flags & FIELD_IN_ADD_INDEX)
>
> - - Field->fieldnr is removed (use field->field_index instead)
>
> - - New argument to filesort() to indicate that it should return a set of
>   row pointers (not used columns). This allowed me to remove some
>   references to sql_command in filesort and should also enable us to
>   return column results in some cases where we couldn't before.
>
> - - Changed column bitmap handling in opt_range.cc to be aligned with the
>   TABLE bitmap, which allowed me to use bitmap functions instead of looping
>   over all fields to create some needed bitmaps. (Faster and smaller code)
>
> - - Broke up overly long lines where found
>
> - - Moved some variable declarations to the start of functions for better
>   code readability.
>
> - - Removed some not used arguments from functions.
>   (setup_fields(), mysql_prepare_insert_check_table())
>
> - - setup_fields() now takes an enum instead of an int for marking column
>   usage.
>
> - - For internal temporary tables, use handler::write_row(),
>   handler::delete_row() and handler::update_row() instead of
>   handler::ha_xxxx() for faster execution.
>
> - - Changed some constants to enum's and define's.
>
> - - Using separate column read and write sets allows for easier checking
>   of whether a timestamp field was set by the statement.
>
> - - Removed calls to free_io_cache() as this is now done automatically in
>   ha_reset()
>
> - - Don't build table->normalized_path as this is now identical to
>   table->path (after bar's fixes to convert filenames)
>
> - - Fixed some missed DBUG_PRINT(.."%lx") to use "0x%lx" to make it easier
>   to do comparison with the 'convert-dbug-for-diff' tool.
>
>
> Things left to do in 5.1:
>
> - - We wrongly log failed CREATE TABLE ... SELECT in some cases when using
>   row based logging (as shown by testcase
>   binlog_row_mix_innodb_myisam.result).
>   Mats has promised to look into this.
>
> - - Test that my fix for CREATE TABLE ... SELECT is indeed correct.
>   (I added several test cases for this, but in this case it's better that
>   someone else also tests this thoroughly).
>   Lars has promised to do this.
>
> - - Update worklog #WL3281 with things that were not done in this cleanup.
>   What should still be done in 5.1 is:
>
>  - Add a hash of all tables used by any active transaction. We need
>    this to not drop or rename a table or database while an object is
>    still in use.  (Currently we only detect if a table is in use by a
>    running statement)
>
>  - Maybe: New interface of index_read() to be able to handle bitmap
>    indexes (Need to change length of key to bitmap to say which key parts
>    are available)
>
> - -------------
>
> Regards,
> Monty
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2 (GNU/Linux)
> Comment: For info see http://quantumlab.net/pine_privacy_guard/
>
> iD4DBQFEkVjnSVDhKrJykfIRAmvBAJ4yf6kdFrwIPzyLfhlGWOrZ7Ri8jwCXW4a0
> JIuMk+u1Pmk16ooQobnB6Q==
> =+jWG
> -----END PGP SIGNATURE-----
> MySql-DDE discussion list
> www.freelists.org/
>
