Migrate - Introduction



Introduction

What is Data Migration?

Welcome to Migrate, Adempiere's universal migration tool for upgrading, transferring, and converting databases.

“Migrating” means moving from one place to another. Specifically for databases, “migrating data” can have either of the following meanings:

  1. Transferring

    The process of transferring data between storage types or computer systems, such as copying data from a hard disk to floppies, or from one server to another. This is commonly referred to as Copying, Transferring, Moving, or Replicating.

  2. Converting

    The process of converting data from one format or system to another. For example, if your company changes its database system from a proprietary vendor to an open source alternative, the data needs to be manipulated to fit into the new database's format. This is commonly referred to as Converting or Translating.

  3. Upgrading

    The process of upgrading a database's structure to enable new or different functionality. Newer software versions may have introduced new functionality or bug fixes which require a different database structure than was available in previous versions. In such cases, your database needs to be adjusted to the new structure so that it can be correctly utilized by the new software version. This is commonly referred to as version migration or upgrading.

Migrate can do all three types of migration, therefore we call it a Universal Migration Tool.

You can use Migrate for the following tasks:

  • converting your database from Oracle to PostgreSQL

  • converting your database from use by Compiere to use by Adempiere

  • upgrading your database for use by a different Adempiere version

History

Before Adempiere forked from the Compiere project, version migration was available to Compiere users for a fee. The user had to load the newest reference database, which was distributed with the Compiere software package, and then start a closed-source proprietary migration program. This program would check the license validity and download SQL scripts from Compiere's web server to correctly upgrade the live database by copying the reference database's structure. All this was done through a graphical user interface which was straightforward and worked very well, but it had one disadvantage (apart from the obvious cost factor and being closed-source): it was not very flexible.

This inflexibility is inherent in the nature of scripts: they run a number of commands in sequence to get from origin A to target B; it is not possible to get to a target C or D. For Compiere this meant that it was only possible to upgrade from older versions to the newest version, not to a version in between, nor to downgrade to a lower version. You had to load the newest reference database to work with the newest scripts. As a consequence, you were forced to do a full upgrade every time, pulling in new bugs and experimental features along the way, which was not ideal for business environments.

Compiere's proprietary upgrade service

Since Compiere's version migration was proprietary, it was not included in the code base from which Adempiere forked, and a new solution had to be found quickly to be able to do any version migration at all. Karsten Thiemann programmed a nice little tool called DBDifference, which would generate SQL scripts based on the structural differences between the reference and target databases. The SQL scripts would then be applied manually to upgrade the target database.

As the user is actively involved in SQL-script generation and can review and edit the scripts before they are applied to the target database, this approach offers much more flexibility and control than Compiere's solution did. But for the casual database user the task was daunting, and real-world implementations with numerous extensions and customizations confused DBDifference's logic, requiring heavy interventions which were not always feasible. DBDifference also relied mainly on the reference database's design, without giving much thought to the contents of the Application Dictionary, a store of meta-data and rules which defines how Adempiere uses the data and where most customizations are defined.

Once your data reflected the structure required by an Adempiere release, things got easier because you could use scripts pre-generated by the Adempiere team (if you took good care of your customizations), but getting to that point was a major task.

Adempiere's script-based solution required massive user intervention

To make things worse, with Adempiere you had the choice of using PostgreSQL, a free and open-source database system. So if you previously used a proprietary database system, you had to do a conversion migration to translate your data to PostgreSQL. Another set of tools (DDLUtils) was used for this purpose, also requiring heavy user intervention.

So although the migration tools introduced by Adempiere were very flexible and in many cases proved workable, they lacked the ease of use old hands were accustomed to from Compiere's version migration tool. While very suitable for Adempiere's application developers maintaining the seed database, they are a bit challenging for the average Adempiere user.

Migrate solves these disadvantages by providing a graphical user interface, which makes it easy to use for the uninitiated, and by giving up on the script concept entirely in favor of algorithms. Migrate also checks the live database against a reference database, but its algorithms make heavy use of the meta-data available in Adempiere's Application Dictionary and are thus aware of any customizations and extensions. Any changes to the live database are made directly; no scripts are generated or need to be applied.

Migrate simplifies automated migration using algorithms instead of scripts

Functionality

Transfer Mode

Migrate reads the structure and data from a source database and writes them into a target database. In its simplest form, this corresponds to what we previously described as Transfer Migration.

You can therefore use Migrate to transfer or copy a database from server A to server B, though this is not recommended. That kind of migration is very straightforward and does not require any overhead logic, and the tools provided by your database vendor (exp and imp for Oracle, pg_dump and pg_restore for PostgreSQL) are much more suitable and extremely efficient. Migrate is much too slow and bulky for this task.

However, Migrate comes in handy if the source and target run on different database vendors, for example if you want to transfer your data from Oracle to PostgreSQL. This is what we previously described as Conversion Migration.

In this case, Migrate reads content from the source database, translates it to a format understood by the target database, and then writes it to the target.

Note that although Migrate attempts to correctly translate content to the target's format, this is not always possible. Converting data types and indexes is relatively safe, converting views is a bit more difficult, and translating functions and procedural languages, such as from PL/SQL to PL/pgSQL, is virtually impossible without programming a full-fledged language interpreter. Consequently, the user is given warning messages to check views that have been translated, while the translation of functions is currently not implemented at all.
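To illustrate what translating content to the target's format means for the safest case, data types, here is a toy sketch. The mapping table and function name are invented for illustration; Migrate's real translation logic is far more complete.

```python
# Hypothetical sketch of Oracle-to-PostgreSQL data type translation.
# This mapping is illustrative only, not Migrate's actual table.
ORACLE_TO_POSTGRES = {
    "NUMBER":    "numeric",
    "VARCHAR2":  "varchar",
    "NVARCHAR2": "varchar",
    "DATE":      "timestamp",
    "CLOB":      "text",
    "BLOB":      "bytea",
}

def translate_column_type(oracle_type, size=None):
    """Translate an Oracle column type to a PostgreSQL equivalent (sketch)."""
    pg_type = ORACLE_TO_POSTGRES.get(oracle_type.upper())
    if pg_type is None:
        raise ValueError(f"no translation known for {oracle_type}")
    # Length-constrained types keep their size, e.g. VARCHAR2(60) -> varchar(60).
    if size is not None and pg_type in ("varchar", "numeric"):
        return f"{pg_type}({size})"
    return pg_type

print(translate_column_type("VARCHAR2", 60))  # varchar(60)
```

A table-driven lookup like this works for types; views need pattern rewriting, and functions, as noted above, would need a full language interpreter.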

In both of the above cases, data is read from the source and a new target is created, or an existing target is overwritten, to contain the source's data. The only difference is whether the source and target vendors differ. In Migrate, this kind of migration is called “Transfer Mode”.

Upgrade Mode

Things get more interesting if the target is not overwritten, but source data is instead merged into existing target data: the table structure of your live data in the target is modified to reflect the structure provided as reference by the source. Data records missing in the target are added from the source. Views and functions defined in the target are replaced by those defined in the source. So if a new Adempiere version requires new tables, views, or functions, that functionality is copied into your live data from the source database. We therefore call this kind of migration “Upgrade Mode”; the source is the reference database and the target is your live database.

This version migration is the mode that will be used most often.

Note that version migration only refers to Adempiere versions, not versions of the database engine. Your database vendor provides tools to upgrade the database version, if necessary. Normally this too is achieved very efficiently by exporting (or dumping) the data, installing the new database version, and then importing (or restoring) from the dump file.
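The essential difference between the two modes can be caricatured in a few lines. This is a deliberately naive sketch that treats a database as a dict of records; Migrate's real logic is far more involved.

```python
def migrate(source, target, mode):
    """Toy model of Migrate's two modes on dict-shaped 'databases'."""
    if mode == "transfer":
        # Transfer Mode: the target is created anew or overwritten.
        return dict(source)
    if mode == "upgrade":
        # Upgrade Mode: live target data is kept and the reference
        # (source) structure and records are merged into it.
        merged = dict(target)
        merged.update(source)
        return merged
    raise ValueError(f"unknown mode: {mode}")

reference = {"AD_Table": "new definition"}
live = {"AD_Table": "old definition", "XX_MyView": "customization"}
```

In Transfer Mode the customization `XX_MyView` would be gone; in Upgrade Mode it survives while `AD_Table` is brought up to the reference definition.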

Putting it all Together

Say you are currently running Compiere on an Oracle database, and you want to change over to Adempiere on a PostgreSQL database. You would do this migration in two steps (each step takes approximately 3-5 hours, depending on the size of your live database):

First, you would transfer your data from Oracle to PostgreSQL. Migrate takes care that all data types are correctly translated and moves the data. All tables, indexes, sequences, foreign keys, etc. are applied in the target database. An attempt is made to translate views. Functions are commented out (so that you can review the original code) and replaced with compilable stubs.

This translation is intended as a one-way step. If you try to translate back and forth between database vendors, you will eventually end up with gibberish.

As a second step, you would load the reference database and run a version migration. Now the views and functions are replaced by those defined in the reference database, so only your custom views need to be checked and your custom functions translated manually.

When done, you are ready to use Adempiere running on PostgreSQL. From now on, you will only require version migrations each time you upgrade Adempiere, and they will run significantly faster.

Using Migrate for other applications

You may wonder whether Migrate can also be used for non-Adempiere databases. The more complex your database is, the less likely Migrate will be able to handle it correctly.

Simple transfer migrations (conversions between Oracle and PostgreSQL) should work pretty well. No problems are expected for tables, sequences, primary keys, indexes[1], check constraints, unique constraints, not-null constraints, and foreign keys.

Views and triggers will probably be translated correctly, but should be checked for safety's sake. Efforts are made to translate operators, but the possibilities are limited because some databases use words as operator names (“plus”, “minus”, etc.) and others use symbols (“+”, “-”, …). Translation of functions is not implemented yet and has to be done manually. In addition, complications arise if you use overloaded function and operator names but the target does not allow overloading[2].

If you do not have a reference database against which you could do a subsequent version migration, you will need to do a lot of manual checking and verifying to confirm that the conversion was successful.

Upgrade migrations will work, but you will lose all tables, columns, views, etc. which are not defined in the reference database. That is because Migrate will not be able to look up meta-data from the Adempiere-style application dictionary it requires to handle customizations correctly. Table names are also assumed to follow Adempiere conventions, which may lead to loss of data. For example, tables starting with “T_…” are considered to contain temporary data and will be dropped.

That being said, migration of Compiere databases should work pretty well, as Adempiere was forked off the Compiere project and both applications therefore use the same application dictionary format and table names.

Process Description

Migrate performs the following steps to run a migration:

Connect to Databases

Migrate uses JDBC to connect to the source and target databases.

If conducting a transfer migration, any existing data in the target database is erased.

Load Meta-Data

As a first step, some tests are made to detect and correct buggy behavior in some JDBC drivers.

Meta-data on the database's structure (tables, indexes, views, functions, sequences, foreign keys, etc.) is loaded.

The Application Dictionary is accessed to gather information on customizations, system clients, and languages used.

Structural Migration

To get rid of overhead, Migrate first removes all non-table database objects from the target database:

  • check constraints

  • unique constraints

  • foreign keys

  • views

  • operators

  • triggers

  • functions

  • primary keys[3]

  • indexes[3]

With the database reduced to this state, Migrate can pretty much do whatever it wants without running into constraint issues or being slowed down by integrity checks.

Then transient data is removed to reduce the amount of data that needs to be migrated and thus increase performance:

  • Data from temporary tables (T_…) is removed

  • Records from Import tables (I_…) which have already been imported are removed

  • Records from the TEST table (Test) are removed

  • Processes and Errors are removed (AD_PInstance, AD_Find, AD_Error)

  • Changes which are not customizations are removed (AD_ChangeLog)

  • Sessions older than a week are removed (AD_Session)

  • Notes which have been processed are removed (AD_Note)

  • Log entries older than a week are removed (…Log)
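The pruning rules above follow Adempiere's table-naming conventions. A sketch of the decision logic (illustrative only; the retention rules are paraphrased as strings, and the real implementation issues the corresponding DELETE statements):

```python
def prune_action(table_name):
    """Which records of a table are purged before migration (sketch)."""
    if table_name.startswith("T_"):
        return "all rows"                       # temporary data
    if table_name.startswith("I_"):
        return "rows already imported"          # import staging tables
    if table_name in ("Test", "AD_PInstance", "AD_Find", "AD_Error"):
        return "all rows"                       # test, process, and error data
    if table_name == "AD_ChangeLog":
        return "rows that are not customizations"
    if table_name == "AD_Session":
        return "rows older than one week"
    if table_name == "AD_Note":
        return "rows already processed"
    if table_name.endswith("Log"):
        return "rows older than one week"       # generic ...Log tables
    return "keep"
```

Note that the specific `AD_ChangeLog` rule must be checked before the generic `...Log` rule, since change-log customizations have to survive the purge.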

The GardenWorld demonstration client is dropped, and all system records which are not referenced by real clients are purged.

Any sequences defined in the target are synchronized with the reference database, and sequences which are not yet defined are added.

Finally, the main structural migration task of synchronizing the target's table structure starts:

  • Non-customized tables are dropped from the target if they do not exist in the reference database

  • Tables existing only in the reference database are added to the target

  • Tables existing in both the target and the reference database are synchronized:

    • Target tables are renamed to have the same name as their counterparts in the reference database[4]

    • Non-customized columns are dropped from the target if they do not exist in the reference database

    • Columns existing only in the reference database are added to the target

    • Columns existing in both the target and the reference database are synchronized so that the target column has the same properties as the column in the reference database:

      • column name

      • data type and size

      • default values

      • nullable constraint

After table synchronization, any non-customized sequences are dropped from the target if they do not exist in the reference database.
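The table-level part of this synchronization is essentially a three-way set diff. A sketch (the helper name and its `customized` flag set are invented for illustration; in Migrate the customization flags come from the Application Dictionary):

```python
def diff_tables(reference, target, customized):
    """Classify target tables against the reference database (sketch).

    Tables named in `customized` are never dropped, even when they are
    absent from the reference database.
    """
    ref, tgt = set(reference), set(target)
    drop = {t for t in tgt - ref if t not in customized}  # obsolete, non-custom
    add = ref - tgt                                       # new in the reference
    sync = ref & tgt                                      # present in both
    return drop, add, sync

drop, add, sync = diff_tables(
    reference={"AD_Table", "C_NewDoc"},
    target={"AD_Table", "Old_Tmp", "XX_MyExt"},
    customized={"XX_MyExt"},
)
```

The same drop/add/synchronize classification is then repeated one level down, for the columns of every table in the `sync` set.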

Database objects are recreated – all objects existing in the reference database are created in the target, and those target objects which are customizations are re-created:

  • functions

  • triggers

  • operators

  • views

  • indexes[5]

  • primary keys[5]

Data Migration

Data records are transferred from the reference database to the target:

  • if the record does not yet exist in the target, it is added.

  • if the record already exists in the target, the target record is updated to contain the same data in all columns as the reference database.
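The add-or-update rule above is a classic upsert. A runnable sketch using SQLite as a stand-in for the real source and target (Migrate itself talks to Oracle or PostgreSQL over JDBC; the table and column names are invented):

```python
import sqlite3

# Stand-in source (reference) and target (live) databases.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE ad_element (id INTEGER PRIMARY KEY, name TEXT)")

src.executemany("INSERT INTO ad_element VALUES (?, ?)",
                [(1, "DocumentNo"), (2, "Description")])
tgt.execute("INSERT INTO ad_element VALUES (1, 'OldName')")

# Upsert every reference record into the target: insert when the id is
# missing, overwrite the columns when the id already exists.
for rec_id, name in src.execute("SELECT id, name FROM ad_element"):
    tgt.execute(
        "INSERT INTO ad_element (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        (rec_id, name),
    )
rows = sorted(tgt.execute("SELECT id, name FROM ad_element"))
```

After the loop, record 1 carries the reference data and record 2 has been added, which is exactly the behavior described above.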

New parent tables are populated[6] (only for upgrade migrations). If new tables added to the target use previously independent tables as child tables, records must be added to the parent table to reflect the data already existing in the child tables.

Parent links are preserved (only for upgrade migrations). If a target table did not contain a column which is used as part of a foreign key constraint in the reference database, that column will have been added with a default value which does not reference any parent record. The correct parent must be found and the default value replaced with a link to the parent record.
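As an illustration of this parent-link repair (field names invented; in Migrate the lookup follows the foreign-key definition in the reference database):

```python
def relink_parents(children, parents, placeholder=0):
    """Replace placeholder parent ids with links to the real parent record."""
    by_key = {p["key"]: p["id"] for p in parents}
    for child in children:
        if child["parent_id"] == placeholder:
            # Look the real parent up by a shared business key; leave the
            # placeholder in place if no parent matches.
            child["parent_id"] = by_key.get(child["key"], placeholder)
    return children

parents = [{"id": 7, "key": "GL"}]
children = [{"id": 10, "key": "GL", "parent_id": 0}]
fixed = relink_parents(children, parents)
```

Here the new foreign-key column `parent_id` was added with the non-referencing default 0, and the repair step points it at the matching parent record.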

Orphaned data is removed (only for upgrade migrations). Records whose parent records have been purged during migration are orphans which are no longer required and must be deleted.

Check constraints are enforced (only for upgrade migrations). Records containing values which would violate a check constraint are modified to comply with the constraint.

Cleanup

Cleanup operations are performed only for upgrade migrations:

Customizations are re-applied. Users may modify windows and processes in Adempiere, but those modifications would be overwritten and reset by the migration process. Modifications which should be preserved can be marked as customization in the change log, and they will be re-applied.

Sequence counters are checked to ensure that the next number is larger than any number already used in the database. Missing sequence counters are added (both sequence counters defined in the application dictionary and native database sequence counters).
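This sequence check reduces to a small invariant, sketched below (hypothetical helper name):

```python
def fix_sequence(current_next, used_ids):
    """Ensure a sequence's next number exceeds every id already in use."""
    highest = max(used_ids, default=0)
    # A sequence is healthy if its next number is strictly greater than the
    # highest id in the table; otherwise bump it just past that id.
    return current_next if current_next > highest else highest + 1
```

For example, a counter standing at 1000 over a table whose highest id is 1005 would be corrected to 1006, preventing duplicate-key errors on the next insert.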

Missing translations are added. If translation records are required but do not exist yet, they are added with the original text from the main record.

Terminology is synchronized:

  • New elements are created in the application dictionary for any columns or parameters which have no base element defined yet.

  • Unused elements are deleted

  • Consistent terminology is deployed throughout the application dictionary

Trees are re-organized so that customized nodes are inserted back into their original locations.

Security settings are verified and role access records updated or added.

Version information stored in the application dictionary is updated.

Enforce Constraints

Constraints are recreated – all constraints existing in the reference database are created in the target, and those target constraints which are customizations are re-created:

  • foreign keys

  • check constraints

  • unique constraints

Close Database Connections

The source connection is closed and, if appropriate, the reference database is dropped.

Any remaining changes are committed to the target and the target connection is closed. If requested, the live database will be optimized.



[1] Success will vary depending on the complexity of your database. For example, Migrate will have no problem converting your indexes, unless they are function-based indexes, because Migrate cannot yet translate functions.

[2] Function overloading, also called polymorphism, is when you have multiple functions by the same name which are distinguished by different types or numbers of arguments.

[3] For performance reasons, primary keys and indexes are actually dropped at a later stage, and also temporary indexes are created and later dropped again during the migration process. These performance enhancements do not affect the functionality of the actual migration process and are omitted in this description for simplicity's sake.

[4] This feature is not implemented yet.

[5] For performance reasons, indexes and primary keys are actually recreated at a later stage after data migration.

[6] This feature is not implemented yet.