Difference between revisions of "A Step by Step Guide to Data Migration with Talend ETL"

From ADempiere
Jump to: navigation, search
This Wiki is read-only for reference purposes to avoid broken links.
m
m (Overview)
Line 5: Line 5:
 
= Overview =
 
= Overview =
  
In ERP Implementation, one of the major activity is Data Migration. This task is no doubt, tedious and time consuming, and no one want to work on it. I have been looking for tools to help in this task since I start working with ADempiere, I.e., Excel, ADempiere's Data Import Tools, or even 2Pack. And yes, we provide templates and hoping the customers will fill them in correctly, so that we can load them using ADempiere's Data Import Tools. But in reality, most data from customers are far from usable. And most of the time, we need to modify or rewrite the whole import process by ourselves.  
+
In ERP Implementation, one of the major activity is Data Migration. This task is undoubtedly, tedious and time consuming, yet very important one. I have been exploring for tools to help in this boring task since I start working with ADempiere. For instance, Excel, ADempiere own Data Import Tools, or even with PackOut.
  
I have been thinking about tools like ETL (i.e., Pentaho Kettle, etc.) to ease the process for a while. While most use case, are direct import to tables. Without ability to call ADempiere API to importing data is not very useful.
+
While ADempiere own import module seem to be the best as it can directly call the underlining API to make a valid object, it is not very flexible. And yes, to make it fit, we also provide templates and hoping the customers will fill them correctly. But in reality, most data from customers are far from usable. And most of the time, we need to rewrite the import process by ourselves in order to fit with the submitted data.
  
Until very recently, I just found that Talend has been used for ADempiere with ability to call ADempiere API some time back at [[Talend Open Studio]]. Later is better than never. So far, it is the best tools to work with ADempiere I have found. And in this article, I will try to help explaining how to use Talend with Adempiere in a bit more detail.
+
I have been thinking about tools like ETL in order to ease the process. Though the migration between DB to DB is easy, but without ability to call ADempiere's API to importing data is not very useful.
 +
 
 +
Until very recently, I just found that Talend has been used for ADempiere with ability to call ADempiere API for sometime [[Talend Open Studio]] (gosh!, where have I been). But Late is better than never. So far, it has been the best tools to work with ADempiere in my opinion. And in this article, I will try to help explaining how to use Talend with Adempiere in a bit more detail.
  
 
= Start with Talend and ADempiere =
 
= Start with Talend and ADempiere =

Revision as of 05:20, 29 May 2012

Note.gif Note:

DISCLAIMER - This is an implementation guide written by Kittiu, from ecosoft Bangkok, Thailand. It is not an official guide nor meant to be comprehensive yet. The knowledge from this article is based on other knowledge, other contributors are welcome to discuss on improving it.

Status: Draft

Overview

In ERP Implementation, one of the major activity is Data Migration. This task is undoubtedly, tedious and time consuming, yet very important one. I have been exploring for tools to help in this boring task since I start working with ADempiere. For instance, Excel, ADempiere own Data Import Tools, or even with PackOut.

While ADempiere own import module seem to be the best as it can directly call the underlining API to make a valid object, it is not very flexible. And yes, to make it fit, we also provide templates and hoping the customers will fill them correctly. But in reality, most data from customers are far from usable. And most of the time, we need to rewrite the import process by ourselves in order to fit with the submitted data.

I have been thinking about tools like ETL in order to ease the process. Though the migration between DB to DB is easy, but without ability to call ADempiere's API to importing data is not very useful.

Until very recently, I just found that Talend has been used for ADempiere with ability to call ADempiere API for sometime Talend Open Studio (gosh!, where have I been). But Late is better than never. So far, it has been the best tools to work with ADempiere in my opinion. And in this article, I will try to help explaining how to use Talend with Adempiere in a bit more detail.

Start with Talend and ADempiere

The good thing about Talend that make it useful for our cases (ADempiere), is ability to write ADempiere Connector and Import Components. Following picture depict the overview of how it will work for you,

Talend overview1.jpg

Following will briefly discuss how it works

  1. Initialize database connection
  2. Login to Adempiere using tAdempiereConnection component to receive ADempiere Context (i.e., Env.getCtx()). Now, we considered Logged In. This will also trigger the main import subjob.
  3. This component will will initialize Delimited File (point to local csv file), read the file and send columns and data (row by row) to to tMap component
  4. tMap component (the following picture), will map the input schema to the output schema. We can use expression to modify / filter / lookup data before sending to output.
  5. Provide lookup to other table, in this case, we have Org Name and we want to lookup for AD_Org_ID.
  6. Output data will be mapped to Adempiere's Model using tAdempiereOutput component. As you can guess, data will be created using Adempiere's API (i.e., object.save()).
  7. Just for information, the tLogRow_1 will display the raw output data.

TMap.jpg

Now, if you are convinced that Talend can help your project, we can move on (if not, and you have better solution, please let me know too. :).

Before continue reading, I recommend you to read the following list of articles. In this tutorial, I will not go into details of why I am doing the way I am doing, but rather provide you with useful examples.

Setup Talend for ADempiere

Note: You can also refer to Talend Open Studio as the original document.

  • Provide Adempiere core libraries
    • Copy ADEMPIERE_HOME/lib/Adempiere.jar and ADEMPIERE_HOME/lib/CCTools.jar to TOS_HOME/lib/java
    • Note: For customization.jar or other additional library, you can put it here as well. But be remembered that we will have to tell about it in the next Optional step.
  • Install Adempiere Talend's components
    • Download the Adempiere components Media:Adempiere_talend_0.2.zip
    • Extract components (tAdempiereXXX) and put it in any directory, i.e., TOS_HOME/custom_components
    • Point TOS to the directory containing the components.
      • Open TOS, > Window > Preferences, in tree, > Talend > Components
      • Update field "User component folder:" to your custom_component folder.
      • Apply the change. Now, if you create new job, you should now see Adempiere components in the Palette area under Business > Adempiere
  • (Optional), if you also installed customization.jar, make sure your Adempiere components know about it.
    • With TOS, > Window > Perspectives > Component's Designer
    • You should see 3 Adempiere components here, tAdempiereConnection, tAdempiereProcess, tAdempiereOutput. Each component will have a config file (tAdempiereXXX_java.xml).
    • Add new line for your customiizatoin.jar in <CODEGENERATION> part.
	<CODEGENERATION>
		<IMPORTS>
			<IMPORT NAME="Adempiere" MODULE="Adempiere.jar" REQUIRED="true" />
			<IMPORT NAME="Adempiere Libraries" MODULE="CCTools.jar" REQUIRED="true" />
			<IMPORT NAME="Customization" MODULE="customization.jar" REQUIRED="true" />
		</IMPORTS>
	</CODEGENERATION>
  • Install Samples, for samples job discussed in this tutorial.
    • Download the sample jobs Media:Adempiere_talend_example.zip
    • Switch back to perspective "Integration", on "Job Design" not, right click and "Import Items"
    • Select archive file to import.
  • The Component's Designer View
    •  ???
    •  ???

Data Migration Use Cases

Now as the environment setup is ready, and you have some knowledge about Talend ETL, now we are ready to explore how it works with Adempiere. The first case, will be a bit in detailed, but will be more conceptual as we move on.

Case 1: Simple Data Import - Business Partner Group

In this case, we will learn how to import data into 1 table. I.e., Business Partner Group. Though it is a simple case, you will see how other data, in this case, Accounting tab will be created as well. And at the end, we will elaborate it a bit, by look up to other table for ID instead hard code the ID.

Source Data: c_bp_group.csv

"Search Key";Name;Description;Default;"Print Color";"Priority Base";"Confidential Info";"Price List";"Purchase Pricelist";"Discount Schema";"PO Discount Schema";"Credit Watch %";"Price Match Tolerance";Dunning
RTS;Retailers;;false;Black;Same;false;Standard;;;;0;0;Default
WHS;"Whole seller";;false;Black;Same;false;Standard;;;;0;0;Default

Steps:

  • Create new job named "MBPGroup", click finish.
    • Double click on the new job to open the canvas.
  • Create database connection to adempiere (postgres)
    • On the left panel, > Metadata > Db Connections, right click > Create Connection
    • Create new connection named "adempiere", click next.
    • DB Type: "PostgreSQL", fill in all information, Login, Password, Server, Port, Database (i.e., adempiere), Schema (adempeire)
    • Click check button, it should show connection successful. Click finish.
    • Drag the newly created connection, i.e., adempeire, to the canvas.
    • Choose to create component as "tPostgresqlConnection"

Bp group db.jpg

  • Create adempeire connection
    • From the right palette, drag component "tAdempiereConnection" into the canvas
    • Double click on the icon, you will see component properties in below panel. Fill in Adempiere connection as following,

TAdempiereConnection basic setting.jpg

  • Test Adempiere Connection
    • Connect database connection with tAdempiereConnection, right click on "adempiere" (db connection) > Trigger > On Subjob OK, and drag it to tAdempiereConnection_1
    • On the Run tab, click Run, if connection is valid, it should show no exception error.

Test adempiere connection.jpg

Note.gif Note:
  • Just like when connect Adempiere for the first time, it will show Adempiere Connection dialog.
  • If you have read the suggested tutorial, you will know that Talend is using Code Generation concept.
  • You can always see the generated java code from the Code tab of the canvas. It is important to know this as it will be the Code that you want to trace for error.
  • Create source file connector (c_bp_group.csv)
    • On the left panel, > File delimited, right click > Create file delimited
    • Create new delimited file named "c_bp_group", click next.
    • Browse to select file c_bp_group.csv, click next.
    • This step will allow you to configure how it read the source file. I.e., check "Set heading row as column name" and click Refresh Preview.

C bp group step3.jpg

    • Click next, it will show the final metadata for c_bp_group, click finish to create the file connector.
    • Drag the newly created file connector, c_bp_group, to the canvas.
    • Choose to create component as "tFileInputDelimited".
  • Connect data from source data to Adempiere
    • Drag following components from palette to canvas
      • tMap: this component will be used to map data between input source file and output Adempiere Model
      • tAdempiereOutput_1: this component represent Adempiere model.
      • tLogRow_1: this is optional, but will be useful to display log of data row.
    • Connect each component together as in following picture.
      • From tAdempeireConnection_1 to c_bp_group file connector, use On Subjob OK trigger
      • From c_bp_group to tMap_1, simply connect.
      • From tMap_1 to tAdempiereOutput_1, name the output "c_bp_group"
      • From tAdempiereOutput_1 to tLogRow_1, simply connect.

Connect c bp group components.jpg

  • Configure tAdempiereOutput to import data with MBPGroup model,
    • Click on tAdempiereOutput_1, it will open the component properties.
    • Set Adempiere model class name: org.compiere.model.MBPGroup
    • As we are going to insert new data (not update), leave Insert Mode checked.
  • Important step, map input data to output model
    • Double click on tMap_1 component, tMap window will appear.
    • You will notice that now we have a table (column schema) on the left ready, but on the right only empty "c_bp_group" output table. We will have to define what we want to map to MBPGroup model.
    • For "c_bp_group", add the column that match with ADempiere column names,
    • Map column from the left to table on the right. For simplicity, we will hardcode AD_Org_ID to 0.

C bp group tmap1.jpg

Note.gif Note:
  • A key field is required for tAdempiereOutput component to run smoothly.
  • In Insert Mode, this field will be left blank (internally, 0 will be filled). In Update Mode, provided with Value / Name, ID can be looked up from database)
  • Make sure that Data Type match from left to right.
  • Make sure that Data Type match with the underlining Adempiere Model.
  • Click Finish to close the window. If you are asked to "Propagate the changes", click Yes. This will make sure that the new c_bp_group's schema is synchronized to tAdempiereOutput_1 and tLogRow_1.

Now we are ready to import data. Make sure that, we do not have the importing data in BPartner Group window yet. And if ready, click Run button.

You will see the data being transferred from source file to Adempiere in real time.

C bp group final.JPG

Case 2: Complex Data Import - Business Partners

Most data model are not as simple as in the first example. Give business partners for instance. Creating business partner consist of creating data in multiple tables, i.e., C_BPartner + C_BPartner_Acct + C_Location + C_BPartner_Location + AD_User. And these data will be created with the relation to one another, i.e., C_BPartner_Location is created with the foreign key C_BPartner_ID & C_Location_ID created before hand. In this example, we will be creating Business Partner and its related information, each from a row of record. And to make life a bit more complex, we will exclude all Business Partner with BP Group "Wholesales".

Source Data: c_bpartner.csv

Client;Organisation;Search Key;Name;Name 2;Description;Credit Status;Tax ID;Tax Group;Business Partner Group;Customer;Invoice Rule (D=After Delivery);Delivery Rule (A=Availability);Delivery Via (D=Delivery);Price List;Payment Rule;Payment Term;Sales Region;Sales Representative;Credit Limit;Partner Location;Address 1;Address 2;Address 3;Address 4;City;ZIP;Country;C_Country_ID;Ship Address;Invoice Address;Phone;2nd Phone;Fax
GardenWorld;HQ;CUS1001;SONY CO.,LTD.;;;;             ;;Retailers;TRUE;D;A;D;Standard;S;Immediate;East;GardenAdmin;0;Office;2322 Wall Street;Central City;;;Florida;23110;Thailand;319;TRUE;TRUE;9877687817;;877899998
GardenWorld;HQ;CUS1002;CP Co.,Ltd.;;;;;;Retailers;TRUE;D;A;D;Standard;S;30 Net;East;GardenAdmin;2000000;Factory;5765 Hia Jieng Road;Hua Him;Thien 2,Ward Tan Phong;District 7;Thianjin;12321;Vietnam;340;TRUE;TRUE;0986799719, 998991822;;998877766
GardenWorld;HQ;CUS1003;Ecosoft Co., Ltd.;;;;;;Retailers;TRUE;D;A;D;Standard;S;Immediate;West;GardenAdmin;0;Factory;76/98 Naratiwas Rd.;Sathorn;CENTRE#06-05;;Bangkok;33212;Singapore;300;TRUE;TRUE;6596192189;;291181812
GardenWorld;HQ;CUS1004;P&G Co., Ltd.;;;;;;Wholesales;TRUE;D;A;D;Standard;S;30 Net;West;GardenAdmin;500000;Office;889 Empire Tower;Yanawa;;;Bangkok;10120;Thailand;319;TRUE;TRUE;6756255-8,01 8272366;;987766121

Steps:

  • In this example, we will learn from the pre-installed job, "MBPartner". Double click on it will open the following canvas,

MBPartner.jpg

  • The connection to Adempiere as the same as previous example. Following are highlights of this case,
  • tMap - Lookup Tables
    • Looking up to related tables for IDs, for example, the following picture show how c_bp_group_id is looked up from c_bp_group table. Other look up tables are c_paymentterm, c_salesregion, m_pricelist, ad_user
    • You can also notice that, internally it is the joining of tables.

Tmap lookup bp group.jpg

  • tMap - Filter
    • As mentioned, that we want to exclude Business Partner in Group "Wholesales"
    • Clicking on Expression Filter button Expression filter icon.jpg on c_bpartner schema allow us to filter out "Wholesales"
    • Filter is available on both the Output and Input side.

Expression filter1.jpg

  • tMap - Multiple Output Model
    • As you can guess, the output row from tMap can map to as many table as we wish, what we have to do is create that output schema, and map fields.
    • What is important is to making sure that in sub sequence tables, we are using the IDs created from prior tables. For example, in this case we want to use C_BPartner_ID and C_Location_ID as foreign key in C_BPartner_Location table.
    • This can be achieved by adding Init Code for MBPartnerLocation's Mode, tAdempiereConnection_3. This code will get the C_BPartner_ID and C_Location_ID from their schema internally.

TAdempiere init code.jpg

    • I believe all the setup is intuitative enough. That's all, we need. Next, simply click Run button to finish your task.
    • Open the Business Partner to check the result.

Possible Enhancement (not yet supported)

  • Ability to work with DocAction for Documents. Although customizing tAdempiereOut to cover the
  • In updating mode, currently it only support Model with a unique key, but not yet support model with multiple foreign keys, i.e., C_BPartner_Location.
  • Transaction control is not yet supported, as currently it will commit with each po.save().
  • Ability to report the success / fail record. I think it will be possible to just reuse the source data file and give pass/fail flag and remarks at record end.

See Also