Wednesday 17 October 2007

Data Importing

Just been working for a client on a data import. The import had to read a flat file but produce a primary and several secondary records depending on the data involved.

I toyed with an agent firstly, but this is really restrictive when it comes to adding secondary data, without having the initial load of the data into a temporary table, and then having the agent go through the temporary table and complete the secondary tasks.

My solution was to use a pivotal agent to call a method on a c# appserver rule. The main dll is not transaction dependent. The code reads in the file into a data table (thanks to Andreas Knab's blog, this is childs play) then for each row gets together the information for a new or update to the main table. The code then calls another ASR which is within a transaction and returns the new or updated record id, which is then used to create secondary records using the same principle. Errors are collated in an arraylist.

At the end, the errors arraylist is written to a file to check the import, and we are done.

What impresses me is the speed of adding / processing records within c# compared to similar code within an agent. The c# code takes minutes to update / create 1000+ records. An agent would take considerably longer on the same specification machine.

One thing I have learnt is the use of transactions. I originally started with one appserver, which would fall over when the number of records added reached 200+. This was due to the transaction not being able to be expanded, and hence at this point all the modifications were rolled back. By creating a wrapper around the transactional appserver, the transaction is only around one update / insert. If the code does fall over, the records before are all saved and you don't have to start again.

My offline task now is to create a more generic update. Use a config file to determine which file to import, what the columns map to as well as any linking required / conditions. This seems simple enough, but I think it will be down to the syntax and flexibility of the import configuration whether it is a real benefit.

No comments: