Rhino - ETL

Rhino-ETL, a C#-based framework for developing standard ETL processes, is pretty easy to use, and you can do a lot of fun stuff with the underlying source data. I just wrote a quick console app that generates data into a text file and pushes the same data into a SQL Server table as well as an external file. Here are the steps (for the external file push):
1. Create a new C# console application solution in Visual Studio.
2. Target the .NET Framework in your project properties, as shown in the screenshot below:

3. Create three subfolders underneath your project, as shown in the following screenshot:
DataObjects --> Contains the class files for each table/file in your environment. Example: if your source file contains student data, you would create a class file called Student that exposes one property per column of the file (nouns).
Operations --> Contains the class files for the activities (verbs) that need to be performed on the DataObjects. Example: writing the student data to a database, reading the student data from a file, etc.
WorkFolder --> Contains the external file sources to interact with. Example: a flat file, a CSV or a TSV. In this case it will be student.txt.


Let's write some code to insert a student record from a flat file into another flat file... (as simple as it sounds)

4. Create a class file called StudentRecord.cs for the pipe-delimited source file and declare the required entity attributes, as shown in the following code snippet:-
The source file (student.txt) contains records in the following manner:
StudentId|StudentName|StudentAddress|StudentClassId|StudentMarks //header
1|Ishwar|TestAddress|1|85 //row
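
Here is a minimal sketch of what StudentRecord.cs could look like. It is not necessarily the original class; it assumes the project references the FileHelpers library (which Rhino-ETL's file operations wrap), and the field types are guesses based on the sample row. The "//header" and "//row" notes above are treated as annotations, not as text that appears in student.txt.

```csharp
using FileHelpers;

// Maps one pipe-delimited line of student.txt to a typed record.
[DelimitedRecord("|")] // fields are separated by '|'
[IgnoreFirst]          // skip the single header line
public class StudentRecord
{
    // FileHelpers binds fields by declaration order, so the order here
    // must match the column order in the file.
    public int StudentId;
    public string StudentName;
    public string StudentAddress;
    public int StudentClassId;
    public int StudentMarks; // assumed to be a whole number, as in the sample row
}
```

Public fields are used rather than properties because FileHelpers has traditionally mapped fields; the names are kept identical to the header columns so they can be copied into Rhino-ETL rows by name.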

5. Create a class file called NewStudentRecord (which contains the attributes that need to be transferred to the new file).
The output will be written in the following manner (tab-separated):
StudentId\tStudentName\tStudentAddress\tStudentClass
1\tIshwar\tTestAddress\t1
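
A sketch of the output record under the same assumptions (FileHelpers attributes, guessed field types). It simply leaves out StudentMarks and renames the class column to StudentClass, matching the output header shown above.

```csharp
using FileHelpers;

// Describes one tab-separated line of studentoutput.txt.
[DelimitedRecord("\t")] // fields are separated by a tab
public class NewStudentRecord
{
    // Declaration order defines the column order in the output file.
    public int StudentId;
    public string StudentName;
    public string StudentAddress;
    public int StudentClass;
}
```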

Let us now create the write action, i.e. write the data out to an output file called studentoutput.txt. For this I am creating a new C# class file called StudentWriteFile, as shown in the following code snippet:-
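
The write operation could be sketched roughly as follows. This is an illustration rather than the original snippet: it assumes the StudentRecord and NewStudentRecord classes are defined as FileHelpers records with the field names listed in steps 4 and 5, and it uses Rhino-ETL's FluentFile wrapper to write the tab-separated file.

```csharp
using System.Collections.Generic;
using Rhino.Etl.Core;
using Rhino.Etl.Core.Files;
using Rhino.Etl.Core.Operations;

// Writes each incoming row to a tab-separated file and passes it on.
public class StudentWriteFile : AbstractOperation
{
    private readonly string filePath;

    public StudentWriteFile(string filePath)
    {
        this.filePath = filePath;
    }

    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        FluentFile engine = FluentFile.For<NewStudentRecord>();
        engine.HeaderText = "StudentId\tStudentName\tStudentAddress\tStudentClass";

        using (FileEngine file = engine.To(filePath))
        {
            foreach (Row inputRow in rows)
            {
                // Copy only the columns that belong in the new file.
                Row outputRow = new Row();
                outputRow["StudentId"] = inputRow["StudentId"];
                outputRow["StudentName"] = inputRow["StudentName"];
                outputRow["StudentAddress"] = inputRow["StudentAddress"];
                outputRow["StudentClass"] = inputRow["StudentClassId"];

                // Row keys must match the NewStudentRecord field names.
                file.Write(outputRow.ToObject<NewStudentRecord>());

                // Pass the row through in case a later operation needs it.
                yield return outputRow;
            }
        }
    }
}
```

Because Execute is an iterator, nothing is written until the pipeline actually pulls rows through the operation, and the file is closed once the last row has been consumed.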

Now let us write the main program. Create it in the following manner:-
The settings values point to the settings file I created, which contains the absolute paths of the student.txt and studentoutput.txt files.
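
A hedged sketch of how the pieces might be wired together. The two path constants are placeholders standing in for the settings values mentioned above (the actual setting names are not shown in this post), and StudentRead is an assumed, conventional Rhino-ETL file reader rather than the author's own class. It relies on the StudentRecord and StudentWriteFile classes from the earlier steps.

```csharp
using System.Collections.Generic;
using Rhino.Etl.Core;
using Rhino.Etl.Core.Files;
using Rhino.Etl.Core.Operations;

// Reads student.txt and yields one Row per record.
public class StudentRead : AbstractOperation
{
    private readonly string filePath;

    public StudentRead(string filePath)
    {
        this.filePath = filePath;
    }

    public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
    {
        using (FileEngine file = FluentFile.For<StudentRecord>().From(filePath))
        {
            foreach (object record in file)
            {
                // Row.FromObject copies the record's fields into a Row by name.
                yield return Row.FromObject(record);
            }
        }
    }
}

// The ETL process: operations run in the order they are registered.
public class MainProgram : EtlProcess
{
    // Placeholder paths; replace with the values from your settings file.
    private const string InputPath = @"WorkFolder\student.txt";
    private const string OutputPath = @"WorkFolder\studentoutput.txt";

    protected override void Initialize()
    {
        Register(new StudentRead(InputPath));
        Register(new StudentWriteFile(OutputPath));
    }
}
```

Each registered operation receives the rows yielded by the one before it, so the reader feeds the writer without either class knowing about the other.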

After that, in your Main method, just create and run MainProgram in the following manner:-
 new MainProgram().Execute();

and you will have your first Rhino-ETL process to rock and roll with...



Comments

Ashwin said…
Thanks for the blog. I've been looking for something like this to learn Rhino ETL for quite a while. I was surfing the net just before you put this article up.

Anyhow I have a small doubt with the example you have provided.

1) "StudentRead": I can't find this class.
Ishwar said…
Great catch Ashwin.... must have added the class at a later stage...
public StudentRead(string filePath) { this.filePath = filePath; }
string filePath = null;
public override IEnumerable<Row> Execute(IEnumerable<Row> rows)
{
    FluentFile engine = FluentFile.For();
    engine.HeaderText = "Id\tsName\tsAddress\tsclass";
    using (FileEngine file = engine.To(filePath))
    {
        foreach (Row testRow in rows)
        {
            Row row = new Row();
            //row.Copy(leftRow); //copy over all properties not in the student records
            row["sId"] = testRow["StudentId"];
            row["sName"] = testRow["StudentName"];
            row["sAddress"] = testRow["StudentAddress"];
            row["sclass"] = testRow["StudentClassId"];
            file.Read(row.ToObject());
            //pass through rows if needed for another later operation
            yield return row;
        }
    }
}
