Posts

Showing posts from 2014

Data Implosion and not Explosion

I was planning to write this article a while ago but never got around to it. Now I do have some time on my hands to write about how data has changed over the years. Nowadays data has been imploding, and what I mean by implosion is that the manner in which data is collated and published to end users, with various transformations along the way, creates a sense of cause and effect. You have massive data sets, but the information in them can be correlated in so many different ways that eventually one is unable to figure out how to get the required information into the end user's field of vision. Let us consider Big Data solutions. Whenever we feel that the volume of data has reached its limit in terms of storage, we can introduce a big data solution in order to contain the explosion and ensure that there is no data loss. Hence in this case we can always ensure that in the eventuality of a massive up…

Blue Screen of Death --> Microsoft Power Map

I was fiddling around with Power Map and wanted to see how far I could push the mapping visualization.
I basically leveraged the Power Station data file available at:
http://office.microsoft.com/en-us/excel-help/redir/XT104048055.aspx?CTT=5&origin=HA104091224

After downloading this Excel file, I started to create a Power Map with the basic idea of having 3 layers.
1. The first layer would take into account all the counties based on the power transmitted.
2. The second layer would consist of all the companies based on the power distribution.
3. The third layer would consider the power distribution based on plant name.

Now this Excel file contains roughly 20,000 rows. My machine is a Windows 8 box with an Intel Core i7 and 16 GB of RAM. I was good up to step 1 in the steps described above, but when I hit step 2 I noticed that my memory usage just bloated like crazy. I decided, like a crazy person, to try step 3 and that's when my …

Redshift Experience

Image
Big Data is the keyword given to solutions that can handle massive amounts of data, usually in the petabyte range or greater. There are several big data solutions out there, and each has unique characteristics that can be useful in different scenarios. I was looking into components of Cloudera's Hadoop distribution such as Impala, Sentry and HBase. All of these vary based on the use case. For some of my clients I have leveraged Amazon Redshift and Cassandra (and hopefully soon Apache Hadoop). The architecture of these systems differs, but the end goal is the storage and processing of vast amounts of data, with results generated in seconds or milliseconds. Focusing on this aspect, I am going to give a more detailed insight into Redshift, which is a node-based, petabyte-scale database, as well as a high-level overview of what I recently implemented.
Note: The above diagram is from the Redshift Warehousing article (http://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.ht…

SSAS Cube issues ..... Incorrect Measure Values

Have you ever noticed that a measure value in SSAS does not correspond to the value in the data warehouse? This is a big hindrance, because one wastes precious development time reconciling data between the cube and the warehouse. I am just going to create a checklist of issues to look at quickly if one does come across this problem.

Problem Statement:
Let us consider a fact table called FactInternetSales with a measure column called internetsalesamount.
select sum(internetsalesamount) from FactInternetSales;
Let us say that this value is 25000.
Now we run the equivalent query against the cube; let's call it Sales.
select [Measures].[internetsalesamount] on 0 from [Sales];
Now the value returned from this query is 510.
Why is this happening? Follow the checklist below to rectify this issue as soon as possible.
1. Go to the cube and in the [internetsalesamount] properties, change the aggregation value to count instead of sum. Validate w…
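
The step above is cut off, so the following is only a minimal sketch under an assumption: that the point of switching the measure to Count is to compare the number of fact rows the cube sees with the number of rows in the warehouse. The in-memory SQLite table, its four rows, and the helper names are all hypothetical stand-ins, and the cube-side count is passed in by hand rather than queried from SSAS.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory stand-in for the warehouse (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FactInternetSales (internetsalesamount REAL)")
    conn.executemany(
        "INSERT INTO FactInternetSales VALUES (?)",
        [(10000,), (8000,), (5000,), (2000,)],
    )
    return conn


def warehouse_totals(conn):
    """Return (row_count, amount_sum) for FactInternetSales."""
    row_count, amount_sum = conn.execute(
        "SELECT COUNT(*), SUM(internetsalesamount) FROM FactInternetSales"
    ).fetchone()
    return row_count, amount_sum


def compare_counts(warehouse_rows, cube_rows):
    """Describe how the cube's Count measure relates to the warehouse row count."""
    if cube_rows == warehouse_rows:
        return "row counts match"
    if cube_rows < warehouse_rows:
        return f"cube is short by {warehouse_rows - cube_rows} fact row(s)"
    return f"cube has {cube_rows - warehouse_rows} extra fact row(s)"


if __name__ == "__main__":
    conn = build_demo_warehouse()
    rows, total = warehouse_totals(conn)
    # Hypothetical value read from the cube with the measure set to Count.
    cube_rows = 3
    print(rows, total, compare_counts(rows, cube_rows))
```

If the counts differ, the gap in the sum is more likely a matter of rows not reaching the cube than of the aggregation itself, which narrows down where to look next.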

New CEO for Microsoft

It was recently announced that Satya Nadella, a Microsoft veteran, has become the CEO of Microsoft. I have a great sense of pride that a person of Indian origin, a first-generation American immigrant, is now the biggest and probably the most powerful person in the I.T. industry today; even so, Microsoft placed a safe bet in naming Nadella as its CEO. I feel that a better choice would have been Sundar Pichai (VP @ Google) or even Robin Li (the founder and CEO of Baidu). Of course, from an experience point of view Nadella needs absolutely no introduction. His resume speaks for itself. He created and successfully ran a multitude of Microsoft's silos, or divisions. But infusing new ideas, decoupled from the thought process that Microsoft has always had, would have paved the way for a new era of Microsoft revolutionizing the tech industry. This is a factor that will weigh on Microsoft's cogs, because clearly Nadella has …

SSIS SFTP handling using WinSCP

Just a follow-on to the earlier post.
try {
    // Build the local file path and read the credentials from the package variables
    string file = Dts.Variables["strLocalDirectory"].Value.ToString() + "\\" + Dts.Variables["strLocalFile"].Value.ToString();
    string username = Dts.Variables["strFTPUserName"].Value.ToString();
    string password = Dts.Variables["strFTPPassword"].Value.ToString();
    // Setup session options
    SessionOptions sessionOptions = new SessionOptions {
        HostName = Dts.Variables["strFTPHost"].Value.ToString(),
        UserName = username,
        Password = password,
        Protocol = Protocol.Sftp,
        PortNumber = int.Parse(Dts.Variables["strFTPPort"].Value.ToString()),
        // FtpMode and FtpSecure are FTP/FTPS settings and are not relevant to an SFTP session
        FtpMode = FtpMode.Active,
        FtpSecure = FtpSecure.None,
        SshHostKeyFingerprint = Dts.Variables["strFT…

SSIS FTPS File handling

The following Script Task showcases both FTP and FTPS connections for uploading a file (it can be modified to perform more operations). My earlier approach was to leverage FtpWebRequest, but the "AUTH SSL" command was taking a huge amount of time, so I decided to fall back on WinSCP instead. Also ensure that WinSCP.exe is added to the PATH environment variable and that WinSCP.dll is placed in the GAC.
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
    public void Main()
    {
        /*
         * Traditional FTP
         */
        /*ConnectionManager mgr = Dts.Connections.Add("FTP");
        try
        {
            mgr.Properties["ServerName"].SetValue(mgr, "ftp://" + Dts.Variables["strFTPHost"].Value.ToString());
            mgr.Properties["ServerUserName"].SetValue(mgr, Dts.Variables["strFTPUserName"].Value.ToString());
            …