Floating FB popout byF5debug

SQL Server Integration Services (SSIS) – Part 45 – Fuzzy Grouping Transformation in SSIS

 

This is part 45 of the series of article on SSIS

Introduction:

In this article we are going to see on how to use the Fuzzy Grouping task. This task is mainly used to group some similar data in a row and cleaning the duplicates to maintain a standard of the table. This task requires a connection to the SQL database that the transformation algorithm requires to. Let’s jump start to the section on how to do that using a sample package.

You can look into my series of article on SSIS at the url - http://f5debug.net/tutorial/ssis.aspx

[more]

Steps:

Follow steps 1 to 3 on my first article to open the BIDS project and select the right project to work on integration services project. Once the project is created, we will see on how to use the Fuzzy Grouping control. Once you open the project just drag and drop the Fuzzy Grouping control and a source provider as shown in the below image.

clip_image001

Now I configured the OELDB source by selecting Order tables from Northwind database as shown in the below screen.

clip_image002

Now we need to configure the Fuzzy Grouping, double click on the task and it will open the window as shown in the below screen

clip_image003

We need to do the configuration as shown in the below screen. Now we will see on how the fuzzy grouping transformation works out. Go to the Columns tab and select the row which we need to exact do the full search of the reduplicating and applying the fuzzy algorithm.

clip_image004

CustomerID is the value which we are going to do a fuzzy grouping using the in-build algorithm, Here we have different options to search for the conditions.

Match Type has 2 values as EXACT and FUZZY. Exact do the exact match for the specified column and give the result only it matches the exact value. Fuzzy do a similar search and checks for some particular value and uses it to do the transformation like (Example a column employee name has Karthik A and Karthik B if the type is Fuzzy then it takes this value into account where in Exact it will not take since last name A and B changes)

How exactly the algorithm works out is as follows

Step 1 – It transforms and loads the data to a temporary table

Step 2 – Fuzzy algorithm takes into account and searches for the matched one’s

Step 3 – For each row the transformation conditions checks for the match above the threshold and created an exact match

Step 4 – Applies the resulted output to the packages to proceed further.

The columns available for the fuzzy grouping logic are as follows

  • Input Column – Selected columns
  • Output Alias – This values comes as output for the fuzzy inputs
  • Group Output Alias – This values holds the best value match
  • Match Type – Exact or Fuzzy selection
  • Minimum Similarity – Has the minimum similarity value
  • Similarity Output Alias – Has the similarity score for that column
  • Numerals – Handles the matched data in number
  • Comparison Flags – Checks for the comparison string handling

Now move to the next tab Advanced as shown in the below image

clip_image005

Here we update the global values which are used across the package to handle the fuzzy transformation can be applied here. The main feature here we need to look is the similarity threshold where we need to specify the minimum threshold match that should be accepted in the transformation.

Conclusion:

So in this article we have seen on how to use the Fuzzy Grouping transformation task and the key configurations used in order to use this task handy.

About Author: Karthikeyan Anbarasan, Microsoft MVP (Most Valuable Professional) in ASP.NET/IIS Architecture. He is the Founder and Chief Editor of the www.f5debug.net website and has authored books on Windows Phone and Business Intelligence(SSIS). He is also a Passionate Speaker and a Blogger on Microsoft Technologies.

You can Join Me On: Facebook, Twitter, Google Plus, LinkedIn

Comments (1) -

Bohdan
Bohdan
5/17/2013 12:49:37 PM #

Thank you Karthikeyan for a good explanation.  How to actually get results in the end?

Add comment

  Country flag

biuquote
  • Comment
  • Preview
Loading

Recent Posts

Random Posts

Most Viewed

URL Shortner

Create your own short urls!!!

Site Counter

free counters

Disclaimer

This is a personal weblog. The opinions expressed here represent my own and not those of my employer. For accuracy and official reference refer to MSDN/ TechNet. I have documented my personal experience on this blog.

Protected by Copyscape Web Plagiarism Finder

Site Meter