This article will present you with a crash course in validating address data using the tGoogleAddressRow component in Talend Studio. Incorrect or incomplete address information is a common problem in customer databases but thankfully, Talend offers an out-of-the-box data quality component. This component allows data integration developers to validate and/or enrich address data via the use of the tGoogleAddressRow component. I’ll explain how to setup a Google Places API key and how to create a simple Talend job in order to make use of the tGoogleAddressRow component.
First things first, you’ll need to obtain a Google Places API key to use with the tGoogleAddressRow. The key is free but limited to 100 uses per day. This can be increased to 10,000 for 60 days for free if you register for billing.
1. Navigate to https://developers.google.com/places/web-service/, sign in with your Google account and click ‘Get a Key’.
2. Create a project.
3. Name the key and click ‘Create’.
4. Now you have your Google Places API key and you are ready to create the Talend job.
5. Begin with a tFixedFlowInput; create a schema that represents your business’ customer address structure then enter some parts of an address you want to cleanse.
6. Now connect the output to a tMap…
…and create a ‘string’ output in your tMap and concatenate the address fields together. For example:
Address.CompanyName + ” ” + Address.AddressLine1 + ” ” + Address.AddressLine2 + ” ” + Address.City + ” ” + Address.PostalCode
There will be a degree of trial and error involved to find the best fields to use. Google will detect companies’ names which will aid in the verification of an address.
7. Connect the output to a tReplace and configure to replace “null” with empty string “”. This is useful when not all of the address fields from the source are populated.
8. Now connect the output to a tGoogleAddressRow.
9. Configure with the API key and the row that contains the concatenated address data.
10. Connect the output to a tLogRow. Finally, the completed Talend jobs should look similar to this:
Run the job and lo and behold, witness the wonder of the tGoogleAddressValidation component.
The Input given was “datalytyx, holborn, London” and the output was Holborn Gate, 330 High Holborn, London WC1V 7QT, United Kingdom.
As you can see it has done a good job of filling in the missing information. As well as the ‘formatted address’, the API returns the individual address lines that can then be mapped back to address structure specified in the input.
Talend also offers a tLoqateAddressRow component that uses the Loqate address cleansing and validation service. This component and service offers more functionality. You can specify the individual address lines on the input, as opposed to supplying one string. Also it returns a comprehensive ‘accuracy code’ which indicates how likely the address API accurately cleansed the various parts of the address. This service is not free, however. But you can try the web-based demo for free at http://www2.loqate.com/demo/register/new/.
I hope this has helped!
0 Comments