Martin Danielsson edited this page Nov 17, 2025 · 7 revisions

This page assumes that you have already downloaded and successfully installed NFT. See Installation Guide for more information.

To get you started with NFT, this page walks through a couple of use cases for which NFT is well suited.

If you have suggestions for other use cases which could fit onto this page, please feel free to send me a note.

Splitting a CSV File into Two

Introduction

In some cases, it is necessary to split a CSV file into two separate CSV files for data quality assessments, or for filtering out certain cases. NFT supports this use case quite nicely using the <FilterTarget> and <FilterFields> definitions.

Prerequisites

  1. First, create a directory for the transformation XML file and the input and output data.
  2. Create subdirectories input and output inside that directory.
  3. Copy/paste the sample data into a file called data.csv in the newly created input directory:
CompanyId;CompanyName;City;Revenue
1;Best Things Online Ltd.;Online;-662
2;Nile Co.;Cairo;44567
3;Everything Matters Co.;London;1229
4;But Nobody Cares Inc.;Paris;
5;Company No. 5;New York City;
6;Haufe-Lexware GmbH & Co. KG;Freiburg;230992
7;My Little Pony Inc.;Rainbow Town;66582
8;Sin City Herald;Sin City;-99
9;Other Company;Anywhere;10098
10;Latin Accessories LLC;Rome;5

Transformation Code

Copy/paste this transformation code into a file called split_data.xml in the base directory you created above (so that the input and output files can be reached by the source and target definitions below):

<?xml version="1.0" encoding="utf-8" ?>
<Transformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://haufe.com Config.xsd">
  <Logger type="file" level="info">split_log.txt</Logger>

  <Source config="delim=';'">file://input\data.csv</Source>
  <Target config="delim=';'">file://output\good_data.csv</Target>
  <FilterTarget config="delim=';'">file://output\bad_data.csv</FilterTarget>

  <SourceFilters>
    <SourceFilter>FilterLog(Not(IsEmpty($Revenue)), "Revenue is missing")</SourceFilter>
    <SourceFilter>FilterLog(GreaterEqual(Int($Revenue), 0), "Revenue is negative")</SourceFilter>
  </SourceFilters>

  <Fields appendSource="true"/>
  
  <FilterFields appendSource="true">
    <Field name="Reason">FilterLogLastMessage()</Field>
  </FilterFields>
</Transformation>

The transformation takes all records that match the source filter criteria (here: Revenue is filled, and is greater than or equal to zero) and writes them to the good_data.csv file in the output directory. As for the field definitions, all source fields are simply propagated to the target file, using the appendSource attribute of the Fields tag.

All records not matching the source filter criteria are written into the bad_data.csv file, using the FilterFields definition. The fields being output are all the source fields (as with good_data.csv), and additionally the message generated by the FilterLog operator in the source filter definition.

Logging information is written into the split_log.txt file.
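To make the filter semantics explicit, here is the same split logic as a plain-Python sketch (not part of NFT): the sample data from above is read, the two filter rules are applied in order, and the first failing rule's message becomes the Reason field, mirroring FilterLogLastMessage().

```python
import csv
import io

# The sample data from the input directory, inlined for a self-contained example.
SAMPLE = """CompanyId;CompanyName;City;Revenue
1;Best Things Online Ltd.;Online;-662
2;Nile Co.;Cairo;44567
3;Everything Matters Co.;London;1229
4;But Nobody Cares Inc.;Paris;
5;Company No. 5;New York City;
6;Haufe-Lexware GmbH & Co. KG;Freiburg;230992
7;My Little Pony Inc.;Rainbow Town;66582
8;Sin City Herald;Sin City;-99
9;Other Company;Anywhere;10098
10;Latin Accessories LLC;Rome;5
"""

def split_rows(rows):
    """Split records into (good, bad); bad records get a Reason field."""
    good, bad = [], []
    for row in rows:
        if row["Revenue"] == "":
            # First filter: Not(IsEmpty($Revenue))
            row["Reason"] = "Revenue is missing"
            bad.append(row)
        elif int(row["Revenue"]) < 0:
            # Second filter: GreaterEqual(Int($Revenue), 0)
            row["Reason"] = "Revenue is negative"
            bad.append(row)
        else:
            good.append(row)
    return good, bad

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
good, bad = split_rows(rows)
```

Running this yields the same partition as the transformation: six good records and four filtered records, each tagged with the reason it was rejected.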

Running the Transformation

In this case, the directory created above is C:\Temp\nft_sample. Open a command line window (cmd.exe), make sure that either NoFrillsTransformation.exe is in your path, or that you know where it resides on your machine:

C:\Temp\nft_sample>dir /b
input
output

C:\Temp\nft_sample>NoFrillsTransformation.exe split_data.xml
Operation finished successfully.

C:\Temp\nft_sample>

Results

good_data.csv:

CompanyId;CompanyName;City;Revenue
2;Nile Co.;Cairo;44567
3;Everything Matters Co.;London;1229
6;Haufe-Lexware GmbH & Co. KG;Freiburg;230992
7;My Little Pony Inc.;Rainbow Town;66582
9;Other Company;Anywhere;10098
10;Latin Accessories LLC;Rome;5

bad_data.csv:

CompanyId;CompanyName;City;Revenue;Reason
1;Best Things Online Ltd.;Online;-662;Revenue is negative
4;But Nobody Cares Inc.;Paris;;Revenue is missing
5;Company No. 5;New York City;;Revenue is missing
8;Sin City Herald;Sin City;-99;Revenue is negative

In effect, we have split the input CSV file into two categories of data and, for each filtered record, recorded the reason why it was filtered out.

Sample Log File

The log file will look as follows:

[2015-07-22 15:52:35] INFO: Read configuration file: split_data.xml
[2015-07-22 15:52:35] INFO: Initialized operators.
[2015-07-22 15:52:35] INFO: Initialized Lookup maps.
[2015-07-22 15:52:35] INFO: Available operators: add, and, concat, ...
[2015-07-22 15:52:35] INFO: Initialized Filters.
[2015-07-22 15:52:35] INFO: Initialized field mappings, found 0 output fields.
[2015-07-22 15:52:35] INFO: Initialized filter field mappings, found 1 output fields.
[2015-07-22 15:52:35] INFO: CsvReaderFactory: Attempting to create a CsvReaderPlugin.
[2015-07-22 15:52:35] INFO: CsvReaderFactory: Successfully created a CsvReaderPlugin.
[2015-07-22 15:52:35] INFO: Initialized field mappings, found 4 output fields.
[2015-07-22 15:52:35] INFO: Initialized filter field mappings, found 5 output fields.
[2015-07-22 15:52:35] INFO: CsvWriterFactory: Creating a CsvWriterPlugin.
[2015-07-22 15:52:36] INFO: CsvWriterFactory: Creating a CsvWriterPlugin.
[2015-07-22 15:52:36] INFO: Started processing.
[2015-07-22 15:52:36] INFO: Reading Source #1
[2015-07-22 15:52:36] INFO: Revenue is negative
[2015-07-22 15:52:36] INFO: Revenue is missing
[2015-07-22 15:52:36] INFO: Revenue is missing
[2015-07-22 15:52:36] INFO: Revenue is negative
[2015-07-22 15:52:36] INFO: Finishing Write Process.
[2015-07-22 15:52:36] INFO: Finished processing.
[2015-07-22 15:52:36] INFO: Total source records read: 10
[2015-07-22 15:52:36] INFO: Total target records written: 6
[2015-07-22 15:52:36] INFO: Records filtered out: 4
[2015-07-22 15:52:36] INFO: Operation finished successfully.

The logging is very verbose in this case. Setting the Logger's level attribute to warning eliminates all INFO messages; the log file would then remain empty, as everything logged here is purely informational.

Please note the info messages coming from the FilterLog operator: Revenue is missing and Revenue is negative. Even if you do not use the FilterTarget tag, these messages can provide useful information about which records were filtered out and why.

Extracting Data from Salesforce

Prerequisites

In order to get this sample up and running, you will need:

  • Access to a Salesforce Sandbox containing some Account data
  • The Salesforce Data Loader installed on the local machine
  • A valid Salesforce Configuration File, including a user name and an encrypted password

The configuration file is assumed to be named sfdc_config.xml and to reside in the same directory as the transformation XML file.

See Salesforce Configuration File for extensive documentation on how this works in detail.

You may use the following XML template to get started:

<?xml version="1.0" encoding="utf-8"?>
<SfdcConfig>
  <DataLoaderDir>C:\Program Files (x86)\salesforce.com\Data Loader</DataLoaderDir>
  <LogFileDir>.\</LogFileDir>
  <SuccessFileName>.\success_log.csv</SuccessFileName>
  <ErrorFileName>.\error_log.csv</ErrorFileName>
  <SfdcUsername>my.user.name@domain.com</SfdcUsername>
  <SfdcEncryptedPassword>ab3aa7674884bcf3...</SfdcEncryptedPassword>
  <SfdcEndPoint>https://test.salesforce.com</SfdcEndPoint>
  <KeepTempFiles>false</KeepTempFiles>
  <FailOnErrors>true</FailOnErrors>
</SfdcConfig>

Depending on which kind of Org you access, pass https://login.salesforce.com (for production Orgs) or https://test.salesforce.com (for sandboxes) as the SfdcEndPoint. How to obtain the encrypted password is described on the Salesforce Configuration File page.
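The endpoint rule above can be captured in a one-line helper; this is a tiny illustration, not part of NFT, and the function name is made up. The two URLs are the ones expected in the SfdcEndPoint element.

```python
def sfdc_endpoint(is_sandbox: bool) -> str:
    """Return the SfdcEndPoint URL for a sandbox or a production Org."""
    return "https://test.salesforce.com" if is_sandbox else "https://login.salesforce.com"
```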

Transformation Code

Copy/paste this into a file called sfdc_extract.xml:

<?xml version="1.0" encoding="utf-8" ?>
<Transformation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://haufe.com Config.xsd">
  <Logger type="file" level="info">sfdc_extract.txt</Logger>

  <Source config="sfdc_config.xml">soql://select Id, Name, Phone, BillingStreet, BillingPostalCode, BillingCity, OwnerId from Account</Source>
  <Target config="delim=';'">file://sfdc_accounts.csv</Target>

  <Fields appendSource="true"/>
</Transformation>

This transformation will extract all Account records from the Salesforce Org/Sandbox and write them into a CSV file.

Caution: If you have a lot of data, this can take a long time.
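The SOQL query above selects a fixed set of Account fields; the resulting CSV contains exactly those columns. The following sketch (plain Python, independent of NFT) shows the equivalent projection applied to an already-fetched record. The field list comes from the query; the sample record and its values are hypothetical.

```python
# Field list taken from the SOQL query in the transformation above.
FIELDS = ["Id", "Name", "Phone", "BillingStreet",
          "BillingPostalCode", "BillingCity", "OwnerId"]

def project(record: dict) -> dict:
    """Keep only the fields selected by the query; missing fields become ''."""
    return {f: record.get(f, "") for f in FIELDS}

# Hypothetical Account record; 'Industry' is not in the query and is dropped.
account = {"Id": "001XXXXXXXXXXXX", "Name": "Acme Corp",
           "BillingCity": "Freiburg", "Industry": "Manufacturing"}
row = project(account)
```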

Loading Data into Salesforce

Prerequisites

Transformation Code

Extracting Data from an SQL Server

Prerequisites

  • A SQL Server connection string stored in a file, e.g. called connectionstring.txt
  • As an alternative, provide the connection string directly in the Source tag's config attribute
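For illustration, a connectionstring.txt file might look as follows; this is a typical .NET-style SQL Server connection string, and the server and database names are placeholders you would replace with your own:

```
Server=MYSERVER\SQLEXPRESS;Database=MyDatabase;Integrated Security=True;
```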

Transformation Code

Extracting data from an SQL Server is very straightforward:

<Transformation>
  <Source config="@connectionstring.txt">sqlserver://SELECT * FROM Contacts</Source>
  <Target config="delim=';' sort=LastName,FirstName">file://contacts.csv</Target>

  <Fields appendSource="true" />
</Transformation>

This will dump the content of the Contacts table into a CSV file called contacts.csv, sorted by LastName and then by FirstName.
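The sort=LastName,FirstName option in the Target config orders the output records by last name first, then by first name. A minimal Python sketch of the same ordering (the contact rows are made up for the example):

```python
# Illustrates the effect of sort=LastName,FirstName: records are ordered
# by LastName first, with FirstName breaking ties. Sample data is made up.
contacts = [
    {"FirstName": "Anna", "LastName": "Miller"},
    {"FirstName": "Ben", "LastName": "Adams"},
    {"FirstName": "Ada", "LastName": "Adams"},
]
contacts.sort(key=lambda r: (r["LastName"], r["FirstName"]))
```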

See also: SQL Server Reader
