We created a small demonstration screen that uses uniVocity to migrate data provided by the United States Department of Agriculture (USDA). You can find its source code here.

If you go to the USDA website, you will find all different releases of their nutrient database. This information is on the public domain and can be downloaded for free. These files are updated roughly on a yearly basis.

We will show how uniVocity can be used to import this data into a database and update to new releases of the USDA nutrient database. We will also migrate this data to a completely different database schema.

We downloaded the USDA ASCII files containing the SR25 and SR26 databases and created a database schema to store this information. The USDA files are in this zip file.

This is the entity relationship diagram of the source database:
alt text

The destination database looks like this: alt text

Start the application that comes with our release package, under the /demonstration_app folder.

The application does not come with a license file so it will work with batching disabled until you place a license.zip file inside the /demonstration_app folder. The processes will execute much faster with the license. Follow these instructions to request an evaluation license.

The following window will be displayed: alt text

When you start the application, two in-memory databases are created to store the information. On the left side you can query the tables of the source database, and on the right side the tables of the destination database. All tables will be empty.

As a first step, let's insert all the USDA nutrient data into our source database. In the process list on the top, select "skip nutrient data - load/update source database using SR25 files", then click the "execute process" button

alt text

This is the code responsible for mapping the USDA files from our zip file into the source database (also on Github):

    DataIntegrationEngine engine = Univocity.getEngine(getEngineName());
    DataStoreMapping mapping = engine.map("data", "source");
    engine.setMappingSequence("FD_GROUP", "FOOD_DES", "NUTR_DEF", "WEIGHT", "NUT_DATA");


At the end of this first mapping cycle, you finally have some data to play with. We created some queries to display some of this information in a more usable fashion.

alt text

So that was easy, and a competent programmer would be able to implement the same mapping code manually to load a database from these files and get this far in a few hours.

But now, let's map this database to the destination database. Select the process "Skip nutrient data - migrate from source database to destination database".

alt text

The mapping code for this process is (view on Github):

DataIntegrationEngine engine = Univocity.getEngine(getEngineName());
engine.addDatasetProducer(EngineScope.CYCLE, new FoodProcessor()).on("FOOD_DES", "Ndb_no", "Long_Desc");

DataStoreMapping mapping = engine.map("source", "destination");

EntityMapping map = mapping.map("food_names", "food_name");

map = mapping.map("food_state_names", "food_state");

map = mapping.map("food_name_details", "food");
map.reference().using("name").referTo("food_names", "food_name").on("name_id").directly().onMismatch().abort();

map = mapping.map("food_state_details", "state_of_food");
map.identity().associate("food_code", "name", "order").to("food_id", "state_id", "sequence");
map.reference().using("food_code").referTo("food_name_details", "food").on("food_id");
map.reference().using("name").referTo("food_state_names", "food_state").on("state_id");

map = mapping.map("FD_GROUP", "food_group");

map = mapping.map("FOOD_DES", "food");
map.reference().using("NDB_No").referTo("food_name_details", "food").on("id");
map.value().copy("CHO_Factor", "Fat_Factor", "Pro_Factor", "N_Factor")
        .to("carbohydrate_factor", "fat_factor", "protein_factor", "nitrogen_protein_factor");

map = mapping.map("FOOD_DES", "group_of_food");
map.identity().associate("NDB_No", "FdGrp_Cd").to("food_id", "group_id");
map.reference().using("NDB_No").referTo("food_name_details", "food").on("food_id");
map.reference().using("FdGrp_Cd").referTo("FD_GROUP", "food_group").on("group_id");

engine.addFunction(EngineScope.STATELESS, "normalize", new FunctionCall<String, String>() {
    public String execute(String input) {
        return input == null ? null : input.trim().toLowerCase();

map = mapping.map("WEIGHT", "weight");
map.addOutputRowReader(new NoDuplicatesRowReader("description"));

map = mapping.map("WEIGHT", "weight_of_food");
map.identity().associate("NDB_No", "Msre_Desc", "Amount").to("food_id", "weight_id", "amount");
map.reference().using("NDB_No").referTo("food_name_details", "food").on("food_id");
map.reference().using("Msre_Desc").referTo("WEIGHT", "weight").on("weight_id").readingWith("normalize");
map.addOutputRowReader(new NoDuplicatesRowReader("weight_id", "food_id", "amount"));

map = mapping.map("NUTR_DEF", "nutrient");
map.value().copy("NutrDesc", "units", "tagname").to("description", "unit", "acronym");

map = mapping.map("NUT_DATA", "nutrient_of_food");
map.identity().associate("NDB_No", "Nutr_No").to("food_id", "nutrient_id");
map.reference().using("NDB_No").referTo("food_name_details", "food").on("food_id");
map.reference().using("Nutr_No").referTo("NUTR_DEF", "nutrient").on("nutrient_id");


Now have a look at the destination database. All the information from the source has been migrated correctly. The same information, in a wildly different database! Take your time to select the tables and analyze the data there. We created some queries to help you visualize the data in both source and destination. Select "Foods and their weights" on both sides, then search for "butter, salted" in the search field at the bottom. You should see this:

alt text

Now, go ahead and and change some data in the source database. Select the table food_des, select the first row of the table (butter, salted). Now change it to something else, let's say: "bacon, fried", and execute the process again. You should now find "bacon, fried" on the destination using the search field again.

It works like magic!

Using only the mapping code presented before, uniVocity identified your update on the FOOD_DES table and correctly updated FOOD_NAME and STATE_OF_FOOD in the destination database. The weight information is preserved as you can see in the query results. If you look at the logs (printed to the standard output), you will also see that uniVocity executed only the required updates, instead of overriding the other records or deleting and inserting everything again.

alt text

You will notice that if you change any record in the destination database, they won't be overriden (unless they were also modified in the source database - to prevent that, righ click and disable updates to the record). That's because these mappings were defined to generate metadata, which allows uniVocity to "know" when to perform data updates.

Take your time to explore the demonstration app. Delete records, update, create new ones, upgrade the source database from the SR25 to the SR26 version, downgrade it, synchronize everything again. All your data changes are handled by uniVocity, in those two snippets of code displayed before. Nothing more, nothing less. Think about how much time you would need to hand-code an ETL process to do this. Using uniVocity, we created these processes in less than one day.

Things we found out when upgrading from SR25 to SR26

If you open the sr26upd folder in our zip file, you will find update files to apply over the SR25 database. These files are provided by the USDA and we left them there to validate our own update process. When creating this example, we came across an interesting situation: Some of the update files provided by the USDA are incorrect. uniVocity detects a number of rows to insert and update that does not match what is in the update files for WEIGHT and NUT_DATA.

Look at the log: uniVocity performed 1208 insertions and 2566 updates on the WEIGHT table. However, the ADD_WGT file has 1206 rows to insert.

Here are the rows missing in the ADD_WGT file:

[14296, 2, .5, scoop (2 tbsp), 29, null, null]
[19078, 3, .5, oz Hersheys, 14.2, null, null]


And this row is missing in the CHG_WGT file:

[25014, 2, 1, bag, 87, null, null] - changed from SR25: [25014, 2, 1, bag, 87, 12, 2.428]


We don't think this is a critical error on the USDA part as this is not critical information. We doubt someone is going to lose money or even their lives over a missing ounce of Hersheys. But this error illustrates very clearly that keeping data accurate is not that straightforward. It also demonstrates how uniVocity can help you keep your data intact and precise with almost no effort. With uniVocity you can do much, much more than what you just saw.

There are more errors in these updates files and we created this test case (on github) to print out everything.

Think about it for a moment: how long would you need to create your own data synchronization process to keep updating your database consistently every time your client data changes? We created the "load source database process" in a few minutes, without any complication.

How long would you need to create the same process with the tools you currently have?

uniVocity is the only piece of software available in the world that allows you to create complex data mapping operations with almost zero effort.
Download it now, or contact us to create a demonstration project for you, free of charge!