uniVocity 1.0.7 released with lots of bug fixes and improvements

We just released uniVocity 1.0.7, a maintenance release with various bug fixes and some new features: exporting data became very easy with dynamic entities and mapping auto-detection. What the hell is that? Let me demonstrate with an example.

Dumping data from a database

Ever spent time and resources isolating test data from a database to implement a particular test case? Try this:

    JdbcDataStoreConfiguration myDatabase = new JdbcDataStoreConfiguration("MyDatabase", my_javax_sql_DataSource);
    myDatabase.setSchema("dbo");

    CsvDataStoreConfiguration myCsv = new CsvDataStoreConfiguration("MyCsvDirectory");
    CsvEntityConfiguration myCsvDefaults = myCsv.getDefaultEntityConfiguration();

    myCsvDefaults.getFormat().setLineSeparator("\r\n");
    myCsvDefaults.setHeaderWritingEnabled(true);
    myCsvDefaults.setEmptyValue("\"\""); //writes empty strings found in the database as "" in CSV

    //A CSV file will be dynamically generated in this directory for each database table:
    myCsv.setOutputDirectory(System.getProperty("user.home") + "/csv_export", "UTF-8");

    //Let's create a data integration engine with these configurations
    EngineConfiguration engineConfiguration = new EngineConfiguration("MyDataIntegrationEngine", myDatabase, myCsv);
    Univocity.registerEngine(engineConfiguration);

    DataIntegrationEngine engine = Univocity.getEngine("MyDataIntegrationEngine");

    //Now we map the database to a CSV directory.
    DataStoreMapping mapping = engine.map("MyDatabase", "MyCsvDirectory");

    //Let's delete any existing data in the CSV files, then insert the new rows
    mapping.configurePersistenceDefaults().notUsingMetadata().deleteAll().insertNewRows();

    //This will create a mapping from each database table to a CSV file with the same name (and column names)
    mapping.autodetectMappings(true);
    
    //And this executes a mapping cycle... all data from the database will be mapped to CSV files as configured above.
    engine.executeCycle(/* You can specify which tables you are interested in mapping. No arguments means "map everything". */); 
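
If you only want a subset of the tables, you can name them in the cycle. A minimal sketch, assuming the executeCycle overload that accepts entity names (the table names below are placeholders; check the API javadocs for the exact signature):

    //map only these two tables in this cycle
    engine.executeCycle("some_table", "another_table");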

Let's choose the data we are interested in

One can write queries to select the rows of interest, but that takes time. Suppose you need to extract all data associated with a couple of records in the database. The table you are after is some_table, and you need the records whose values in the id column are 10 and 11. Many other tables in the database have foreign key references to some_table and, fortunately (for our example), they all use the same standard column name: some_table_id.

Let's just filter the rows in code and be done with the task. We can add the following lines to the code presented above (before the call to engine.executeCycle()):

    mapping.addInputRowReader(new RowReader() {

        //index of the column we are checking (-1 if the table doesn't have it)
        private int index;

        @Override
        public void initialize(RowMappingContext context) {
            if (context.getSourceEntity().equalsIgnoreCase("some_table")) {
                index = context.getInputIndex("id");
            } else /* any other table */ {
                index = context.getInputIndex("some_table_id");
            }
        }

        @Override
        public void processRow(Object[] inputRow, Object[] outputRow, RowMappingContext context) {
            /* if the input table is [some_table] or if it has the [some_table_id] column */
            if (index != -1) {
                Integer id = (Integer) inputRow[index];
                /* Not 10 nor 11? Burn! */
                if (!(id == 10 || id == 11)) {
                    context.discardRow();
                }
            }

            /* rows from tables without these columns are fully read and migrated */
        }
    } /* we could pass a list of table names here; with none given, the RowReader runs against all tables */);
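
If you already know which tables reference some_table, you can restrict the RowReader to just those instead of running it against everything. A sketch, assuming addInputRowReader also accepts a list of entity names after the reader (myRowReader stands for the anonymous RowReader above extracted into a variable, and the other names are placeholders):

    //run the filter only against the tables that actually have the columns we check
    mapping.addInputRowReader(myRowReader, "some_table", "child_table_1", "child_table_2");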

Now, call engine.executeCycle() to produce CSV files containing only the data we want.

If you have blobs and binary data, things can explode.

Well, we can simply convert these values to null (or do something else):

    engine.addFunction(EngineScope.STATELESS, "binaryToNull", new FunctionCall<Object, byte[]>() {
        @Override
        public Object execute(byte[] input) {
            return null;
        }
    });

    //and then
    mapping.getMapping("images", "images").transformFields("binaryToNull", "before_image", "after_image", "another_image");
    mapping.getMapping("maps", "maps").transformFields("binaryToNull", "map_file");
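
If you'd rather keep the binary content than discard it, one option is to encode it as text so it survives the trip through CSV. A sketch of such a function, assuming Java 8's java.util.Base64 is available (the binaryToBase64 name is our own, not part of the API):

    engine.addFunction(EngineScope.STATELESS, "binaryToBase64", new FunctionCall<Object, byte[]>() {
        @Override
        public Object execute(byte[] input) {
            //encode blob bytes as a Base64 String; nulls stay null
            return input == null ? null : java.util.Base64.getEncoder().encodeToString(input);
        }
    });

    mapping.getMapping("images", "images").transformFields("binaryToBase64", "before_image", "after_image", "another_image");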

But we need to load this data into a database (possibly an in-memory database)

This is now extremely easy. We just need to convert the original database schema into the schema of the database used for testing:

    String pathToSchemaOutput = System.getProperty("user.home") + "/csv_export/schema.sql";

    engine.exportEntities("MyDatabase")
        .asCreateTableScript(DatabaseDialect.HSQLDB)
        .noGeneratedIds() // the test schema should not have generated IDs, as we want to load existing data (and IDs).
        .toFile(pathToSchemaOutput, "UTF-8");
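
To eyeball the generated script, you can simply print the file; it is plain text (this uses the same commons-io FileUtils as the loading code further below):

    System.out.println(FileUtils.readFileToString(new File(pathToSchemaOutput), "UTF-8"));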

Cool, I have a database schema, and data in a bunch of files. How to load everything?

With a few lines of code! Most of this can be organized into nicely reusable methods, but I chose to put everything in a single place to make it easier for you to follow the code:

    // we are using Spring's JDBC template here:
    DataSource dataSource;
    JdbcTemplate jdbcTemplate;
    try {
        Class.forName("org.hsqldb.jdbcDriver");
        dataSource = new SingleConnectionDataSource("jdbc:hsqldb:mem:sampledb", "sa", "", true);
        jdbcTemplate = new JdbcTemplate(dataSource);

        File schema = new File(System.getProperty("user.home") + "/csv_export/schema.sql");
        String script = FileUtils.readFileToString(schema, "UTF-8");
        jdbcTemplate.execute(script);
    } catch (Exception ex) {
        throw new IllegalStateException("Error loading scripts for sample database", ex);
    }

    JdbcDataStoreConfiguration myTestDatabase = new JdbcDataStoreConfiguration("MyTestDatabase", dataSource);
    myTestDatabase.setSchema("public");

    //You usually won't need to convert Strings to the column types expected by the database, but in case you do:
    myTestDatabase.getDefaultEntityConfiguration().setParameterConversionEnabled(true);

    CsvDataStoreConfiguration myCsvDir = new CsvDataStoreConfiguration("MyCsvDirectory");

    //Let's read "" as an empty String instead of NULL,
    //in case some columns have NOT NULL constraints.
    myCsvDir.getDefaultEntityConfiguration().setEmptyValue("");
    myCsvDir.getDefaultEntityConfiguration().getFormat().setLineSeparator("\r\n");

    //Use all files in our CSV directory
    myCsvDir.addEntities(System.getProperty("user.home") + "/csv_export", "UTF-8");

    //The rest is business as usual: 
    Univocity.registerEngine(new EngineConfiguration("engine", myCsvDir, myTestDatabase));
    try {
        DataIntegrationEngine engine = Univocity.getEngine("engine");

        DataStoreMapping mapping = engine.map("MyCsvDirectory", "MyTestDatabase");
        mapping.configurePersistenceDefaults().notUsingMetadata().deleteAll().insertNewRows();
        mapping.autodetectMappings();

        engine.executeCycle();
    } finally {
        Univocity.shutdown("engine");
    }
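
To sanity-check the load, you can query the in-memory database right away. This check is our addition, not part of the original steps; it reuses the jdbcTemplate created in the setup code above:

    //after the RowReader filtering, only the rows with ids 10 and 11 should have survived
    Integer count = jdbcTemplate.queryForObject("SELECT COUNT(*) FROM some_table", Integer.class);
    System.out.println("Rows loaded into some_table: " + count); //expect 2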

You're ready to use your test database! We hope you enjoyed this little tutorial.

For more details on these and other features, check the updated uniVocity tutorial, API javadocs and GitHub projects.

Have fun with this latest release! In the meantime, we are working hard to bring uniVocity 1.1.0 to life with great new features and out-of-the-box support for:

  • Excel files, even those with complex layouts that once made you spend weeks with Apache POI just to read and write them.
  • Database dump file parsing and data extraction. You'll be able to read generic, MySQL and PostgreSQL dump files into any other database (or CSVs, TSVs, or whatever else you want).

Don't waste more time writing code to map data from A to B: get our latest release and start coding fast and powerful data integration solutions with ease!

January 07, 2015 by Jeronimo Backes