
Routines

univocity-parsers 2.0.0 introduced pre-built routines: shortcuts that save time and effort in common use cases such as dumping ResultSets to a given output format, iterating over Java beans, and others.

This section demonstrates how to use all available routines.

Iterating over bean instances

Use the iterate routine to process large inputs where every record is converted to an instance of your annotated class:

// Let's configure our input format using the parser settings, as usual
// This configuration will be used as the base configuration for our routine.
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");

// Here we create an instance of our routines object.
CsvRoutines routines = new CsvRoutines(parserSettings); // Can also use TSV and Fixed-width routines

// The iterate() method receives our annotated class and an input to parse, and returns
// an Iterable over objects of this class, which is why it can be used in a for-each loop.

// internally, it will create a special instance of BeanRowProcessor
// to handle the conversion of each record to a TestBean
for (TestBean bean : routines.iterate(TestBean.class, getReader("/examples/bean_test.csv"))) {
    println(bean); //let's print it out.
}

This produces:

TestBean [quantity=1, comments=?, amount=555.999, pending=true]
TestBean [quantity=0, comments=" something ", amount=null, pending=false]
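Because the beans are consumed inside a for-each loop, each record can be converted only when the loop asks for it, so only the current record has to be held in memory. The following plain-Java sketch illustrates that idea. It is not the library's implementation, and LazyRecords and its converter parameter are hypothetical names introduced for illustration only:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical illustration, NOT part of univocity-parsers:
// an Iterable that converts one line of input at a time.
class LazyRecords<T> implements Iterable<T> {
    private final BufferedReader reader;
    private final Function<String, T> converter;

    LazyRecords(Reader input, Function<String, T> converter) {
        this.reader = new BufferedReader(input);
        this.converter = converter;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            // Read one line ahead so hasNext() can answer without converting anything.
            private String nextLine = readLine();

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public T next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                // Conversion happens here, one record per call.
                T result = converter.apply(nextLine);
                nextLine = readLine();
                return result;
            }
        };
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each line is parsed only when the loop reaches it.
        for (Integer quantity : new LazyRecords<Integer>(new StringReader("1\n0"), Integer::parseInt)) {
            System.out.println(quantity);
        }
    }
}
```

The design trade-off is that such an Iterable can be traversed only once, since it reads directly from the underlying input.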

Parsing a list of beans and dumping them into another file

Use the parseAll and writeAll routines to parse every record of an input into a list of instances of your annotated class and then write them to another output:

// This time we're going to parse a list of beans at once and write them to an output.
// First we configure the input format
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");

// Then the output format
CsvWriterSettings writerSettings = new CsvWriterSettings();
writerSettings.getFormat().setLineSeparator("\r\n");
writerSettings.getFormat().setDelimiter(';');
writerSettings.setQuoteAllFields(true);

// Let's create a new routines object with the parser and writer configuration.
CsvRoutines routines = new CsvRoutines(parserSettings, writerSettings); // Can also use TSV and Fixed-width routines

// The parseAll routine allows us to get all beans using a single line of code.
List<TestBean> allBeans = routines.parseAll(TestBean.class, getReader("/examples/bean_test.csv"));

// For convenience, we will write to a String:
StringWriter output = new StringWriter();

// Now, let's write all beans to the output using the writeAll routine:
// Note that it takes an Iterable as the input. You could use routines.iterate(),
// as shown in the previous example, to avoid loading all objects in memory.
routines.writeAll(allBeans, TestBean.class, output);

// And here's the result
println(output.toString());

The output will be:

"1";"555.999";"yes";"";"?"
"0";"";"no";"";""" something """

Efficient parse-then-write process for large inputs

A very common use case is parsing a huge input, transforming the values parsed from it, and then writing everything into another output, possibly using another format. The parseAndWrite routine streams each record parsed from a given input to an output of your choice, without loading everything into memory before writing:

// The Csv class contains a few static methods that provide pre-defined configurations for CSV parsers/writers
// Here we will read a csv and write its data so it is compatible with the RFC-4180 standard.
CsvRoutines routines = new CsvRoutines(new CsvParserSettings(), Csv.writeRfc4180());

// let's parse only the model and year columns (at positions 2 and 0 respectively)
routines.getParserSettings().selectIndexes(2, 0);
routines.getParserSettings().getFormat().setLineSeparator("\n");

Reader input = getReader("/examples/example.csv");
Writer output = new StringWriter();

// using the parseAndWrite method, all rows from the input are streamed to the output efficiently.
routines.parseAndWrite(input, output);

// here's the result
print(output);

Which yields:

Model,Year
E350,1997
"Venture ""Extended Edition""",1999
Grand Cherokee,1996
"Venture ""Extended Edition, Very Large""",1999
"Venture ""Extended Edition""",
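The doubled quotes in this output come from RFC 4180 escaping: a field that contains a comma, a double quote or a line break is enclosed in double quotes, and each embedded double quote is written twice. Here is a minimal plain-Java sketch of that rule. Rfc4180Sketch, escapeField and toRecord are hypothetical names, not univocity-parsers API, and writing a null value as an empty field is an assumption based on the last row above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration, NOT part of univocity-parsers.
class Rfc4180Sketch {

    // Encloses a field in quotes only when it contains a comma, a double quote, CR or LF.
    static String escapeField(String value) {
        if (value == null) {
            return ""; // assumption: null values are written as empty fields
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\r") || value.contains("\n");
        if (!needsQuotes) {
            return value;
        }
        // Each embedded double quote is doubled, then the whole field is enclosed in quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields with commas to form one record (without the line terminator).
    static String toRecord(List<String> fields) {
        return fields.stream()
                .map(Rfc4180Sketch::escapeField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toRecord(Arrays.asList("Venture \"Extended Edition\"", "1999")));
    }
}
```

Running main prints the third line of the output above.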

Dumping a ResultSet

Another useful routine is the write method that accepts a ResultSet. All contents of the given ResultSet will be written into a format of your choice. If you enable writing of headers in the settings object, the ResultSetMetaData will be used to obtain the column names. If you need to write to a fixed-width format, the column lengths provided by your ResultSetMetaData will be used.

The following example uses an in-memory database we populate with the following code:

String createTable = "CREATE TABLE users(" +
        "    id INTEGER IDENTITY PRIMARY KEY," +
        "    name VARCHAR(50) not null," +
        "    email VARCHAR(50) not null" +
        ")";

Class.forName("org.hsqldb.jdbcDriver");
Statement statement = connectToDatabase();

statement.execute(createTable);
statement.executeUpdate("INSERT INTO users (name, email) VALUES ('Tomonobu Itagaki', 'dead@live.com')");
statement.executeUpdate("INSERT INTO users (name, email) VALUES ('Caine Hill', 'chill@company.com')");
statement.executeUpdate("INSERT INTO users (name, email) VALUES ('You Sir', 'user@email.com')");

Now we can run a query and dump the result:

ResultSet resultSet = statement.executeQuery("SELECT * FROM users");

// To dump the data of our ResultSet, we configure the output format:
TsvWriterSettings writerSettings = new TsvWriterSettings();
writerSettings.getFormat().setLineSeparator("\n");
writerSettings.setHeaderWritingEnabled(true); // we want the column names to be printed out as well.

// Then create a routines object:
TsvRoutines routines = new TsvRoutines(writerSettings);

// For convenience, we will write to a String:
Writer output = new StringWriter();

// The write() method takes care of everything. Both resultSet and output are closed by the routine.
routines.write(resultSet, output);

The output will be:

ID    NAME    EMAIL
0    Tomonobu Itagaki    dead@live.com
1    Caine Hill    chill@company.com
2    You Sir    user@email.com
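The header row comes from the metadata attached to the ResultSet. As a rough sketch of how column names and lengths can be read through standard JDBC, the helper below walks a ResultSetMetaData. Which metadata methods the library actually calls is not stated in this section, so the use of getColumnLabel and getColumnDisplaySize here is an assumption, and MetaDataSketch is a hypothetical class:

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT part of univocity-parsers.
class MetaDataSketch {

    // Collects one label per column. JDBC column indexes start at 1, not 0.
    static List<String> columnLabels(ResultSetMetaData metaData) throws SQLException {
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= metaData.getColumnCount(); i++) {
            // assumption: labels (which honor SQL aliases) are used as headers
            labels.add(metaData.getColumnLabel(i));
        }
        return labels;
    }

    // Collects the display size of each column, e.g. as a candidate fixed-width field length.
    static int[] columnWidths(ResultSetMetaData metaData) throws SQLException {
        int[] widths = new int[metaData.getColumnCount()];
        for (int i = 1; i <= widths.length; i++) {
            // assumption: the display size is what determines the field length
            widths[i - 1] = metaData.getColumnDisplaySize(i);
        }
        return widths;
    }
}
```

With the users table above, calling columnLabels(resultSet.getMetaData()) would be one way to obtain the names shown in the header row.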

Bugs, contributions & support

If you find a bug, please report it on GitHub or send us an email at parsers@univocity.com.

We try our best to eliminate all bugs as soon as possible and you’ll rarely see a bug open for more than 24 hours after it’s reported. We do our best to answer all questions. Enhancements/suggestions are implemented on a best-effort basis.

Feel free to submit your contribution via pull requests. Any little bit is appreciated, from improvements to the documentation to a full-blown rewrite from scratch.

For commercial support, customizations or anything in between, please contact support@univocity.com.

Thank you for using our parsers!

The univocity team.