uniVocity-parsers 1.4.0 released with even more features!

After a few months without updates, we released another minor version of our text parsing suite, uniVocity-parsers, to introduce some useful features and minor bug fixes.

What's new

Automatic line ending detection

As of version 1.4.0, you can easily process inputs coming from anywhere, without having to worry about which line separator sequence they use. Some users were having trouble processing files created on different operating systems. For example, your client may use Linux to create CSV files such as this:

1997,Ford,E350,"ac, abs, moon",3000.00\n


Now suppose another client, using MacOS, created this:

1997,Ford,E350,"ac, abs, moon",3000.00\r


Previously, you'd have to try to identify which line ending was being used to correctly process your input data. Now, all you have to do is to tell your parser to do this work for you:

    // The settings object provides many configuration options
    CsvParserSettings parserSettings = new CsvParserSettings();

    // You can configure the parser to automatically detect what line separator sequence is in the input
    parserSettings.setLineSeparatorDetectionEnabled(true);
    // creates a CSV parser
    CsvParser parser = new CsvParser(parserSettings);

    // parses all rows in one go.
    List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));
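
Conceptually, the detection amounts to scanning a sample of the input for the first line break sequence and using it from then on. Here is a minimal stand-alone sketch of that idea; the `LineSeparatorSniffer` class is a hypothetical illustration, not part of the library's API:

```java
// Hypothetical helper, not part of uniVocity-parsers: illustrates what
// automatic line separator detection does conceptually - find the first
// line break sequence in a sample of the input and use it from then on.
public class LineSeparatorSniffer {

    // Returns "\r\n" (Windows), "\n" (Linux) or "\r" (old MacOS) for the
    // first line break found; defaults to "\n" when none is present.
    static String sniff(String sample) {
        for (int i = 0; i < sample.length(); i++) {
            char ch = sample.charAt(i);
            if (ch == '\n') {
                return "\n";
            }
            if (ch == '\r') {
                // a '\r' immediately followed by '\n' is a Windows sequence
                if (i + 1 < sample.length() && sample.charAt(i + 1) == '\n') {
                    return "\r\n";
                }
                return "\r";
            }
        }
        return "\n"; // no line break in the sample: fall back to a default
    }
}
```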


Concurrent row processing

Now you can wrap any RowProcessor in a ConcurrentRowProcessor to execute your specific processing over the parsed rows in a separate thread. Let's start with the (well-worn) example that uses annotations and a BeanListProcessor to convert rows into a list of Java beans. Here's our CSV input:

date,			amount,		quantity,	pending	,comments
10-oct-2001,	555.999,	1,			yEs		,?
2001-10-10,		,			?,			N		,"  "" something ""  "


Here's our TestBean:

class TestBean {

    // if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
    @NullString(nulls = { "?", "-" })
    // if a value resolves to null, it will be converted to the String "0".
    @Parsed(defaultNullRead = "0")
    private Integer quantity;   // The attribute type defines which conversion will be executed when processing the value.
    // In this case, IntegerConversion will be used.
    // The attribute name will be matched against the column header in the file automatically.

    // the value for the comments attribute is in the column at index 4 (0 is the first column, so this means fifth column in the file)
    @Parsed(index = 4)
    private String comments;

    // you can also explicitly give the name of a column in the file.
    @Parsed(field = "amount")
    private BigDecimal amount;

    // values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
    @BooleanString(falseStrings = { "no", "n", "null" }, trueStrings = { "yes", "y" })
    @Parsed
    private Boolean pending;
}


We usually create the following code to read that input file and create instances of TestBean:

    // BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
    BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);

    CsvParserSettings parserSettings = new CsvParserSettings();
    parserSettings.setRowProcessor(rowProcessor);
    parserSettings.setHeaderExtractionEnabled(true);

    CsvParser parser = new CsvParser(parserSettings);
    parser.parse(getReader("/examples/bean_test.csv"));

    // The BeanListProcessor provides a list of objects extracted from the input.
    List<TestBean> beans = rowProcessor.getBeans();


Now, to execute the annotation processing and creation of TestBean instances in a separate thread, simply change this line:

    parserSettings.setRowProcessor(rowProcessor);

to:

    parserSettings.setRowProcessor(new ConcurrentRowProcessor(rowProcessor));


And that's all! Here is the output produced by the toString() method of each TestBean instance:

    TestBean [quantity=1, comments=?, amount=555.999, pending=true], 
    TestBean [quantity=0, comments=" something ", amount=null, pending=false]
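
The quantity values in that output illustrate the conversion chain declared on the bean: "?" is mapped to null by @NullString, null becomes "0" via defaultNullRead, and the Integer field type triggers integer parsing. A hypothetical stand-alone sketch of that chain (not the library's actual conversion classes):

```java
// Hypothetical illustration of the conversions declared on the quantity
// field: @NullString(nulls = {"?", "-"}) followed by
// @Parsed(defaultNullRead = "0") on an Integer attribute.
public class QuantityConversion {

    static Integer convertQuantity(String raw) {
        String value = raw == null ? null : raw.trim();
        // @NullString step: "?" and "-" become null
        // (empty values are also read as null by default)
        if ("?".equals(value) || "-".equals(value) || "".equals(value)) {
            value = null;
        }
        // defaultNullRead step: null becomes the String "0"
        if (value == null) {
            value = "0";
        }
        // conversion step: the Integer field type triggers integer parsing
        return Integer.valueOf(value);
    }
}
```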


A word of advice: just because you can split the processing of your input into a separate thread, it doesn't mean you should. uniVocity-parsers is highly optimized, and in many cases processing your data sequentially will still be faster than processing it in parallel. We recommend profiling your particular scenario before deciding whether to use this feature.
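
The hand-off such a wrapper performs can be sketched as a producer/consumer pair: the parsing thread feeds rows into a queue while a separate thread drains and processes them. The sketch below is illustrative only, not ConcurrentRowProcessor's actual implementation, and all names in it are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: the hand-off pattern behind a concurrent row processor.
// The parsing thread enqueues rows while a consumer thread processes them.
public class RowHandOff {

    private static final String[] POISON = new String[0]; // end-of-input marker

    static List<String> processConcurrently(List<String[]> rows) {
        final BlockingQueue<String[]> queue = new ArrayBlockingQueue<String[]>(128);
        final List<String> processed = new ArrayList<String>();

        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String[] row = queue.take();
                        if (row == POISON) {
                            break; // parsing finished
                        }
                        // stand-in for real row processing (e.g. bean conversion)
                        processed.add(String.join("|", row));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        consumer.start();

        try {
            for (String[] row : rows) { // the "parser" thread feeds rows as it reads them
                queue.put(row);
            }
            queue.put(POISON);          // signal end of input
            consumer.join();            // wait for the consumer to drain the queue
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return processed;
    }
}
```

The queue hand-off itself has a cost, which is why cheap per-row work often ends up faster when done sequentially.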

More details in our updated tutorial.

Well, that's about it for this new release. We hope you enjoy it!

Download version 1.4.0 here.

Check this and other projects on our github page.

March 10, 2015 by Jeronimo Backes