uniVocity-parsers 1.3.0 is here with some useful new features.

We just released another minor version of our text parsing suite, uniVocity-parsers, to introduce some useful features and minor bug fixes.

Our CSV parser has been updated

  • Our CSV parser can handle unescaped quotes inside quoted elements. In other words: some people just want to watch the world burn. Consider this input:

    something,"text with ""escaped quotes"" here",something else
    something,"text with "unescaped quotes" here",something else

The first line contains what any decent CSV parser expects to find when you want to use quotes inside a value: an escape sequence. In the example, "" represents a single " character. This line is parsed as:

  1. something
  2. text with "escaped quotes" here
  3. something else

Now, the second line is obviously non-standard, and we have yet to find another parser that handles it instead of going belly up. It is controversial, but in the end the client is the boss, so we adjusted our CSV parser to handle this case. Probably not everyone will agree with us on this one, but looking at the content, the intention is obvious: the user wants unescaped quote characters inside a quoted value. We got this requirement from a couple of clients whose own clients were providing manually produced CSV files. So here we are: from version 1.3.0 onwards, uniVocity-parsers handles this case by default instead of throwing an exception at you, and our CSV parser reads the second line as:

  1. something
  2. text with "unescaped quotes" here
  3. something else

You can disable this capability to get exceptions when parsing such an input by turning off the property parseUnescapedQuotes in the CsvParserSettings class.

    CsvParserSettings parserSettings = new CsvParserSettings();
    parserSettings.setParseUnescapedQuotes(false);
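To make the behaviour more concrete, here is a minimal, self-contained sketch of one heuristic a lenient parser could use. It is not the uniVocity-parsers implementation, and the class name LenientCsvLine is made up for illustration. The assumption is that, inside a quoted value, a quote only closes the value when it is followed by a comma or the end of the line; a doubled quote is an escape, and any other quote is kept as a literal character.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of uniVocity-parsers.
class LenientCsvLine {

    // Splits a single CSV line on commas, accepting both escaped ("")
    // and unescaped quotes inside a quoted value.
    static List<String> split(String line) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        int n = line.length();
        int i = 0;
        while (i < n) {
            char c = line.charAt(i);
            if (!quoted) {
                if (c == ',') {
                    values.add(current.toString());
                    current.setLength(0);
                } else if (c == '"' && current.length() == 0) {
                    quoted = true; // opening quote at the start of a value
                } else {
                    current.append(c);
                }
                i++;
            } else if (c == '"') {
                boolean atEnd = (i + 1 == n);
                char next = atEnd ? '\0' : line.charAt(i + 1);
                if (!atEnd && next == '"') {
                    current.append('"'); // escaped quote: "" becomes "
                    i += 2;
                } else if (atEnd || next == ',') {
                    quoted = false;      // closing quote of the value
                    i++;
                } else {
                    current.append('"'); // unescaped quote, kept literally
                    i++;
                }
            } else {
                current.append(c);
                i++;
            }
        }
        values.add(current.toString());
        return values;
    }
}
```

Calling LenientCsvLine.split on either of the two input lines above yields three values. The heuristic is inherently ambiguous: an unescaped quote that happens to be followed by a comma would be taken as the end of the value, which is one reason strict parsers reject such input.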

Column parsing

We introduced a few RowProcessors that collect the values of each column parsed from the input.

To avoid memory problems when processing large inputs, we also introduced batched column processors. These return the column values collected after each batch of a given number of rows.

Here's an example:

    CsvParserSettings parserSettings = new CsvParserSettings();
    // To get the values of all columns, use a column processor
    ColumnProcessor rowProcessor = new ColumnProcessor();
    parserSettings.setRowProcessor(rowProcessor);

    CsvParser parser = new CsvParser(parserSettings);

    //This will kick in our column processor
    parser.parse(getReader("/examples/example.csv"));

    //Finally, we can get the column values:
    Map<String, List<String>> columnValues = rowProcessor.getColumnValuesAsMapOfNames();
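The batching idea can be illustrated without the library. The sketch below is a hypothetical stand-in (the class BatchedColumnCollector and its method names are assumptions made for this example, not uniVocity-parsers API): it accumulates values per column name and hands the map to a callback after every N rows, so that only one batch is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in, not a uniVocity-parsers class.
class BatchedColumnCollector {
    private final String[] headers;
    private final int rowsPerBatch;
    private final Consumer<Map<String, List<String>>> onBatch;
    private Map<String, List<String>> columns;
    private int rowsInBatch;

    BatchedColumnCollector(int rowsPerBatch,
                           Consumer<Map<String, List<String>>> onBatch,
                           String... headers) {
        this.rowsPerBatch = rowsPerBatch;
        this.onBatch = onBatch;
        this.headers = headers;
        this.columns = emptyColumns();
    }

    // One empty list per column name, in header order.
    private Map<String, List<String>> emptyColumns() {
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (String header : headers) {
            map.put(header, new ArrayList<>());
        }
        return map;
    }

    // Called once per parsed row; missing trailing values are stored as null.
    void rowProcessed(String[] row) {
        for (int i = 0; i < headers.length; i++) {
            columns.get(headers[i]).add(i < row.length ? row[i] : null);
        }
        if (++rowsInBatch == rowsPerBatch) {
            flush();
        }
    }

    // Called when the input ends, to deliver a final partial batch.
    void processEnded() {
        if (rowsInBatch > 0) {
            flush();
        }
    }

    // Hands the current batch to the callback and starts a fresh one.
    private void flush() {
        onBatch.accept(columns);
        columns = emptyColumns();
        rowsInBatch = 0;
    }
}
```

With rowsPerBatch set to 2 and three input rows, the callback would fire twice: once with two values per column and once, from processEnded, with the remaining one.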

Use your own conversion implementations to parse annotated JavaBeans

This one is easy. If you want to use a custom conversion, simply annotate the fields you need with @Convert. The following example uses the custom conversion class WordsToSetConversion, which gets the words from the value parsed for the description field and adds them to a Set:

    class Car {
        @Parsed
        private Integer year;

        // args are handed to WordsToSetConversion when it is created
        @Convert(conversionClass = WordsToSetConversion.class, args = { ",", "true" })
        @Parsed
        private Set<String> description;

        // other members omitted
    }
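For readers wondering what such a conversion class might look like, here is a rough, self-contained sketch. It is not the WordsToSetConversion shipped with the tutorial: the SimpleConversion interface is a stand-in defined here so the example compiles without the library, and the meaning of the two args (a separator and an upper-case flag) is an assumption based on the annotation above.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Stand-in for the library's conversion contract (an assumption for this sketch).
interface SimpleConversion<I, O> {
    O execute(I input); // parsed text -> field value
    I revert(O input);  // field value -> text to write
}

// Hypothetical conversion: splits a value into words and collects them in a Set.
class WordsToSetSketch implements SimpleConversion<String, Set<String>> {
    private final String separator;
    private final boolean toUpperCase;

    // Assumed meaning of args: args[0] = separator, args[1] = "true" to upper-case the words.
    WordsToSetSketch(String... args) {
        this.separator = args.length > 0 ? args[0] : ",";
        this.toUpperCase = args.length > 1 && Boolean.parseBoolean(args[1]);
    }

    @Override
    public Set<String> execute(String input) {
        Set<String> words = new LinkedHashSet<>();
        if (input == null) {
            return words;
        }
        // Pattern.quote treats the separator literally rather than as a regex.
        for (String word : input.split(Pattern.quote(separator))) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                words.add(toUpperCase ? trimmed.toUpperCase(Locale.ROOT) : trimmed);
            }
        }
        return words;
    }

    @Override
    public String revert(Set<String> input) {
        return input == null ? null : String.join(separator, input);
    }
}
```

Under these assumptions, new WordsToSetSketch(",", "true").execute("ac, abs, moon roof") would produce a set containing AC, ABS and MOON ROOF, in that order.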

 

More details are available in our updated tutorial.

Well, that's about it for this new release. We hope you enjoy it!


Download version 1.3.0 here.

Check out this and our other projects on our GitHub page.

November 24, 2014 by Jeronimo Backes