
Welcome to univocity-parsers

univocity-parsers is a collection of extremely fast and reliable Java-based parsers for CSV, TSV and Fixed Width files. It provides a consistent interface for handling the different formats, and a solid framework for the development of new parsers.

Introduction to univocity-parsers

The project is developed and maintained by Univocity Software, an Australian company that develops custom data integration solutions using univocity, our commercial data integration framework, and the new univocity-html-parser for HTML scraping.

While developing custom data migration services for our clients, involving a variety of text formats, we found that the parsers that currently exist for Java do not provide enough flexibility, throughput and reliability for massive and diverse (and messy) inputs.

Another inconvenience is the difficulty in extending these parsers and dealing with a different beast for each format.

We then decided to build our own architecture for parsing text files from the ground up. The main goal of this architecture is to provide maximum performance and flexibility while making it easy for anyone to create new parsers.

univocity-parsers is currently used by many commercial and open-source projects, including Spark-CSV, Apache Camel and Apache Drill.

Parsers

univocity-parsers currently provides parsers for:

  • CSV files - it’s the fastest and most flexible CSV parser for Java you can find

  • Fixed-width files - it’s the fastest and most flexible Fixed-width (or fixed-length) parser for Java you can find

  • TSV files - it’s the fastest and most flexible TSV parser for Java you can find

Tutorial

This library has MANY features, so we split this tutorial into different sections. We suggest you follow this main tutorial to learn about the features shared among all parsers, and then have a look at the format-specific sections.

Input files and methods

All parsers work with an instance of java.io.Reader, java.io.File or java.io.InputStream. You will see calls such as getReader("/examples/example.csv") everywhere. This is just a helper method we use to build the examples (source code here):

    public Reader getReader(String relativePath) {
        ...
        return new InputStreamReader(this.getClass().getResourceAsStream(relativePath), "UTF-8");
        ...
    }

Writers, on the other hand, can work with instances of java.io.Writer, java.io.File or java.io.OutputStream.

All parsers/writers have the exact same API and the code used to handle any format is almost the same. Differences are restricted to a few configuration options.
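To illustrate that symmetry, here is a minimal sketch of the writing side, producing CSV into an in-memory StringWriter (using CsvWriter and writeRow as we understand the library's API):

```java
import java.io.StringWriter;

import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

CsvWriterSettings settings = new CsvWriterSettings();
settings.getFormat().setLineSeparator("\n");

// write to an in-memory StringWriter; a File or OutputStream would work the same way
StringWriter output = new StringWriter();
CsvWriter writer = new CsvWriter(output, settings);

writer.writeRow("Year", "Make", "Model");
writer.writeRow("1997", "Ford", "E350");

// flushes and closes the underlying output
writer.close();
```

The two writeRow calls would work unchanged with TsvWriter or FixedWidthWriter; only the settings class changes.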

In most of the examples, the files example.csv, example.tsv and example.txt will be used as input for the different parsers provided by univocity-parsers. The information stored in each is exactly the same; only the format differs.

This is the content in example.csv:

# This example was extracted from Wikipedia (en.wikipedia.org/wiki/Comma-separated_values)
#
# 2 double quotes ("") are used as the escape sequence for quoted fields, as per the RFC4180 standard
#

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00

# Look, a multi line value. And blank rows around it!

1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
,,"Venture ""Extended Edition""","",4900.00

The following example shows one of the many ways to parse all rows of this CSV file:

CsvParserSettings settings = new CsvParserSettings();
//the file used in the example uses '\n' as the line separator sequence.
//the line separator sequence is defined here to ensure systems such as MacOS and Windows
//are able to process this file correctly (MacOS uses '\r'; and Windows uses '\r\n').
settings.getFormat().setLineSeparator("\n");

// creates a CSV parser
CsvParser parser = new CsvParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));

To parse all rows of a TSV, just switch to TsvParserSettings and TsvParser:

TsvParserSettings settings = new TsvParserSettings();
settings.getFormat().setLineSeparator("\n");

// creates a TSV parser
TsvParser parser = new TsvParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(getReader("/examples/example.tsv"));

And to parse all rows of a Fixed-Width file, switch to FixedWidthParserSettings and FixedWidthParser. Note this parser requires the additional configuration object FixedWidthFields to determine the width, alignment and padding of each field to parse:

// creates the sequence of field lengths in the file to be parsed
FixedWidthFields lengths = new FixedWidthFields(4, 5, 40, 40, 8);

// creates the default settings for a fixed width parser
FixedWidthParserSettings settings = new FixedWidthParserSettings(lengths);

//sets the character used for padding unwritten spaces in the file
settings.getFormat().setPadding('_');
settings.getFormat().setLineSeparator("\n");

// creates a fixed-width parser with the given settings
FixedWidthParser parser = new FixedWidthParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(getReader("/examples/example.txt"));

The output of all examples above will be the same:

1 [Year, Make, Model, Description, Price]
-----------------------
2 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
3 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
4 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
5 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
6 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------

You can safely assume that any example provided in the following sections will work with any parser/writer.

Settings and features that are specific to a format are discussed in dedicated, format-specific sections.

So let’s get started!

(very) Basic file parsing

univocity-parsers comes with a basic API for parsing and processing data in all sorts of simpler use cases, demonstrated in this section. The recommended (and faster) approach, however, is Parsing with RowProcessors, a feature that puts univocity-parsers on another level of flexibility and power for handling the most intricate situations with almost no effort.

For now, let’s start with the basics.

Parsing all rows of a file in one go

TsvParserSettings settings = new TsvParserSettings();
settings.getFormat().setLineSeparator("\n");

// creates a TSV parser
TsvParser parser = new TsvParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(getReader("/examples/example.tsv"));

You can also read all rows into Records, which allow you to convert rows to maps, fill existing maps, and convert String values into other types such as int, float, Date and many more. Check the Using records section to learn more:

// configure to grab headers from file. We want to use these names to get values from each record.
settings.setHeaderExtractionEnabled(true);
// creates a CSV parser
CsvParser parser = new CsvParser(settings);

// parses all records in one go.
List<Record> allRecords = parser.parseAllRecords(getReader("/examples/example.csv"));
for(Record record : allRecords){
    print("Year: " + record.getValue("year", 2000)); //defaults year to 2000 if value is null.
    print(", Model: " + record.getString("model"));
    println(", Price: " + record.getBigDecimal("price"));
}

The output will be:

Year: 1997, Model: E350, Price: 3000.00
Year: 1999, Model: Venture "Extended Edition", Price: 4900.00
Year: 1996, Model: Grand Cherokee, Price: 4799.00
Year: 1999, Model: Venture "Extended Edition, Very Large", Price: 5000.00
Year: 2000, Model: Venture "Extended Edition", Price: 4900.00
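Records can also be converted to maps, as mentioned above. A minimal sketch, assuming Record's toFieldMap method and parsing an inline String (instead of the example files) to keep it self-contained:

```java
import java.io.StringReader;
import java.util.Map;

import com.univocity.parsers.common.record.Record;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(settings);

// parse an inline String so the sketch runs without the example files
Map<String, String> map = null;
for (Record record : parser.iterateRecords(new StringReader("Year,Make\n1997,Ford\n"))) {
    map = record.toFieldMap(); // header name -> parsed value
    System.out.println(map.get("Year") + ", " + map.get("Make"));
}
```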

To read all rows of a file (iterator-style)

// creates a CSV parser
CsvParser parser = new CsvParser(settings);

// call beginParsing to read records one by one, iterator-style.
parser.beginParsing(getReader("/examples/example.csv"));

String[] row;
while ((row = parser.parseNext()) != null) {
    println(out, Arrays.toString(row));
}

// The resources are closed automatically when the end of the input is reached,
// or when an error happens, but you can call stopParsing() at any time.

// You only need to use this if you are not parsing the entire content.
// But it doesn't hurt if you call it anyway.
parser.stopParsing();

For convenience, you can also use the parseNextRecord method, which will return an instance of Record instead of a raw String[]:

// call beginParsing to read records one by one, iterator-style.
parser.beginParsing(getReader("/examples/example.csv"));

//among many other things, we can set default values of one or more columns in the record metadata.
//Let's again set year to 2000 if it comes as null.
parser.getRecordMetadata().setDefaultValueOfColumns(2000, "year");

Record record;
while ((record = parser.parseNextRecord()) != null) {
    print("Year: " + record.getInt("year"));
    print(", Model: " + record.getString("model"));
    println(", Price: " + record.getBigDecimal("price"));
}

Using an actual iterator

// creates a CSV parser
CsvParser parser = new CsvParser(settings);

for(String[] row : parser.iterate(getReader("/examples/example.csv"))){
    println(out, Arrays.toString(row));
}

To iterate Records:

for(Record record : parser.iterateRecords(getReader("/examples/example.csv"))){
    println(out, Arrays.toString(record.getValues()));
}

Parsing individual Strings

If you are getting rows from an external source, and just need to parse each one, you can simply use the parseLine(String) method. The following example parses TSV lines:

// creates a TSV parser
TsvParser parser = new TsvParser(new TsvParserSettings());

String[] line;
line = parser.parseLine("A\tB\tC");
println(out, Arrays.toString(line));

line = parser.parseLine("1\t2\t3\t4");
println(out, Arrays.toString(line));

Which yields:

[A, B, C]
[1, 2, 3, 4]

Column selection

Parsing the entire content of each record in a file is a waste of CPU and memory when you are not interested in all columns. univocity-parsers lets you choose the columns you need, so values you don’t want are simply bypassed.

The following examples can be found in the example class SettingsExamples:

Consider the example.csv file with:


Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
   
    ...

And the following selection:

// Here we select only the columns "Price", "Year" and "Make".
// The parser just skips the other fields
parserSettings.selectFields("Price", "Year", "Make");

// let's parse with these settings and print the parsed rows.
List<String[]> parsedRows = parseWithSettings(parserSettings);

The output will be:

1 [3000.00, 1997, Ford]
-----------------------
2 [4900.00, 1999, Chevy]
-----------------------
    ...

The same output will be obtained with index-based selection.

// Here we select only the columns by their indexes.
// The parser just skips the values in other columns
parserSettings.selectIndexes(4, 0, 1);

// let's parse with these settings and print the parsed rows.
List<String[]> parsedRows = parseWithSettings(parserSettings);

You can also opt to keep the original row format with all columns, but only the values you are interested in being processed:

// Here we select only the columns "Price", "Year" and "Make".
// The parser just skips the other fields
parserSettings.selectFields("Price", "Year", "Make");

// Column reordering is enabled by default. When you disable it,
// all columns will be produced in the order they are defined in the file.
// Fields that were not selected will be null, as they are not processed by the parser
parserSettings.setColumnReorderingEnabled(false);

// Let's parse with these settings and print the parsed rows.
List<String[]> parsedRows = parseWithSettings(parserSettings);

Now the output will be:

1 [1997, Ford, null, null, 3000.00]
-----------------------
2 [1999, Chevy, null, null, 4900.00]
-----------------------
3 [1996, Jeep, null, null, 4799.00]
    ...

Settings

Each parser has its own settings class, but many configuration options are common across all parsers. The following snippet demonstrates how to use each one of them:

//You can configure the parser to automatically detect what line separator sequence is in the input
parserSettings.setLineSeparatorDetectionEnabled(true);

// sets what is the default value to use when the parsed value is null
parserSettings.setNullValue("<NULL>");

// sets what is the default value to use when the parsed value is empty
parserSettings.setEmptyValue("<EMPTY>"); // for CSV only

// sets the headers of the parsed file. If the headers are set then 'setHeaderExtractionEnabled(true)'
// will make the parser simply ignore the first input row.
parserSettings.setHeaders("a", "b", "c", "d", "e");

// prints the columns in reverse order.
// NOTE: when fields are selected, all rows produced will have the exact same number of columns
parserSettings.selectFields("e", "d", "c", "b", "a");

// does not skip leading whitespaces
parserSettings.setIgnoreLeadingWhitespaces(false);

// does not skip trailing whitespaces
parserSettings.setIgnoreTrailingWhitespaces(false);

// reads a fixed number of records then stop and close any resources
parserSettings.setNumberOfRecordsToRead(9);

// does not skip empty lines
parserSettings.setSkipEmptyLines(false);

// sets the maximum number of characters to read in each column.
// The default is 4096 characters. You need this to avoid OutOfMemoryErrors in case a file
// does not have a valid format. In such cases the parser might just keep reading from the input
// until its end or the memory is exhausted. This sets a limit which avoids unwanted JVM crashes.
parserSettings.setMaxCharsPerColumn(100);

// for the same reasons as above, this sets a hard limit on how many columns an input row can have.
// The default is 512.
parserSettings.setMaxColumns(10);

// Sets the number of characters held by the parser's buffer at any given time.
parserSettings.setInputBufferSize(1000);

// Disables the separate thread that loads the input buffer. By default, the input is loaded incrementally
// on a separate thread if more than one processor is available. Leave this enabled for better performance
// when parsing big files (> 100 MB).
parserSettings.setReadInputOnSeparateThread(false);

// let's parse with these settings and print the parsed rows.
List<String[]> parsedRows = parseWithSettings(parserSettings);

The output of the CSV parser with all these settings will be:

1 [<NULL>, <NULL>, <NULL>, <NULL>, <NULL>]
-----------------------
2 [Price, Description, Model, Make, Year]
-----------------------
3 [3000.00, ac, abs, moon, E350, Ford, 1997]
-----------------------
4 [4900.00, <EMPTY>, Venture "Extended Edition", Chevy, 1999]
-----------------------
5 [<NULL>, <NULL>, <NULL>, <NULL>,    ]
-----------------------
6 [<NULL>, <NULL>, <NULL>, <NULL>,      ]
-----------------------
7 [4799.00, MUST SELL!
air, moon roof, loaded, Grand Cherokee, Jeep, 1996]
-----------------------
8 [5000.00, <NULL>, Venture "Extended Edition, Very Large", Chevy, 1999]
-----------------------
9 [4900.00, <EMPTY>, Venture "Extended Edition", <NULL>, <NULL>]
-----------------------
    ...

Other settings

skipBitsAsWhitespace: flag to configure the parser to consider BIT values 0 and 1 (effectively the '\0' and '\1' characters) as whitespace. Useful for processing database dumps that may export such values instead of the traditional '0' and '1' characters.

errorContentLength: in case of errors, limits the length of the problematic content that was parsed and is printed in error messages.
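Both options are plain setters on the settings object. A minimal sketch (method names assumed from the property names above):

```java
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings();

// treat '\0' and '\1' in the input as whitespace (handy for some database dumps)
settings.setSkipBitsAsWhitespace(true);

// show at most 100 characters of the offending content in error messages
settings.setErrorContentLength(100);
```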

Format Settings

All parser settings have a default format definition. The following attributes are set by default for all parsers:

  • lineSeparator (default System.getProperty("line.separator")): this is an array of 1 or 2 characters with the sequence that indicates the end of a line. Using this, you should be able to handle files produced by different operating systems. Of course, if you want your line separator to be "#$", you can.

  • normalizedNewline (default \n): a single character used internally to represent the line separator sequence, which may be 2 characters long (e.g. \r\n on Windows). It is used by our parsers/writers to easily handle portable line separators.

    • When parsing, if the sequence of characters defined in lineSeparator is found while reading from the input, it will be transparently replaced by the normalizedNewline character.

    • When writing, normalizedNewline is replaced by the lineSeparator sequence.
  • comment (default #): if the first character of a line of text matches the comment character, then the row will be considered a comment and discarded from the input.
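These format attributes are exposed through the getFormat() object of any settings class. A minimal sketch (method names as we understand the library's Format class):

```java
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings();

// a custom line separator sequence of 1 or 2 characters is allowed
settings.getFormat().setLineSeparator("#$");

// the single character used internally to represent a line break
settings.getFormat().setNormalizedNewline('\n');

// lines starting with '-' (instead of the default '#') are treated as comments
settings.getFormat().setComment('-');
```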

Format-specific settings and features

Each format has support for (hopefully) everything you will ever need and more. Check the sections dedicated to each one:

Working with CSV

Working with TSV

Working with Fixed-Width

Parsing with RowProcessors

Everything you’ve seen so far is provided as a convenience for simpler situations, but univocity-parsers is built around the concept of RowProcessor and we encourage you to use them if you are after the best possible performance.

The RowProcessor is a fairly simple interface with 3 methods:

public interface RowProcessor {

    void processStarted(ParsingContext context);

    void rowProcessed(String[] row, ParsingContext context);

    void processEnded(ParsingContext context);
}

The settings object of all parsers come with a setProcessor method, which takes your RowProcessor implementation.

When you are ready to parse the input, call parser.parse(), and each row parsed from the input will be sent to your processor’s rowProcessed method (that’s why parser.parse() is void). Before parsing the first row, processStarted will be called to notify you that rows are coming. After parsing the last row, or in the case of an error, all open resources will be closed and the process will stop. Once this completes, processEnded will be called so you can perform any additional housekeeping required; it is guaranteed to run.

All three methods have a ParsingContext object with some controls and information over the parsing process.

The library provides many useful RowProcessor implementations by default and you can always provide your own.

Most implementations of RowProcessor that come with the library come in two flavors:

  1. Abstract classes with one abstract method that delegates the result of each processed record to you, e.g. BeanProcessor and ObjectRowProcessor

  2. Concrete classes with List in the name, which indicates that the result of each processed record is added into a list. You can access the elements of this list once the processing has finished, e.g. BeanListProcessor and ObjectRowListProcessor
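If none of the built-in implementations fit, implementing the interface directly is straightforward. A minimal sketch (the RowCounter class and its getCount method are hypothetical, for illustration only):

```java
import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.RowProcessor;

// counts rows as they are parsed; the count is reset on every new parsing run
public class RowCounter implements RowProcessor {

    private long count;

    @Override
    public void processStarted(ParsingContext context) {
        count = 0;
    }

    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        count++;
    }

    @Override
    public void processEnded(ParsingContext context) {
        System.out.println("Parsed " + count + " rows");
    }

    public long getCount() {
        return count;
    }
}
```

An instance would be passed to the settings via setProcessor, and the parser's parse method would drive it.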

Introducing a few basic row processors

The following example uses the RowListProcessor, which just stores the rows read from a file into a List:

// The settings object provides many configuration options
CsvParserSettings parserSettings = new CsvParserSettings();

//You can configure the parser to automatically detect what line separator sequence is in the input
parserSettings.setLineSeparatorDetectionEnabled(true);

// A RowListProcessor stores each parsed row in a List.
RowListProcessor rowProcessor = new RowListProcessor();

// You can configure the parser to use a RowProcessor to process the values of each parsed row.
// You will find more RowProcessors in the 'com.univocity.parsers.common.processor' package, but you can also create your own.
parserSettings.setProcessor(rowProcessor);

// Let's consider the first parsed row as the headers of each column in the file.
parserSettings.setHeaderExtractionEnabled(true);

// creates a parser instance with the given settings
CsvParser parser = new CsvParser(parserSettings);

// the 'parse' method will parse the file and delegate each parsed row to the RowProcessor you defined
parser.parse(getReader("/examples/example.csv"));

// get the parsed records from the RowListProcessor here.
// Note that different implementations of RowProcessor will provide different sets of functionalities.
String[] headers = rowProcessor.getHeaders();
List<String[]> rows = rowProcessor.getRows();

Each row will contain:

[Year, Make, Model, Description, Price]
=======================
1 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
2 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
3 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
4 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
5 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------

You can also use an ObjectRowProcessor, which will produce rows of objects. You can convert values using an implementation of the Conversion interface.

The Conversions class provides some useful defaults for you. For convenience, the ObjectRowListProcessor can be used to store all rows into a list.

// ObjectRowProcessor converts the parsed values and gives you the resulting row.
ObjectRowProcessor rowProcessor = new ObjectRowProcessor() {
    @Override
    public void rowProcessed(Object[] row, ParsingContext context) {
        //here is the row. Let's just print it.
        println(out, Arrays.toString(row));
    }
};

// converts values in the "Price" column (index 4) to BigDecimal
rowProcessor.convertIndexes(Conversions.toBigDecimal()).set(4);

// converts the values in columns "Make, Model and Description" to lower case, and sets the value "chevy" to null.
rowProcessor.convertFields(Conversions.toLowerCase(), Conversions.toNull("chevy")).set("Make", "Model", "Description");

// converts the values at index 0 (year) to BigInteger. Nulls are converted to BigInteger.ZERO.
rowProcessor.convertFields(new BigIntegerConversion(BigInteger.ZERO, "0")).set("year");

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");
parserSettings.setProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(parserSettings);

//the rowProcessor will be executed here.
parser.parse(getReader("/examples/example.csv"));

After applying the conversions, the output will be:

[1997, ford, e350, ac, abs, moon, 3000.00]
[1999, null, venture "extended edition", null, 4900.00]
[1996, jeep, grand cherokee, must sell!
air, moon roof, loaded, 4799.00]
[1999, null, venture "extended edition, very large", null, 5000.00]
[0, null, venture "extended edition", null, 4900.00]

Using annotations to map your java beans

Use the Parsed annotation to map an attribute of your class to a column in the input. You can match by the field name declared in the headers, or by the column index.

Each annotated operation maps to a Conversion and they are executed in the same sequence they are declared.

This example works with the csv file bean_test.csv

class TestBean {

    // if the value parsed in the quantity column is "?" or "-", it will be replaced by null.
    @NullString(nulls = {"?", "-"})
    // if a value resolves to null, it will be converted to the String "0".
    @Parsed(defaultNullRead = "0")
    private Integer quantity;   // The attribute type defines which conversion will be executed when processing the value.
    // In this case, IntegerConversion will be used.
    // The attribute name will be matched against the column header in the file automatically.

    @Trim
    @LowerCase
    // the value for the comments attribute is in the column at index 4 (0 is the first column, so this means fifth column in the file)
    @Parsed(index = 4)
    private String comments;

    // you can also explicitly give the name of a column in the file.
    @Parsed(field = "amount")
    private BigDecimal amount;

    @Trim
    @LowerCase
    // values "no", "n" and "null" will be converted to false; values "yes" and "y" will be converted to true
    @BooleanString(falseStrings = {"no", "n", "null"}, trueStrings = {"yes", "y"})
    @Parsed
    private Boolean pending;

    //

Instances of annotated classes are created by BeanProcessor and BeanListProcessor:

// BeanListProcessor converts each parsed row to an instance of a given class, then stores each instance into a list.
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setLineSeparator("\n");
parserSettings.setProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(parserSettings);
parser.parse(getReader("/examples/bean_test.csv"));

// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();

Here is the output produced by the toString() method of each TestBean instance:

[TestBean [quantity=1, comments=?, amount=555.999, pending=true], TestBean [quantity=0, comments=" something ", amount=null, pending=false]]

The Headers annotation

You can annotate a class with the Headers annotation to control what headers to use when parsing/writing, without having to provide any explicit configuration on the parser/writer settings:

For example, consider the AnotherTestBean class:

@Headers(sequence = {"pending", "date"}, extract = true, write = true)
public class AnotherTestBean {

    @Format(formats = {"dd-MMM-yyyy", "yyyy-MM-dd"}, options = "locale=en")
    @Parsed
    private Date date;

    @BooleanString(falseStrings = {"n"}, trueStrings = {"y"})
    @Parsed
    private Boolean pending;

    //

Let’s write a few instances of AnotherTestBean to an output:

TsvWriterSettings settings = new TsvWriterSettings();

settings.setRowWriterProcessor(new BeanWriterProcessor<AnotherTestBean>(AnotherTestBean.class));

// We didn't provide a java.io.Writer here, so all we can do is write to Strings (streaming)
TsvWriter writer = new TsvWriter(settings);

// Let's write the headers declared in @Headers annotation of AnotherTestBean
String headers = writer.writeHeadersToString();

// Now, let's create an instance of our bean
AnotherTestBean bean = new AnotherTestBean();
bean.setPending(true);
bean.setDate(2012, Calendar.AUGUST, 5);

// Calling processRecordToString will write the contents of the bean in a TSV formatted String
String row1 = writer.processRecordToString(bean);

// You can write whatever you need as well
String row2 = writer.writeRowToString("Random", "Values", "Here");

// Let's change our bean and produce another String
bean.setPending(false);

String row3 = writer.processRecordToString(bean);

This will write the following:

pending    date
y    05-Aug-2012
Random    Values    Here
n    05-Aug-2012

Error handling

All sorts of errors can occur while processing data with a RowProcessor. You can get invalid values, unexpected formats, type errors, etc. In most cases you want to log these errors and continue processing your data. For that you can use a RowProcessorErrorHandler, which is a callback interface that will be used to report errors to you when they occur, with as much detail as possible:

BeanListProcessor<AnotherTestBean> beanProcessor = new BeanListProcessor<AnotherTestBean>(AnotherTestBean.class);
settings.setProcessor(beanProcessor);

//Let's set a RowProcessorErrorHandler to log the error. The parser will keep running.
settings.setProcessorErrorHandler(new RowProcessorErrorHandler() {
    @Override
    public void handleError(DataProcessingException error, Object[] inputRow, ParsingContext context) {
        println(out, "Error processing row: " + Arrays.toString(inputRow));
        println(out, "Error details: column '" + error.getColumnName() + "' (index " + error.getColumnIndex() + ") has value '" + inputRow[error.getColumnIndex()] + "'");
    }
});

CsvParser parser = new CsvParser(settings);
parser.parse(getReader("/examples/bean_test.csv"));

println(out);
println(out, "Printing beans that could be parsed");
println(out);
for (AnotherTestBean bean : beanProcessor.getBeans()) {
    println(out, bean); //should print just one bean here
}

When running this example, you should get the following printed out:

Error processing row: [yEs, 10-oct-2001]
Error details: column 'pending' (index 0) has value 'yEs'

Printing beans that could be parsed

AnotherTestBean [date=10/Oct/2001, pending=false]

Recovering from errors

You can recover from errors using a RetryableErrorHandler:

settings.setProcessorErrorHandler(new RetryableErrorHandler<ParsingContext>() {
    @Override
    public void handleError(DataProcessingException error, Object[] inputRow, ParsingContext context) {
        println(out, "Error processing row: " + Arrays.toString(inputRow));
        println(out, "Error details: column '" + error.getColumnName() + "' (index " + error.getColumnIndex() + ") has value '" + inputRow[error.getColumnIndex()] + "'. Setting it to null");

        if(error.getColumnIndex() == 0){
            setDefaultValue(null);
        } else {
            keepRecord(); //prevents the parser from discarding the row.
        }
    }
});

Use keepRecord to prevent the parser from discarding your row. You can also update the inputRow directly, then call keepRecord().

The setDefaultValue method assigns a default value to use in the column that could not be processed. Using this method will automatically instruct the parser to retry processing your row, so you don’t need to explicitly invoke keepRecord().

The output now should be:

Error processing row: [yEs, 10-oct-2001]
Error details: column 'pending' (index 0) has value 'yEs'. Setting it to null

Printing beans that could be parsed

AnotherTestBean [date=10/Oct/2001, pending=null]
AnotherTestBean [date=10/Oct/2001, pending=false]

Collecting Comments

If your input files have comments that might be useful for you to control the parsing process, you can configure the parser to collect them:

// This configures the parser to store all comments found in the input.
// You'll be able to retrieve the last parsed comment or all comments parsed at
// any given time during the parsing.
settings.setCommentCollectionEnabled(true);
CsvParser parser = new CsvParser(settings);

parser.beginParsing(getReader("/examples/example.csv"));
String[] row;
while ((row = parser.parseNext()) != null) {
    // using the getContext method we have access to the parsing context, from where the comments found so far can be accessed
    // let's get the last parsed comment and print it out in front of each parsed row.
    String comment = parser.getContext().lastComment();
    if (comment == null || comment.trim().isEmpty()) {
        comment = "No relevant comments yet";
    }
    println("Comment: " + comment + ". Parsed: " + Arrays.toString(row));
}

// We can also get all comments parsed.
println("\nAll comments found:\n-------------------");
//The comments() method returns a map of line numbers associated with the comments found in them.
Map<Long, String> comments = parser.getContext().comments();
for (Entry<Long, String> e : comments.entrySet()) {
    long line = e.getKey();
    String commentAtLine = e.getValue();
    println("Line: " + line + ": '" + commentAtLine + "'");
}

This should print the following to the output:

Comment: 2 double quotes ("") are used as the escape sequence for quoted fields, as per the RFC4180 standard. Parsed: [Year, Make, Model, Description, Price]
Comment: 2 double quotes ("") are used as the escape sequence for quoted fields, as per the RFC4180 standard. Parsed: [1997, Ford, E350, ac, abs, moon, 3000.00]
Comment: 2 double quotes ("") are used as the escape sequence for quoted fields, as per the RFC4180 standard. Parsed: [1999, Chevy, Venture "Extended Edition", null, 4900.00]
Comment: Look, a multi line value. And blank rows around it!. Parsed: [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
Comment: Look, a multi line value. And blank rows around it!. Parsed: [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
Comment: Look, a multi line value. And blank rows around it!. Parsed: [null, null, Venture "Extended Edition", null, 4900.00]

All comments found:
-------------------
Line: 0: 'This example was extracted from Wikipedia (en.wikipedia.org/wiki/Comma-separated_values)'
Line: 2: '2 double quotes ("") are used as the escape sequence for quoted fields, as per the RFC4180 standard'
Line: 9: 'Look, a multi line value. And blank rows around it!'

Routines

To make your life easier, we built some pre-defined routines for handling common use cases, such as dumping data from a ResultSet, running parse-and-write operations, and more. Check the Routines section to learn more.
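For example, the CsvRoutines class can parse an input straight into a list of annotated beans in a single call. A minimal sketch, assuming the parseAll(Class, Reader) signature; the Car bean is hypothetical, for illustration only:

```java
import java.io.StringReader;
import java.util.List;

import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.CsvRoutines;

// hypothetical bean, for illustration only
public class Car {
    @Parsed
    private Integer year;

    @Parsed
    private String make;

    public Integer getYear() { return year; }
    public String getMake() { return make; }
}

CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.setHeaderExtractionEnabled(true);

// one call parses the input and maps each row to a Car instance
List<Car> cars = new CsvRoutines(settings).parseAll(Car.class, new StringReader("year,make\n1997,Ford\n"));
```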

Further Reading

Feel free to proceed to the following sections (in any order).

Bugs, contributions & support

If you find a bug, please report it on github or send us an email on parsers@univocity.com.

We try our best to eliminate all bugs as soon as possible, and you’ll rarely see a bug remain open for more than 24 hours after it’s reported. We do our best to answer all questions. Enhancements/suggestions are implemented on a best-effort basis.

Feel free to submit your contribution via pull requests. Every little bit is appreciated, from improvements to the documentation to a full-blown rewrite from scratch.

For commercial support, customizations or anything in between, please contact support@univocity.com.

Thank you for using our parsers!

The univocity team.