Working with TSV
Parsing TSV files
To parse TSV files, simply use a TsvParser. As we keep saying, the API is essentially the same for every parser.
This is the input:
# TSV's can also have comments
# Multi-line records are escaped with \n.
# Accepted escape sequences are: \n, \t, \r and \\
Year Make Model Description Price
1997 Ford E350 ac, abs, moon 3000.00
1999 Chevy Venture "Extended Edition" 4900.00
# Look a multi line value. And blank rows around it!
1996 Jeep Grand Cherokee MUST SELL!\nair, moon roof, loaded 4799.00
1999 Chevy Venture "Extended Edition, Very Large" 5000.00
Venture "Extended Edition" 4900.00
This is the code:
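import com.univocity.parsers.tsv.TsvParser;
import com.univocity.parsers.tsv.TsvParserSettings;
import java.util.List;

TsvParserSettings settings = new TsvParserSettings();
// the example file uses \n as its line separator
settings.getFormat().setLineSeparator("\n");
// creates a TSV parser
TsvParser parser = new TsvParser(settings);
// parses all rows in one go; getReader(...) is a helper used throughout these examples
// that opens the given classpath resource as a java.io.Reader
List<String[]> allRows = parser.parseAll(getReader("/examples/example.tsv"));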
The output will be:
1 [Year, Make, Model, Description, Price]
-----------------------
2 [1997, Ford, E350, ac, abs, moon, 3000.00]
-----------------------
3 [1999, Chevy, Venture "Extended Edition", null, 4900.00]
-----------------------
4 [1996, Jeep, Grand Cherokee, MUST SELL!
air, moon roof, loaded, 4799.00]
-----------------------
5 [1999, Chevy, Venture "Extended Edition, Very Large", null, 5000.00]
-----------------------
6 [null, null, Venture "Extended Edition", null, 4900.00]
-----------------------
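To make the output above easier to follow, here is a minimal plain-Java sketch of the per-line logic, assuming the behaviour this example exhibits: lines starting with # and blank lines are skipped, values are separated by tabs, the escape sequences \n, \r, \t and \\ are decoded, and empty values become null. The class SimpleTsvReader is hypothetical and is not how TsvParser is implemented; it also assumes \n line separators and no line joining.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified TSV reader: NOT univocity's implementation.
// Assumes '#' comment lines, tab delimiters, '\n' line separators,
// backslash escapes (\n, \r, \t, \\) and that empty values become null.
public class SimpleTsvReader {

    // Replaces escape sequences in a single value; unknown sequences are kept as-is.
    static String unescape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                char next = value.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case 't': out.append('\t'); break;
                    case '\\': out.append('\\'); break;
                    default: out.append(c).append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Parses the text line by line, skipping comment lines and blank lines.
    static List<String[]> parseAll(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // limit -1 keeps trailing empty values
            String[] values = line.split("\t", -1);
            for (int i = 0; i < values.length; i++) {
                values[i] = values[i].isEmpty() ? null : unescape(values[i]);
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        String tsv = "# a comment\n"
                + "Year\tMake\tModel\tDescription\tPrice\n"
                + "\n"
                + "1996\tJeep\tGrand Cherokee\tMUST SELL!\\nair, moon roof, loaded\t4799.00\n"
                + "\t\tVenture \"Extended Edition\"\t\t4900.00\n";
        for (String[] row : parseAll(tsv)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The last input line yields [null, null, Venture "Extended Edition", null, 4900.00], mirroring row 6 of the output above.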
TSV format
The TSV format lets you change the escape character used for values that contain line endings (\n, \r), tabs (\t) or the escape character itself (\\).
escapeChar
(default \): the character used to escape special characters in TSV values.
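As an illustration of what escapeChar controls, the following hypothetical helpers escape and unescape a single value with a configurable escape character. This is only a sketch, assuming the escape character itself is escaped by doubling it (as \\ does for the default); it does not use or reproduce the TsvFormat API.

```java
// Hypothetical helpers, NOT univocity's implementation.
// escapeChar plays the role of the escapeChar setting; the escape character
// itself is assumed to be escaped by doubling it.
public class TsvEscaping {

    // Escapes line feeds, carriage returns, tabs and the escape character itself.
    static String escape(String value, char escapeChar) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\n') {
                out.append(escapeChar).append('n');
            } else if (c == '\r') {
                out.append(escapeChar).append('r');
            } else if (c == '\t') {
                out.append(escapeChar).append('t');
            } else if (c == escapeChar) {
                out.append(escapeChar).append(escapeChar);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Reverses escape(); unknown sequences are kept as-is.
    static String unescape(String value, char escapeChar) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == escapeChar && i + 1 < value.length()) {
                char next = value.charAt(++i);
                if (next == 'n') out.append('\n');
                else if (next == 'r') out.append('\r');
                else if (next == 't') out.append('\t');
                else if (next == escapeChar) out.append(escapeChar);
                else out.append(c).append(next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "MUST SELL!\nair\tmoon roof\\loaded";
        String escaped = escape(value, '\\');
        System.out.println(escaped);                            // MUST SELL!\nair\tmoon roof\\loaded
        System.out.println(unescape(escaped, '\\').equals(value)); // true
        System.out.println(escape(value, '$'));                 // MUST SELL!$nair$tmoon roof\loaded
    }
}
```

With a different escape character, a plain backslash in the data no longer needs escaping, which can be convenient for values such as Windows paths.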
TSV parser settings
Line joining
By default, the TsvParser expects line endings inside values to be escaped as a literal \ character followed by n or r. This way, all data of a single record is represented in a single, potentially long, line of text.
However, this is not always the case: some files actually "break" the contents into multiple lines, placing the escape character right before a real line ending. To parse or write files in this style, enable the lineJoiningEnabled flag:
import com.univocity.parsers.tsv.*;

// Let's write 3 values to a TSV; one of them has a line break.
String[] values = new String[]{"Value 1", "Breaking [\n] here", "Value 3"};
TsvWriterSettings writerSettings = new TsvWriterSettings();
writerSettings.getFormat().setLineSeparator("\n");
// In TSV, a line separator can be escaped with a backslash placed right before the line break. In this case the current
// line will be joined with the next line.
writerSettings.setLineJoiningEnabled(true);
// Let's write the values and see what the data looks like (println is a printing helper used in these examples):
String writtenLine = new TsvWriter(writerSettings).writeRowToString(values);
println("Written data\n------------\n" + writtenLine);
// To parse, we just use the same configuration:
TsvParserSettings parserSettings = new TsvParserSettings();
parserSettings.setLineJoiningEnabled(true);
parserSettings.getFormat().setLineSeparator("\n");
TsvParser parser = new TsvParser(parserSettings);
//Let's parse the contents we've just written:
values = parser.parseLine(writtenLine);
println("\nParsed elements\n---------------");
println("First: " + values[0]);
println("Second: " + values[1]);
println("Third: " + values[2]);
The output will be:
Written data
------------
Value 1 Breaking [\
] here Value 3
Parsed elements
---------------
First: Value 1
Second: Breaking [
] here
Third: Value 3
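The line-joining convention can be sketched in plain Java as follows. LineJoiningSketch is hypothetical and is not univocity's implementation; it assumes \n line separators, \ as the escape character, and values that contain no tabs and do not end with a backslash, so the other escape sequences are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of line joining, NOT univocity's implementation.
// Assumes '\n' line separators, '\\' as escape character, and values that
// contain no tabs and do not end with a backslash.
public class LineJoiningSketch {

    // Writes a row: each line break inside a value is preceded by a backslash.
    static String writeRow(String[] values) {
        List<String> escaped = new ArrayList<>();
        for (String value : values) {
            escaped.add(value.replace("\n", "\\\n"));
        }
        return String.join("\t", escaped);
    }

    // Reads one record: a physical line ending with a backslash is joined with the next one.
    static String[] parseRecord(String text) {
        StringBuilder record = new StringBuilder();
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.endsWith("\\") && i + 1 < lines.length) {
                // drop the backslash and keep the line break as part of the value
                record.append(line, 0, line.length() - 1).append('\n');
            } else {
                record.append(line);
                break; // the record ends at the first unescaped line ending
            }
        }
        return record.toString().split("\t", -1);
    }

    public static void main(String[] args) {
        String written = writeRow(new String[]{"Value 1", "Breaking [\n] here", "Value 3"});
        System.out.println(written);
        String[] parsed = parseRecord(written);
        System.out.println("Second: " + parsed[1]);
    }
}
```

Running main prints the same written data shown above, and parseRecord recovers the three original values, including the embedded line break.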
Further Reading
Feel free to proceed to the following sections (in any order).
- Introduction to univocity-parsers
- Reading data into java beans
- Writing
- Using records
- Routines
- Other Row Processors
- Working with CSV
- Working with Fixed-Width
Bugs, contributions & support
If you find a bug, please report it on GitHub or send us an email at parsers@univocity.com.
We try our best to eliminate all bugs as soon as possible, and you’ll rarely see a bug open for more than 24 hours after it’s reported. We do our best to answer all questions. Enhancements and suggestions are implemented on a best-effort basis.
Feel free to submit your contribution via pull requests. Any little bit is appreciated, from improvements to the documentation to a full-blown rewrite from scratch.
For commercial support, customizations or anything in between, please contact support@univocity.com.
Thank you for using our parsers!
The univocity team.