Skip to main content

Text files

The Paillave.Etl.TextFile package parses and produces flat-text files — comma- or semicolon-separated values, tab-delimited files, fixed-width files. Two operators do all the work:

OperatorPurpose
CrossApplyTextFileRead every line of an IFileValue, parse it into a strongly-typed object and push it downstream.
ToTextFileValueBuffer a stream and emit a single text IFileValue (a CSV, TSV, or fixed-width file).

All snippets below are mirrored by tests in src/Paillave.Etl.Tests/DocExamples/TextFileOperatorsExamplesTests.cs. Each section names the matching test for traceability.

Defining the file shape

The shape of every text file is described by a FlatFileDefinition<T>, built with a single lambda that maps each property of the destination type to a column:

var def = FlatFileDefinition.Create(i => new TradeRow
{
Symbol = i.ToColumn("Symbol"),
Quantity = i.ToNumberColumn<int>("Quantity", "."),
Price = i.ToNumberColumn<decimal>("Price", "."),
TradeDate = i.ToDateColumn("TradeDate", "yyyy-MM-dd"),
})
.IsColumnSeparated(';');

The i parameter exposes the IFieldMapper interface. Its main helpers are:

MethodUse
ToColumn(name) / ToColumn(index)String column
ToColumn<T>(name) / ToColumn<T>(index)Generic conversion (int, Guid, bool, …)
ToNumberColumn<T>(name, decimalSep, groupSep?)Numeric column with explicit separators
ToDateColumn(name, format)Date column with a DateTime.ParseExact format
ToCulturedDateColumn(name, cultureName, format?)Date parsed with a specific culture
ToOptionalDateColumn / ToOptional...Same, but produces null on empty input
ToBooleanColumn(name, trueValue, falseValue)true/false from custom tokens
ToColumn(index, size)Fixed-width column — the size in characters
ToSourceName()Inject the file name into a property
ToLineNumber()Inject the data-row index (1-based, skips the header)
ToRowGuid()Inject a freshly generated Guid per row
Ignore<T>()Place-holder for a property that should not be read

FlatFileDefinition<T> itself exposes the file-level switches:

MethodEffect
IsColumnSeparated(sep, textDelimiter?)Default. CSV/TSV with a separator (default ;) and an optional text-delimiter (default ").
HasFixedColumnWidth(int...)Switches to the fixed-width parser. Sizes are inferred from ToColumn(index, size) calls if any.
IgnoreFirstLines(n)Skip n lines before reading the header (for files prefixed with banners or comments).
WithLinePreProcessor(Func<string,string>)Transform every raw line before it is split (e.g. trim BOMs, replace separators).
WithValuePreProcessor("Property", Func<string,string>)Transform a single column's text before conversion.
RespectHeaderCase(bool)By default headers match case-insensitively.
WithCultureInfo(culture) / WithCultureInfo("fr-FR")Default culture for number/date conversions.
WithEncoding(encoding)Encoding used by ToTextFileValue.

Reading a file

Header-based CSV

Test: CrossApplyTextFile_ParsesCsvWithHeader

fileStream
.CrossApplyTextFile("parse trades", FlatFileDefinition.Create(i => new TradeRow
{
Symbol = i.ToColumn("Symbol"),
Quantity = i.ToNumberColumn<int>("Quantity", "."),
Price = i.ToNumberColumn<decimal>("Price", "."),
TradeDate = i.ToDateColumn("TradeDate", "yyyy-MM-dd"),
}).IsColumnSeparated(';'))
.Do("log", t => Console.WriteLine(t.Symbol));

Input:

Symbol;Quantity;Price;TradeDate
AAPL;100;150.25;2024-01-15
MSFT;50;310.50;2024-01-16

Output: two TradeRow instances flowing through the stream.

The mapper is typed: Quantity = i.ToNumberColumn<int>(…) produces an int and any line where the field cannot be parsed throws a FlatFileFieldDeserializeException whose LineNumber and ColumnName properties point right at the bad cell.

Positional CSV (no header)

Test: CrossApplyTextFile_ParsesPositionalCsv

fileStream
.CrossApplyTextFile("parse", FlatFileDefinition.Create(i => new TradeRow
{
Symbol = i.ToColumn(0),
Quantity = i.ToNumberColumn<int>(1, "."),
Price = i.ToNumberColumn<decimal>(2, "."),
}).IsColumnSeparated(','));

Pass column indices (0-based) instead of names. The definition becomes header-less automatically as soon as no ColumnName is provided.

Fixed-width files

Test: CrossApplyTextFile_ParsesFixedWidth

fileStream
.CrossApplyTextFile("parse", FlatFileDefinition.Create(i => new FixedRow
{
Code = i.ToColumn(0, 4), // 4 chars
Label = i.ToColumn(1, 10), // 10 chars
Quantity = i.ToNumberColumn<int>(2, 5, "."), // 5 chars
}));

Adding a size argument to every column is enough — the definition auto-switches to HasFixedColumnWidth(...) with the column sizes you just declared. Strings are returned space-padded; trim them in the selector if needed.

Culture-aware numbers and dates

Test: CrossApplyTextFile_ParsesWithCustomCulture

.IsColumnSeparated(';')
.WithCultureInfo(CultureInfo.GetCultureInfo("fr-FR"));
Price = i.ToNumberColumn<decimal>("Price", ",", " "),

Setting WithCultureInfo("fr-FR") makes 1 234,50 parse as 1234.5m. Per-column overrides via ToCulturedDateColumn(name, "en-US", "MM/dd/yyyy") are also supported.

Stream-level metadata

Test: CrossApplyTextFile_ExposesMetadataColumns

FlatFileDefinition.Create(i => new
{
Symbol = i.ToColumn("Symbol"),
Quantity = i.ToNumberColumn<int>("Quantity", "."),
SourceName = i.ToSourceName(), // "trades.csv"
LineNumber = i.ToLineNumber(), // 1, 2, 3, …
});

ToSourceName() injects IFileValue.Name into a property, ToLineNumber() injects the row index, and ToRowGuid() generates a fresh Guid per row. They are invaluable for traceability when many files merge into a single downstream stream.

Pre-processing dirty input

.WithLinePreProcessor(line => line.Replace('\t', ';'))
.WithValuePreProcessor("Symbol", v => v.Trim().ToUpperInvariant());

WithLinePreProcessor runs once per raw line (after the file is split), WithValuePreProcessor runs once per cell. Combine both to fix a malformed file without writing a custom parser.

Skipping prologue lines

FlatFileDefinition.Create(...)
.IgnoreFirstLines(3); // banner + blank + comment

IgnoreFirstLines(n) drops the first n lines — header detection runs after this step.

Writing a file

ToTextFileValue — emit a CSV

Test: ToTextFileValue_WritesCsvWithHeader

trades
.ToTextFileValue("write trades", "out.csv", FlatFileDefinition.Create(i => new TradeRow
{
Symbol = i.ToColumn("Symbol"),
Quantity = i.ToNumberColumn<int>("Quantity", "."),
Price = i.ToNumberColumn<decimal>("Price", "."),
TradeDate = i.ToDateColumn("TradeDate", "yyyy-MM-dd"),
}).IsColumnSeparated(';'));

Output:

Symbol;Quantity;Price;TradeDate
AAPL;10;150.25;2024-01-15
MSFT;20;310.50;2024-01-16

ToTextFileValue returns an ISingleStream<IFileValue>. Combine with Subscribe connectors (file system, FTP, SFTP, S3, Azure Storage…) to deliver the file to its destination, or call .Get() on the value to read its content.

The Correlated<T> overload preserves stream correlation tokens — useful when partitioning a stream by group and emitting one file per partition.

Custom culture & encoding for output

.IsColumnSeparated(';')
.WithCultureInfo("fr-FR")
.WithEncoding(Encoding.UTF8);

Numbers are formatted using the file definition's culture; the same ToNumberColumn(... separators ...) declaration also drives the output formatter so round-trips are loss-less.

Routing to multiple destinations

The destinations argument of ToTextFileValue accepts a Dictionary<string, IEnumerable<Destination>> describing connector names and folder paths. See Recipes — dealing with files for the full list.

Cheat sheet

IntentSnippet
Read CSV with headerCrossApplyTextFile(name, FlatFileDefinition.Create(...).IsColumnSeparated(';'))
Read TSV.IsColumnSeparated('\t')
Read positional fileuse ToColumn(index) instead of ToColumn("Name")
Read fixed-width fileadd size to every column: ToColumn(idx, size)
Skip prologue lines.IgnoreFirstLines(n)
Custom culture.WithCultureInfo("fr-FR")
Inject metadatai.ToSourceName(), i.ToLineNumber(), i.ToRowGuid()
Sanitize lines / values.WithLinePreProcessor(...), .WithValuePreProcessor("X", ...)
Write CSV.ToTextFileValue(name, "out.csv", FlatFileDefinition.Create(...).IsColumnSeparated(';'))