Create files
Handling files in ETL.NET involves manipulating streams of payloads implementing the interface IFileValue
. But how to actually create these payloads from nothing if they are not taken from any file source?
One file
One file with no specific format
The first way to create a file, is to create a file with a content with no specific or known format. In this situation, the principle is to create an instance of FileValueWriter
using FileValueWriter.Create
static method. FileValueWriter
implements IFileValue
and that wraps nearly every method of StreamWriter
. Note: all these methods return the current instance so that they can be called in a fluent way.
Before this, the stream must be aggregated for it to issue lists of rows instead of single rows. To change a stream into a single stream event (ISingleStream
) the operator ToList
can be used.
var streamOfFile = streamOfRows
.ToList("aggregate all rows")
.Select("create file", rows => FileValueWriter
.Create("fileExport.txt")
.WriteLine("this content has no specific format")
.Write(String.Join(", ", rows.Select(row => row.Name).ToList())));
One file in CSV or Excel format
In many occasions, writing a file will consist in creating an excel file, or a csv file with fixed width or delimited columns
File creation extensions Paillave.EtlNet.TextFile
or Paillave.EtlNet.ExcelFile
can change a stream into a IFileValue
instance out of the box.
var streamOfFile = streamOfRows
.Select("create row to save", i => new { i.Index, i.Name })
.ToTextFileValue("save into csv file", "fileExport.csv", FlatFileDefinition.Create(i => new
{
Index = i.ToNumberColumn<int>("Idx", "."),
Name = i.ToColumn("Title")
}).IsColumnSeparated(','));
Many files
Many files with no specific format
The way to create several files in a single process is to use the GroupBy
operator.
The first way to use it is only possible if FileValueWriter
is used. This way to use GroupBy
is to simply give the grouping key/keys. This way, the operator will issue one event per group containing the list of values contained in the group.
var streamOfFile = streamOfRows
.GroupBy("group rows", i => i.CategoryId)
// can also be written this way to permit several grouping keys:
// .GroupBy("group rows", i => new { i.CategoryId })
.Select("create file", rows => FileValueWriter
.Create($"otherFileExport{rows.FirstValue.CategoryId}.txt")
.WriteLine($"here is the list of indexes in the category {rows.FirstValue.CategoryId}")
.Write(String.Join(", ", rows.Aggregation.Select(row => row.Name).ToList())));
The other way is to use the GroupBy
operator by giving a subprocess along with the grouping key. The subprocess is the definition of a process from a stream that will issue every event belonging to the group. With the substream, the GroupBy
operator will give the first element of the group as it is very likely to be useful to create the file name. To achieve the same than what is above, it is just necessary to reproduce the pattern described higher within the subprocess:
var streamOfFile = streamOfRows
.GroupBy("process per group", i => i.CategoryId, (subStream, firstRow) => subStream
.ToList("aggregate all rows")
.Select("create file", rows => FileValueWriter
.Create($"fileExport{firstRow?.CategoryId}.txt")
.WriteLine($"here is the list of indexes in the category {firstRow?.CategoryId}")
.Write(String.Join(", ", rows.Select(row => row.Name).ToList()))));
Many files in CSV or Excel format
Defining a subprocess like shown in the example right above is the only way to go to produce several files by using Paillave.EtlNet.TextFile
or Paillave.EtlNet.ExcelFile
extensions:
var streamOfFile = streamOfRows
.GroupBy("process per group", i => i.CategoryId, (subStream, firstRow) => subStream
.Select("create row to save", i => new { i.Index, i.Name })
.ToTextFileValue(
"save into csv file",
$"fileExport{firstRow?.CategoryId}.csv",
FlatFileDefinition.Create(i => new
{
Index = i.ToNumberColumn<int>("Idx", "."),
Name = i.ToColumn("Title")
})));
important
Keep in mind that the given first row will be null when called to evaluate the execution plan by the runtime. Therefore, ensure that no null exception is raised when using it. Of course, during the actual process, it will never be null.