Pipelines
A fact pipeline is the ordered list of steps in a fact.
Each step:
- chooses an operation with `use`
- optionally passes `options`
- receives the output samples of the previous step
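The flow in this list can be modeled in a few lines of Python. This is a minimal sketch, assuming that a batch is a plain list of string samples and that the runner looks operations up by their `use` name. The `Step` class, the `OPERATIONS` registry, `run_pipeline`, and the toy operations `demo.seed` and `demo.upper` are all hypothetical. They are not the real engine or real operation names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical model: a sample is a string, a batch is a list of samples.
Samples = list[str]
Operation = Callable[[Samples, dict[str, Any]], Samples]


@dataclass
class Step:
    use: str                                               # which operation to run
    options: dict[str, Any] = field(default_factory=dict)  # optional settings


# Hypothetical registry of toy operations, keyed by their `use` name.
OPERATIONS: dict[str, Operation] = {
    # Ignores its input and produces samples from its options alone.
    "demo.seed": lambda samples, opts: list(opts.get("values", [])),
    # Transforms each incoming sample independently.
    "demo.upper": lambda samples, opts: [s.upper() for s in samples],
}


def run_pipeline(steps: list[Step]) -> Samples:
    samples: Samples = []  # the first step starts from an empty batch
    for step in steps:
        operation = OPERATIONS[step.use]
        # Each step receives exactly the output of the previous step.
        samples = operation(samples, step.options)
    return samples


result = run_pipeline([
    Step(use="demo.seed", options={"values": ["vega", "rigel"]}),
    Step(use="demo.upper"),
])
print(result)  # ['VEGA', 'RIGEL']
```

The important part is the loop: the step order in the list is the order of execution, and nothing but the previous step's output flows into the next one.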
A tiny example
```toml
[[steps]]
use = "seed.file.text"
options.files = ["observatory.log"]
[[steps]]
use = "text.lines"
[[steps]]
use = "text.find"
options.regex = "comet-[0-9]+"
```
This pipeline:
- reads a file
- splits it into lines
- extracts comet IDs from each line
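To see what each stage hands to the next, the three steps can be imitated with plain Python functions. This sketch swaps an in-memory string for `observatory.log` and rests on assumptions the page does not confirm: that `seed.file.text` emits the whole file as a single sample, that `text.lines` emits one sample per line, and that `text.find` turns every regex match into its own sample while lines with no match produce nothing. The function names are hypothetical stand-ins.

```python
import re

# Stand-in for the contents of observatory.log (made-up sample data).
LOG_TEXT = """\
22:10 tracking comet-17 near Vega
22:14 clouds, no observation
22:31 comet-17 and comet-203 both visible
"""


def seed_file_text(contents: str) -> list[str]:
    # Assumed seed.file.text behavior: the whole file becomes one sample.
    return [contents]


def text_lines(samples: list[str]) -> list[str]:
    # Assumed text.lines behavior: one output sample per line of each input.
    return [line for sample in samples for line in sample.splitlines()]


def text_find(samples: list[str], regex: str) -> list[str]:
    # Assumed text.find behavior: every match becomes a sample;
    # a line with no match contributes nothing.
    pattern = re.compile(regex)
    return [m.group(0) for sample in samples for m in pattern.finditer(sample)]


step1 = seed_file_text(LOG_TEXT)
step2 = text_lines(step1)
step3 = text_find(step2, r"comet-[0-9]+")
print(len(step1), len(step2))  # 1 3
print(step3)  # ['comet-17', 'comet-17', 'comet-203']
```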
Per-sample and batch-style behavior
Most operations conceptually run per sample.
For example:
- `text.trim`
- `text.replace`
- `html.find`
- `json.find`
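One way to picture a per-sample operation is as an ordinary string function applied to each sample on its own, without looking at the other samples in the batch. In this sketch the `per_sample` helper, `trim`, and `replace_dashes` are hypothetical stand-ins loosely modeled on `text.trim` and `text.replace`. They do not reproduce those operations' real options.

```python
from typing import Callable


def per_sample(fn: Callable[[str], str]) -> Callable[[list[str]], list[str]]:
    # Lift a single-sample function so it runs on every sample independently.
    def run(samples: list[str]) -> list[str]:
        return [fn(sample) for sample in samples]
    return run


# Hypothetical stand-ins loosely modeled on text.trim and text.replace.
trim = per_sample(str.strip)
replace_dashes = per_sample(lambda s: s.replace("-", " "))

batch = ["  comet-17 ", "comet-203  "]
trimmed = trim(batch)
print(trimmed)                  # ['comet-17', 'comet-203']
print(replace_dashes(trimmed))  # ['comet 17', 'comet 203']
```

For simple transforms like these, each input sample yields exactly one output sample, so the batch keeps its length.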
Some operations act more like batch filters or producers:
- `seed.*` operations usually create samples from nothing
- `slice` reorders or narrows the whole input batch
- `compact` removes empty samples from the whole input batch
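Batch-style operations, by contrast, work on the whole input batch at once. The functions below model the three kinds listed above. Their names and parameters (`values`, `start`, `stop`, `step`) are hypothetical. `slice_batch` borrows Python's slicing semantics purely as an assumption, and `compact` treats only the empty string as empty.

```python
from typing import Optional

# Hypothetical batch-level operations over a list of string samples.


def seed(_samples: list[str], values: list[str]) -> list[str]:
    # Like seed.*: ignores the incoming batch and produces new samples.
    return list(values)


def slice_batch(samples: list[str], start: Optional[int] = None,
                stop: Optional[int] = None,
                step: Optional[int] = None) -> list[str]:
    # Like slice (modeled on Python slicing, an assumption):
    # narrows the batch, and a negative step reorders it.
    return samples[start:stop:step]


def compact(samples: list[str]) -> list[str]:
    # Like compact: drops empty samples and keeps the rest in order.
    return [s for s in samples if s != ""]


batch = seed([], ["comet-17", "", "comet-203", "", "comet-9"])
kept = compact(batch)
print(kept)                           # ['comet-17', 'comet-203', 'comet-9']
print(slice_batch(kept, 0, 2))        # ['comet-17', 'comet-203']
print(slice_batch(kept, step=-1))     # ['comet-9', 'comet-203', 'comet-17']
```

Here the sample count changes at the batch level: five seeded samples become three after `compact` and two after slicing.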
You do not usually need to think about this distinction while authoring facts, but it helps explain the shape of the output, for example why a `compact` step can return fewer samples than it received.