DataLoader is the canonical ingestion path for training and evaluation data.
Every loader returns Vec<Example<S>> directly, so you can pass results into:
- evaluate_trainset
- optimizer.compile::<S, _, _>(...)
No manual RawExample -> Example<S> conversion step is required.
Core API
Typed loaders:

- DataLoader::load_json::<S>(...)
- DataLoader::load_csv::<S>(...)
- DataLoader::load_parquet::<S>(...)
- DataLoader::load_hf::<S>(...)
- DataLoader::load_hf_from_parquet::<S>(...) (deterministic/offline helper)

Custom-mapper variants (see Custom Mapping below):

- DataLoader::load_json_with::<S, _>(...)
- DataLoader::load_csv_with::<S, _>(...)
- DataLoader::load_parquet_with::<S, _>(...)
- DataLoader::load_hf_with::<S, _>(...)
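For orientation, a minimal usage sketch follows. The argument lists are elided above, so the (path, options) call shape, the crate path in the use statements, and the QASignature type are assumptions for illustration, not the confirmed API.

```rust
// Sketch only: the crate path, the (path, options) argument shape, and
// QASignature are assumed; only the item names from the list above are real.
use my_crate::{DataLoadError, DataLoader, Example, TypedLoadOptions};
use my_crate::signatures::QASignature; // hypothetical signature type

fn load_trainset(path: &str) -> Result<Vec<Example<QASignature>>, DataLoadError> {
    // Typed load with defaults: unknown source fields are ignored,
    // missing required signature fields produce an error.
    let trainset = DataLoader::load_json::<QASignature>(path, TypedLoadOptions::default())?;
    Ok(trainset)
}
```

The later sketches assume the same crate path and types and omit the use statements.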
Default Behavior
TypedLoadOptions::default():
- Ignores unknown source fields.
- Errors on missing required signature fields.
- Uses signature field names directly unless remapped.
Field Remapping
Use TypedLoadOptions.field_map when source column names differ from signature field names.
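A remapping sketch, reusing the assumptions above; the HashMap<String, String> representation of field_map and its direction (source column name to signature field name) are additional guesses:

```rust
use std::collections::HashMap;

// Assumption: field_map is a HashMap<String, String> keyed by source column
// name; the real type and mapping direction may differ.
fn load_remapped(path: &str) -> Result<Vec<Example<QASignature>>, DataLoadError> {
    let mut opts = TypedLoadOptions::default();
    opts.field_map = HashMap::from([
        ("prompt".to_string(), "question".to_string()),
        ("completion".to_string(), "answer".to_string()),
    ]);
    DataLoader::load_csv::<QASignature>(path, opts)
}
```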
Custom Mapping
Use the load_*_with mapper overloads for fully custom row conversion logic. Errors returned from a mapper surface as DataLoadError::Mapper.
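A sketch of one _with overload under heavy assumptions: the mapper is guessed to receive a map-like raw row and return a Result whose Err is reported as DataLoadError::Mapper, and Example::new plus the QASignature fields are hypothetical:

```rust
// Assumed callback shape: raw row in, Result<Example<S>, String> out, with any
// Err surfaced as DataLoadError::Mapper { row, message }. Example::new and the
// QASignature fields are hypothetical.
fn load_with_mapper(path: &str) -> Result<Vec<Example<QASignature>>, DataLoadError> {
    DataLoader::load_csv_with::<QASignature, _>(path, TypedLoadOptions::default(), |row| {
        let question = row
            .get("prompt")
            .ok_or_else(|| "missing prompt column".to_string())?
            .to_string();
        let answer = row
            .get("completion")
            .ok_or_else(|| "missing completion column".to_string())?
            .to_string();
        Ok(Example::new(QASignature { question, answer }))
    })
}
```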
Unknown Field Policy
UnknownFieldPolicy controls how extra source fields are handled:
- Ignore (default): extra source fields are ignored.
- Error: extra source fields fail the load with row and field information.
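Opting into the strict policy might look like the following; the name of the TypedLoadOptions field that carries the policy is an assumption, and only UnknownFieldPolicy::Error itself is documented above:

```rust
// Assumption: TypedLoadOptions exposes the policy as a field named
// unknown_fields; only UnknownFieldPolicy::Error comes from the docs above.
fn strict_options() -> TypedLoadOptions {
    let mut opts = TypedLoadOptions::default();
    opts.unknown_fields = UnknownFieldPolicy::Error;
    // Any extra source column now fails the load with
    // DataLoadError::UnknownField { row, field }.
    opts
}
```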
Error Model
Typed loader failures include row-level context where relevant:

- MissingField { row, field }
- UnknownField { row, field }
- TypeMismatch { row, field, message }
- Mapper { row, message }
Underlying source and format errors are wrapped as Io, Csv, Json, Parquet, and Hf.
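Handling failures could look like this; the variant and field names are taken from the list above, while their concrete payload types and a Debug impl for the catch-all arm are assumptions:

```rust
// Variant and field names come from the error model above; payload types and
// the Debug formatting in the catch-all arm are assumed.
fn main() {
    match DataLoader::load_json::<QASignature>("train.json", TypedLoadOptions::default()) {
        Ok(examples) => println!("loaded {} examples", examples.len()),
        Err(DataLoadError::MissingField { row, field }) => {
            eprintln!("row {row}: required signature field {field} is missing")
        }
        Err(DataLoadError::TypeMismatch { row, field, message }) => {
            eprintln!("row {row}, field {field}: {message}")
        }
        // Io, Csv, Json, Parquet, Hf, UnknownField, Mapper, ...
        Err(other) => eprintln!("load failed: {other:?}"),
    }
}
```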
Migration Note
Removed raw loader signatures:

- load_json(path, input_keys, output_keys)
- load_csv(path, delimiter, has_headers, input_keys, output_keys)
- load_parquet(path, input_keys, output_keys)
- load_hf(dataset_name, subset, split, input_keys, output_keys, verbose)
- save_json(...)
- save_csv(...)
Use the typed load_* / load_*_with APIs instead.
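A migration sketch: the old call is reconstructed from the removed signature list above (the delimiter and has_headers values shown are illustrative), and the replacement uses the same assumed (path, options) shape as the earlier sketches:

```rust
// Old (removed): raw loader driven by explicit input/output key lists.
//   let raw = DataLoader::load_csv("train.csv", b',', true, &["question"], &["answer"]);
// New (typed): returns Vec<Example<S>> directly.
fn migrated_load(path: &str) -> Result<Vec<Example<QASignature>>, DataLoadError> {
    DataLoader::load_csv::<QASignature>(path, TypedLoadOptions::default())
}
```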