The map is created from a small historic dataset held at the BRC.
This means it's usually counter-productive to use them for small datasets.
The algorithm also becomes faster, because regression trees have to be fit to smaller datasets at each iteration.
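A minimal sketch of that speed effect, assuming the context is gradient boosting with row subsampling; scikit-learn's GradientBoostingRegressor and its subsample parameter are used here only as one convenient illustration, not as the source's actual setup:

```python
# Hypothetical illustration: each tree is fit to a random fraction of the rows,
# so training gets faster as the per-iteration dataset shrinks.
import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=20_000, n_features=20, random_state=0)

for subsample in (1.0, 0.5):
    model = GradientBoostingRegressor(n_estimators=100, subsample=subsample, random_state=0)
    start = time.perf_counter()
    model.fit(X, y)  # each tree sees only subsample * n_samples rows
    print(f"subsample={subsample}: {time.perf_counter() - start:.2f}s")
```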
Theoretically, you should obtain somewhat higher variance in the coefficient estimates from the smaller datasets used for estimation, but their expected values should be the same.
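To make that concrete, here is a hedged simulation sketch with made-up data and a simple least-squares slope (the model, sample sizes, and helper fit_slope are all hypothetical): estimates from repeated small samples scatter more widely, but their average matches the large-sample case.

```python
# Sketch: same expectation, higher variance for smaller estimation samples.
import numpy as np

rng = np.random.default_rng(0)
true_slope = 2.0

def fit_slope(n):
    # y = true_slope * x + noise, slope estimated by least squares
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(size=n)
    return np.polyfit(x, y, 1)[0]

for n in (20, 2000):
    estimates = np.array([fit_slope(n) for _ in range(1000)])
    print(f"n={n}: mean={estimates.mean():.3f}, std={estimates.std():.3f}")
# Both means land close to 2.0 (same expectation), but the standard
# deviation of the estimates is much larger for the small samples.
```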
Only a few data blocks are mandatory, and unneeded blocks can simply be left out, resulting in a smaller dataset.
If the dataset is very large, sampling without replacement gives approximately the same result as sampling with replacement, but for small datasets the difference can be substantial.
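A small sketch of that difference, using made-up data: when the draw is a large fraction of a tiny dataset, the two sampling schemes diverge sharply, whereas for a large dataset they are nearly indistinguishable.

```python
# Compare the spread of sample means with vs. without replacement.
import numpy as np

rng = np.random.default_rng(0)

def sample_mean_std(population, k, replace, reps=5000):
    means = [rng.choice(population, size=k, replace=replace).mean() for _ in range(reps)]
    return np.std(means)

for n in (10, 100_000):
    population = rng.normal(size=n)
    k = 10  # draw 10 items each time
    with_r = sample_mean_std(population, k, replace=True)
    without_r = sample_mean_std(population, k, replace=False)
    print(f"n={n}: std with replacement={with_r:.3f}, without replacement={without_r:.3f}")
# For n=100,000 the two spreads are nearly identical; for n=10 they differ
# sharply (without replacement, every draw of 10 is the whole dataset).
```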
Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets.
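As a rough, hypothetical micro-benchmark (not tied to any particular string library), one can compare a contiguous, array-based string with a naive list-of-chunks representation on short inputs:

```python
# Sketch: contiguous buffer vs. list of chunks for small-string concatenation.
import timeit

chunks = ["abc"] * 50      # a small collection of short string pieces
flat = "".join(chunks)     # array-based: one contiguous buffer

# Appending to the contiguous string copies one small buffer; the chunk-list
# version pays per-chunk bookkeeping overhead on every operation.
array_concat = timeit.timeit(lambda: flat + "xyz", number=100_000)
chunk_concat = timeit.timeit(lambda: "".join(chunks + ["xyz"]), number=100_000)

print(f"array-based concat: {array_concat:.3f}s")
print(f"chunk-list concat:  {chunk_concat:.3f}s")
```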
A smaller public dataset, published and updated quarterly, containing the most widely used metadata for each file in the collection.
A quick illustration of such normalizing on a very small dataset:
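The original does not specify which normalization is meant, so min-max scaling to [0, 1] is assumed here purely for illustration:

```python
# Min-max normalization of a very small dataset (values chosen arbitrarily).
values = [3.0, 7.0, 10.0, 15.0]
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]
print(normalized)  # [0.0, 0.333..., 0.583..., 1.0]
```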
This may be sufficient for small datasets.