bids2table


Index BIDS datasets fast, locally or in the cloud.

Installation

To install the latest release from PyPI, run

pip install bids2table

To install with S3 support, include the s3 extra

pip install bids2table[s3]

The latest development version can be installed with

pip install "bids2table[s3] @ git+https://github.com/childmindresearch/bids2table.git"

Usage

To run these examples, you will need to clone the bids-examples repo.

git clone -b 1.9.0 https://github.com/bids-standard/bids-examples.git

Finding BIDS datasets

You can search a directory for valid BIDS datasets using b2t2 find

(bids2table) clane$ b2t2 find bids-examples | head -n 10
bids-examples/asl002
bids-examples/ds002
bids-examples/ds005
bids-examples/asl005
bids-examples/ds051
bids-examples/eeg_rishikesh
bids-examples/asl004
bids-examples/asl003
bids-examples/ds003
bids-examples/eeg_cbm

Indexing datasets from the command line

Indexing datasets is done with b2t2 index. Here we index a single example dataset, saving the output as a parquet file.

(bids2table) clane$ b2t2 index -o ds102.parquet bids-examples/ds102
ds102: 100%|███████████████████████████████████████| 26/26 [00:00<00:00, 154.12it/s, sub=26, N=130]

You can also index a list of datasets. Note that each iteration in the progress bar represents one dataset.

(bids2table) clane$ b2t2 index -o bids-examples.parquet bids-examples/*
100%|████████████████████████████████████████████| 87/87 [00:00<00:00, 113.59it/s, ds=None, N=9727]

You can pipe the output of b2t2 find to b2t2 index to create an index of all datasets under a root directory.

(bids2table) clane$ b2t2 find bids-examples | b2t2 index -o bids-examples.parquet
97it [00:01, 96.05it/s, ds=ieeg_filtered_speech, N=10K]

The resulting index will include both top-level datasets (as in the previous command) as well as nested derivatives datasets.
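
To inspect the result, you can load the parquet index directly with pandas. A minimal sketch, assuming the index exposes a dataset column (see the Arrow schema in the API reference below for the exact column names):

import pandas as pd

# Load the index written by `b2t2 index`.
df = pd.read_parquet("bids-examples.parquet")

# Count indexed files per dataset. Nested derivatives datasets appear as
# their own values in the (assumed) "dataset" column.
print(df["dataset"].value_counts().head(10))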

Indexing datasets hosted on S3

bids2table supports indexing datasets hosted on S3 via cloudpathlib. To use this functionality, install bids2table with the s3 extra, or install cloudpathlib directly

pip install cloudpathlib[s3]

As an example, here we index all datasets on OpenNeuro

(bids2table) clane$ b2t2 index -o openneuro.parquet \
  -j 8 --use-threads s3://openneuro.org/ds*
100%|█████████████████████████████████████| 1408/1408 [12:25<00:00,  1.89it/s, ds=ds006193, N=1.2M]

Using 8 threads, we can index all ~1400 OpenNeuro datasets (1.2M files) in less than 15 minutes.

Indexing datasets from Python

You can also index datasets using the Python API.

import bids2table as b2t2
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Index a single dataset.
tab = b2t2.index_dataset("bids-examples/ds102")

# Find and index a batch of datasets.
tabs = b2t2.batch_index_dataset(
    b2t2.find_bids_datasets("bids-examples"),
)
tab = pa.concat_tables(tabs)

# Index a dataset on S3.
tab = b2t2.index_dataset("s3://openneuro.org/ds000224")

# Save as parquet.
pq.write_table(tab, "ds000224.parquet")

# Convert to a pandas dataframe.
df = tab.to_pandas(types_mapper=pd.ArrowDtype)
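
The Python API exposes the same parallelism options as the CLI. As a rough sketch of the -j 8 run above (adjust max_workers to your machine; the --use-threads flag presumably corresponds to passing a thread-based executor_cls):

import bids2table as b2t2
import pyarrow.parquet as pq

# Index a single OpenNeuro dataset with 8 worker processes and a progress bar.
# max_workers=None would instead start one worker per available CPU.
tab = b2t2.index_dataset(
    "s3://openneuro.org/ds000224",
    max_workers=8,
    show_progress=True,
)

# To index over threads instead of processes (as with --use-threads), you
# could pass executor_cls=concurrent.futures.ThreadPoolExecutor.
pq.write_table(tab, "ds000224.parquet")
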
def index_dataset(root: str | pathlib.Path | cloudpathlib.cloudpath.CloudPath, include_subjects: str | list[str] | None = None, max_workers: int | None = 0, chunksize: int = 32, executor_cls: type[concurrent.futures._base.Executor] = concurrent.futures.ProcessPoolExecutor, show_progress: bool = False) -> pyarrow.lib.Table:
def index_dataset(
    root: str | PathT,
    include_subjects: str | list[str] | None = None,
    max_workers: int | None = 0,
    chunksize: int = 32,
    executor_cls: type[Executor] = ProcessPoolExecutor,
    show_progress: bool = False,
) -> pa.Table:
    """Index a BIDS dataset.

    Args:
        root: BIDS dataset root directory.
        include_subjects: Glob pattern or list of patterns for matching subjects to
            include in the index.
        max_workers: Number of indexing processes to run in parallel. Setting
            `max_workers=0` (the default) uses the main process only. Setting
            `max_workers=None` starts as many workers as there are available CPUs. See
            `concurrent.futures.ProcessPoolExecutor` for details.
        chunksize: Number of subjects per process task. Only used for
            `ProcessPoolExecutor` when `max_workers > 0`.
        executor_cls: Executor class to use for parallel indexing.
        show_progress: Show progress bar.

    Returns:
        An Arrow table index of the BIDS dataset.
    """
    root = as_path(root)

    schema = get_arrow_schema()

    dataset, _ = _get_bids_dataset(root)
    if dataset is None:
        _logger.warning(f"Path {root} is not a valid BIDS dataset directory.")
        return pa.Table.from_pylist([], schema=schema)

    subject_dirs = _find_bids_subject_dirs(root, include_subjects)
    subject_dirs = sorted(subject_dirs, key=lambda p: p.name)
    if len(subject_dirs) == 0:
        _logger.warning(f"Path {root} contains no matching subject dirs.")
        return pa.Table.from_pylist([], schema=schema)

    func = partial(_index_bids_subject_dir, schema=schema, dataset=dataset)

    tables = []
    file_count = 0
    for sub, table in (
        pbar := tqdm(
            _pmap(func, subject_dirs, max_workers, chunksize, executor_cls),
            desc=dataset,
            total=len(subject_dirs),
            disable=not show_progress,
        )
    ):
        file_count += len(table)
        pbar.set_postfix(dict(sub=sub, N=_hfmt(file_count)), refresh=False)
        tables.append(table)

    # NOTE: concat_tables produces a table where each column is a ChunkedArray, with one
    # chunk per original subject table. Is it better to keep the original chunks (one
    # per subject) or merge using `combine_chunks`?
    table = pa.concat_tables(tables).combine_chunks()
    return table

Index a BIDS dataset.

Arguments:
  • root: BIDS dataset root directory.
  • include_subjects: Glob pattern or list of patterns for matching subjects to include in the index.
  • max_workers: Number of indexing processes to run in parallel. Setting max_workers=0 (the default) uses the main process only. Setting max_workers=None starts as many workers as there are available CPUs. See concurrent.futures.ProcessPoolExecutor for details.
  • chunksize: Number of subjects per process task. Only used for ProcessPoolExecutor when max_workers > 0.
  • executor_cls: Executor class to use for parallel indexing.
  • show_progress: Show progress bar.
Returns:

An Arrow table index of the BIDS dataset.
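
For example, to index only a subset of subjects (a sketch; include_subjects takes glob patterns, assumed here to match the subject directory names such as "sub-01"):

import bids2table as b2t2

# Index two subjects from the bids-examples dataset used in the README.
tab = b2t2.index_dataset(
    "bids-examples/ds102",
    include_subjects=["sub-01", "sub-02"],
)
print(len(tab), "files indexed")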

def batch_index_dataset(roots: list[str | pathlib.Path | cloudpathlib.cloudpath.CloudPath], max_workers: int | None = 0, executor_cls: type[concurrent.futures._base.Executor] = concurrent.futures.ProcessPoolExecutor, show_progress: bool = False) -> Generator[pyarrow.lib.Table, NoneType, NoneType]:
def batch_index_dataset(
    roots: list[str | PathT],
    max_workers: int | None = 0,
    executor_cls: type[Executor] = ProcessPoolExecutor,
    show_progress: bool = False,
) -> Generator[pa.Table, None, None]:
    """Index a batch of BIDS datasets.

    Args:
        roots: List of BIDS dataset root directories.
        max_workers: Number of indexing processes to run in parallel. Setting
            `max_workers=0` (the default) uses the main process only. Setting
            `max_workers=None` starts as many workers as there are available CPUs. See
            `concurrent.futures.ProcessPoolExecutor` for details.
        executor_cls: Executor class to use for parallel indexing.
        show_progress: Show progress bar.

    Yields:
        An Arrow table index for each BIDS dataset.
    """
    file_count = 0
    for dataset, table in (
        pbar := tqdm(
            _pmap(_batch_index_func, roots, max_workers, executor_cls=executor_cls),
            total=len(roots) if isinstance(roots, Sequence) else None,
            disable=show_progress not in {True, "dataset"},
        )
    ):
        file_count += len(table)
        pbar.set_postfix(dict(ds=dataset, N=_hfmt(file_count)), refresh=False)
        yield table

Index a batch of BIDS datasets.

Arguments:
  • roots: List of BIDS dataset root directories.
  • max_workers: Number of indexing processes to run in parallel. Setting max_workers=0 (the default) uses the main process only. Setting max_workers=None starts as many workers as there are available CPUs. See concurrent.futures.ProcessPoolExecutor for details.
  • executor_cls: Executor class to use for parallel indexing.
  • show_progress: Show progress bar.
Yields:

An Arrow table index for each BIDS dataset.
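
Because tables are yielded one at a time, each dataset can be processed as soon as it finishes, for example writing one parquet file per dataset instead of concatenating. A sketch, assuming the index has a "dataset" column (per the schema) from which an output filename can be taken:

import bids2table as b2t2
import pyarrow.parquet as pq

roots = list(b2t2.find_bids_datasets("bids-examples"))

for tab in b2t2.batch_index_dataset(roots, max_workers=4):
    if len(tab) == 0:
        continue
    # Read the dataset name back out of the table rather than relying on
    # the yield order matching `roots`.
    name = tab.column("dataset")[0].as_py()
    pq.write_table(tab, f"{name}.parquet")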

def find_bids_datasets( root: str | pathlib.Path | cloudpathlib.cloudpath.CloudPath, exclude: str | list[str] | None = None, maxdepth: int | None = None) -> Generator[pathlib.Path | cloudpathlib.cloudpath.CloudPath, NoneType, NoneType]:
def find_bids_datasets(
    root: str | PathT,
    exclude: str | list[str] | None = None,
    maxdepth: int | None = None,
) -> Generator[PathT, None, None]:
    """Find all BIDS datasets under a root directory.

    Args:
        root: Root path to begin search.
        exclude: Glob pattern or list of patterns matching sub-directory names to
            exclude from the search.
        maxdepth: Maximum depth to search.

    Yields:
        Root paths of all BIDS datasets under `root`.
    """
    root = as_path(root)

    if isinstance(exclude, str):
        exclude = [exclude]
    elif exclude is None:
        exclude = []
    exclude = [re.compile(fnmatch.translate(pat)) for pat in exclude]

    entry_count = 1
    ds_count = 0

    if _is_bids_dataset(root):
        ds_count += 1
        yield root

    # Tuple of path, depth
    stack = [(root, 0)]

    while stack:
        top, depth = stack.pop()

        inside_bids = _is_bids_dataset(top)
        depth += 1

        for entry in top.iterdir():
            entry_count += 1

            if any(re.fullmatch(pat, entry.name) for pat in exclude):
                continue

            if _is_bids_dataset(entry):
                ds_count += 1
                yield entry

            # Checks if we should descend into this directory.
            # Check not reached final depth.
            descend = maxdepth is None or depth < maxdepth
            # Heuristic checks whether the filename looks like a (visible) directory.
            descend = descend and not (entry.suffix or entry.name.startswith("."))
            # Only descend into specific subdirectories of BIDS directories.
            descend = descend and (
                not inside_bids or entry.name in _BIDS_NESTED_PARENT_DIRNAMES
            )
            # Finally, check if actually a directory (which is slow so we want to
            # short-circuit as much as possible).
            if descend and entry.is_dir():
                stack.append((entry, depth))

Find all BIDS datasets under a root directory.

Arguments:
  • root: Root path to begin search.
  • exclude: Glob pattern or list of patterns matching sub-directory names to exclude from the search.
  • maxdepth: Maximum depth to search.
Yields:

Root paths of all BIDS datasets under root.
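
For example, to limit the search depth and skip directories you don't want to descend into (the exclude patterns are ordinary glob patterns matched against directory names):

import bids2table as b2t2

# Search at most two levels deep, skipping "sourcedata" and hidden directories.
for root in b2t2.find_bids_datasets(
    "bids-examples",
    exclude=["sourcedata", ".*"],
    maxdepth=2,
):
    print(root)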

def get_arrow_schema() -> pyarrow.lib.Schema:
def get_arrow_schema() -> pa.Schema:
    """Get Arrow schema of the BIDS dataset index."""
    entity_schema = get_bids_entity_arrow_schema()
    index_fields = {
        name: pa.field(name, cfg["dtype"], metadata=cfg["metadata"])
        for name, cfg in _INDEX_ARROW_FIELDS.items()
    }
    fields = [
        index_fields["dataset"],
        *entity_schema,
        index_fields["extra_entities"],
        index_fields["root"],
        index_fields["path"],
    ]

    metadata = {
        **entity_schema.metadata,
        "bids2table_version": importlib.metadata.version(__package__),
    }
    schema = pa.schema(fields, metadata=metadata)
    return schema

Get Arrow schema of the BIDS dataset index.
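
For example, to list the index columns and their Arrow types, and to check which bids2table version produced the schema (stored in the schema metadata, as shown in the source above):

import bids2table as b2t2

schema = b2t2.get_arrow_schema()

# Column names and Arrow types of the index.
for field in schema:
    print(field.name, field.type)

# The producing bids2table version is recorded in the schema metadata.
print(schema.metadata[b"bids2table_version"].decode())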

def get_column_names() -> enum.StrEnum:
def get_column_names() -> enum.StrEnum:
    """Get an enum of the BIDS index columns."""
    # TODO: It might be nice if the column names were statically available. One option
    # would be to generate a static _schema.py module at install time (similar to how
    # _version.py is generated) which defines the static default schema and column
    # names.
    schema = get_arrow_schema()
    items = []
    for f in schema:
        name = f.metadata["name".encode()].decode()
        items.append((name, name))

    BIDSColumn = enum.StrEnum("BIDSColumn", items)
    BIDSColumn.__doc__ = "Enum of BIDS index column names."
    return BIDSColumn

Get an enum of the BIDS index columns.
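
A quick way to see the available column names (the enum is a StrEnum, so each member's value is the column name itself):

import bids2table as b2t2

BIDSColumn = b2t2.get_column_names()

for col in BIDSColumn:
    print(col.name, "=", col.value)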

def parse_bids_entities(path: str | pathlib.Path) -> dict[str, str]:
def parse_bids_entities(path: str | Path) -> dict[str, str]:
    """Parse entities from BIDS file path.

    Parses all BIDS filename `"{key}-{value}"` entities as well as special entities:
    datatype, suffix, ext (extension). Does not validate entities or cast to types.

    Args:
        path: BIDS path to parse.

    Returns:
        A dict mapping BIDS entity keys to values.
    """
    if isinstance(path, str):
        path = Path(path)
    entities = {}

    filename = path.name
    parts = filename.split("_")

    datatype = _parse_bids_datatype(path)

    # Get suffix and extension.
    suffix_ext = parts.pop()
    suffix, dot, ext = suffix_ext.partition(".")
    ext = dot + ext if ext else None

    # Suffix is actually an entity, put back in list.
    if "-" in suffix:
        parts.append(suffix)
        suffix = None

    # Split entities, skipping any that don't contain a '-'.
    for part in parts:
        if "-" in part:
            key, val = part.split("-", maxsplit=1)
            entities[key] = val

    for k, v in zip(["datatype", "suffix", "ext"], [datatype, suffix, ext]):
        if v is not None:
            entities[k] = v
    return entities

Parse entities from BIDS file path.

Parses all BIDS filename "{key}-{value}" entities as well as special entities: datatype, suffix, ext (extension). Does not validate entities or cast to types.

Arguments:
  • path: BIDS path to parse.
Returns:

A dict mapping BIDS entity keys to values.
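
For example, parsing a typical functional filename (a sketch; values are returned as raw strings and nothing is validated):

import bids2table as b2t2

entities = b2t2.parse_bids_entities(
    "sub-01/ses-1/func/sub-01_ses-1_task-rest_run-1_bold.nii.gz"
)
print(entities)
# Roughly: {'sub': '01', 'ses': '1', 'task': 'rest', 'run': '1',
#           'datatype': 'func', 'suffix': 'bold', 'ext': '.nii.gz'}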

def validate_bids_entities( entities: dict[str, typing.Any]) -> tuple[dict[str, str | int], dict[str, typing.Any]]:
def validate_bids_entities(
    entities: dict[str, Any],
) -> tuple[dict[str, BIDSValue], dict[str, Any]]:
    """Validate BIDS entities.

    Validates the type and allowed values of each entity against the BIDS schema.

    Args:
        entities: dict mapping BIDS keys to unvalidated entities

    Returns:
        A tuple of `(valid_entities, extra_entities)`, where `valid_entities` is a
            mapping of valid BIDS keys to type-casted values, and `extra_entities` a
            mapping of any leftover entity mappings that didn't match a known entity or
            failed validation.
    """
    valid_entities = {}
    extra_entities = {}

    for name, value in entities.items():
        if name in _BIDS_NAME_ENTITY_MAP:
            entity = _BIDS_NAME_ENTITY_MAP[name]
            cfg = _BIDS_ENTITY_SCHEMA[entity]
            typ = _BIDS_FORMAT_PY_TYPE_MAP[cfg["format"]]

            # Cast to target type.
            try:
                value = typ(value)
            except ValueError:
                _logger.warning(
                    f"Unable to coerce {repr(value)} to type {typ} for entity '{name}'.",
                )
                extra_entities[name] = value
                continue

            # Check allowed values.
            if "enum" in cfg and value not in cfg["enum"]:
                _logger.warning(
                    f"Value {value} for entity '{name}' isn't one of the "
                    f"allowed values: {cfg['enum']}.",
                )
                extra_entities[name] = value
                continue

            valid_entities[name] = value
        else:
            extra_entities[name] = value

    return valid_entities, extra_entities

Validate BIDS entities.

Validates the type and allowed values of each entity against the BIDS schema.

Arguments:
  • entities: dict mapping BIDS keys to unvalidated entities
Returns:

A tuple of (valid_entities, extra_entities), where valid_entities is a mapping of valid BIDS keys to type-cast values, and extra_entities is a mapping of any leftover entities that didn't match a known entity or failed validation.
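
Continuing the example from parse_bids_entities, validation casts known entities to their schema types and moves anything unrecognized into extra_entities (a sketch; exact casting depends on the BIDS schema, e.g. run is an index entity and should become an int):

import bids2table as b2t2

entities = b2t2.parse_bids_entities(
    "sub-01/ses-1/func/sub-01_ses-1_task-rest_run-1_bold.nii.gz"
)
entities["foo"] = "bar"  # a made-up entity that is not in the BIDS schema

valid, extra = b2t2.validate_bids_entities(entities)
print(valid)  # e.g. {..., 'run': 1, ...} with run cast to int
print(extra)  # {'foo': 'bar'}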

def set_bids_schema(path: str | pathlib.Path | None = None) -> None:
def set_bids_schema(path: str | Path | None = None) -> None:
    """Set the BIDS schema."""
    global _BIDS_SCHEMA, _BIDS_ENTITY_SCHEMA, _BIDS_NAME_ENTITY_MAP
    global _BIDS_ENTITY_ARROW_SCHEMA

    schema = bidsschematools.schema.load_schema(path)
    entity_schema = {
        entity: schema.objects.entities[entity].to_dict()
        for entity in schema.rules.entities
    }
    # Also include special extra entities (datatype, suffix, extension).
    entity_schema.update(_BIDS_SPECIAL_ENTITY_SCHEMA)
    name_entity_map = {cfg["name"]: entity for entity, cfg in entity_schema.items()}

    _BIDS_SCHEMA = schema
    _BIDS_ENTITY_SCHEMA = entity_schema
    _BIDS_NAME_ENTITY_MAP = name_entity_map

    _BIDS_ENTITY_ARROW_SCHEMA = _bids_entity_arrow_schema(
        entity_schema,
        bids_version=schema["bids_version"],
        schema_version=schema["schema_version"],
    )

Set the BIDS schema.
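
By default bids2table uses the BIDS schema bundled with bidsschematools. A minimal sketch of overriding it (the path argument is passed through to bidsschematools.schema.load_schema; the path below is hypothetical):

import bids2table as b2t2

# Reload the default bundled schema.
b2t2.set_bids_schema()

# Or point at a specific schema, e.g. a local checkout of the BIDS schema.
b2t2.set_bids_schema("/path/to/bids-specification/src/schema")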

def get_bids_schema() -> bidsschematools.types.namespace.Namespace:
def get_bids_schema() -> Namespace:
    """Get the current BIDS schema."""
    return _BIDS_SCHEMA

Get the current BIDS schema.

def get_bids_entity_arrow_schema() -> pyarrow.lib.Schema:
def get_bids_entity_arrow_schema() -> pa.Schema:
    """Get the current BIDS entity schema in Arrow format."""
    return _BIDS_ENTITY_ARROW_SCHEMA

Get the current BIDS entity schema in Arrow format.

def format_bids_path(entities: dict[str, typing.Any], int_format: str = '%d') -> pathlib.Path:
def format_bids_path(entities: dict[str, Any], int_format: str = "%d") -> Path:
    """Construct a formatted BIDS path from entities dict.

    Args:
        entities: dict mapping BIDS keys to values.
        int_format: format string for integer (index) BIDS values.

    Returns:
        A formatted `Path` instance.
    """
    special = {"datatype", "suffix", "ext"}

    # Formatted key-value entities.
    entities_fmt = []
    for name, value in entities.items():
        if name not in special:
            if isinstance(value, int):
                value = int_format % value
            entities_fmt.append(f"{name}-{value}")
    name = "_".join(entities_fmt)

    # Append suffix and extension.
    if suffix := entities.get("suffix"):
        name += f"_{suffix}"
    if ext := entities.get("ext"):
        name += ext

    # Prepend parent directories.
    path = Path(name)
    if datatype := entities.get("datatype"):
        path = datatype / path
    if ses := entities.get("ses"):
        path = f"ses-{ses}" / path
    path = f"sub-{entities['sub']}" / path
    return path

Construct a formatted BIDS path from entities dict.

Arguments:
  • entities: dict mapping BIDS keys to values.
  • int_format: format string for integer (index) BIDS values.
Returns:

A formatted Path instance.
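
For example, building a relative BIDS path from an entities dict (sub is required; datatype, suffix, and ext are handled specially, as shown in the source):

import bids2table as b2t2

entities = {
    "sub": "01",
    "ses": "1",
    "task": "rest",
    "run": 1,
    "datatype": "func",
    "suffix": "bold",
    "ext": ".nii.gz",
}
print(b2t2.format_bids_path(entities))
# sub-01/ses-1/func/sub-01_ses-1_task-rest_run-1_bold.nii.gz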

def load_bids_metadata( path: str | pathlib.Path | cloudpathlib.cloudpath.CloudPath, inherit: bool = True) -> dict[str, typing.Any]:
def load_bids_metadata(path: str | PathT, inherit: bool = True) -> dict[str, Any]:
    """Load the full JSON sidecar metadata for a BIDS file.

    Sidecar files are loaded according to the inheritance principle in top-down order.

    Args:
        path: BIDS file path
        inherit: Load the full metadata according to inheritance. Otherwise, load only
            the first JSON sidecar found in the bottom-up search.

    Returns:
        A sidecar metadata dictionary.
    """
    path = as_path(path)
    entities = _cache_parse_bids_entities(path)
    query = dict(entities, ext=".json")

    metadata = {}

    parent = path.parent
    if inherit:
        sidecars = reversed(list(_find_bids_parents(parent, query)))
    else:
        sidecars = [next(_find_bids_parents(parent, query))]

    for path in sidecars:
        try:
            data = _load_json(path)
            metadata.update(data)
        except (json.JSONDecodeError, TypeError):
            continue
    return metadata

Load the full JSON sidecar metadata for a BIDS file.

Sidecar files are loaded according to the inheritance principle in top-down order.

Arguments:
  • path: BIDS file path
  • inherit: Load the full metadata according to inheritance. Otherwise, load only the first JSON sidecar found in the bottom-up search.
Returns:

A sidecar metadata dictionary.
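
For example, loading the effective sidecar metadata for a functional run from the bids-examples dataset used in the README (a sketch; with inherit=True, dataset-level JSON such as a task-level bold sidecar is merged with any run-specific sidecar):

import bids2table as b2t2

meta = b2t2.load_bids_metadata(
    "bids-examples/ds102/sub-01/func/sub-01_task-flanker_run-1_bold.nii.gz"
)
print(meta.get("RepetitionTime"))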

def cloudpathlib_is_available() -> bool:
def cloudpathlib_is_available() -> bool:
    """Check if cloudpathlib is available."""
    return _CLOUDPATHLIB_AVAILABLE

Check if cloudpathlib is available.
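
This is useful for guarding cloud paths when the s3 extra may not be installed (a sketch):

import bids2table as b2t2

root = "s3://openneuro.org/ds000224"
if b2t2.cloudpathlib_is_available():
    tab = b2t2.index_dataset(root)
else:
    raise RuntimeError(
        "Install bids2table[s3] (or cloudpathlib[s3]) to index S3 datasets."
    )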