bids2table
Index BIDS datasets fast, locally or in the cloud.
Installation
To install the latest release from PyPI, run
pip install bids2table
To install with S3 support, include the s3 extra
pip install bids2table[s3]
The latest development version can be installed with
pip install "bids2table[s3] @ git+https://github.com/childmindresearch/bids2table.git"
Usage
To run these examples, you will need to clone the bids-examples repo.
git clone -b 1.9.0 https://github.com/bids-standard/bids-examples.git
Finding BIDS datasets
You can search a directory for valid BIDS datasets using b2t2 find
(bids2table) clane$ b2t2 find bids-examples | head -n 10
bids-examples/asl002
bids-examples/ds002
bids-examples/ds005
bids-examples/asl005
bids-examples/ds051
bids-examples/eeg_rishikesh
bids-examples/asl004
bids-examples/asl003
bids-examples/ds003
bids-examples/eeg_cbm
Indexing datasets from the command line
Indexing datasets is done with b2t2 index. Here we index a single example dataset, saving the output as a parquet file.
(bids2table) clane$ b2t2 index -o ds102.parquet bids-examples/ds102
ds102: 100%|███████████████████████████████████████| 26/26 [00:00<00:00, 154.12it/s, sub=26, N=130]
You can also index a list of datasets. Note that each iteration in the progress bar represents one dataset.
(bids2table) clane$ b2t2 index -o bids-examples.parquet bids-examples/*
100%|████████████████████████████████████████████| 87/87 [00:00<00:00, 113.59it/s, ds=None, N=9727]
You can pipe the output of b2t2 find to b2t2 index to create an index of all datasets under a root directory.
(bids2table) clane$ b2t2 find bids-examples | b2t2 index -o bids-examples.parquet
97it [00:01, 96.05it/s, ds=ieeg_filtered_speech, N=10K]
The resulting index will include both top-level datasets (as in the previous command) as well as nested derivatives datasets.
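The index is an ordinary parquet file, so you can inspect it with any parquet reader. For example, with pandas (a quick sketch; the dataset column is part of the index schema described below):

import pandas as pd
# Load the combined index written by `b2t2 index`.
df = pd.read_parquet("bids-examples.parquet")
# Count indexed files per dataset.
print(df["dataset"].value_counts().head())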
Indexing datasets hosted on S3
bids2table supports indexing datasets hosted on S3 via cloudpathlib. To use this functionality, install bids2table with the s3 extra, or install cloudpathlib directly
pip install cloudpathlib[s3]
As an example, here we index all datasets on OpenNeuro
(bids2table) clane$ b2t2 index -o openneuro.parquet \
-j 8 --use-threads s3://openneuro.org/ds*
100%|█████████████████████████████████████| 1408/1408 [12:25<00:00, 1.89it/s, ds=ds006193, N=1.2M]
Using 8 threads, we can index all ~1400 OpenNeuro datasets (1.2M files) in less than 15 minutes.
Indexing datasets from Python
You can also index datasets using the Python API.
import bids2table as b2t2
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
# Index a single dataset.
tab = b2t2.index_dataset("bids-examples/ds102")
# Find and index a batch of datasets.
tabs = b2t2.batch_index_dataset(
b2t2.find_bids_datasets("bids-examples"),
)
tab = pa.concat_tables(tabs)
# Index a dataset on S3.
tab = b2t2.index_dataset("s3://openneuro.org/ds000224")
# Save as parquet.
pq.write_table(tab, "ds000224.parquet")
# Convert to a pandas dataframe.
df = tab.to_pandas(types_mapper=pd.ArrowDtype)
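The resulting dataframe can then be explored with the usual pandas operations, for example:

# The "dataset", "root", and "path" columns are always part of the index schema;
# the remaining entity columns follow the BIDS schema.
print(df.columns.tolist())
print(df["path"].head())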
def index_dataset(
    root: str | PathT,
    include_subjects: str | list[str] | None = None,
    max_workers: int | None = 0,
    chunksize: int = 32,
    executor_cls: type[Executor] = ProcessPoolExecutor,
    show_progress: bool = False,
) -> pa.Table:
    """Index a BIDS dataset.

    Args:
        root: BIDS dataset root directory.
        include_subjects: Glob pattern or list of patterns for matching subjects to
            include in the index.
        max_workers: Number of indexing processes to run in parallel. Setting
            `max_workers=0` (the default) uses the main process only. Setting
            `max_workers=None` starts as many workers as there are available CPUs. See
            `concurrent.futures.ProcessPoolExecutor` for details.
        chunksize: Number of subjects per process task. Only used for
            `ProcessPoolExecutor` when `max_workers > 0`.
        executor_cls: Executor class to use for parallel indexing.
        show_progress: Show progress bar.

    Returns:
        An Arrow table index of the BIDS dataset.
    """
    root = as_path(root)

    schema = get_arrow_schema()

    dataset, _ = _get_bids_dataset(root)
    if dataset is None:
        _logger.warning(f"Path {root} is not a valid BIDS dataset directory.")
        return pa.Table.from_pylist([], schema=schema)

    subject_dirs = _find_bids_subject_dirs(root, include_subjects)
    subject_dirs = sorted(subject_dirs, key=lambda p: p.name)
    if len(subject_dirs) == 0:
        _logger.warning(f"Path {root} contains no matching subject dirs.")
        return pa.Table.from_pylist([], schema=schema)

    func = partial(_index_bids_subject_dir, schema=schema, dataset=dataset)

    tables = []
    file_count = 0
    for sub, table in (
        pbar := tqdm(
            _pmap(func, subject_dirs, max_workers, chunksize, executor_cls),
            desc=dataset,
            total=len(subject_dirs),
            disable=not show_progress,
        )
    ):
        file_count += len(table)
        pbar.set_postfix(dict(sub=sub, N=_hfmt(file_count)), refresh=False)
        tables.append(table)

    # NOTE: concat_tables produces a table where each column is a ChunkedArray, with one
    # chunk per original subject table. Is it better to keep the original chunks (one
    # per subject) or merge using `combine_chunks`?
    table = pa.concat_tables(tables).combine_chunks()
    return table
Index a BIDS dataset.
Arguments:
- root: BIDS dataset root directory.
- include_subjects: Glob pattern or list of patterns for matching subjects to include in the index.
- max_workers: Number of indexing processes to run in parallel. Setting max_workers=0 (the default) uses the main process only. Setting max_workers=None starts as many workers as there are available CPUs. See concurrent.futures.ProcessPoolExecutor for details.
- chunksize: Number of subjects per process task. Only used for ProcessPoolExecutor when max_workers > 0.
- executor_cls: Executor class to use for parallel indexing.
- show_progress: Show progress bar.
Returns:
An Arrow table index of the BIDS dataset.
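For example, to index the same example dataset using several worker processes with a progress bar:

import bids2table as b2t2
tab = b2t2.index_dataset(
    "bids-examples/ds102",
    max_workers=4,
    show_progress=True,
)
print(tab.num_rows)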
def batch_index_dataset(
    roots: list[str | PathT],
    max_workers: int | None = 0,
    executor_cls: type[Executor] = ProcessPoolExecutor,
    show_progress: bool = False,
) -> Generator[pa.Table, None, None]:
    """Index a batch of BIDS datasets.

    Args:
        roots: List of BIDS dataset root directories.
        max_workers: Number of indexing processes to run in parallel. Setting
            `max_workers=0` (the default) uses the main process only. Setting
            `max_workers=None` starts as many workers as there are available CPUs. See
            `concurrent.futures.ProcessPoolExecutor` for details.
        executor_cls: Executor class to use for parallel indexing.
        show_progress: Show progress bar.

    Yields:
        An Arrow table index for each BIDS dataset.
    """
    file_count = 0
    for dataset, table in (
        pbar := tqdm(
            _pmap(_batch_index_func, roots, max_workers, executor_cls=executor_cls),
            total=len(roots) if isinstance(roots, Sequence) else None,
            disable=show_progress not in {True, "dataset"},
        )
    ):
        file_count += len(table)
        pbar.set_postfix(dict(ds=dataset, N=_hfmt(file_count)), refresh=False)
        yield table
Index a batch of BIDS datasets.
Arguments:
- roots: List of BIDS dataset root directories.
- max_workers: Number of indexing processes to run in parallel. Setting max_workers=0 (the default) uses the main process only. Setting max_workers=None starts as many workers as there are available CPUs. See concurrent.futures.ProcessPoolExecutor for details.
- executor_cls: Executor class to use for parallel indexing.
- show_progress: Show progress bar.
Yields:
An Arrow table index for each BIDS dataset.
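Since tables are yielded one dataset at a time, you can write the index incrementally instead of holding it all in memory. A minimal sketch using pyarrow's ParquetWriter:

import pyarrow.parquet as pq
import bids2table as b2t2
schema = b2t2.get_arrow_schema()
roots = list(b2t2.find_bids_datasets("bids-examples"))
# Stream each per-dataset table into a single parquet file.
with pq.ParquetWriter("bids-examples.parquet", schema) as writer:
    for tab in b2t2.batch_index_dataset(roots, show_progress=True):
        if tab.num_rows > 0:
            writer.write_table(tab)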
def find_bids_datasets(
    root: str | PathT,
    exclude: str | list[str] | None = None,
    maxdepth: int | None = None,
) -> Generator[PathT, None, None]:
    """Find all BIDS datasets under a root directory.

    Args:
        root: Root path to begin search.
        exclude: Glob pattern or list of patterns matching sub-directory names to
            exclude from the search.
        maxdepth: Maximum depth to search.

    Yields:
        Root paths of all BIDS datasets under `root`.
    """
    root = as_path(root)

    if isinstance(exclude, str):
        exclude = [exclude]
    elif exclude is None:
        exclude = []
    exclude = [re.compile(fnmatch.translate(pat)) for pat in exclude]

    entry_count = 1
    ds_count = 0

    if _is_bids_dataset(root):
        ds_count += 1
        yield root

    # Tuple of path, depth
    stack = [(root, 0)]

    while stack:
        top, depth = stack.pop()

        inside_bids = _is_bids_dataset(top)
        depth += 1

        for entry in top.iterdir():
            entry_count += 1

            if any(re.fullmatch(pat, entry.name) for pat in exclude):
                continue

            if _is_bids_dataset(entry):
                ds_count += 1
                yield entry

            # Checks if we should descend into this directory.
            # Check not reached final depth.
            descend = maxdepth is None or depth < maxdepth
            # Heuristic checks whether the filename looks like a (visible) directory.
            descend = descend and not (entry.suffix or entry.name.startswith("."))
            # Only descend into specific subdirectories of BIDS directories.
            descend = descend and (
                not inside_bids or entry.name in _BIDS_NESTED_PARENT_DIRNAMES
            )
            # Finally, check if actually a directory (which is slow so we want to
            # short-circuit as much as possible).
            if descend and entry.is_dir():
                stack.append((entry, depth))
Find all BIDS datasets under a root directory.
Arguments:
- root: Root path to begin search.
- exclude: Glob pattern or list of patterns matching sub-directory names to exclude from the search.
- maxdepth: Maximum depth to search.
Yields:
Root paths of all BIDS datasets under root.
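For example, to search a directory tree while skipping some sub-directories and limiting the recursion depth (the patterns here are only illustrative):

import bids2table as b2t2
for ds_root in b2t2.find_bids_datasets(
    "bids-examples",
    exclude=["derivatives", "code"],
    maxdepth=3,
):
    print(ds_root)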
def get_arrow_schema() -> pa.Schema:
    """Get Arrow schema of the BIDS dataset index."""
    entity_schema = get_bids_entity_arrow_schema()
    index_fields = {
        name: pa.field(name, cfg["dtype"], metadata=cfg["metadata"])
        for name, cfg in _INDEX_ARROW_FIELDS.items()
    }
    fields = [
        index_fields["dataset"],
        *entity_schema,
        index_fields["extra_entities"],
        index_fields["root"],
        index_fields["path"],
    ]

    metadata = {
        **entity_schema.metadata,
        "bids2table_version": importlib.metadata.version(__package__),
    }
    schema = pa.schema(fields, metadata=metadata)
    return schema
Get Arrow schema of the BIDS dataset index.
def get_column_names() -> enum.StrEnum:
    """Get an enum of the BIDS index columns."""
    # TODO: It might be nice if the column names were statically available. One option
    # would be to generate a static _schema.py module at install time (similar to how
    # _version.py is generated) which defines the static default schema and column
    # names.
    schema = get_arrow_schema()
    items = []
    for f in schema:
        name = f.metadata["name".encode()].decode()
        items.append((name, name))

    BIDSColumn = enum.StrEnum("BIDSColumn", items)
    BIDSColumn.__doc__ = "Enum of BIDS index column names."
    return BIDSColumn
Get an enum of the BIDS index columns.
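These two functions are useful for discovering what the index contains, for example:

import bids2table as b2t2
# Arrow field names and types of the index.
schema = b2t2.get_arrow_schema()
print(schema.names)
# Enum of index column names.
BIDSColumn = b2t2.get_column_names()
print([c.value for c in BIDSColumn])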
def parse_bids_entities(path: str | Path) -> dict[str, str]:
    """Parse entities from BIDS file path.

    Parses all BIDS filename `"{key}-{value}"` entities as well as special entities:
    datatype, suffix, ext (extension). Does not validate entities or cast to types.

    Args:
        path: BIDS path to parse.

    Returns:
        A dict mapping BIDS entity keys to values.
    """
    if isinstance(path, str):
        path = Path(path)
    entities = {}

    filename = path.name
    parts = filename.split("_")

    datatype = _parse_bids_datatype(path)

    # Get suffix and extension.
    suffix_ext = parts.pop()
    suffix, dot, ext = suffix_ext.partition(".")
    ext = dot + ext if ext else None

    # Suffix is actually an entity, put back in list.
    if "-" in suffix:
        parts.append(suffix)
        suffix = None

    # Split entities, skipping any that don't contain a '-'.
    for part in parts:
        if "-" in part:
            key, val = part.split("-", maxsplit=1)
            entities[key] = val

    for k, v in zip(["datatype", "suffix", "ext"], [datatype, suffix, ext]):
        if v is not None:
            entities[k] = v
    return entities
Parse entities from BIDS file path.
Parses all BIDS filename "{key}-{value}" entities as well as special entities: datatype, suffix, ext (extension). Does not validate entities or cast to types.
Arguments:
- path: BIDS path to parse.
Returns:
A dict mapping BIDS entity keys to values.
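For example (the file path below is made up for illustration):

import bids2table as b2t2
entities = b2t2.parse_bids_entities(
    "sub-01/ses-1/func/sub-01_ses-1_task-rest_run-2_bold.nii.gz"
)
print(entities)
# roughly: {'sub': '01', 'ses': '1', 'task': 'rest', 'run': '2',
#           'datatype': 'func', 'suffix': 'bold', 'ext': '.nii.gz'}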
def validate_bids_entities(
    entities: dict[str, Any],
) -> tuple[dict[str, BIDSValue], dict[str, Any]]:
    """Validate BIDS entities.

    Validates the type and allowed values of each entity against the BIDS schema.

    Args:
        entities: dict mapping BIDS keys to unvalidated entities

    Returns:
        A tuple of `(valid_entities, extra_entities)`, where `valid_entities` is a
        mapping of valid BIDS keys to type-casted values, and `extra_entities` a
        mapping of any leftover entity mappings that didn't match a known entity or
        failed validation.
    """
    valid_entities = {}
    extra_entities = {}

    for name, value in entities.items():
        if name in _BIDS_NAME_ENTITY_MAP:
            entity = _BIDS_NAME_ENTITY_MAP[name]
            cfg = _BIDS_ENTITY_SCHEMA[entity]
            typ = _BIDS_FORMAT_PY_TYPE_MAP[cfg["format"]]

            # Cast to target type.
            try:
                value = typ(value)
            except ValueError:
                _logger.warning(
                    f"Unable to coerce {repr(value)} to type {typ} for entity '{name}'.",
                )
                extra_entities[name] = value
                continue

            # Check allowed values.
            if "enum" in cfg and value not in cfg["enum"]:
                _logger.warning(
                    f"Value {value} for entity '{name}' isn't one of the "
                    f"allowed values: {cfg['enum']}.",
                )
                extra_entities[name] = value
                continue

            valid_entities[name] = value
        else:
            extra_entities[name] = value

    return valid_entities, extra_entities
Validate BIDS entities.
Validates the type and allowed values of each entity against the BIDS schema.
Arguments:
- entities: dict mapping BIDS keys to unvalidated entities
Returns:
A tuple of (valid_entities, extra_entities), where valid_entities maps valid BIDS keys to type-cast values and extra_entities holds any leftover entities that didn't match a known entity or failed validation.
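Typically applied to the output of parse_bids_entities, for example:

import bids2table as b2t2
entities = b2t2.parse_bids_entities(
    "sub-01/func/sub-01_task-rest_run-2_bold.nii.gz"
)
valid, extra = b2t2.validate_bids_entities(entities)
# Index-valued entities such as run are cast to int; unknown or invalid
# entities are returned separately in `extra`.
print(valid)
print(extra)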
def set_bids_schema(path: str | Path | None = None) -> None:
    """Set the BIDS schema."""
    global _BIDS_SCHEMA, _BIDS_ENTITY_SCHEMA, _BIDS_NAME_ENTITY_MAP
    global _BIDS_ENTITY_ARROW_SCHEMA

    schema = bidsschematools.schema.load_schema(path)
    entity_schema = {
        entity: schema.objects.entities[entity].to_dict()
        for entity in schema.rules.entities
    }
    # Also include special extra entities (datatype, suffix, extension).
    entity_schema.update(_BIDS_SPECIAL_ENTITY_SCHEMA)
    name_entity_map = {cfg["name"]: entity for entity, cfg in entity_schema.items()}

    _BIDS_SCHEMA = schema
    _BIDS_ENTITY_SCHEMA = entity_schema
    _BIDS_NAME_ENTITY_MAP = name_entity_map

    _BIDS_ENTITY_ARROW_SCHEMA = _bids_entity_arrow_schema(
        entity_schema,
        bids_version=schema["bids_version"],
        schema_version=schema["schema_version"],
    )
Set the BIDS schema.
def get_bids_schema() -> Namespace:
    """Get the current BIDS schema."""
    return _BIDS_SCHEMA
Get the current BIDS schema.
def get_bids_entity_arrow_schema() -> pa.Schema:
    """Get the current BIDS entity schema in Arrow format."""
    return _BIDS_ENTITY_ARROW_SCHEMA
Get the current BIDS entity schema in Arrow format.
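For example, to reload the default schema bundled with bidsschematools and inspect the resulting entity fields (a small sketch):

import bids2table as b2t2
b2t2.set_bids_schema()  # path=None loads the default bundled schema
entity_schema = b2t2.get_bids_entity_arrow_schema()
print(entity_schema.names)
print(entity_schema.metadata)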
def format_bids_path(entities: dict[str, Any], int_format: str = "%d") -> Path:
    """Construct a formatted BIDS path from entities dict.

    Args:
        entities: dict mapping BIDS keys to values.
        int_format: format string for integer (index) BIDS values.

    Returns:
        A formatted `Path` instance.
    """
    special = {"datatype", "suffix", "ext"}

    # Formatted key-value entities.
    entities_fmt = []
    for name, value in entities.items():
        if name not in special:
            if isinstance(value, int):
                value = int_format % value
            entities_fmt.append(f"{name}-{value}")
    name = "_".join(entities_fmt)

    # Append suffix and extension.
    if suffix := entities.get("suffix"):
        name += f"_{suffix}"
    if ext := entities.get("ext"):
        name += ext

    # Prepend parent directories.
    path = Path(name)
    if datatype := entities.get("datatype"):
        path = datatype / path
    if ses := entities.get("ses"):
        path = f"ses-{ses}" / path
    path = f"sub-{entities['sub']}" / path
    return path
Construct a formatted BIDS path from entities dict.
Arguments:
- entities: dict mapping BIDS keys to values.
- int_format: format string for integer (index) BIDS values.
Returns:
A formatted Path instance.
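This is roughly the inverse of parse_bids_entities, for example:

import bids2table as b2t2
entities = {
    "sub": "01",
    "ses": "1",
    "task": "rest",
    "run": 2,
    "datatype": "func",
    "suffix": "bold",
    "ext": ".nii.gz",
}
print(b2t2.format_bids_path(entities))
# sub-01/ses-1/func/sub-01_ses-1_task-rest_run-2_bold.nii.gz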
def cloudpathlib_is_available() -> bool:
    """Check if cloudpathlib is available."""
    return _CLOUDPATHLIB_AVAILABLE
Check if cloudpathlib is available.
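This can be used as a guard before indexing S3 paths, for example:

import bids2table as b2t2
if b2t2.cloudpathlib_is_available():
    tab = b2t2.index_dataset("s3://openneuro.org/ds000224")
else:
    print("Install the s3 extra (or cloudpathlib) to index S3 datasets.")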