# bids2table

bids2table is a library for efficiently indexing and querying large-scale BIDS neuroimaging datasets and derivatives. It aims to improve upon the efficiency of [PyBIDS](https://github.com/bids-standard/pybids) by leveraging modern data science tools.

bids2table represents a BIDS dataset index as a single table with columns for BIDS entities and file metadata. The index is constructed using [Arrow](https://arrow.apache.org/) and stored in [Parquet](https://parquet.apache.org/) format, a binary tabular file format optimized for efficient storage and retrieval.

## Installation

A pre-release version of bids2table can be installed with

```sh
pip install bids2table
```

The latest development version can be installed with

```sh
pip install git+https://github.com/childmindresearch/bids2table.git
```

## Quickstart

The main entrypoint to the library is the `bids2table.bids2table` function, which builds the index.

```python
tab = bids2table("path/to/dataset")
```

You can also build the index in parallel:

```python
tab = bids2table("path/to/dataset", workers=8)
```

To save the index to disk as a [Parquet](https://parquet.apache.org/) dataset for later reuse, run

```python
tab = bids2table("path/to/dataset", persistent=True)
```

By default this saves the index to an `index.b2t` directory under the dataset root directory. To change the output destination, use the `index_path` argument.

To generate and save an index from the command line, you can use the `bids2table` CLI.

```sh
usage: bids2table [-h] [--output OUTPUT] [--incremental] [--overwrite] [--workers COUNT]
                  [--worker_id RANK] [--verbose]
                  ROOT
```

See `bids2table --help` for more information.

## Table representation

The generated index is represented as a `bids2table.BIDSTable`, which is just a subclass of a `pandas.DataFrame`. Each row in the table corresponds to a BIDS data file, and the columns are organized into groups:

- dataset (`BIDSTable.ds`): dataset name, relative dataset path, and the JSON dataset description
- entities (`BIDSTable.ent`): all [valid BIDS entities](https://bids-specification.readthedocs.io/en/stable/appendices/entities.html) plus an `extra_entities` dict containing any extra entities
- metadata (`BIDSTable.meta`): BIDS JSON metadata
- file info (`BIDSTable.finfo`): general file info including the full file path and last modified time

The `BIDSTable` also makes it easy to access some of the key characteristics of your dataset. The BIDS datatypes, modalities, subjects, and entities present in the dataset are each accessible as properties of the table.

In addition, the associated JSON metadata for each file can be conveniently accessed via the `BIDSTable.flat_meta` property.
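The flattening that `flat_meta` performs can be sketched with plain pandas. The metadata values below are hypothetical; missing sidecars become empty dicts before each top-level field is expanded into its own column.

```python
import pandas as pd

# Toy stand-in for the index's per-file JSON sidecar metadata: one dict per
# file, with None for files that have no sidecar (values are made up).
records = [{"RepetitionTime": 2.5, "EchoTime": 0.03}, None]

# Replace missing metadata with an empty dict, then expand top-level fields
# into columns; files lacking a field get NaN.
flat = pd.json_normalize([r or {} for r in records], max_level=0)
print(flat.columns.tolist())  # ['RepetitionTime', 'EchoTime']
```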

### Filtering

To filter the table for files matching certain criteria, you can use the `BIDSTable.filter` method, which selects rows based on whether a specified column meets a condition.

```python
filtered = (
    tab
    .filter("task", "rest")
    .filter("sub", items=["04", "06"])
    .filter("RepetitionTime", 2.5)
)
```

To apply multiple filters at once, you can also use `BIDSTable.filter_multi`:

```python
filtered = tab.filter_multi(
    task="rest",
    sub={"items": ["04", "06"]},
    RepetitionTime=2.5,
)
```

This is similar to the [`BIDSLayout.get()`](https://bids-standard.github.io/pybids/generated/bids.layout.BIDSLayout.html#bids.layout.BIDSLayout) method in [PyBIDS](https://bids-standard.github.io/pybids/), where each `key=value` pair specifies the column to filter on and the condition to apply.
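Under the hood, each `filter` call reduces to a boolean mask over a single column, and chained calls combine by logical AND. A rough plain-pandas equivalent of the example above (the rows here are made up):

```python
import pandas as pd

# A toy stand-in for the flattened index; column names mirror the README
# example, but the data is invented.
df = pd.DataFrame({
    "task": ["rest", "rest", "nback"],
    "sub": ["04", "05", "04"],
    "RepetitionTime": [2.5, 2.5, 2.0],
})

# Each filter is a boolean mask over one column; chaining ANDs them together.
mask = (
    (df["task"] == "rest")
    & df["sub"].isin(["04", "06"])
    & (df["RepetitionTime"] == 2.5)
)
filtered = df.loc[mask]
print(len(filtered))  # 1
```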

### Advanced usage

For more advanced usage that goes beyond what's supported in this higher-level interface, you can also interact directly with the underlying [`DataFrame`](https://pandas.pydata.org/docs/user_guide/index.html).

The column labels of the raw table indicate the group as a prefix, e.g. `ent__*` for BIDS entities. However, you may find one of the alternative views of the table more useful:

- `BIDSTable.nested`: columns organized as a nested pandas [`MultiIndex`](https://pandas.pydata.org/docs/user_guide/advanced.html#hierarchical-indexing-multiindex).
- `BIDSTable.flat`: flattened columns without any nesting or group prefix.

**Warning:** You should avoid manipulating the table in place if possible, as this may interfere with the higher-level accessors. If you must manipulate in place, consider converting the `BIDSTable` to a plain `DataFrame` first.

```python
df = pd.DataFrame(tab)
```
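The relationship between the prefixed raw labels and the `nested` view can be sketched with plain pandas, assuming only that each label splits on its first `__` into a group and a field (the row values here are invented; `BIDSTable` uses its own helper internally):

```python
import pandas as pd

# Raw index columns carry their group as a "__"-separated prefix.
raw = pd.DataFrame({
    "ent__sub": ["04"],
    "ent__task": ["rest"],
    "finfo__file_path": ["/data/sub-04/func/sub-04_task-rest_bold.nii.gz"],
})

# Split each label on the first "__" to build a two-level column MultiIndex,
# so each group becomes a selectable subtable.
nested = raw.copy()
nested.columns = pd.MultiIndex.from_tuples(
    tuple(c.split("__", 1)) for c in raw.columns
)
sub = nested["ent"]["sub"].iloc[0]
print(sub)  # 04
```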
The top-level package module defines the public API:

```python
# Register elbow extension types
import elbow.dtypes  # noqa

from ._b2t import bids2table
from ._version import __version__, __version_tuple__  # noqa
from .entities import BIDSEntities, parse_bids_entities
from .table import BIDSFile, BIDSTable, join_bids_path

__all__ = [
    "bids2table",
    "BIDSTable",
    "BIDSFile",
    "BIDSEntities",
    "parse_bids_entities",
    "join_bids_path",
]
```
```python
def bids2table(
    root: StrOrPath,
    *,
    with_meta: bool = True,
    persistent: bool = False,
    index_path: Optional[StrOrPath] = None,
    exclude: Optional[List[str]] = None,
    incremental: bool = False,
    overwrite: bool = False,
    workers: Optional[int] = None,
    worker_id: Optional[int] = None,
    return_table: bool = True,
) -> Optional[BIDSTable]:
    """
    Index a BIDS dataset directory and load as a pandas DataFrame.

    Args:
        root: Path to BIDS dataset.
        with_meta: Extract JSON sidecar metadata. Excluding metadata can result in
            much faster indexing.
        persistent: Whether to save the index to disk as a Parquet dataset.
        index_path: Path to the BIDS Parquet index to generate or load. Defaults to
            `root / "index.b2t"`. Index generation requires `persistent=True`.
        exclude: Optional list of directory names or glob patterns to exclude from
            indexing.
        incremental: Update the index incrementally with only new or changed files.
        overwrite: Overwrite the previous index.
        workers: Number of parallel processes. If `None` or 1, run in the main
            process. Setting to <= 0 runs as many processes as there are cores
            available.
        worker_id: Optional worker ID to use when scheduling parallel tasks
            externally. Specifying the number of workers is required in this case.
            Incompatible with `overwrite`.
        return_table: Whether to return the BIDS table or just build the persistent
            index.

    Returns:
        A `BIDSTable` representing the indexed dataset(s), or `None` if
        `return_table` is `False`.
    """
    if not (return_table or persistent):
        raise ValueError("persistent and return_table should not both be False")

    root = Path(root).expanduser().resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"root directory {root} does not exist")

    if exclude is None:
        exclude = []

    source = Crawler(
        root=root,
        include=["sub-*"],  # find subject dirs
        skip=["sub-*"] + exclude,  # but don't crawl into subject dirs
        dirs_only=True,
        follow_links=True,
    )
    extract = partial(extract_bids_subdir, exclude=exclude, with_meta=with_meta)

    if index_path is None:
        index_path = root / "index.b2t"
    else:
        index_path = Path(index_path).resolve()

    stale = overwrite or incremental or worker_id is not None
    if index_path.exists() and not stale:
        if return_table:
            logger.info("Loading cached index %s", index_path)
            tab = BIDSTable.from_parquet(index_path)
        else:
            logger.info("Found cached index %s; nothing to do", index_path)
            tab = None
        return tab

    if not persistent:
        logger.info("Building index in memory")
        df = build_table(
            source=source,
            extract=extract,
            workers=workers,
            worker_id=worker_id,
        )
        tab = BIDSTable.from_df(df)
        return tab

    logger.info("Building persistent Parquet index")
    build_parquet(
        source=source,
        extract=extract,
        output=index_path,
        incremental=incremental,
        overwrite=overwrite,
        workers=workers,
        worker_id=worker_id,
        path_column="file__file_path",
        mtime_column="file__mod_time",
    )
    tab = BIDSTable.from_parquet(index_path) if return_table else None
    return tab
```


```python
class BIDSTable(pd.DataFrame):
    """
    A table representing one or more BIDS datasets.
    """

    @cached_property
    def ds(self) -> pd.DataFrame:
        """
        The dataset (`ds`) subtable.
        """
        return self.nested["ds"]

    @cached_property
    def ent(self) -> pd.DataFrame:
        """
        The entities (`ent`) subtable.
        """
        return self.nested["ent"]

    @cached_property
    def meta(self) -> pd.DataFrame:
        """
        The metadata (`meta`) subtable.
        """
        return self.nested["meta"]

    @cached_property
    def finfo(self) -> pd.DataFrame:
        """
        The file info (`finfo`) subtable.
        """
        return self.nested["finfo"]

    @cached_property
    def files(self) -> List["BIDSFile"]:
        """
        Convert the table to a list of structured `BIDSFile`s.
        """

        def to_dict(val):
            if pd.isna(val):
                return {}
            return dict(val)

        return [
            BIDSFile(
                dataset=row["ds"]["dataset"],
                root=Path(row["ds"]["dataset_path"]),
                path=Path(row["finfo"]["file_path"]),
                entities=BIDSEntities.from_dict(row["ent"]),
                metadata=to_dict(row["meta"]["json"]),
            )
            for _, row in self.nested.iterrows()
        ]

    @cached_property
    def datatypes(self) -> List[str]:
        """
        Get all datatypes present in the table.
        """
        return self.ent["datatype"].unique().tolist()

    @cached_property
    def modalities(self) -> List[str]:
        """
        Get all modalities present in the table.
        """
        # TODO: Is this the right way to get the modality
        return self.ent["mod"].unique().tolist()

    @cached_property
    def subjects(self) -> List[str]:
        """
        Get all unique subjects in the table.
        """
        return self.ent["sub"].unique().tolist()

    @cached_property
    def entities(self) -> List[str]:
        """
        Get all entity keys with at least one non-NA entry in the table.
        """
        entities = self.ent.dropna(axis=1, how="all").columns.tolist()
        special = set(BIDSEntities.special())
        return [key for key in entities if key not in special]

    @cached_property
    def flat_meta(self) -> pd.DataFrame:
        """
        A table of flattened JSON metadata where each metadata field is converted
        to its own column, with nested levels separated by `'.'`.

        See also:

        - [`pd.json_normalize`](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html):
          more general function in pandas.
        """
        # Need to replace None with empty dict for max_level=0 to work.
        metadata = pd.json_normalize(
            self["meta__json"].map(lambda v: v or {}), max_level=0
        )
        metadata.index = self.index
        return metadata

    @cached_property
    def nested(self) -> pd.DataFrame:
        """
        A copy of the table with column labels organized in a nested
        [`MultiIndex`](https://pandas.pydata.org/docs/user_guide/advanced.html#hierarchical-indexing-multiindex).
        """
        # Cast back to the base class since we no longer have the full BIDS table
        # structure.
        return pd.DataFrame(flat_to_multi_columns(self))

    @cached_property
    def flat(self) -> pd.DataFrame:
        """
        A copy of the table with subtable prefixes e.g. `ds__`, `ent__` removed.
        """
        return self.nested.droplevel(0, axis=1)

    def filter(
        self,
        key: str,
        value: Optional[Any] = None,
        *,
        items: Optional[Iterable[Any]] = None,
        contains: Optional[str] = None,
        regex: Optional[str] = None,
        func: Optional[Callable[[Any], bool]] = None,
    ) -> "BIDSTable":
        """
        Filter the rows of the table.

        Args:
            key: Column to filter. Can be a metadata field, BIDS entity name, or any
                unprefixed column label in the `flat` table.
            value: Keep rows with this exact value.
            items: Keep rows whose value is in `items`.
            contains: Keep rows whose value contains `contains` (string only).
            regex: Keep rows whose value matches `regex` (string only).
            func: Apply an arbitrary function and keep values that evaluate to `True`.

        Returns:
            A filtered BIDS table.

        Example::

            filtered = (
                tab
                .filter("task", "rest")
                .filter("sub", items=["04", "06"])
                .filter("RepetitionTime", 2.0)
            )
        """
        # NOTE: Should be careful about reinventing a new style of query API. There are
        # some obvious things this can't do:
        #   - comparison operators <, >, <=, >=
        #   - negation
        #   - combining filters with 'or' instead of 'and'
        # At the bottom of this rabbit hole are more general query interfaces like those
        # already implemented in pandas, duckdb, polars. The goal should be not to
        # create a new one, but to make the 95% of use cases as easy as possible, and
        # empower users to interact with the underlying table using their more powerful
        # tool of choice if necessary.
        if sum(k is not None for k in [value, items, contains, regex, func]) != 1:
            raise ValueError(
                "Exactly one of value, items, contains, regex, or func must not be None"
            )

        try:
            # JSON metadata field
            # NOTE: Assuming all JSON metadata fields are uppercase.
            if key[:1].isupper():
                col = self.flat_meta[key]
            # Long name entity
            elif key in ENTITY_NAMES_TO_KEYS:
                col = self.ent[ENTITY_NAMES_TO_KEYS[key]]
            # Any other unprefixed column
            else:
                col = self.flat[key]
        except KeyError as exc:
            raise KeyError(
                f"Invalid key {key}; expected a valid BIDS entity or metadata field "
                "present in the dataset"
            ) from exc

        if value is not None:
            mask = col == value
        elif items is not None:
            mask = col.isin(items)
        elif contains is not None:
            mask = col.str.contains(contains)
        elif regex is not None:
            mask = col.str.match(regex)
        else:
            mask = col.apply(func)
        mask = mask.fillna(False).astype(bool)

        return self.loc[mask]

    def filter_multi(self, **filters) -> "BIDSTable":
        """
        Apply multiple filters to the table sequentially.

        Args:
            filters: A mapping of column labels to queries. Each query can either be
                a single value for an exact equality check or a `dict` for a more
                complex query, e.g. `{"items": [1, 2, 3]}`, that's passed through to
                `filter`.

        Returns:
            A filtered BIDS table.

        Example::

            filtered = tab.filter_multi(
                task="rest",
                sub={"items": ["04", "06"]},
                RepetitionTime=2.0,
            )
        """
        tab = self.copy(deep=False)

        for k, query in filters.items():
            if not isinstance(query, dict):
                query = {"value": query}
            tab = tab.filter(k, **query)
        return tab

    def sort_entities(
        self, by: Union[str, List[str]], inplace: bool = False
    ) -> "BIDSTable":
        """
        Sort the values of the table by entities.

        Args:
            by: Label or list of labels. Can be `"dataset"` or a short or long
                entity name.
            inplace: Sort the table in place.

        Returns:
            A sorted BIDS table.
        """
        if isinstance(by, str):
            by = [by]

        # TODO: what about sorting by other columns, e.g. file_path?
        def add_prefix(k: str):
            if k == "dataset":
                k = f"ds__{k}"
            elif k in ENTITY_NAMES_TO_KEYS:
                k = f"ent__{ENTITY_NAMES_TO_KEYS[k]}"
            else:
                k = f"ent__{k}"
            return k

        by = [add_prefix(k) for k in by]
        out = self.sort_values(by, inplace=inplace)
        if inplace:
            return self
        return out

    def with_meta(self, inplace: bool = False) -> "BIDSTable":
        """
        Return a BIDS table complete with JSON sidecar metadata.
        """
        out = self if inplace else self.copy()
        file_paths = out.finfo["file_path"]
        meta_json = file_paths.apply(lambda path: extract_metadata(path)["json"])
        out.loc[:, "meta__json"] = meta_json
        return out

    @classmethod
    def from_df(cls, df: pd.DataFrame) -> "BIDSTable":
        """
        Create a BIDS table from a pandas `DataFrame` generated by `bids2table`.
        """
        return cls(df)

    @classmethod
    def from_parquet(cls, path: Path) -> "BIDSTable":
        """
        Read a BIDS table from a Parquet file or dataset directory generated by
        `bids2table`.
        """
        df = pd.read_parquet(path)
        return cls.from_df(df)

    @property
    def _constructor(self):
        # Make sure that DataFrame slices return a subclass instance.
        # https://pandas.pydata.org/docs/development/extending.html#override-constructor-properties
        return BIDSTable
```
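The non-NA filtering behind the `entities` property can be illustrated with plain pandas (the subtable below is a toy example; the real property also drops a few special keys):

```python
import pandas as pd

# Toy entities subtable: two files where "run" was never set.
ent = pd.DataFrame({
    "sub": ["04", "05"],
    "task": ["rest", None],
    "run": [None, None],
})

# Keep only entity keys with at least one non-NA entry.
present = ent.dropna(axis=1, how="all").columns.tolist()
print(present)  # ['sub', 'task']
```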

A table representing one or more BIDS datasets.

ds: pandas.core.frame.DataFrame

The dataset (ds) subtable.

ent: pandas.core.frame.DataFrame

The entities (ent) subtable.

meta: pandas.core.frame.DataFrame

The metadata (meta) subtable.

finfo: pandas.core.frame.DataFrame

The file info (finfo) subtable.

files: List[bids2table.BIDSFile]

Convert the table to a list of structured BIDSFiles.

datatypes: List[str]

Get all datatypes present in the table.

modalities: List[str]

Get all modalities present in the table.

subjects: List[str]

Get all unique subjects in the table.

entities: List[str]

Get all entity keys with at least one non-NA entry in the table.

flat_meta: pandas.core.frame.DataFrame

A table of flattened JSON metadata where each metadata field is converted to its own column, with nested levels separated by '.'.

See also:

nested: pandas.core.frame.DataFrame

A copy of the table with column labels organized in a nested MultiIndex.

flat: pandas.core.frame.DataFrame

A copy of the table with subtable prefixes e.g. ds__, ent__ removed.

def filter( self, key: str, value: Optional[Any] = None, *, items: Optional[Iterable[Any]] = None, contains: Optional[str] = None, regex: Optional[str] = None, func: Optional[Callable[[Any], bool]] = None) -> bids2table.BIDSTable:
134    def filter(
135        self,
136        key: str,
137        value: Optional[Any] = None,
138        *,
139        items: Optional[Iterable[Any]] = None,
140        contains: Optional[str] = None,
141        regex: Optional[str] = None,
142        func: Optional[Callable[[Any], bool]] = None,
143    ) -> "BIDSTable":
144        """
145        Filter the rows of the table.
146
147        Args:
148            key: Column to filter. Can be a metadata field, BIDS entity name, or any
149                unprefixed column label in the `flat` table.
150            value: Keep rows with this exact value.
151            items: Keep rows whose value is in `items`.
152            contains: Keep rows whose value contains `contains` (string only).
153            regex: Keep rows whose value matches `regex` (string only).
154            func: Apply an arbitrary function and keep values that evaluate to `True`.
155
156        Returns:
157            A filtered BIDS table.
158
159        Example::
160            filtered = (
161                tab
162                .filter("task", "rest")
163                .filter("sub", items=["04", "06"])
164                .filter("RepetitionTime", 2.0)
165            )
166        """
167        # NOTE: Should be careful about reinventing a new style of query API. There are
168        # some obvious things this can't do:
169        #   - comparison operators <, >, <=, >=
170        #   - negation
171        #   - combining filters with 'or' instead of 'and'
172        # At the bottom of this rabbit hole are more general query interfaces like those
173        # already implemented in pandas, duckdb, polars. The goal should be not to
174        # create a new one, but to make the 95% of use cases as easy as possible, and
175        # empower users to interact with the underlying table using their more powerful
176        # tool of choice if necessary.
177        if sum(k is not None for k in [value, items, contains, regex, func]) != 1:
178            raise ValueError(
179                "Exactly one of value, items, contains, regex, or func must not be None"
180            )
181
182        try:
183            # JSON metadata field
184            # NOTE: Assuming all JSON metadata fields are uppercase.
185            if key[:1].isupper():
186                col = self.flat_meta[key]
187            # Long name entity
188            elif key in ENTITY_NAMES_TO_KEYS:
189                col = self.ent[ENTITY_NAMES_TO_KEYS[key]]
190            # Any other unprefixed column
191            else:
192                col = self.flat[key]
193        except KeyError as exc:
194            raise KeyError(
195                f"Invalid key {key}; expected a valid BIDS entity or metadata field "
196                "present in the dataset"
197            ) from exc
198
199        if value is not None:
200            mask = col == value
201        elif items is not None:
202            mask = col.isin(items)
203        elif contains is not None:
204            mask = col.str.contains(contains)
205        elif regex is not None:
206            mask = col.str.match(regex)
207        else:
208            mask = col.apply(func)
209        mask = mask.fillna(False).astype(bool)
210
211        return self.loc[mask]

Filter the rows of the table.

Arguments:
  • key: Column to filter. Can be a metadata field, BIDS entity name, or any unprefixed column label in the flat table.
  • value: Keep rows with this exact value.
  • items: Keep rows whose value is in items.
  • contains: Keep rows whose value contains contains (string only).
  • regex: Keep rows whose value matches regex (string only).
  • func: Apply an arbitrary function and keep values that evaluate to True.
Returns:

A filtered BIDS table.

Example:: filtered = ( tab .filter("task", "rest") .filter("sub", items=["04", "06"]) .filter("RepetitionTime", 2.0) )

def filter_multi(self, **filters) -> bids2table.BIDSTable:
213    def filter_multi(self, **filters) -> "BIDSTable":
214        """
215        Apply multiple filters to the table sequentially.
216
217        Args:
218            filters: A mapping of column labels to queries. Each query can either be
219                a single value for an exact equality check or a `dict` for a more
220                complex query, e.g. `{"items": [1, 2, 3]}`, that's passed through to
221                `filter`.
222
223        Returns:
224            A filtered BIDS table.
225
226        Example::
227            filtered = tab.filter_multi(
228                task="rest"
229                sub={"items": ["04", "06"]},
230                RepetitionTime=2.0,
231            )
232        """
233        tab = self.copy(deep=False)
234
235        for k, query in filters.items():
236            if not isinstance(query, dict):
237                query = {"value": query}
238            tab = tab.filter(k, **query)
239        return tab

Apply multiple filters to the table sequentially.

Arguments:
  • filters: A mapping of column labels to queries. Each query can either be a single value for an exact equality check or a dict for a more complex query, e.g. {"items": [1, 2, 3]}, that's passed through to filter.
Returns:

A filtered BIDS table.

Example:: filtered = tab.filter_multi( task="rest" sub={"items": ["04", "06"]}, RepetitionTime=2.0, )

def sort_entities( self, by: Union[str, List[str]], inplace: bool = False) -> bids2table.BIDSTable:
    def sort_entities(
        self, by: Union[str, List[str]], inplace: bool = False
    ) -> "BIDSTable":
        """
        Sort the values of the table by entities.

        Args:
            by: label or list of labels. Can be `"dataset"` or a short or long entity
                name.
            inplace: sort the table in place

        Returns:
            A sorted BIDS table.
        """
        if isinstance(by, str):
            by = [by]

        # TODO: what about sorting by other columns, e.g. file_path?
        def add_prefix(k: str):
            if k == "dataset":
                k = f"ds__{k}"
            elif k in ENTITY_NAMES_TO_KEYS:
                k = f"ent__{ENTITY_NAMES_TO_KEYS[k]}"
            else:
                k = f"ent__{k}"
            return k

        by = [add_prefix(k) for k in by]
        out = self.sort_values(by, inplace=inplace)
        if inplace:
            return self
        return out

Sort the values of the table by entities.

Arguments:
  • by: label or list of labels. Can be "dataset" or a short or long entity name.
  • inplace: sort the table in place
Returns:

A sorted BIDS table.
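The column-prefixing step above can be sketched in isolation. This is a standalone illustration using a toy subset of the long-name-to-short-key mapping (the real `ENTITY_NAMES_TO_KEYS` covers all BIDS entities); the prefixed columns are then passed to an ordinary pandas `sort_values`.

```python
import pandas as pd

# Illustrative subset of the real long-name -> short-key mapping.
ENTITY_NAMES_TO_KEYS = {"subject": "sub", "session": "ses"}

def add_prefix(k: str) -> str:
    # "dataset" lives in the dataset ("ds__") group; everything else is
    # treated as an entity ("ent__") column, translating long names.
    if k == "dataset":
        return f"ds__{k}"
    return f"ent__{ENTITY_NAMES_TO_KEYS.get(k, k)}"

df = pd.DataFrame({"ent__sub": ["02", "01"], "ent__ses": ["1", "2"]})

# Sorting by the long name "subject" resolves to the "ent__sub" column.
out = df.sort_values([add_prefix(k) for k in ["subject"]])
```

Short names (`"sub"`) and long names (`"subject"`) both resolve to the same prefixed column.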

def with_meta(self, inplace: bool = False) -> bids2table.BIDSTable:
    def with_meta(self, inplace: bool = False) -> "BIDSTable":
        """
        Returns a new BIDS table complete with JSON sidecar metadata.
        """
        out = self if inplace else self.copy()
        file_paths = out.finfo["file_path"]
        meta_json = file_paths.apply(lambda path: extract_metadata(path)["json"])
        out.loc[:, "meta__json"] = meta_json
        return out

Returns a new BIDS table complete with JSON sidecar metadata.

@classmethod
def from_df(cls, df: pandas.core.frame.DataFrame) -> bids2table.BIDSTable:
    @classmethod
    def from_df(cls, df: pd.DataFrame) -> "BIDSTable":
        """
        Create a BIDS table from a pandas `DataFrame` generated by `bids2table`.
        """
        return cls(df)

Create a BIDS table from a pandas DataFrame generated by bids2table.

@classmethod
def from_parquet(cls, path: pathlib.Path) -> bids2table.BIDSTable:
    @classmethod
    def from_parquet(cls, path: Path) -> "BIDSTable":
        """
        Read a BIDS table from a Parquet file or dataset directory generated by
        `bids2table`.
        """
        df = pd.read_parquet(path)
        return cls.from_df(df)

Read a BIDS table from a Parquet file or dataset directory generated by bids2table.

Inherited Members
pandas.core.frame.DataFrame, pandas.core.generic.NDFrame, pandas.core.indexing.IndexingMixin

All standard pandas `DataFrame` attributes and methods (e.g. `loc`, `iloc`, `query`, `merge`, `groupby`, `sort_values`, `to_parquet`) are inherited and available on a `BIDSTable`.
@dataclass
class BIDSFile:
@dataclass
class BIDSFile:
    """
    A structured BIDS file.
    """

    dataset: str
    """Parent BIDS dataset."""
    root: Path
    """Path to parent dataset."""
    path: Path
    """File path."""
    entities: BIDSEntities
    """BIDS entities."""
    metadata: Dict[str, Any] = field(default_factory=dict)
    """BIDS JSON metadata."""

    @property
    def relative_path(self) -> Path:
        """
        The file path relative to the dataset root.
        """
        return self.path.relative_to(self.root)

A structured BIDS file.

BIDSFile( dataset: str, root: pathlib.Path, path: pathlib.Path, entities: bids2table.BIDSEntities, metadata: Dict[str, Any] = <factory>)
dataset: str

Parent BIDS dataset.

root: pathlib.Path

Path to parent dataset.

path: pathlib.Path

File path.

entities: BIDSEntities

BIDS entities.

metadata: Dict[str, Any]

BIDS JSON metadata.

relative_path: pathlib.Path

The file path relative to the dataset root.

class BIDSEntities(bids2table.entities._BIDSEntitiesBase):
BIDSEntities( sub: str, ses: Optional[str] = None, datatype: Optional[str] = None, suffix: Optional[str] = None, ext: Optional[str] = None, extra_entities: Optional[Dict[str, Union[str, int]]] = <factory>, sample: Optional[str] = None, task: Optional[str] = None, tracksys: Optional[str] = None, acq: Optional[str] = None, ce: Optional[str] = None, trc: Optional[str] = None, stain: Optional[str] = None, rec: Optional[str] = None, dir: Optional[str] = None, run: Optional[int] = None, mod: Optional[str] = None, echo: Optional[int] = None, flip: Optional[int] = None, inv: Optional[int] = None, mt: Optional[str] = None, part: Optional[str] = None, proc: Optional[str] = None, hemi: Optional[str] = None, space: Optional[str] = None, split: Optional[int] = None, recording: Optional[str] = None, chunk: Optional[int] = None, seg: Optional[str] = None, res: Optional[str] = None, den: Optional[str] = None, label: Optional[str] = None, desc: Optional[str] = None)
sample: Optional[str] = None
task: Optional[str] = None
tracksys: Optional[str] = None
acq: Optional[str] = None
ce: Optional[str] = None
trc: Optional[str] = None
stain: Optional[str] = None
rec: Optional[str] = None
dir: Optional[str] = None
run: Optional[int] = None
mod: Optional[str] = None
echo: Optional[int] = None
flip: Optional[int] = None
inv: Optional[int] = None
mt: Optional[str] = None
part: Optional[str] = None
proc: Optional[str] = None
hemi: Optional[str] = None
space: Optional[str] = None
split: Optional[int] = None
recording: Optional[str] = None
chunk: Optional[int] = None
seg: Optional[str] = None
res: Optional[str] = None
den: Optional[str] = None
label: Optional[str] = None
desc: Optional[str] = None
Inherited Members
bids2table.entities._BIDSEntitiesBase
sub
ses
datatype
suffix
ext
extra_entities
special
from_dict
from_path
to_dict
to_path
with_update
@lru_cache()
def parse_bids_entities(path: Union[str, pathlib.Path]) -> Dict[str, str]:
@lru_cache()
def parse_bids_entities(path: StrOrPath) -> Dict[str, str]:
    """
    Parse all BIDS filename ``"{key}-{value}"`` entities as well as special entities:

        - datatype
        - suffix
        - ext (extension)

    from the file path.

    .. note:: This function does not validate entities.
    """
    path = Path(path)
    entities = {}

    # datatype
    match = re.search(
        f"/({'|'.join(BIDS_DATATYPES)})/",
        path.as_posix(),
    )
    datatype = match.group(1) if match is not None else None

    filename = path.name
    parts = filename.split("_")

    # suffix and extension
    suffix_ext = parts.pop()
    idx = suffix_ext.find(".")
    if idx < 0:
        suffix, ext = suffix_ext, None
    else:
        suffix, ext = suffix_ext[:idx], suffix_ext[idx:]

    # suffix is actually an entity, put back in list
    if "-" in suffix:
        parts.append(suffix)
        suffix = None

    # parse entities
    for part in parts:
        if "-" in part:
            key, val = part.split("-", maxsplit=1)
        else:
            key, val = part, ""
        entities[key] = val

    for k, v in zip(["datatype", "suffix", "ext"], [datatype, suffix, ext]):
        if v is not None:
            entities[k] = v
    return entities

Parse all BIDS filename "{key}-{value}" entities as well as special entities:

- datatype
- suffix
- ext (extension)

from the file path.

This function does not validate entities.
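To make the parsing behavior concrete, here is a self-contained sketch that mirrors the logic above on a typical path. `BIDS_DATATYPES` is an illustrative subset here, and `parse_entities` is a hypothetical standalone name, not the library function itself.

```python
import re
from pathlib import Path

# Illustrative subset of the real BIDS datatype directory names.
BIDS_DATATYPES = ("anat", "func", "dwi", "fmap")

def parse_entities(path):
    path = Path(path)
    entities = {}

    # datatype comes from a parent directory name, e.g. ".../anat/..."
    match = re.search(f"/({'|'.join(BIDS_DATATYPES)})/", path.as_posix())
    datatype = match.group(1) if match is not None else None

    parts = path.name.split("_")

    # the last underscore-separated part holds the suffix and extension
    suffix_ext = parts.pop()
    idx = suffix_ext.find(".")
    if idx < 0:
        suffix, ext = suffix_ext, None
    else:
        suffix, ext = suffix_ext[:idx], suffix_ext[idx:]
    if "-" in suffix:  # actually a trailing "{key}-{value}" entity, not a suffix
        parts.append(suffix)
        suffix = None

    # parse "{key}-{value}" entities; a bare key gets an empty value
    for part in parts:
        key, _, val = part.partition("-")
        entities[key] = val

    for k, v in (("datatype", datatype), ("suffix", suffix), ("ext", ext)):
        if v is not None:
            entities[k] = v
    return entities
```

For `"sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz"` this yields `sub`, `ses`, `datatype`, `suffix`, and `ext` entries. Note the extension is everything from the first dot, so multi-part extensions like `.nii.gz` stay intact.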
def join_bids_path( row: Union[pandas.core.series.Series, Dict[str, Any]], prefix: Union[str, pathlib.Path, NoneType] = None, valid_only: bool = True) -> pathlib.Path:
def join_bids_path(
    row: Union[pd.Series, Dict[str, Any]],
    prefix: Optional[Union[str, Path]] = None,
    valid_only: bool = True,
) -> Path:
    """
    Reconstruct a BIDS path from a table row or entities dict.

    Args:
        row: row from a `BIDSTable` or `BIDSTable.ent` subtable.
        prefix: output file prefix path.
        valid_only: only include valid BIDS entities.

    Example::

        tab = BIDSTable.from_parquet("dataset/index.b2t")
        paths = tab.apply(join_bids_path, axis=1)
    """
    # Filter in case input is a row from the raw dataframe and not the entities group.
    row = _filter_row(row, group="ent")
    entities = BIDSEntities.from_dict(row, valid_only=valid_only)
    path = entities.to_path(prefix=prefix, valid_only=valid_only)
    return path

Reconstruct a BIDS path from a table row or entities dict.

Arguments:
  • row: row from a BIDSTable or BIDSTable.ent subtable.
  • prefix: output file prefix path.
  • valid_only: only include valid BIDS entities.

Example::

tab = BIDSTable.from_parquet("dataset/index.b2t")
paths = tab.apply(join_bids_path, axis=1)
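The reconstruction is the inverse of entity parsing: entities are joined back into a `{key}-{value}` filename inside `sub-*/ses-*/datatype` directories. A simplified standalone sketch (hypothetical `join_path` helper; the real `BIDSEntities.to_path` additionally enforces the full BIDS entity ordering and validation):

```python
from pathlib import Path

def join_path(entities, prefix=None):
    # datatype, suffix, and ext are "special" and don't appear as
    # "{key}-{value}" pairs in the filename stem.
    special = {"datatype", "suffix", "ext"}
    stem = "_".join(f"{k}-{v}" for k, v in entities.items() if k not in special)
    name = f"{stem}_{entities['suffix']}{entities.get('ext', '')}"

    # directory layout: sub-*/[ses-*/]datatype/filename
    parts = [f"sub-{entities['sub']}"]
    if "ses" in entities:
        parts.append(f"ses-{entities['ses']}")
    if "datatype" in entities:
        parts.append(entities["datatype"])
    parts.append(name)

    path = Path(*parts)
    return Path(prefix) / path if prefix is not None else path
```

This round-trips with the parsing example: the entities parsed from a path rebuild the same path.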