RFC: keeping track of subject ID

I hit a tricky problem recently when trying to monkey-patch the funloc data to emulate session-specific MRIs. I don't think there's actually any bug anywhere, but things are odd enough that I thought it was worth documenting what I encountered, and seeing what others think about it. Here's what I noticed:

1. when reading raw data, MNE-BIDS overwrites `raw.info["subject_info"]["his_id"]` with the value from the `participant_id` column of `participants.tsv`. This is not unreasonable, but surprised me, especially given that (I think?) it doesn't warn in cases where they don't match.
2. from the initial `raw`, `info["subject_info"]["his_id"]` gets propagated through `proc-sss`, `proc-filt`, `proc-clean_epo`, and `_ave`, as expected.
3. when we get to source space, the BIDS subject gets passed to `get_fs_subject()`, which populates `config.fs_subject` (and might be, e.g., a template MRI, AKA a different subject name/ID). That is then passed to `mne.setup_source_space` and ends up in `surf["subject_his_id"]` for each `surf` in the source space (the first of which is also aliased at `source_space._subject`). `source_space._subject` then gets propagated to `forward["src"]._subject`, from there to `inverse_operator["src"]._subject`, and then to `stc.subject`.
4. When we go to apply the inverse to an evoked, the evoked's subject ID (`info["subject_info"]["his_id"]`) comes ultimately from `participants.tsv`, while the inverse operator's subject ID (`inverse_operator["src"]._subject`) comes from `config.fs_subject` (which might be different), and `apply_inverse` will complain if they don't match.
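To make the mismatch in step 4 concrete, here's a rough sketch of comparing the two identifiers by hand. `his_id_from_info` and `subject_ids_match` are made-up helpers, not MNE or pipeline API. They only assume the key/attribute layout described above (`info["subject_info"]["his_id"]` on the sensor side, `src._subject` on the source-space side), and the example uses plain stand-in objects rather than real MNE objects so it runs without any data.

```python
from types import SimpleNamespace


def his_id_from_info(info):
    """Return info["subject_info"]["his_id"], or None if it is not set."""
    subject_info = info.get("subject_info") or {}
    return subject_info.get("his_id")


def subject_ids_match(info, src):
    """Compare the sensor-side subject ID with the source-space one.

    Hypothetical helper: `info` is dict-like, `src` has a `_subject` attribute.
    """
    return his_id_from_info(info) == getattr(src, "_subject", None)


# Stand-ins for evoked.info and inverse_operator["src"] (not real MNE objects).
evoked_info = {"subject_info": {"his_id": "sub-01"}}
template_src = SimpleNamespace(_subject="fsaverage")
own_src = SimpleNamespace(_subject="sub-01")

print(subject_ids_match(evoked_info, template_src))  # False
print(subject_ids_match(evoked_info, own_src))  # True
```

With real data, the same comparison would presumably be made on `evoked.info` and `inverse_operator["src"]`.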

I'm pretty sure what went wrong in my case was that the source space object already existed in the dataset (so step (3) above didn't happen), and the existing source space had the wrong subject identifier, which didn't cause problems until `apply_inverse`. I wonder if it's worth:

1. adding a warning to MNE-BIDS when overwriting the `his_id` if it doesn't match what's already in there (not a fix for my "problem", but maybe worth doing anyway)
2. adding a check in the pipeline for whether the subject ID in a *found, loaded* source space matches the subject ID that the pipeline would have assigned if it were creating the source space itself.  It would have saved a lot of debugging time not to have to wait until `apply_inverse` to learn there was a problem.
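Roughly what I have in mind for the two suggestions above, as a sketch only. Both function names are invented for illustration and neither exists in MNE-BIDS or the pipeline; where exactly they would be called from is an open question.

```python
import warnings


def warn_on_his_id_overwrite(existing_his_id, participant_id):
    """Warn if a non-empty his_id is about to be replaced by a different value."""
    if existing_his_id is not None and existing_his_id != participant_id:
        warnings.warn(
            f"Overwriting his_id {existing_his_id!r} with {participant_id!r} "
            "from participants.tsv",
            RuntimeWarning,
            stacklevel=2,
        )


def check_loaded_src_subject(src_subject, expected_fs_subject):
    """Fail early if a source space found on disk has an unexpected subject."""
    if src_subject != expected_fs_subject:
        raise ValueError(
            f"Loaded source space has subject {src_subject!r}, but the "
            f"pipeline would have used {expected_fs_subject!r}"
        )
```

The second check would run right after loading an existing source space, e.g. `check_loaded_src_subject(src._subject, config.fs_subject)`, so the mismatch surfaces there instead of at `apply_inverse`.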

(Aside): Probably I'm wrong about this, but it seems the current situation would actually not work for cases where, e.g., the freesurfer subjects directory is maintained separately from the experimental data, and has different conventions for subject identifiers (e.g., FS subject `0027` is in one experiment `sub-01` and in another experiment is `sub-05`). My postdoc lab did something like that: all experiments used the same freesurfer `SUBJECTS_DIR`, in order to re-use structural MRIs for folks who participated in multiple experiments. Of course it's easy enough to work around it by making a copy of the subset of freesurfer subjects that you need for the current experiment, and changing folder and file names... but it seems like that shouldn't be necessary.
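For the shared-`SUBJECTS_DIR` case, one option would be an explicit per-experiment mapping from BIDS subject to FreeSurfer subject. The sketch below is hypothetical: `resolve_fs_subject` and `fs_subject_map` are invented names, and the `sub-<id>` fallback is an assumption on my part rather than a statement of what `get_fs_subject()` currently does.

```python
def resolve_fs_subject(bids_subject, fs_subject_map, default=None):
    """Look up the FreeSurfer subject for a BIDS subject (hypothetical helper).

    Falls back to `default` if given, else to "sub-<bids_subject>".
    """
    if bids_subject in fs_subject_map:
        return fs_subject_map[bids_subject]
    return default if default is not None else f"sub-{bids_subject}"


# Hypothetical per-experiment map: BIDS subject "01" is FS subject "0027".
fs_subject_map = {"01": "0027"}
print(resolve_fs_subject("01", fs_subject_map))  # 0027
print(resolve_fs_subject("02", fs_subject_map))  # sub-02
```

A second experiment sharing the same `SUBJECTS_DIR` would just carry its own map (e.g. `{"05": "0027"}`), with no copying or renaming of FreeSurfer folders.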

cc @hoechenberger @sappelhoff @larsoner for ideas/opinions.
