Compare commits

...

22 Commits

SHA1 Message Date
be4489dc73 Update "README.md" 2021-03-29 16:45:25 +02:00
b621dbf9b0 Update "README.md" 2021-03-29 16:35:19 +02:00
f554a5822f Merge branch 'master' of git.ffhartmann.de:Julius/semesterproject_lecture_eeg 2021-03-29 16:32:42 +02:00
dd2df8f1a8 Added the report to the repo 2021-03-29 16:32:28 +02:00
fc8cde3a88 Update "decoding_tf_analysis.py" 2021-03-29 16:28:21 +02:00
5f5e7ffcac Fixed merge conflict 2021-03-29 16:27:41 +02:00
d09480c5bb Added the plotting option for log-scaling of the tf spectrum 2021-03-29 16:25:38 +02:00
6161e088c6 Update "README.md" 2021-03-29 14:26:05 +02:00
02f604cdeb Update "README.md" 2021-03-29 12:43:11 +02:00
0d513bfb96 Update "README.md" 2021-03-29 12:42:56 +02:00
6c1a555179 Update "README.md" 2021-03-29 12:42:41 +02:00
a5b97a3a65 Update "decoding_tf_analysis.py" 2021-03-28 20:14:00 +02:00
7f616a4a34 Update "README.md" 2021-03-28 17:36:46 +02:00
106a3ad434 Update "README.md" 2021-03-28 17:35:40 +02:00
26dd78a410 Update "README.md" 2021-03-28 17:34:48 +02:00
cd20b9b776 Update "README.md" 2021-03-28 15:47:41 +02:00
c6a6fd7012 Merge branch 'master' of git.ffhartmann.de:Julius/semesterproject_lecture_eeg 2021-03-28 15:45:23 +02:00
cfa950b85b Edited some comments 2021-03-28 15:45:14 +02:00
86044cc7c7 Delete "Dataset/preprocessed/sub-014_task-N170_cleaned.fif" 2021-03-28 13:40:50 +02:00
4875cf29cb Delete "Dataset/preprocessed/sub-003_task-N170_cleaned.fif" 2021-03-28 13:40:42 +02:00
c44cb7cb13 Delete "Dataset/preprocessed/sub-001_task-N170_cleaned.fif" 2021-03-28 13:40:34 +02:00
6f8c1af6c6 Update "README.md" 2021-03-28 04:54:08 +02:00
10 changed files with 108 additions and 41 deletions

README.md

@@ -1,21 +1,36 @@
-## Semesterproject of the lecture "Semesterproject Signal processing and Analysis
-of human brain potentials (eeg) WS 2020/21
-This repository holds the code of the semesterproject as well as the report.
-The main files are 'preprocessing_and_cleaning.py', 'erp_analysis.py' and 'decoding_tf_analyis.py'.
-The files hold:
-- preprocessing_and_cleaning.py : Holds the pre-processing pipeline of the project. By executing the file all subjects are pre-processed. Subjects 001, 003, 014 are pre-processed with manually selected pre-processing information, all other subjects are pre-processed with the given pre-processing information. Pre-processed cleaned data is saved in the BIDS file structure as 'sub-XXX_task-N170_cleaned.fif' where XXX is the subject number.
-Details can be found in the comments of the code.
-- erp_analysis.py : Holds the code for the erp-analysis. Computes the peak-differences and t-tests for several experimental contrasts. Details can be found in the comments of the code.
-- decoding_tf_analysis.py : Holds the code for the decoding and time-frequency analysis. Details can be found in the comments of the code.
-The folder 'utils' holds helper functions for some plots needed for the analysis and to load data, generate strings etc. and holds the code given in the lecture.
-The folder 'test' holds mostly unittests that test helper functions and one function which visually checks if N170 peaks are extracted correctly.
+## Semesterproject of the lecture "Semesterproject Signal processing and Analysis of human brain potentials (eeg)" WS 2020/21
+This repository holds the code of the semesterproject as well as the report, created by Julius Voggesberger.
+The N170 dataset was chosen as the dataset for the project.
+Subjects 001, 003 and 014 were chosen as the three subjects to be manually pre-processed.
+The rest of the subjects were pre-processed with the provided pre-processing information.
+
+### Structure
+```
+├── Dataset: The dataset of the project as well as the manually selected bad segments are stored here.
+|   ├── n170: Store the dataset here.
+|   └── preprocessed: Bad segments are stored here.
+├── cached_data: Data that is generated in the analysis part is stored here.
+|   ├── decoding_data: Results of the classifiers.
+|   ├── erp_peaks: ERP peaks needed for the ERP analysis.
+|   └── tf_data: Time-frequency data needed for the tf-analysis.
+├── test: Contains unittests and one visual check.
+├── utils: Contains helper methods.
+|   ├── ccs_eeg_semesterproject: Methods given in the lecture.
+|   ├── ccs_eeg_utils_reduced: Method for reading in BIDS, provided in the lecture.
+|   ├── file_utils.py: Methods for reading in files and getting epochs.
+|   └── plot_utils.py: Methods for manually created plots.
+├── preprocessing_and_cleaning.py: The preprocessing pipeline.
+├── erp_analysis.py: The ERP analysis and computation of ERP peaks.
+├── decoding_tf_analysis.py: Decoding and time-frequency analysis.
+└── semesterproject_report_voggesberger: The report of the project.
+```
+
-For the code to work properly, the N170 dataset needs to be provided.
-When first running the analysis, it may take a while. After running it one time the data is cached, so that it can be reused if the analysis should be executed again. Be careful though, as a parameter has to be explicitly set in the code, so that the already computed data is used. This parameter is a boolean given to each analysis function which caches data.
-This code was created using Python 3.7 and the following libraries:
+### Running the project
+To run the project, Python 3.7 is required and Anaconda is recommended.\
+To ensure reproducibility, random states were used for non-deterministic methods.
+The random states used are either '123' or '1234'.\
+The following libraries are needed:
 - Matplotlib 3.3.3
 - MNE 0.22.0
 - MNE-Bids 0.6
@@ -23,3 +38,18 @@ This code was created using Python 3.7 and the following libraries:
 - Scikit-Learn 0.23.2
 - Pandas 1.2.0
 - Scipy 1.5.4
+For the code to work, the N170 dataset needs to be provided and put into the folder 'Dataset/n170/', so that the file structure 'Dataset/n170/sub-001', etc. exists.
+The pre-processed raw objects are saved in their respective subject folders in 'Dataset/n170/'.
+When first running the analysis, it may take a while.
+After one run the data is cached, so that it can be reused if the analysis is executed again later.
+For the cached data to be used, a boolean parameter has to be set in the respective analysis method.
+If PyCharm is used as the IDE, it may be necessary to mark the parent directory 'semesterproject_lecture_eeg' as 'Sources Root'.
+
+### Parameters
+Parameters have to be changed manually in the code if different settings should be tried.
+
+### Visualisation
+The manually created visualisation methods used to generate the figures in the report are contained in the code.
+Visualisations created with built-in MNE plotting methods may or may not appear in the code.
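The cache flag mentioned under "Running the project" is the trailing boolean visible in the `__main__` block of decoding_tf_analysis.py further down. A minimal sketch of toggling it, assuming the analysis functions are imported from that module:

```
from decoding_tf_analysis import decoding, time_frequency

ds = 'N170'
# True recomputes everything; False reuses the cached results of a previous run.
decoding(ds, 'faces_vs_cars', False)
time_frequency(ds, 'face_intact_vs_all_0.1_50hz_ncf2', 'log', False)
```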

decoding_tf_analysis.py

@@ -46,6 +46,7 @@ def events_to_labels(evts, events_dict, mask=None):  # TODO write a test
 def permutation_test(baseline, score, n_iter):
     """
     An implementation of a permutation test for classification scores.
+
     :param baseline: The classification scores of the baseline, i.e. selection by chance
     :param score: The classification scores which are tested for significance
     :param n_iter: number of permutations
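The hunk above only touches the docstring; the implementation itself is not shown. For orientation, a minimal sketch of one common way to write such a permutation test (an assumption, not the repository's exact code):

```
import numpy as np

def permutation_test_sketch(baseline, score, n_iter, seed=123):
    # Sketch: p-value for the mean of `score` exceeding the baseline,
    # estimated by shuffling the pooled samples. The repository's actual
    # test statistic may differ; seed 123 matches the random states
    # named in the README.
    rng = np.random.RandomState(seed)
    observed = np.mean(score) - np.mean(baseline)
    pooled = np.concatenate([baseline, score])
    n_score = len(score)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = np.mean(pooled[:n_score]) - np.mean(pooled[n_score:])
        if diff >= observed:
            count += 1
    return count / n_iter
```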
@@ -120,6 +121,8 @@ def decoding(dataset, filename, compute_metric=True, mask=None):
     # Compute index of time point 0
     index = math.floor((len(metric[0]) / time_scale) * 100)
     baseline = np.array(metric[:index]).flatten()
+
+    # Plot the result
     plt.plot(np.linspace(-200, 1000, 1127), np.mean(metric, axis=0))
     plt.ylabel('Accuracy (%)')
     plt.xlabel('Time (ms)')
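`index` is meant to be the position of time point 0 in each accuracy curve, so that everything before it can serve as the chance-level baseline. An independent way to compute that position from the time axis used in the plot above (a sketch; `time_scale` itself is not shown in this diff):

```
import numpy as np

times = np.linspace(-200, 1000, 1127)   # the time axis from the plot above
index = int(np.searchsorted(times, 0))  # first sample at or after 0 ms
print(index)                            # 188
```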
@@ -129,11 +132,12 @@ def decoding(dataset, filename, compute_metric=True, mask=None):
     # Compute the permutation tests
     for t in range(len(metric[0][index:])):
         score_t = np.asarray(metric[:, t + index])
-        p = permutation_test(baseline, score_t, 100)
+        p = permutation_test(baseline, score_t, 1000)
         p_values.append(p)
         if t % 50 == 0:
             print(str(t) + " Out of " + str(len(metric[0][index:])))
+    # Plot the result
     plt.plot(times[index:], p_values)
     plt.ylabel('P-Value')
     plt.xlabel('Time (ms)')
@@ -143,14 +147,17 @@ def decoding(dataset, filename, compute_metric=True, mask=None):
 def create_tfr(raw, condition, freqs, n_cycles, response='induced', baseline=None, plot=False):
     """
-    Compute the time frequency representation (TFR) of data for a given condition via morlet wavelets
+    Compute the time frequency representation (TFR) of data for a given condition via Morlet wavelets
     :param raw: the data
-    :param condition: the condition for which to compute the TFR. Given as a list of tuples of the form (stimulus, condition) # TODO ambiguous use of condition
+    :param condition: the condition for which to compute the TFR. Given as a list of tuples of the form (stimulus, texture)
     :param freqs: the frequencies for which to compute the TFR
-    :param n_cycles: the number of cycles used by the morlet wavelets
+    :param n_cycles: the number of cycles used by the Morlet wavelets
-    :param response: type of expected TFR. Can be total, induced or evoked. Default is induced
+    :param response: type of expected TFR. Can be total, induced or evoked. Default is induced,
+        the others were not used for the report, only for exploration
     :param baseline: baseline used to correct the power. A tuple of the form (start, end).
-        Default is None and no baseline correction will be applid
+        Default is None and no baseline correction will be applied
+    :param plot: True if results should be plotted, else false.
     :return: The TFR or the given data for a given condition. Has type AverageTFR
     """
     epochs, _ = get_epochs(raw, condition, tmin=-0.2, tmax=1)
@@ -168,6 +175,7 @@ def create_tfr(raw, condition, freqs, n_cycles, response='induced', baseline=Non
         power_induced = tfr_morlet(epochs.subtract_evoked(), freqs=freqs, n_cycles=n_cycles, return_itc=False, n_jobs=4)
         power = mne.combine_evoked([power_total, power_induced], weights=[1, -1])
         if plot: power.plot(picks='P7')
+        # Apply a baseline correction to the power data
         power.apply_baseline(mode='ratio', baseline=baseline)
         if plot:
             plot_oscillation_bands(power)
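This branch combines the total power with the induced power (the TFR of the evoked-subtracted epochs) at weights [1, -1], which isolates the evoked, phase-locked part of the signal. A sketch of the same decomposition, where `epochs` stands in for the output of the repository's `get_epochs` helper:

```
import numpy as np
import mne
from mne.time_frequency import tfr_morlet

# `epochs` stands in for the output of the repository's get_epochs helper.
freqs = np.logspace(*np.log10([0.1, 50]), num=50)  # frequencies of interest
n_cycles = freqs / 2                               # cycles per wavelet

# Total power: TFR of the raw single-trial epochs.
power_total = tfr_morlet(epochs, freqs=freqs, n_cycles=n_cycles, return_itc=False, n_jobs=4)
# Induced power: TFR after removing the average evoked response from each trial.
power_induced = tfr_morlet(epochs.subtract_evoked(), freqs=freqs, n_cycles=n_cycles, return_itc=False, n_jobs=4)
# Evoked (phase-locked) power: total minus induced, as in the diff above.
power_evoked = mne.combine_evoked([power_total, power_induced], weights=[1, -1])
power_evoked.apply_baseline(mode='ratio', baseline=(-0.2, 0))
```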
@@ -175,7 +183,7 @@ def create_tfr(raw, condition, freqs, n_cycles, response='induced', baseline=Non
     return power

-def time_frequency(dataset, filename, compute_tfr=True):
+def time_frequency(dataset, filename, scaling='lin', compute_tfr=True):
     """
     Runs time frequency analysis
@@ -183,15 +191,14 @@ def time_frequency(dataset, filename, compute_tfr=True):
     :param filename: Filename of either the file from which the TFRs will be loaded
         or to which they will be saved
     :param compute_tfr: If True the TFRs will be created, else the TFRs will be loaded from a precomputed file
+    :param scaling: default 'lin' for linear scaling, else can be 'log' for logarithmic scaling
     """
     # Parameters
-    # Frequency space (from, to, steps) -> Control frequency resolution : Between num=50-80 good for 1-50Hz
-    # freqs = np.linspace(0.1, 50, num=50)  # Use this for linear space scaling
-    freqs = np.logspace(*np.log10([0.1, 50]), num=50)
-    # Number of cycles -> Controls time resolution ? At ~freqs/2 good for high frequency resolution
-    n_cycles = freqs / 2  # 1 for high time resolution & freq smoothing, freqs/2 for high freq resolution & time smooth
-    # Baseline -> Should not go post-stimulus, i.e. > 0 -> Best ist pre-stimulus (e.g. -400 to -200ms)
-    baseline = [-0.5, 0]
+    if scaling == 'lin':
+        freqs = np.linspace(0.1, 50, num=50)  # Use this for linear space scaling
+    else:
+        freqs = np.logspace(*np.log10([0.1, 50]), num=50)
+    n_cycles = freqs / 2
     cond1 = []
     cond2 = []
     times = None
@@ -209,6 +216,8 @@ def time_frequency(dataset, filename, compute_tfr=True):
         raw.set_montage('standard_1020', match_case=False)

         # Create the two conditions we want to compare
+        # IMPORTANT: If different conditions should be compared you have to change them here, by altering the second
+        # argument passed to create_tfr
         power_cond1 = create_tfr(raw, [('face', 'intact')], freqs, n_cycles, 'induced', (-0.2, 0))
         print(' CONDITION 1 LOADED')
         cond1.append(power_cond1)
@@ -219,25 +228,32 @@ def time_frequency(dataset, filename, compute_tfr=True):
             cond2.append(power_cond2)
             print(' DONE')
+        # Save the data so we can access the results more easily
         np.save('cached_data/tf_data/' + filename + '_cond1', cond1)
         np.save('cached_data/tf_data/' + filename + '_cond2', cond2)
     else:
+        # If the data should not be recomputed, load the given filename
         cond1 = np.load('cached_data/tf_data/' + filename + '_cond1.npy', allow_pickle=True).tolist()
         cond2 = np.load('cached_data/tf_data/' + filename + '_cond2.npy', allow_pickle=True).tolist()
     if times is None:
         times = cond1[0].times

+    # Some plots
     mne.grand_average(cond1).plot(picks=['P7'], vmin=-3, vmax=3, title='Grand Average P7')
     mne.grand_average(cond2).plot(picks=['P7'], vmin=-3, vmax=3, title='Grand Average P7')
     plot_oscillation_bands(mne.grand_average(cond1))
     plot_oscillation_bands(mne.grand_average(cond2))

+    # Compute the cluster permutation
     F, clusters, cluster_p_values, h0 = mne.stats.permutation_cluster_test(
         [mne.grand_average(cond1).data, mne.grand_average(cond2).data], n_jobs=4, verbose='INFO',
         seed=123)
-    plot_tf_cluster(F, clusters, cluster_p_values, freqs, times)
+    plot_tf_cluster(F, clusters, cluster_p_values, freqs, times, scaling)


 if __name__ == '__main__':
     mne.set_log_level(verbose=VERBOSE_LEVEL)
     ds = 'N170'
     decoding(ds, 'faces_vs_cars', True)
-    time_frequency(ds, 'face_intact_vs_all_0.1_50hz_ncf2', True)
+    time_frequency(ds, 'face_intact_vs_all_0.1_50hz_ncf2', 'log', True)

erp_analysis.py

@@ -93,16 +93,18 @@ def create_peak_difference_feature(df, max_subj=40):
 def analyze_erp(channels, precompute=True):
     """
-    Execute several statistical tests for different hypothesis, to analyze ERPs
+    Execute several statistical tests for different hypothesis, to analyse ERPs
     :param channels: The channels for which the tests are executed
     :param precompute: If true, the peak-difference data will be computed. Else it will be loaded from a precomputed file,
         if it exists. This should only be set 'False' if the method was already executed once!
     """
     if precompute:
+        # Precompute the erp peaks
         precompute_erp_df('N170')
     for c in channels:
         print("CHANNEL: " + c)
+        # Load the erp peak data and create the features for the t-tests
         erp_df = pd.read_csv('cached_data/erp_peaks/erp_peaks_' + c + '.csv', index_col=0)
         feature_df = create_peak_difference_feature(erp_df)

         # 1. H_a : There is a difference between the N170 peak of recognizing faces and cars
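The hypotheses listed in comments like the one above are tested on the peak-difference features. A minimal sketch of what such a test could look like with SciPy (the values and the one-sample t-test against zero are illustrative assumptions, not the repository's exact procedure):

```
import numpy as np
from scipy import stats

# Hypothetical per-subject N170 peak differences (faces minus cars), in microvolts.
diffs = np.array([-2.1, -1.4, -3.0, -0.8, -1.9])
# H_a: the mean difference is not zero.
t_stat, p_value = stats.ttest_1samp(diffs, popmean=0)
print('t = %.2f, p = %.4f' % (t_stat, p_value))
```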

preprocessing_and_cleaning.py

@@ -52,7 +52,7 @@ def filter_data(raw):
     """
     Filter the data of a single subject with a bandpass filter.
     The lower bound is 0.5Hz to compensate the slow drifts.
-    The upper bound is 50Hz to compensate the high frequencies, including the power line spike at 60Hz
+    The upper bound is 48Hz to compensate the high frequencies, including the power line spike at 60Hz
     :param raw: The data to be filtered
     :return: The filtered data
     """

Binary file not shown.

utils/file_utils.py

@@ -18,7 +18,7 @@ def load_bad_annotations(filepath, fileending="badSegments.csv"):
 def load_preprocessed_data(subject, dataset):
     """
-    Load the raw object as well as the annotations of the preprocessed file
+    Load the raw object of the preprocessed file
     :param subject: The subject, for which we want to load the raw object
     :param dataset: The currently viewed dataset
     :param selected_subjects: The manually preprocessed subjects
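A sketch of what this loader plausibly does, given the 'sub-XXX_task-N170_cleaned.fif' naming and the 'Dataset/n170/' layout described in the README (the path construction here is an assumption, not the repository's exact code):

```
import mne

def load_preprocessed_data(subject, dataset):
    # Assumed path layout per the README: cleaned raws live in their
    # subject folders under 'Dataset/n170/'.
    path = 'Dataset/n170/sub-' + subject + '/sub-' + subject + '_task-' + dataset + '_cleaned.fif'
    return mne.io.read_raw_fif(path, preload=True)
```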

utils/plot_utils.py

@@ -56,15 +56,17 @@ def plot_grand_average(dataset):
                      linestyles=['solid', 'solid', 'dotted', 'dotted'])

-def plot_tf_cluster(F, clusters, cluster_p_values, freqs, times):
+def plot_tf_cluster(F, clusters, cluster_p_values, freqs, times, scaling='lin'):
     """
-    Plot teh F-Statistic values of permutation clusters with p-values <= 0.05 in color and > 0.05 in grey.
+    Plot the F-Statistic values of permutation clusters with p-values <= 0.05 in color and > 0.05 in grey.
+    Currently only works well for the linear scaling. For the logarithmic scaling a different x-axis has to be chosen
     :param F: F-Statistics of the permutation clusters
     :param clusters: all permutation clusters
     :param cluster_p_values: p-values of the clusters
     :param freqs: frequency domain
     :param times: time domain
+    :param scaling: default 'lin' for linear scaling, else can be 'log' for logarithmic scaling
     """
     good_c = np.nan * np.ones_like(F)
     for clu, p_val in zip(clusters, cluster_p_values):
@@ -74,16 +76,33 @@ def plot_tf_cluster(F, clusters, cluster_p_values, freqs, times):
     bbox = [times[0], times[-1], freqs[0], freqs[-1]]
     plt.imshow(F, aspect='auto', origin='lower', cmap=cm.gray, extent=bbox, interpolation='None')
     a = plt.imshow(good_c, cmap=cm.RdBu_r, aspect='auto', origin='lower', extent=bbox, interpolation='None')
+    if scaling == 'log':
+        ticks = [1, 4, 8, 12, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
+        labels = [round(freqs[i], 2) for i in range(len(freqs)) if i + 1 in ticks]
+        plt.yticks(ticks, labels)
     plt.colorbar(a)
     plt.xlabel('Time (s)')
     plt.ylabel('Frequency (Hz)')
     plt.show()


 def plot_oscillation_bands(condition):
+    """
+    Plot the oscillation bands for a given condition in the time from 130ms to 200ms
+    :param condition: the condition to plot the oscillation bands for
+    """
     fig, axis = plt.subplots(1, 5, figsize=(25, 5))
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=0, fmax=4, title='Delta', axes=axis[0], show=False, vmin=0, vmax=1.5, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=4, fmax=8, title='Theta', axes=axis[1], show=False, vmin=0, vmax=0.7, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=8, fmax=12, title='Alpha', axes=axis[2], show=False, vmin=-0.15, vmax=0.2, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=13, fmax=30, title='Beta', axes=axis[3], show=False, vmin=-0.18, vmax=0.2, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=30, fmax=45, title='Gamma', axes=axis[4], vmin=0, vmax=0.2, tmin=0, tmax=1)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=0, fmax=4, title='Delta', axes=axis[0], show=False, vmin=0,
+                           vmax=1.5, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=4, fmax=8, title='Theta', axes=axis[1], show=False, vmin=0,
+                           vmax=0.7, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=8, fmax=12, title='Alpha', axes=axis[2], show=False, vmin=-0.25,
+                           vmax=0.2, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=13, fmax=30, title='Beta', axes=axis[3], show=False, vmin=-0.21,
+                           vmax=0.2, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=30, fmax=45, title='Gamma', axes=axis[4], vmin=-0.05, vmax=0.2,
+                           tmin=0.13,
+                           tmax=0.2)