Compare commits

...

22 Commits

SHA1 Message Date
be4489dc73 Update "README.md" 2021-03-29 16:45:25 +02:00
b621dbf9b0 Update "README.md" 2021-03-29 16:35:19 +02:00
f554a5822f Merge branch 'master' of git.ffhartmann.de:Julius/semesterproject_lecture_eeg 2021-03-29 16:32:42 +02:00
dd2df8f1a8 Added the report to the repo 2021-03-29 16:32:28 +02:00
fc8cde3a88 Update "decoding_tf_analysis.py" 2021-03-29 16:28:21 +02:00
5f5e7ffcac Fixed merge conflict 2021-03-29 16:27:41 +02:00
d09480c5bb Added the plotting option for log-scaling of the tf spectrum 2021-03-29 16:25:38 +02:00
6161e088c6 Update "README.md" 2021-03-29 14:26:05 +02:00
02f604cdeb Update "README.md" 2021-03-29 12:43:11 +02:00
0d513bfb96 Update "README.md" 2021-03-29 12:42:56 +02:00
6c1a555179 Update "README.md" 2021-03-29 12:42:41 +02:00
a5b97a3a65 Update "decoding_tf_analysis.py" 2021-03-28 20:14:00 +02:00
7f616a4a34 Update "README.md" 2021-03-28 17:36:46 +02:00
106a3ad434 Update "README.md" 2021-03-28 17:35:40 +02:00
26dd78a410 Update "README.md" 2021-03-28 17:34:48 +02:00
cd20b9b776 Update "README.md" 2021-03-28 15:47:41 +02:00
c6a6fd7012 Merge branch 'master' of git.ffhartmann.de:Julius/semesterproject_lecture_eeg 2021-03-28 15:45:23 +02:00
cfa950b85b Edited some comments 2021-03-28 15:45:14 +02:00
86044cc7c7 Delete "Dataset/preprocessed/sub-014_task-N170_cleaned.fif" 2021-03-28 13:40:50 +02:00
4875cf29cb Delete "Dataset/preprocessed/sub-003_task-N170_cleaned.fif" 2021-03-28 13:40:42 +02:00
c44cb7cb13 Delete "Dataset/preprocessed/sub-001_task-N170_cleaned.fif" 2021-03-28 13:40:34 +02:00
6f8c1af6c6 Update "README.md" 2021-03-28 04:54:08 +02:00
10 changed files with 108 additions and 41 deletions

README.md

@@ -1,21 +1,36 @@
-## Semesterproject of the lecture "Semesterproject Signal processing and Analysis
-of human brain potentials (eeg) WS 2020/21
-This repository holds the code of the semesterproject as well as the report.
-The main files are 'preprocessing_and_cleaning.py', 'erp_analysis.py' and 'decoding_tf_analyis.py'.
-The files hold:
-- preprocessing_and_cleaning.py : Holds the pre-processing pipeline of the project. By executing the file all subjects are pre-processed. Subjects 001, 003, 014 are pre-processed with manually selected pre-processing information, all other subjects are pre-processed with the given pre-processing information. Pre-processed cleaned data is saved in the BIDS file structure as 'sub-XXX_task-N170_cleaned.fif' where XXX is the subject number.
-Details can be found in the comments of the code.
-- erp_analysis.py : Holds the code for the erp-analysis. Computes the peak-differences and t-tests for several experimental contrasts. Details can be found in the comments of the code.
-- decoding_tf_analysis.py : Holds the code for the decoding and time-frequency analysis. Details can be found in the comments of the code.
-The folder 'utils' holds helper functions for some plots needed for the analysis and to load data, generate strings etc. and holds the code given in the lecture.
-The folder 'test' holds mostly unittests that test helper functions and one function which visually checks if N170 peaks are extracted correctly.
+## Semesterproject of the lecture "Semesterproject Signal processing and Analysis of human brain potentials (eeg)" WS 2020/21
+This repository holds the code of the semesterproject as well as the report, created by Julius Voggesberger.
+The N170 dataset was chosen as the dataset for the project.
+Subjects 001, 003 and 014 were chosen as the three subjects to be manually pre-processed.
+The rest of the subjects were pre-processed with the provided pre-processing information.
+
+### Structure
+```
+├── Dataset: The dataset of the project as well as the manually selected bad segments are stored here.
+|   ├── n170: Store the dataset here.
+|   └── preprocessed: Bad segments are stored here.
+├── cached_data: Data that is generated in the analysis part is stored here.
+|   ├── decoding_data: Results of the classifiers.
+|   ├── erp_peaks: ERP peaks needed for the ERP analysis.
+|   └── tf_data: Time-frequency data needed for the tf-analysis.
+├── test: Contains unittests and one visual check.
+├── utils: Contains helper methods.
+|   ├── ccs_eeg_semesterproject: Methods given in the lecture.
+|   ├── ccs_eeg_utils_reduced: Method for reading in BIDS, provided in the lecture.
+|   ├── file_utils.py: Methods for reading in files and getting epochs.
+|   └── plot_utils.py: Methods for manually created plots.
+├── preprocessing_and_cleaning.py: The preprocessing pipeline.
+├── erp_analysis.py: The ERP analysis and computation of ERP peaks.
+├── decoding_tf_analysis.py: Decoding and time-frequency analysis.
+└── semesterproject_report_voggesberger: The report of the project.
+```
+
-For the code to work properly, the N170 dataset needs to be provided.
-When first running the analysis, it may take a while. After running it one time the data is cached, so that it can be reused if the analysis should be executed again. Be careful though, as a parameter has to be explicitly set in the code, so that the already computed data is used. This parameter is a boolean given to each analysis function which caches data.
-This code was created using Python 3.7 and the following libraries:
+### Running the project
+To run the project, Python 3.7 is required and Anaconda is recommended.\
+To ensure reproducibility, random states were used for non-deterministic methods.
+The random states used are either '123' or '1234'.\
+The following libraries are needed:
 - Matplotlib 3.3.3
 - MNE 0.22.0
 - MNE-Bids 0.6
@@ -23,3 +38,18 @@ This code was created using Python 3.7 and the following libraries:
 - Scikit-Learn 0.23.2
 - Pandas 1.2.0
 - Scipy 1.5.4
+For the code to work, the N170 dataset needs to be provided and put into the folder 'Dataset/n170/', so that the file structure 'Dataset/n170/sub-001', etc. exists.
+The pre-processed raw objects are saved in their respective subject folders in 'Dataset/n170/'.
+When first running the analysis, it may take a while.
+After one run the data is cached, so that it can be reused if the analysis is executed again later.
+For the cached data to be used, a boolean parameter has to be set in the respective analysis method.
+If PyCharm is used as the IDE, it may be necessary to mark the parent directory 'semesterproject_lecture_eeg' as 'Sources Root'.
+
+### Parameters
+Parameters have to be changed manually in the code if different settings should be tried.
+
+### Visualisation
+The manually created visualisation methods used to generate the figures in the report are contained in the code.
+Visualisations created with built-in MNE plotting methods may or may not appear in the code.
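The cache flag mentioned under "Running the project" is the trailing boolean visible in the `__main__` block of decoding_tf_analysis.py further down. A minimal sketch of toggling it, assuming the analysis functions are imported from that module:

```
from decoding_tf_analysis import decoding, time_frequency

ds = 'N170'
# True recomputes everything; False reuses the cached results of a previous run.
decoding(ds, 'faces_vs_cars', False)
time_frequency(ds, 'face_intact_vs_all_0.1_50hz_ncf2', 'log', False)
```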

decoding_tf_analysis.py

@@ -46,6 +46,7 @@ def events_to_labels(evts, events_dict, mask=None):  # TODO write a test
 def permutation_test(baseline, score, n_iter):
     """
     An implementation of a permutation test for classification scores.
+
     :param baseline: The classification scores of the baseline, i.e. selection by chance
     :param score: The classification scores which are tested for significance
     :param n_iter: number of permutations
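The hunk above only touches the docstring; the implementation itself is not shown. For orientation, a minimal sketch of one common way to write such a permutation test (an assumption, not the repository's exact code):

```
import numpy as np

def permutation_test_sketch(baseline, score, n_iter, seed=123):
    # Sketch: p-value for the mean of `score` exceeding the baseline,
    # estimated by shuffling the pooled samples. The repository's actual
    # test statistic may differ; seed 123 matches the random states
    # named in the README.
    rng = np.random.RandomState(seed)
    observed = np.mean(score) - np.mean(baseline)
    pooled = np.concatenate([baseline, score])
    n_score = len(score)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = np.mean(pooled[:n_score]) - np.mean(pooled[n_score:])
        if diff >= observed:
            count += 1
    return count / n_iter
```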
@@ -120,6 +121,8 @@ def decoding(dataset, filename, compute_metric=True, mask=None):
     # Compute index of time point 0
     index = math.floor((len(metric[0]) / time_scale) * 100)
     baseline = np.array(metric[:index]).flatten()
+
+    # Plot the result
     plt.plot(np.linspace(-200, 1000, 1127), np.mean(metric, axis=0))
     plt.ylabel('Accuracy (%)')
     plt.xlabel('Time (ms)')
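`index` is meant to be the position of time point 0 in each accuracy curve, so that everything before it can serve as the chance-level baseline. An independent way to compute that position from the time axis used in the plot above (a sketch; `time_scale` itself is not shown in this diff):

```
import numpy as np

times = np.linspace(-200, 1000, 1127)   # the time axis from the plot above
index = int(np.searchsorted(times, 0))  # first sample at or after 0 ms
print(index)                            # 188
```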
@@ -129,11 +132,12 @@ def decoding(dataset, filename, compute_metric=True, mask=None):
     # Compute the permutation tests
     for t in range(len(metric[0][index:])):
         score_t = np.asarray(metric[:, t + index])
-        p = permutation_test(baseline, score_t, 100)
+        p = permutation_test(baseline, score_t, 1000)
         p_values.append(p)
         if t % 50 == 0:
             print(str(t) + " Out of " + str(len(metric[0][index:])))
+    # Plot the result
     plt.plot(times[index:], p_values)
     plt.ylabel('P-Value')
     plt.xlabel('Time (ms)')
@@ -143,14 +147,17 @@ def decoding(dataset, filename, compute_metric=True, mask=None):
 def create_tfr(raw, condition, freqs, n_cycles, response='induced', baseline=None, plot=False):
     """
-    Compute the time frequency representation (TFR) of data for a given condition via morlet wavelets
+    Compute the time frequency representation (TFR) of data for a given condition via Morlet wavelets
     :param raw: the data
-    :param condition: the condition for which to compute the TFR. Given as a list of tuples of the form (stimulus, condition) # TODO ambiguous use of condition
+    :param condition: the condition for which to compute the TFR. Given as a list of tuples of the form (stimulus, texture)
     :param freqs: the frequencies for which to compute the TFR
-    :param n_cycles: the number of cycles used by the morlet wavelets
+    :param n_cycles: the number of cycles used by the Morlet wavelets
-    :param response: type of expected TFR. Can be total, induced or evoked. Default is induced
+    :param response: type of expected TFR. Can be total, induced or evoked. Default is induced,
+        the others were not used for the report, only for exploration
     :param baseline: baseline used to correct the power. A tuple of the form (start, end).
-        Default is None and no baseline correction will be applid
+        Default is None and no baseline correction will be applied
+    :param plot: True if results should be plotted, else false.
     :return: The TFR or the given data for a given condition. Has type AverageTFR
     """
     epochs, _ = get_epochs(raw, condition, tmin=-0.2, tmax=1)
@@ -168,6 +175,7 @@ def create_tfr(raw, condition, freqs, n_cycles, response='induced', baseline=Non
         power_induced = tfr_morlet(epochs.subtract_evoked(), freqs=freqs, n_cycles=n_cycles, return_itc=False, n_jobs=4)
         power = mne.combine_evoked([power_total, power_induced], weights=[1, -1])
         if plot: power.plot(picks='P7')
+        # Apply a baseline correction to the power data
         power.apply_baseline(mode='ratio', baseline=baseline)
         if plot:
             plot_oscillation_bands(power)
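This branch combines the total power with the induced power (the TFR of the evoked-subtracted epochs) at weights [1, -1], which isolates the evoked, phase-locked part of the signal. A sketch of the same decomposition, where `epochs` stands in for the output of the repository's `get_epochs` helper:

```
import numpy as np
import mne
from mne.time_frequency import tfr_morlet

# `epochs` stands in for the output of the repository's get_epochs helper.
freqs = np.logspace(*np.log10([0.1, 50]), num=50)  # frequencies of interest
n_cycles = freqs / 2                               # cycles per wavelet

# Total power: TFR of the raw single-trial epochs.
power_total = tfr_morlet(epochs, freqs=freqs, n_cycles=n_cycles, return_itc=False, n_jobs=4)
# Induced power: TFR after removing the average evoked response from each trial.
power_induced = tfr_morlet(epochs.subtract_evoked(), freqs=freqs, n_cycles=n_cycles, return_itc=False, n_jobs=4)
# Evoked (phase-locked) power: total minus induced, as in the diff above.
power_evoked = mne.combine_evoked([power_total, power_induced], weights=[1, -1])
power_evoked.apply_baseline(mode='ratio', baseline=(-0.2, 0))
```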
@@ -175,7 +183,7 @@ def create_tfr(raw, condition, freqs, n_cycles, response='induced', baseline=Non
     return power

-def time_frequency(dataset, filename, compute_tfr=True):
+def time_frequency(dataset, filename, scaling='lin', compute_tfr=True):
     """
     Runs time frequency analysis
@@ -183,15 +191,14 @@ def time_frequency(dataset, filename, compute_tfr=True):
     :param filename: Filename of either the file from which the TFRs will be loaded
         or to which they will be saved
     :param compute_tfr: If True the TFRs will be created, else the TFRs will be loaded from a precomputed file
+    :param scaling: default 'lin' for linear scaling, else can be 'log' for logarithmic scaling
     """
     # Parameters
-    # Frequency space (from, to, steps) -> Control frequency resolution : Between num=50-80 good for 1-50Hz
-    # freqs = np.linspace(0.1, 50, num=50)  # Use this for linear space scaling
-    freqs = np.logspace(*np.log10([0.1, 50]), num=50)
-    # Number of cycles -> Controls time resolution ? At ~freqs/2 good for high frequency resolution
-    n_cycles = freqs / 2  # 1 for high time resolution & freq smoothing, freqs/2 for high freq resolution & time smooth
-    # Baseline -> Should not go post-stimulus, i.e. > 0 -> Best ist pre-stimulus (e.g. -400 to -200ms)
-    baseline = [-0.5, 0]
+    if scaling == 'lin':
+        freqs = np.linspace(0.1, 50, num=50)  # Use this for linear space scaling
+    else:
+        freqs = np.logspace(*np.log10([0.1, 50]), num=50)
+    n_cycles = freqs / 2
     cond1 = []
     cond2 = []
     times = None
@@ -209,6 +216,8 @@ def time_frequency(dataset, filename, compute_tfr=True):
         raw.set_montage('standard_1020', match_case=False)

         # Create the two conditions we want to compare
+        # IMPORTANT: If different conditions should be compared you have to change them here, by altering the second
+        # argument passed to create_tfr
         power_cond1 = create_tfr(raw, [('face', 'intact')], freqs, n_cycles, 'induced', (-0.2, 0))
         print(' CONDITION 1 LOADED')
         cond1.append(power_cond1)
@@ -219,25 +228,32 @@ def time_frequency(dataset, filename, compute_tfr=True):
             cond2.append(power_cond2)
             print(' DONE')
+        # Save the data so we can access the results more easily
         np.save('cached_data/tf_data/' + filename + '_cond1', cond1)
         np.save('cached_data/tf_data/' + filename + '_cond2', cond2)
     else:
+        # If the data should not be recomputed, load the given filename
         cond1 = np.load('cached_data/tf_data/' + filename + '_cond1.npy', allow_pickle=True).tolist()
         cond2 = np.load('cached_data/tf_data/' + filename + '_cond2.npy', allow_pickle=True).tolist()
     if times is None:
         times = cond1[0].times

+    # Some plots
     mne.grand_average(cond1).plot(picks=['P7'], vmin=-3, vmax=3, title='Grand Average P7')
     mne.grand_average(cond2).plot(picks=['P7'], vmin=-3, vmax=3, title='Grand Average P7')
     plot_oscillation_bands(mne.grand_average(cond1))
     plot_oscillation_bands(mne.grand_average(cond2))

+    # Compute the cluster permutation
     F, clusters, cluster_p_values, h0 = mne.stats.permutation_cluster_test(
         [mne.grand_average(cond1).data, mne.grand_average(cond2).data], n_jobs=4, verbose='INFO',
         seed=123)
-    plot_tf_cluster(F, clusters, cluster_p_values, freqs, times)
+    plot_tf_cluster(F, clusters, cluster_p_values, freqs, times, scaling)


 if __name__ == '__main__':
     mne.set_log_level(verbose=VERBOSE_LEVEL)
     ds = 'N170'
     decoding(ds, 'faces_vs_cars', True)
-    time_frequency(ds, 'face_intact_vs_all_0.1_50hz_ncf2', True)
+    time_frequency(ds, 'face_intact_vs_all_0.1_50hz_ncf2', 'log', True)

erp_analysis.py

@@ -93,16 +93,18 @@ def create_peak_difference_feature(df, max_subj=40):
 def analyze_erp(channels, precompute=True):
     """
-    Execute several statistical tests for different hypothesis, to analyze ERPs
+    Execute several statistical tests for different hypothesis, to analyse ERPs
     :param channels: The channels for which the tests are executed
     :param precompute: If true, the peak-difference data will be computed. Else it will be loaded from a precomputed file,
         if it exists. This should only be set 'False' if the method was already executed once!
     """
     if precompute:
+        # Precompute the erp peaks
         precompute_erp_df('N170')
     for c in channels:
         print("CHANNEL: " + c)
+        # Load the erp peak data and create the features for the t-tests
         erp_df = pd.read_csv('cached_data/erp_peaks/erp_peaks_' + c + '.csv', index_col=0)
         feature_df = create_peak_difference_feature(erp_df)

         # 1. H_a : There is a difference between the N170 peak of recognizing faces and cars
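The hypotheses listed in comments like the one above are tested on the peak-difference features. A minimal sketch of what such a test could look like with SciPy (the values and the one-sample t-test against zero are illustrative assumptions, not the repository's exact procedure):

```
import numpy as np
from scipy import stats

# Hypothetical per-subject N170 peak differences (faces minus cars), in microvolts.
diffs = np.array([-2.1, -1.4, -3.0, -0.8, -1.9])
# H_a: the mean difference is not zero.
t_stat, p_value = stats.ttest_1samp(diffs, popmean=0)
print('t = %.2f, p = %.4f' % (t_stat, p_value))
```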

preprocessing_and_cleaning.py

@@ -52,7 +52,7 @@ def filter_data(raw):
     """
     Filter the data of a single subject with a bandpass filter.
     The lower bound is 0.5Hz to compensate the slow drifts.
-    The upper bound is 50Hz to compensate the high frequencies, including the power line spike at 60Hz
+    The upper bound is 48Hz to compensate the high frequencies, including the power line spike at 60Hz
     :param raw: The data to be filtered
     :return: The filtered data
     """

Binary file not shown.

utils/file_utils.py

@@ -18,7 +18,7 @@ def load_bad_annotations(filepath, fileending="badSegments.csv"):
 def load_preprocessed_data(subject, dataset):
     """
-    Load the raw object as well as the annotations of the preprocessed file
+    Load the raw object of the preprocessed file
     :param subject: The subject, for which we want to load the raw object
     :param dataset: The currently viewed dataset
     :param selected_subjects: The manually preprocessed subjects
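A sketch of what this loader plausibly does, given the 'sub-XXX_task-N170_cleaned.fif' naming and the 'Dataset/n170/' layout described in the README (the path construction here is an assumption, not the repository's exact code):

```
import mne

def load_preprocessed_data(subject, dataset):
    # Assumed path layout per the README: cleaned raws live in their
    # subject folders under 'Dataset/n170/'.
    path = 'Dataset/n170/sub-' + subject + '/sub-' + subject + '_task-' + dataset + '_cleaned.fif'
    return mne.io.read_raw_fif(path, preload=True)
```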

utils/plot_utils.py

@@ -56,15 +56,17 @@ def plot_grand_average(dataset):
                      linestyles=['solid', 'solid', 'dotted', 'dotted'])

-def plot_tf_cluster(F, clusters, cluster_p_values, freqs, times):
+def plot_tf_cluster(F, clusters, cluster_p_values, freqs, times, scaling='lin'):
     """
-    Plot teh F-Statistic values of permutation clusters with p-values <= 0.05 in color and > 0.05 in grey.
+    Plot the F-Statistic values of permutation clusters with p-values <= 0.05 in color and > 0.05 in grey.
+    Currently only works well for the linear scaling. For the logarithmic scaling a different x-axis has to be chosen
     :param F: F-Statistics of the permutation clusters
     :param clusters: all permutation clusters
     :param cluster_p_values: p-values of the clusters
     :param freqs: frequency domain
     :param times: time domain
+    :param scaling: default 'lin' for linear scaling, else can be 'log' for logarithmic scaling
     """
     good_c = np.nan * np.ones_like(F)
     for clu, p_val in zip(clusters, cluster_p_values):
@@ -74,16 +76,33 @@ def plot_tf_cluster(F, clusters, cluster_p_values, freqs, times):
     bbox = [times[0], times[-1], freqs[0], freqs[-1]]
     plt.imshow(F, aspect='auto', origin='lower', cmap=cm.gray, extent=bbox, interpolation='None')
     a = plt.imshow(good_c, cmap=cm.RdBu_r, aspect='auto', origin='lower', extent=bbox, interpolation='None')
+    if scaling == 'log':
+        ticks = [1, 4, 8, 12, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
+        labels = [round(freqs[i], 2) for i in range(len(freqs)) if i + 1 in ticks]
+        plt.yticks(ticks, labels)
     plt.colorbar(a)
     plt.xlabel('Time (s)')
     plt.ylabel('Frequency (Hz)')
     plt.show()


 def plot_oscillation_bands(condition):
+    """
+    Plot the oscillation bands for a given condition in the time from 130ms to 200ms
+    :param condition: the condition to plot the oscillation bands for
+    """
     fig, axis = plt.subplots(1, 5, figsize=(25, 5))
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=0, fmax=4, title='Delta', axes=axis[0], show=False, vmin=0, vmax=1.5, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=4, fmax=8, title='Theta', axes=axis[1], show=False, vmin=0, vmax=0.7, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=8, fmax=12, title='Alpha', axes=axis[2], show=False, vmin=-0.15, vmax=0.2, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=13, fmax=30, title='Beta', axes=axis[3], show=False, vmin=-0.18, vmax=0.2, tmin=0, tmax=1)
-    condition.plot_topomap(baseline=(-0.2, 0), fmin=30, fmax=45, title='Gamma', axes=axis[4], vmin=0, vmax=0.2, tmin=0, tmax=1)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=0, fmax=4, title='Delta', axes=axis[0], show=False, vmin=0,
+                           vmax=1.5, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=4, fmax=8, title='Theta', axes=axis[1], show=False, vmin=0,
+                           vmax=0.7, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=8, fmax=12, title='Alpha', axes=axis[2], show=False, vmin=-0.25,
+                           vmax=0.2, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=13, fmax=30, title='Beta', axes=axis[3], show=False, vmin=-0.21,
+                           vmax=0.2, tmin=0.13, tmax=0.2)
+    condition.plot_topomap(baseline=(-0.2, 0), fmin=30, fmax=45, title='Gamma', axes=axis[4], vmin=-0.05, vmax=0.2,
+                           tmin=0.13,
+                           tmax=0.2)