Novela Neurotech and NeuroTechX join forces to accelerate epilepsy research and EEG data mining.
Novela Neurotech collaborates with NeuroTechX to accelerate epilepsy research and (scalp) EEG data mining through online crowdsourcing and open access datasets.
We propose a month-long challenge on seizure prediction using the TUH EEG Seizure dataset. The goal is to achieve the best performance across subjects while using as few channels as possible.
The competition is based on the TUH EEG Seizure Corpus (TUSZ) v1.5.1.
Temple University Hospital recently released the TUH EEG Corpus, the world’s largest publicly accessible archive of clinical EEG recordings.
https://www.isip.piconepress.com/projects/tuh_eeg/
You need a username/password to access the data. Follow the instructions and you will receive it automatically by email within seconds (okay, give it a minute).
https://www.isip.piconepress.com/projects/tuh_eeg/html/request_access.php
Do not share your credentials!
Please read the download instructions carefully to avoid downloading hundreds of gigabytes of data! You should end up with 55GB. We are using the TUH EEG Seizure Corpus (TUSZ).
For the full instructions go to:
https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml#c_tusz
We highly recommend using rsync with the following command (it will ask you for the password):
rsync -auxvL nedc_tuh_eeg@www.isip.piconepress.com:~/data/tuh_eeg_seizure/v1.5.1/ .
It will fetch the file list for a minute or two and then start downloading the files…
** Update: TUH released a feature folder for the dataset, which adds another 70GB, so expect ~120GB in total.
You can delete that folder if you don’t use their features, or download the remaining folders and files selectively if you don’t want this extra 70GB.
You can also run this script which will run the rsync command in a loop in case of error. That way you can run the command overnight and wake up with the full dataset downloaded.
*Note: you need to add the L flag to the rsync options in the script (-auxv becomes -auxvL).
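The linked script is not reproduced here, but a minimal retry wrapper with the same behavior (re-running rsync in a loop until it succeeds) could look like the sketch below — the 60-second delay is an arbitrary choice:

```shell
#!/bin/sh
# Re-run rsync until it exits successfully, so a transient network
# error doesn't abort an overnight download. rsync's -a/-u flags make
# each retry resume from where the previous attempt left off.
until rsync -auxvL nedc_tuh_eeg@www.isip.piconepress.com:~/data/tuh_eeg_seizure/v1.5.1/ .
do
    echo "rsync failed, retrying in 60 seconds..." >&2
    sleep 60
done
```

Run it in the dataset's destination directory; it will keep prompting for the password on each retry unless you configure passwordless authentication.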
If you are using Windows (we don’t judge you), we recommend getting access to a Linux or Mac machine to download the data and then transferring it via USB drive. Unfortunately, rsync doesn’t run natively on Windows.
If you are having difficulty downloading the data, please contact us.
(contact info at the bottom & contact button at the top).
Once you’ve downloaded the data, you will notice that it’s already been split into Training (train) and Validation (dev) sets.
The whole folder should be ~57GB.
The directory structure should look like this:
You should leverage all the .edf files (EEG data) under the train folder and keep the .edf files (EEG data) of the dev_test folder for validation (e.g. cross-validation).
Don’t worry about double-dipping for the final test/evaluation as a new dataset will be released one (1) week prior to the deadline. Use all the 57GB you’ve just downloaded to the fullest to come up with the best model that generalizes across subjects while trying to minimize the number of channels.
There is a file called _INSTRUCTIONS.txt in the root folder of the downloaded data. Make sure to read it as it contains all the information you need! Here is a snippet of what you’ll find in that file.
The annotation files have the following structure:
[filename] [start time] [stop time] [class] [confidence]
00000492_s003_t000 0.0000 6.2825 bckg 1.0000
00000492_s003_t000 6.2825 35.8875 seiz 1.0000
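These annotation lines are straightforward to parse programmatically. A minimal Python sketch, with field names taken from the structure above:

```python
def parse_annotation(line):
    """Parse one annotation line:
    [filename] [start time] [stop time] [class] [confidence]"""
    filename, start, stop, label, confidence = line.split()
    return {
        "filename": filename,
        "start": float(start),      # seconds from the start of the file
        "stop": float(stop),
        "label": label,             # e.g. 'bckg' or 'seiz'
        "confidence": float(confidence),
    }

print(parse_annotation("00000492_s003_t000 6.2825 35.8875 seiz 1.0000"))
```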
Before you attempt to process the data, please review the following papers that describe the data:
_DOCS/00_dbeeg.pdf: An overview of the corpus
_DOCS/01_elect.pdf: An overview of how the data is stored in an edf file
_DOCS/02_annot.pdf: A description of how we annotated the data (unfinished)
_DOCS/03_score.pdf: An overview of the scoring algorithm
There is a file called _AAREADME.txt in the root folder of the downloaded data. Make sure to read it, as it describes the folder structure and the valuable information encoded in the file names. Here is a snippet of what you’ll find in that file.
Filename
edf/dev/01_tcp_ar/002/00000258/s002_2003_07_21/00000258_s002_t000.edf
Components
edf: contains the edf data
dev: part of the dev_test set (as opposed to train)
01_tcp_ar: data that follows the averaged reference (AR) configuration, while annotations use the TCP channel configuration
002: a three-digit identifier meant to keep the number of subdirectories in a directory manageable. This follows the TUH EEG v1.1.0 convention.
00000258: official patient number that is linked to v1.1.0 of TUH EEG
s002_2003_07_21: session two (s002) for this patient. The session was archived on 07/21/2003.
00000258_s002_t000.edf: the actual EEG file. These are split into a series of files starting with t000.edf, t001.edf, … These represent pruned EEGs, so the original EEG is split into these segments, and uninteresting parts of the original recording were deleted (common in clinical practice).
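As a sketch, the components described above can be pulled apart in Python; the dictionary keys below are our own labels, not official ones:

```python
from pathlib import Path

def parse_tusz_path(path):
    """Decompose edf/<split>/<montage>/<group>/<patient>/<session>_<date>/<file>.edf
    following the component description in _AAREADME.txt."""
    p = Path(path)
    _edf, split, montage, group, patient, session_dir = p.parts[:6]
    session, *date = session_dir.split("_")
    return {
        "split": split,                    # 'dev' or 'train'
        "montage": montage,                # e.g. '01_tcp_ar'
        "group": group,                    # three-digit subdirectory id
        "patient": patient,                # e.g. '00000258'
        "session": session,                # e.g. 's002'
        "date": "/".join(date),            # e.g. '2003/07/21'
        "token": p.stem.split("_")[-1],    # pruned-segment id, e.g. 't000'
    }

print(parse_tusz_path(
    "edf/dev/01_tcp_ar/002/00000258/s002_2003_07_21/00000258_s002_t000.edf"))
```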
>> Beware: not all EDF files have the same electrodes configuration!
TRAIN SET (TRAIN)
total files: 4597
total sessions: 1185
total patients: 592
files with seizures: 867
sessions with seizures: 343
patients with seizures: 202
total number of seizures: 2,370
total duration: 2,708,284.00 secs
total background duration: 2,540,144.77 secs (93.79%)
total duration of files with seizures: 635,490.00 secs (23.46%)
total seizure duration: 168,139.23 secs (6.21%)
DEV TEST SET (DEV)
total files: 1013
total sessions: 238
total patients: 50
files with seizures: 280
sessions with seizures: 104
patients with seizures: 40
total number of seizures: 673
total duration: 613,232.00 secs
total background duration: 554,786.89 secs (90.47%)
total duration of files with seizures: 230,031.00 secs (37.51%)
total seizure duration: 58,445.11 secs (9.53%)
Demonstration software that shows how to properly read EDF files. Please read our document describing the organization of the electrode data in an EDF file to understand why this software is critical to your ability to correctly process the EEG data.
You can download it here.
You can also rsync it with the following command (create a new folder, e.g. tools):
rsync -auxvL nedc_tuh_eeg@www.isip.piconepress.com:~/data/nedc_pystream/ .
Only the EEG data is available. There are no annotations — producing those is your job!
You wouldn’t hand your students an exam with the answers in it, would you?
To download the Eval set, you only have to re-run the rsync command:
rsync -auxvL nedc_tuh_eeg@www.isip.piconepress.com:~/data/tuh_eeg_seizure/v1.5.1/ .
It will automatically detect the new folder on the server and will automatically download it.
EVAL SET (EVAL)
total files: 1023
total sessions: 152
total patients: 50
total duration: 543,135.00 secs
Here is the scoring script that we will be using to judge/score/evaluate the performance of your model (i.e. results).
Once you decompress the file, you’ll get the following folder architecture.
*Edit: the scoring script is now at version v3.3.2.
Read the _AAREADME.txt file.
The scoring script comes with an example of a hyp.txt vs ref.txt and produces an answers folder.
Then open the summary.txt to see the in-depth performance of your model!
The objective of the competition is also to reduce the number of electrodes; however, there are many different ways to approach such a constraint. Since we want to encourage creativity and not impose a direction, participants are free to select the channels used to produce their final answers in whatever way they see fit.
For each “answer” (i.e. row in the hyp.txt file), report on that line/row the minimum number of channels that you used to produce your answer.
e.g. with 12 channels used, the row would read:
00000492_s003_t000 6.2825 35.8875 1.0 12 F3-REF, F4-REF, C3-REF, C4-REF …
Note #1: The channel names are optional and won’t affect the scoring; only the number of channels is mandatory (a missing value will be counted as 19). The channel names are for informational purposes only.
Note #2: Even if you don’t use the confidence, you still have to put 1.0, as we will consider the 5th value on the row to be the number of channels.
Here are the possible options for reporting a seizure:
00000492_s003_t000 6.2825 35.8875
00000492_s003_t000 6.2825 35.8875 1.0
00000492_s003_t000 6.2825 35.8875 1.0 12
00000492_s003_t000 6.2825 35.8875 1.0 12 F3-REF, F4-REF …
For the first two, we assume 19 channels when computing the performance metric.
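A small Python helper (hypothetical, not part of the official tooling) that emits rows in the fullest of these formats:

```python
def hyp_row(filename, start, stop, confidence=1.0, channels=None):
    """Format one hypothesis row. Only len(channels) affects the score;
    the channel names themselves are informational."""
    row = f"{filename} {start:.4f} {stop:.4f} {confidence:.1f}"
    if channels:
        row += f" {len(channels)} " + ", ".join(channels)
    return row

print(hyp_row("00000492_s003_t000", 6.2825, 35.8875,
              channels=["F3-REF", "F4-REF"]))
```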
Points = %SENS – alpha * FAs/24hr – beta * (avg # chans)/19
where alpha = 2.5 and beta = 7.5
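In code, the scoring formula is a direct transcription of the line above (the numbers in the example call are purely illustrative):

```python
ALPHA, BETA = 2.5, 7.5
FULL_MONTAGE = 19  # channel count assumed when none is reported

def points(sens_pct, fas_per_24h, avg_channels):
    """Points = %SENS - alpha * FAs/24hr - beta * (avg # chans)/19"""
    return sens_pct - ALPHA * fas_per_24h - BETA * avg_channels / FULL_MONTAGE

# e.g. 70% sensitivity, 4 false alarms per 24h, 12 channels on average
print(points(70.0, 4.0, 12.0))
```

Note how the penalties trade off: each false alarm per 24 hours costs 2.5 points, while using the full 19-channel montage costs 7.5 points.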
The scoring algorithm used to resolve overlap between the hypothesis and the ground truth (i.e. reference) when computing TP, TN, FP, and FN is Time-Aligned Event Scoring (TAES).
Please read the [data_root]/_DOCS/03_score.pdf file to know more about TAES.
For more on epilepsy, seizure prediction and performance/scoring, we suggest the following references.
Gives a great overview of the current state of the field, covers datasets & competitions, and highlights the importance and difficulty of (fair) scoring algorithms.
Lots of useful information on various models and approaches (ML and DL) for seizure prediction.
Only available on arXiv and hasn’t been peer-reviewed (yet?).
Example of an attempt at reducing the number of channels.
Deep dive into evaluation metrics. A must-read!
Only available on arXiv and hasn’t been peer-reviewed (yet?).
Similar previous competitions might also be a good source of information to look at open-source models.
If you would like to know more about Deep Learning for EEG, we encourage you to read:
Deep learning-based electroencephalography analysis: a systematic review (Roy et al., 2019)
(shameless promotion, I had to…)
Here are some DL-EEG papers on Seizure Prediction:
HOW IT WORKS
The Neureka™ 2020 Epilepsy Challenge is open to all.
Rules
Here are a few burning questions you might have regarding the Neureka™ 2020 Epilepsy Challenge.
Before you participate and submit your results, please read all the rules! The ones shown here are just a few of them.
If you have any questions, join the conversation on the NeuroTechX Slack (channel #NeurekaEpilepsy2020) or contact us via email (information at the bottom of the page).
See All Rules
No.
You can submit your result individually or as a team. There will be an option on the submission form to identify team members.
Yes.
This is an online and international competition. Anyone can participate. As for the prize money, you need to be able to receive money via an online payment solution (e.g. PayPal).
Winning solutions need to be made available under a popular OSI-approved license in order to be eligible for recognition and prize money.
No.
We will judge the submission on the Test/Eval dataset that we will release one (1) week prior to the submission deadline. You will receive the data but not the annotations.
We evaluate the submissions on three (3) things: Prediction Performance*, Prediction Time before the seizure, and the Number of Electrodes.
*See Performance section above for more details.
We want to congratulate the @KU_Leuven team "Biomed Irregulars" that won the first place on our #NeurekaEpilepsy2020. @NovelaNeurotech
https://ai.kuleuven.be/news/201cbiomed-irregulars201d-won-the-first-place-in-neureka-challenge
"It's a proof-of-concept of the power of adopting an #OpenData approach for solving neurological disorders"
@SingularityHub reports on @NeuroTechX's #NeurekaEpilepsy2020 challenge, and the power of collaborative science during #lockdown
http://bit.ly/3dHtsW6
Really happy to be part of the winning team in #NeurekaEpilepsy2020 challenge! Proud to also represent @KU_Leuven. Thanks to @_yroy_ Joseph Picone @NovelaNeurotech & @NeuroTechX for organizing this challenge!
Congratulations to winners of the #NeurekaEpilepsy2020 Challenge!
We are thankful to all participants and the great submissions and Joseph Picone at Temple University for pioneering the open big brain data era.
https://neureka-challenge.com/results/
#neurotechX #eeg #epilepsy #ai #deeplearning
What an amazing community CHALLENGE it's been! We're happy to announce the two top winners for #NeurekaEpilepsy2020
We've been blown away at the responses. Thanks everyone for participating, & check out the results page👇 for exciting next steps
https://neureka-challenge.com/results/
Proud of @NeuroSyd team who ranked 2nd globally in #NeurekaEpilepsy2020 challenge, towards enabling more reliable seizure detection in critical-care applications. Special thanks to @_yroy_ Joseph Picone @NovelaNeurotech & @NeuroTechX | @Eng_IT_Sydney