IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, VOL. 52, NO. 4, AUGUST 2022

Modified Spatio-Temporal Matched Filtering for Brain Responses Classification

Marian P. Kotas, Michal Piela, and Sonia H. Contreras-Ortiz

Abstract—In this article, we apply the method of spatio-temporal filtering (STF) to electroencephalographic (EEG) data processing for brain responses classification. The method operates similarly to linear discriminant analysis (LDA) but, contrary to most applied classifiers, it uses the whole recorded EEG signal as a source of information instead of only the precisely selected brain responses. This way, it avoids the limitations of LDA and improves the classification accuracy. We emphasize the significance of the STF learning phase. To preclude the negative influence of super-Gaussian artifacts on accomplishment of this phase, we apply a discrete cosine transform (DCT) based method for their rejection. Later, we estimate the noise covariance matrix using all data available, and we improve the STF template construction. The further modifications are related to the constructed filters' operation and consist in changes of the STF interpretation rules. Consequently, a new tool for evoked potentials (EPs) classification has been developed. Applied to the analysis of signals stored in a publicly available database, prepared for the assessment of modern algorithms aimed at EP detection (in the frames of the 2019 IFMBE Scientific Challenge), it allowed us to achieve the second best result, very close to the best one, and significantly better than the results achieved by the other contestants of the challenge.

Index Terms—Brain–computer interfaces (BCI), discrete cosine transform (DCT), generalized matched filtering (GMF), spatio–temporal filtering (STF), visual evoked potentials (EPs).

Manuscript received July 9, 2021; revised January 29, 2022 and March 18, 2022; accepted March 31, 2022. Date of publication May 5, 2022; date of current version July 14, 2022. This work was supported in part by statutory funds of the Department of Cybernetics, Nanotechnology and Data Processing, Silesian University of Technology, BK-2022, and co-financed by the European Union through the European Social Fund under Grant POWR.03.02.00-00-I029. This work was performed using the infrastructure supported by Grant POIG.02.03.01-24-099/13: GeCONiI, Upper Silesian Center for Computational Science and Engineering. This article was recommended by Associate Editor T. H. Falk. (Corresponding author: Marian P. Kotas.)

Marian P. Kotas and Michal Piela are with the Department of Cybernetics, Nanotechnology and Data Processing, Silesian University of Technology, 44-101 Gliwice, Poland (e-mail: mkotas@polsl.pl; michal.piela@polsl.pl).

Sonia H. Contreras-Ortiz is with the Biomedical Engineering Program, Universidad Tecnológica de Bolívar, 150003 Cartagena, Colombia (e-mail: scontreras@utb.edu.co).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/THMS.2022.3168421.

Digital Object Identifier 10.1109/THMS.2022.3168421

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

CAPABILITY for quick and reliable classification of brain responses plays a key role in designing modern and efficient brain-computer interfaces (BCI). Responses of the brain in BCI are triggered by external (visual, auditory) or internal (mental simulation of a physical action) stimuli and are therefore called evoked potentials (EPs). Among the group of brain responses to a visual stimulus, the P300 is characterized by the greatest amplitude and is therefore considered a suitable measure of brain activity during visual tasks [1]–[8]. The idea to employ this potential as a control signal in BCI was first introduced in 1988 by Farwell and Donchin [1]. Since then, the major challenge has remained the same: the very low energy of the desired EP in reference to the spontaneous activity of the brain. Consequently, its proper classification requires multiple presentations of the same stimulus and averaging of the consecutive responses of the brain. This, in turn, prolongs the time of an experiment and thereby the period of high concentration a user needs to maintain. It is inconvenient, particularly for people whose medical condition affects their ability to stay focused long on a mental or visual task.

Therefore, an effort is made to shorten this time by reducing the number of averaging steps necessary to recognize an EP. Development of new, more effective classifiers is one of the ways to achieve this objective. The aim of the classifiers is to discriminate between two types of brain responses: target, i.e., triggered by the stimuli the user is focused on, and nontarget, i.e., caused by the other stimuli, which the user is trying to neglect. Popular algorithms that assure good performance are based on support vector machines [9], [10], ensembles of weighted support vector machines [11], [12], linear discriminant analysis (LDA) [13], stepwise LDA [14], shrinkage LDA [15], Fisher's and Bayesian LDA (FLDA, BLDA) [16], [17], sparse Bayesian LDA [18], and regularized group sparse LDA [19].

While the above listed methods perform an outright classification of the brain responses stored in signal matrices, there are also methods which precede the classification by prior signal enhancement and feature extraction using spatial (the xDAWN algorithm [20]) or spatial and temporal projections [21]. In [22], a two-step projection using spatial LDA and temporal canonical correlation analysis was proposed to form feature vectors for later classification; a further extension of this approach was described in [23].

Although the linear methods appeared to be advantageous in application to EP detection [24], there are attempts to utilize also nonlinear ones, e.g., convolutional neural networks [25]–[27], self-organizing fuzzy neural networks [28], or extreme learning machines [29], [30]. Particularly fast development can be observed in applications of convolutional neural networks. In [25], the first convolutional layer performs spatial filtering and the second, the temporal one; later, a fully connected multilayer perceptron is employed for final classification. The authors achieved good results using an ensemble of such classifiers (called a multiclassifier system). In [31], the order of spatial and temporal filtering was reversed. This architecture was also adopted in [32]. Similarly in [33] and [34], where the convolutional neural network was preceded by additional preprocessing steps to improve the classification results.

Spatio-temporal filtering (STF) is an intrinsically linear method, applied, e.g., to detection of low amplitude signals of repeatable morphology embedded in high energy noise [35], [36], to motor imagery (MI) tasks classification [37]–[40], or to related operations like selection of the proper EEG channels [41] or extraction of movement related potentials [42] for later MI classification. The alternating spatial-temporal optimization and projection was applied to event related potentials classification in [43].

Fig. 1. Illustration of BCI experiment; a moment when visual stimulus (target) is captured.

We have applied STF to tackle the fundamental problem
of LDA based approaches to EP classification, i.e., the singularity of the within-class covariance matrix, which has to be inverted [16]. This singularity usually results from a large dimension of the feature vector (equal to the product of the number of time samples covering a brain response and the number of channels recorded) and a limited number of the brain responses used for the classifiers' learning. Since the within-class dispersion results mostly from the presence of noise in the feature vectors, this problem can be seen as an issue of noise properties estimation. Accomplishing this operation on the basis of target and nontarget responses only seems to be unjustified. We propose to reject this restriction and perform the noise covariance matrix estimation using different time segments of the EEG signals, not necessarily those corresponding to the precisely selected brain responses. Moreover, taking into account that STF is most effective in colored Gaussian noise (CGN) [44], we propose to reject super-Gaussian artifacts for better estimation of the CGN properties. This improves the EP classifiers' learning without the necessity of inversion regularization [16] or feature vector dimension reduction [20], [21].

A further improvement can be achieved if we face the fact that brain responses can have different time delays with respect to the stimuli (flashes), and it is not always justified to apply the classifiers to the same time segments after the stimuli.

The rest of this article is organized as follows. Section II explains the operation of a BCI system. Section III provides the detailed description of the original STF method; its modifications are presented in Section IV. Results of experiments and their discussion are presented in Section V. Finally, Section VI concludes this article.

II. EVOKED POTENTIALS (EPS) BASED BCI

During EPs based BCI experiments, a subject faces a screen filled with different objects (letters, signs or pictures, etc.) and is focused on the selected one. Fig. 1 shows an example where the user interface contains six household appliances, and the user focuses on the lamp. The six objects are flashed one at a time in a random order. The flash of the item the user is focused on (the lamp) is considered the target event; flashes of the other items are the nontarget ones. We expect the target to elicit higher responses of the brain than the nontargets. Once all the objects have been flashed (one target and five nontargets), we can compare the elicited brain responses. It is accomplished by comparing the outputs of the classifier applied to these six events. The one with the highest score indicates the picture classified as a target. However, because of the very low amplitude of the brain responses to the visual stimuli, it is very rare to achieve proper discrimination at this step of the experiment.

Therefore, this part of the experiment, the so-called run (of six flashes), when each visual stimulus occurs only once, is insufficient. Multiple repetition of such runs of flashes is required. In each consecutive run, the order of pictures flashing is random, independent from the other runs. This way of presentation involves an element of surprise, necessary to elicit an EP. For every individual picture, the outputs obtained in the consecutive runs are averaged. Typically, taking into account more subsequent runs, the averaged response to a target stimulus is increased when compared to the averaged responses to the nontarget ones. Consequently, increasing the number of runs analyzed should increase the classification accuracy.

The described series of runs, related to one particular target object, is called a block. During the experiments, this process (block) is usually repeated for every object (on the screen) regarded as a target. All blocks executed form a session. In the considered example, a session contains six blocks.

In the experimental section, we will analyze the dataset BCIAUT-P300, recorded during similar experiments. It was created for the purpose of the 2019 IFMBE Scientific Challenge [32]. The dataset contains recordings of 15 subjects diagnosed with autism spectrum disorder. The structure of the data is as follows: there are seven learning and seven testing sessions recorded by every subject. Each learning session contains 20 blocks. During each of the blocks, the user is focused on one picture (the one selected as a target, out of eight available on the user interface). Every block is built of 10 runs, each of which gathers single brain responses to all possible events (one target and seven nontargets). The testing subset is organized in a similar manner; however, the number of runs per block varies between 4 and 10. The EEG signals recorded are sampled with the frequency of 250 Hz, notch-filtered at 50 Hz, and bandpass-filtered between 2 and 30 Hz. Signals were acquired using an eight-channel configuration (electrodes were positioned at C3, Cz, C4, CPz, P3, Pz, P4, POz).

III. SPATIO-TEMPORAL FILTERING

Let us denote the m channel signal vector as x(k) = [x1(k), x2(k), ..., xm(k)]^T. To exploit the spatial and temporal relationships between the signal components, we have defined, similarly as in [35], the following vector representation of the multichannel signals:

x(k) = [x1(k − J·τ), x1(k − (J−1)·τ), ..., x1(k + (J−1)·τ), x1(k + J·τ), ..., xm(k − J·τ), xm(k − (J−1)·τ), ..., xm(k + (J−1)·τ), xm(k + J·τ)]^T (1)

containing 2J + 1 time samples from each of the m channels available (the x(k) vector length is p = (2J + 1)m).

Parameter τ is introduced to perform a kind of signal resampling (from the original frequency fs to fs/τ, applied in the spatio-temporal vectors). We assume that this new frequency should not be smaller than 40 Hz; thus, parameter τ can be calculated according to τ = ⌊fs[Hz]/40⌋, where ⌊·⌋ denotes the largest integer not greater than the argument.

The filtering operation can be expressed as

y(k) = h^T x(k) (2)

and if we want to perform detection of some signal pattern of a finite length Tp, the time interval 2·J·τ should not be shorter. Thus, parameter J can be calculated according to J = ⌈fs·Tp/(2·τ)⌉, where ⌈·⌉ denotes the smallest integer not smaller than the argument.

A. STF Construction

Let us denote the moments when the target object is flashed as ki, i = 1, 2, ..., It (It denotes the total number of these flashes). We assume that the brain response to the ith flash begins at ki and ends at ki + 2·J·τ. This time segment corresponds to the spatio-temporal vector x(ki + J·τ). For this vector, the filter should produce the maximal output value. Since the slope of the output signal is limited, more of its neighboring time samples will be magnified (..., y(ki + J·τ − 1), y(ki + J·τ), y(ki + J·τ + 1), ...). Thus, to construct the filter, we first specify the following set indicating the spatio-temporal vectors that should be maximized:

Ψt(δ) = {k : |k − (ki + J·τ)| ≤ δ·fs, i = 1, 2, ..., It} (3)

where δ denotes the assumed time range.

We want the filter to be very sensitive to the corresponding set of vectors (x(k), k ∈ Ψt(δ)). On the other hand, we want it to produce as low an output as possible at other moments of time. Therefore, we create an auxiliary set Ψt(Δ) with argument Δ > δ. On this basis, we create the set corresponding to the spatio-temporal vectors that should be suppressed:

Ψsupp(Δ) = Ψ − Ψt(Δ) (4)

where Ψ denotes the set of all time indices for which vectors x(k) have been constructed. Thus, Ψsupp(Δ) indicates all vectors x(k) whose time location is rather distant from the vectors maximized.

Finally, we create the following objective function:

Q(h) = [(1/|Ψt(δ)|) Σ_{k∈Ψt(δ)} (h^T x(k))²] / [(1/|Ψsupp(Δ)|) Σ_{k∈Ψsupp(Δ)} (h^T x(k))²] (5)

where |·| denotes the set cardinality.

We gather the spatio-temporal vectors that are maximized in matrix A of size p×na, A = [a1, a2, ..., a_na], and those which we want to suppress in matrix B of size p×nb, B = [b1, b2, ..., b_nb], where na = |Ψt(δ)| and nb = |Ψsupp(Δ)|. This way, we obtain the following function:

Q(h) = [(1/na)·‖h^T A‖²] / [(1/nb)·‖h^T B‖²] = (h^T Ca h) / (h^T Cb h) (6)

where Ca = (1/na)·A·A^T and Cb = (1/nb)·B·B^T.

Function Q(h) has the form of a Rayleigh quotient, whose maximization can be achieved using the generalized eigendecomposition

Ca h = λ Cb h. (7)

The solution we are interested in is the generalized eigenvector e1 that corresponds to the greatest eigenvalue. Here, we face the problem of the eigenvector sign ambiguity [45]. We solve it by using either h = e1 or h = −e1, so that the average response of the filter to the learning vectors stored in matrix A is positive. This original version of the STF method [based on eigendecomposition (7)] will further simply be denoted as OSTF.

B. STF Interpretation

Let us denote the onset of the ith flash of the oth object as k_i^o. The STF filter response to this flash is expected at k_i^o + J·τ (see the explanation in Section III-A). It is therefore sufficient to calculate and store for further analysis the values

q_{i,o} = y(k_i^o + J·τ); o = 1, 2, ..., Io; i = 1, 2, ..., Ir (8)

where Io denotes the number of objects on the screen, and Ir is the number of analyzed runs of flashes.

Thus, the oth column of the formed Ir×Io matrix Q contains the scores obtained for the successive flashes of the oth object. If we want to select the target object on the basis of one run of flashes only (Ir = 1), we simply choose the o for which q_{1,o} has the greatest value. However, we usually take into account more runs (Ir > 1), and we select the column for which the mean of the corresponding Ir values is highest.

The described interpretation rules share an important feature with most classifiers applied to EP discrimination. They select the same located segments of the EEG signal following the stimuli (contained in vectors x(ki + J·τ), i = 1, 2, ...) to perform classification. Therefore, they will further be referred to as the classical decision rules (CDR).

C. Averaging of the Maximized Spatio-Temporal Vectors

In [36], we proposed to calculate the average of the vectors stored in A:

ā = (1/na) Σ_{i=1}^{na} ai. (9)

Objective function (6) can now be replaced by

Q(h) = (h^T (ā·ā^T) h) / (h^T Cb h) (10)

and consequently, the eigendecomposition (7) based formula is substituted by the product

h = Cb⁻¹ ā (11)

which is computationally identical to the one used for construction of the highly acknowledged generalized matched filters [44]. Therefore, this solution will further be referred to as the generalized spatio-temporal matched filtering (GSTMF). GSTMF and OSTF will be regarded as reference methods for the assessment of the modified spatio-temporal matched filter (MSTMF), which we propose in this article.

IV. MODIFIED SPATIO-TEMPORAL MATCHED FILTERING

Although the brain responses to the nontarget flashes are very tiny, in [16] it was shown that taking these responses into account during the classifiers' learning (and not only the target responses) improves discrimination between them. To make use of this finding, we construct set Ψnt(δ) using (3) with ki replaced by k′i, i = 1, 2, ..., Int (corresponding to the moments when the nontarget objects are flashed). On this basis, we form matrix A′ that contains the vectors beginning close to these nontarget flashes (x(k), k ∈ Ψnt(δ)). By averaging these vectors, we construct the average nontarget response ā′.

Now, objective function Q(h) takes the following form:

Q(h) = (h^T (ā − ā′)(ā − ā′)^T h) / (h^T Cb h) (12)

and its solution is given by

h = Cb⁻¹ (ā − ā′). (13)

The EEG signals can contain outlying super-Gaussian artifacts, which can have a detrimental influence on the filter template construction. Because of this, we precede the learning phase of the proposed method with a procedure for outlying artifacts rejection.

A. Rejection of Outlying Artifacts (ROA)

The procedure consists of the operation of outlying artifacts detection and the stage of the detected artifacts' suppression.

1) Detection of Outlying Artifacts: The operation is performed in each channel separately. Its first step involves calculation of the 5th and 95th percentiles of all values in the channel. Then, multiplied by d, these percentiles are regarded as the thresholds that limit the negative and positive values accepted, respectively. In the experimental section, we use d = 3 (during initial experiments, for d ∈ [2, 5], we have observed similar performance of EP classification). In each place where the thresholds are crossed, we search for the signal maximum (or minimum), and starting from this position, we search for the artifact onset and offset. The criterion is based on the signal derivative: in order to find the first and the last sample of the artifact, we search for the positions (to the left and to the right from the extremum, respectively) where the derivative polarity changes. All samples that belong to the found artifacts are marked as outlying; for signal x(k) recorded during a single block, we form matrix Z containing the markers, which classify the samples of x(k) as outlying (0) or proper (1):

z_{i,j} = 0 if xj(i) is outlying, and z_{i,j} = 1 if xj(i) is proper. (14)

Simultaneously, the xj(i) values marked as outlying are considered as missing and are replaced with zeros.

2) Missing Values Interpolation: To this end, we apply the iterative interpolation procedure developed in [47] and facilitated as the MATLAB function inpaintn by Garcia. The procedure is based on the discrete cosine transform (DCT) based smoothing proposed in [48]. The values of signal x(k) are stored in matrix X (subsequent channels in subsequent columns of X). Then, Z defined by (14) and X are submitted to the inpaintn procedure. On the basis of the output matrix X̂ (ŷ{k+1} defined in [47, eq. (20)], where {k+1} denotes the number of the iteration), we obtain the signal with outlying artifacts suppressed, x̂(k).

B. Modification of the Decision Rules

The CDR, described in Section III-B, are less effective if the brain responses to the flashes are of changing delays. To overcome this inconvenience, we propose to calculate the average MSTMF responses to the Ir flashes of each object:

ȳo(k) = Σ_{i=1}^{Ir} h^T x(k_i^o + J·τ + k) = Σ_{i=1}^{Ir} y(k_i^o + J·τ + k). (15)

The maximum of this average response to the oth object is expected at k = 0. However, we search for its true position around this point:

ko = argmax_{|k| ≤ rs} ȳo(k) (16)

where variable rs, limiting the range of the search, can be established using rs = Ts·fs (following the preliminary experiments, Ts = 0.1 s has been applied).

The value ȳo(ko) is taken as the classifier score for the oth object. The object that achieves the highest value is regarded as the target one. This approach to STF interpretation will be referred to as the modified decision rules (MDR).

Fig. 2. Results of outlying artifacts suppression: original 64-channel EEG recording; in bold black: the channels with artifacts detected, in gray: the channels accepted.

Fig. 3. Results of outlying artifacts suppression: the modified 64-channel EEG recording; in bold black: the modified channels, in gray: the unchanged channels. The amplitude scale has been changed with respect to Fig. 2.

C. MSTMF Responses as Feature Vectors

To exploit not only the height of the STF responses to the respective objects but also their shape, we take the values ȳo(ko + k), k = −rs, −rs + 1, ..., rs, as the features of the oth object [rs is the parameter used in (16)]. Those feature vectors can undergo classification for better discrimination between the target and nontarget brain responses. Since the amplitudes of the signals at the MSTMF output can vary substantially, we normalize them by dividing by their standard deviation (SD). This operation is accomplished during each individual classification task (for which the normalizing SD value is calculated on the basis of the signals recorded during all Ir flashes of the respective objects). In the experimental section, the SVM with a linear kernel function will be applied for classification. The obtained combination of methods will simply be denoted as MSTMF+SVM (as opposed to MSTMF+CDR and MSTMF+MDR).

Fig. 4. LDA classifier (subplots A, B) versus MSTMF filter (C, D). Vectors are constructed according to (1); vertical dotted lines mark segments of 2J + 1 = 43 values, corresponding to individual EEG channels (whose names are given in subplot A). From the top: A) the difference between the centers of the target and nontarget classes, B) coefficients of the LDA classifier, C) the difference ā − ā′ (calculated to determine MSTMF), and D) the obtained MSTMF template; a.u. stands for arbitrary units.

V. RESULTS AND DISCUSSION

A. Signals Preprocessing

Both in the learning phase and during the STFs' operation, the processed EEG signals undergo prior band-pass filtering [32]. ROA, which is rather time consuming, is applied during the learning phase only. Figs. 2 and 3 show exemplary results of this operation. Fig. 2 contains an original 64-channel EEG recording with noise artifacts; channels detected as noisy are marked in bold black. Exactly the same time frame of the signal is included in Fig. 3; here, the artifacts are significantly reduced (in bold black are the plots of the channels modified).

In dataset BCIAUT-P300, used in this study, the signals are not stored in a continuous way (where all the channels and time samples are accessible). Instead, for each stimulus (a flash of a single object on the screen), a signal segment of 1.4 s length is written down, beginning 200 ms before the flash onset. Such length is insufficient for outlying artifacts rejection. To make this operation possible, all time segments corresponding to the flashes of 8 objects during 10 runs of flashes are first conflated. After this operation, we perform outlying artifacts rejection.

B. Construction of the Modified Spatio-Temporal Matched Filters

We have set parameters τ and J to 6 and 21, respectively, according to the formulas given below (1) and (2), so that the filter length (2·J·τ/fs) covered approximately a 1 s period. Analysis of such a segment is oriented toward detection of the whole group of EPs, not only the P300.

For each subject, we apply all learning data to calculate the mean target (ā) and nontarget (ā′) brain responses, and the covariance matrix Cb. The exemplary difference ā − ā′ is presented in Fig. 4C, and the final MSTMF filter in subplot D. For reference, we have plotted the results of using LDA (in A and B).

Subplots A and C are very similar, with the plot in C being slightly more smooth (as it is explained in Appendix A, for
4, AUGUST 2022 TABLE I dataset used, Δ has hardly any influence on the classification ACCURACY (± STANDARD ERROR OF MEAN (SEM)) OF TARGET STIMULI D U M P , B LDA, results. Thus, a fixed value of Δ = 0.1 s has been applied. ForETECTION SING THE ETHODS ROPOSED THE AYESIAN AND THE METHODS PARTICIPATING IN 2019 IFMBE SCIENTIFIC CHALLENGE (FOR THEIR parameter δ, we performed a kind of cross validation. The filters PRECISE DESCRIPTION SEE [32]) were constructed on the basis of six learning sessions and applied to the seventh one. After rotation of this validation session and averaging of the results, an estimate of the accuracy was obtained for each value of δ. Following, we could select the one for which the best validation accuracy has been achieved. This value has been used in the final tests. BLDA has been applied using different decimation factors [16], and the best results achieved are presented. In the lower part of the table, for reference, the results of 2019 IFMBE Scientific Challenge are provided. Watching the table, we can notice that using the generalized eigendecomposition (7) based OSTF with the CDR gives rather poor results. The large standard error of mean: SEM=8.01, shows that the method could have been fairly effective for some subjects and must have failed for some other ones. When formula (11) is applied in the learning phase, the filter (GSTMF) performs much better, competitively to BLDA and to most a kind of low-pass filtering is introduced). However, the sophisticated methods competing in the challenge reported.δ > 0 final templates constructed are quite different; although we can The high accuracy of such a relatively simple method must be see some similarity of the plots in B and D, the latter is of much perceived as an outcome of the proper estimation of the noise better quality. 
It seems to be well matched to the local properties covariance matrix Cb: on the basis of the whole information of the brain responses, whereas the former is embedded in wide available, contained in the recorded EEG signals. This has band contaminations, and therefore can be expected to be more resulted in a fairly effective filter but the further modifications susceptible to noise. This substantial difference is primarily of a mean nontarget brain response (ā ′) subtraction, and DCT caused by more effective estimation of the noise covariance based outlying artifacts rejection, applied by MSTMF, have still matrix (if compared to the within class scatter matrix of raised its performance.Cb the LDA classifier). However, it is application of the MDR that has resulted in This can be explained in the following way: is estimated most significant improvement of the classification results. TheCb on the basis of spatio–temporal vectors (k) indicated by set reasons of such an outcome are illustrated in Fig. 5. Presented arex Ψsupp(Δ). As it has been expressed below its definition (4), the the MSTMF responses to a block of nine runs of flashes (each run set indicates vectors whose time location is rather distant from consisting of one target and seven nontarget flashes). Individual the vectors maximized (corresponding to target flashes). Thus, filter responses to the successive nine target flashes are presented almost all vectors corresponding to nontarget flashes can be used in subplot A. Below in subplot B are the filter responses to nine for estimation. As it has already been mentioned, the signals flashes of a single nontarget object; we have selected the oneCb we use for experiments consist of 1.4 s long segments, corre- (among seven) that has produced the highest average nontarget sponding to all individual flashes of the respective objects. With MSTMF response. 
Both in A and B, the plots seem to be the sampling frequency fs = 250 Hz and the assumed values of dominated by high energy responses to the spontaneous EEG. and , applying (1) we construct 98 spatio-temporal vectors A correct classification on the basis of such single responsesJ τ for each of such signal segments. All such vectors corresponding does not seem possible. It is their averaging that is necessary to nontarget flashes are used to estimate and also a part of to achieve the goal, and the average of plots from A is drawnCb vectors corresponding to the target ones (their accepted number with blue color in subplot C. We can observe a high amplitude depends on the assumed value of parameter ). By contrast, peak, whose dominant component is the MSTMF response toΔ only 1 vector corresponding to each individual flash (both target the desired EP. What is surprising, however, is that the peak and nontarget) is used to estimate the within class scatter matrix maximum is significantly shifted to the right from k = 0. Thus, in LDA. application of CDR would cause the classification error: the height at k = 0 of the blue target response (marked by the red cross) is lower than the height of the corresponding nontarget C. Brain Responses Classification one in D. Only the search for the maximum of the filter response We have applied all the methods described: OSTF, GSTMF, leads to the correct classification because the height of the green and MSTMF, and as a particular reference: the BLDA [16], i.e., point in C is greater than the corresponding one in D. Thus, in the method that uses Bayesian regularization to overcome the spite of the rather great height of the average target response, it is LDA limitations (the software has been provided by the authors). only application of MDR that allowed to avoid the classification The results are presented in the upper part of Table I. error. 
This observation is confirmed by the increased classification accuracy of MSTMF+MDR, which has risen to above 90%. Using the MSTMF responses as feature vectors for the SVM classifier has allowed us to improve the results only slightly; nevertheless, the increase of accuracy to nearly 91% is worth noticing.

Fig. 5. Operation of the decision rules (MDR versus CDR): A) individual MSTMF responses to target flashes during a block of the successive Ir = 9 runs of flashes [variable k on the horizontal axis is introduced by (15)], B) the MSTMF responses to Ir flashes of a single nontarget object, C) the average target MSTMF responses (the one in blue is the mean of the plots in A; the black ones correspond to different blocks of flashes), D) the average nontarget MSTMF response (the mean of the plots in B). The double-sided arrow shows the region of the search for the maxima of the MSTMF responses. The red crosses mark the scores obtained for k = 0 (according to the CDR), and the green points, the ones obtained using the MDR (the maxima in the specified range of the search). Additional comments are in the text.

For each of our methods, we had to select the values of parameters δ and Δ. It has, however, appeared that for the …

To verify the statistical significance of the differences between the methods tested, we have compared them pairwise, denoting by f12 the number of cases classified correctly by the first and incorrectly by the second of the compared methods, and by f21 the number of opposite cases. On this basis, we have calculated the squared McNemar statistic [49]

z = (f12 − f21)² / (f12 + f21)    (17)

which obeys the chi-square distribution with one degree of freedom [50].

TABLE II. STATISTICAL SIGNIFICANCE (S – SIGNIFICANT, N – NONSIGNIFICANT) OF THE DIFFERENCES BETWEEN THE METHODS TESTED; LETTER S MEANS THAT THE P-VALUE < 0.001.

The results obtained are presented in Table II. For all the methods tested, we have confirmed the statistical significance of the improvement with respect to the original OSTF method. This confirms the inadequacy of using the eigendecomposition-based approach for such noisy signals.

Comparing MSTMF (applied with the MDR or with the SVM classifier) with the reference method (BLDA), we can notice a statistically significant improvement of the classification accuracy. This seems surprising, because BLDA uses advanced Bayesian regularization for the estimation of the classifier template. However, the need to apply regularization results from the abovementioned limitations of the LDA-based approaches, and although Bayesian regularization can effectively solve the problem of singularity of the scatter matrix to be inverted, it does not solve the problem of the changing delays of the brain responses.

Thus, the most significant improvement of MSTMF accuracy is related to the modification of the decision rules. Averaging of the MSTMF responses to the individual flashed objects, and the search for the maxima of such average responses applied in the MDR, have allowed us to raise the classification accuracy to above 90% and have made the MSTMF method not only significantly better than the reference BLDA, but even comparable to the winner of the 2019 challenge, i.e., to the convolutional neural network based on the EEGNet software [31]. Moreover, compared to the second best competitor of the challenge, MSTMF appeared significantly more effective. This justifies using the proposed modifications, which are aimed at extracting all the information from the whole recorded EEG signal.

TABLE III. COMPUTATIONAL TIMES (± STD).

The introduced modifications inevitably increase the computational costs of the methods. To illustrate this, in Table III we have provided the times necessary to accomplish the learning stage and the operation stage of the selected versions of the STF method. The learning and testing times shown in Table III are the subject-averaged execution times. It should be noted that for each subject the total length of the learning signals is approximately 37.3 min, and the length of the test signals is around 56 min. As far as the learning stage is concerned, we can notice rather great differences between the respective methods. However, the operation stage is much faster and should not preclude any of the studied versions of the STF method from practical applications. For every method mentioned in Table III, the analysis of a 1.4 s long segment of the signal (related to one event) is executed in around 3 ms.

One of the fundamental questions related to the development of real working BCI systems is how many EEG channels are required to achieve satisfactory results. To give some insight into the matter, we have performed the following experiment. For MSTMF, we decreased the number of leads available (using the same cross-validation as for the selection of parameter δ) and calculated the achievable classification accuracy. Fig. 6.A presents the results obtained. It reveals that it is advantageous to measure at least six channels, since for five of them the accuracy discernibly drops. Another question is how densely the brain responses should be resampled, in the constructed spatio-temporal vectors, to achieve good classification results. This is related to the value of parameter τ. In Fig. 6.B, we can notice that for τ growing above the value of 10, the classification accuracy decreases discernibly. The applied value of τ = 6, calculated according to the formula given below (1), seems to be a good compromise between the computational costs of the method and its achievable classification accuracy.

Fig. 6. Illustration of the factors that influence the classification accuracy of the MSTMF-based BCI system: A) accuracy as a function of the number of channels, B) accuracy as a function of τ.

The proper estimation of the noise covariance matrix is highly important. It not only allows us to construct a filter with visually good properties, like the one presented in Fig. 4, but also assures effective suppression of stationary CGN of even extremely high energy. A problem can arise if the noise in the test signals changes its properties with respect to the noise in the learning data. Such a situation must be taken into account, because we cannot assume noise stationarity across different sessions, or even over long time intervals. A kind of remedy could be found if we had a rather large learning database. This would allow us to construct many filters matched to different local noise properties. Then, a weighted sum of the obtained ensemble of filters could be effective for the noise present in the test signals. The development of such an ensemble of filters seems to be a promising direction for future study.

However, we expect that it can be even more promising to apply the MDR in already developed EP classifiers. Taking into account possible changes of the brain response time delays (with respect to the stimuli), and looking for their correct localization, corresponding to the locally highest scores of the classifiers, should help to raise the accuracy of many existing algorithms.

VI. CONCLUSION

In this study, we have proposed the application of spatio-temporal filtering to the classification of visual EPs elicited by target and nontarget stimuli. We have shown that using the whole recorded EEG signal as a source of information, instead of only the identically located brain responses to those stimuli, significantly improves the classification accuracy.

There are two major factors that contribute to this outcome. The first consists in the appropriate construction of the STF, and the second, in the proper interpretation of the filtering results. To meet the first requirement, we have rejected the limitation of using only the target and nontarget brain responses to estimate the noise covariance matrix. Instead, we have used all the data satisfying the necessary conditions, solving in this simple way the problem of singularity of the matrix to be inverted. Moreover, we have proposed outlying artifacts rejection before the filter construction, which has further improved its performance. Applied with the CDR, the developed MSTMF filter has allowed us to obtain a relatively high accuracy of EP classification.

To meet the second requirement, we have addressed the problem of the changing delays of the brain responses with respect to the visual stimuli. A delayed brain response at the MSTMF input inevitably results in a delayed filter response at its output. Therefore, it appears beneficial to look for the maxima of these responses. This search is preceded by averaging of the MSTMF responses to the flashes of the respective objects. Since averaging of the target responses causes their enhancement, it helps to find the correct maxima. These operations, referred to as the MDR, have significantly increased the EP classification accuracy.

With the very limited costs of MSTMF operation, we believe that our method can contribute to the construction of more reliable and faster brain-computer interfaces.

APPENDIX A
ON THE NOISE INFLUENCE ON STFs CONSTRUCTION

In the following, we consider the filters defined by (11), related directly to the acknowledged generalized matched filtering (GMF). The GMF allows for the detection of a known signal embedded in a CGN [44].

In the problem studied, the noise component is a complicated mixture produced mostly by the ongoing brain activity, but also by noncortical biologic sources (e.g., eye blinks and movements, or muscle and heart contractions) and environmental sources (e.g., powerline, radio, and electrical interference) [51]. Some of these components, like the sinusoidal powerline interference, are sub-Gaussian (with a distribution flatter than Gaussian [52]), and some are super-Gaussian (sharper, with so-called "heavy" tails, associated with a relatively high probability of large-amplitude peaks). This non-Gaussianity allows their separation using independent component analysis [52].

However, the recorded and processed signals are mixtures of such components, and owing to the central limit theorem, we can make a simplifying assumption of their Gaussianity, unless one of these components has a predominant amplitude. Because of the strong correlation between the EEG channels and between the successive signal samples, we can assume the noise to be colored. The constructed average pattern ā can be expressed as

ā = (1/na) Σ_{i=1}^{na} (s + wi) = s + w̄    (18)

where s is the fixed desired signal and wi the additive noise. To achieve noise suppression, the noise vectors wi should be [53]:
1) of zero mean (E{wki} = 0, where wki is the kth entry of vector wi and E denotes expectation);
2) uncorrelated with the desired signal (E{sn · wki} = 0);
3) uncorrelated among themselves (E{wki · wnj} = 0 for i ≠ j).

It should be noted that for Gaussian signals uncorrelatedness is equivalent to independence, the latter being in general a stronger condition.

Since initially all signals are band-pass filtered, the zero-mean assumption is naturally satisfied. Moreover, since the noise component is not related to the moments at which the EPs are triggered, we can assume the noise vectors to be independent of the desired signal. When parameter δ of set Ψt [defined by (3)] equals 0, matrix A contains a single spatio-temporal vector for each target flash. Thus, for relatively large distances between the flashes, we can also assume the noise vectors wi to be independent of each other (and consequently uncorrelated).

It follows that for δ = 0 we can expect effective suppression of the noise as a result of averaging. However, for δ > 0, matrix A contains many successive signal vectors related to the same target flash: {…, x(k + Jτ − 1), x(k + Jτ), x(k + Jτ + 1), …}; thus, we cannot expect the corresponding wi vectors to be uncorrelated. Nevertheless, their averaging will result in a kind of low-pass filtering, which can additionally decrease the level of noise in the constructed pattern ā.

However, to avoid a strong violation of the assumption concerning the fixed shape of the desired component, which is expressed by (18), the value of δ should not be large. Actually, it should be emphasized that the concept of using δ > 0 can only be applied because of the very low frequency content of the visual EPs (which are additionally smoothed by the applied band-pass filtering).

REFERENCES

[1] L. A. Farwell and E. Donchin, "Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials," Electroencephalogr. Clin. Neurophysiol., vol. 70, no. 6, pp. 510–523, 1988.
[2] S. Sutton, M. Braren, J. Zubin, and E. R. John, "Evoked potential correlates of stimulus uncertainty," Science, vol. 150, pp. 1187–1188, 1965.
[3] Q. T. Obeidat, T. A. Campbell, and J. Kong, "Introducing the edges paradigm: A P300 brain–computer interface for spelling written words," IEEE Trans. Human-Mach. Syst., vol. 45, no. 6, pp. 727–738, Dec. 2015.
[4] S. Noorzadeh, B. Rivet, and C. Jutten, "3-D interface for the P300 speller BCI," IEEE Trans. Human-Mach. Syst., vol. 50, no. 6, pp. 604–612, Dec. 2020.
[5] J. Qu et al., "A novel three-dimensional P300 speller based on stereo visual stimuli," IEEE Trans. Human-Mach. Syst., vol. 48, no. 4, pp. 392–399, Aug. 2018.
[6] P. Stegman, C. S. Crawford, M. Andujar, A. Nijholt, and J. E. Gilbert, "Brain–computer interface software: A review and discussion," IEEE Trans. Human-Mach. Syst., vol. 50, no. 2, pp. 101–115, Apr. 2020.
[7] A. Cruz, G. Pires, A. Lopes, C. Carona, and U. J. Nunes, "A self-paced BCI with a collaborative controller for highly reliable wheelchair driving: Experimental tests with physically disabled individuals," IEEE Trans. Human-Mach. Syst., vol. 51, no. 2, pp. 109–119, Apr. 2021.
[8] A. Nourmohammadi, M. Jafari, and T. O. Zander, "A survey on unmanned aerial vehicle remote control using brain–computer interface," IEEE Trans. Human-Mach. Syst., vol. 48, no. 4, pp. 337–348, Aug. 2018.
[9] A. Rakotomamonjy and V. Guigue, "BCI competition III: Dataset II – ensemble of SVMs for BCI P300 speller," IEEE Trans. Biomed. Eng., vol. 55, no. 3, pp. 1147–1154, Mar. 2008.
[10] M. Thulasidas, C. Guan, and J. Wu, "Robust classification of EEG signal for brain-computer interface," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 14, no. 1, pp. 24–29, Mar. 2006.
[11] S. Kundu and S. Ari, "P300 detection with brain–computer interface application using PCA and ensemble of weighted SVMs," IETE J. Res., vol. 64, no. 3, pp. 406–416, 2018.
[12] Y.-R. Lee and H.-N. Kim, "A data partitioning method for increasing ensemble diversity of an eSVM-based P300 speller," Biomed. Signal Process. Control, vol. 39, pp. 53–63, 2018.
[13] C. Guger et al., "How many people are able to control a P300-based brain-computer interface (BCI)?," Neurosci. Lett., vol. 462, pp. 94–98, 2009.
[14] D. J. Krusienski et al., "Toward enhanced P300 speller performance," J. Neurosci. Methods, vol. 167, no. 1, pp. 15–21, Jan. 2008.
[15] B. Blankertz et al., "Single-trial analysis and classification of ERP components – a tutorial," NeuroImage, vol. 56, no. 2, pp. 814–825, May 2011.
[16] U. Hoffmann, J. Vesin, T. Ebrahimi, and K. Diserens, "An efficient P300-based brain computer interface for disabled subjects," J. Neurosci. Methods, vol. 167, no. 1, pp. 115–125, 2008.
[17] Y. Miao et al., "An ERP-based BCI with peripheral stimuli: Validation with ALS patients," Cogn. Neurodynamics, vol. 14, pp. 21–33, 2020.
[18] Y. Zhang, G. Zhou, J. Jin, Q. Zhao, X. Wang, and A. Cichocki, "Sparse Bayesian classification of EEG for brain–computer interface," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 11, pp. 2256–2267, Nov. 2016.
[19] Q. Wu, Y. Zhang, J. Liu, J. Sun, A. Cichocki, and F. Gao, "Regularized group sparse discriminant analysis for P300-based brain–computer interface," Int. J. Neural Syst., vol. 29, no. 6, 2019, Art. no. 1950002.
[20] B. Rivet, A. Souloumiac, V. Attina, and G. Gibert, "xDAWN algorithm to enhance evoked potentials: Application to brain–computer interface," IEEE Trans. Biomed. Eng., vol. 56, no. 8, pp. 2035–2043, Aug. 2009.
[21] F. S. Rizi, V. Abootalebi, and M. T. Sadeghi, "Spatial and spatio-temporal filtering based on common spatial patterns and Max-SNR for detection of P300 component," Biocybernetics Biomed. Eng., vol. 37, no. 3, pp. 365–372, 2017.
[22] X. Xiao, M. Xu, J. Jin, Y. Wang, T.-P. Jung, and D. Ming, "Discriminative canonical pattern matching for single-trial classification of ERP components," IEEE Trans. Biomed. Eng., vol. 67, no. 8, pp. 2266–2275, Aug. 2020.
[23] X. Xiao et al., "Enhancement for P300-speller classification using multi-window discriminative canonical pattern matching," J. Neural Eng., vol. 18, no. 4, 2021, Art. no. 046079.
[24] D. J. Krusienski et al., "A comparison of classification techniques for the P300 speller," J. Neural Eng., vol. 3, no. 4, pp. 299–305, 2006.
[25] H. Cecotti and A. Graser, "Convolutional neural networks for P300 detection with application to brain-computer interfaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 3, pp. 433–445, Mar. 2011.
[26] L. Vařeka, "Evaluation of convolutional neural networks using a large multi-subject P300 dataset," Biomed. Signal Process. Control, vol. 58, 2020, Art. no. 101837.
[27] O. Y. Kwon, M. H. Lee, C. Guan, and S. W. Lee, "Subject-independent brain–computer interfaces based on deep convolutional neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 10, pp. 3839–3852, Oct. 2020.
[28] D. Coyle, G. Prasad, and T. M. McGinnity, "Extracting features for a brain-computer interface by self-organising fuzzy neural network-based time series prediction," in Proc. 26th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., San Francisco, USA, 2004, pp. 4371–4374.
[29] W. Kong et al., "Weighted extreme learning machine for P300 detection with application to brain computer interface," J. Ambient Intell. Humanized Comput., pp. 1–11, 2018.
[30] Z. Jin et al., "EEG classification using sparse Bayesian extreme learning machine for brain–computer interface," Neural Comput. Appl., vol. 32, pp. 6601–6609, 2020.
[31] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, "EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces," J. Neural Eng., vol. 15, no. 5, 2018, Art. no. 056013.
[32] M. Simões et al., "BCIAUT-P300: A multi-session and multi-subject benchmark dataset on autism for P300-based brain-computer-interfaces," Front. Neurosci., vol. 14, 2020, Art. no. 568104.
[33] M. Liu, W. Wu, Z. Gu, Z. Yu, F. Qi, and Y. Li, "Deep learning based on batch normalization for P300 signal detection," Neurocomputing, vol. 275, pp. 288–297, 2018.
[34] F. Li, X. Li, F. Wang, D. Zhang, Y. Xia, and F. He, "A novel P300 classification algorithm based on a principal component analysis-convolutional neural network," Appl. Sci., vol. 10, no. 4, 2020, Art. no. 1546.
[35] M. Kotas, J. Jeżewski, K. Horoba, and A. Matonia, "Application of spatio-temporal filtering to fetal electrocardiogram enhancement," Comput. Methods Progr. Biomed., vol. 104, pp. 1–9, 2010.
[36] J. Giraldo-Guzmán, M. Kotas, F. Castells, S. H. Contreras-Ortiz, and M. Urina-Triana, "Estimation of PQ distance dispersion for atrial fibrillation detection," Comput. Methods Progr. Biomed., vol. 208, 2021, Art. no. 106167.
[37] S. Lemm, B. Blankertz, G. Curio, and K.-R. Müller, "Spatio-spectral filters for improving the classification of single trial EEG," IEEE Trans. Biomed. Eng., vol. 52, no. 9, pp. 1541–1548, Sep. 2005.
[38] G. Dornhege, B. Blankertz, M. Krauledat, F. Losch, G. Curio, and K. Müller, "Combined optimization of spatial and temporal filters for improving brain-computer interfacing," IEEE Trans. Biomed. Eng., vol. 53, no. 11, pp. 2274–2281, Nov. 2006.
[39] F. Qi, Y. Li, and W. Wu, "RSTFC: A novel algorithm for spatio-temporal filtering and classification of single-trial EEG," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3070–3082, Dec. 2015.
[40] A. Jiang, J. Shang, X. Liu, Y. Tang, H. K. Kwan, and Y. Zhu, "Efficient CSP algorithm with spatio-temporal filtering for motor imagery classification," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 28, no. 4, pp. 1006–1016, Apr. 2020.
[41] F. Qi et al., "Spatiotemporal-filtering-based channel selection for single-trial EEG classification," IEEE Trans. Cybern., vol. 51, no. 2, pp. 558–567, Feb. 2021.
[42] J. Lu, K. Xie, and D. J. McFarland, "Adaptive spatio-temporal filtering for movement related potentials in EEG-based brain–computer interfaces," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 22, no. 4, pp. 847–857, Jul. 2014.
[43] Y. Zhang, G. Zhou, Q. Zhao, J. Jin, X. Wang, and A. Cichocki, "Spatial-temporal discriminant analysis for ERP-based brain-computer interface," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 21, no. 2, pp. 233–243, Mar. 2013.
[44] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory (Signal Processing Series), vol. 2. Hoboken, NJ, USA: Prentice-Hall, 1998.
[45] R. Bro, E. Acar, and T. G. Kolda, "Resolving the sign ambiguity in the singular value decomposition," J. Chemometrics, vol. 22, no. 2, pp. 135–140, 2008.
[46] G. D. Dawson, "A summation technique for detecting small signals in a large irregular background," J. Physiol., vol. 115, no. 1, 1951.
[47] D. Garcia, "Robust smoothing of gridded data in one and higher dimensions with missing values," Comput. Statist. Data Anal., vol. 54, no. 4, pp. 1167–1178, 2010.
[48] M. J. Buckley, "Fast computation of a discretized thin-plate smoothing spline for image data," Biometrika, vol. 81, no. 2, pp. 247–258, 1994.
[49] G. M. Foody, "Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy," Photogrammetric Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, 2004.
[50] A. Agresti, An Introduction to Categorical Data Analysis. Hoboken, NJ, USA: Wiley, 2018.
[51] M. N. Anastasiadou, M. Christodoulakis, E. S. Papathanasiou, S. S. Papacostas, and G. D. Mitsis, "Unsupervised detection and removal of muscle artifacts from scalp EEG recordings using canonical correlation analysis, wavelets and random forests," Clin. Neurophysiol., vol. 128, no. 9, pp. 1755–1769, 2017.
[52] R. Vigario, J. Sarela, V. Jousmiki, M. Hamalainen, and E. Oja, "Independent component approach to the analysis of EEG and MEG recordings," IEEE Trans. Biomed. Eng., vol. 47, no. 5, pp. 589–593, May 2000.
[53] E. J. Berbari, S. M. Collins, and R. Arzbaecher, "Evaluation of esophageal electrodes for recording His-Purkinje activity based upon signal variance," IEEE Trans. Biomed. Eng., vol. BME-33, no. 10, pp. 922–928, Oct. 1986.

Marian P. Kotas received the M.Sc., Ph.D., and D.Sc. degrees in biomedical electronics from the Silesian University of Technology, Gliwice, Poland, in 1991, 1996, and 2012, respectively. He is currently an Associate Professor with the Department of Cybernetics, Nanotechnology and Data Processing, Silesian University of Technology. His current research interests include linear and nonlinear filtering of biomedical signals, multivariate data processing, and pattern recognition.

Michal Piela received the B.S. and M.Sc. degrees in biomedical engineering from the University of Science and Technology, Kraków, Poland, in 2010 and 2011, respectively. He is currently working toward the Ph.D. degree with the Department of Cybernetics, Nanotechnology and Data Processing, Silesian University of Technology, Gliwice, Poland. His current research interests include biomedical signal processing, brain-computer interfaces, and evoked potentials detection.

Sonia H. Contreras-Ortiz received the B.S. and M.Sc. degrees in electronic engineering from Universidad Industrial de Santander, Bucaramanga, Colombia, in 2001 and 2004, respectively, and the Ph.D. degree in biomedical engineering from the University of Connecticut, Storrs, CT, USA, in 2011. She is currently the Director of the Biomedical Engineering program with Universidad Tecnológica de Bolívar, Cartagena de Indias, Colombia. Her research interests include biosignal and medical image processing and analysis.