EEG Data



Acknowledgments for this data should be made to Henri Begleiter at the Neurodynamics Laboratory at the State University of New York Health Center at Brooklyn.

The data is being stored at the University of California, Irvine, at

References to this data should include

%A L. Ingber
%T EEG Database
%I UCI Machine Learning Repository
%C Irvine, CA
%D 1997


Tables of ASA-fitted parameters and 60 PostScript figures, each figure containing four 6-electrode graphs of potential and CMI for testing and training data sets, for each of the 10 subjects for each of 3 experimental paradigms, are in


This data, recorded from 64 electrodes in 3.9-msec epochs spanning 1 sec, is being provided as is. It may be deleted from this archive without notice, or moved to other archives with the only notice given in this file.

Sorry, I cannot assume the task of offering code to automate downloading, re-formatting or filtering of this data for your project; please consult your local computer experts for such help. The only code that I make publicly available and support is ASA (Adaptive Simulated Annealing), available at no charge from the archive. Contact me at [email protected] re consulting/contracting for modeling/optimizing data/systems. Terms of use and downloading policies are discussed in the file

I am not an expert in the clinical aspects of the alcoholism studies which generated this data, and therefore cannot offer advice on these clinical paradigms. For many studies, it will suffice that the data is clean raw EEG, and this should be useful to many people.

All data is compressed in gzip (.gz) format, resulting in a reduction of size of files by about a factor of four. See utils_file_formats.txt for some links to information on gzip, PostScript, PDF, tar, and shar utilities.
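
As an illustration only (this sketch is not part of the archive and is not supported code), a gzip'd file can be read directly with the Python standard library, without expanding it on disk first. The function name read_gz_lines and the ASCII encoding are assumptions made for this example.

```python
import gzip


def read_gz_lines(path, encoding="ascii"):
    """Return the decompressed text lines of a gzip'd file, without newlines.

    The encoding is an assumption; undecodable bytes are replaced rather
    than raising an error.
    """
    with gzip.open(path, mode="rt", encoding=encoding, errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]
```

For example, read_gz_lines("co2c0000337/co2c0000337.rd.047.gz") would return the lines of that run as a list of strings, assuming the tar'd directory has already been expanded.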


The data used in for the 3 matching paradigms, c_1 (one presentation only), c_m (match to previous presentation) and c_n (no-match to previous presentation), for the 2 subjects, alcoholic a_co2a0000364 and control c_co2c0000337, 10 runs each, may be downloaded as a tar'd directory containing 6 files, [4000K]


The complete data set used as described, 10 alcoholic and 10 control subjects, 10 runs per subject per paradigm, was used to produce a set of training results.

A follow-up study described in used the same 10 alcoholic and 10 control subjects, but with 10 out-of-sample runs per subject per paradigm (using a small percentage of training data to fill out 10 runs per subject per paradigm for 6 of the 60 cases), producing a set of test results.

This data can be downloaded as [39000K] [39000K]

Examine the header in each file to determine under which paradigm the run took place.
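
A minimal Python sketch of examining a header, offered as an illustration and not as supported code. It assumes that the header consists of the leading lines of the file and that each header line begins with "#"; neither detail is stated in this file, so check a sample file before relying on it. The function name read_header is hypothetical.

```python
import gzip


def read_header(path, marker="#"):
    """Collect the leading lines of a gzip'd file that start with `marker`.

    Assumption: header lines are the first lines of the file and begin
    with "#". Reading stops at the first line that does not.
    """
    header = []
    with gzip.open(path, mode="rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if not line.startswith(marker):
                break
            header.append(line.rstrip("\n"))
    return header
```

Printing the returned lines for one file from each subject is a quick way to see how the paradigm is recorded before automating anything further.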


Each of the 11,075 typical gzip'd files in the 122 tar'd directories is about 65K. The entire set of data is about 700 MBytes. For example, a single file obtained from expanding tar'd directory co2c0000337.tar.gz would be co2c0000337/co2c0000337.rd.047.gz, where co[23]c... represents a control subject and co[23]a... represents an alcoholic subject. Examine the header in each file to determine under which paradigm the run took place. All these classifications for purposes of calculation are easily automated.
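
The control/alcoholic classification by name can be sketched in Python as follows. This is an illustration under the naming pattern described above (co[23]c... or co[23]a...), not supported code; the function name subject_group is hypothetical.

```python
import re

# "co", then 2 or 3, then "a" (alcoholic) or "c" (control), then digits.
SUBJECT_RE = re.compile(r"co[23]([ac])\d+")


def subject_group(name):
    """Return "alcoholic", "control", or None for a file or directory name."""
    match = SUBJECT_RE.search(name)
    if match is None:
        return None
    return "alcoholic" if match.group(1) == "a" else "control"
```

For example, subject_group("co2c0000337/co2c0000337.rd.047.gz") gives "control" and subject_group("co2a0000364") gives "alcoholic"; a name that does not fit the pattern gives None.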

NOTE: There are 17 trials with empty files in co2c1000367. Some trials have "err" notices, e.g., search/grep for "err" and see "S2 match err" or "S2 nomatch err" etc.
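
A minimal Python sketch for flagging such files before analysis, offered as an illustration and not as supported code. It treats a file as empty if it has zero bytes or decompresses to only whitespace, and it reports any line containing "err", which mirrors a plain grep and may also match unrelated text. The function name scan_trial_file is hypothetical.

```python
import gzip
import os


def scan_trial_file(path):
    """Return (is_empty, err_lines) for one gzip'd trial file.

    is_empty is True for a zero-byte file or one with no non-blank lines.
    err_lines holds every line containing the substring "err".
    """
    if os.path.getsize(path) == 0:
        return True, []
    with gzip.open(path, mode="rt", encoding="ascii", errors="replace") as handle:
        lines = [line.rstrip("\n") for line in handle]
    err_lines = [line for line in lines if "err" in line]
    is_empty = not any(line.strip() for line in lines)
    return is_empty, err_lines
```

Running this over every file in an expanded directory and skipping those that are empty or have a non-empty err_lines list is one way to exclude the problem trials.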

Access to all these tar'd directories is available from

The full dataset in one tarfile can be downloaded as [701M]


Lester Ingber <[email protected]>
