The PAF Prediction Challenge Database

This database is described in

Moody GB, Goldberger AL, McClennen S, Swiryn SP. Predicting the Onset of Paroxysmal Atrial Fibrillation: The Computers in Cardiology Challenge 2001. Computers in Cardiology 28:113-116 (2001).

Please cite this publication when referencing this material, and also include the standard citation for PhysioNet:

Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 [Circulation Electronic Pages]; 2000 (June 13).

Update (16 March 2001): Four of the records in the learning set have been replaced; these are p02, n24, n47, n48. Thanks to Isaac Henry, Christoph Maier, Joseph Mietus, and Juan Millet for their timely and valuable feedback on this database.

This database of two-channel ECG recordings has been created for use in the Computers in Cardiology Challenge 2001, an open competition with the goal of developing automated methods for predicting paroxysmal atrial fibrillation (PAF). See the challenge announcement for information about the competition, and see Predicting Onset of Atrial Fibrillation for a brief overview of the clinical problem, its significance, and suggestions for further reading on the subject.

The database is divided into a learning set (records with names of the form n* and p*) and a test set (records with names of the form t*).

If you wish to download all of the files in this directory without selecting each one individually, try using a utility for batch HTTP transfers such as wget, which is available in source form for all versions of UNIX and as a precompiled binary for MS-Windows. Most Linux distributions include wget. Once you have installed wget, retrieve these files using

wget -r -np URL

where URL is the address of this directory (or the corresponding address on a nearby PhysioNet mirror). The files in this directory occupy about 195 megabytes.

The learning set consists of 50 record sets. Each record set contains two 30-minute records with consecutive record names (e.g., p15 and p16), and two 5-minute "continuation" records with names ending in c (e.g., p15c and p16c). All four records in each record set are excerpts of longer continuous ECG recordings of a single subject; the 50 record sets come from 48 different subjects.

The records with names beginning with p come from subjects who have paroxysmal atrial fibrillation (PAF). The second (even-numbered) record in each pair of 30-minute records contains the ECG immediately preceding an episode of PAF, which can be verified by examining the like-numbered continuation record. Thus, for example, record p16 immediately precedes the episode of PAF in record p16c. The first (odd-numbered) record of the set (for example, record p15) contains 30 minutes of the ECG during a period that is distant from any episode of PAF (there is no PAF during the 45 minutes before the beginning of the 30-minute record or the 45 minutes after its end). The corresponding 5-minute continuation record (e.g., record p15c) shows that at least the minutes immediately following the "PAF-distant" record do not contain PAF. Note: a few of the 30-minute records in this group may contain very short bursts of PAF that escaped notice while the learning set was being compiled.

The records with names beginning with n come from subjects who do not have documented atrial fibrillation, either during the period from which the records were excerpted or at any other time. The subjects include healthy controls, patients referred for long-term ambulatory ECG monitoring, and patients in intensive care units.
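The naming rules for the learning set can be sketched in Python as follows. This is only an illustration, not part of the software distributed with the database: the function names describe_record and record_set are hypothetical, and the sketch assumes two-digit record numbers, as in the examples above (p15, n24).

```python
import re

# Learning-set names: a group letter (n or p), a two-digit number,
# and an optional trailing "c" for 5-minute continuation records.
# The two-digit width is an assumption based on names such as p02 and n24.
_NAME = re.compile(r"^([np])(\d{2})(c?)$")


def describe_record(name):
    """Return a dict describing a learning-set record name."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError("not a learning-set record name: %r" % name)
    group, digits, cont = m.groups()
    number = int(digits)
    info = {
        "group": group,                 # "p" = PAF subject, "n" = no documented AF
        "number": number,
        "continuation": cont == "c",    # True for 5-minute continuation records
        "role": None,                   # only defined for the p group
    }
    if group == "p":
        # Odd-numbered 30-minute records are distant from PAF;
        # even-numbered ones immediately precede an episode of PAF.
        # For a continuation record, the role is that of the record it continues.
        info["role"] = "PAF-distant" if number % 2 else "pre-PAF"
    return info


def record_set(name):
    """Return the four record names in the record set containing `name`."""
    info = describe_record(name)
    odd = info["number"] if info["number"] % 2 else info["number"] - 1
    first = "%s%02d" % (info["group"], odd)
    second = "%s%02d" % (info["group"], odd + 1)
    return [first, second, first + "c", second + "c"]
```

For example, record_set("p16c") lists p15, p16, p15c, and p16c, and describe_record("p16")["role"] is "pre-PAF".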

The test set is similarly constructed of 50 record sets (from 50 different subjects); unlike the learning set, there are no continuation records. The test set records are named t01, t02, ... t100. As in the learning set, pairs of consecutively numbered records come from the same long-term ECG recording of a single subject. Approximately half of the record sets in the test set come from subjects with PAF; part 1 of the challenge is to identify these record sets, and part 2 is to identify which record in each pair immediately precedes PAF.
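The pairing of test-set records can be enumerated with a short Python sketch. The function name test_pairs is hypothetical; the sketch assumes that each pair starts on an odd number (t01 with t02, t03 with t04, and so on), by analogy with the learning-set examples, and that names are zero-padded to two digits except for t100.

```python
def test_pairs(n_sets=50):
    """Return the test-set record names grouped into consecutive pairs.

    Assumes pairs begin on odd numbers: (t01, t02), (t03, t04), ...
    """
    pairs = []
    for k in range(1, n_sets + 1):
        first = "t%02d" % (2 * k - 1)
        second = "t%02d" % (2 * k)   # "%02d" leaves 100 as "100"
        pairs.append((first, second))
    return pairs
```

A classifier for part 1 of the challenge would assign one label to each pair returned here; part 2 would select one record from each pair judged to come from a subject with PAF.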

Several files are associated with each record. The files with names of the form *.dat contain the digitized ECGs (16 bits per sample, least significant byte first in each pair, 128 samples per signal per second, samples from each channel alternating, nominally 200 A/D units per millivolt). The .hea files are (text) header files that specify the names and formats of the associated signal files; these header files are needed by the software available from this site. The .qrs files are machine-generated (binary) annotation files, provided for the convenience of those who do not wish to use their own QRS detectors. Please note that the .qrs files are unaudited and contain errors. You may wish to correct these errors (if you do, please send your corrections to us). Otherwise, you may use these annotations in uncorrected form if you wish to investigate methods of PAF prediction that are robust with respect to small numbers of QRS detection errors, or you may ignore these annotations entirely and work directly from the signal files.
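For those who prefer to work directly from the signal files, the .dat layout described above can be read with a few lines of Python. This is a minimal sketch using only the standard library, not a replacement for the software available from this site. It assumes that samples are signed (two's complement) 16-bit integers and that zero A/D units corresponds to zero millivolts; the .hea file for each record is the authoritative source for the actual gain and baseline. The names read_dat, to_millivolts, and duration_seconds are hypothetical.

```python
import struct

SAMPLING_HZ = 128      # samples per signal per second
ADU_PER_MV = 200.0     # nominal A/D units per millivolt
N_CHANNELS = 2         # two-channel ECG, samples interleaved


def read_dat(path):
    """Read an interleaved two-channel .dat file; return (ch0, ch1) in A/D units."""
    with open(path, "rb") as f:
        raw = f.read()
    frame_size = 2 * N_CHANNELS              # 2 bytes per sample, one sample per channel
    n_frames = len(raw) // frame_size        # ignore any trailing partial frame
    # "<h" = little-endian (least significant byte first) signed 16-bit integer.
    values = struct.unpack("<%dh" % (n_frames * N_CHANNELS),
                           raw[:n_frames * frame_size])
    ch0 = list(values[0::N_CHANNELS])        # even positions: first channel
    ch1 = list(values[1::N_CHANNELS])        # odd positions: second channel
    return ch0, ch1


def to_millivolts(samples, gain=ADU_PER_MV):
    """Convert A/D units to millivolts, assuming a baseline of zero."""
    return [s / gain for s in samples]


def duration_seconds(samples, fs=SAMPLING_HZ):
    """Length of one channel in seconds."""
    return len(samples) / fs
```

At 128 samples per second, a 30-minute record should contain about 230,400 samples per channel, which gives a quick sanity check on a file read this way.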

You may wish to begin your exploration of this database using the PhysioBank ATM, which allows you to view any of these records using your web browser (no special software is required). PhysioToolkit includes a large amount of free software that may be useful for studying these records further.


Special thanks to Steven Swiryn of Northwestern University, who provided many of the recordings excerpted here.