HDFQS is a file format for Quantified Self data storage, which is:
- Easy to back up
- Based on an open and standard format
Many health monitors (e.g. FitBit) currently support exporting data in CSV or XML format. While these are open standard formats, they are generally much slower, and take up more space, when compared with binary file formats. They are also harder to search for data by time. Using Excel or OpenOffice allows for more ease of plotting data, however, they also suffer from problems with speed when the volume of data grows large.
Each type of data (e.g. Heart Rate, Accelerometer) is stored in a separate table. All tables have a time column, a timezone column, and any number of additional columns for the data values. Many data types will likely have just one additional column (e.g. a column for heart rate, weight, etc). Data types marking events (e.g. step times) will have no additional columns. Other data types (e.g. 3-axis accelerometer) will have multiple additional columns (e.g. x, y, z).
Time is stored as ns since the epoch, in UTC.
The timezone column specifies the timezone in which the data is collected. This allows calculation of the local time the data was collected, if required. Timezone is specified as the number of 15 minute blocks from UTC. This enables handling of certain timezones which are 15 minutes from a standard timezone, and minimizes the space required to store the value.
An HDFQS file can contain multiple tables. They are located in a hierarchical tree, similar to a filesystem.
The first level of hierarchy above the root is the location. All data from wearable devices are in the location “self”. Other data from fixed locations, such as temperature loggers in a particular room or building, would be in its respective location (e.g. “livingroom” or “office”).
The second level of hierarchy is the category. While any categories may be created, the basic categories which should cover most data sources are:
- Activity – anything related to movement, such as accelerometer, gyroscope.
- Environment – measurements of the environment such as temperature, humidity.
- Health – health-related data such as heart rate, ECG.
- Location – location data such as GPS, Wi-Fi networks in range.
- Social – data related to interactions with other people, such as dialed/received calls, bluetooth networks in range.
The data tables are located within the categories. They are named for what they measure (e.g. “temperature” or “ecg”), with common abbreviations used (e.g. “hr” for heart rate). As multiple devices can measure the same variable, the device name can optionally be appended with an underscore (e.g. “calories_gwf” or “calories_fitbit”).
Data in HDFQS can be spread over multiple HDF5 files. New data can either be appended to the most recent file, or a new file can be created. This allows for incremental backups, instead of dealing with a single huge file. Any backup tool may be used, but git annex is recommended, as it maintains checksums to protect the integrity of the individual files.