Introduction to Dummydata¶

Dummydata is a package that allows to generate geospatial data fields with predefined statistical properties and store these as netCDF files.

Installation¶

Currently the package is available from github and can be installed in addition via pip or conda .

using github¶

To install the package from the git sources, just do the following:

# to get the development version
cd <SOME TEMPORARY DIRECTORY>
wget https://github.com/pygeo/dummydata/archive/master.zip
unzip master.zip
cd dummydata-master
python setup.py install

using pip¶

To install the package using pip, just do the following:

pip install dummydata

using conda (not working yet)¶

To install via conda do the following:

conda install [-n YOURENV] -c conda-forge dummydata

How it works¶

Dummydata allows to generate either two dimensional data fiels with a time vector (e.g. sea surface temperature fields) or a 3D variable with an additional vertical coordinate.

Currently regular lat/lon grids are supported for coordinates.

A small example that generates a random dataset with dimensions (time, lat, lon) is provided as follows

from dummydata import Model2

# generate a 2D variable
M2 = Model2(start_year=2003,stop_year=2014)

This generates a monthly timeseries starting 1st of January 2003 and ending 31.12.2014. A netCDF file will be automatically generated and closed. To generate a field of vertical air temperture profiles a script would could look as follows:

from dummydata import Model3

# generate a 3D variable
M3 = Model3(var='ta', oname='air_temperature',start_year=1998,stop_year=2002)

This will generate a file air_temperature.nc from 1998 to 2002 with a variable named ta.

The dummy data which is generated includes common metadata for different variable types. The tool therefore contains already a set of predefined variables with predefined metadata. The current list of supported variables can be found in the file meta.py. In case a user wants to add additional variable options, the necessary metadata information has to be included in the dictionary specified in meta.py.

Characteristics and options¶

The following options are currently available:

var : string : optional

specifies the name of the variable to be generated; note that the variable name needs to be part of the defined variables in meta.py

oname : string : optional

name of netCDF output file to be generated

start_year : int : obligatory

start year for dataset to be generated

stop_year : int : obligatory

stop year for dataset to be generated

method : string : obligatory

method to be used for data generation. At the moment the following options are supported:

‘uniform’ generates a white noise field
‘constant’: generates a field with constant values; the constant argument needs to be provided in that case as well.

constant : float : obligatory when method=’constant’

specifies the constant value to be used

append_coordinates : bool

specifies if fields with coordinates should be appended

append_cellsize : bool

specifies if fields with the cellsize information should be appended to the output file

Some further examples¶

from dummydata import Model2, Model3

# generate a 2D dataset with the value 5. everywhere
M2 = Model2(method='constant', constant=5., oname='myconst5',start_year=1998,stop_year=2002)

Current limitations¶

only monthly sampling frequencies supported at the moment
no min/max can be specified to specify the range of the values
specification of metadata is currently rather limited and done in meta.py which is not very user friendly. As an alternative user specific configuration files could be used.