Introduction to Dummydata
Dummydata is a package for generating geospatial data fields with predefined statistical properties and storing them as netCDF files.
Installation
The package is currently available from GitHub; it can also be installed via pip or conda (conda support is not working yet).
Using GitHub
To install the package from the Git sources, run the following:
# to get the development version
cd <SOME TEMPORARY DIRECTORY>
wget https://github.com/pygeo/dummydata/archive/master.zip
unzip master.zip
cd dummydata-master
pip install .  # with older setuptools, python setup.py install also works
Using conda (not working yet)
To install via conda, run the following:
conda install [-n YOURENV] -c conda-forge dummydata
How it works
Dummydata can generate either two-dimensional data fields with a time axis (e.g. sea surface temperature fields) or three-dimensional variables with an additional vertical coordinate.
Currently, only regular lat/lon grids are supported.
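To make the notion of a regular lat/lon grid concrete, the following standard-library sketch builds cell-centre coordinates for a global grid with fixed spacing and computes the area of each cell, the kind of information the append_cellsize option refers to. It is not taken from dummydata: the function names regular_grid and cell_area, the cell-centre convention, the 2° × 2.5° spacing and the spherical Earth radius are all assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in metres (spherical Earth)


def regular_grid(dlat, dlon):
    """Hypothetical helper: cell-centre latitudes/longitudes of a regular global grid.

    dlat and dlon are the grid spacings in degrees; they are assumed to divide
    180 and 360 evenly.
    """
    nlat = round(180.0 / dlat)
    nlon = round(360.0 / dlon)
    lats = [-90.0 + (i + 0.5) * dlat for i in range(nlat)]
    lons = [(j + 0.5) * dlon for j in range(nlon)]
    return lats, lons


def cell_area(lat_centre, dlat, dlon, radius=EARTH_RADIUS_M):
    """Hypothetical helper: area (m^2) of one lat/lon cell on a sphere.

    area = R^2 * dlon_rad * (sin(lat_north) - sin(lat_south))
    """
    south = math.radians(lat_centre - 0.5 * dlat)
    north = math.radians(lat_centre + 0.5 * dlat)
    return radius ** 2 * math.radians(dlon) * (math.sin(north) - math.sin(south))


lats, lons = regular_grid(dlat=2.0, dlon=2.5)
# All cells in one latitude band share the same area, so one value per band suffices.
band_areas = [cell_area(lat, 2.0, 2.5) for lat in lats]
total = sum(band_areas) * len(lons)
print(len(lats), len(lons))  # 90 144
# Ratio to the sphere's surface area; equals 1.0 up to floating-point rounding.
print(total / (4 * math.pi * EARTH_RADIUS_M ** 2))
```

Because the sine terms telescope from the south pole to the north pole, the cell areas of a full grid sum to the surface area of the sphere, 4πR², which is a convenient sanity check for any cell-size field.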
The following small example generates a random dataset with dimensions (time, lat, lon):
from dummydata import Model2
# generate a 2D variable with dimensions (time, lat, lon)
M2 = Model2(start_year=2003, stop_year=2014)
This generates a monthly time series starting on 1 January 2003 and ending on 31 December 2014. The netCDF file is generated and closed automatically. A script that generates a field of vertical air temperature profiles could look as follows:
from dummydata import Model3
# generate a 3D variable
M3 = Model3(var='ta', oname='air_temperature', start_year=1998, stop_year=2002)
This generates the file air_temperature.nc, covering the years 1998 to 2002 and containing a variable named ta.
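Both examples above produce monthly time steps between start_year and stop_year. A minimal sketch of such a time axis follows, assuming one time stamp on the first day of each month and a "days since" encoding as commonly used in netCDF files; dummydata's actual time stamps, units and calendar may differ, and monthly_time_axis and days_since are hypothetical helper names.

```python
from datetime import date


def monthly_time_axis(start_year, stop_year):
    """Hypothetical helper: first day of every month from Jan start_year to Dec stop_year."""
    return [date(year, month, 1)
            for year in range(start_year, stop_year + 1)
            for month in range(1, 13)]


def days_since(dates, reference):
    """Hypothetical helper: encode dates as integer 'days since <reference>'."""
    return [(d - reference).days for d in dates]


axis = monthly_time_axis(2003, 2014)
print(len(axis))          # 144 monthly steps
print(axis[0], axis[-1])  # 2003-01-01 2014-12-01
offsets = days_since(axis, date(2003, 1, 1))
print(offsets[:3])        # [0, 31, 59]
```

Twelve years at monthly sampling give 144 time steps; the 1998 to 2002 Model3 example would correspondingly have 60.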
The generated dummy data include common metadata for different variable types. For this purpose, the tool already contains a set of predefined variables together with their metadata. The current list of supported variables can be found in the file meta.py. To add further variables, the corresponding metadata have to be added to the dictionary defined in meta.py.
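A metadata dictionary of this kind could look roughly like the sketch below. The layout is hypothetical and not copied from meta.py: the keys, the get_metadata helper and the example entries are assumptions, with attribute values taken from the CF standard names and their canonical units.

```python
# Hypothetical layout of a variable-metadata dictionary; the real meta.py may differ.
META = {
    "ta": {
        "standard_name": "air_temperature",
        "long_name": "Air Temperature",
        "units": "K",
    },
    "pr": {
        "standard_name": "precipitation_flux",
        "long_name": "Precipitation",
        "units": "kg m-2 s-1",
    },
}


def get_metadata(var, meta=META):
    """Hypothetical helper: return the attributes for var or fail with a helpful message."""
    try:
        return dict(meta[var])  # copy so callers cannot modify the registry
    except KeyError:
        known = ", ".join(sorted(meta))
        raise ValueError(f"unknown variable {var!r}; defined variables: {known}") from None


# Supporting an additional variable means adding one more entry to the dictionary.
META["hus"] = {
    "standard_name": "specific_humidity",
    "long_name": "Specific Humidity",
    "units": "1",
}

print(get_metadata("ta")["units"])  # K
```

Keeping all attributes in one dictionary means a lookup failure for an undefined var can list the valid names, which mirrors the requirement that var must be defined in meta.py.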
Characteristics and options
The following options are currently available:
- var : string : optional
  - specifies the name of the variable to be generated; the variable name must be one of the variables defined in meta.py
- oname : string : optional
  - name of the netCDF output file to be generated
- start_year : int : obligatory
  - start year of the dataset to be generated
- stop_year : int : obligatory
  - stop year of the dataset to be generated
- method : string : obligatory
  - method used for data generation; the following options are currently supported:
    - 'uniform': generates a white noise field
    - 'constant': generates a field with constant values; in this case the constant argument must be provided as well
- constant : float : obligatory when method='constant'
  - specifies the constant value to be used
- append_coordinates : bool
  - specifies whether fields containing the coordinates should be appended to the output file
- append_cellsize : bool
  - specifies whether fields with the cell size information should be appended to the output file
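The difference between the two generation methods, and why constant becomes obligatory for method='constant', can be illustrated with a small sketch for a single (lat, lon) time step. The function generate_field, the use of Python's random module, the [0, 1) value range and the seed parameter are assumptions for illustration and do not reflect dummydata's internals.

```python
import random


def generate_field(shape, method="uniform", constant=None, seed=None):
    """Hypothetical helper: return an nlat x nlon nested list for one time step.

    method='uniform'  -> white noise, values drawn uniformly from [0, 1)
    method='constant' -> every value equals `constant`, which is then required
    """
    nlat, nlon = shape
    if method == "uniform":
        rng = random.Random(seed)  # dedicated generator, reproducible via seed
        return [[rng.random() for _ in range(nlon)] for _ in range(nlat)]
    if method == "constant":
        if constant is None:
            raise ValueError("method='constant' requires the constant argument")
        return [[float(constant)] * nlon for _ in range(nlat)]
    raise ValueError(f"unsupported method {method!r}")


noise = generate_field((3, 4), method="uniform", seed=42)
const = generate_field((3, 4), method="constant", constant=5.0)
print(const[0])  # [5.0, 5.0, 5.0, 5.0]
```

Seeding a dedicated random.Random instance keeps the noise reproducible without touching the global random state.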
Some further examples
from dummydata import Model2
# generate a 2D dataset with the value 5.0 everywhere
M2 = Model2(method='constant', constant=5.0, oname='myconst5', start_year=1998, stop_year=2002)
Current limitations
- only monthly sampling frequency is supported at the moment
- the range of the values cannot be set via a minimum/maximum
- metadata can currently only be specified in meta.py, which is rather limited and not very user-friendly; user-specific configuration files could be used as an alternative