CyberShake Data

From SCECpedia

This page provides an overview of CyberShake Data, and how to access it.

CyberShake data can be broken down into the following elements based on when it is used in simulations:

  1. Input data needed for CyberShake runs, such as which ruptures go with which site. This information is stored in the CyberShake database.
  2. Temporary data generated during CyberShake production runs. This data remains on the cluster and is eventually purged.
  3. Output data products generated by CyberShake runs. This data is transferred from the cluster to SCEC disks, and some of it is inserted into the CyberShake database for quick access.

We will focus on (1) and (3).

CyberShake database overview

CyberShake data is served through two relational database servers running MySQL/MariaDB, plus an SQLite file for each past study.

MySQL/MariaDB Databases

The two databases used to store CyberShake data are focal.usc.edu ('focal') and moment.usc.edu ('moment').

Examples of accessing data stored in these databases can be found at Accessing_CyberShake_Database_Data.

Moment DB

Moment is the production database server. Currently, it maintains all the necessary inputs, metadata on all CyberShake runs, and results for Study 15.12 and Study 17.3.

Read-only access to moment is:

host: moment.usc.edu
user: cybershk_ro
password: CyberShake2007
database: CyberShake

Focal DB

Focal is the database server for external user queries. We plan to remove all but the most recent few studies from focal, but this is still in progress, so for now focal has all inputs, metadata, and results up through Study 15.12.

Read-only access to focal is:

host: focal.usc.edu
user: cybershk_ro
password: CyberShake2007
database: CyberShake

SQLite files
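As noted above, each past study is also available as a standalone SQLite file. A minimal sketch of inspecting one with Python's built-in sqlite3 module (the file name here is hypothetical; the actual files are linked from each study's wiki page):

```python
import sqlite3

# Hypothetical file name; the actual per-study files are linked from
# each study's wiki page.
conn = sqlite3.connect("study_15_12.sqlite")

# List the tables present in the study file.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
conn.close()
```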

CyberShake input data

At the beginning of a CyberShake run, the database is queried to determine site information (name, latitude, longitude). This can be found in the CyberShake_Sites table.

The database is also used to determine which ruptures fall within the 200 km cutoff. This information is used to construct the necessary volume and select the correct rupture files for processing. It can be found in the CyberShake_Site_Ruptures table, which lists, for each site, the ruptures that fall within a given cutoff.

Both of these tables are populated by Kevin when we select new sites for CyberShake processing.
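As a sketch of what these lookups amount to, the example below mirrors them in an in-memory SQLite database. The table names come from this page; the column names and values are illustrative assumptions, not the real CyberShake schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative columns and values only; the real CyberShake schema differs.
conn.execute("CREATE TABLE CyberShake_Sites "
             "(CS_Short_Name TEXT, CS_Site_Lat REAL, CS_Site_Lon REAL)")
conn.execute("CREATE TABLE CyberShake_Site_Ruptures "
             "(CS_Short_Name TEXT, Source_ID INTEGER, Rupture_ID INTEGER, "
             "Cutoff_Dist REAL)")
conn.execute("INSERT INTO CyberShake_Sites VALUES ('USC', 34.0192, -118.286)")
conn.execute("INSERT INTO CyberShake_Site_Ruptures VALUES ('USC', 12, 3, 200.0)")

# Site information lookup (latitude, longitude) for a site short name.
site = conn.execute(
    "SELECT CS_Site_Lat, CS_Site_Lon FROM CyberShake_Sites WHERE CS_Short_Name=?",
    ("USC",)).fetchone()

# Ruptures within the 200 km cutoff for the site.
ruptures = conn.execute(
    "SELECT Source_ID, Rupture_ID FROM CyberShake_Site_Ruptures "
    "WHERE CS_Short_Name=? AND Cutoff_Dist<=200.0", ("USC",)).fetchall()
```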

CyberShake output data

CyberShake runs produce the following output data, divided into data staged back from the cluster, and local data products:

Data staged from cluster:

  • Seismograms
  • Peak spectral acceleration, X and Y component and geometric mean
  • RotD results (for some studies)
  • Duration results (for some studies)

Local data products:

  • Hazard curves
  • Disaggregation results
  • Hazard maps

CyberShake data staged from cluster

The data products below are all generated on the remote system, then staged back to SCEC storage as part of the workflow. Some of these data products are inserted into the database.

Seismograms

Seismogram access is detailed at Accessing CyberShake Seismograms.

Acceleration Data

In CyberShake, we have two kinds of acceleration intensity measure data:

  1. X and Y component and geometric mean data
  2. RotD50 and RotD100 data (since Study 15.4).

How you access this data depends on which periods you want: some of it is in the database, and the rest is in files. Details are at Accessing CyberShake Peak Acceleration Data.

Duration

Duration metric data was populated to the database for Study 15.12 (a study with stochastic components) but not for Study 17.3. How you access duration data likewise depends on whether the data you want is in the database or in files. Details are available in Accessing CyberShake Duration Data.

CyberShake data products generated locally

These data products are generated locally, on shock.usc.edu, in the final stages of the workflow.

Hazard Curves

Hazard curves are produced by combining the intensity measure data in the database, at a certain period, with the probability of each event. The code for performing this is part of OpenSHA.
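As a rough illustration of the combination step (a toy sketch, not the OpenSHA implementation), the annual exceedance rate at a given IM level can be computed as the sum, over ruptures, of each rupture's annual rate times the fraction of its seismograms whose intensity measure exceeds that level. All numbers below are invented:

```python
# Each rupture: annual occurrence rate and the IM values (e.g. 3s SA, in g)
# from its rupture variations. All values are invented for illustration.
ruptures = [
    {"rate": 1e-3, "ims": [0.05, 0.12, 0.30]},
    {"rate": 5e-4, "ims": [0.40, 0.55]},
]

def exceedance_rate(im_level):
    """Annual rate of exceeding im_level, summed over ruptures."""
    total = 0.0
    for rup in ruptures:
        # Fraction of this rupture's seismograms exceeding the level.
        frac = sum(im > im_level for im in rup["ims"]) / len(rup["ims"])
        total += rup["rate"] * frac
    return total

# One point per IM level gives the hazard curve.
curve = {x: exceedance_rate(x) for x in (0.1, 0.2, 0.5)}
```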

Hazard curves are located in the directory /home/scec-00/cybershk/opensha/curves/<site short name>. The convention for the hazard curve name for a particular run, component, and period is:

<site short name>_ERF<erf ID>_Run<Run ID>_SA_<period>sec_<component>_<yyyy>_<mm>_<dd>.<png or pdf>

Note that the year, month, and day are when the run was completed, not when the hazard curve was produced.
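A small helper that assembles a file name following this convention (all values, including the component string, are illustrative examples):

```python
def curve_filename(site, erf_id, run_id, period, component,
                   year, month, day, ext="png"):
    """Build a hazard curve file name following the convention above.

    All argument values in the example call below are invented.
    """
    return (f"{site}_ERF{erf_id}_Run{run_id}_SA_{period}sec_{component}"
            f"_{year:04d}_{month:02d}_{day:02d}.{ext}")

name = curve_filename("USC", 36, 3870, 3, "GEOM", 2017, 3, 1)
```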

In general, hazard curves are automatically generated for the same periods which are inserted into the database.

Disaggregations

Disaggregations calculate how much each CyberShake source (by 'source' we mean UCERF source) contributes to the overall hazard at a certain point on the hazard curve.
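A toy sketch of the percent-contribution calculation, assuming you already have each source's exceedance-rate contribution at the chosen point on the curve (all values invented):

```python
# Annual exceedance-rate contribution of each UCERF source at the chosen
# IM level; source names and values are invented for illustration.
source_rates = {
    "San Andreas": 3.0e-4,
    "Puente Hills": 8.0e-5,
    "Newport-Inglewood": 2.0e-5,
}

# Percent contribution of each source to the total hazard at that point.
total = sum(source_rates.values())
percent = {src: 100.0 * r / total for src, r in source_rates.items()}
```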

Disaggregations are automatically performed at an exceedance probability of 4e-4 (2% in 50 years). These disaggregation files are available in /home/scec-00/cybershk/opensha/disagg. The convention for the disaggregation file name is:

<site short name>_ERF<erf ID>_Run<Run ID>_DisaggPOE_<probability level>_SA_<period>sec_<yyyy>_<mm>_<dd>.<pdf, png, or txt>

Note that the year, month, and day are when the run was completed, not when the disaggregation was produced.
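A regular-expression sketch for pulling fields back out of such a file name (the example name and its exact formatting are hypothetical):

```python
import re

# Pattern follows the naming convention above; the example file name
# and its probability-level formatting are hypothetical.
PATTERN = re.compile(
    r"(?P<site>\w+?)_ERF(?P<erf>\d+)_Run(?P<run>\d+)"
    r"_DisaggPOE_(?P<poe>[\d.eE+-]+)_SA_(?P<period>[\d.]+)sec"
    r"_(?P<y>\d{4})_(?P<m>\d{2})_(?P<d>\d{2})\.(?P<ext>\w+)")

m = PATTERN.match("USC_ERF36_Run3870_DisaggPOE_4.0E-4_SA_3sec_2017_03_01.txt")
info = m.groupdict()
```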

The PDF and PNG files are images, showing a breakdown of what magnitude events at what distance contributed to the hazard. The PDF and text files also have a numerical breakdown, by source, of the percent contribution.

Hazard Maps

Hazard maps show the hazard for a region. They are produced by sampling many hazard curves at a certain probability or IM level, calculating the difference between each sampled value and a GMPE basemap, and interpolating these differences on top of the basemap.
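A toy sketch of the difference-and-interpolate step, using inverse-distance weighting as a stand-in for whatever interpolation the actual map-making code uses (all values invented):

```python
# Invented sample: CyberShake hazard values at three sites, and the GMPE
# basemap value at the same probability level and locations.
cybershake = {
    (34.05, -118.25): 0.45,   # SA in g at the chosen probability level
    (34.20, -118.40): 0.60,
    (33.90, -118.10): 0.38,
}
basemap = {
    (34.05, -118.25): 0.50,
    (34.20, -118.40): 0.52,
    (33.90, -118.10): 0.40,
}

# Differences to be interpolated over the basemap grid.
diffs = {pt: cybershake[pt] - basemap[pt] for pt in cybershake}

def interpolate(pt, power=2.0):
    """Inverse-distance-weighted difference at an arbitrary point."""
    num = den = 0.0
    for (lat, lon), d in diffs.items():
        dist_sq = (pt[0] - lat) ** 2 + (pt[1] - lon) ** 2
        w = 1.0 / (dist_sq ** (power / 2) + 1e-12)
        num += w * d
        den += w
    return num / den
```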

Hazard maps are generated at the conclusion of each study by Kevin. Maps are posted on the wiki page for each study, under 'Data Products'.


Related Entries