Several storage systems with differing characteristics make up the data platform available to the ITS-RC community. As datasets grow in size, complexity, or sensitivity, it takes more care to match storage resources to the task at hand.

/proj

  • High capacity storage
  • OK to compute against; as IO increases, however, /work becomes the better option
  • OK to hold inactive data sets, like a near-line archive
  • If you have 30TB or more of truly inactive data, we can assist in moving it to Cloud based cold archive, achieving greater data durability and lower storage costs
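
To gauge whether a directory approaches the 30TB mark mentioned above, the amount of inactive data can be estimated from file timestamps. The following is a minimal sketch, not an ITS-RC tool: the function name, the 365-day idle window, and the reading of "30TB" as decimal terabytes are all assumptions made for illustration. Access times may also be unreliable on file systems mounted with options that limit atime updates.

```python
import os
import stat
import time

# Assumption: "30TB" is read as decimal terabytes (10**12 bytes).
ARCHIVE_THRESHOLD_BYTES = 30 * 10**12


def inactive_bytes(root, min_idle_days=365, now=None):
    """Return (inactive, total) sizes in bytes for regular files under root.

    A file counts as inactive when both its access time and its
    modification time are older than min_idle_days. The default of
    365 days is an illustrative choice, not an ITS-RC policy.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    inactive = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks and other special files
            total += st.st_size
            if max(st.st_atime, st.st_mtime) < cutoff:
                inactive += st.st_size
    return inactive, total
```

Calling `inactive_bytes` on a project directory and comparing the first returned value with `ARCHIVE_THRESHOLD_BYTES` gives a rough indication of whether a cold archive request is worth raising. Walking a very large tree can take a long time and adds metadata load, so it is best run sparingly.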

/work

  • High performance storage
  • New as of 2022
  • Intended for active data only, aka hot or warm data
  • more info

/pine

  • /pine will be placed into READ ONLY mode on May 23, 2023. Any running job that attempts to write to /pine on or after this date will fail. Because files can no longer be modified or "touched" once the file system is read-only, all data will be deleted from /pine as the scheduled deletion script removes files that reach the maximum age.
  • High performance storage

/ms

  • High capacity
  • Read-only at this time
  • more info

Cloud based cold data archive

  • The most durable, least expensive storage option available, provided the data is unlikely to need frequent or large-volume retrieval

The following two systems are built on older hardware that is no longer under maintenance, and are therefore at non-trivial risk of data loss compared with the storage systems listed above. This is the prior generation of /proj hardware: high capacity, decent performance.

/overflow

Useful for:

  • Data that can be re-acquired
  • Intermediate results of analysis and workflows
  • Staging area to re-organize data on its way somewhere else, such as Cloud based cold archive
  • etc.

/datacommons

  • Read only
  • Suggestions for data sets to add are welcome
  • Disposable data; if systems fail, data can be re-acquired

Examples of data sets in /datacommons:

  • 1,000 Genomes
  • Berkeley DeepDrive, bdd100k
  • SCAMPS Dataset, Camera Measurement of Physiology
  • Indoor Scene Recognition
  • Stanford Dogs

 

Last Update 6/4/2023 9:45:42 AM