A variety of storage systems with differing characteristics make up the data platform available to the ITS-RC community. As datasets increase in size, complexity, and/or sensitivity, more care is needed to match storage resources to the task at hand.

/proj

  • High capacity storage
  • OK to compute against; however, as IO increases, /work becomes the better option
  • OK to hold inactive data sets like a near-line archive
  • If you have 30 TB or more of truly inactive data, we can assist in moving it to cloud-based cold archive, achieving greater data durability and lower storage costs
  • more info
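To judge whether a project directory is near the 30 TB mark mentioned above, the file sizes under it can be summed. The following is a minimal sketch, not an ITS-RC tool: the function names are hypothetical, a terabyte is assumed to mean 10^12 bytes, and the total is the apparent file size rather than the space actually consumed on disk. It only measures size; deciding which data is truly inactive is left to you.

```python
import os

# Assumption: "TB" is read as a decimal terabyte (10**12 bytes).
TB = 10**12


def dir_size_bytes(root):
    """Sum the apparent size of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count link targets
            try:
                total += os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def meets_archive_threshold(total_bytes, threshold_tb=30):
    """Return True when total_bytes is at least threshold_tb terabytes."""
    return total_bytes >= threshold_tb * TB
```

Walking a very large tree this way generates metadata IO of its own, so it may be best run sparingly.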

/users
Your primary storage directory is: /users/{o}/{n}/{onyen}
This storage is provided by the same hardware as /proj
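As an illustration of the path template above, here is a small sketch that builds the directory for a given onyen. It assumes that {o} and {n} stand for the first and second characters of the onyen; the page does not say so explicitly, and both the helper name `users_dir` and the example onyen `jdoe` are hypothetical.

```python
def users_dir(onyen: str) -> str:
    """Build a /users path from an onyen.

    Assumption: {o} and {n} in the template are the first and second
    characters of the onyen. This is an interpretation, not a stated rule.
    """
    onyen = onyen.strip()
    if len(onyen) < 2:
        raise ValueError("onyen must have at least two characters")
    return f"/users/{onyen[0]}/{onyen[1]}/{onyen}"


print(users_dir("jdoe"))  # prints /users/j/d/jdoe
```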

  • High capacity storage
  • OK to compute against; however, as IO increases, consider copying or moving data to /work for processing
  • OK to hold inactive data sets like a near-line archive
  • If a meaningful amount of cold data accrues, it can be packaged and MOVED to cloud archive, providing more working space for your warm data
  • /users is your personal storage location; think of it as a capacity expansion to your home directory. It is not intended for team-oriented shared storage, which is the role of /proj. Note also that /work is NOT a personal storage location; /work is for data actively being processed with high IO requirements. Please move any data that is not actively being computed upon from /work to /users.
  • 10 TB quota
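The list above mentions packaging cold data before it is moved to cloud archive. One possible way to do the packaging step is sketched below with Python's standard `tarfile` module. This is an assumption about what "packaged" could look like, not the ITS-RC procedure; the function name is hypothetical, and the sketch neither uploads anything nor deletes the source directory.

```python
import tarfile
from pathlib import Path


def package_directory(src_dir, archive_path):
    """Bundle src_dir into a gzip-compressed tar file and list its members.

    The archive should be written outside src_dir so it does not try to
    include itself. The source directory is left untouched.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store paths relative to the directory name, not the full path.
        tar.add(str(src), arcname=src.name)
    # Re-open the archive to confirm it is readable and report its contents.
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return tar.getnames()
```

Returning the member list gives a simple check that the archive can be read back before any original data is removed.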

/work

  • High performance storage
  • New as of 2022
  • Intended for active ("hot") data only, that is, data currently being processed
  • /work is NOT intended for holding inactive data
  • Please MOVE inactive data to your /users directory
  • more info
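To help identify data in /work that is no longer active, a sketch like the following lists files that have not been modified within a chosen number of days. The function name and the idea of using modification time as a proxy for "inactive" are assumptions for illustration; this page does not define an age threshold, so the number of days is yours to choose. The sketch only reports candidates and moves nothing.

```python
import os
import time


def find_inactive_files(root, days, now=None):
    """Return sorted paths under root last modified more than `days` days ago.

    Modification time is used as a rough proxy for inactivity; files that
    are only read, never rewritten, will still appear in the result.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

The resulting list can be reviewed by hand before moving anything to a /users directory.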

/ms

  • High capacity
  • Read-only at this time
  • Data is in the process of being migrated out of /ms
  • All data in /ms must be re-homed
  • more info

Cloud-based cold data archive

  • The most durable, least expensive storage option available, provided the data is unlikely to be retrieved frequently or in large volumes

The following two systems are built on older hardware that is no longer under maintenance and is therefore at non-trivial risk of data loss compared to the storage systems listed above. This hardware is the prior generation of /proj: high capacity, decent performance.

/overflow

Useful for:

  • Data that can be re-acquired
  • Intermediate results of analysis and workflows
  • Staging area for reorganizing data on its way somewhere else, such as cloud-based cold archive
  • etc.

/datacommons

  • Read-only
  • Suggestions for data sets to add are welcome
  • Disposable data; if systems fail, data can be re-acquired

Examples of data sets in /datacommons:

  • 1,000 Genomes
  • Berkeley DeepDrive, bdd100k
  • SCAMPS Dataset, Camera Measurement of Physiology
  • Indoor Scene Recognition
  • Stanford Dogs

 

Last Update 4/16/2024 2:39:19 PM