A variety of storage systems with varying characteristics make up the data platform available to the ITS-RC community. As datasets grow in size, complexity, or sensitivity, it takes more care to match storage resources to the task at hand.
/proj
- High capacity storage
- OK to compute against; however, as IO increases, /work becomes the better option
- OK to hold inactive data sets like a near-line archive
- If you have 30 TB or more of truly inactive data, we can help move it to the cloud-based cold archive, achieving greater data durability and lower storage costs
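As a rough way to judge whether a directory tree approaches the 30 TB mark mentioned above, its total size can be summed with a short script. This is only a sketch: the function names are hypothetical, it assumes decimal terabytes (10^12 bytes), and it says nothing about whether the data is actually inactive.

```python
import os

# Assumption: "30 TB" is read as decimal terabytes (30 * 10**12 bytes).
ARCHIVE_THRESHOLD_BYTES = 30 * 10**12


def tree_size_bytes(root):
    """Return the total size in bytes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat so that symlinks are counted as links, not targets
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def meets_archive_threshold(size_bytes, threshold=ARCHIVE_THRESHOLD_BYTES):
    """True if size_bytes is at or above the assumed 30 TB threshold."""
    return size_bytes >= threshold
```

Walking a very large tree this way can take a long time and itself generates IO, so it is best run sparingly.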
/users
Your primary storage directory is: /users/{o}/{n}/{onyen}
This storage is provided by the same hardware as /proj
- High capacity storage
- OK to compute against; however, as IO increases, consider copying or moving the data to /work for processing
- OK to hold inactive data sets like a near-line archive
- If a meaningful amount of cold data accrues, it can be packaged and MOVED to cloud archive, providing more working space for your warm data
- /users is not intended for team-oriented shared storage (that is the role of /proj). It is your personal storage location; think of it as a capacity expansion of your home directory. Note that /work is NOT intended to be a personal storage location; /work is for data actively being processed, especially for workloads with high IO requirements.
- 10 TB quota
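The /users/{o}/{n}/{onyen} pattern can be expanded in a script. The sketch below assumes that {o} and {n} stand for the first and second characters of the onyen; the page does not state this explicitly, so verify it against your actual directory. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def users_dir(onyen):
    """Build /users/{o}/{n}/{onyen}.

    Assumption: {o} and {n} are the first two characters of the onyen.
    """
    if len(onyen) < 2:
        raise ValueError("onyen must have at least two characters")
    return PurePosixPath("/users") / onyen[0] / onyen[1] / onyen
```

Under that assumption, a hypothetical onyen `jdoe` would map to `/users/j/d/jdoe`.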
/work
- High performance storage
- New as of 2022
- Intended for active data only, aka hot data, actively being processed
- /work is NOT intended for holding inactive data
- Please MOVE inactive data to your /users directory
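Moving inactive data out of /work could be scripted along the following lines. This is a minimal sketch, not an official tool: it treats "inactive" as "not modified in the last 90 days", which is an assumed cutoff and an assumed definition (modification time), and the function names are hypothetical. It defaults to a dry run so the plan can be reviewed before anything moves.

```python
import os
import shutil
import stat
import time
from pathlib import Path


def find_inactive(root, days=90, now=None):
    """Yield (path, size_bytes) for regular files under root whose
    modification time is older than `days` days (assumed cutoff)."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff:
                yield path, info.st_size


def move_inactive(src_root, dest_root, days=90, dry_run=True):
    """Move inactive files from src_root to dest_root, keeping the
    relative directory layout. Returns a list of (source, target) pairs.
    With dry_run=True nothing is moved; the list is only the plan."""
    src_root, dest_root = Path(src_root), Path(dest_root)
    plan = []
    # Materialize the list first so moving files does not disturb the walk.
    for path, _size in list(find_inactive(src_root, days)):
        target = dest_root / path.relative_to(src_root)
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
        plan.append((path, target))
    return plan
```

A call such as `move_inactive(work_dir, users_dir_path, dry_run=True)` returns the planned moves for inspection, where both arguments are placeholders for your own directories; rerun with `dry_run=False` once the plan looks right.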
/ms
- High capacity
- Read-only at this time
- Data is in the process of being migrated out of /ms
- All data in /ms must be re-homed
Cloud-based cold data archive
- The most durable, least expensive storage option available, provided the data is unlikely to be retrieved frequently or in large volumes
The following two systems are built on old hardware that is no longer under maintenance, so data stored on them is at greater risk of loss than data on the storage systems listed above. This is the prior generation of /proj hardware: high capacity, decent performance.
/overflow
Useful for:
- Data that can be re-acquired
- Intermediate results of analysis and workflows
- Staging area to reorganize data on its way somewhere else, such as the cloud-based cold archive
- etc.
/datacommons
- Read only
- Suggestions for data sets to add are welcome
- Disposable data; if systems fail, data can be re-acquired
Examples of data sets in /datacommons:
- 1,000 Genomes
- Berkeley DeepDrive, bdd100k
- SCAMPS Dataset, Camera Measurement of Physiology
- Indoor Scene Recognition
- Stanford Dogs
Last Update 11/21/2024 1:36:46 AM