Cloud Archive Storage
About our cloud archive storage service
ITS Research Computing offers Microsoft Azure cloud archive storage for PIs, labs, groups and users who have data collections that are:
- relatively large (e.g., > 1 TB) and ready to be moved to an archive in the cloud
- not expected to be retrieved in the near future (data are stored in the "archive" tier)
- well-organized with a meaningful structure, naming conventions, etc.
- in a format according to the prescribed standards and accompanied by the requisite metadata and catalog of contents
There are typically no up-front charges for basic cloud archive storage. Any potential charges would likely be for inappropriate/early or frequent retrieval of data, as bringing data back from the archive tier is costly and not immediate.
At this time we do not allow direct access to archived data; writes and retrieval requests are handled by Research Computing.
Getting started
We are treating cloud archive storage requests on a case-by-case basis. Before acting on data moves, we first need to discuss the nature of the data and user expectations. We'll assess the candidate collection to determine whether it is organized logically and formatted correctly, and plan how it will be curated.
To get started, please contact us at research [at] unc.edu
Preparing collections for storage
As an example, consider a lab named Tarheel that has a shared directory named /proj/tarheellab. Lab workers and the PI have identified what needs to be archived and moved it into a directory named stuff_to_archive. First, inspect your collection to ensure that you retain only necessary data. Next, your collection will need to be put into a format appropriate for cloud archiving.
Generally, the following steps are required:
- Use the tar command to bundle your collection into one or more easily distributable files
- Provide a text file that lists the structure and contents of each tar file
Using tar command to create stuff_to_archive.tar.gz:
tar -czvf stuff_to_archive.tar.gz /proj/tarheellab/stuff_to_archive
Creating a text file that lists the contents of stuff_to_archive.tar.gz:
tar -tvf stuff_to_archive.tar.gz > stuff_to_archive_tarlist.txt
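Before handing a bundle over, it can help to confirm that the archive is readable and that nothing was left out. The following is a minimal bash sketch, not a Research Computing requirement. The function name check_bundle is hypothetical, and the sketch assumes that tar, gzip and find are available and that no file names contain newline characters.

```shell
# Sketch: sanity-check a tar.gz bundle against its source directory.
# check_bundle is a hypothetical helper, not a Research Computing tool.
check_bundle() {
    local bundle="$1"    # e.g. stuff_to_archive.tar.gz
    local src_dir="$2"   # e.g. /proj/tarheellab/stuff_to_archive
    local in_tar on_disk

    # Verify that the compressed stream is intact.
    gzip -t "$bundle" || return 1

    # Count entries in the bundle and entries (files and directories) on disk.
    in_tar=$(( $(tar -tzf "$bundle" | wc -l) ))
    on_disk=$(( $(find "$src_dir" | wc -l) ))

    if [ "$in_tar" -ne "$on_disk" ]; then
        echo "entry count mismatch: $in_tar in bundle, $on_disk on disk" >&2
        return 1
    fi
    echo "ok: $in_tar entries"
}
```

For the example above, this could be called as check_bundle stuff_to_archive.tar.gz /proj/tarheellab/stuff_to_archive. Matching counts are only a rough indication: the check does not compare file contents, and special files that tar skips (such as sockets) would cause a mismatch.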
To reduce future retrieval cost and effort, try to break up larger collections into meaningful sets and manageably sized archive bundles, so that a later request can retrieve only the bundle it needs.
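One possible way to split a collection is to create a separate bundle and contents listing for each top-level subdirectory. The bash sketch below assumes the collection is already organized into meaningful subdirectories. The function name bundle_subdirs and the output directory layout are illustrative, not part of any Research Computing tooling.

```shell
# Sketch: create one tar.gz bundle plus a contents listing per top-level
# subdirectory of a collection. bundle_subdirs is a hypothetical helper.
bundle_subdirs() {
    local src_dir="$1"   # e.g. /proj/tarheellab/stuff_to_archive
    local out_dir="$2"   # where bundles and listings are written
    local sub name

    mkdir -p "$out_dir" || return 1
    for sub in "$src_dir"/*/; do
        [ -d "$sub" ] || continue   # nothing to do if there are no subdirectories
        name="$(basename "$sub")"
        # -C keeps paths inside the bundle relative to the collection root.
        tar -czf "$out_dir/$name.tar.gz" -C "$src_dir" "$name" || return 1
        # Write the matching contents listing next to the bundle.
        tar -tvf "$out_dir/$name.tar.gz" > "$out_dir/${name}_tarlist.txt" || return 1
    done
}
```

A call such as bundle_subdirs /proj/tarheellab/stuff_to_archive /proj/tarheellab/bundles would then produce one .tar.gz file and one _tarlist.txt file per subdirectory (the bundles path is only an example). Note that the output directory should sit outside the collection being bundled, and that files stored directly at the top level of the collection are not picked up by this loop and would need their own bundle.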
Other options available at UNC-CH
In addition to Research Computing's cloud archive storage service, the following services may be more appropriate for your data and storage requirements:
SHIRE: The Secure Health Informatics Research Environment is a new secure cloud computing environment built through a partnership between UNC School of Medicine, NC TraCS Institute, and UNC Health. The SHIRE is a full-featured analytics platform where users can securely work with sensitive data from UNC Health’s electronic health record (EHR). https://www.med.unc.edu/shire/
RDMC: The Research Data Management Core offers tools and services to support data management and sharing, including consultations and training, data curation and archiving, and tools and infrastructure. https://researchdata.unc.edu/
CDR: The Carolina Digital Repository is a digital archive for scholarly materials produced by members of the University of North Carolina at Chapel Hill community. The main goal of the CDR is to keep UNC digital scholarly output safe, accessible and discoverable for as long as needed. https://cdr.lib.unc.edu
See also
Introduction to the topic of digital preservation: https://en.wikipedia.org/wiki/Digital_preservation
ITS tips on avoiding digital clutter: https://its.unc.edu/2024/08/29/dont-be-a-data-hoarder/
Microsoft document describing blob storage tiers: https://learn.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview