The Renku team is currently designing and preparing a new feature to simplify access to cloud storage from both RenkuLab and the Renku CLI.
Motivation: improving users’ experience with handling data
While Renku already supports mounting S3 buckets into sessions and using them for datasets, these features are disconnected from one another, hard to use, and cover only a limited set of use cases. In addition, Git-LFS has important limitations, such as size limits, the need to store data twice on disk, and other quality-of-life issues.
Goal: facilitate the accessibility and reusability of large data in Renku
Renku users should be able to easily access cloud storage (initially S3 and Azure Blob Storage), which simplifies working with and sharing large amounts of data in Renku projects.
As an initial step, Renku plans to provide a unified interface for associating cloud storage with a project (independent of datasets), making it easy to get started with cloud storage. This step includes:
- storing cloud storage information at project level
- prompting for credentials if necessary
- automatically mounting storage associated with a project
- having the CLI and the UI interact with this in a transparent and consistent way
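To make the steps above more concrete, here is a minimal sketch of how project-level storage information and credential prompting could fit together. It is only an illustration and not the actual Renku design: the `ProjectStorage` class, its fields, and the `credentials_to_prompt` and `mount_plan` helpers are all hypothetical names made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectStorage:
    """Hypothetical project-level record for one cloud storage location."""

    name: str
    provider: str      # assumed values: "s3" or "azure_blob"
    source_url: str    # bucket or container location (illustrative format)
    mount_path: str    # where a session would mount the storage
    required_credentials: list[str] = field(default_factory=list)


def credentials_to_prompt(storage: ProjectStorage, provided: dict[str, str]) -> list[str]:
    """Return the credential fields the user still has to supply."""
    return [key for key in storage.required_credentials if not provided.get(key)]


def mount_plan(
    storages: list[ProjectStorage],
    provided_by_name: dict[str, dict[str, str]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Split project storage into entries ready to mount and entries needing a prompt."""
    ready: list[str] = []
    needs_prompt: dict[str, list[str]] = {}
    for storage in storages:
        missing = credentials_to_prompt(storage, provided_by_name.get(storage.name, {}))
        if missing:
            needs_prompt[storage.name] = missing
        else:
            ready.append(storage.name)
    return ready, needs_prompt
```

In this sketch, both the CLI and the UI could call the same `mount_plan` function, so that public storage is mounted automatically while storage that needs credentials triggers a prompt in either interface.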
Beyond this first step, we are also considering further improvements to cloud storage usage in Renku. The plan ahead will likely include additional elements, such as:
- managing credentials for users
- seamless integration of cloud storage with dataset and workflow functionality
- easy reuse of cloud storage within organizations and across projects
This plan is not final yet!
You can see the current design document on this Pull Request.
You are very welcome to comment on it and share any feedback, suggestions, or concerns you might have.