Arvados
Unified Data and Workflow Management
Arvados is a modern open source platform for managing and processing large biomedical data.
By combining robust data and workflow management capabilities in a single platform, Arvados can organize and analyze petabytes of data and run reproducible and versioned computational workflows. Arvados supports the entire data life cycle, from acquisition to analysis, secure sharing, auditing, and reuse.
Data Management
Manage large biomedical data sets for everything from genomics to imaging. Version and track large data sets including both raw and metadata. Keep data accessible, avoid unnecessary data duplication, and ensure data integrity.
Workflow Engine
Run, manage and monitor scalable, portable, production-ready workflows. Use the Arvados orchestration system with Docker and Singularity to reliably run containerized workflows at scale.
Reproducibility
Track, record, and reproduce complex workflows on large datasets. Reliably re-run and verify previous workflows and retrieve workflow inputs and outputs.
Scale
Perform data analysis at scale, on-premises or in the cloud. Manage petabytes of data and scale on demand to run workflows that use thousands of cores of compute simultaneously.
Flexible Deployment
Run Arvados anywhere — in the cloud on AWS, Azure and GCP, as well as on premises and hybrid clusters. Work on sequestered data and avoid expensive data transfers using multi-cluster federation.
Secure Data and Sharing
Collaborate on projects by selectively and securely sharing data and workflows. Comply with data protection regulations.
With Arvados, bioinformaticians perform computationally intensive data analysis and machine learning, developers create biomedical applications, and IT administrators manage large compute and storage resources
Researchers
Use Arvados to manage large datasets and scale your analysis workflows.
- Automatically create detailed records of every computation: inputs, outputs, container images, workflow versions, parameters, and compute requirements.
- Organize all your data, workflows, computational runs, and results into projects.
- Harness thousands of CPUs and GPUs for data analysis, and track and manage petabytes of data.
- Tap the knowledge of the global scientific community and enable portability with Common Workflow Language standards.
- Safely share your datasets and workflows with colleagues and collaborators.
Developers
Use Arvados to productionize and scale workflows, create biomedical applications, and build secure services that integrate with existing infrastructure.
- Work with petabytes of data with great performance, fault tolerance, versioning, and automatic data integrity checking.
- Productionize your workflows using containerization, metadata, and dynamic scaling.
- Track every version of every workflow run and avoid unnecessary re-running of workflow tasks.
- Integrate, automate, and build on Arvados with your systems through command line tools, REST API and SDKs for Python, Java, Go, Ruby, and R.
- Avoid being locked into a proprietary system or stuck in a black box by using 100% open source software.
System Administrators
Use Arvados to run efficient and secure scalable workflows, provide flexible data management, meet end-user requirements, and manage costs.
- Run in your own cloud or on-premise infrastructure, backed by both community and commercial support.
- Ensure compliance with security and regulatory standards with robust data access permissions and support for Single-Sign-On.
- Lower costs with automatic data duplication,starting/stopping compute nodes on demand, and using preemptible (spot) instances.
- Track the history and performance of every workflow run on your system and trace detailed usage data back to each individual user.
- Use 100% open source developed in public, backed by strong commercial support so you will never be locked into a proprietary system.
Executives
Achieve your goals with a proven highly scalable analysis platform while maintaining security and costs.
- Get a proven, off the shelf solution for your big data and scientific analysis problems.
- Boost your team with a platform that maximizes productivity by streamlining the process of productionizing, deploying, and scaling data analysis workflows.
- Ensure security and regulatory compliance by controlling and auditing access to data.
- Lower costs with automatic data duplication, starting/stopping compute nodes on demand, and using preemptible (spot) instances.
- Unlock the potential of large, high-value datasets by establishing a data commons or data lake to make scientific data available across your organization.
Pharmaceutical companies, biotech startups, and research institutions are using Arvados today for clinical sequencing, drug development, and diagnostic testing.
Drug Discovery
Accelerate new drug identification and validation
Pharmaceutical companies are using Arvados to harness big data for therapeutic uses and drug discovery. Arvados helps:
- Develop reproducible, scalable and efficient processes for big data analysis.
- Run sophisticated analytical and machine learning workflows using tools such as Alphafold and Tensorflow.
- Efficiently track, organize and tag datasets using metadata.
Biomedical Imaging
Integrate management, analysis, and storage of biomedical data
The increasing volume and variety of biomedical image data brings new challenges for data management, analysis, and storage. Arvados helps:
- Aggregate and manage petabytes of high-resolution imaging data.
- Organize and process data through efficient automated workflows reducing error.
- Create a multifunctional platform that integrates tools to analyze image data and serves interactive viewers while maintaining access control, metadata, and data organization.
Biomedical Data Commons
Remove silos and support new biomedical discoveries
Arvados helps create biomedical data commons that:
- Aggregate and search across sequencing, omics, and imaging data hosted on different Arvados clusters.
- Run scalable, interoperable workflows to analyze and harmonize petabytes of raw data.
- Leverage FAIR guiding principles including the ability to identify datasets using persistent digital IDs data and metadata.
NGS Sequencing and Clinical Testing
Run production, scalable, clinical workflows with large datasets
Arvados helps clinical labs, biotech start-ups and research institutions:
- Automate an end-to-end workflow for high throughput sequencing from data collection to analysis.
- Run production workflows for sequencing and clinical research applications including for whole genomes, exomes, and targeted panels.
- Implement CAP and CLIA compliant bioinformatics solutions for clinical sequencing and diagnostic testing services.
Try It Out
Take a look at the public Arvados playground, a free-to-use installation of Arvados for evaluation and trial use.
Install
Learn how to install Arvados from the installation page in the documentation. Arvados supports AWS, GCP and Azure cloud platforms as well as on-premises installs.
Get Help
Both community and enterprise support are available for Arvados. Curii Corporation (info@curii.com) provides managed installations as well as commercial support.