Planning your installation

Introduction to Installing Agave

For production environments, Agave provides an installation method (the installer) implemented as a set of Ansible playbooks. Familiarity with Ansible is assumed; this guide provides the information you need to create an inventory file that represents your environment and desired Agave Platform configuration, and then run the installation using the Ansible CLI tooling.

Note

You can read more about Ansible and its basic usage in the [official documentation](http://docs.ansible.com/ansible/).
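
To give a feel for the workflow, the following is a minimal sketch of an inventory file and the command used to run a playbook against it. The group names and the playbook path are illustrative placeholders, not the installer's actual file names; use the names documented by the Agave installer.

```
# hosts -- hypothetical inventory sketch; the group names and the playbook
# path below are placeholders, not the installer's actual names.
[auth]
auth.agave

[core]
api.core.agave
worker.core.agave

# The installation is then driven by the standard Ansible CLI, for example:
#   ansible-playbook -i hosts site.yml
```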

Initial Planning

When installing the Agave Platform for a production environment, several factors influence installation. Consider the following questions as you read through this guide:

  • What kind of traffic do you anticipate receiving? The Sizing Considerations section provides limits for hosts and containers based on traffic type and duration so you can calculate how large your environment needs to be.
  • How much data do you plan to move through the platform? The Sizing Considerations section provides limits for hosts and containers based on data access and throughput so you can calculate how large your environment needs to be.
  • What are the nature and duration of the computation you intend to run through Agave? The Sizing Considerations section provides guidelines for hosts and containers based on job throughput so you can determine how large your environment needs to be.
  • How many hosts do you require in the cluster? The Environment Scenarios section provides multiple examples of Single Master and Multiple Master configurations.
  • Is high availability (HA) for the platform or science API services required? High availability is recommended for fault tolerance. In this situation, you might aim to use the Multiple Auth hosts Using Native HA, Multiple Core service hosts, and Clustered Persistence example as a basis for your environment.
  • Which identity provider do you use for authentication? If you already use a supported identity provider, it is a best practice to configure Agave's Auth components to use that identity provider during installation.

On-premises Versus Cloud Providers

Agave can be installed on-premises or hosted on public or private clouds. Ansible playbooks can help you automate the provisioning and installation processes. For more information, see Running Installation Playbooks.

Sizing Considerations

Determine the nature and duration of the traffic, data, and computation your tenant is expected to support. Concurrent API requests, data movement, and job throughput influence the number of hosts and containers you will need in your setup. See Platform Limits for the latest guidelines on capacity planning for your tenant.

Environment Scenarios

This section outlines example scenarios for your Agave environment. Use these scenarios as a basis for planning your own Agave Platform deployment, based on your sizing needs.

Note

Moving from a single auth host to HA after installation is not supported by the deployer at this time.

Auth, Core, and Persistence Components on One System

Agave can be installed on a single system for a development environment only. An all-in-one environment is not considered a production environment and should not be attempted on a laptop.
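
As a rough sketch, an all-in-one inventory simply assigns the same host to every group. The group names here are illustrative placeholders, not the installer's actual ones:

```
# All-in-one development sketch: a single host carries every role.
# Group names are illustrative; use the groups defined by the Agave installer.
[auth]
sandbox.agave

[persistence]
sandbox.agave

[core]
sandbox.agave
```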

Auth and Persistence on a Single System, Core on Multiple Systems

The following table describes an example environment for a single auth host (with persistence services co-located) and two core nodes:

| Host Name | Infrastructure Component to Install |
| --- | --- |
| auth.agave | Auth + persistence |
| api.core.agave | Science API frontend services |
| worker.core.agave | Science API backend worker |

Single Auth, Single Persistence, and Multiple Core Systems

The following table describes an example environment for a single auth system, single persistence system, and two core API systems:

| Host Name | Infrastructure Component to Install |
| --- | --- |
| auth.agave | Auth |
| db.agave | Persistence |
| api.core.agave | Science API frontend services |
| worker.core.agave | Science API backend worker |

Note

This is the minimum recommended production configuration.
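
A sketch of what an inventory for this minimum production layout might look like, again using illustrative group names rather than the installer's actual ones:

```
# Minimum recommended production layout from the table above.
# Group names are illustrative placeholders.
[auth]
auth.agave

[persistence]
db.agave

[core_api]
api.core.agave

[core_workers]
worker.core.agave
```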

Multiple Auth hosts Using Native HA, Multiple Core service hosts, and Clustered Persistence

The following table describes an example environment for two load balanced Auth systems running native HA, a standalone realtime service for websocket notifications, two independent three-host MariaDB clusters, a three-host sharded MongoDB cluster, a three-host RabbitMQ cluster with mirrored queues, three load balanced Science API systems, and two backend Science API worker systems:

| Host Name | Infrastructure Component to Install |
| --- | --- |
| apim[1-2].auth.agave | Auth with native HAP load balancer |
| lb.auth.agave | Auth load balancer |
| realtime.auth.agave | Cloud hosted streaming + push notification service |
| mongo[1-3].nosql.agave | MongoDB Shard, Collection, and Mongos Replica Set |
| auth[1-3].db.agave | Auth MariaDB cluster |
| core[1-3].db.agave | Core MariaDB cluster |
| rabbit[1-3].queue.agave | Message Queue |
| lb.core.agave | Science API load balancer |
| api[1-3].core.agave | Science API frontend services |
| worker[1-2].core.agave | Science API backend worker |

Warning

Clustering and management of MongoDB, RabbitMQ, and MariaDB are beyond the scope of this document.
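
Ansible's inventory range syntax keeps host lists like this manageable; for example, `apim[1:2].auth.agave` expands to `apim1.auth.agave` and `apim2.auth.agave`. A sketch of the clustered scenario above, with illustrative group names rather than the installer's actual ones:

```
# Clustered HA scenario from the table above; group names are placeholders.
[auth_apim]
apim[1:2].auth.agave

[auth_lb]
lb.auth.agave

[realtime]
realtime.auth.agave

[mongodb]
mongo[1:3].nosql.agave

[mariadb_auth]
auth[1:3].db.agave

[mariadb_core]
core[1:3].db.agave

[rabbitmq]
rabbit[1:3].queue.agave

[core_lb]
lb.core.agave

[core_api]
api[1:3].core.agave

[core_workers]
worker[1:2].core.agave
```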

Multiple Auth hosts Using Native HA with External IS, APIM, and Tenant Services, Multiple Core service hosts, and Clustered Persistence

The following table describes an example environment for HA Auth with external IS, APIM, and tenant services, a standalone realtime service for websocket notifications, two independent three-host MariaDB clusters, a three-host sharded MongoDB cluster, a three-host RabbitMQ cluster with mirrored queues, three load balanced Science API systems, and two backend Science API worker systems:

| Host Name | Infrastructure Component to Install |
| --- | --- |
| apim[1-2].auth.agave | Auth with native HAP load balancer |
| is[1-2].auth.agave | IS + Key manager |
| gateway[1-2].auth.agave | Gateway traffic manager |
| api.auth.agave | Tenant services |
| lb.auth.agave | Auth load balancer |
| realtime.auth.agave | Cloud hosted streaming + push notification service |
| mongo[1-3].nosql.agave | MongoDB Shard, Collection, and Mongos Replica Set |
| auth[1-3].db.agave | Auth MariaDB cluster |
| core[1-3].db.agave | Core MariaDB cluster |
| rabbit[1-3].queue.agave | Message Queue |
| lb.core.agave | Science API load balancer |
| api[1-3].core.agave | Science API frontend services |
| worker[1-2].core.agave | Science API backend worker |

Multiple Auth hosts Using Native HA, Multiple Core service hosts, and Hosted Persistence

The following tables describe an example environment for HA Auth with external IS, APIM, and tenant services, three load balanced Science API systems, and two backend Science API worker systems, with load balancing, realtime websocket notifications, MongoDB, MariaDB, and message queueing all consumed as hosted (cloud-managed) services:

| Host Name | Infrastructure Component to Install |
| --- | --- |
| apim[1-2].auth.agave | Auth with native HAP load balancer |
| is[1-2].auth.agave | IS + Key manager |
| gateway[1-2].auth.agave | Gateway traffic manager |
| api.auth.agave | Tenant services |
| api[1-3].core.agave | Science API frontend services |
| worker[1-2].core.agave | Science API backend workers |

| Host Name | Hosted Infrastructure Component |
| --- | --- |
| auth-1234567890.region.elb.amazonaws.com | Auth load balancer |
| core-1234567890.region.elb.amazonaws.com | Science API load balancer |
| 12345.fanout.io | Cloud hosted streaming + push notification service |
| atlas.us-east-1.compute.amazonaws.com | Atlas Managed MongoDB cluster |
| mariadb.12345.us-east-1.rds.amazonaws.com | Shared Amazon RDS MariaDB cluster |
| mq-aws-eu-west-1-1.iron.io | IronMQ Message Queue |
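
In a hosted-persistence layout, only the Auth and Core hosts appear in the inventory; the managed services are consumed as endpoints. The sketch below uses hypothetical variable names to show where those endpoints might be recorded; the installer defines the actual configuration keys.

```
# Hosted-persistence sketch; group names and the [all:vars] keys are
# hypothetical placeholders, not the installer's actual settings.
[auth_apim]
apim[1:2].auth.agave

[auth_is]
is[1:2].auth.agave

[auth_gateway]
gateway[1:2].auth.agave

[auth_tenant]
api.auth.agave

[core_api]
api[1:3].core.agave

[core_workers]
worker[1:2].core.agave

[all:vars]
mongodb_host=atlas.us-east-1.compute.amazonaws.com
mariadb_host=mariadb.12345.us-east-1.rds.amazonaws.com
message_queue_host=mq-aws-eu-west-1-1.iron.io
realtime_host=12345.fanout.io
```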

File Path Locations

All Agave Platform configuration files are placed in the /home/apim directory during installation and will survive OS upgrades. Unless logging is configured to write to syslog, log data is written to the /var/log/agave directory and managed with the logrotate utility.

Storage Requirements

Agave's Science APIs can, at times, cache significant amounts of data on local disk during transfer and transformation requests. This data is written to the /home/apim/scratch directory and cleaned up after each operation. Ensure that you have enough space on the root file system before deploying the Science APIs, or configure an alternative location to use as scratch space. See the System Requirements section for details.
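
If you need to relocate the scratch space, an inventory variable is one way to carry that setting through the installation. The variable name and path below are hypothetical placeholders used only for illustration; check the installer's documentation for the actual setting.

```
# Hypothetical variable to point the Science API scratch space at a larger
# volume; the real variable name is defined by the Agave installer.
[core_workers:vars]
scratch_dir=/data/agave/scratch
```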