What is Google Compute Engine?

  • Infrastructure as a Service (IaaS)
  • Virtual Machines, Networks, Storage at Google
  • Leverages Google

Google Cloud Platform

It has layers

  • Infrastructure:
    • Google Cloud Storage
    • Google Compute Engine
  • Platform: Google App Engine
  • Services:

Guiding Principles

  • Secure
  • Open and Flexible
  • Consistent
  • Proven
  • Enables an ecosystem

The Architecture

Moving parts and how they fit together

System Overview

  • Compute
  • Network
  • Storage
  • API & Tools

API Basics

  • JSON over HTTP
  • Main Resources (Nouns)
    • Projects
    • Instances
    • Networks and Firewalls
    • Disks and Snapshots
    • Zones
  • Actions (Verbs):
    • GET, POST (create) and DELETE
    • Custom ‘verbs’ for updates
  • Auth via OAuth2
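
To make the noun/verb split concrete, here is a minimal sketch that maps actions onto HTTP requests and prints them instead of sending them. The endpoint root and the project/zone path layout are assumptions modeled on the later public v1 API, and the preview-era endpoint may have differed. The helper names `collection_url` and `api_request`, and the project, zone and instance names, are hypothetical. A real call would also carry the OAuth2 token in an `Authorization: Bearer` header.

```shell
#!/bin/sh
# Assumed endpoint root (not given on the slides).
API_ROOT="https://compute.googleapis.com/compute/v1"

# Hypothetical helper: URL of a zonal resource collection.
# $1 = project, $2 = zone, $3 = collection (instances, disks, ...)
collection_url() {
  printf '%s/projects/%s/zones/%s/%s\n' "$API_ROOT" "$1" "$2" "$3"
}

# Hypothetical helper: print the HTTP method and URL for an action.
# $1 = action, $2 = project, $3 = zone, $4 = instance name (delete only)
api_request() {
  case "$1" in
    list)   printf 'GET %s\n'       "$(collection_url "$2" "$3" instances)" ;;
    create) printf 'POST %s\n'      "$(collection_url "$2" "$3" instances)" ;;
    delete) printf 'DELETE %s/%s\n' "$(collection_url "$2" "$3" instances)" "$4" ;;
    *)      echo "unknown action: $1" >&2; return 1 ;;
  esac
}

api_request list   my-project us-central1-a
api_request create my-project us-central1-a
api_request delete my-project us-central1-a vm-1
```

Listing and creating share one collection URL and differ only in the HTTP method, which is the point of the resource-oriented design.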

Clients and Libraries

  • gcutil: command line utility
  • Web UI: Built on GAE
  • Libraries
  • Partners and ecosystem

Projects

  • Container for all resources
  • Team Membership
  • Group ownership
  • Billing

Instances: Linux VMs

  • Root access
  • Validated, locked-down kernel
  • Stock Images: Based on Ubuntu, CentOS
  • Useful utilities preinstalled

Instances: Machine Types

  • Intel Sandy Bridge
  • 1, 2, 4 and 8 virtual CPUs
    • A virtual CPU is a hyperthread
    • Smaller types coming
  • 3.75GB RAM per virtual CPU
  • Over 420GB local disk per CPU
    • Dedicated spindles on -4 and -8
  • New Performance Metric
    • GCEU: Google Compute Engine Unit
    • 2.75 GCEUs per virtual CPU
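
The per-vCPU figures above scale linearly, so the totals for each machine size can be worked out directly. A small sketch, using only the 3.75GB and 2.75 GCEU ratios from the slide; the function name `machine_specs` is hypothetical, and local disk is left out because the slide only gives a lower bound.

```shell
#!/bin/sh
# Hypothetical helper: RAM and GCEU totals for a given vCPU count.
# $1 = number of virtual CPUs (1, 2, 4 or 8 on the slide)
machine_specs() {
  awk -v n="$1" 'BEGIN {
    printf "%d vCPU: %.2f GB RAM, %.2f GCEU\n", n, n * 3.75, n * 2.75
  }'
}

for n in 1 2 4 8; do
  machine_specs "$n"
done
```

The largest type therefore comes to 30GB of RAM and 22 GCEUs.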

Instances: Under the hood

  • Kernel Virtual Machines
    • Linux is the hypervisor
    • Virtualized and non-virtualized workloads run side by side
    • Worked closely with Red Hat
  • Linux cgroups
    • Resource isolation
    • Public Linux feature driven by Google kernel engineering

Networking: Private Virtual Network

  • Isolated networks per project
  • Private IPv4 space (RFC 1918)
  • IP Level (Layer 3) network
  • Flat across geographical regions
  • Internal facing DNS
    • VM name = DNS name
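
RFC 1918 reserves three IPv4 ranges for private use: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. The sketch below checks whether an address falls in one of them. The function name `is_rfc1918` and the sample addresses are illustrative only, and the check assumes a well-formed dotted quad.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 is an RFC 1918 private IPv4 address.
# Assumes $1 is a well-formed dotted quad; no validation is done.
is_rfc1918() {
  o1=${1%%.*}        # first octet
  rest=${1#*.}
  o2=${rest%%.*}     # second octet
  case "$o1" in
    10)  return 0 ;;                               # 10.0.0.0/8
    172) [ "$o2" -ge 16 ] && [ "$o2" -le 31 ] ;;   # 172.16.0.0/12
    192) [ "$o2" -eq 168 ] ;;                      # 192.168.0.0/16
    *)   return 1 ;;
  esac
}

for ip in 10.240.0.2 172.16.5.4 172.32.0.1 192.168.1.1 8.8.8.8; do
  if is_rfc1918 "$ip"; then
    echo "$ip private"
  else
    echo "$ip public"
  fi
done
```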

Networking: Internet Access

  • External IPs
    • Reserved, ephemeral or none
    • Not tied to region/zone
    • Dynamic attach/detach
  • 1-to-1 NAT
  • Built in firewall system
  • Global network footprint
  • Limitations
    • Outgoing SMTP blocked
    • UDP, TCP, ICMP only

Storage: Persistent Disk

  • Fast, consistent performance
  • Provisioned via API
  • Local to a zone
  • R/W with single instance
  • R/O with multiple instances
  • Encrypted at rest

Storage: Ephemeral Disk

  • Currently used for booting all instances
  • Lives and dies with instance
  • Large 'extended' devices
  • Dedicated spindles (4 vCPU+)
  • Encrypted at rest

Storage: Google Cloud Storage

  • Internet object store
  • Global API based access
  • Great for getting data in and out
  • Frictionless access with service accounts

Locality

  • Region: geography and routing
  • Zone: fault isolation
  • 3 US Zones in limited preview

MapR Hadoop Terasort Record Attempt

                        World Record   MapR on Google Compute Engine
Time to sort            1:02           1:20
Number of Servers       1,460          1,256
Number of Cores         11,680         5,024
Time to build cluster   Months         Minutes
Capital Expenditure     $5,840,000     $728 (per hour, could run 45 times!)
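
The "could run 45 times" remark follows from the numbers in the table, assuming "1:20" means 1 minute 20 seconds. A short sketch of the arithmetic; the variable names are illustrative and the cost per run is derived here, not quoted from the talk.

```shell
#!/bin/sh
# Assumption: the sort time "1:20" is minutes:seconds.
sort_seconds=$((1 * 60 + 20))

# How many back-to-back sorts fit into one billed hour.
runs_per_hour=$((3600 / sort_seconds))

# Cores per server implied by the server and core counts.
record_cores_per_server=$((11680 / 1460))
gce_cores_per_server=$((5024 / 1256))

# Hourly cluster cost spread over the runs that fit in that hour.
cost_per_run=$(awk -v c=728 -v r="$runs_per_hour" 'BEGIN { printf "%.2f", c / r }')

echo "runs per hour:              $runs_per_hour"
echo "cores/server, world record: $record_cores_per_server"
echo "cores/server, GCE:          $gce_cores_per_server"
echo "approx. cost per run (USD): $cost_per_run"
```

On these figures the Compute Engine cluster used servers with half as many cores each, and a single sort works out to roughly $16 of cluster time.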

Demo Time!

Demo: Zoomable Fractals

Exploring Compute Engine

Getting the most from Google Compute Engine

Service Accounts

Frictionless Access to Google APIs

  • Synthetic identity for VMs and code
  • Google Compute calling Google APIs
    • Examples: Cloud Storage, App Engine task queue API
  • App Engine calling Compute Engine API
    • Use App Engine as 'orchestrator'
    • Build your own customized dashboard and control logic

Service Accounts

Google Compute Engine calling Google Cloud Storage

me@workstation$ gcutil addinstance sa-example --service_account_scopes=storage-rw
me@workstation$ gcutil ssh sa-example
[snip]
me@sa-example$ gsutil mb gs://unique-bucket-name
Creating gs://unique-bucket-name/...
No configuration or passwords required!

Instance Metadata

Parameters for VMs

  • Dictionary of key/value pairs
  • Set from the API, read by the Instance
  • Accessible at metadata server (http://metadata/...)
  • Useful for small amounts of configuration data
  • Project level metadata inherited by all instances

Instance Metadata

me@workstation$ gcutil addinstance metadata-example \
  --metadata=role:master --metadata_from_file=config:config.txt
me@workstation$ gcutil ssh metadata-example
[...snip...]
me@metadata-example$ curl http://metadata/0.1/meta-data/attributes/role
master
me@metadata-example$ curl http://metadata/0.1/meta-data/attributes/config
[...file content...]
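
Code running on the instance often wants a default when a key was never set. A minimal sketch of a wrapper around the same metadata URL; the helper names `metadata_url` and `get_attribute` are hypothetical, and treating any failed request as "use the default" is an assumption of this sketch.

```shell
#!/bin/sh
# Attribute path as used in the session above.
METADATA_ROOT="http://metadata/0.1/meta-data/attributes"

# Hypothetical helper: URL for a metadata key.
metadata_url() {
  printf '%s/%s\n' "$METADATA_ROOT" "$1"
}

# Hypothetical helper: print the value of a key, or a default.
# $1 = key, $2 = default used when the request fails
get_attribute() {
  # -s: quiet, -f: treat HTTP errors as failures, --max-time: do not hang
  value=$(curl -sf --max-time 2 "$(metadata_url "$1")") || value=$2
  printf '%s\n' "$value"
}

metadata_url role
```

On an instance, `get_attribute role worker` would print the configured role, or `worker` if no role was set.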

Start Up Scripts

Simple Bootstrapping

  • Builds on Metadata
  • Equivalent to rc.local
  • Example Usage:
    • Install packages, start services
    • Use Google Cloud Storage to grab data, code and binaries
  • Bootstrap other management systems
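
The idea of "builds on metadata" can be sketched as a tiny runner: read a script body, run it once, and log its output. This is not the actual agent that ships in the images. The function name `run_startup_script` is hypothetical, and running the body with `sh` and appending to a log file are assumptions of the sketch.

```shell
#!/bin/sh
# Hypothetical runner: execute a script body read from standard input.
# $1 = log file that receives the script's stdout and stderr
run_startup_script() {
  script=$(mktemp) || return 1
  cat > "$script"                 # save the body so it can be executed
  status=0
  if [ -s "$script" ]; then       # an empty body means nothing to do
    sh "$script" >> "$1" 2>&1 || status=$?
  fi
  rm -f "$script"
  return "$status"
}

# Demo with an inline body instead of a metadata fetch.
log=$(mktemp)
printf 'echo "installing packages"\necho "starting services"\n' |
  run_startup_script "$log"
cat "$log"
rm -f "$log"
```

On an instance, the body would come from the `startup-script` metadata key rather than from `printf`.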

Start Up Scripts

me@workstation$ cat render-stuff.sh
#!/bin/bash
# Install the Context Free renderer, draw an example, publish it to Cloud Storage.
apt-get install -y contextfree
cfdg -s 10000 /usr/share/doc/contextfree/examples/sierpinski.cfdg /tmp/out.png
gsutil cp -a public-read /tmp/out.png gs://contextfree-examples/sierpinski.png

me@workstation$ gcutil addinstance start-me-up \
  --metadata_from_file=startup-script:render-stuff.sh \
  --service_account_scopes=storage-rw
me@workstation$ gcutil ssh start-me-up
[...snip...]
me@start-me-up$ tail -f /var/log/google.log

Can I play too?

Limited Preview

  • We're growing the service carefully to make sure we build the best product possible
  • We're focusing on computation and/or I/O intensive batch workloads
  • SLA and support are available now for commercial customers
  • You can apply for the program at http://cloud.google.com
  • Come see me after the talk if you have a great application for Compute Engine

Where to learn more