<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PiCloud Blog</title>
	<atom:link href="http://blog.picloud.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.picloud.com</link>
	<description>Cloud Computing. Simplified.</description>
	<lastBuildDate>Thu, 02 May 2013 23:40:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Using AUFS and LXC to Manage Dependencies at Cloud Scale</title>
		<link>http://blog.picloud.com/2013/05/01/using-aufs-and-lxc-to-manage-dependencies-at-cloud-scale/</link>
		<comments>http://blog.picloud.com/2013/05/01/using-aufs-and-lxc-to-manage-dependencies-at-cloud-scale/#comments</comments>
		<pubDate>Thu, 02 May 2013 05:32:24 +0000</pubDate>
		<dc:creator>Ken Park</dc:creator>
				<category><![CDATA[How It Works]]></category>
		<category><![CDATA[environment]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1898</guid>
		<description><![CDATA[At PiCloud we strive to be the easiest service out-of-the-box, yet versatile enough to be used for the most complex applications. This ideal was especially challenging to realize when expanding our compute platform beyond Python-only modules, to support the whole gamut of non-Python software. How could we extend the same ease-of-use that our cloud library [...]]]></description>
			<content:encoded><![CDATA[<style type="text/css">
    .syntaxhighlighter { margin: 1.5em 0 !important; }
    h2.section { margin-top: 30px; }
    h3 { margin-top: 10px; }
</style>
<p><img style="float:right; height:150px; margin: 10px 10px 20px 20px;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/environment_nutshell_panel.png" alt="Environments in a Nutshell" /></p>
<p>At PiCloud we strive to be the easiest service out-of-the-box, yet versatile enough to be used for the most complex applications. This ideal was especially challenging to realize when expanding our compute platform beyond Python-only modules, to support the whole gamut of non-Python software.  How could we extend the same ease-of-use that our cloud library offers to the complex world of software dependencies?</p>
<p>In this blog post, I&#8217;ll explore how we&#8217;ve harnessed three different technologies to achieve our goal of a highly-scalable software dependency manager. We call it Environments, and for many PaaS users, it&#8217;s what sets us apart.</p>
<h2 class="section">Where We Were</h2>
<p>PiCloud originated as a Python-only platform. In that simplified setup, our <a href="http://docs.picloud.com/primer.html#primer-automagic-dependency-transfer">Dependency Analyzer</a> could identify and ship all pure-Python dependencies at job submission time. Why only pure-Python?  Anything that requires compilation cannot be expected to work when shipped to a computer with a different architecture. Unfortunately, for performance purposes, many Python libraries are actually compiled <a href="http://docs.python.org/2/extending/extending.html">C-extensions</a>, and this meant if you used a Python library with C-extensions, our compute nodes needed to have that library pre-installed. Not surprisingly, it didn&#8217;t take long before this limitation became an issue for some of our users.</p>
<h2 class="section">Where We Wanted To Be</h2>
<p>When the time came to revamp our dependency manager, we sought a generic solution that would work for Python and non-Python software alike. We boiled our requirements for the new dependency management down to the following:</p>
<ul style="margin-left: 2em;">
<li><b>Simplicity.</b> It must be no more difficult than what a user would do on their own machine. Importantly, it should not require users to learn a new tool or configuration language.</li>
<li><b>Power.</b> There should be minimal limitations on what packages or programs a user can install. Such flexibility requires granting escalated privileges (for dependency installation) in controlled situations.</li>
<li><b>Low Overhead.</b> It must not introduce significant overhead to job processing. Among other things, this means satisfying a user&#8217;s dependency must not require booting a new machine or rebooting an existing one.</li>
<li><b>Parallelizable.</b> The value of PiCloud is giving users the freedom to use as many cores as needed to accomplish their tasks. Deploying dependencies should not hinder parallelization across many machines.</li>
</ul>
<h2 class="section">How We Got There</h2>
<h3>Filesystem Virtualization</h3>
<p>Fundamentally, satisfying the dependencies of a program means ensuring that the appropriate files can be found in the proper places in the filesystem.  We can then rephrase the role of the dependency manager as making sure a job is run in whatever filesystem environment it needs for proper function.</p>
<p>We took inspiration from the Python tool <a href="http://www.virtualenv.org/en/latest/">virtualenv</a>, which lets you create, on one machine, multiple, isolated Python environments, each with its own packages and settings. In other words, it virtualizes those portions of the filesystem relevant to Python. Generalizing this concept, what we wanted was a virtualenv for the entire filesystem.</p>
<p>On *nix machines, the simplest way to virtualize the filesystem is through the <a href="http://en.wikipedia.org/wiki/Chroot">chroot</a> program, which allows you to set any point in your filesystem hierarchy as the new filesystem &#8220;root&#8221;. Unfortunately, chroot lacks mechanisms for isolation and resource management.  At the other end of the virtualization spectrum is a <a href="http://en.wikipedia.org/wiki/Virtual_machine">virtual machine</a>, which gives you full control and isolation but comes with the huge overhead of starting and running the VM.  However, in between the simple chroot utility and full VMs are <a href="http://en.wikipedia.org/wiki/Operating_system-level_virtualization">container technologies</a> that share the host&#8217;s kernel but have the capacity to virtualize most everything else.  Combined with other recent Linux kernel developments, like cgroups and namespaces, container technologies provide resource management and  isolation, respectively.  And starting a container is fast, because there is no operating system to boot up &#8211; the host&#8217;s kernel is already up and running. We ended up choosing <a href="http://lxc.sourceforge.net/">LXC</a> (<b>L</b>inu<b>X</b> <b>C</b>ontainer) as our container technology, because it has been mainlined into the Linux kernel.</p>
<p>Our new dependency management system was beginning to take shape. A user would start with an LXC-bootable base filesystem, and install whatever programs and files their jobs need. We would store this modified filesystem, which we’ll call an Environment from this point on. When the user runs a job, we would start an LXC container using their Environment as the filesystem. The next question is, how do we store and send these big Linux filesystems across the network to compute nodes?</p>
<h3>Union Mount System</h3>
<p>What we realized was that saving the entire Environment was wasteful. In practice, Environments would be mostly the same, with only a relatively small number of files added in or changed. Theoretically, given the base filesystem the Environment was built on, we only need the changes the user made on top of this base to recreate the full Environment. What would enable us to save just this difference, rather than the whole Environment?</p>
<p>The answer is <a href="http://en.wikipedia.org/wiki/Union_mount">union mounts</a>.  A union mount allows you to stack multiple filesystems and present a unioned view at a different mount point. It is the technology behind many <a href="http://en.wikipedia.org/wiki/Live_CD">Live CDs</a>. A Live CD contains a bootable operating system but is not typically writable. Hence, upon bootup from a Live CD, the system creates a temporary scratch space in RAM, then union mounts this RAMFS on top of the filesystem of the CD.  After chrooting into this unioned mount, the user is presented with a machine that appears to run off a read-only CD, yet can tinker with and change the operating system files while trying it out. Thanks to the union mount, all changes are written to the RAM filesystem, even when modifying files from the CD.</p>
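<p>As a rough mental model only (this is not how a real union filesystem is implemented), Python&#8217;s <code>collections.ChainMap</code> behaves much like a two-layer union mount: lookups fall through to the lower layer, while writes land only in the top one. The file names and contents below are made up for illustration.</p>

```python
from collections import ChainMap

# read-only lower layer (think: the Live CD or a base filesystem)
base = {'/etc/hostname': 'livecd', '/bin/sh': 'shell binary'}

# writable upper layer (think: the RAM scratch space), initially empty
scratch = {}

# unioned view: ChainMap searches scratch first, then base
union = ChainMap(scratch, base)

# "modify" a file that lives in the base and add a brand-new one
union['/etc/hostname'] = 'my-machine'
union['/opt/tool'] = 'new program'

print(union['/etc/hostname'])   # served from the scratch layer
print(union['/bin/sh'])         # falls through to the base layer
print(sorted(scratch))          # only the changes live in the top layer
print(base['/etc/hostname'])    # the base itself is untouched
```

<p>Saving <code>scratch</code> alone is the analogue of storing only the changes made on top of a shared base.</p>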
<p>Several union filesystem tools exist, but after some research, we settled on <a href="http://aufs.sourceforge.net/">AUFS</a> (<b>A</b>nother <b>U</b>nion <b>F</b>ile<b>S</b>ystem), a nifty module created and maintained by Junjiro Okajima.  We chose AUFS because it is relatively performant and reliable, and we have been impressed with Junjiro&#8217;s responsiveness to support requests.</p>
<p>So, how much does AUFS help? On PiCloud, the average size of Environment modifications is around 220 MB.  Given a base filesystem size of around 2.5 GB (which includes many pre-installed libraries for convenience and performance), this leads to roughly a 12-fold savings in terms of storage and network transfer.</p>
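<p>The 12-fold figure can be checked with a few lines of arithmetic. This sketch uses only the two averages quoted above and assumes 1 GB = 1000 MB; the variable names are ours.</p>

```python
# figures quoted above, in megabytes (taking 1 GB as 1000 MB)
base_mb = 2500.0     # shared base filesystem
changes_mb = 220.0   # average size of a user's modifications

# without a union mount, every Environment would store base + changes
full_mb = base_mb + changes_mb

# with a union mount, only the changes are stored and transferred
savings = full_mb / changes_mb
print('%.1f-fold savings' % savings)
```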
<h2 class="section">Environment In Action</h2>
<p>So, putting all that together, here&#8217;s how PiCloud Environments work:</p>
<ol>
<li>PiCloud user requests creation of a new Environment through our <a href="http://docs.picloud.com/environment.html#create-a-new-environment">web UI</a> or <a href="http://docs.picloud.com/environment.html#command-line-interface">client CLI</a>.</li>
<li>PiCloud launches a machine the user can log into, that is actually an AUFS stack of an empty filesystem (read-writable) on top of a base Environment (read-only), much like a Live CD.</li>
<li>User logs into the &#8220;setup server&#8221; and installs dependencies as they would on their own machine.</li>
</ol>
<p><img style="display: block; margin: 30px auto;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/environment_creation_panel.png" alt="Environment Creation" /></p>
<ol start="4">
<li>When the user is done setting up the Environment and requests to save it, user changes are sanitized for security and saved into our distributed file system.</li>
</ol>
<p><img style="display: block; margin: 30px auto;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/environment_save_panel.png" alt="Environment Saving" /></p>
<ol start="5">
<li>User can now submit jobs, specifying that they should run in the custom Environment. In Python, the Environment is specified with the &#8220;_env&#8221; keyword, and in bash, with the &#8220;-e&#8221; flag.</li>
<li>PiCloud recreates the AUFS stack on our compute nodes, and runs the user&#8217;s jobs in LXC containers.</li>
</ol>
<p><img style="display: block; margin: 30px auto;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/environment_run_panel.png" alt="Using Environment" /></p>
<h2 class="section">DFS Features That Improve Performance</h2>
<p>As mentioned above, AUFS reduces the size of an Environment by roughly 12-fold. In practice, we end up doing much better than that thanks to our distributed file system (DFS), which has two important properties:</p>
<ol>
<li>Files from an environment are transferred at the block level to the compute node on-demand (only when a program accesses them).</li>
<li>Transferred files are cached on the compute node automatically.</li>
</ol>
<p>To understand the benefits of these properties, consider the Matlab Compiler Runtime (MCR), which enables the stand-alone execution of compiled Matlab applications. When a user installs the MCR in an Environment, close to a gigabyte of files is added. But the typical MCR-compiled application accesses only a small fraction (&lt; 10%) of the MCR at runtime, significantly reducing the data transferred. And if a subsequent job runs on the same compute node and needs the same MCR Environment, it will be available without any data fetching over the network.</p>
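<p>The two DFS properties can be illustrated with a toy model. This is a sketch of the idea only, not our DFS: the class name <code>BlockCache</code>, the block size, and the block counts are all hypothetical.</p>

```python
class BlockCache(object):
    """Toy model of a compute node's view of a remote Environment."""

    def __init__(self, remote_blocks):
        self.remote_blocks = remote_blocks  # block id -> bytes, "on the DFS"
        self.local = {}                     # blocks already on this node
        self.fetches = 0                    # number of network transfers

    def read(self, block_id):
        # transfer a block only the first time something asks for it
        if block_id not in self.local:
            self.local[block_id] = self.remote_blocks[block_id]
            self.fetches += 1
        return self.local[block_id]


# an Environment made of 100 blocks, of which a job touches only 8
remote = dict((i, b'x' * 4096) for i in range(100))
node = BlockCache(remote)

for block_id in range(8):      # first job: 8 blocks cross the network
    node.read(block_id)
first_job_fetches = node.fetches

for block_id in range(8):      # second job on the same node: all cache hits
    node.read(block_id)

print(first_job_fetches, node.fetches)
```

<p>The untouched 92 blocks are never transferred, and the second job adds no fetches at all.</p>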
<h2 class="section">Check Out the New Features</h2>
<p>If you&#8217;re currently a PiCloud user and have wondered how Environments worked, I hope this shed some light into the black box.</p>
<p>If you haven&#8217;t already, you should check out our recent updates to the Environment system. We&#8217;ve redone the <a href="http://docs.picloud.com/environment.html#create-a-new-environment">web UI</a> and added <a href="http://docs.picloud.com/environment.html#command-line-interface">client-side CLI</a> support for managing your Environments. Also, we are excited about the new <a href="http://docs.picloud.com/environment.html#sharing">sharing feature</a> that lets users share Environments they&#8217;ve created with colleagues and the general <a href="http://www.picloud.com/platform/public_environments/">public</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2013/05/01/using-aufs-and-lxc-to-manage-dependencies-at-cloud-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introducing Queues &#8212; Creating a Pipeline in the Cloud</title>
		<link>http://blog.picloud.com/2013/04/03/introducing-queues-creating-a-pipeline-in-the-cloud/</link>
		<comments>http://blog.picloud.com/2013/04/03/introducing-queues-creating-a-pipeline-in-the-cloud/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 07:42:01 +0000</pubDate>
		<dc:creator>Ken Elkabany</dc:creator>
				<category><![CDATA[What's New]]></category>
		<category><![CDATA[image pipeline]]></category>
		<category><![CDATA[new features]]></category>
		<category><![CDATA[queues]]></category>
		<category><![CDATA[sepia]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1518</guid>
		<description><![CDATA[Queues provide an interface for Dataflow Programming that is built on top of our job system. While a distributed queue data structure with push, pop, and ack capabilities is provided, the key benefit is the ability to attach a handler to a queue for scalable processing of a queue’s messages. The handler in turn can [...]]]></description>
			<content:encoded><![CDATA[<style type="text/css">
    .syntaxhighlighter { margin: 1.5em 0 !important; }
</style>
<p>Queues provide an interface for <a href="http://en.wikipedia.org/wiki/Dataflow_programming">Dataflow Programming</a> that is built on top of our job system.</p>
<p>While a distributed queue data structure with <a href="http://docs.picloud.com/queue.html#pushing-messages">push</a>, <a href="http://docs.picloud.com/queue.html#popping-messages">pop</a>, and <a href="http://docs.picloud.com/queue.html#acknowledgements">ack</a> capabilities is provided, the key benefit is the ability to <b>attach a handler</b> to a queue for scalable processing of a queue’s messages. The handler in turn can feed its output  messages to other queues.</p>
<p>In other words, you&#8217;re probably used to the queue data structure:</p>
<p><img style="display: block; margin: 30px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_single.png" alt="Overview of Queue" /></p>
<p>Our queues link the data structure with a message handler, <code>f</code>, which we call an attachment:</p>
<p><img style="display: block; margin: 30px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_overview.png" alt="Overview of Queue" /></p>
<p><code>f(msg)</code> is any Python function you define that takes one argument at a time: a message coming from the input queue. Its return value is pushed into the output queue.</p>
<p><b>By the end of this post, you&#8217;ll be able to</b>:</p>
<ul style="margin-left: 2em;">
<li>Create a distributed, fault-tolerant pipeline of queues and processors.</li>
<li>Scale each component to achieve a throughput of thousands of messages per second.</li>
<li>See it all through automatically-generated visualizations.</li>
<li>Pay only when you have messages in your pipeline.</li>
<li>Do it all, with only Python, and not a single server.</li>
</ul>
<p>If you&#8217;re a developer who just wants to RTFM, see our <a href="http://docs.picloud.com/queue.html">documentation</a>.</p>
<h2 style="margin-top: 40px;">Diving In</h2>
<p>Let&#8217;s see queues in action. You&#8217;ll need to have the latest <a href="https://www.picloud.com/accounts/">client installed</a>, released today (4/3).</p>
<p>First, let&#8217;s push and pop from a queue in your console to get comfortable:</p>
<pre class="brush: python; gutter: false; title: ; notranslate">
&gt;&gt;&gt; # import our library
&gt;&gt;&gt; import cloud
&gt;&gt;&gt; q = cloud.queue.get('numbers')
&gt;&gt;&gt; # adds 3 messages to the queue
&gt;&gt;&gt; q.push([1,2,3])
&gt;&gt;&gt; # pops up to 10 messages
&gt;&gt;&gt; q.pop()
[2, 1, 3]
</pre>
<p>Note that the queue did not dequeue in perfect FIFO order; this is expected. Our queues are designed for high throughput and high parallelism with minimal queue backlog, which makes guaranteed FIFO behavior less relevant.</p>
<h3 style="margin-top: 30px;">Attach</h3>
<p>Now let&#8217;s <a href="http://docs.picloud.com/queue.html#attaching-a-message-handler">attach</a> a function that increments all numbers in the input queue:</p>
<pre class="brush: python; title: ; notranslate">
# declare the input and output queue
input_q = cloud.queue.get('numbers')
output_q = cloud.queue.get('bigger-numbers')

# create handler function
def increment(x):
    return x + 1

# attach the handler to the queue
input_q.attach(increment, output_q)
</pre>
<p>How did that work? We&#8217;re using the same <a href="http://docs.picloud.com/primer.html#automagic-dependency-transfer">automagic dependency transfer</a> we use in our job system to send your <code>increment</code> function to us along with any dependencies it might have.</p>
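<p>If it helps to picture what an attachment does, here is a purely local sketch that uses <code>collections.deque</code> objects as stand-ins for the two queues. The <code>drain</code> helper is hypothetical and is not part of the <code>cloud</code> library; the real system runs the handler remotely, in parallel, and without ordering guarantees.</p>

```python
from collections import deque

def increment(x):
    return x + 1

def drain(input_q, handler, output_q):
    """Apply *handler* to each message and push the result onward."""
    while input_q:
        msg = input_q.popleft()
        output_q.append(handler(msg))

# local stand-ins for the 'numbers' and 'bigger-numbers' queues
numbers = deque([1, 2, 3])
bigger_numbers = deque()

drain(numbers, increment, bigger_numbers)
print(list(bigger_numbers))
```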
<h3 style="margin-top: 30px;">Visualization</h3>
<p>
From the <a href="https://www.picloud.com/accounts/queue/">Queues Dashboard</a>, we can see an auto-generated layout of our pipeline based on the attachment we made:
</p>
<p><img style="display: block; margin: 0px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_numbers_layout.png" alt="Overview of Queue" /></p>
<h3 style="margin-top: 30px;">Message Processing by Attachment</h3>
<p>Let&#8217;s increment 1,000 numbers:</p>
<pre class="brush: python; title: ; notranslate">
# range(1000) is a list of numbers from 0 to 999
input_q.push(range(1000))
</pre>
<p>In the background, our systems have created a job (visible in your <a href="https://www.picloud.com/accounts/jobs/">Job Dashboard</a>) that applies the <code>increment</code> function to every number in the <code>numbers</code> queue, and outputs the result to the <code>bigger-numbers</code> queue. If you&#8217;re unfamiliar with our job framework, don&#8217;t worry, queues abstract away most of the details. But, if you&#8217;re interested, see our <a href="http://docs.picloud.com/primer.html">Primer</a>.</p>
<p>After ~10 seconds, you&#8217;ll see that all messages have been processed. On the live throughput chart, a single point represents the average throughput during a 10s window of time; the interpolation lines are cosmetic. Below, the single point at 100 msgs/second indicates that 1000 messages were processed during the window. In actuality, we got about 150 msgs/second for 7 seconds.</p>
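<p>The difference between the charted rate and the actual rate is just a matter of which time span you divide by. A quick check, using the numbers from this run:</p>

```python
messages = 1000

# the chart averages over a fixed 10-second window
window_seconds = 10
charted_rate = messages / float(window_seconds)

# the messages actually finished in about 7 seconds
busy_seconds = 7
actual_rate = messages / float(busy_seconds)

print(charted_rate, round(actual_rate))
```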
<p><img style="display: block; margin: 0px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_realtime.png" alt="Overview of Queue" /></p>
<p>As a sanity check, we can also check the size of the queues:</p>
<pre class="brush: python; gutter: false; title: ; notranslate">
&gt;&gt;&gt; input_q.count()
0
&gt;&gt;&gt; output_q.count()
1000
</pre>
<h3 style="margin-top: 30px;">Increasing Throughput</h3>
<p>What if you want to increase your throughput past 150 msgs/second? Set <a href="http://docs.picloud.com/queue.html#max-parallel-jobs"><code>max_parallel_jobs</code></a> for the attachment. You can do this from the <a href="https://www.picloud.com/accounts/queue/">Queue Dashboard</a> or from the <code>attach</code> call:</p>
<pre class="brush: python; title: ; notranslate">
# attach the handler to the queue
input_q.attach(increment, output_q, max_parallel_jobs=5)
</pre>
<p>Now, assuming there are messages in the queue, you&#8217;ll see a throughput of 750 msgs/second!</p>
<p><img style="display: block; margin: 0px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_live_max_parallel_5.png" alt="Overview of Queue" /></p>
<p>If you click on &#8220;view&#8221; jobs, you can see a list of the five jobs attached to the queue. For those familiar with our job processing framework, you can now see that our Queue processors are built on top of jobs. </p>
<p><b>The takeaway: you just increased your throughput 5-fold by changing one number, and without any server management or scaling.</b></p>
<h2 style="margin-top: 40px;">Creating an Image Processing Pipeline</h2>
<p>To showcase the power of queues, we&#8217;re going to create the following pipeline:</p>
<p><img style="display: block; margin: 30px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_img_pipeline.png" alt="Overview of Queue" /></p>
<div style="text-align: center; font-size: 85%; margin-bottom: 10px;">
This is an <b>auto-generated visualization</b> available in the Queues Dashboard. Rectangles are queues; circles are attachments.
</div>
<p>The inputs to this pipeline are URLs, which should be pushed to the img-urls queue. The pipeline downloads the image, and does the following image operations:</p>
<ul>
<li>Resize to 150px (thumbnail)</li>
<li>Resize to 400px (medium size)</li>
<li>Apply a <a href="http://en.wikipedia.org/wiki/Sepia_(color)">Sepia-tone filter</a> to the medium size image</li>
</ul>
<p>For each generated image, an HTTP callback is made to an endpoint of your choice. Note that separating the three image operations into three attachments with different input queues isn&#8217;t the most efficient (you&#8217;d probably want to combine them into one operation), but it&#8217;s done for illustrative purposes.</p>
<p>You can download this pipeline from our <a href="https://github.com/picloud/basic-examples">repository</a>: <code style="font-size: 10px;">basic-examples/queue/imgpipeline/pipeline.py</code></p>
<h3 style="margin-top: 30px;">Step 1: Scraping Images</h3>
<p>We&#8217;re going to use the following function as our attachment:</p>
<pre class="brush: python; title: ; notranslate">
import os
import Image
import urllib2
from StringIO import StringIO

import cloud

def scrape_to_bucket(target):
    &quot;&quot;&quot;Downloads image from url, and saves to bucket. *target* should
    be a dict with keys id (image id), and url (location of image).

    Returns a dict with keys id (image id), path (obj key), and
    transforms (empty list).&quot;&quot;&quot;

    id = target['id']
    url = target['url']

    # path to save image in bucket
    obj_path = 'imgs/{id}/original.png'.format(id=id)

    # extract extension from url
    ext = os.path.splitext(url)[-1]

    # open connection to image
    u = urllib2.urlopen(url)

    # if image isn't png, convert it to png
    # (os.path.splitext keeps the leading dot, e.g. '.png')
    if ext.lower() != '.png':
        i = Image.open(StringIO(u.read()))
        data = StringIO()
        i.save(data, 'png')
        data = data.getvalue()
    else:
        data = u.read()

    u.close()

    # add image to bucket
    cloud.bucket.putf(data, obj_path)

    return {'id': id,
            'path': obj_path,
            'transforms': []}
</pre>
<p>If you&#8217;re unfamiliar with <a href="http://docs.picloud.com/bucket.html">Buckets</a>, just think of them as a key->value object store. We use it here to conveniently retrieve and store objects to and from memory. However, buckets are not necessary, and are completely unrelated to queues. You can modify <code>scrape_to_bucket()</code> so it saves images into your own Amazon S3 account, database, or anywhere else.</p>
<p>Here&#8217;s a sample input message we&#8217;ll use to demonstrate each operation:</p>
<pre class="brush: python; gutter: false; title: ; notranslate">
{
 'id': 1,
 'url': 'http://s3.amazonaws.com/pi-user-buckets/vFvZxWVSiHeeB20rAZwnS66OLRjeU8MU4Igf2Kyl/blog/Obama_family_portrait.jpg'
}
</pre>
<p>The url points to an image of the Obama family:<br />
<img style="display: block; width: 600px; margin: 0px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/Obama_family_portrait.jpg" alt="Obama Family Full Size" /></p>
<p>Per the source code, the above image will be saved in your bucket. The output message pushed to the <code>thumbnail</code> and <code>medium</code> queues will be:</p>
<pre class="brush: python; gutter: false; title: ; notranslate">
{
 'id': 1,
 'path': 'imgs/1/original.png',
 'transforms': []
}
</pre>
<p>You can verify this works by simply running the function on your own machine:</p>
<pre class="brush: python; gutter: false; title: ; notranslate">
&gt;&gt;&gt; msg = {'id': 1,
'url': 'http://s3.amazonaws.com/pi-user-buckets/vFvZxWVSiHeeB20rAZwnS66OLRjeU8MU4Igf2Kyl/blog/Obama_family_portrait.jpg'}
&gt;&gt;&gt; scrape_to_bucket(msg)
{'id': 1, 'path': 'imgs/1/original.png', 'transforms': []}
</pre>
<p>This is another advantage of queues. Because your function doesn&#8217;t need to be modified in any way to be an attachment, you can test it locally just as easily as on the cloud.</p>
<h4>Handling Exceptions</h4>
<p>What if the message handler throws an Exception? Maybe the URL was temporarily unavailable, but you&#8217;d like to retry it in 60 seconds. Using <code>retry_on</code>, <code>max_retries</code>, and <code>retry_delay</code>, you can specify which <a href="http://docs.picloud.com/queue.html#retrying">Exceptions you&#8217;d like to retry, the number of times to retry, and the amount of time between each attempt</a>.</p>
<div style="margin: 30px 0px;">
<pre class="brush: python; title: ; notranslate">
import urllib2

q = cloud.queue.get('img-urls')
output_qs = cloud.queue.get('thumbnail'), cloud.queue.get('medium')
bad_urls_q = cloud.queue.get('bad-urls')

q.attach(scrape_to_bucket,
         output_qs,
         retry_on=[urllib2.HTTPError, urllib2.URLError],
         max_retries=3,
         retry_delay=60,
         on_error={Exception: {'queue': bad_urls_q}})
</pre>
</div>
<p>Using the <a href="http://docs.picloud.com/queue.html#exceptions"><code>on_error</code></a> keyword, the <code>bad-urls</code> queue will be sent messages that raised non-retryable Exceptions, and messages that failed even after three retries. Error messages generated by <code>on_error</code> include the triggered exception, and associated traceback.</p>
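<p>To make the retry semantics concrete, here is a local model of the behavior described above. It is a sketch under our own assumptions, not the queue implementation: <code>run_with_retries</code> and <code>FlakyHandler</code> are hypothetical names, the error record is simplified (it omits the traceback), and the delay is stubbed out so the example finishes instantly.</p>

```python
import time

def run_with_retries(handler, msg, retry_on, max_retries, retry_delay,
                     sleep=time.sleep):
    """Return ('ok', result) or ('error', info) for a single message."""
    retries_left = max_retries
    while True:
        try:
            return ('ok', handler(msg))
        except Exception as e:
            if isinstance(e, tuple(retry_on)) and retries_left:
                # retryable exception with attempts remaining: wait, retry
                retries_left -= 1
                sleep(retry_delay)
                continue
            # non-retryable, or retries exhausted: goes to the error queue
            return ('error', {'message': msg, 'exception': e})


class FlakyHandler(object):
    """Fails with IOError a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures

    def __call__(self, msg):
        if self.failures:
            self.failures -= 1
            raise IOError('temporarily unavailable')
        return msg


def no_wait(seconds):
    """Stand-in for time.sleep so the demo does not actually wait."""
    return None

# two failures are absorbed by the three allowed retries
print(run_with_retries(FlakyHandler(2), 'a', [IOError], 3, 60, no_wait)[0])
# five failures exhaust the retries, so the message is reported as an error
print(run_with_retries(FlakyHandler(5), 'b', [IOError], 3, 60, no_wait)[0])
```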
<p>You can confirm that your attachment has been set up as intended with the visualization.</p>
<p><img style="display: block; margin: 0px auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_scrape.png" alt="Scrape Attachment" /></p>
<h4>Tweaking Performance with Multi-threading</h4>
<p>Because this scraping attachment spends most of its time waiting for network data transfer, and is thus I/O bound, it won&#8217;t be effectively utilizing the core it&#8217;s running on. The workaround is to run multiple, identical attachments in the job.</p>
<p>To do this, set the <a href="http://docs.picloud.com/queue.html#readers-per-job"><code>readers_per_job</code></a> keyword to the number of simultaneous threads you want running in a job. The default value is 1. Do not confuse this with <code>max_parallel_jobs</code>, which controls the number of jobs that may be running. For this example, we&#8217;ll set the value to 4.</p>
<div style="margin: 30px 0px;">
<pre class="brush: python; title: ; notranslate">
import urllib2

q = cloud.queue.get('img-urls')
output_qs = cloud.queue.get('thumbnail'), cloud.queue.get('medium')
bad_urls_q = cloud.queue.get('bad-urls')

q.attach(scrape_to_bucket,
         output_qs,
         retry_on=[urllib2.HTTPError, urllib2.URLError],
         max_retries=3,
         retry_delay=60,
         on_error={Exception: {'queue': bad_urls_q}},
         readers_per_job=4)
</pre>
</div>
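<p>Why do extra threads help an I/O-bound handler? The local sketch below fakes a download with a short sleep and runs eight of them on four threads using the standard library&#8217;s <code>multiprocessing.pool.ThreadPool</code>. The <code>fake_download</code> function and the example.com URLs are placeholders, and this is only an analogy for what multiple readers in one job achieve.</p>

```python
import time
from multiprocessing.pool import ThreadPool

def fake_download(url):
    """Stand-in for an I/O-bound handler: waits instead of downloading."""
    time.sleep(0.05)
    return len(url)

urls = ['http://example.com/img%d.png' % i for i in range(8)]

# four threads share one core and overlap their waiting time
pool = ThreadPool(4)
start = time.time()
sizes = pool.map(fake_download, urls)
elapsed = time.time() - start
pool.close()
pool.join()

print(sizes)
print('%.2f seconds for %d fake downloads' % (elapsed, len(urls)))
```

<p>Run serially, the eight waits would add up; with four threads they overlap, which is the effect <code>readers_per_job</code> is after.</p>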
<h3 style="margin-top: 30px;">Step 2: Resizing Images</h3>
<p>We&#8217;re going to attach handlers to the <code>thumbnail</code> queue, and <code>medium</code> queue to resize images to 150px, and 400px, respectively. To ease the storage and retrieval of images as PIL Image objects, we&#8217;re going to use a custom-defined <code>ImageOperation</code> class.</p>
<pre class="brush: python; title: ; notranslate">

class ImageOperation(object):
    &quot;&quot;&quot;Base class for Message Handlers in Image Pipeline.

    Retrieves images from bucket, performs in-memory manipulation
    with PIL object, stores result back in bucket, and then
    outputs message with additional transform listed.

    Override operation() for custom operation.&quot;&quot;&quot;

    name = 'identity'

    def get_image_from_bucket(self, obj_path):
        &quot;&quot;&quot;Given *obj_path* in bucket, returns PIL Image object&quot;&quot;&quot;

        # get image data as string of raw bytes
        data = cloud.bucket.getf(obj_path).read()

        return Image.open(StringIO(data))

    def put_image_in_bucket(self, img, obj_path):
        &quot;&quot;&quot;Given PIL image *img*, saves it to *obj_path* in bucket&quot;&quot;&quot;

        output_data = StringIO()

        # write raw image bytes to StringIO
        img.save(output_data, 'png')

        # store the image file in your bucket
        cloud.bucket.putf(output_data.getvalue(), obj_path)

    def add_modifier_to_key(self, obj_path):
        &quot;&quot;&quot;Returns new *obj_path* that includes name of transform&quot;&quot;&quot;

        obj_key, obj_ext = os.path.splitext(obj_path)
        obj_path = '{key}.{name}.png'.format(key=obj_key,
                                             name=self.name)
        return obj_path

    def message_handler(self, msg):
        &quot;&quot;&quot;Entry point for message handling. Do not override.&quot;&quot;&quot;

        img = self.get_image_from_bucket(msg['path'])

        # apply image operation
        new_img = self.operation(img)

        msg['path'] = self.add_modifier_to_key(msg['path'])
        msg['transforms'].append(self.name)

        self.put_image_in_bucket(new_img, msg['path'])

        return msg

    def operation(self, img):
        &quot;&quot;&quot;Method to replace for custom operation&quot;&quot;&quot;

        return img
</pre>
<p>Since we support <a href="http://docs.picloud.com/queue.html#maintaining-state">instances as message handlers</a>, we&#8217;ll subclass <code>ImageOperation</code> to make two message handlers: <code>ImageThumbnail</code> and <code>ImageMediumSize</code>.</p>
<pre class="brush: python; title: ; notranslate">
class ImageThumbnail(ImageOperation):

    name = 'thumb'

    def operation(self, img):
        &quot;&quot;&quot;Returns a thumbnail of the *img*&quot;&quot;&quot;

        img.thumbnail((150, 150), Image.ANTIALIAS)
        return img

class ImageMediumSize(ImageOperation):

    name = 'med'

    def operation(self, img):
        &quot;&quot;&quot;Returns a 400px version of the *img*&quot;&quot;&quot;

        img.thumbnail((400, 400), Image.ANTIALIAS)
        return img
</pre>
<p>Now we&#8217;ll attach instances of these classes to their respective input queues.</p>
<pre class="brush: python; title: ; notranslate">
thumbnail_q = cloud.queue.get('thumbnail')
thumbnail_q.attach(ImageThumbnail(), [callback_q])

medium_q = cloud.queue.get('medium')
medium_q.attach(ImageMediumSize(), [sepia_q, callback_q])
</pre>
<p>If you pushed the example message of the Obama family to <code>img-urls</code>, then there are already messages ready for the <code>thumbnail</code> and <code>medium</code> queues. Once processed, these two objects will appear in your bucket:</p>
<p><code>imgs/1/original.thumb.png</code></p>
<p><img style="display: block; margin: 10px auto;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_original.thumb.png" alt="Obama Family Thumbnail" /></p>
<p><code>imgs/1/original.med.png</code></p>
<p><img style="display: block; margin: 10px auto;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_original.med.png" alt="Obama Family Medium Size" /></p>
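<p>The object names above follow directly from <code>add_modifier_to_key</code>. As a minimal standalone sketch (the free function <code>add_modifier</code> is a hypothetical stand-in for the method, written only for illustration), each operation strips the extension and appends its own name, so chained operations accumulate in the key:</p>
<pre class="brush: python; title: ; notranslate">
import os.path

def add_modifier(obj_path, name):
    '''Hypothetical stand-in for ImageOperation.add_modifier_to_key.'''
    obj_key, obj_ext = os.path.splitext(obj_path)
    return '{key}.{name}.png'.format(key=obj_key, name=name)

original = 'imgs/1/original.png'
thumb_path = add_modifier(original, 'thumb')
med_path = add_modifier(original, 'med')
# a handler fed by the medium queue receives the medium-size path
chained_path = add_modifier(med_path, 'sepia')

print(thumb_path)
print(med_path)
print(chained_path)
</pre>
<p>Note that the original extension is discarded: every output is saved as a PNG regardless of the input format.</p>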
<h3 style="margin-top: 30px;">Step 3: Sepia Tone</h3>
<p>The final image operation is a sepia-tone filter, applied after the medium-size downscale operation.</p>
<pre class="brush: python; title: ; notranslate">
class ImageSepia(ImageOperation):
    &quot;&quot;&quot;Applies Sepia Filter.
    Based on: http://effbot.org/zone/pil-sepia.htm&quot;&quot;&quot;

    name = 'sepia'

    def __init__(self):
        self.sepia_palette = self.make_linear_ramp()

    @staticmethod
    def make_linear_ramp():
        &quot;&quot;&quot;Generate a palette in a format acceptable for `putpalette`,
        which expects [r,g,b,r,g,b,...]&quot;&quot;&quot;

        ramp = []
        r, g, b = 255, 220, 162 

        # 256 entries, one per possible grayscale value (0-255)
        for i in range(256):
            ramp.extend((r*i/255, g*i/255, b*i/255))

        return ramp

    def operation(self, img):
        &quot;&quot;&quot;Returns a version of the *img* with Sepia applied
        for a vintage look.&quot;&quot;&quot;

        # convert to grayscale
        orig_mode = img.mode
        if orig_mode != &quot;L&quot;:
            img = img.convert(&quot;L&quot;)

        img = ImageOps.autocontrast(img)

        # apply sepia palette
        img.putpalette(self.sepia_palette)

        # convert back to its original mode
        if orig_mode != &quot;L&quot;:
            img = img.convert(orig_mode)

        return img
</pre>
<p>Attaching:</p>
<pre class="brush: python; title: ; notranslate">
sepia_q = cloud.queue.get('sepia')
sepia_q.attach(ImageSepia(), [callback_q])
</pre>
<p>Once again, if you pushed the sample message, there should already be a message ready in the <code>sepia</code> queue. The image written to your bucket is:</p>
<p><img style="display: block; margin: 10px auto;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_original.med.sepia.png" alt="Obama Family Sepia" /></p>
<h3 style="margin-top: 30px;">Step 4: Callback</h3>
<p>Each image operation outputs a message to the <code>callback</code> queue. You probably want your callback handler to:</p>
<ul>
<li><a href="http://docs.picloud.com/customstorage.html#connecting-to-your-own-mysql">Write to your database</a> that the image is ready</li>
<li>Make a POST request to your website for instant notification</li>
<li>Store the image somewhere else</li>
</ul>
<p>For simplicity, we&#8217;ll have the callback handler <a href="http://docs.picloud.com/bucket.html#making-an-object-publicly-accessible-via-http">set the image object in your bucket as public</a>, so that it&#8217;s accessible by anyone. Based on the above examples, the following should be straightforward:</p>
<pre class="brush: python; title: ; notranslate">
def callback(msg):
    print msg
    cloud.bucket.make_public(msg['path'])

callback_q = cloud.queue.get('callback')
callback_q.attach(callback)
</pre>
<h4>Debugging Attachments</h4>
<p>You may have noticed that in <code>callback(msg)</code>, we did a <code>print msg</code>. How would you see standard output for an attachment? The same way you would for a job&mdash;by clicking on it in the <a href="https://www.picloud.com/accounts/jobs/">Jobs Dashboard</a>. To know what jobs are running your attachments, click &#8220;view&#8221; jobs from the <a href="https://www.picloud.com/accounts/queue/">Queues Dashboard</a>, which will take you to the Jobs Dashboard filtered for your attachment.</p>
<p>Using this method, you&#8217;ll get access to <a href="http://docs.picloud.com/job_mgmt_adv.html#query-job-information">all the information</a> you&#8217;re accustomed to with jobs, including a <a href="http://blog.picloud.com/2012/09/18/real-time-data-feed-for-jobs-and-more/">realtime feed of CPU, memory, and disk usage</a>.</p>
<h2 style="margin-top: 40px;">Scaling Up</h2>
<p>With just a few tweaks, I was able to get a system throughput greater than 150 images per second. I set <code>max_parallel_jobs</code> to 20 for the scraping step (10 <code>readers_per_job</code>, c2 core), and 30 for all image operation steps. Also, I set the image operation steps to use the f2 core for faster processing.</p>
<p>Here&#8217;s a screenshot of the Queue Dashboard in action as I was testing (doesn&#8217;t show max throughput). Note how the dequeue rate is able to keep up with the enqueue rate, which is precisely what we want.</p>
<p><img style="display: block; margin: 10px auto;" src="https://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/queue_medium_small.png" alt="Queue Dashboard" /></p>
<h2 style="margin-top: 40px;">Pricing</h2>
<p>We charge for queues based on the amount of time jobs spend processing messages. Assuming you have a steady stream of messages, the maximum you&#8217;ll pay in an hour is:</p>
<div style="margin: 20px 0px; text-align: center;">
<code>max_parallel_jobs x cost per core hour</code>
</div>
<p>You can find the cost per core hour <a href="http://docs.picloud.com/queue.html#faster-message-processing">based on the core type</a> you&#8217;ve chosen from our <a href="http://www.picloud.com/pricing/#computation">pricing page</a>. <b>If your queue is empty, no jobs will be running, and you won&#8217;t pay a thing!</b></p>
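<p>As a quick worked example of the formula above, here is a minimal sketch. The rate of 0.22 dollars per core hour is a placeholder chosen for illustration, not a quoted price; substitute the figure from the pricing page for your core type.</p>
<pre class="brush: python; title: ; notranslate">
def max_hourly_cost(max_parallel_jobs, cost_per_core_hour):
    '''Upper bound on the hourly charge for one attachment.'''
    return max_parallel_jobs * cost_per_core_hour

# assumed, illustrative rate; check the pricing page for real numbers
ASSUMED_RATE = 0.22

print('%.2f dollars per hour at most' % max_hourly_cost(30, ASSUMED_RATE))
</pre>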
<h2 style="margin-top: 40px;">Conclusion: Let Us Clean Your Pipes</h2>
<p>Letting us manage the full pipeline&mdash;the queues, <b>and</b> the processing of messages&mdash;has several advantages:</p>
<ul>
<li><b>No Servers</b>: You won&#8217;t have to configure or deploy a single server for storage or processing.</li>
<li><b>Faster Development</b>: It takes only a few lines of code to &#8220;use the cloud&#8221; for a pipeline.</li>
<li><b>Reliability</b>: Our queues and workers are distributed and replicated across multiple datacenters (AWS availability zones), and even a server failure won&#8217;t jeopardize your messages.</li>
<li><b>Scale Effortlessly</b>: Tell us how many cores you want to put to work, and we make it so.</li>
<li><b>Cut Costs</b>: You only pay for processing when there are messages. No idling servers.</li>
<li><b>Monitoring &#038; Analytics</b>: Take advantage of our queue analytics, and the same job monitoring interface that powers our standard service.</li>
</ul>
<p>If you&#8217;re ready to give it a try, <a href="https://www.picloud.com/accounts/register/">sign up now</a>, and get 20 free core hours. Happy coding!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2013/04/03/introducing-queues-creating-a-pipeline-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>XKCD Hash Breaking</title>
		<link>http://blog.picloud.com/2013/04/02/xkcd-hash-breaking/</link>
		<comments>http://blog.picloud.com/2013/04/02/xkcd-hash-breaking/#comments</comments>
		<pubDate>Tue, 02 Apr 2013 22:23:23 +0000</pubDate>
		<dc:creator>Josh Hawn</dc:creator>
				<category><![CDATA[Just for Fun]]></category>
		<category><![CDATA[xkcd]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1856</guid>
		<description><![CDATA[The PiCloud team has been nerd-sniped by the April Fools XKCD comic Externalities which pits visitors against each other in a contest of raw computing power (and a little luck). Here at PiCloud we&#8217;ve got lots of computing power, but about how much is needed (on expectation) to win their hash breaking competition? The probability [...]]]></description>
			<content:encoded><![CDATA[<p>The PiCloud team has been <a href="http://xkcd.com/356/">nerd-sniped</a> by the April Fools XKCD comic <a href="http://xkcd.com/1193/">Externalities</a> which pits visitors against each other in a contest of raw computing power (and a little luck). Here at PiCloud we&#8217;ve got lots of computing power, but about how much is needed (on expectation) to win their hash breaking competition?</p>
<p>The probability of matching a certain number of bits in the hashed output follows a <a href="http://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>. The <a href="http://almamater.xkcd.com/best.csv">current leaders</a> of the contest are 389 bits off out of 1024 bits. We can calculate our chance of generating a better hash value like so:</p>
<div style="text-align: center; margin: 25px auto;">
<img src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/binomial_dist_blog.gif" />
</div>
<p>This is the fraction of the distribution that is less than 389 bits off, and it&#8217;s one in 228 trillion!</p>
<p>Not to be discouraged, we wrote a program to generate random hashes and test how close they are to the <a href="http://almamater.xkcd.com/">goal</a> given by XKCD. Running it on a single f2 core, we were able to generate and check over 160,000 hashes per second. Multiplying the prior probability by this rate results in a 1 in 1.43 billion chance of finding a better hash in 1 second, but we can do much better than that.</p>
<p>How many f2 cores running in parallel would it take, on expectation, to generate a better hash within the next 6 hours? It turns out that it would take over 66,200 f2 cores running in parallel over 6 hours before we can expect to find a better hash. Unfortunately, we can&#8217;t get you 60 thousand f2 cores and the cost would be nearly $87,500 on PiCloud.</p>
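<p>The arithmetic above can be reproduced with a short sketch. It assumes every bit of a random hash matches the goal independently with probability 1/2; the function names are ours, and the hash rate and time window are the figures quoted in this post.</p>
<pre class="brush: python; title: ; notranslate">
from fractions import Fraction

def tail_probability(n_bits, best_off):
    '''Chance that a random hash is fewer than best_off bits off.'''
    total = 0
    term = 1  # binomial coefficient C(n_bits, 0)
    for k in range(best_off):
        total += term
        term = term * (n_bits - k) // (k + 1)  # C(n, k) -> C(n, k + 1)
    return float(Fraction(total, 2 ** n_bits))

def cores_needed(p, hashes_per_second, seconds):
    '''Cores required for one expected success within the window.'''
    return 1.0 / (p * hashes_per_second * seconds)

p = tail_probability(1024, 389)
print('one in %.3g' % (1.0 / p))
print('cores: %d' % cores_needed(p, 160000, 6 * 3600))
</pre>
<p>Exact integer arithmetic is used for the sum because the individual binomial coefficients are far too large for floating point.</p>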
<h2 style="margin-top: 30px;">Feeling Lucky?</h2>
<p>There are still over 6 hours left in the competition as of the time of this blog post and we&#8217;ve made it easy for you to get hashing using our environment set up with a skein-1024 hashing program. We&#8217;ve shared the environment publicly, so you can use it directly with no setup.</p>
<p>Here&#8217;s how you can give it a shot (You need our client library installed):</p>
<pre class="brush: bash; title: ; notranslate">
$ picloud exec -e /picloud/xkcd_skein -t f2 skein
</pre>
<p>This will run indefinitely and print to standard output the best result it has come across so far. You can monitor the standard output of the job from the <a href="https://www.picloud.com/accounts/jobs/">Jobs Dashboard</a> to see if you&#8217;ve found a better hash.</p>
<pre class="brush: bash; title: ; notranslate">
$ picloud exec -e /picloud/xkcd_skein -t f2 skein 389
</pre>
<p>This form of invocation will not print any output until it has come across a hash that is closer than 389 bits away from the goal.</p>
<pre class="brush: bash; title: ; notranslate">
$ picloud exec -e /picloud/xkcd_skein -t f2 skein 400 1000000000
</pre>
<p>This form of invocation will not print any output until it has come across a hash that is closer than 400 bits away from the goal, but will stop hashing after 1 billion attempts.</p>
<p>From the <a href="https://www.picloud.com/accounts/environment/">Environments Dashboard</a>, you can also clone our public environment, and modify the program for yourself.</p>
<p>You can run &#8220;picloud exec&#8221; as many times as you want to maximize your parallel computing power. Watch your bill, and good luck!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2013/04/02/xkcd-hash-breaking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dealing with the Inconsistent EC2 API</title>
		<link>http://blog.picloud.com/2013/02/17/dealing-with-the-inconsistent-ec2-api/</link>
		<comments>http://blog.picloud.com/2013/02/17/dealing-with-the-inconsistent-ec2-api/#comments</comments>
		<pubDate>Sun, 17 Feb 2013 20:36:38 +0000</pubDate>
		<dc:creator>Aaron Staley</dc:creator>
				<category><![CDATA[Battle Stories]]></category>
		<category><![CDATA[ec2 api]]></category>
		<category><![CDATA[ec2 bugs]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1441</guid>
		<description><![CDATA[This is the second in a series of posts discussing issues when using Amazon Web Services at scale. The first was When EC2 Hardware Changes Underneath You…. At PiCloud, we’ve accumulated over 100,000 instance requests on Amazon EC2. While we know of no IaaS provider superior to Amazon, it isn&#8217;t perfect. In this post, I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p><i>This is the second in a series of posts discussing issues when using Amazon Web Services at scale. The first was <a href="/2013/01/08/when-ec2-hardware-changes-underneath-you/">When EC2 Hardware Changes Underneath You…</a>.</i></p>
<p>At PiCloud, we’ve accumulated over 100,000 instance requests on <a href="http://aws.amazon.com/ec2/">Amazon EC2</a>. While we know of no IaaS provider superior to Amazon, it isn&#8217;t perfect. In this post, I&#8217;ll be discussing how we&#8217;ve built our scaling systems around Amazon EC2, despite frequent data inconsistencies from its API.</p>
<h3>Background: The Scaler</h3>
<p>As users <a href="http://docs.picloud.com/primer.html#creating-a-job-from-python">create jobs</a>, we add them to our <a href="http://www.picloud.com/platform/#howitworks">job queue</a> until there is a free worker available to do the processing. We are constantly <a href="http://docs.picloud.com/realtime_cores.html#how-do-we-estimate-the-runtime-of-a-job">estimating the size</a> of this job queue to scale the number of &#8220;worker instances&#8221; we have available to perform our customers&#8217; computation. Due to fluctuations in our job queue throughout the day, our scaling system regularly requests and terminates EC2 instances.</p>
<p>Our automated scaling system, or &#8220;scaler&#8221; as we call it, runs the following algorithm several times a minute:</p>
<ul>
<li>Obtain queue size from scheduling system. Infer number of instances (servers) needed.</li>
<li>Obtain instance state information from EC2 with the DescribeInstances API call.</li>
<li>Compare the number of instances needed to the number of instances EC2 indicates are running, pending, etc.
<ul>
<li>RunInstances if more are needed.</li>
<li>TerminateInstances that are excessive.</li>
</ul>
</li>
</ul>
<p>(This is a simplification that doesn&#8217;t include our use of the <a href="http://aws.amazon.com/ec2/spot-instances/">EC2 spot market</a>, our inability to terminate servers running a customer&#8217;s jobs, and our optimization to only terminate servers near the end of their chargeable hour. For more information, see our <a href="http://blog.picloud.com/2012/12/05/picloud-wins-grand-prize-in-amazon-ec2-spotathon/">Grand Prize winning Spotathon Application</a>).</p>
<p>The benefit of the above algorithm is that it allows the scaler to maintain minimal internal state, making it simpler, easier to test, and more robust. Aside from the queue size calculated by our scheduler, the EC2 API essentially tracks our system state.</p>
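<p>To make the stateless loop concrete, here is a minimal sketch of the decision made on each pass. It is an illustration under simplifying assumptions, not our production code: <code>plan_scaling</code> is a hypothetical name, and the instance states are passed in as a plain dictionary rather than fetched from EC2.</p>
<pre class="brush: python; title: ; notranslate">
def plan_scaling(instances_needed, instance_states):
    '''Decide what one scaler pass should do.

    instance_states maps instance id -> state string as reported by
    DescribeInstances (for example 'pending' or 'running').
    '''
    active = [i for i, s in instance_states.items()
              if s in ('pending', 'running')]
    delta = instances_needed - len(active)
    if delta > 0:
        return ('run', delta)
    if delta == 0:
        return ('noop', 0)
    return ('terminate', -delta)

states = {'i-1': 'running', 'i-2': 'pending', 'i-3': 'terminated'}
print(plan_scaling(5, states))
</pre>
<p>Because the function depends only on its two inputs, any stale or missing entry in <code>instance_states</code> translates directly into a wrong decision.</p>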
<p>Relying on EC2 as our <a href="http://en.wikipedia.org/wiki/Single_Version_of_the_Truth">Single Version of the Truth</a> of our system state would cause us many issues, which we&#8217;ll now cover in detail.</p>
<h3>DescribeInstances Inconsistency</h3>
<p>When the scaler was first launched, and we had far fewer servers, all was well.  However, over time we noticed two odd behaviors:</p>
<ul>
<li>Sometimes, far more servers than needed were being deployed.</li>
<li>Rarely, but catastrophically, the scaler would terminate every worker server, only to immediately spawn new ones afterward!</li>
</ul>
<p>Sifting through debugging logs brought the problem to light.  After requesting a server, subsequent DescribeInstances responses would not necessarily include the newly pending server. In database terms, the EC2 API is only <a href="http://en.wikipedia.org/wiki/Eventual_consistency">eventually consistent</a>.  The stateless scaler, clueless that it had just requested a server, would keep deploying instances until they <i>finally</i> showed up in the DescribeInstances response.</p>
<p>Worse, the instances that did appear in the DescribeInstances response were not necessarily up to date.  At times, after an instance had been terminated, it would still appear as running.  The stateless scaler, clueless that it had just terminated a server, would then terminate a different server&mdash;and so forth&mdash;until EC2 finally concurred that they were terminated.</p>
<p>In the end, we had to introduce some state (a list of instances requested/terminating) to supplement the response from DescribeInstances.  While the EC2 API does not provide any upper bound on its eventual consistency (let alone document that the API is eventually consistent), we&#8217;ve found that this &#8220;override list&#8221; only needs to exist for a few minutes.</p>
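<p>One way to picture the override list is as a short-lived overlay on top of the DescribeInstances snapshot. The sketch below is an assumption about how such an overlay could look, not our actual implementation; the names and the 300-second lifetime are illustrative.</p>
<pre class="brush: python; title: ; notranslate">
def effective_states(described, overrides, now, ttl=300):
    '''Overlay recently issued requests on a DescribeInstances snapshot.

    described: dict of instance id -> state from EC2
    overrides: dict of instance id -> (state, timestamp) kept by the scaler
    ttl: seconds to trust an override (300 is an assumed value)
    '''
    merged = dict(described)
    for instance_id, (state, stamp) in list(overrides.items()):
        if now - stamp > ttl:
            del overrides[instance_id]  # EC2 has had time to catch up
        else:
            merged[instance_id] = state
    return merged

described = {'i-1': 'running'}
overrides = {'i-1': ('shutting-down', 100), 'i-2': ('pending', 100)}
print(effective_states(described, overrides, now=160))
</pre>
<p>Within the lifetime, the scaler sees the instance it just requested and the termination it just issued, even if EC2 does not report them yet.</p>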
<h3>CreateTags Inconsistency</h3>
<p>In keeping with the philosophy of a stateless scaler, all meta information about a given instance is stored as EC2 <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html">tags</a>.  Our tags indicate, among other things, the instance&#8217;s environment (&#8220;test&#8221;, &#8220;production&#8221;, etc.) and role (&#8220;worker&#8221;, &#8220;webserver&#8221;, etc.).  Rather than keeping such information in our own database, we let EC2 handle the details.</p>
<p>As we want tags to be set atomically with instance creation, any instance is created with the following API calls:</p>
<ul>
<li><i>RunInstances</i> &#8211; create the instance(s)</li>
<li><i>CreateTags</i> &#8211; tag the just-created (pending) instance(s)</li>
</ul>
<p>Things worked for a while. But at some point, we noticed the scaler was crashing with the error:</p>
<p><code style="color: red;">InvalidInstanceID.NotFound: The instance IDs ... do not exist</code></p>
<p>And yet the purportedly &#8220;not found&#8221; instances were clearly showing up in our DescribeInstances.</p>
<p>Unsurprisingly, given what we&#8217;ve learned from DescribeInstances, CreateTags also exhibits eventual consistency.  Our solution has been to retry with exponential back-off, giving up after some timeout, whenever the CreateTags request fails with InvalidInstanceID.NotFound.  It may take over a minute after RunInstances for CreateTags to work.</p>
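<p>A retry loop of this kind might look like the sketch below. It is deliberately independent of any EC2 client library: <code>create_tags</code> is any callable that performs the request, and <code>InstanceNotFound</code> is a hypothetical exception standing in for the InvalidInstanceID.NotFound error. The timeout and initial delay are assumed values.</p>
<pre class="brush: python; title: ; notranslate">
import time

class InstanceNotFound(Exception):
    '''Hypothetical stand-in for the InvalidInstanceID.NotFound error.'''

def tag_with_backoff(create_tags, timeout=120.0, first_delay=1.0,
                     sleep=time.sleep):
    '''Call create_tags() until it succeeds, doubling the wait each time.

    Re-raises InstanceNotFound once the total wait would exceed timeout.
    '''
    delay = first_delay
    waited = 0.0
    while True:
        try:
            return create_tags()
        except InstanceNotFound:
            if waited + delay > timeout:
                raise
            sleep(delay)
            waited += delay
            delay *= 2

# demo: a fake request that only succeeds on the third attempt
attempts = []
def flaky_create_tags():
    attempts.append(1)
    if len(attempts) >= 3:
        return 'tagged'
    raise InstanceNotFound()

delays = []
print(tag_with_backoff(flaky_create_tags, sleep=delays.append))
print(delays)
</pre>
<p>Passing <code>sleep</code> as a parameter keeps the loop testable without actually waiting.</p>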
<h3>Unavailable Meta-Data</h3>
<p>Our deployment scripts rely on EC2 <a href='http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html'>meta-data</a> to learn about the instance&#8217;s attributes, which in turn affect application configuration.</p>
<p>One such application we install is Linux&#8217;s <a href='http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)'>Logical Volume Manager</a> (LVM).  Instances with large amounts of ephemeral storage often have <a href='http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html'>multiple volumes</a> attached to them.  LVM allows us to abstract the multiple volumes into a single one.</p>
<p>The LVM installer needs to know the block device mapping (e.g. /dev/sdb) of the ephemeral storage.  Such information is only available in the instance meta-data.   Unfortunately, we&#8217;ve since discovered that requests for block-device-mapping sometimes return an empty string.  So once again, we need to back off and try again.  Complicating matters is that even once a given request returns valid data, subsequent requests may again return no data!</p>
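<p>Since a valid answer can later be followed by an empty one, it helps to remember the first non-empty response. The sketch below shows the idea under assumptions: <code>fetch</code> is any callable that takes a meta-data path and returns a string, and the class name, retry count and delays are illustrative.</p>
<pre class="brush: python; title: ; notranslate">
import time

class CachedMetadata(object):
    '''Remember the first non-empty answer for each meta-data path.'''

    def __init__(self, fetch, retries=5, delay=1.0, sleep=time.sleep):
        self.fetch = fetch  # callable: path -> string (assumed interface)
        self.retries = retries
        self.delay = delay
        self.sleep = sleep
        self.cache = {}

    def get(self, path):
        if path in self.cache:
            return self.cache[path]
        for attempt in range(self.retries):
            value = self.fetch(path)
            if value:
                self.cache[path] = value
                return value
            # empty response: wait longer before each new attempt
            self.sleep(self.delay * 2 ** attempt)
        raise RuntimeError('no data for %s' % path)
</pre>
<p>Once a path has been answered, later calls never touch the meta-data service again, so a transient empty response cannot undo a good one.</p>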
<h3>Instances Cannot Always be Launched</h3>
<p><a href="http://www.picloud.com/platform/realtime/">Realtime Cores</a> are our way of letting you dictate to our scaler the exact number of cores you need. The scaler allocates Realtime Cores by issuing an all-or-nothing <a href="http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiReference-query-RunInstances.html">RunInstances</a> request (e.g. <code>MinCount == MaxCount == 10</code>). While Amazon&#8217;s documentation warns that sometimes instances can&#8217;t be launched, for months, everything worked. Like the C programmer who doesn&#8217;t check that the pointer returned by <code>malloc</code> is not <code>NULL</code>, we stopped worrying about what was actually returned.</p>
<p>Sure enough, one day, requests for dozens of Cluster Compute Eight Extra Large Instances (cc2.8xlarge) started failing due to &#8220;Insufficient capacity&#8221;. We weren&#8217;t even requesting instances from a specific Availability Zone (AZ); there wasn&#8217;t enough capacity anywhere!</p>
<p>What we thought would never happen turned out to be real, and we had to update our Realtime interface accordingly.</p>
<p>And there was another subtle lesson too.  While we are indifferent to the AZ any worker instance is launched in, RunInstances will never return instances from a heterogeneous mix of AZs; Amazon always places all instances from a single request in the same AZ.  Consequently, we now set MinCount to 1 and keep issuing requests until the correct number of instances is launched.</p>
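<p>The &#8220;MinCount of 1, repeat until filled&#8221; approach can be sketched as follows. Here <code>run_instances(min_count, max_count)</code> is a hypothetical callable that returns the list of instance ids actually launched, and the cap on the number of requests is an assumption added so the loop always ends.</p>
<pre class="brush: python; title: ; notranslate">
def launch_until_filled(run_instances, wanted, max_requests=10):
    '''Request instances in batches until wanted have been launched.

    run_instances(min_count, max_count) is a hypothetical callable that
    returns a list of launched instance ids, possibly fewer than max_count.
    '''
    launched = []
    for _ in range(max_requests):
        remaining = wanted - len(launched)
        if remaining == 0:
            break
        launched.extend(run_instances(1, remaining))
    return launched
</pre>
<p>The caller still has to compare the length of the result with <code>wanted</code>, since capacity may run out before the request is filled.</p>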
<h3>Conclusion</h3>
<p>This article only touches on some of the EC2 difficulties we&#8217;ve encountered.  What we initially thought would be a simple, clean scaling management system ended up full of hacks to handle inconsistencies in the Amazon API.   As we&#8217;ve discovered over the past four years creating <a href="http://www.picloud.com/">PiCloud</a>, building a robust, large-scale system, even on EC2, where so much infrastructure management is already handled, ends up far more challenging than it initially appears to be.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2013/02/17/dealing-with-the-inconsistent-ec2-api/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>2013 Academic Research Grant Recipients</title>
		<link>http://blog.picloud.com/2013/01/16/2013-academic-research-grant-recipients/</link>
		<comments>http://blog.picloud.com/2013/01/16/2013-academic-research-grant-recipients/#comments</comments>
		<pubDate>Wed, 16 Jan 2013 19:18:04 +0000</pubDate>
		<dc:creator>Daniel Singh</dc:creator>
				<category><![CDATA[Academic Research Program]]></category>
		<category><![CDATA[academic research program recipients]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1418</guid>
		<description><![CDATA[Once again, we are excited to announce the recipients of the $500 (10,000 c1 core hours) Academic Research Program Grant from PiCloud.   In response to the overwhelming number of high quality applicants, we decided to award six grants this year! We&#8217;ve included a list of recipients below so you can see some of the great work that will [...]]]></description>
			<content:encoded><![CDATA[<p>Once again, we are excited to announce the recipients of the $500 (10,000 c1 core hours) Academic Research Program Grant from PiCloud.   In response to the overwhelming number of high quality applicants, we decided to <strong>award six grants this year</strong>!</p>
<p>We&#8217;ve included a list of recipients below so you can see some of the great work that will be benefiting from our platform.</p>
<p>If you missed out on this cycle of grants, don&#8217;t worry! We plan on having another Grant Program launched in the very near future.</p>
<h3>List of Recipients</h3>
<p><strong>Tim Althoff</strong><br />
Graduate Student Researcher<br />
German Research Center for Artificial Intelligence (DFKI)<br />
</p>
<p>
Over time different topics arise in media and society which reflect shifting interests of groups of individuals. These trending topics are also reflected in online media such as Twitter, Google, and Wikipedia. Our research project seeks to elucidate trending topics in online media through analyzing what people are interested in and characterizing those trends as well as making predictions about the impact and lifetime of those trends. These models and predictions are relevant to finance, advertising, journalism, and social media recommenders. Furthermore, we use trending topics to inform machine learning models in visual concept detection that automatically annotate videos with tags.
</p>
<p><strong>Robert Lanfear and Brett Calcott</strong><br />
Postdoctoral Research Fellows<br />
Australian National University, Canberra</p>
<p></p>
<p>
Biologists are now routinely producing genome-scale datasets, and bioinformaticians have to work hard to keep up. We are developing a recently-released piece of bioinformatics software, PartitionFinder, to run in the cloud. This will allow biologists to select optimal models and partitioning schemes for genome-scale datasets without investing in huge servers. We hope that this will help improve the inferences we make from genome-scale datasets in biology.
</p>
<p><strong>Ken Locey</strong><br />
PhD Candidate in the Dept. of Biology<br />
Utah State University</p>
<p></p>
<p>Development of a general approach to explain and predict macroecological patterns and a common pattern in nature, i.e. the uneven distribution of wealth and abundance.  Developing and testing a framework for examining and predicting distributions of wealth and abundance in almost any system.</p>
<p><strong>Massimo Minervini</strong><br />
Ph.D. Candidate in Computer Science and Engineering<br />
IMT Institute for Advanced Studies, Lucca, Italy</p>
<p></p>
<p>
Multi-atlas, multi-template anatomical segmentation of brain MRI volumes when combined with label fusion is considered a state-of-the-art method for the segmentation of brain anatomy in humans and animal models. However, it is normally a challenging task and depending on the number of subjects, templates, and atlases involved, the amount of computation required is significant. We want to take advantage of PiCloud’s computational power to speed up this process significantly, in an intelligent and time optimal fashion.
</p>
<p><strong>Thomas Robitaille</strong><br />
Research Group Leader<br />
Max Planck Institute for Astronomy, Heidelberg, Germany</p>
<p></p>
<p>
Carrying out mosaicking for the GLIMPSE project, a survey of our Galaxy with the Spitzer Space Telescope, and computing radiative transfer models with a Python-based radiative transfer code that can be used to simulate observations of many astrophysical objects, including, for example, comets, forming stars, and galaxies.
</p>
<p><strong>Yannick Wurm</strong><br />
Lecturer in Bioinformatics<br />
Queen Mary, University of London</p>
<p></p>
<p>
Ants, bees, wasps and termites live in societies of complexities that rival our own. Past studies largely focused on the behaviors, morphologies and evolutionary histories of such social insects. Thanks to recent improvements in DNA sequencing and analysis technologies, we are beginning to identify the genes involved in social interactions in these species.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2013/01/16/2013-academic-research-grant-recipients/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When EC2 Hardware Changes Underneath You&#8230;</title>
		<link>http://blog.picloud.com/2013/01/08/when-ec2-hardware-changes-underneath-you/</link>
		<comments>http://blog.picloud.com/2013/01/08/when-ec2-hardware-changes-underneath-you/#comments</comments>
		<pubDate>Wed, 09 Jan 2013 06:36:49 +0000</pubDate>
		<dc:creator>Aaron Staley</dc:creator>
				<category><![CDATA[Battle Stories]]></category>
		<category><![CDATA[avx]]></category>
		<category><![CDATA[ec2 bugs]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1341</guid>
		<description><![CDATA[At PiCloud, we&#8217;ve accumulated over 100,000 instance requests on Amazon EC2. Our scale has exposed us to many odd behaviors and outright bugs, which we&#8217;ll be sharing in a series of blog posts to come. In this post, I&#8217;ll share one of the strangest we&#8217;ve seen. The Bug It started with a customer filing a [...]]]></description>
			<content:encoded><![CDATA[<p>At PiCloud, we&#8217;ve accumulated over 100,000 instance requests on <a href="http://aws.amazon.com/ec2/">Amazon EC2</a>. Our scale has exposed us to many odd behaviors and outright bugs, which we&#8217;ll be sharing in a series of blog posts to come. In this post, I&#8217;ll share one of the strangest we&#8217;ve seen.</p>
<h3>The Bug</h3>
<p>It started with a customer filing a support ticket about code that had been working flawlessly for months suddenly crashing. Some, but not all, of his jobs were failing with an error that looked something like:</p>
<p style="font-family: fixed; padding-left: 10px; color: red;">Fatal Python error: Illegal instruction</p>
<p></p>
<p style="font-family: fixed; padding-left: 10px; color: red;">File &#8220;/usr/local/lib/python2.6/dist-packages/numpy/linalg/linalg.py&#8221;, line 1319 in svd</p>
<p style="font-family: fixed; padding-left: 10px; color: red;">File &#8220;/usr/local/lib/python2.6/dist-packages/numpy/linalg/linalg.py&#8221;, line 1546 in pinv</p>
<p>That&#8217;s odd, I thought. I had never before seen the Python interpreter use an <em>Illegal Instruction</em>! Naturally, I checked the relevant line that was crashing:</p>
<pre class="brush: python; first-line: 1318; gutter: false; title: ; notranslate">
results = lapack_routine(option, m, n, a, m, s, u, m, vt, nvt, work, lwork, iwork, 0)
</pre>
<p>A call to <a href="http://www.numpy.org/">numpy&#8217;s</a> C lapack_lite. Great, the robust numpy was crashing out.</p>
<p>More surprising was that only a minority of jobs were failing, even though the customer indicated that all jobs were executing the problematic line. We did notice that the job failures were linked to just a few servers and those few servers ran none of the customer&#8217;s jobs successfully. Unfortunately, our automated scaling systems had already torn down those servers.</p>
<h3>Debugging</h3>
<p>The first thing I did was Google the error. Most results were unhelpful, but one old, though now solved, <a href="http://software.intel.com/en-us/articles/illegal-instruction-intelr-coretm2-duo-cpu-e8400">bug</a> with Intel&#8217;s <a href="http://software.intel.com/en-us/intel-mkl">Math Kernel Library</a> (MKL) seemed notable. MKL would crash with an illegal instruction error when AVX (Advanced Vector Extensions, a 2011 extension to x86) instructions were being executed on CPUs that lacked support.  Why notable?  We compile numpy and scipy libraries with MKL support to give the best possible multi-threading performance, especially on the hyperthreading &#038; AVX capable <a href="http://www.picloud.com/pricing/">f2 core</a>.</p>
<p>Still though, why did only a few servers crash out?   Having not much to go on, I launched a hundred High-Memory <a href="http://aws.amazon.com/ec2/instance-types/">m2.xlarge</a> EC2 instances (200 m2 cores in PiCloud <a href="http://www.picloud.com/pricing/">nomenclature</a>) and reran all the user&#8217;s jobs over the nodes. A few jobs, all on the same server, failed.</p>
<p>As I compared the troublesome instance to the sane ones, one difference stood out.  The correctly operating m2.xlarge instances were running 2009-era Intel <a href="http://ark.intel.com/products/37106/Intel-Xeon-Processor-X5550-8M-Cache-2_66-GHz-6_40-GTs-Intel-QPI">Xeon X5550</a> CPUs.  But the troublesome instance was running a more modern (2012) <a href="http://ark.intel.com/products/64597/Intel-Xeon-Processor-E5-2665-20M-Cache-2_40-GHz-8_00-GTs-Intel-QPI">Xeon E5-2665</a> CPU.  And returning back to the MKL bug noted earlier, this new chip supported AVX.</p>
<p>Examining <em>/proc/cpuinfo</em> showed as much; AVX was supported on the failing instance, but not on the older ones.  To test it out, I compiled some code from <a href="http://stackoverflow.com/questions/9193697/avx-optimized-code-not-running-on-linux-redhat-5-6">Stack Overflow</a> with &#8220;g++ -mavx&#8221;.  Sure enough, running the binary produced an Illegal Instruction error.</p>
<p>From my perspective as an instance user, the processor was lying, claiming to support AVX but actually crashing when any AVX code would run.</p>
<h3>Analysis</h3>
<p>Turns out the actual answer was subtle.  Per the Intel <a href="http://software.intel.com/sites/default/files/m/a/b/3/4/d/41604-319433-012a.pdf">manual</a>, it is possible for the operating system to disable AVX instructions by disabling the processor&#8217;s OSXSAVE feature.  By the spec, any application wishing to use AVX first must check if OSXSAVE is enabled.</p>
<p>Amazon seems to have disabled the OSXSAVE feature at the hypervisor layer on their new Xeon E5-2665 based m2.* series of instances.  This may just be because their version of the <a href="http://www.xen.org/">Xen</a> hypervisor that manages these instances lacks support for handling AVX registers in context switching.  But even if support does exist in the hypervisor, it makes sense to disable AVX for the m2.* family as long as there are Xeon X5550 based instances. Imagine compiling a program on an m2.xlarge EBS instance, thinking you had AVX support, and then upon stopping/starting the instance, finding that the program crashes, because your instance now runs on older hardware that doesn&#8217;t have AVX support!  A downside of VM migration is that all your hardware must advertise the least common denominator of capabilities.</p>
<p>Unfortunately, Amazon did not ensure that the Guest OS saw that OSXSAVE was disabled.  This led to MKL thinking it had the capabilities to run AVX code, when it actually didn&#8217;t.</p>
<p>Ultimately, there was not much to do but:</p>
<ol>
<li>Given how rare the Xeon E5-2665 instances are, we now simply self-destruct if an m2.*&#8217;s <em>/proc/cpuinfo</em> claims that both <em>avx</em> and <em>xsave</em> are enabled.</li>
<li>File a support case with Amazon. They have been quite responsive and as I publish this post, it seems that a fix has at least been partially pushed.</li>
</ol>
<p>So, if you use instances in the m2.* family, be sure to check /proc/cpuinfo. If the instance claims it has both <em>avx</em> and <em>xsave</em>, it is probably lying to you.</p>
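<p>A minimal Python sketch of such a check, assuming the usual Linux <em>/proc/cpuinfo</em> layout with a <em>flags</em> line per processor. The function names are illustrative only; this is not the code our workers actually run.</p>

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags listed on the first 'flags' line of
    /proc/cpuinfo-style text (empty set if there is no such line)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def claims_avx_and_xsave(cpuinfo_text):
    """True if the CPU advertises both the avx and xsave flags."""
    flags = cpu_flags(cpuinfo_text)
    return "avx" in flags and "xsave" in flags


def check_this_machine(path="/proc/cpuinfo"):
    """Run the check against the local machine (Linux only)."""
    with open(path) as f:
        return claims_avx_and_xsave(f.read())
```

<p>On an affected m2.* instance, <em>check_this_machine()</em> would be expected to return True even though AVX code crashes. On hardware where AVX genuinely works it also returns True, so the result is only a warning sign in the m2.* context described above.</p>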
<p>Alternatively, if you are doing high performance computation in the cloud, you may just want to pass on the responsibility for such dirty details to us at <a href="http://www.picloud.com">PiCloud</a>. <img src='http://blog.picloud.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2013/01/08/when-ec2-hardware-changes-underneath-you/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Introducing the PiCloud Notebook</title>
		<link>http://blog.picloud.com/2012/12/23/introducing-the-picloud-notebook/</link>
		<comments>http://blog.picloud.com/2012/12/23/introducing-the-picloud-notebook/#comments</comments>
		<pubDate>Mon, 24 Dec 2012 07:01:30 +0000</pubDate>
		<dc:creator>Ken Elkabany</dc:creator>
				<category><![CDATA[What's New]]></category>
		<category><![CDATA[new features]]></category>
		<category><![CDATA[notebook]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1275</guid>
		<description><![CDATA[The PiCloud Notebook is a Python console in your browser optimized for data analysis and collaboration. Unlike a traditional console, a notebook lets you intermingle code and descriptive text. The best way to get a feel for how it works is to see it: We&#8217;re big fans of IPython, and those who are familiar with [...]]]></description>
			<content:encoded><![CDATA[<p>The PiCloud Notebook is a Python console in your browser optimized for data analysis and collaboration. Unlike a traditional console, a notebook lets you intermingle code and descriptive text. The best way to get a feel for how it works is to see it:</p>
<p><a href="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/notebook_primer_cropped.png"><br />
<img style="width: 90%; display: block; margin-left: auto; margin-right: auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/notebook_primer_cropped.png" /><br />
</a></p>
<p>We&#8217;re big fans of <a href="http://ipython.org">IPython</a>, and those who are familiar with it will immediately recognize the console as an <a href="http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html">IPython Notebook</a> running on the PiCloud Platform. While we believe that in general developers will continue to code on their local machine for convenience, there are several advantages to using a cloud-based notebook:</p>
<ul>
<li>Your research is accessible from anywhere.</li>
<li>You can share your work with collaborators.</li>
<li>You can work efficiently with your data stored on PiCloud or AWS.</li>
<li>You can get comfortable with the environment your jobs run in.</li>
</ul>
<h3>Getting Started</h3>
<p>Log into your account (<a href="https://www.picloud.com/accounts/register/">sign up and get 20 free core hours</a>), and click on the Notebook tab. This brings you to your Notebook Machine.</p>
<p>Think of your Notebook Machine as a computer dedicated to you sitting on the cloud. When you open a notebook, your Notebook Machine is started if it isn&#8217;t already running. And when you&#8217;re done, you should shut down the machine to avoid wasting resources.</p>
<p>By default, a notebook named &#8220;Primer&#8221; is available, which will walk you through the examples in the introductory section of our documentation called <a href="http://docs.picloud.com/primer.html">Primer</a>. Click on it to open your first notebook.</p>
<p>Use your PiCloud account password when prompted.</p>
<h3>Using a Notebook</h3>
<h4>Cell</h4>
<p>Each box in your notebook is called a cell. Cells can hold code, or various types of markup (<a href="http://daringfireball.net/projects/markdown/">Markdown</a>, headers, or raw text). Unlike in a traditional Python console, pressing <i><strong>enter</strong></i> in a cell will create a new line, but not execute it. This makes it easy to write multi-line functions and sequences of commands.</p>
<h4>Executing Code</h4>
<p>When you want to execute a cell, whether it&#8217;s code or markup, press <i><strong>shift+enter</strong></i>. Note the number indicating the order of execution on the left side of the cell, &#8220;In [X]&#8221;. Unlike a traditional console, you can execute and re-execute a cell at any time, and cells are thus not necessarily in order of execution. While it takes some time to get used to, it&#8217;s very handy when you&#8217;re continuously iterating on your code.</p>
<h4>Executing Shell Commands</h4>
<p>While the primary use case of the notebook is for writing Python code, it&#8217;s also convenient for running shell commands. To execute a shell command, just prefix a command with &#8220;!&#8221;. For example, you can run &#8220;!ls&#8221;, &#8220;!pwd&#8221;, or even the PiCloud command-line interface (CLI), &#8220;!picloud&#8221;.</p>
<p><a href="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/notebook_shell_cropped.png"><br />
<img style="width: 90%; display: block; margin-left: auto; margin-right: auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/notebook_shell_cropped.png" /><br />
</a></p>
<h4>Visualization</h4>
<p>Your notebook can also display rich media including images, graphs, videos, and more!</p>
<p><a href="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/notebook_graph_cropped.png"><br />
<img style="width: 90%; display: block; margin-left: auto; margin-right: auto;" src="http://s3.amazonaws.com/pi-user-buckets/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/blog/notebook_graph_cropped.png" /><br />
</a></p>
<h4>What else?</h4>
<p>For more tips and tricks, see the following resources:</p>
<ul>
<li><a href="http://docs.picloud.com/notebook.html">PiCloud Notebook Documentation</a></li>
<li><a href="http://ipython.org/documentation.html">IPython Documentation</a></li>
<li><a href="http://ipython.org/ipython-doc/stable/interactive/htmlnotebook.html">IPython Notebook Documentation</a></li>
<li>Inside your notebook: Help -> Keyboard shortcuts</li>
</ul>
<h3>What is the Notebook Machine Exactly?</h3>
<p>Your Notebook Machine is more than a web application. It&#8217;s a full-blown Linux system that can be accessed through the notebook console. We&#8217;re able to offer this by leveraging the same job abstraction we&#8217;ve already devoted so much effort to. In fact, the machine is running as a job. You can see the job id of your notebook machine from the notebook tab.</p>
<h3>How is it Implemented?</h3>
<p>As mentioned previously, our notebook is an IPython Notebook running on PiCloud. The notebook is run as a job in your account. Those of you familiar with our platform may be wondering how you&#8217;re able to connect to the notebook web server running in the job. The answer is that we&#8217;ve just released a feature that allows jobs to <a href="http://docs.picloud.com/job_mgmt_adv.html#listening-on-ports">open listening sockets which can accept external connections</a>. This opens up numerous possibilities, including using sockets for job-to-job communication, as well as hosting web applications.</p>
<h3>Using the Notebook to Live the Life of a Job</h3>
<p>Another advantage of a notebook is it allows you to explore the system that a job sees. You can do the following:</p>
<ul>
<li>Peek around the filesystem.</li>
<li>Import Python libraries to check availability and version.</li>
<li>Run non-Python programs.</li>
<li>Verify that your custom environment is working as expected.</li>
<li>Verify that your volumes are mounted in the way you intended.</li>
<li>Benchmark workloads interactively.</li>
</ul>
<p>Refer to our documentation on <a href="http://docs.picloud.com/notebook.html#configuration">configuring your notebook machine</a> for more information regarding mimicking a job&#8217;s usage of volumes, environments, multicore, and core types.</p>
<h3>Collaboration</h3>
<p>If you want to give collaborators access to use your notebook machine, set a secondary password. Then provide the URL of your notebook to your collaborators. They should use the secondary password when prompted. We don&#8217;t want you to hand out your primary account password to others.</p>
<p>You can also send a notebook to a collaborator by downloading it (File -> Download as) and manually sending it.</p>
<h3>Suggestions?</h3>
<p>We&#8217;re really excited about this latest addition to the PiCloud Platform. If you have any ideas, let us know!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2012/12/23/introducing-the-picloud-notebook/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Announcing PiCloud’s Second Annual Research Grant Program</title>
		<link>http://blog.picloud.com/2012/12/11/announcing-piclouds-second-annual-academic-research-grant-program/</link>
		<comments>http://blog.picloud.com/2012/12/11/announcing-piclouds-second-annual-academic-research-grant-program/#comments</comments>
		<pubDate>Wed, 12 Dec 2012 05:49:59 +0000</pubDate>
		<dc:creator>Daniel Singh</dc:creator>
				<category><![CDATA[Academic Research Program]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1254</guid>
		<description><![CDATA[Update: We&#8217;ve decided to extend the deadline until January 2nd to accommodate all those wishing to apply, but that have busy end-of-year schedules. PiCloud has been growing incredibly, and we could not do it without the support and feedback of our global research community. So, to return the favor, today we&#8217;re introducing the 2nd PiCloud [...]]]></description>
			<content:encoded><![CDATA[<p><b>Update: We&#8217;ve decided to extend the deadline until January 2nd to accommodate all those who wish to apply but have busy end-of-year schedules.</b></p>
<p>PiCloud has been growing incredibly, and we could not do it without the support and feedback of our global research community. So, to return the favor, today we&#8217;re introducing the 2<sup>nd</sup> PiCloud Research Program open to all research developers, engineers and academic professionals worldwide.</p>
<p>Our mission has always been to bring the cloud to scientists and engineers who don&#8217;t have access to a major compute cluster or lack the system administration know-how to operate one. Due to the success of our first Academic Grant Program in 2011, we’re proud to offer the opportunity for researchers around the world to once again submit their proposal for the chance to receive $500 (10,000 c1 core hours) free.</p>
<p>To apply, send an e-mail to <a href="mailto:research-funding@picloud.com" target="_blank">research-funding@picloud.com</a> by Wednesday, January 2nd 2013 with the following:</p>
<ol>
<li>Full name</li>
<li>Organization or Educational Institution</li>
<li>Your position</li>
<li>Short biography</li>
<li>A summary of your research field and project. Feel free to include conference papers, publications, and links to project websites. Please emphasize how PiCloud&#8217;s compute power will facilitate your research.</li>
</ol>
<p>This year we will be awarding three submissions with free core hours. Winning researchers will also have an opportunity to get larger exposure for their projects on our blog, social media, case studies and our website.</p>
<p>To get an idea of past submissions we&#8217;ve received, see our <a href="http://blog.picloud.com/2011/11/04/academic-research-program-grant-recipients/">previous winners</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2012/12/11/announcing-piclouds-second-annual-academic-research-grant-program/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Case Study: Speeding Up Machine Learning by 1,000 Fold</title>
		<link>http://blog.picloud.com/2012/12/11/case-study-speeding-up-machine-learning-by-1000-fold/</link>
		<comments>http://blog.picloud.com/2012/12/11/case-study-speeding-up-machine-learning-by-1000-fold/#comments</comments>
		<pubDate>Tue, 11 Dec 2012 10:35:39 +0000</pubDate>
		<dc:creator>Ken Elkabany</dc:creator>
				<category><![CDATA[Case Study]]></category>
		<category><![CDATA[Success Story]]></category>
		<category><![CDATA[case study]]></category>
		<category><![CDATA[d-wave]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[quantum computing]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1211</guid>
		<description><![CDATA[One of our most exciting partnerships has been with D-Wave Systems. Their technical team, led by Founder and CTO Geordie Rose, has an incredibly bold vision for the future. One that has led them to build the world&#8217;s first large-scale adiabatic quantum computer, the D-Wave One. To understand how PiCloud is used with a quantum [...]]]></description>
			<content:encoded><![CDATA[<p><img style="float: right; padding-left: 15px; padding-bottom: 15px;" src="http://www.dwavesys.com/en/images/d_wave_logo.jpg" width="230px"></p>
<p>One of our most exciting partnerships has been with <a href="http://www.dwavesys.com/">D-Wave Systems</a>. Their technical team, led by Founder and CTO Geordie Rose, has an incredibly bold vision for the future. One that has led them to build the world&#8217;s first large-scale adiabatic quantum computer, the D-Wave One.</p>
<p>To understand how PiCloud is used with a quantum computer, we can draw a parallel to the relationship between a CPU and GPU. A CPU is general purpose, and is responsible for running an application and controlling its flow. It only calls on a GPU for specialized tasks, particularly <a href="http://en.wikipedia.org/wiki/SIMD">SIMD</a>-favorable operations commonly found in graphics. Similar to a GPU, the D-Wave One requires a general-purpose computing cluster to work alongside it, as it only solves highly-specialized problems. Rather than building their own compute cluster to complement their quantum computer, D-Wave turned to us. Geordie writes:</p>
<blockquote style="padding-bottom: 20px;"><p>
<span style="font-size: 18px; color: #666666; font-style: italic;"><br />
&#8220;We wouldn’t have been able to do the project at all, as none of us had the experience necessary to build the infrastructure PiCloud provides.&#8221;<br />
</span>
</p></blockquote>
<p>The D-Wave One, and its planned successors, are designed to find solutions to the <a href="http://en.wikipedia.org/wiki/Ising_model">Ising model</a>, which has broad applications in machine learning including in <a href="http://en.wikipedia.org/wiki/Deep_learning">deep learning</a>. For those interested in the optimization model, D-Wave&#8217;s software running on PiCloud is responsible for generating feature vectors, while the D-Wave One is responsible for generating weight vectors for fitting those features to represent some data array (image, audio, text, &#8230;). The process is iterative where each iteration optimizes either weights or features, while the other is held constant.</p>
<p>The results have been a success. Geordie continues:</p>
<blockquote style="padding-bottom: 20px;"><p>
<span style="font-size: 18px; color: #666666; font-style: italic;"><br />
“We have achieved speedups on the order of 1,000 times faster for large unsupervised feature learning jobs bringing tasks that would have taken six months on single workstations down to less than half a day.”<br />
</span>
</p></blockquote>
<p>Geordie has identified the two underlying reasons for PiCloud&#8217;s popularity in computationally-heavy disciplines. The first is the access we provide to an unparalleled amount of computing power. The second is the ability to use said computing power without the need for in-house expertise.</p>
<p>
You can <a href="https://pi-user-buckets.s3.amazonaws.com/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/case-study/picloud_case_study_dwave.pdf">download the full D-Wave case study</a>, or view it in your browser below.
</p>
<p><iframe src="http://docs.google.com/gview?url=http://pi-user-buckets.s3.amazonaws.com/XMorKthhWQ1wdqkIhtPNyyjBfkd5lZP9ql4klJ92/case-study/picloud_case_study_dwave.pdf&#038;embedded=true" style="width:620px; height:660px;" frameborder="0"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2012/12/11/case-study-speeding-up-machine-learning-by-1000-fold/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PiCloud Wins Grand Prize in Amazon EC2 Spotathon</title>
		<link>http://blog.picloud.com/2012/12/05/picloud-wins-grand-prize-in-amazon-ec2-spotathon/</link>
		<comments>http://blog.picloud.com/2012/12/05/picloud-wins-grand-prize-in-amazon-ec2-spotathon/#comments</comments>
		<pubDate>Wed, 05 Dec 2012 10:05:33 +0000</pubDate>
		<dc:creator>Ken Elkabany</dc:creator>
				<category><![CDATA[Official]]></category>

		<guid isPermaLink="false">http://blog.picloud.com/?p=1172</guid>
		<description><![CDATA[Update: AWS has a blog post covering the results. We&#8217;re excited to announce that Amazon Web Services has chosen PiCloud as the 1st ever Grand Prize winner of the EC2 Spotathon! As indicated on the contest page, the judging criteria were as follows: cost savings by using spots; performance benefits due to spots; computational scale [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Update: <a href="http://aws.typepad.com/aws/2012/12/picloud-and-princeton-consultants-win-the-first-amazon-ec2-spotathon.html">AWS has a blog post covering the results</a>.</strong></p>
<p>We&#8217;re excited to announce that Amazon Web Services has chosen PiCloud as the 1st ever Grand Prize winner of the <a href="http://aws.amazon.com/ec2/spotathon/">EC2 Spotathon</a>! </p>
<p>As indicated on the contest page, the judging criteria were as follows: cost savings by using spots; performance benefits due to spots; computational scale achieved by application; and overall elegance and efficacy.</p>
<p>To give an idea to the broader community how PiCloud fared with each of the above criteria, we&#8217;ve decided to release our Spotathon application. We hope our readers will be able to gain insight into using spot instances effectively for their own applications.</p>
<p>&nbsp;</p>
<h2>Spotathon Application</h2>
<p><strong>15. What is your Spot application, what problem does it solve, and why is it important?  For example: If you are representing a company or organization, what does your company do and how does your Spot application fit in?</strong></p>
<p>PiCloud offers a Platform-as-a-Service (PaaS) for high-performance computing, batch processing, and scientific computing applications. We differentiate ourselves from the Amazon Web Services offerings by providing high-level APIs that scientists and engineers with minimal system administration experience can leverage. Our platform has been used in a wide range of industrial and academic applications that take advantage of computational sciences: pharmaceutical (sequence alignment, protein folding), oil &#038; gas (geophysics), finance (risk analysis), quantum computing, machine learning, image/video processing, and many more.</p>
<p>Our popularity with scientists and engineers stems from our ease of use. Most notably, our users do not provision, administer, or tear down servers. Instead, a user submits jobs to us. A job is a unit of computational work, like finding proteins of interest in a genome. It is our responsibility to take these jobs and distribute them across our cluster of machines made available by Amazon.</p>
<p>Because most workloads we receive are batch, we have the flexibility to trade off between the number of EC2 servers we rent and the time it takes for a workload to complete. For a set of jobs, we can bring up more servers to increase parallelization, and hence shorten the time it takes for them to complete, at the risk of incurring the heavy cost of idling servers when that batch is completed. Spot Instances play a vital role in addressing this dilemma by enabling us to increase parallelization with lower risk.</p>
<p>To understand why, you’ll need to understand how we determine the number of EC2 servers to rent at any given point in time. We never know in advance how long each job in our system will take to complete. However, statistical analysis on previous jobs of the same type lets us estimate to a degree of confidence how long our current queue of jobs is. With Amazon charging hourly, we aim to group jobs such that each group takes one hour at full instance utilization. We rent as many servers as we have groups.</p>
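<p>As a rough illustration of that sizing rule, here is a minimal Python sketch. It assumes each job occupies one core, that the runtime estimates are already available in seconds, and that every instance has the same number of cores. The function name and the simplifications are ours for this example and leave out the confidence estimation described above.</p>

```python
import math


def servers_to_rent(estimated_job_seconds, cores_per_instance):
    """Number of instances needed so that each one is fully busy for
    about one billed hour, given per-job runtime estimates in seconds."""
    total_core_seconds = sum(estimated_job_seconds)
    # One "group" is the work a single instance can do in one hour.
    core_seconds_per_group = cores_per_instance * 3600
    return math.ceil(total_core_seconds / core_seconds_per_group)
```

<p>For example, 10,000 jobs estimated at 300 seconds each on 8-core instances come to about 104.2 instance-hours of work, so this sketch would rent 105 instances.</p>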
<p>If we underestimate the length of our queue of jobs, our users’ experience suffers. If we overestimate, our servers sit idle, driving up our costs. Spot instances reduce the risk of poor estimation, allowing us to scale up our cluster and finish scientific workloads faster. We estimate that we’ve been able to bring up roughly 50% more servers at the same cost, improving user experience by delivering results 33% faster. For the thousands of researchers on our platform, who have collectively processed over 100 million jobs, the benefits of spot instances have been immeasurable.</p>
<p>&nbsp;</p>
<p><strong>16. How have you incorporated Amazon EC2 Spot Instances into your application?  Please describe your application architecture, including: how you evaluate the Spot market, how you bid on and manage your Spot Instances, how you handle Spot interruptions, how you integrate them with On Demand or other computing resources (if any), and any third party architecture or software you use.</strong></p>
<p>Many of the instance types we deploy are frequently available at prices as low as one-tenth that of on-demand instances.  Thus, leveraging the typical price advantage of spot instances allows PiCloud to simultaneously:</p>
<ol>
<li>Accept lower server utilization rates, meaning we launch more worker instances to process customer workloads faster.</li>
<li>Provide even more competitive pricing to our users.</li>
<li>Have higher profit margins.</li>
</ol>
<p>However, the use of spot instances comes with its own set of challenges: price volatility, termination without notification, and slower server provisioning.</p>
<p>To handle these issues, we have designed a sophisticated scaling system that continuously monitors and analyzes the price of spot instances across different availability zones.  Merging this analysis and our prediction of job queue size, we are able to predict in real-time the distribution of on-demand and spot instances that optimizes customer experience (minimizing time to complete workloads) and cost to PiCloud (see example in #18 for this in practice).  Thus, we are constantly expanding and contracting our pool of spot and on-demand worker instances.</p>
<p>The biggest drawback of spot instances is that they are susceptible to being terminated by AWS at any time due to fluctuating supply and demand.  While the termination of an instance that is running a user’s computation is undesirable, we are capable of handling that event. If our infrastructure detects that an instance has terminated, the terminated instance’s workload is restarted on an active instance; users are not charged for the work that was “lost”.</p>
<p>Because these restarts are undesirable, we take several actions to mitigate them:</p>
<ul>
<li><b>Avoiding volatile spots</b>: Our automated scaling system uses the most recent prices, together with historical prices as a gauge of volatility, to predict the expected cost of a spot over the next hour.  We are only willing to use spots if this cost is significantly below that of on-demand instances.  Otherwise, only on-demand instances are utilized.</li>
<li><b>Overbidding</b>: As there is a cost to us and the user if we have to restart, we are willing to pay a bit more than even the on-demand cost to minimize the chance of spot termination. Our scaling system is responsible for safely terminating (i.e. waiting for jobs to complete) expensive workers.</li>
<li><b>Multi AZ</b>: Our worker instances are spread across Availability Zones, minimizing the shock of a spot instance price spike.</li>
<li><b>Placement</b>: We ensure that only jobs with short predicted runtimes are placed on spots. Longer runtime jobs are placed on less volatile on-demand instances. In practice, most jobs have runtimes of less than a few minutes, because users typically break down long-running serial computation into smaller jobs to exploit maximum parallelism.</li>
</ul>
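<p>The first and last rules above can be pictured as a simple per-job decision. The Python sketch below is only an illustration: the function name, the 10-minute runtime cutoff, and the 50% required discount are hypothetical values chosen for the example, not the thresholds our scaling system uses.</p>

```python
def choose_market(predicted_runtime_s, expected_spot_price, on_demand_price,
                  max_spot_runtime_s=600, required_discount=0.5):
    """Return "spot" or "on-demand" for a job.

    A job goes to a spot instance only if the expected spot price is
    significantly below on-demand (hypothetical threshold) and the job
    is predicted to be short (hypothetical cutoff).
    """
    cheap_enough = expected_spot_price <= on_demand_price * (1 - required_discount)
    short_enough = predicted_runtime_s <= max_spot_runtime_s
    return "spot" if cheap_enough and short_enough else "on-demand"
```

<p>With these example thresholds, a two-minute job at an expected spot price of $0.10 against $0.66 on-demand goes to a spot instance, while a two-hour job, or any job when the spot price is close to on-demand, does not.</p>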
<p>Finally, some users use our &#8220;Realtime Cores&#8221; service to run at higher levels of parallelism than our estimator would provide.  In exchange for paying an hourly rate per core, they are given their own “job queue.”  For instance, if a user purchases 200 realtime cores, we guarantee that 200 jobs will be processed in parallel. Many users only request this service for several hours a day.  Unfortunately, spot instances deploy much slower than on-demand.  This additional boot time prevents us from satisfying a real-time request with spot instances.  Fortunately, many users’ real-time requests are issued periodically, making prediction possible. Spots are often cheap enough that it makes economic sense to satisfy a request we believe will occur in 15 minutes.  If we are right, our costs may be reduced by 90%; even if the prediction was wrong 50% of the time, we’d still end up with lower average costs.</p>
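<p>The economics of acting on a prediction can be checked with a little arithmetic. The Python sketch below uses a deliberately simplified model, which is an assumption of this example: every real request was predicted, each prediction costs one spot instance-hour whether or not the request arrives, and spots cost one-tenth of on-demand.</p>

```python
def cost_per_served_request(spot_price, prediction_accuracy):
    """Average spot spend per request that actually arrives, when a spot
    instance-hour is bought for every prediction (right or wrong)."""
    if not 0 < prediction_accuracy <= 1:
        raise ValueError("prediction_accuracy must be in (0, 1]")
    return spot_price / prediction_accuracy


def savings_vs_on_demand(on_demand_price, spot_price, prediction_accuracy):
    """Fractional saving compared with serving each request on-demand."""
    cost = cost_per_served_request(spot_price, prediction_accuracy)
    return 1 - cost / on_demand_price
```

<p>Under this model, perfect predictions save 90%, and predictions that are right only half the time still save 80%, because two cheap spot hours are bought for every request served. The break-even point is an accuracy equal to the spot-to-on-demand price ratio, here 10%.</p>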
<p>&nbsp;</p>
<p><strong>17. What cost savings do you achieve by using Spot Instances in your application?  For example: How many instance-hours does your application use, how many are on Spot, and what is the total cost of running your Spot application?  What would the total cost be if you were not using Spot Instances?  What percent savings do you achieve?</strong></p>
<p>Our platform typically consumes 100,000 instance hours per month, with over 85% on spot instances, resulting in savings of tens of thousands of dollars per month.</p>
<p>The flexibility of our platform allows us to recoup nearly all of the price difference between on-demand and spot instances.  For instance, c1.xlarge spot instances are typically 85% cheaper than on-demand, meaning our steady-state costs are reduced by 85%.</p>
<p>Because spot prices are not constant, however, we cannot capture all of the price differential. One loss is switching costs&mdash;moving from an expensive spot to an on-demand or moving from an on-demand to a cheap spot&mdash;where during the switch, we suffer lower effective utilization. Additionally, if an instance is “spot-terminated” while running computation, we must rerun the computation, potentially doubling our costs for that job.  In practice, both of these issues are minor and our savings still hover near 65%. </p>
<p>There is a trade-off though between performance and cost savings.  We intentionally do not capture some potential savings to increase customer performance.  This, along with an example of the savings and performance advantage, is discussed in Q#18.</p>
<p>&nbsp;</p>
<p><strong>18. What performance benefit(s) does your Spot application achieve by using Spot Instances?  Please describe.  For example: Are you able to achieve shorter time to results because you can deploy more EC2 instances?  If you’re running a simulation, does Spot enable you to execute more computational runs to improve the accuracy of your solution?</strong></p>
<p>As mentioned in #16 and #17, spots let us accept lower utilization over an hourly interval to complete customer workloads faster.  A practical example helps explain:</p>
<p><b>Definition of core type</b>:  Each core we rent out is from a larger instance we&#8217;re renting from EC2. Different instances map to different “core types”. As an example, a “c2 core” represents 1 core of a c1.xlarge instance. Each c1.xlarge instance holds 8 (c2) cores.</p>
<p>If a user submits 10,000 five-minute c2-core jobs, the entire workload could theoretically be completed in 5 minutes.  As we charge by job runtime ($0.13/c2-core-hour), our revenue would be:</p>
<p>10,000 c2 jobs * (5 minutes / job) * (1 hour / 60 minutes) * ($0.13 / c2-core-hour) = $108</p>
<p>If we launched just enough on-demand instances to finish the workload over 60 minutes, our costs would be:<br />
10,000 c2 jobs * (5 minutes / job) * (1 hour / 60 minutes) * (1 c1.xlarge instance / 8 c2 jobs) * ($0.66 / c1.xlarge-hour) = $69</p>
<p>Under an 85% spot discount, our costs would be merely:<br />
$69*(1-0.85) = $10.30</p>
<p>However, sometimes we prefer to increase our costs to give our users higher performance.  As an example, we could complete this workload in 10 minutes (+ extra spot provisioning time) by running 5,000 jobs simultaneously. This requires:</p>
<p>5,000 c2 jobs * (1 c1.xlarge instance / 8 c2 jobs) = 625 c1.xlarge instances</p>
<p>At spot rates, this is still pretty cheap: $62. However, this level of performance would be impossible to realize with on-demand instances: It would cost $412, far more than our revenue.</p>
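<p>The figures in this example can be reproduced with a few lines of Python. The prices, job counts, and the 85% discount are the ones quoted above; the variable names, and the assumption that each instance in the fast scenario is billed for exactly one hour, are ours for this sketch.</p>

```python
import math

JOBS = 10000
MINUTES_PER_JOB = 5
PRICE_PER_C2_CORE_HOUR = 0.13       # what we charge users
ON_DEMAND_PER_INSTANCE_HOUR = 0.66  # c1.xlarge on-demand price used above
CORES_PER_INSTANCE = 8              # c2 cores per c1.xlarge
SPOT_DISCOUNT = 0.85

# Total work and revenue.
core_hours = JOBS * MINUTES_PER_JOB / 60
revenue = core_hours * PRICE_PER_C2_CORE_HOUR

# Scenario 1: fully utilized instances, workload spread over one hour.
on_demand_cost = core_hours / CORES_PER_INSTANCE * ON_DEMAND_PER_INSTANCE_HOUR
spot_cost = on_demand_cost * (1 - SPOT_DISCOUNT)

# Scenario 2: run 5,000 jobs at once; each instance is billed one full hour.
fast_instances = math.ceil(5000 / CORES_PER_INSTANCE)
fast_on_demand_cost = fast_instances * ON_DEMAND_PER_INSTANCE_HOUR
fast_spot_cost = fast_on_demand_cost * (1 - SPOT_DISCOUNT)

print(revenue, on_demand_cost, spot_cost)
print(fast_instances, fast_on_demand_cost, fast_spot_cost)
```

<p>Unrounded, the revenue is about $108.33, the one-hour on-demand cost $68.75, and its spot equivalent about $10.31. The fast scenario needs 625 instances, costing $412.50 on-demand versus about $61.88 on spot, which is why only the spot version stays below revenue.</p>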
<p>Another source of performance benefit (and a trade-off over cost) is the earlier mentioned realtime prediction.  To ensure a positive customer experience, we do not, due to slower provisioning time, request (new) spot instances to satisfy a realtime request; rather, if we have insufficient capacity, we launch on-demand instances (which can later be replaced by spots).  However, the low cost of spots allows us to act on realtime request predictions (#16).  A correct prediction not only lowers our costs (using spots rather than on-demand), but also ensures the user’s realtime request is satisfied instantly (rather than waiting the 5 minutes it typically takes to deploy our worker instances).</p>
<p>&nbsp;</p>
<p><strong>19. What computational scale have you been able to achieve with your Spot application? For example: What is the most number of concurrent instances you have been able to run?  Does your application run across many regions and instance types?  How many instance-hours does it (did it) take to run your application?</strong></p>
<p>Our application extensively makes use of being in a “cloud” environment; we are constantly requesting and terminating instances based on user demand.</p>
<p>As mentioned in Q#15, our workers operate on c1.xlarge, m2.xlarge, t1.micro, and cc2.8xlarge instances. While we operate solely in the US East region, we utilize all availability zones. </p>
<p>Our platform is theoretically unbounded in the number of concurrent instances it supports.  Peak customer usage has required provisioning over 1,000 instances.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.picloud.com/2012/12/05/picloud-wins-grand-prize-in-amazon-ec2-spotathon/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
