Announcing PiCloud’s Second Annual Research Grant Program

December 11th, 2012 by Daniel Singh

Update: We’ve extended the deadline to January 2nd to accommodate those who wish to apply but have busy end-of-year schedules.

PiCloud has been growing incredibly, and we could not do it without the support and feedback of our global research community. So, to return the favor, today we’re introducing the 2nd PiCloud Research Program open to all research developers, engineers and academic professionals worldwide.

Our mission has always been to bring the cloud to scientists and engineers who don’t have access to a major compute cluster or lack the system administration know-how to operate one. Due to the success of our first Academic Grant Program in 2011, we’re proud to offer the opportunity for researchers around the world to once again submit their proposal for the chance to receive $500 (10,000 c1 core hours) free.

To apply, send an e-mail to by Wednesday, January 2nd 2013 with the following:

  1. Full name
  2. Organization or Educational Institution
  3. Your position
  4. Short biography
  5. A summary of your research field and project. Feel free to include conference papers, publications, and links to project websites. Please emphasize how PiCloud’s compute power will facilitate your research.

This year we will be awarding three submissions with free core hours. Winning researchers will also have an opportunity to get larger exposure for their projects on our blog, social media, case studies and our website.

To get an idea of past submissions we’ve received, see our previous winners.

Case Study: Speeding Up Machine Learning by 1,000 Fold

December 11th, 2012 by Ken Elkabany

One of our most exciting partnerships has been with D-Wave Systems. Their technical team, led by Founder and CTO Geordie Rose, has an incredibly bold vision for the future. One that has led them to build the world’s first large-scale adiabatic quantum computer, the D-Wave One.

To understand how PiCloud is used with a quantum computer, we can draw a parallel to the relationship between a CPU and GPU. A CPU is general purpose, and is responsible for running an application and controlling its flow. It only calls on a GPU for specialized tasks, particularly SIMD-favorable operations commonly found in graphics. Similar to a GPU, the D-Wave One requires a general-purpose computing cluster to work alongside it, as it only solves highly-specialized problems. Rather than building their own compute cluster to complement their quantum computer, D-Wave turned to us. Geordie writes:

“We wouldn’t have been able to do the project at all, as none of us had the experience necessary to build the infrastructure PiCloud provides.”

The D-Wave One, and its planned successors, are designed to find solutions to the Ising model, which has broad applications in machine learning including in deep learning. For those interested in the optimization model, D-Wave’s software running on PiCloud is responsible for generating feature vectors, while the D-Wave One is responsible for generating weight vectors for fitting those features to represent some data array (image, audio, text, …). The process is iterative where each iteration optimizes either weights or features, while the other is held constant.

The results have been a success. Geordie continues:

“We have achieved speedups on the order of 1,000 times faster for large unsupervised feature learning jobs bringing tasks that would have taken six months on single workstations down to less than half a day.”

Geordie has identified the two underlying reasons for PiCloud’s popularity in computationally-heavy disciplines. The first is the access we provide to an unparalleled amount of computing power. The second is the ability to use said computing power without the need for in-house expertise.

You can download the full D-Wave case study, or view it in your browser below.

PiCloud Wins Grand Prize in Amazon EC2 Spotathon

December 5th, 2012 by Ken Elkabany

Update: AWS has a blog post covering the results.

We’re excited to announce that Amazon Web Services has chosen PiCloud as the 1st ever Grand Prize winner of the EC2 Spotathon!

As indicated on the contest page, the judging criteria were as follows: cost savings by using spots; performance benefits due to spots; computational scale achieved by application; and overall elegance and efficacy.

To give the broader community an idea of how PiCloud fared on each of the above criteria, we’ve decided to release our Spotathon application. We hope our readers will gain insight into using spot instances effectively for their own applications.


Spotathon Application

15. What is your Spot application, what problem does it solve, and why is it important? For example: If you are representing a company or organization, what does your company do and how does your Spot application fit in?

PiCloud offers a Platform-as-a-Service (PaaS) for high-performance computing, batch processing, and scientific computing applications. We differentiate ourselves from the Amazon Web Services offerings by providing high-level APIs that scientists and engineers with minimal system administration experience can leverage. Our platform has been used in a wide range of industrial and academic applications that take advantage of computational sciences: pharmaceutical (sequence alignment, protein folding), oil & gas (geophysics), finance (risk analysis), quantum computing, machine learning, image/video processing, and many more.

Our popularity with scientists and engineers stems from our ease of use. Most notably, our users do not provision, administer, or tear down servers. Instead, a user submits jobs to us. A job is a unit of computational work like finding proteins of interest in a genome. It is our responsibility to take these jobs, and distribute them across our cluster of machines made available by Amazon.

Because most workloads we receive are batch, we have the flexibility to trade off the number of EC2 servers we rent against the time it takes for a workload to complete. For a set of jobs, we can bring up more servers to increase parallelization, and hence shorten the time it takes for them to complete, at the risk of incurring the heavy cost of idling servers when that batch is completed. Spot Instances play a vital role in addressing this dilemma by enabling us to increase parallelization with lower risk.

To understand why, you’ll need to understand how we determine the number of EC2 servers to rent at any given point in time. We never know in advance how long each job in our system will take to complete. However, statistical analysis on previous jobs of the same type lets us estimate to a degree of confidence how long our current queue of jobs is. With Amazon charging hourly, we aim to group jobs such that each group takes one hour at full instance utilization. We rent as many servers as we have groups.
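The sizing rule described above can be sketched roughly as follows. This is a simplified illustration, not PiCloud’s actual scheduler: the function name `servers_needed`, the list of per-job runtime estimates, and the 8-core server size are assumptions made for the example, and real job grouping would also have to respect job boundaries and estimation confidence.

```python
import math

SECONDS_PER_HOUR = 3600


def servers_needed(estimated_runtimes_s, cores_per_server=8):
    """Estimate how many servers to rent for one billing hour.

    estimated_runtimes_s: predicted runtime of each queued job in seconds
    (hypothetical input; the post derives such estimates from past jobs).
    """
    total_core_seconds = sum(estimated_runtimes_s)
    # One "group" is the work a server completes in an hour when every
    # core is busy the whole time.
    group_capacity = cores_per_server * SECONDS_PER_HOUR
    return math.ceil(total_core_seconds / group_capacity)


# 10,000 five-minute jobs on 8-core servers.
print(servers_needed([300] * 10_000))
```

Overestimating the runtimes in this sketch rents servers that sit idle, while underestimating leaves work queued past the hour, which is the trade-off the next paragraph discusses.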

If we underestimate the length of our queue of jobs, our users’ experience suffers. If we overestimate, our servers sit idle driving up our costs. Spot instances reduce the risk of poor estimation, allowing us to scale up our cluster and finish scientific workloads faster. We estimate that we’ve been able to bring up roughly 50% more servers at the same cost, improving user experience by delivering results 33% faster. For the thousands of researchers on our platform, who have collectively processed over 100 million jobs, the benefits of spot instances have been immeasurable.


16. How have you incorporated Amazon EC2 Spot Instances into your application? Please describe your application architecture, including: how you evaluate the Spot market, how you bid on and manage your Spot Instances, how you handle Spot interruptions, how you integrate them with On Demand or other computing resources (if any), and any third party architecture or software you use.

Many of the instance types we deploy are frequently available at prices as low as one-tenth that of on-demand instances. Thus, leveraging the typical price advantage of spot instances allows PiCloud to simultaneously:

  1. Accept lower server utilization rates, meaning we launch more worker instances to process customer workloads faster.
  2. Provide even more competitive pricing to our users.
  3. Have higher profit margins.

However, the use of spot instances comes with its own set of challenges: price volatility, termination without notification, and slower server provisioning.

To handle these issues, we have designed a sophisticated scaling system that continuously monitors and analyzes the price of spot instances across different availability zones. Merging this analysis and our prediction of job queue size, we are able to predict in real-time the distribution of on-demand and spot instances that optimizes customer experience (minimizing time to complete workloads) and cost to PiCloud (see example in #18 for this in practice). Thus, we are constantly expanding and contracting our pool of spot and on-demand worker instances.

The biggest drawback of spot instances is that they are susceptible to being terminated by AWS at any time due to the fluctuating supply and demand. While the termination of an instance that is running a user’s computation is undesirable, we are capable of handling that event. If our infrastructure detects that an instance has terminated, the terminated instance’s workload is restarted on an active instance; users are not charged for the work that was “lost”.

Because these restarts are undesirable, we take several actions to mitigate them:

  • Avoiding volatile spots: Our automated scaling system uses both the most recent and historical prices to gauge volatility and predict the expected cost of a spot over the next hour. We are only willing to use spots if this cost is significantly below that of on-demand instances. Otherwise, only on-demand instances are utilized.
  • Overbidding: As there is a cost to us and the user if we have to restart, we are willing to pay a bit more than even the on-demand cost to minimize the chance of spot termination. Our scaling system is responsible for safely terminating (i.e. waiting for jobs to complete) expensive workers.
  • Multi AZ: Our worker instances are spread across Availability Zones, minimizing the shock of a spot instance price spike.
  • Placement: We ensure that only jobs with short predicted runtimes are placed on spots. Longer runtime jobs are placed on less volatile on-demand instances. In practice, most jobs have runtimes less than a few minutes, because users typically break down long-running serial computation into smaller jobs to exploit maximum parallelism.

Finally, some users use our “Realtime Cores” service to run at higher levels of parallelism than our estimator would provide. In exchange for paying an hourly rate per core, they are given their own “job queue.” For instance, if a user purchases 200 realtime cores, we guarantee that 200 jobs will be processed in parallel. Many users only request this service for several hours a day. Unfortunately, spot instances deploy much slower than on-demand. This additional boot time prevents us from satisfying a real-time request with spot instances. Fortunately, many users’ real-time requests are issued periodically, making prediction possible. Spots are often cheap enough that it makes economic sense to satisfy a request we believe will occur in 15 minutes. If we are right, our costs may be reduced by 90%; even if the prediction was wrong 50% of the time, we’d still end up with lower average costs.
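The claim that a 50% hit rate still lowers average costs can be checked with a small calculation. The model below is an assumption for illustration only: every prediction launches spot capacity at 10% of the on-demand price, a wrong prediction wastes that spend, and the alternative is serving each request on on-demand capacity. The function name is hypothetical.

```python
def cost_per_served_request(spot_price, accuracy):
    """Average spot spend per real-time request that actually arrives.

    Simplified model (an assumption): each prediction launches spot
    capacity, and only a fraction `accuracy` of predictions is followed
    by a real request.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return spot_price / accuracy


on_demand = 1.00          # normalized on-demand price
spot = 0.10 * on_demand   # the "costs may be reduced by 90%" case

for accuracy in (1.0, 0.5):
    cost = cost_per_served_request(spot, accuracy)
    print(f"accuracy {accuracy:.0%}: {cost:.2f} vs on-demand {on_demand:.2f}")
```

Under these assumptions, being wrong half the time doubles the spot spend per served request to 20% of the on-demand price, which is still well below paying full price.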


17. What cost savings do you achieve by using Spot Instances in your application? For example: How many instance-hours does your application use, how many are on Spot, and what is the total cost of running your Spot application? What would the total cost be if you were not using Spot Instances? What percent savings do you achieve?

Typical consumption of our platform is 100,000 instance hours per month, with over 85% on spot instances, resulting in savings of tens of thousands of dollars per month.

The flexibility of our platform allows us to recoup nearly all of the price difference between on-demand and spot instances. For instance, c1.xlarge spot instances are typically 85% cheaper than on-demand, meaning our steady-state costs are reduced by 85%.

In practice, because spot prices are not constant, we cannot capture all of the price differential. One loss is switching costs—moving from an expensive spot to an on-demand or moving from an on-demand to a cheap spot—where during the switch, we suffer lower effective utilization. Additionally, if an instance is “spot-terminated” while running computation, we must rerun the computation, potentially doubling our costs for that job. In practice, both of these issues are minor and our savings still hover near 65%.

There is a trade-off though between performance and cost savings. We intentionally do not capture some potential savings to increase customer performance. This, along with an example of the savings and performance advantage, is discussed in Q#18.


18. What performance benefit(s) does your Spot application achieve by using Spot Instances? Please describe. For example: Are you able to achieve shorter time to results because you can deploy more EC2 instances? If you’re running a simulation, does Spot enable you to execute more computational runs to improve the accuracy of your solution?

As mentioned in #16 and #17, spots let us accept lower utilization over an hourly interval to complete customer workloads faster. A practical example helps explain better:

Definition of core type: Each core we rent out is from a larger instance we’re renting from EC2. Different instances map to different “core types”. As an example, a “c2 core” represents 1 core of a c1.xlarge instance. Each c1.xlarge instance holds 8 (c2) cores.

If a user submits 10,000 five-minute c2 core jobs, the entire workload could theoretically be completed in 5 minutes. As we charge by job runtime ($0.13/c2-core-hour), our revenue would be:

10,000 c2 jobs * 5 (minutes/job) * (1 hour / 60 minutes) * ($0.13 / c2-core-hour) = $108

If we launched enough instances to finish the workload over 60 minutes on on-demand instances, our costs would be:

10,000 c2 jobs * 5 (minutes/job) * (1 hour / 60 minutes) * (1 c1.xlarge instance / 8 c2 jobs) * ($0.66 / c1.xlarge-hour) = $69

Under an 85% spot discount, our costs would be merely:
$69*(1-0.85) = $10.30

However, sometimes we prefer to increase our costs to give our users higher performance. As an example, we could complete this workload in 10 minutes (+ extra spot provisioning time) by running 5,000 jobs simultaneously. This requires:

5,000 c2 jobs * (1 c1.xlarge instance / 8 c2 jobs) = 625 c1.xlarge instances

At spot rates, this is still pretty cheap: $62. However, this level of performance would be impossible to realize with on-demand instances: It would cost $412, far more than our revenue.
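For readers who want to rerun the arithmetic of this example, here is a short script using only the figures quoted above. The variable names are ours, and the fast scenario assumes each instance is billed for one full hour, as implied by Amazon’s hourly charging.

```python
JOBS = 10_000
MINUTES_PER_JOB = 5
C2_PRICE = 0.13            # $ per c2 core-hour charged to the user
C1_XLARGE_PRICE = 0.66     # $ per on-demand c1.xlarge instance-hour
CORES_PER_INSTANCE = 8
SPOT_DISCOUNT = 0.85

core_hours = JOBS * MINUTES_PER_JOB / 60
revenue = core_hours * C2_PRICE

# Scenario A: spread the work over a full hour at 100% utilization.
on_demand_hour = core_hours / CORES_PER_INSTANCE * C1_XLARGE_PRICE
spot_hour = on_demand_hour * (1 - SPOT_DISCOUNT)

# Scenario B: run 5,000 jobs at once; each instance is billed for one hour.
instances = 5_000 // CORES_PER_INSTANCE
on_demand_fast = instances * C1_XLARGE_PRICE
spot_fast = on_demand_fast * (1 - SPOT_DISCOUNT)

print(f"revenue:                 ${revenue:.2f}")
print(f"1 hour, on-demand:       ${on_demand_hour:.2f}")
print(f"1 hour, spot:            ${spot_hour:.2f}")
print(f"instances for fast run:  {instances}")
print(f"fast run, on-demand:     ${on_demand_fast:.2f}")
print(f"fast run, spot:          ${spot_fast:.2f}")
```

The unrounded results differ by a cent or so from the figures in the text, which applies the discount to the already rounded $69.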

Another source of performance benefit (and a trade-off over cost) is the earlier mentioned realtime prediction. To ensure a positive customer experience, we do not, due to slower provisioning time, request (new) spot instances to satisfy a realtime request; rather, if we have insufficient capacity, we launch on-demand instances (which can later be replaced by spots). However, the low cost of spots allows us to act on realtime request predictions (#16). A correct prediction not only lowers our costs (using spots rather than on-demand), but also ensures the user’s realtime request is satisfied instantly (rather than waiting the 5 minutes it typically takes to deploy our worker instances).


19. What computational scale have you been able to achieve with your Spot application? For example: What is the most number of concurrent instances you have been able to run? Does your application run across many regions and instance types? How many instance-hours does it (did it) take to run your application?

Our application makes extensive use of being in a “cloud” environment; we are constantly requesting and terminating instances based on user demand.

As mentioned in Q#15, our workers operate on c1.xlarge, m2.xlarge, t1.micro, and cc2.8xlarge instances. While we operate solely in the US East region, we utilize all availability zones.

Our platform is theoretically unbounded in the number of concurrent instances it supports. Peak customer usage has required provisioning over 1,000 instances.

New Base Environment — Ubuntu Precise

October 24th, 2012 by Ken Elkabany

Since its introduction, the Environments feature has become a staple of the PiCloud Platform, enabling users to deploy custom libraries and binaries. It was our first step outside of the Python box.

Up until now, we’ve offered two base environments that you could customize, or use directly:

  1. Ubuntu 10.10 Maverick for Python 2.6
  2. Ubuntu 11.04 Natty for Python 2.7

Enter Ubuntu Precise 12.04

Our latest environment is pre-configured with many of the latest libraries, making it easier than ever to move your computation to the cloud. Here are some of the notable packages:

  • NumPy 1.6.2
  • SciPy 0.11
  • Pandas 0.9.0
  • Scikits Learn 0.8.1
  • OpenCV 2.4.2
  • Java 7
  • R 2.14.1
  • Ruby 1.9.1
  • PHP 5.3.10

Click here for a list of all contents.

How do I use the new Base?

To maintain backwards compatibility with users dependent on our Natty base, by default all Python 2.7 jobs still use Natty. To use Precise, specify the environment of a job as ‘base/precise’. In Python:

cloud.call(func, _env='base/precise')

In the shell:

$ picloud exec -e base/precise program

Of course, you can create an environment based off of the Precise base, and use that too.

Need Another Base?

If Precise isn’t enough for you, let us know what other distribution of Linux you’d like to see us support in the comments.

Real-time Data Feed for Jobs and More!

September 18th, 2012 by Ken Elkabany

We’ve just completed a major overhaul of the job dashboard. The overhaul marks a fundamental shift in our belief of what information you should be able to see about your job. We had originally striven for minimalism, revealing only what we thought was necessary: standard output & error, exception traceback, and profile. After all, if software ran bug free, you wouldn’t need anything else. But as our users kept reminding us with support tickets, when things aren’t working, there is exceptional value to under-the-hood data.

This post covers the new visibility we offer into your computation. You can see it in action by running a new job, and viewing it through the Job Dashboard.

Count of Memory Allocation Failures

We now report the number of times a memory allocation was requested, but failed, under the “MemAlloc Fails” column.

If you haven’t run out of memory on PiCloud before, the count may not seem all that important. But up until now, users would generally get a cryptic “None” Exception when they ran out of memory, indicating that their process had unexpectedly died. The reason is that most programs don’t handle out of memory errors gracefully, typically segfaulting instead. Now if your program crashes, it’s easy to check for allocation failures, the most likely culprit.

Once you know you’re running out of memory, you can take advantage of our other features. Try using a core type with more RAM, such as our f2 core, or take advantage of our new multicore support.

CPU Usage

A common question is how effectively a job is utilizing a CPU. If your job is supposed to be crunching numbers, but is only using the CPU 30% of its allotted time, then you probably have an unsuspected bottleneck. Before this update, there was really no way to know, other than profiling the job locally.

Now, you can access three different views of a job’s CPU usage to give you maximum visibility.

Total Usage

Runtime refers to wall-clock time. CPU Time is divided into two categories based on whether time is being spent in user or kernel space. With the data above, we can determine that the CPU was being utilized (7613.14+280.85)/8239.6 = 95.8% of the time. But where’d the other 4% go?
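The utilization figure above is simply total CPU time divided by wall-clock runtime. A minimal sketch of the calculation follows. The function name is ours, and the text does not say which of the two CPU figures is user time and which is kernel time, so the argument order here is an assumption that does not affect the sum.

```python
def cpu_utilization(user_s, system_s, runtime_s):
    """Fraction of wall-clock time spent on the CPU (user + kernel)."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return (user_s + system_s) / runtime_s


utilization = cpu_utilization(7613.14, 280.85, 8239.6)
print(f"busy: {utilization:.1%}, idle or waiting: {1 - utilization:.1%}")
```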

Usage over Time

Using the following chart, we can see what happened.

This hypothetical job was at 99%+ utilization for the majority of the time. However, it spent the first 500 seconds loading data from external sources before crunching it. Depending on your job, a graph like this might look ideal, or it might be an impetus to reduce the data loading time.

The chart is generated by taking a snapshot of a job’s CPU Time every 30 seconds.
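To make the sampling idea concrete, here is one way a utilization-over-time series could be derived from cumulative CPU Time snapshots. This is a sketch under our own assumptions (a single core, evenly spaced samples, hypothetical sample values), not a description of how the dashboard is implemented.

```python
def utilization_series(cpu_time_snapshots, interval_s=30):
    """Per-interval CPU utilization from cumulative CPU-time samples.

    cpu_time_snapshots: cumulative CPU seconds, one sample per interval.
    """
    return [
        (later - earlier) / interval_s
        for earlier, later in zip(cpu_time_snapshots, cpu_time_snapshots[1:])
    ]


# Hypothetical samples: two mostly idle intervals (loading data),
# followed by two intervals of near-full utilization.
samples = [0.0, 3.0, 6.0, 35.7, 65.4]
for index, value in enumerate(utilization_series(samples)):
    print(f"interval {index}: {value:.0%}")
```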

Realtime Usage

If you select a job from the Job Dashboard, while it is processing, you’ll be able to see a realtime graph of CPU usage in the right hand column. The graph actively samples your running job every few seconds.

This soothes the anxiety some developers feel when their precious algorithms are all grown up and running on the cloud. Never again will they anxiously wonder, “how’s my job doing?”

Memory Usage

We offer a similar set of views for a job’s memory usage, as we do for its CPU usage. While swap usage is shown, no job can currently use swap.

Peak Usage

Peak usage is the largest amount of memory ever allocated for the job. Peak usage is viewable while the job is running, or after it has completed.

Current Usage

Current usage is the amount of memory currently being used by the job. Current usage can only be seen for a job that is currently being processed.

Usage over Time

Extending the example of the hypothetical job from the CPU Usage section, we can confirm that the first 500 seconds were spent loading data into memory.

Realtime Usage

Not shown for brevity. Just like the Realtime CPU Usage chart, you can see the memory usage of a job, while it is processing.

Python Logging

The Python logging module is popular for its simplicity and extensibility. Until now, users have had to output their loggers to standard output or error to have their messages retained by PiCloud. We now automatically grab your logger messages, and store them separately for later viewing.
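As an illustration, a job function can log through the standard logging module without wiring its logger to standard output. The logger name `myjob` and the function `crunch` are hypothetical. The `basicConfig` call is only there so the messages are visible when the script is run locally.

```python
import logging

logger = logging.getLogger("myjob")  # hypothetical logger name
logger.setLevel(logging.INFO)


def crunch(values):
    """Example job function that reports progress through logging."""
    logger.info("starting on %d values", len(values))
    total = sum(v * v for v in values)
    logger.info("finished, total=%s", total)
    return total


if __name__ == "__main__":
    # Local run only: send log records to the console.
    logging.basicConfig(level=logging.INFO)
    crunch(range(10))
```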

Pi Log

As your job runs, our system sometimes has comments for you. For example, if you spawn subprocesses, it will notify you that you cannot take advantage of our persistent processes optimization. Before, we used to write these messages to standard error, but that unnecessarily worried some users, and others complained that it polluted their output. Now, we have a separate section specifically for messages from PiCloud.

System Log

Since more users are using PiCloud for non-Python computation, we have decided to expose the Syslog associated with a job. If your job is segfaulting, and there are no indications why, this is the place to look.

In the Works

We plan to offer metrics for disk, and network usage.


Need more visibility? Let us know by leaving a comment!

Case Study: Aggregating Daily Content from 500+ Sites

September 13th, 2012 by Ken Elkabany

We know that PiCloud is primarily associated with scientific workloads, and high-performance computing. So it may be surprising to hear that many of our earliest adopters were in fact web companies who aggregate content from all over the web. In fact, the s1 (scraping) core was in direct response to the needs of this class of users.

Zinc.TV (Division of TiVo) signed up only a couple of months after our first release. They’ve built an awesome service they call your “Internet Television Dashboard.” It’s a central location where you can search for any TV show or movie, and it’ll show you all the online sources–both free and paid–that offer it. They even have a “leanback” mode for when you aren’t feeling particularly decisive.

Zinc.TV maintains its comprehensive show catalogue by regularly scraping over 500 web properties using PiCloud. They take advantage of PiCloud’s easy parallelism for aggregating and curating data, and our interoperability with other clouds; Zinc.TV stores all their data in their own databases hosted on Amazon Web Services directly from PiCloud.

Gary Rose, co-founder and head of R&D at Zinc.TV, had the following kind words to share:

“PiCloud has been an extremely valuable partner for the growth of Zinc. We estimate that it cuts our operational costs for managing the infrastructure by over 50%, and allows us to release new sites in a fraction of the time.”

Gary highlights what we agree to be the core benefit of PiCloud: drastically reducing the man hours that go into “behind the scenes” work such as creating a robust, scalable distributed processing system, thereby increasing the time for unique, interesting work. We’re popular with researchers because we automate clusters so they can focus on their science. The same principle applies to all types of technological endeavors reliant on computational power.

You can download the full Zinc.TV case study, or view it in your browser below.

Introducing Multicore Support

August 31st, 2012 by Ken Elkabany

Up until today, each job run on PiCloud has been limited to using only a single core. For those familiar with the Python global interpreter lock (GIL), this may not seem like a big deal at first. But as our users have let us know, the limitation is acute for the following reasons:

  1. Many performance-focused Python libraries including numpy release the GIL whenever possible, which means that even Python programs can leverage multiple cores.
  2. With the release of Environments, many of our users are running non-Python multithreaded programs. Some of those can use as many cores as we can throw at them.
  3. The most RAM a single core has is 8GB (m1). Up until now, a single job couldn’t break this limit. But now, you can pool multiple cores together to get access to more RAM.

How do I use it?

All you have to do is use the _cores keyword argument.

# uses 4 cores
job_id = cloud.call(func, _type='f2', _cores=4)

# works the same for map jobs
job_ids = cloud.map(func, datapoints, _type='f2', _cores=4)

Each job gets 4 f2 cores of processing power, and 14.8GB (4 cores x 3.7GB per f2 core) of RAM. We use the f2 core because, as the next section shows, the c1 core, which is the default, does not support the new multicore feature.

How many cores per job?

The number depends on the type of core you select.

Core Type    Supported Multiples
c1           1 core
c2           1, 2, 4, or 8 cores
f2           1, 2, 4, 8, or 16 cores
m1           1 or 2 cores
s1           1 core
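The table can be encoded as a small lookup to check a requested _cores value before submitting a job, and to work out how much RAM a multicore job pools. The helper names below are hypothetical and not part of the PiCloud client library. RAM figures are included only for the core types whose per-core RAM is stated in these posts (c2, f2, m1).

```python
SUPPORTED_CORES = {
    "c1": (1,),
    "c2": (1, 2, 4, 8),
    "f2": (1, 2, 4, 8, 16),
    "m1": (1, 2),
    "s1": (1,),
}

# RAM per core in GB, for the core types whose RAM the posts state.
RAM_PER_CORE_GB = {"c2": 0.8, "f2": 3.7, "m1": 8.0}


def check_cores(core_type, cores):
    """Raise ValueError if `cores` is not a supported multiple."""
    allowed = SUPPORTED_CORES.get(core_type)
    if allowed is None:
        raise ValueError(f"unknown core type: {core_type!r}")
    if cores not in allowed:
        raise ValueError(f"{core_type} supports {allowed}, not {cores} cores")
    return cores


def pooled_ram_gb(core_type, cores):
    """Total RAM available to a job that pools `cores` cores."""
    check_cores(core_type, cores)
    if core_type not in RAM_PER_CORE_GB:
        raise ValueError(f"RAM per core not listed for {core_type}")
    return RAM_PER_CORE_GB[core_type] * cores


print(f"{pooled_ram_gb('f2', 4):.1f} GB")
```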

How much?

Per our pricing page, a job using a single f2 core would cost $0.22/hour. A job using two f2 cores would cost $0.44/hour. In other words, the cost per core has stayed the same, and there are no additional fees. You’re still charged by the millisecond.
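Put as a formula, the charge scales linearly with both core count and runtime. The sketch below assumes the simple reading of the pricing described here (per-core hourly price times cores times runtime, metered in milliseconds). The function name is ours.

```python
MS_PER_HOUR = 3_600_000


def job_cost(price_per_core_hour, cores, runtime_ms):
    """Cost of one job billed by the millisecond, with no extra fees."""
    return price_per_core_hour * cores * runtime_ms / MS_PER_HOUR


# One hour on one f2 core versus one hour on two f2 cores.
print(f"${job_cost(0.22, 1, MS_PER_HOUR):.2f}")
print(f"${job_cost(0.22, 2, MS_PER_HOUR):.2f}")
```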


Multicore not enough for you? Let us know by leaving a comment.

Introducing the f2 Core – Our Fastest Core Yet!

June 15th, 2012 by Ken Elkabany

If you’ve been craving more speed, you’ll want to give our brand new f2 core a spin!

How fast?

The f2 core is rated at a whopping 5.5 compute units, which is 69% more than our previous leader, the m1 core with 3.25 compute units.

How about memory?

f2 cores have access to 3.7 GBs of RAM, which is a much-needed middle ground between the 800 MBs available in a c2 core, and the 8 GBs available in an m1 core.

How do I use it?

All you’ll need to do is set the _type keyword.

cloud.call(func, _type='f2')

How much?

At $0.22/core/hour, it’s got 120% more compute units than a c2 core, but only a 69% price premium.
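The percentages in this post can be reproduced with a few lines. Two inputs are assumptions rather than figures stated here: the c2 price of $0.13/core/hour is taken from our Spotathon application, and the c2 rating of 2.5 compute units is inferred from the "120% more" claim.

```python
def percent_more(new, old):
    """How much larger `new` is than `old`, in percent."""
    return (new / old - 1) * 100


F2_UNITS, M1_UNITS = 5.5, 3.25
F2_PRICE = 0.22
C2_PRICE = 0.13   # assumption: price quoted in the Spotathon post
C2_UNITS = 2.5    # assumption: inferred from "120% more compute units"

print(f"f2 vs m1 compute units: +{percent_more(F2_UNITS, M1_UNITS):.0f}%")
print(f"f2 vs c2 compute units: +{percent_more(F2_UNITS, C2_UNITS):.0f}%")
print(f"f2 vs c2 price:         +{percent_more(F2_PRICE, C2_PRICE):.0f}%")
```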


With our collection of five core types, there’s something for everyone! If you have ideas for another core type, or want to tell us how important a core with a GPU is to you, please leave a comment.

Happy Pi Day!

March 14th, 2012 by Ken Elkabany

π is dear to our hearts. Not only is it our favorite transcendental number and the first syllable of our company name, but it is also a homophone of the first syllable of our favorite programming language. So to celebrate this glorious day, we’re giving out free core hours.

All you have to do is send us a function that can calculate π to arbitrary digits, and runs on PiCloud. We’ll be awarding three users 100, 200, and 300 core hours based on how awesome we think their solutions are.

Need inspiration? Here’s a Monte Carlo method for calculating π in our documentation. Also, the Wikipedia article on Numerical Approximations of π should come in handy.
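As one possible starting point, here is a sketch that computes π to a requested number of decimal places using Machin’s formula, π = 16·arctan(1/5) − 4·arctan(1/239), with plain integer arithmetic. The function names and the choice of ten guard digits are ours, and this is only one of many approaches.

```python
def arctan_inv(x, scale):
    """Return arctan(1/x) * scale via its Taylor series, using integers."""
    term = scale // x
    total = term
    x_squared = x * x
    n = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_digits(digits):
    """Return pi as a string truncated to `digits` decimal places (>= 1)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    guard = 10  # extra digits that absorb truncation error in the divisions
    scale = 10 ** (digits + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    text = str(pi_scaled // 10 ** guard)
    return text[0] + "." + text[1:]


print(pi_digits(50))
```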

Send your submissions to [the first 314 digits of π starting with 3 and without the .] Happy coding!

Improved Pricing Scheme for Realtime Cores

February 29th, 2012 by Ken Elkabany

If you’ve been using realtime cores to leverage hundreds of cores in parallel, this is the cost-reducing update you’ve been waiting for. We’ve implemented the pricing change outlined in our previous post.

What’s changed?

At first glance, our realtime core prices look like they’ve been bumped up significantly. A realtime c1 core used to be priced at $0.015/hour, and now it’s $0.03/hour. Don’t be alarmed! The difference is that the old price was an additional charge added on top of your computation bill. The new price is not a charge, but an hourly minimum.

A simple example

For a single c1 realtime core, if you use at least $0.03 (the hourly minimum) of computation in a single hour, you do not pay anything extra by having the realtime core. We reserved a core for you, and since you put it to good use, you get it for free. If you don’t use as much as $0.03 of computation in the hour, we adjust your computation bill so that you paid a total of $0.03. In other words, in exchange for us reserving a core for you, you agree to pay at least the hourly minimum whether it’s by the amount of computation you run, or by an adjusted bill.

In our old scheme, if you ran $0.03 worth of computation, you would pay $0.03 + $0.015 = $0.045. In this case, our new scheme saves you 33% of your bill!
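The two schemes can be compared side by side with a short sketch. The function names are ours. The prices are the c1 realtime figures quoted in this post.

```python
def old_bill(computation, cores=1, surcharge=0.015):
    """Old scheme: realtime fee added on top of the computation bill."""
    return computation + cores * surcharge


def new_bill(computation, cores=1, hourly_minimum=0.03):
    """New scheme: pay for computation, but at least the hourly minimum."""
    return max(computation, cores * hourly_minimum)


for usage in (0.00, 0.01, 0.03, 0.05):
    print(f"usage ${usage:.3f}: old ${old_bill(usage):.3f}, "
          f"new ${new_bill(usage):.3f}")
```

At $0.03 of usage the new bill is $0.030 against $0.045 under the old scheme, which is the 33% saving mentioned above.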

Why is this great?

If you’ve been using hundreds of realtime cores on PiCloud, you understand the pain that this solves. Instead of paying $0.015 per core per hour on top of your computation bill, you can now reserve hundreds of realtime cores for your parallel processing needs with potentially no additional fee.

How much do I need to utilize my cores for them to be free?

Just take the ratio between the core type’s realtime hourly minimum, and the standard price. For a c1 core, that’s $0.03/$0.05 = 60%. In other words, you need to be utilizing the core for 60% (36 minutes) of the hour to hit the minimum hourly bill.
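The same ratio in code, as a sketch with a function name of our own choosing:

```python
def break_even_utilization(hourly_minimum, standard_price):
    """Fraction of the hour a core must be busy to reach the minimum."""
    return hourly_minimum / standard_price


fraction = break_even_utilization(0.03, 0.05)  # c1 figures from this post
print(f"{fraction:.0%} of the hour, or {fraction * 60:.0f} minutes")
```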

How do I control which jobs are scheduled to realtime cores?

We automatically schedule your jobs to the appropriate cores. In fact, if we have slack capacity, you may utilize more cores in parallel than the number of realtime cores you have, and those extra cores will still count towards your hourly minimum.

How will this change be phased in?

For this month, and the month of March, users who use realtime will be given the cheaper of the old and new pricing schemes.