Archive for the ‘What's New’ Category

Crons in the Cloud!

Tuesday, August 10th, 2010

We’re pleased to announce the addition of crons to the PiCloud platform. A cron is a simple way to schedule a function to run periodically. Times and dates are specified using the standard crontab format. Crons can be triggered as often as every minute, and there’s no limit to the number of functions you can register as crons. You will be billed for the compute time consumed by the function your cron triggers, just as if you had run that function on PiCloud yourself. We have also added a tab to the web interface for managing crons.

Here’s how to register a cron:

# registers function ping_webserver with the label heartbeat
# this function could be checking whether a webservice is active
cloud.cron.register(ping_webserver, 'heartbeat', '* * * * *') # runs every minute

When you no longer need a cron, you can deregister it via our web interface or using the following:

# deregister the cron registered with the label heartbeat
cloud.cron.deregister('heartbeat')

Here’s a cron that runs a function once a day at noon Pacific time:

# 19 is the 19th GMT hour, which translates to 12pm PDT (GMT -7)
cloud.cron.register(sudo_make_me_a_sandwich, 'lunch', '0 19 * * *')

That’s all it takes! See our documentation for the full cron specification.
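
Because the hour field is given in GMT, it is easy to be off by a few hours when writing the schedule by hand. Here is a minimal sketch of a helper that builds the crontab string for a daily job from a local hour and a whole-hour GMT offset. The name daily_cron_spec is our own invention for illustration and is not part of the cloud library; it also assumes the offset is a whole number of hours.

```python
def daily_cron_spec(local_hour, utc_offset_hours, minute=0):
    """Build a five-field crontab string that fires once a day.

    local_hour: hour of day (0-23) in your local time zone.
    utc_offset_hours: whole-hour offset from GMT, e.g. -7 for PDT.
    This is a hypothetical helper, not a cloud library function.
    """
    if not 0 <= local_hour <= 23:
        raise ValueError("local_hour must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # Local time = GMT + offset, so GMT = local time - offset (wrapped to 0-23).
    gmt_hour = (local_hour - utc_offset_hours) % 24
    # Crontab fields: minute, hour, day of month, month, day of week.
    return "%d %d * * *" % (minute, gmt_hour)


lunch_spec = daily_cron_spec(12, -7)  # noon PDT
```

The resulting string can be passed as the third argument to cloud.cron.register, in place of the hand-written '0 19 * * *' above.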

Our Pricing Model

Friday, June 4th, 2010

Since the beginning of the year, we’ve been tweaking our pricing scheme to no avail. Just last month we published a new pricing page that we admitted wasn’t perfect, but was, we felt, as good as it would ever get. A key attribute of that system was the “parallelism limit,” the total number of cores we would devote to your computation at any one time. The higher the parallelism limit, the more we would charge per compute unit hour.

We quickly realized that our users weren’t fans of this. It’s roughly equivalent to Amazon charging a higher hourly rate for every additional instance booted up, which is a disincentive to users looking to use hundreds of cores of processing power. Some users cleverly created multiple accounts, each with the cheapest 10 compute unit parallelism limit, and used them in concert to run their computation with a very high parallelism limit.

We weren’t fans either. We had users choose their parallelism limit so we could provision enough servers ahead of time to respond quickly to their computational demands. That was good in theory, but it meant that we had to maintain a large pool of servers even when our users weren’t running functions. Wasted compute cycles meant that we had to raise all of our prices, even for users who didn’t need immediate response times.

New Model
Our solution was to drop the idea of a parallelism limit altogether.

Now, our vanilla service doesn’t guarantee when functions will begin processing. In the background, we’re adding our users’ functions to a fair-queuing scheduler. We estimate the amount of workload in the queue, and automatically scale our cluster as we see fit. Most functions don’t wait very long; you can see empirical data on our product page. If you’re looking for a cheap and effective batch-processing solution, this is it.
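
To give a feel for what fair queuing means, here is a toy sketch that serves users in round-robin order, so one user with a large backlog cannot starve another with a single job. It is only an illustration under that assumption: the FairQueue class and its methods are hypothetical names, and this is not the scheduler that runs on PiCloud.

```python
from collections import OrderedDict, deque


class FairQueue(object):
    """Toy round-robin queue: each user gets one turn at a time."""

    def __init__(self):
        # Maps user -> deque of that user's pending jobs, in turn order.
        self._queues = OrderedDict()

    def add(self, user, job):
        self._queues.setdefault(user, deque()).append(job)

    def pending(self):
        # Total number of jobs waiting across all users.
        return sum(len(jobs) for jobs in self._queues.values())

    def next_job(self):
        """Return (user, job) for the user at the front, or None if empty."""
        if not self._queues:
            return None
        user = next(iter(self._queues))
        jobs = self._queues.pop(user)
        job = jobs.popleft()
        if jobs:
            # The user still has work, so they rejoin at the back of the line.
            self._queues[user] = jobs
        return user, job


queue = FairQueue()
for name in ("a1", "a2", "a3"):
    queue.add("alice", name)
queue.add("bob", "b1")
order = [queue.next_job() for _ in range(4)]
```

In this run, bob's single job is served second even though alice submitted three jobs before him.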

Real-time compute units now serve a clear purpose: they are compute units that we reserve just for you. When you make a cloud.call(), your function runs immediately if any of your real-time compute units is free (that is, not allocated to another one of your functions). If all of your real-time compute units are in use, your function waits until one becomes available or until room opens up in our fair-queuing system. This is the ideal solution for those who have real-time response requirements or who simply want to shorten their processing time. We charge a minimal amount ($0.015 per compute unit hour) to reserve real-time units in hourly increments. This charge protects us in case you don’t run any functions, since we’re reserving space on Amazon instances for you.
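
As a rough worked example of the reservation charge, the sketch below multiplies the $0.015 rate by the number of units and the number of hours reserved. The function name reservation_cost is hypothetical, and rounding partial hours up to the next full hour is our reading of "hourly increments", not a documented billing rule. It covers only the reservation fee, not the compute time your functions consume.

```python
import math
from decimal import Decimal

# Rate quoted in this post: dollars per real-time compute unit per hour.
RESERVATION_RATE = Decimal("0.015")


def reservation_cost(units, hours):
    """Estimate the fee for reserving `units` real-time units for `hours`.

    Assumption: partial hours are rounded up to a full hour.
    """
    if units < 0 or hours < 0:
        raise ValueError("units and hours must be non-negative")
    billed_hours = int(math.ceil(hours))
    return RESERVATION_RATE * units * billed_hours


one_day = reservation_cost(10, 24)  # 10 units reserved for a full day
```

Under these assumptions, reserving 10 units for 24 hours comes to 0.015 x 10 x 24 = $3.60.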

I hope this sheds light on why our pricing model has been in flux. Our team is genuinely happy with this latest pricing model, because it accurately reflects the value we provide our customers. If you have any questions, thoughts, or concerns, we’d love to hear what you have to say in the comments.

Store your files with PiCloud!

Monday, May 3rd, 2010

One of the most frequent questions we get is “where do I put my data?” To this, we’ve always had the same answer: anywhere you want. Unlike other platforms, we’ve never believed in locking your data into a proprietary data store. Our users keep data in all sorts of different places (AWS, Rackspace, or their local machines), and in all different forms (flat files, relational databases, and key-value stores). We don’t plan to change this, because we don’t believe we can provide a single data storage solution that satisfies everyone’s needs. We’re big fans of using the correct tool for every problem.

So what is our new file storage solution? It’s a simple and easy way for our users to get their data onto the cloud to be crunched by PiCloud. We don’t pretend that it’s the holy grail of data storage solutions, but it’s a solid answer for users who don’t already have a data store set up. If you don’t need it, you won’t be affected.

The module is included in our cloud library as cloud.files. Here’s the most basic way to use it:

cloud.files.put('data.txt') # stores data.txt on the cloud
cloud.files.get('data.txt') # saves data.txt onto your machine
cloud.files.getf('data.txt') # gets a stream of the contents of data.txt

See our documentation for the full specification and examples.

New Users, New Features, and PyCon!

Friday, February 19th, 2010

Wait no longer! We’ve opened up PiCloud to another batch of users today, and from now onward, we promise to accelerate the roll out of PiCloud to new users. For users, both new and old, I wanted to highlight some of the many changes we’ve made in the past month that haven’t necessarily been the most visible.

Variable Compute Units
We had customers asking us for more powerful CPUs, and so we’ve delivered. With a simple keyword argument change, you can now switch from 1 Compute Unit (1-1.2 GHz Xeon) to 2.5 Compute Units (2.5-3 GHz Xeon). Check it out:

cloud.call(cpu_intensive_func, _high_cpu=True) # uses 2.5 compute units

Profiler Option
While we’ve gotten great feedback on profiling functions that run on PiCloud, we’ve also received requests for a way to turn the profiler off. After all, the deterministic profiler does have overhead that scales with the number of function calls in a script. To turn off the profiler, simply pass another keyword argument, _profile:

cloud.call(foo, _profile=False)

Drop in for multiprocessing
If you’re already using Python multiprocessing, but want to run your computation across our cluster, now you can. Check out our docs to see how.

cloud library is now open source
We told users before that the client library was not open source because, frankly, we didn’t believe it was stable enough to deserve the attention of developers in the community. It has now reached that point, so the client library has been released under the LGPL license.

Inclusion in the Enthought Python Distribution (EPD)
EPD is ideal for scientists and engineers looking for an easy, standardized way to deploy a powerful set of scientific tools on their own computer or across a whole organization. As of the latest EPD release, 6.0, the cloud library is now included in the distribution. Welcome EPD customers!

Bug fixes
Having hundreds of users using our platform is the easiest way to expose all the nitty-gritty bugs and race conditions that are lurking in our system. We would like to thank our ever-growing community for the many bug reports and critical fixes we have had over the past month.

Lastly, our CTO, Aaron Staley, and I will be at PyCon this weekend. Hope to see you all there!