Posts Tagged ‘cron’

Running a Twitter Bot with Cloud-based Crons

Thursday, August 12th, 2010

Two days ago, we released our latest feature: Crons. In short, you’re able to register a function to be run periodically on PiCloud. Today, we wanted to give you an example of crons in action by making an automatic retweeter bot.

We’ve set up a Twitter account, @picloudrt, that will automatically retweet any message that includes the word “picloud” within a minute of its posting. We accomplish this by running a cron every minute that uses the Twitter API to search for new “picloud” tweets. As with our previous video encoding tutorial, we’ll first demonstrate how to run the retweeter locally, and then show how to move it to the cloud.

What can I do with a retweeting bot?

Your retweeting bot has several uses. First, you can follow it to get a consolidated view of what people are saying about your company or product. Second, you can use it to keep a comprehensive history of tweets, rather than the time-limited (~1.5 weeks) results provided by Twitter Search. Third, you can extend the code to filter tweets and report findings as you see fit. For example, rather than retweeting, you can use smtplib to alert yourself of new tweets via e-mail. This gives you functionality similar to that of a service like TweetBeep.
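As a rough sketch of the e-mail idea, the function below builds a single alert message from a batch of tweets, and a second function hands it to an SMTP server. The names build_alert() and send_alert(), the (user, text) tuple shape, the addresses, and the SMTP host and port are all assumptions made for illustration; none of them come from the retweeter code.

```python
import smtplib
from email.message import EmailMessage


def build_alert(tweets, sender, recipient, keyword="picloud"):
    """Build one e-mail summarising new tweets.

    `tweets` is a list of (user, text) pairs. That shape is an assumption
    made for this sketch, not something tweepy returns directly.
    """
    msg = EmailMessage()
    msg["Subject"] = "%d new tweet(s) mentioning %s" % (len(tweets), keyword)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join("@%s: %s" % (user, text) for user, text in tweets))
    return msg


def send_alert(msg, host="localhost", port=25):
    """Deliver the message via an SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Keeping the message construction separate from delivery means you can check the subject and body without a mail server at hand.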

Prerequisite Libraries

cloud – Version >= 2.0.0 of our library. You’ll need to sign up (it’s free!) to download it.
tweepy – Twitter API for Python.

Local Version

We have a function called retweeter(), which does the following:

  1. Uses the tweepy library to search Twitter for posts.
  2. Determines which tweets are new, and not from the retweeting account, picloudrt.
  3. Uses tweepy to retweet.

The details are best understood by examining the comments embedded in the code below.

Updated: retweeter() has been improved. The old method assumed that Twitter search would instantaneously display new tweets, when in fact the tweets can be delayed by over 10 seconds. Rather than using specific time intervals, we now use the last retweet as a marker for determining which tweets are new.

import tweepy
import datetime

username = 'picloudrt' # put your twitter handle here
password = 'XXXX' # put your password here
keyword = 'picloud' # the word we're tracking

def retweeter():
    """Searches for the picloud key term on twitter and retweets
    any new tweets tweeted since our last retweet."""

    # create api object (authentication needed for retweeting)
    auth = tweepy.BasicAuthHandler(username, password)
    api = tweepy.API(auth)

    # find the most recent tweet we've retweeted, so that
    # when we search for the latest tweets, we retweet only
    # the messages created after it (created_after)
    retweets = api.retweeted_by_me()
    if retweets:
        created_after = retweets[0].retweeted_status.created_at
    else:
        # if we've never retweeted before, then we're going to
        # retweet all messages created since January 1, 2000, i.e. all of them
        created_after = datetime.datetime(year=2000, month=1, day=1)

    # grab all tweets that include our keyword (default: picloud)
    tweets = api.search(keyword)
    # reverse them to get the oldest first
    tweets.reverse()
    for tweet in tweets:
        # if the tweet is new, and was not made from our account, retweet it
        if tweet.created_at > created_after and tweet.from_user != username:
            api.retweet(tweet.id)

To run the function on your local machine, simply call it: retweeter().
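If you would like to try the selection logic without touching Twitter, one option is to pull it into a pure function and feed it stand-in tweets. This is only a sketch: FakeTweet and select_new_tweets() are hypothetical names, and the function sorts by created_at instead of relying on the search results arriving newest first, as retweeter() does.

```python
import datetime
from collections import namedtuple

# Stand-in for a search result; only the three attributes that
# retweeter() reads are modelled here.
FakeTweet = namedtuple("FakeTweet", "id created_at from_user")


def select_new_tweets(tweets, created_after, own_username):
    """Return the tweets that should be retweeted, oldest first."""
    fresh = [t for t in tweets
             if t.created_at > created_after and t.from_user != own_username]
    return sorted(fresh, key=lambda t: t.created_at)


# The marker plays the role of the last retweet's creation time.
marker = datetime.datetime(2010, 8, 12, 12, 0)
results = [  # newest first, like a search result
    FakeTweet(4, datetime.datetime(2010, 8, 12, 12, 5), "alice"),
    FakeTweet(3, datetime.datetime(2010, 8, 12, 12, 3), "carol"),
    FakeTweet(2, datetime.datetime(2010, 8, 12, 12, 1), "picloudrt"),
    FakeTweet(1, datetime.datetime(2010, 8, 12, 11, 59), "bob"),
]
new_ids = [t.id for t in select_new_tweets(results, marker, "picloudrt")]
```

Here the tweet from bob is older than the marker and the one from picloudrt is our own, so only the tweets from carol and alice are selected, in that order.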

Cloud Version

To run retweeter() periodically on PiCloud, you register it as a cron:

import cloud
cloud.cron.register(retweeter, 'picloud_retweeter', '* * * * *')

That’s it! Note that while you had to install tweepy locally, you did not have to install it on PiCloud. Our cloud library automatically extracts dependencies, such as tweepy, from your machine, and deploys them on PiCloud.

We labeled the newly registered cron ‘picloud_retweeter’; labels make it easy to refer to the cron in the web interface, and in other functions, e.g. cloud.cron.deregister(). The expression ‘* * * * *’ is the UNIX crontab way of saying that retweeter() should be run every minute. You can find more details on specifying the periodicity in the UNIX man page for crontab.
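To get a feel for how a five-field crontab expression is read (minute, hour, day of month, month, day of week), here is a small matcher you can run locally. It is a sketch of common crontab semantics, not PiCloud's scheduler: the function names are made up, month and weekday names are not handled, and 7 is not accepted as an alias for Sunday.

```python
import datetime


def _field_matches(field, value, low, high):
    """Check one crontab field against a value.

    Supports '*', single numbers, 'a-b' ranges, comma lists and '/step'.
    """
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression, when):
    """Return True if the five-field expression fires at the minute `when`."""
    minute, hour, dom, month, dow = expression.split()
    weekday = when.isoweekday() % 7  # crontab counts Sunday as 0
    time_ok = (_field_matches(minute, when.minute, 0, 59)
               and _field_matches(hour, when.hour, 0, 23)
               and _field_matches(month, when.month, 1, 12))
    dom_ok = _field_matches(dom, when.day, 1, 31)
    dow_ok = _field_matches(dow, weekday, 0, 6)
    # Classic cron fires when either day field matches if both are restricted.
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return time_ok and day_ok
```

With this, cron_matches('* * * * *', t) is true for every minute t, while an expression such as '*/15 * * * *' only matches minutes 0, 15, 30 and 45 of each hour.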

Here’s what the cron dashboard now shows:


Cron Dashboard

If you click on “View Jobs,” you’ll be taken to our jobs dashboard, which will automatically filter for jobs created by the cron. Jobs spawned by a cron are labeled with the cron’s label prefixed with ‘cron_’. In this case, the jobs are labeled as ‘cron_picloud_retweeter’.


Cron Dashboard

As you can see, your cron is creating a new job every minute. For testing purposes, you can manually run a cron at any time using the cron dashboard. You can also remove the cron using the dashboard, or using our library:

cloud.cron.deregister('picloud_retweeter')

Cost

Assuming that your custom Twitter bot takes about a second to scrape and process data, we can estimate your monthly cost. The function will run approximately (60 times per hour) * (24 hours) = 1,440 times a day, for a total of (1,440 times a day) * (30 days) = 43,200 times per month. If each run takes one second, that’s 43,200 compute seconds, or 43,200 / (3,600 seconds per hour) = 12 compute hours. The total cost is therefore (12 compute hours) * ($0.05 per compute hour) = $0.60. Compare that with the $5-$20/month charge for some Twitter alert services, or the $20-$68 price of bringing up an instance directly on Amazon or Rackspace.
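The same estimate is easy to redo for your own bot. The helper below is just the arithmetic above wrapped in a function; monthly_cost() is a made-up name, and the defaults (one run per minute, 30 days, $0.05 per compute hour) are the figures used in this post.

```python
def monthly_cost(seconds_per_run, runs_per_hour=60, days=30, rate_per_hour=0.05):
    """Estimate the runs, compute hours and dollar cost for one month."""
    runs = runs_per_hour * 24 * days
    compute_hours = runs * seconds_per_run / 3600.0
    return runs, compute_hours, compute_hours * rate_per_hour


runs, hours, cost = monthly_cost(1)
print("%d runs, %.1f compute hours, $%.2f" % (runs, hours, cost))
```

If your function takes five seconds per run instead of one, monthly_cost(5) gives 60 compute hours, or about $3.00 a month at the same rate.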

Conclusion: Why PiCloud should be your go-to Cron Artist

Easy: All you need is one line of code: cloud.cron.register().
Fire-and-forget: Once you’ve registered a cron, we’ll make sure it works until the end of time.
Scalable: If you have a lot of crons, we’ll automatically distribute them across multiple machines in our cluster.
Monitoring: Ever wonder how your cron is doing, but can’t find the logs? Would you like to know exactly when your script stopped working? Just check your job dashboard to see a full history of your computation for easy debugging.
Inexpensive: For basic usage, you could be charged less than a dollar per month!

Crons in the Cloud!

Tuesday, August 10th, 2010

We’re pleased to announce the addition of crons to the PiCloud platform. A cron is a simple way to schedule a function to be run periodically. Times and dates are specified using the standard crontab format. Crons can be triggered as often as every minute, and there’s no limit to the number of functions you can register as crons. You will be billed for the amount of compute time consumed by the function triggered by your cron, just as if you had run the function on PiCloud yourself. We have also added a tab to the web interface for managing crons.

Here’s how to register a cron:

# registers function ping_webserver with the label heartbeat
# this function could be checking whether a webservice is active
cloud.cron.register(ping_webserver, 'heartbeat', '* * * * *') # runs every minute
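The body of ping_webserver is not shown here. As one possible shape for it, the sketch below fetches a URL and reports whether the server answered with a 2xx status. The URL, the timeout and the opener parameter (which lets you swap in a fake for testing) are assumptions made for illustration.

```python
import urllib.error
import urllib.request

URL = "http://example.com/"  # placeholder; point this at your own service


def ping_webserver(url=URL, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        response = opener(url, timeout=timeout)
        try:
            return 200 <= response.getcode() < 300
        finally:
            response.close()
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False
```

Because every parameter has a default, the function can be called with no arguments.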

When you no longer need a cron, you can deregister it via our web interface or using the following:

# deregister the cron labeled heartbeat (crons are referred to by label,
# not by the name of the function they run)
cloud.cron.deregister('heartbeat')

Here’s a cron that runs a function once a day at noon Pacific time:

# 19 is the 19th GMT hour, which translates to 12pm PDT (GMT -7)
cloud.cron.register(sudo_make_me_a_sandwich, 'lunch', '0 19 * * *')
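Since the hour in the expression above is given in GMT, it can help to compute it rather than count on your fingers. The sketch below converts a local hour at a fixed UTC offset into the GMT hour and builds the crontab string; utc_hour_for() is a hypothetical helper, and a fixed offset does not follow daylight saving changes.

```python
import datetime


def utc_hour_for(local_hour, utc_offset_hours):
    """Convert an hour at a fixed UTC offset to the corresponding UTC hour."""
    local_tz = datetime.timezone(datetime.timedelta(hours=utc_offset_hours))
    # The date is arbitrary: a fixed offset does not vary through the year.
    local = datetime.datetime(2010, 8, 10, local_hour, 0, tzinfo=local_tz)
    return local.astimezone(datetime.timezone.utc).hour


lunch_hour = utc_hour_for(12, -7)       # noon PDT (GMT -7)
expression = "0 %d * * *" % lunch_hour  # minute 0 of that GMT hour, every day
```

When the clocks move to PST (GMT -8), the same calculation gives hour 20, so the cron would need to be registered again with the new hour to keep firing at local noon.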

That’s all it takes! See our documentation for the full cron specification.