Cache API Requests

When making requests to an external service's API, some requests will frequently occur with the same parameters and return the same result. If we cache our request or response, we can reduce HTTP requests, which can improve performance and avoid hitting rate limits.

In some cases, the APICache Ruby gem is a good choice for caching the API responses (typically JSON) in any Moneta-compatible store such as Memcache or Redis.

In other cases, we don't need to cache the entire API response. If we cache only the URL requested, we can save space, avoid adding the operational overhead of Memcache or Redis, and avoid repeating the JSON parsing step.

In the following example, our app only needs a venue's name, latitude, longitude, and street address. We'll get the data from Foursquare's venue search API by category ("restaurant") and neighborhood ("The Mission").

Interface for the pattern

url = Foursquare.new(category, neighborhood).venue_search_url

ApiRequest.cache(url, Foursquare::CACHE_POLICY) do
  # make a GET to the URL
  # parse JSON
  # create or update venue name, lat, lon, street address
end

The first time this code runs for a venue search for restaurants in the Mission, ApiRequest will save the URL to the database and the block will be executed.

Whenever this runs again for a venue search for restaurants in the Mission, as long as it is within Foursquare's 30 day cache policy, the block won't be executed and expensive work will be avoided.

Implement the pattern

Here's the database migration:

class CreateApiRequests < ActiveRecord::Migration
  def change
    create_table :api_requests do |t|
      t.timestamps null: false
      t.text :url, null: false
    end

    add_index :api_requests, :url, unique: true
  end
end

The index improves performance of future lookups and enforces uniqueness of the URL.

Here's the ApiRequest model:

class ApiRequest < ActiveRecord::Base
  validates :url, presence: true, uniqueness: true

  def self.cache(url, cache_policy)
    find_or_initialize_by(url: url).cache(cache_policy) do
      if block_given?
        yield
      end
    end
  end

  def cache(cache_policy)
    if new_record? || updated_at < cache_policy.call
      update_attributes(updated_at: Time.zone.now)
      yield
    end
  end
end

This model is generic enough to be used with other APIs beyond Foursquare.

We inject a cache policy dependency that must respond to call. This allows us to pass in ActiveSupport's Numeric methods like 30.days.ago and have them execute at runtime.

Here's the Foursquare model:

class Foursquare
  BASE_URL = 'https://api.foursquare.com/v2/'
  CACHE_POLICY = lambda { 30.days.ago }

  attr_reader :category, :neighborhood

  def initialize(category, neighborhood)
    @category = category
    @neighborhood = neighborhood
  end

  def venue_search_url
    BASE_URL + 'venues/search?' + {
      categoryId: category_id,
      client_id: ENV['FOURSQUARE_CLIENT_ID'],
      client_secret: ENV['FOURSQUARE_CLIENT_SECRET'],
      limit: 50,
      ll: lat_lon,
      radius: 800,
      v: '20130118'
    }.to_query
  end

  private

  def category_id
    category.foursquare_id
  end

  def lat_lon
    "#{neighborhood.lat},#{neighborhood.lon}"
  end
end

In this example, we chose to build the URL ourselves, using ActiveSupport's Hash#to_query and pulling our client ID and secret in from environment variables.

Summary

Cache GET requests to an external API to improve performance and avoid hitting rate limits.

Compared to caching the JSON response, caching only the request URL saves storage space, avoids the operational overhead of Memcache or Redis, and avoids the JSON parsing step.