The logworm querying API

The counterpart of logworm's lw_log method is lw_query, which allows you to run complex queries against a particular log table.

For example, say you have an application that uses logworm to automatically log all HTTP requests, and you need to programmatically monitor how HEROKU_QUEUE_DEPTH has changed over the last 5 minutes and react to it (e.g. by starting or stopping dynos). All you have to do is ask logworm to query your application's web_log for the queue_size field, and then process the results:

$ cat monitor.rb
require "logworm_amqp"

res = lw_query(:web_log, :fields => :queue_size, :start => (Time.now - 300))
res["results"].each do |entry|
    queue_size = entry["queue_size"]
    # Process it
end
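What the "Process it" step does depends on your application. As a rough sketch, assuming queue_size comes back as a number (or a numeric string) and using made-up thresholds, a hypothetical scaling_decision helper could turn the entries into an action. The sample Hash below stands in for what lw_query returns, so the snippet runs without a logworm server:

```ruby
# Hypothetical helper (not part of logworm): decide what to do based on the
# largest queue size seen in the query results. Thresholds are illustrative.
def scaling_decision(results, high = 10, low = 2)
  # Drop entries without a queue_size, then coerce the rest to integers.
  sizes = results.map { |entry| entry["queue_size"] }.compact.map(&:to_i)
  return :hold if sizes.empty?

  peak = sizes.max
  if peak >= high
    :scale_up
  elsif peak <= low
    :scale_down
  else
    :hold
  end
end

# Stand-in for the Hash returned by lw_query; real entries come from your log.
sample = { "results" => [{ "queue_size" => 3 }, { "queue_size" => 12 }, { "queue_size" => 7 }] }
decision = scaling_decision(sample["results"])
puts decision
```

In a real monitor you would pass res["results"] from the query above instead of the sample data, and map the returned symbol to whatever starts or stops your dynos.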

When you invoke the lw_query method, your application issues HTTP requests to the logworm server, which in turn executes the query and returns its results in a JSON document. The document includes the following fields:

  • id : The id of the query (for internal use)
  • query_url : URL to GET information about the query (for internal use)
  • results_url : URL to GET the results for the query (for internal use)
  • created : The timestamp of the creation of the query
  • updated : The timestamp of the most recent update of the query and/or its results
  • expires : The timestamp until which the server will return cached results and not re-run the query
  • execution_time : The time in ms to run the query
  • results : An array of hashes, where each element corresponds to a log entry with its fields

Typically, you'd run a query and inspect the results element of the Hash that comes back, as in the example above. You can also inspect the administrative information, for example to see how long the query took to run or when its results will expire:

require "logworm_amqp"

res = lw_query(:web_log, :fields => :queue_size, :start => (Time.now - 300))
puts "Query executed in #{res["execution_time"]} ms. Won't be refreshed until #{res["expires"]}."
    # ==> Query executed in 0.009762 ms. Won't be refreshed until 2010-04-23T07:39:57-07:00.

Query syntax

The lw_query method receives two arguments: the name of the log table you want to query, and the query itself. The query is specified through any of the following optional parameters:

  • :fields : List of fields to retrieve. [String with a comma-separated list of fields (quoted or not), or an Array of Strings]
  • :aggregate_function : If you're doing aggregation, the name of the aggregation function. [String]
  • :aggregate_argument : If you're doing aggregation, the name of the field to aggregate. [String]
  • :aggregate_group : If you're doing aggregation, the list of fields to group by. [String with a comma-separated list of fields (quoted or not), or Array of Strings]
  • :conditions : List of conditions that the log entries must satisfy to be retrieved by the query. [String with comma-separated conditions (in MongoDB syntax), or Array of Strings]
  • :start : The first entry to take into account [Time, String, Integer (for year)]
  • :end : The last entry to take into account [Time, String, Integer (for year)]
  • :limit : The maximum number of entries to retrieve [String or Integer]

These arguments are combined to compose a valid logworm query, as defined in the logworm query language specification. Please consult that document for complete coverage of these options. Some examples are provided below:

require "logworm_amqp"

lw_query(:web_log)
    # ==> a list of all entries recorded in the log table, with all fields

lw_query(:web_log, :fields => "queue_size")
    # ==> a list of all queue_sizes recorded in the log table

lw_query(:web_log, :aggregate_function => "max", :aggregate_argument => "queue_size")
    # ==> the maximum queue size ever recorded in the log table

lw_query(:web_log, :aggregate_function => "max", :aggregate_argument => "queue_size",
         :conditions => '"hour(_ts_utc)":14')
    # ==> the maximum queue size ever witnessed in the log table between 2 and 3 PM of any day

lw_query(:web_log, :aggregate_function => "max", :aggregate_argument => "queue_size",
         :conditions => ['"hour(_ts_utc)":14', '"response_status":404'])
    # ==> the maximum queue size ever witnessed in the log table between 2 and 3 PM of any day,
    # ==> AND when the response status was 404.
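The examples above do not exercise :start, :end or :limit. As a minimal sketch, the option Hash for a "last N seconds" query could be built by a small helper; recent_window is a hypothetical name, and the choice of fields and limit is only illustrative. The lw_query call itself is left as a comment so the snippet runs outside a configured application:

```ruby
# Hypothetical helper (not part of logworm): builds the options for a query
# restricted to the last `seconds` seconds, returning at most `limit` entries.
def recent_window(seconds, limit, now = Time.now)
  {
    :fields => ["queue_size", "response_status"],
    :start  => now - seconds,   # Time minus seconds gives a Time
    :end    => now,
    :limit  => limit
  }
end

opts = recent_window(300, 100)
puts opts.inspect

# Inside an application configured for logworm, the Hash is passed as-is:
#   res = lw_query(:web_log, opts)
```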
    
Direct JSON queries

Another possibility is to pass in the query directly as a JSON document, written following the query language specification. Essentially, a query is a JSON document with the following fields (all optional):

  • fields : An array of the fields in the log entries that you're interested in
  • aggregate : An aggregation condition (inside a "group_by" sub-element) and method (in a "function" sub-element)
  • conditions : A set of conditions that the log entries must satisfy to be retrieved by the query
  • timeframe : A temporal frame ("start" and "end") to limit the query to
  • limit : The maximum number of entries to retrieve
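Hand-written JSON strings are easy to get wrong (a missing quote or brace). One option, sketched here, is to build the query as a Ruby Hash and serialize it with the standard json library. The helper name max_queue_size_query is hypothetical; the document it produces uses only the "aggregate" and "conditions" shapes that appear in this page's own examples:

```ruby
require "json"

# Hypothetical helper (not part of logworm): builds the JSON text for
# "maximum queue_size during a given hour of the day".
def max_queue_size_query(hour)
  query = {
    "aggregate"  => { "function" => "max", "argument" => "queue_size" },
    "conditions" => { "hour(_ts_utc)" => hour }
  }
  JSON.generate(query)
end

json_query = max_queue_size_query(14)
puts json_query

# Inside an application configured for logworm:
#   res = lw_query(:web_log, json_query)
```

Because the string comes from JSON.generate, quoting and nesting are always well-formed; whether the query is semantically valid is still up to the query language specification.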

Please refer to the logworm query language specification for complete coverage of these options. Some examples are provided below:

require "logworm_amqp"

lw_query(:web_log, '{}')
    # ==> a list of all entries recorded in the log table, with all fields

lw_query(:web_log, '{"fields": "queue_size"}')
    # ==> a list of all queue_sizes recorded in the log table

lw_query(:web_log, '{"aggregate":{"function":"max", "argument": "queue_size"}}')
    # ==> the maximum queue size ever recorded in the log table

lw_query(:web_log, '{"aggregate":{"function":"max", "argument": "queue_size"},
                     "conditions":{"hour(_ts_utc)":14}}')
    # ==> the maximum queue size ever witnessed in the log table between 2 and 3 PM of any day

Specifying your database

The lw_query method needs to access the proper configuration files to know the URL of the database against which it needs to perform the query. If you call lw_query from within an application that already uses logworm for logging, it will all work fine, since the application will be properly configured.

In some cases, however, you'll probably want to execute queries from outside of the application (e.g. from a script running as a cron job). In that case, you'll need to specify the configuration manually. First of all, go to your project's page on the main site and click on the "View Security Keys and Setup Instructions" link. You'll get to a page that lists the URL for that project, something like

logworm://Ub5sOstT9w:GZi0HciTVcoFHEoIZ7@db.logworm.com/OzO71hEvWYDmncbf3C/J7wq4X06MihhZgqDeB/

Then save that line to a file named .logworm in the directory from which you run your script:

$ cd <your script dir>
$ echo 'logworm://Ub5sOstT9w:GZi0HciTVcoFHEoIZ7@db.logworm.com/OzO71hEvWYDmncbf3C/J7wq4X06MihhZgqDeB/' > .logworm

That will ensure that logworm knows which database to talk to when running your query.
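A truncated or mistyped .logworm file is an easy mistake when copying the line by hand. The following is only a rough sanity check, using Ruby's standard uri library: it verifies the general shape of the URL (logworm scheme, a host, and a credentials part) and nothing more. The helper name logworm_url_ok? is hypothetical and is not part of the logworm gem:

```ruby
require "uri"

# Hypothetical sanity check (not part of logworm): true if the line looks
# like logworm://<credentials>@<host>/... and parses as a URI at all.
def logworm_url_ok?(line)
  uri = URI.parse(line.strip)
  uri.scheme == "logworm" && !uri.host.nil? && !uri.userinfo.nil?
rescue URI::InvalidURIError
  false
end

# Check the file in the current directory, if there is one.
path = ".logworm"
if File.exist?(path)
  puts(logworm_url_ok?(File.read(path)) ? "#{path} looks well-formed" : "#{path} looks malformed")
end
```

Passing this check does not mean the keys are valid; only the logworm server can confirm that when the query runs.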

Query caching

By default, the results of running a query are cached on the server for 5 minutes. lw_query tells you when the results will expire; you are free to run the query again, but if you do so within the caching window, you'll get exactly the same results.

There are ways to force a refresh of the results, or to indicate that the query should have a different TTL, but they are not yet accessible via the API.
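Since re-running a query inside the caching window returns the same results, a polling script can use the expires field to decide when to ask again. The sketch below assumes expires is an ISO 8601 string, as in the sample output shown earlier on this page; the helper names are hypothetical, and the response Hash is a stand-in for what lw_query returns:

```ruby
require "time"

# Hypothetical helpers (not part of logworm).

# True while the server would still answer from its cache.
def cached?(response, now = Time.now)
  now < Time.iso8601(response["expires"])
end

# Seconds to wait before a re-run can return fresh results (never negative).
def seconds_until_refresh(response, now = Time.now)
  [Time.iso8601(response["expires"]) - now, 0].max
end

# Stand-in for the Hash returned by lw_query.
response = { "expires" => "2010-04-23T07:39:57-07:00" }
now = Time.iso8601("2010-04-23T07:37:57-07:00")

puts cached?(response, now)
puts seconds_until_refresh(response, now)
```

A monitor loop could sleep for seconds_until_refresh(res) between calls instead of polling at a fixed interval.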

Getting a list of your tables

You can also query the database simply to get a list of the log tables that you have created for your project, using the lw_list_logs method:

puts lw_list_logs.inspect
    # ==> [{"tablename"=>"web_log", "rows"=>39, "last_write"=>"2010-04-22T21:48:19Z"},
    # ==>  {"tablename"=>"audit",   "rows"=>15, "last_write"=>"2010-04-22T20:11:45Z"}]

lw_list_logs returns an array of hashes, one for each log table, with the following information:

  • tablename : The name of the log table
  • rows : The total count of entries in the log table
  • last_write : The date/time of the last entry
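As an illustration of working with this array, the sketch below finds tables that have not been written to recently and totals the row counts. It runs on the sample data shown above rather than calling lw_list_logs, assumes last_write is an ISO 8601 string as in that sample, and uses a hypothetical helper name, stale_tables:

```ruby
require "time"

# Sample data copied from the lw_list_logs output above.
logs = [
  { "tablename" => "web_log", "rows" => 39, "last_write" => "2010-04-22T21:48:19Z" },
  { "tablename" => "audit",   "rows" => 15, "last_write" => "2010-04-22T20:11:45Z" }
]

# Hypothetical helper (not part of logworm): names of the tables whose last
# entry is older than max_age_seconds at time `now`.
def stale_tables(logs, now, max_age_seconds)
  logs.select { |log| now - Time.iso8601(log["last_write"]) > max_age_seconds }
      .map { |log| log["tablename"] }
end

now = Time.iso8601("2010-04-22T22:00:00Z")
stale = stale_tables(logs, now, 3600)
total_rows = logs.sum { |log| log["rows"] }

puts "Stale tables: #{stale.inspect}"
puts "Total rows: #{total_rows}"
```

In a real script you would replace the sample array with the return value of lw_list_logs and use Time.now for the reference time.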
