Podcasts on the command-line, in Rust

Posted on Sun 08 April 2018

I've been listening to more podcasts recently on my commute and at the gym, including New Rustacean and AB Testing. I currently use Clementine to manage my podcasts, but it's not perfect - the "Copy to device" menu option was greyed out this morning, so I had to copy podcast episodes over manually.

I've been looking for a small project to work on to practice Rust, and here I have a problem that needs solving - so I've spent some time today hacking together a simple text interface program in Rust to manage podcasts. The code is at https://github.com/rkday/podcast-manager.rs.

Here's a screenshot:

Screenshot

Currently, it can:

  • download and parse an RSS feed
  • show all podcast episodes as a list
  • allow the user to scroll through that list with Up and Down
  • show extra information about an episode in the sidebar

Next steps:

  • download the MP3 to my MP3 player when I hit 'y'
  • allow multiple podcasts (currently, AB Testing is hardcoded)
  • fix the unhelpful delay at the start where it downloads the feed URL (a loading screen? caching?)

Longer-term futures:

  • support feed formats like Atom
  • support deleting podcasts off a device
  • maybe even support playback from the console?

Rust turned out to be a pretty reasonable language for this kind of thing - the https://github.com/fdehau/tui-rs crate is quite impressive, and the ecosystem is already far enough along for https://github.com/rust-syndication/rss to exist.

I did spot one bug in the wider Rust ecosystem when writing this - the AB Testing podcast's descriptions crashed the Rust textwrap library due to https://github.com/mgeisler/textwrap/issues/129 - but that's now fixed.


Learning Rust

Posted on Sat 04 November 2017 • Tagged with rust

I've been learning Rust recently, and it's the first new language (with new concepts, like borrowing) I've learnt for a while - probably the first since Clojure in 2013. I thought it was worth writing up exactly what I'm doing to learn it, in the hopes that it's useful for others learning Rust and new programming languages in general.

Code koans

https://github.com/crazymykl/rust-koans are a good, simple (but gradually increasing in complexity) introduction to Rust. I started learning Rust just by checking them out and working through them, looking up any topics I was unclear on in the Rust book (https://doc.rust-lang.org/book/second-edition/).

They start off with the basics, just dealing with true/false and integers, but cover borrowing and traits by the end. They're also all written as unit tests, which was useful in building up good unit testing habits in Rust - when I moved on to other projects I was already familiar with the test syntax.

These aren't specific to Rust - see http://rubykoans.com/ or https://github.com/torbjoernk/CppKoans for examples in other languages.

Code katas

The next thing I did after finishing the koans was to start working through http://codekata.com/. I really like this:

  • I think writing decent amounts of actual code in a language is the best way to learn it.
  • With the code katas, I don't have to choose a project, worry about picking the right one, etc. - they're all just laid out for me
  • I don't run the risk of picking a project that's too large - they're pretty likely to be doable in an hour or so
  • They're clearly just practice for educating myself, so I can focus on learning (including, for example, refactoring to get the best style) rather than just accomplishing a project I want to do.

I'm tracking the katas I'm doing at https://github.com/rkday/rust-katas.

As well as http://codekata.com/, useful things to do here include:

awesome-rust

https://github.com/rust-unofficial/awesome-rust is a curated list of Rust projects and libraries.

This provides:

  • a set of code to read and learn good practices from - for example, I'd like to read and understand all of https://github.com/alexcrichton/tar-rs at some point
  • useful libraries, which might inspire further projects (or mean you feel able to do a project in Rust which you'd otherwise have done in Python or C++, because there's a Rust library for it)
  • awareness of the common libraries for the language (e.g. https://github.com/kbknapp/clap-rs for command-line argument parsing)

As with the koans, this isn't Rust-specific - see e.g. https://github.com/vinta/awesome-python.

Other

  • http://rustup.rs is worth mentioning here - it's the standard Rust installer
  • clippy - this is a tool installable with cargo install clippy, and is a linter for Rust - it helps spot things which are valid but bad style (like an explicit return x instead of just having x be the last expression in a function).
  • the Rust style guide at https://aturon.github.io/ is also useful for understanding idiomatic code
  • StackOverflow questions - sorting the Rust questions by number of votes (https://stackoverflow.com/questions/tagged/rust?sort=votes&pageSize=15) lets me read answers about the topics most people find the most confusing, and reading lots of alternative explanations of things like borrowing and ownership gives me a better picture of how this works.

Can testers prevent problems?

Posted on Mon 25 September 2017

I read recently that testers can't prevent problems, with the specific example that if a tester found an issue while reviewing a design or prototype before any code had been written, the mistake had still been made and caught by testing. I thought that in one sense this was pretty close to being true, and that in another sense it was completely wrong.

What I actually think is that testing (the activity) can't prevent problems - if testing is the evaluation of a product or artifact to see whether it's fit for purpose, then yes, the most it can do is judge that something isn't fit for purpose. It can't anticipate problems, or even make something fit for purpose.

However, testers (the people) can prevent problems, but what they're doing then isn't testing - it's technical leadership. In the example of issues found while reviewing a design or prototype, testing has found the issue, but I'd argue that what's really prevented the later problem is the culture of prototyping and collaboration - getting that design or prototype written, and circulating it for feedback rather than just forging ahead and writing the code, is what means that finding a problem (by testing) at that stage prevents further problems down the line. Whoever encouraged and fostered that culture of openness and early feedback was the one who really prevented the problem - without that, the testing would have come too late.

Similarly, educating the wider team (on what users and customers actually want, or just on common sources of bugs like Unicode handling) is a good way to prevent problems. This isn't something you could describe as testing - but even so, testers are probably very well-placed to do it, and it's almost certainly a good way to spend time to improve overall product quality.

There are a lot of problems that testing can't solve (in fact, I'm not sure it can actually solve any) - because testing is just about gathering information. It's what information you look for, and how you use that information to guide the rest of your work, that makes the real impact.


Lightweight presentations with represent

Posted on Sat 09 July 2016

I've been trying to find a good, lightweight way to write technical presentations for a while now. Specifically, I'm looking for one that avoids some of the common problems with Powerpoint:

  • no need to mess around endlessly with styles, fonts etc. - it should look good out of the box
  • easy to represent code snippets
  • possible to create, edit and run anywhere - I use a Linux laptop, but often transfer presentations to Windows laptops, so this is important

In the last week, I've discovered present and represent, the tools used for presentations about the Go programming language, and these are now my go-to presentation tools. (For an example presentation, look at Powered by Go.)

These tools work with .slide files describing a presentation - you can get the full description at https://godoc.org/golang.org/x/tools/present, but here's an example file:

Title of document
15:04 2 Jan 2006
<blank line>
Author Name

* Slide 1

Text goes here

- bullets
- more bullets

* Slide 2

   def fn():
       print("Here is some code")

After creating a .slide file, if you install Go, run go get golang.org/x/tools/cmd/present and then run present (or $GOPATH/bin/present if $GOPATH/bin isn't in your path), it will run a local web server listing the .slide files and rendering them as HTML.

Often, though, I want to have presentations for offline viewing (being able to copy them onto the conference organiser's laptop, for example), in which case I use represent (go get github.com/cmars/represent). With this, you can just run represent and it'll convert all the .slide files in the current directory into HTML presentations in a subdirectory called publish.

UltiSnips

To help me remember the format of .slide files (both heading and slide format), I've created an UltiSnips snippet for it, which means I can just type hhh<Tab> in a new .slide file and get a template:

~/.vim/UltiSnips/slide.snippets:

snippet hhh "Slide headings" b
${1:title}
${2:subtitle}
`date "+%e %b, %Y"`

Rob Day
rkd@rkd.me.uk
http://rkd.me.uk
@day_rk

* ${3:First slide title}

endsnippet

(I needed au BufNewFile,BufRead *.slide set filetype=slide in my .vimrc to make .slide a recognised file type).

Alternatives

I've also tried a couple of other lightweight slide formats in the past, but I prefer represent. I've tried:

  • cleaver, which is NodeJS-based - I'm more of a systems programmer than a web app programmer, so I'm more comfortable having Go installed on my system than Node
  • remark, which involves embedding Markdown into an HTML file, which I find looks cluttered (I just want a file describing my slides, and to have the infrastructure take care of the surrounding stuff!)

The present/represent toolchain gives me nice-looking slides, with a simple markup format, and doesn't require me to install much that I wouldn't have installed anyway - I suspect I'll use this as my preferred presentation tool for a while.


Using Python and Pandas to check HTTP load distribution

Posted on Mon 30 May 2016

Since writing my previous post on using standard Unix shell tools to analyse an HTTP access log, I've been thinking about how to do this with more standard data analysis tools. Using Pandas and Matplotlib, which are the main Python data libraries, you can get a bit more fine-grained control, and view the results graphically.

As a reminder, the question I'm trying to answer is "given this HTTP access log file, are there any time periods where my request rate is much higher than normal?".

First, I'll import the relevant libraries:

#!/usr/bin/python3
import collections
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

Then parse the data out of the file into a Pandas Series data structure, indexed by timestamp (i.e. a time series):

def parse_data(filename):
    events = {}

    with open(filename) as f:
        # Example line:
        # 14-04-2016 11:01:28.737 UTC 200 GET /url_a 0.000863 seconds
        for line in f:
            date, time, tz, status, method, path, latency, _ = line.split()
            dt = datetime.strptime(date + " " + time, "%d-%m-%Y %H:%M:%S.%f")
            # Build up a dictionary of timestamp -> number of events at that
            # time (requests logged in the same millisecond share a key)
            events[dt] = events.get(dt, 0) + 1

    time_series = pd.Series(events)

    # Count how many events were in each 50ms period; empty periods sum to 0.
    # (Older Pandas spelled this resample('50L', how=...), which newer
    # versions no longer accept.)
    time_series = time_series.resample('50ms').sum()
    return time_series


ts = parse_data('access.log')

This gives me a time series, indexed by the start of each 50ms window, containing the number of HTTP requests in that time window. An example snippet of it looks like this:

(Timestamp('2016-04-14 11:13:39.600000', offset='50L'), 1)
(Timestamp('2016-04-14 11:13:39.650000', offset='50L'), 2)
(Timestamp('2016-04-14 11:13:39.700000', offset='50L'), 5)
(Timestamp('2016-04-14 11:13:39.750000', offset='50L'), 1)
(Timestamp('2016-04-14 11:13:39.800000', offset='50L'), 8)

Unlike with Unix shell tools, which could only group things into 100ms or 10ms buckets, Python/Pandas allows us to sample at any frequency we want.
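To make the windowing idea concrete, here's a plain-Python sketch of what "count events per window" amounts to: floor each timestamp to the start of its window, then tally. This is only an illustration of the idea, not how Pandas implements resample, and the log lines, parse_timestamp and count_per_window are all invented for the example. The window can be any timedelta, which is the flexibility I'm getting at above.

```python
from collections import Counter
from datetime import datetime, timedelta

# Invented sample lines, in the same log format as the example line above
SAMPLE_LOG = """\
14-04-2016 11:13:39.600 UTC 200 GET /url_a 0.000863 seconds
14-04-2016 11:13:39.610 UTC 200 GET /url_b 0.000912 seconds
14-04-2016 11:13:39.640 UTC 404 GET /url_c 0.000145 seconds
14-04-2016 11:13:39.700 UTC 200 GET /url_a 0.000801 seconds
14-04-2016 11:13:39.730 UTC 200 GET /url_a 0.000790 seconds
"""


def parse_timestamp(line):
    """Extract the timestamp from one access-log line."""
    date, time = line.split()[:2]
    return datetime.strptime(date + " " + time, "%d-%m-%Y %H:%M:%S.%f")


def count_per_window(timestamps, window):
    """Count events per window, keyed by the start of each window.

    Windows are aligned to midnight. Unlike resample(), windows with no
    events are simply absent rather than reported as 0.
    """
    counts = Counter()
    for ts in timestamps:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        # Floor the offset since midnight to a whole number of windows
        window_start = midnight + ((ts - midnight) // window) * window
        counts[window_start] += 1
    return counts


timestamps = [parse_timestamp(line) for line in SAMPLE_LOG.splitlines()]
per_50ms = count_per_window(timestamps, timedelta(milliseconds=50))
per_25ms = count_per_window(timestamps, timedelta(milliseconds=25))

for start, count in sorted(per_50ms.items()):
    print(start.time(), count)
```

Switching from 50ms to 25ms windows is just a different timedelta, with no change to the counting logic.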

We can then graph that time series in a few lines of code:

def display_graph(time_series):
    # Draw one dot per time window on a fresh figure
    fig, ax = plt.subplots()
    time_series.plot(ax=ax, style=".")
    plt.show()

which produces this graph:

If we want a more methodical approach than just eyeballing the graph, Pandas makes it easy to do outlier detection - here, I'm using Tukey's range test, and printing out any data which it views as 'far out'.

q1 = ts.quantile(0.25)
q3 = ts.quantile(0.75)

outlier_boundary = q3 + (3*(q3-q1))
print(ts[ts >= outlier_boundary])
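
As a worked example of the arithmetic, here's a small standard-library sketch on made-up counts. The numbers and the tukey_upper_fences helper are invented for illustration. I'm assuming the "inclusive" method of statistics.quantiles (which interpolates linearly between data points) is a fair stand-in for the Pandas quantile call above. In Tukey's terms, values beyond Q3 + 1.5 x IQR are 'outside' and values beyond Q3 + 3 x IQR are 'far out'.

```python
import statistics


def tukey_upper_fences(values):
    """Return Tukey's upper (inner, outer) fences for a sequence of numbers."""
    # method="inclusive" interpolates linearly between data points
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q3 + 1.5 * iqr, q3 + 3 * iqr


# Invented per-window request counts, with one obvious spike
counts = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]

inner, outer = tukey_upper_fences(counts)
far_out = [c for c in counts if c >= outer]
print(inner, outer, far_out)
```

Here Q1 is 2.25 and Q3 is 4.0, so the interquartile range is 1.75, the 'far out' boundary is 4.0 + 3 x 1.75 = 9.25, and only the window with 30 requests is flagged.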

You can also do more manipulation on this data structure - for example, the graph above makes it difficult to tell how the events are distributed at the lower levels (e.g. whether more 50ms time intervals have one event in than have two events in), because there are so many of them. But you can just create a new Pandas data structure to answer the question "how many time intervals have N events in?", and print or graph that:

buckets_by_number_of_events = collections.defaultdict(lambda: 0)
for value in ts:
    buckets_by_number_of_events[value] += 1

display_graph(pd.Series(buckets_by_number_of_events))
print(pd.Series(buckets_by_number_of_events))

which produces the following output, showing that 7,000+ 50ms time windows have 1 or 2 events in, but only 12 have 12 or more:

0     1520
1     3933
2     3678
3     2254
4     1898
5     1337
6      855
7      528
8      270
9      129
10      64
11      26
12       5
13       6
14       1
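
The tally loop above is the same job that collections.Counter does in the standard library (Pandas also has Series.value_counts() for this). A minimal sketch, using invented per-window counts in place of the real time series:

```python
from collections import Counter

# Invented per-window counts, standing in for the values of the time series
window_counts = [0, 1, 1, 2, 1, 0, 3, 2, 1, 1]

# Counter maps each distinct value to the number of times it occurs,
# i.e. "number of events in a window" -> "how many windows had that many"
windows_by_event_count = Counter(window_counts)

for events, windows in sorted(windows_by_event_count.items()):
    print(events, windows)
```

I've kept the explicit loop in the post because it makes the counting obvious, but Counter saves a few lines if you prefer it.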

I've put the code from this post on GitHub, at https://github.com/rkday/pandas-clustering-analysis.