Intermediate Vim: Macros

Posted on Sun 24 April 2016

(I previously posted on another Vim feature, sessions, which you can find here).

Macros have been the bane of my life in Vim for a little while. I didn't really know what they did, so I never used them, but sometimes I'd mistype :q as q and Vim would print 'recording' at the bottom and I'd panic slightly. (If you didn't know that that 'recording' message was part of the macro feature, well, now you do!).

So, when I decided to learn some more advanced features of Vim, macros were one of the things I picked. As it turns out, they're actually pretty easy:

  • hit q and another character to start recording to a buffer identified by that character (I normally use qq for ease, but if I wanted multiple macros, I could store one with qq, one with qw, etc.)
  • type the sequence of commands you want to record - starting or finishing it with commands like 0 or j (start of line/move down a line) is good, so that after your macro runs, you're ready to run it again on the next line
  • type q again to stop recording
  • now type @ and the name of your macro buffer (e.g. @q if you started with qq, @w if you started with qw) to replay your macro
  • if you want to re-run the last-used macro, @@ does that

I recently wanted to add C++ const specifiers to a lot of function definitions, which was more fun to automate with a macro than do by hand - qqA const<Esc>q defined a macro that appended ' const' to the end of the current line, and I could just hit @@ on any line where I wanted that to happen.
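
If you want to run a macro over lots of lines in one go, two standard tricks help (shown here with the q register from the example above): prefix the replay with a count, or use :normal over a range such as a visual selection to run it once per line:

5@q
:'<,'>normal @q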


Is my load evenly distributed?

Posted on Sat 16 April 2016

I recently had a log file full of timestamped data, and I needed to analyse it to see how evenly distributed it was. Turns out this is surprisingly easy with standard GNU command-line tools.

The data looked like this:

15-04-2016 11:01:02.897 UTC 200 GET /foobar 0.000048 seconds
15-04-2016 11:01:02.905 UTC 200 GET /foobar 0.000492 seconds
15-04-2016 11:01:03.031 UTC 200 GET /foobar 0.000032 seconds
15-04-2016 11:01:03.034 UTC 200 PUT /foobar 0.001252 seconds
15-04-2016 11:01:03.340 UTC 200 GET /foobar 0.000041 seconds
15-04-2016 11:01:03.344 UTC 200 GET /foobar 0.000668 seconds
15-04-2016 11:01:03.441 UTC 200 GET /foobar 0.000030 seconds
15-04-2016 11:01:03.446 UTC 200 PUT /foobar 0.002239 seconds
15-04-2016 11:01:03.750 UTC 200 PUT /foobar 0.005282 seconds
15-04-2016 11:01:04.657 UTC 200 GET /foobar 0.000056 seconds
...

So first, I piped it through cut to extract just the timestamp field:

$ cat ~/logfile.txt | cut -f2 -d " " 
11:01:02.897
11:01:02.905
11:01:03.031
11:01:03.034
11:01:03.340
11:01:03.344
11:01:03.441
11:01:03.446
11:01:03.750
11:01:04.657
...

And then through sed to delete the last two characters, grouping the data into 100ms windows:

$ cat ~/logfile.txt | cut -f2 -d " " | sed "s/..$//"
11:01:02.8
11:01:02.9
11:01:03.0
11:01:03.0
11:01:03.3
11:01:03.3
11:01:03.4
11:01:03.4
11:01:03.7
11:01:04.6
...

I then piped that into uniq -c, which collapses runs of identical adjacent lines and prefixes each with a count of how many times it appeared (no preceding sort is needed here, because the log is already in time order, so identical timestamps are always adjacent):

$ cat ~/logfile.txt | cut -f2 -d " " | sed "s/..$//" | uniq -c
      1 11:01:02.8
      1 11:01:02.9
      2 11:01:03.0
      2 11:01:03.3
      2 11:01:03.4
      1 11:01:03.7
     16 11:01:04.6
      2 11:01:05.2
      2 11:01:05.4
      4 11:01:05.7
      1 11:01:06.0
      6 11:01:06.6
      2 11:01:07.0
      2 11:01:07.1
      2 11:01:07.4
      2 11:01:07.5
      1 11:01:07.8
      3 11:01:08.1
      3 11:01:08.3
      2 11:01:08.6
      2 11:01:08.9

and finally into sort -nr, which showed me the time windows with the highest number of requests:

$ cat ~/logfile.txt | cut -f2 -d " " | sed "s/..$//" | uniq -c | sort -nr
     16 11:01:04.6
      6 11:01:06.6
      4 11:01:05.7
      3 11:01:08.3
      3 11:01:08.1
      2 11:01:08.9
      2 11:01:08.6
      2 11:01:07.5
      2 11:01:07.4
      2 11:01:07.1
      2 11:01:07.0
      2 11:01:05.4
      2 11:01:05.2
      2 11:01:03.4
      2 11:01:03.3
      2 11:01:03.0
      1 11:01:07.8
      1 11:01:06.0
      1 11:01:03.7
      1 11:01:02.9
      1 11:01:02.8

Picking out this data helped me prove that the incoming data wasn't evenly distributed - there were some short bursts of traffic - and so I was able to start figuring out where the load spikes were coming from, rather than debugging my (working) overload control algorithm. Hurrah!
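
As an aside, the cut and sed stages could be collapsed into a single awk call - I haven't run this against the original log, but something like the following should produce the same counts:

$ awk '{ print substr($2, 1, 10) }' ~/logfile.txt | uniq -c | sort -nr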


C++ static analysis

Posted on Sun 10 April 2016

Recently, I've been looking at static analysis for some of my C++ codebases - tools that will warn about possible errors in my code without needing to run the code. I've been looking, in particular, for things that I can add as a build step - that is, cases where I can trust the static analyser to be right and fail the build if it fails, rather than fuzzier checks where you need human judgement to double-check that the analysis makes sense.

I've now got a reasonably good feel for what the available tools are and how to run them - so I'm writing the blog post I'd have liked to read when I started looking at this stuff.

Compiler warnings

If you're interested in static analysis, the first thing you should do is make sure you're making full use of your compiler warnings - with the -Wall flag at least, and maybe -Wextra. On Clang, my preferred compiler, -Wextra includes warnings about signed/unsigned integer comparisons and warnings about unused function parameters.

Switching between Clang and GCC is a good idea to ensure higher quality, as different compilers warn on different things - when compiling some of my codebases with Clang 3.8 (after previously using GCC 4.8), I got new warnings about unused private member variables and using ::abs instead of std::abs. In the other direction, GCC includes -Wsign-compare in -Wall, whereas Clang only has it in -Wextra.
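
Concretely, that just means adding the flags to your compile command (or your build system's CXXFLAGS) - for example, something like the following, where foo.cpp is a stand-in for your own sources and -Werror is what turns warnings into hard build failures:

$ clang++ -Wall -Wextra -Werror -c foo.cpp
$ g++ -Wall -Wextra -Werror -c foo.cpp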

cppcheck

http://cppcheck.sourceforge.net/

cppcheck was the first static analysis tool I tried, and it's relatively easy to get started with - it's in most GNU/Linux distributions' packaging systems, for example.
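
A typical invocation looks something like this (the source directory is a placeholder, and --error-exitcode=1 is what makes it usable as a build-failing step):

$ cppcheck --enable=warning,performance,style --error-exitcode=1 src/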

Running it over my codebase gave me:

  • a warning about using vector.empty() where I meant vector.clear()
  • performance suggestions, like passing const arguments by reference and using ++x not x++
  • warnings about using C-style casts, which are less safe than the C++ static_cast, const_cast and reinterpret_cast
  • warnings about unused local variables
  • warnings about unused functions
  • warnings about classes that included pointers but didn't have copy constructors

I did see instances where it failed to parse modern C++ correctly - for example, I saw it get confused by C++ map initialization syntax - it parsed:

std::map<std::string, uint32_t> {{"FOO" + std::to_string(x), 1}};

and reported "(warning) Redundant code: Found a statement that begins with string constant."

This seems to have been fixed between 1.61 (the version on Ubuntu Trusty) and 1.70 (the version on Fedora), though.

The plugin architecture looks quite friendly - you can add new rules just by writing an XML file with a regular expression, whereas adding checks to Clang-based tools requires writing code.
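
I haven't written one of these rules myself, but from the cppcheck rule documentation the shape is roughly as below (it needs a cppcheck build with rule/PCRE support, and you pass the file with --rule-file). This hypothetical rule flags calls to strcpy:

<?xml version="1.0"?>
<rule>
  <tokenlist>normal</tokenlist>
  <pattern>strcpy \(</pattern>
  <message>
    <id>noStrcpy</id>
    <severity>style</severity>
    <summary>Prefer strncpy or std::string over strcpy.</summary>
  </message>
</rule>

$ cppcheck --rule-file=no-strcpy.xml src/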

clang-tidy

http://clang.llvm.org/extra/clang-tidy/

clang-tidy is based on Clang, and uses the Clang library to parse code before analysis - this makes it less prone to parsing errors like I saw with cppcheck, because it can correctly parse anything which Clang can correctly compile.

I found it a bit hard to get started with - just running clang-tidy file.cpp complained about being unable to find a compilation database. Fortunately I found http://eli.thegreenplace.net/2014/05/21/compilation-databases-for-clang-based-tools, which explained how this worked and the fix (appending -- and then any compiler arguments I needed).
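
With that sorted, an invocation ends up looking something like this (the checks filter, include path and language standard are placeholders for whatever your project needs):

$ clang-tidy -checks='modernize-*' foo.cpp -- -std=c++11 -Iinclude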

Running it over my codebase pointed out:

  • suggestions of better ways to do things in C++11 (the clang-modernize tool has been integrated into clang-tidy), such as:
      • nullptr (which can avoid bugs where the type system treats NULL as an int)
      • the 'override' annotation (which can avoid bugs where you think you've overridden a virtual function, but have made a typo and so declared a new function)
      • range-based for loops (which are unlikely to avoid bugs, but which make code more readable, as you're not dealing directly with iterators)
  • differences between parameter names in the header file and the source file (a great feature - this mismatch is annoying and difficult to spot manually)

(I'd already run cppcheck over this codebase and fixed those warnings, so this list understates clang-tidy a little - it almost certainly also detects some or all of the things that cppcheck does.)

oclint

http://oclint.org/

Like clang-tidy, this is based on Clang, and needs the same -- argument when invoking it.

I found its checks and warnings to need a bit too much human judgement for my taste - it checks things like variable name length, function complexity and so on. Whereas it's difficult to argue "ah, it doesn't make sense to use nullptr here because...", there is code where a long variable name, or a long function, really is the best way to write it (e.g. because of the inherent complexity of what you're trying to do).

That said, it might be worth integrating a limited set of checks into a build - for example, it's reasonable to say "you should never have 1-character variable names - even loop indexes should be ii or jj for easy searching", and oclint can enforce that.
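
For example, oclint's rule thresholds are set with -rc, so something along these lines should flag very short variable names (a sketch I haven't wired into a build - the exact threshold semantics are worth double-checking against the oclint documentation):

$ oclint -rc SHORT_VARIABLE_NAME=2 foo.cpp -- -std=c++11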

Copy-Paste Detector

http://pmd.sourceforge.net/pmd-4.2.5/cpd.html

CPD finds duplicated sections of code. It worked pretty well, but like oclint this didn't feel like a hard-and-fast check - duplicated code can sometimes be the best option, perhaps because two cases differ in subtle ways that can't be nicely factored out. (As an example, I think I tried too hard to limit duplication when porting epoll support to SIPp - the resulting set of #ifdefs is probably less maintainable than having two separate-but-similar functions would have been.)

This might still be a useful tool to regularly run over a codebase, though - both to maintain awareness of what code is duplicated (so if you make a change, you make it in both places) and to check that the number of duplicated sections isn't growing sharply.
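
The exact invocation depends on how PMD is packaged, but it's roughly along these lines (the jar path is a placeholder, and the 100-token threshold is just a starting point to tune):

$ java -cp pmd.jar net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --language cpp --files src/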

Clang Thread Safety Analysis

http://clang.llvm.org/docs/ThreadSafetyAnalysis.html

I didn't actually test this, but it looks so cool I couldn't leave it out of a static analysis blog post.

This adds a series of annotations to Clang that let you express "this variable is guarded by this mutex", "this mutex must be acquired after that mutex", and so on (although ACQUIRED_BEFORE and ACQUIRED_AFTER aren't implemented yet). Clang can then check these statically at compile time, to verify that your locks really are all taken in the right order, that nothing is accessed without the proper lock, and so on.
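
Here's a minimal sketch of what the annotations look like, written with the raw Clang attributes rather than the convenience macros from the documentation - I haven't compiled this, and the class and member names are mine, so treat it as illustrative:

// A mutex wrapper annotated as a "capability" the analysis can track.
class __attribute__((capability("mutex"))) Mutex
{
public:
  void lock() __attribute__((acquire_capability()));     // taking the lock...
  void unlock() __attribute__((release_capability()));   // ...and releasing it
  // (definitions omitted - these would wrap a real mutex)
};

class Counter
{
public:
  void increment()
  {
    _lock.lock();
    ++_value;       // OK - _lock is held here
    _lock.unlock();
  }

  int unsafe_read()
  {
    return _value;  // -Wthread-safety warns: _value is read without holding _lock
  }

private:
  Mutex _lock;
  int _value __attribute__((guarded_by(_lock)));  // _value may only be touched while holding _lock
};

Compiling with clang++ -Wthread-safety then flags unsafe_read() (and any other unguarded access).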

This was a bit more involved than what I was trying to do, though - I wanted to just analyse my existing codebases cheaply, without having to modify the code to help the analysis. But if I'm writing some complex threaded code from scratch in the future, I might take another look at this tool to help verify its correctness.

(Overall disclaimer on static analysis: obviously you should also be looking for software defects by running the code - for example, through a test suite - but

  • as Dijkstra said, testing can only confirm the presence of bugs, not their absence, and static analysis can help spot issues in cases that your test suite doesn't cover
  • some of the issues static analysis picks up are not just bugs you can find in unit testing - these tools also give performance and readability guidance
  • the earlier you find an issue, the cheaper it is to fix - this is the logic usually used to justify designs and unit tests, but it also applies here, since it's cheaper to fix an issue if you spot it as soon as you write the code, rather than after you also write the unit test that hits the error case)

Clang vs. GCC - code coverage checking

Posted on Sat 02 April 2016

Recently, I've been compiling some of the codebases I work on under Clang 3.8 instead of GCC 4.8, to take advantage of better compiler warnings and faster compilation speed (I've seen a 33% speedup, though I haven't tested this rigorously). Today, I got to grips with llvm-cov-3.8, and checked the coverage of my Clang-compiled test suite - and saw coverage failures, on a test suite I know has 100% coverage when compiled under GCC. Some of the uncovered lines were very odd - a closing brace, for example. What's going on?

The section of code in question looked something like this (deliberately reduced to form a minimal example):

int Foo::xyz()
{
  std::string my_ip = "10.0.0.2";
  int index = 0;
  for (std::vector<std::string>::iterator it = replicas.begin();
                                          it != replicas.end();
                                          ++it, ++index) // Clang reports this line as uncovered...
  {
    if (*it == my_ip)
    {
      break;
    }
  } // ...and also this line. ???

  return 0;
}

I foolishly assumed some kind of Clang/llvm-cov bug, perhaps something to do with using the comma operator in a for loop (as other for loops without the comma operator didn't have this coverage issue), and started trying to create a minimal example so I could report a bug. I didn't have any luck with this - until I tweaked my minimal example so that my_ip matched the first value in the replicas list, at which point the problem reproduced.

Aha! There is a genuine coverage bug here, which Clang spotted and GCC didn't:

  • my test suite always set up the input so that my_ip was the first item in replicas
  • so we always broke out of the loop - we never exited the loop normally - and I think this is what the uncovered } is trying to tell me
  • and we always broke out on the first iteration, so the ++it, ++index code never ran (which is why that line is uncovered)

Good job, Clang!

In GCC's defense, using a C++11 range-based for loop makes you write the code like this:

int Foo::xyz()
{
  std::string my_ip = "10.0.0.2";
  int index = 0;
  for (std::string& it: replicas)
  {
    if (it == my_ip)
    {
      break;
    }
    ++index;
  }

  return 0;
}

and in that case, both GCC and Clang can spot that the ++index; line is uncovered.

So if you want to make your code coverage checking more robust, switch to Clang or use more C++11 features - or both.
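
If you want to try this yourself, the Clang side of the workflow is roughly as follows - a sketch rather than my exact build, with placeholder file names; llvm-cov's gcov subcommand produces .gcov reports in the same format as GCC's gcov:

$ clang++-3.8 --coverage -O0 -g -c foo.cpp -o foo.o
$ clang++-3.8 --coverage foo.o foo_test.o -o foo_test
$ ./foo_test
$ llvm-cov-3.8 gcov foo.cpp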

Update:

notanote on Hacker News pointed out that GCC 4.8 is much older than Clang 3.8 (GCC 4.8.0 was released in March 2013, GCC 4.8.4 - my version - in December 2014, and Clang 3.8 in March 2016), so it would be fairer to compare against a recent GCC. The Ubuntu Toolchain PPA doesn't have the in-progress GCC 6 available for Ubuntu Trusty, just for Xenial - but it does have GCC 5.3. I installed that and checked coverage, and it still doesn't report any uncovered lines on that test. (I've uploaded the .gcov output files for GCC 4.8.4, GCC 5.3 and Clang 3.8, if you're interested.)

GCC 5 is more of a pain to install than Clang 3.8 - it brings in an updated libstdc++ that defaults to a new ABI, causing compatibility issues when deploying to other Ubuntu Trusty systems - which is why I don't use a non-default GCC.


Intermediate Vim: Sessions

Posted on Sun 27 March 2016

I've been trying to improve my skill with Vim recently - I'm definitely nowhere near an expert, but I've been trying to move beyond just the basic "read files/write files/use regexps/install plugins" to understand and use more of Vim's built-in features. I'm planning to write blog posts on a couple of things, including macros, folding and ctags, but the first thing I've dug into and found useful is sessions.

I often have the experience where I'll be deep inside a set of files, then quit vim (perhaps to compile/test some code - which probably indicates that I'm not using :make as much as I should, but that's something for another day). Often, I'll want to get back to where I was - e.g. because my fix didn't work and I want to make a further tweak. Until now, my strategy has been "try to remember that I was at line 113, then run vim +113 file.cpp", but this doesn't always work well (especially if I've split my window or changed to a new file within Vim, so that my command-line history doesn't reflect what files I was working on). Sessions offer a much better way to do this!

The :mksession command creates a Vim session file - a file which preserves the entire state of your editing session (open windows, exact window split, positions within files, etc.). The only thing it doesn't do is save unsaved files, so you need to do that separately. This file defaults to being Session.vim in the current directory, but you can pass an argument to :mksession to change that (and use :mksession! to overwrite an existing file.)
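
In its simplest form, that's a round trip of just two commands (the filename here is a placeholder) - one inside Vim to save the session:

:mksession ~/project_session.vim

and one at the shell to restore it later:

$ vim -S ~/project_session.vim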

I now have the following set up in my .vimrc:

cabbrev Q mksession! ~/default_session.vim \| xa

(which means that typing :Q saves my current session to ~/default_session.vim, then writes all modified files and quits (:xa) - the pipe character | joins the two commands together, escaped as \| so that it isn't interpreted while defining the abbreviation)

and the following Bash alias set up:

alias vims="vim -S ~/default_session.vim"

(which means that running vims restores the last session I saved with :Q)

Hopefully this is useful to others - I think I've covered the main information about Vim sessions, but the full documentation is here.