Archive by Author | Danny Brown

Braces

There are a few things that any seasoned Software Engineer will have had discussions (or arguments) about: Windows vs Linux, Merge vs Rebase and, inevitably, code indentation style.

Just today Rob and I discussed whether we should diverge from the “One True Brace Style” (1TBS) decreed by the Airbnb JavaScript Guide toward the Stroustrup style of indentation. The only difference? Stroustrup does not use a “cuddled else”; instead, the else keyword goes on its own line.
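For anyone unfamiliar with the two conventions, here is the difference in a trivial, illustrative snippet; the function names mean nothing, only the braces matter:

// 1TBS: the else is "cuddled" against the closing brace
if (post.isPublished) {
  renderPost(post);
} else {
  queueForReview(post);
}

// Stroustrup: identical, except the else gets a line of its own
if (post.isPublished) {
  renderPost(post);
}
else {
  queueForReview(post);
}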

Does such a minor difference matter? I would argue it does. If being able to read code in a certain style increases a programmer’s productivity then that is no bad thing. However, this increase in productivity can be easily offset by having to change style when working in different codebases. Consistency is important.

To maintain consistency in the CS Blogs codebase every component would have to be updated. This would mean 100s of lines changing for style, reducing the effectiveness of git blame and muddying the commit history. Even if we were to do this, eslint-config-airbnb was downloaded 399,657 times in the last month and I would wager most of the projects using it are sticking with the suggested 1TBS style. The advantage of having code that looks like the “standard” for an open source project is that it enables potential contributors to get involved that bit more easily.

My theory about code style guidelines is that in a team of n people, n-1 of them will be unhappy with at least part of the guideline. The only person who will be completely happy with the rules is the person who decided upon them. Programming is merely transcribing processes and thoughts into a language a computer can understand; in that sense it is very personal, and everyone is therefore likely to have strong feelings about how those thoughts look on screen.

As with so many things in Software Engineering, in many ways the style you choose doesn’t matter, but sticking to it and enforcing consistency does. This is why I am against changing the CS Blogs codebase even though I agree with Rob that the Stroustrup style is nicer on the eye.

So, what can Rob do in this situation? The first option would be to just keep writing in the 1TBS style until it seems natural (this took me a few days of writing). Alternatively, he could use an automated code formatter to change how his local code looks and then have it automatically changed back to the prescribed style before any commits to version control. Any mistakes by the automated code formatter would be caught by the ESLint commit hook.
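As it happens, ESLint can enforce either convention itself through its brace-style rule, so whichever way a team goes, the choice can be checked automatically rather than by eye. A minimal, illustrative .eslintrc (the rule name and its options are real ESLint settings; the file itself is a sketch, not the CS Blogs configuration):

{
  "extends": "airbnb",
  "rules": {
    // Swapping "1tbs" for "stroustrup" here is the entire difference between the two styles
    "brace-style": ["error", "1tbs", { "allowSingleLine": true }]
  }
}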

Danny

Re-architecting CS Blogs

Where are we now?

As I mentioned in my previous post the current CS Blogs system grew out of a prototype. This meant that the requirements of the system were discovered in parallel with designing and implementing the system, resulting in the slightly weird architecture shown below.

Old CSBlogs Architecture

I say it’s weird because the `web-app` component isn’t really just a web application — it’s also an API server for the Android application (and in theory any other app) and includes all the business logic of the system.

The decision to use MongoDB was born partly out of the desire to be “JavaScript all the way down” and partly out of the desire to be using what was cool at the time. Unfortunately at the time of building the system MongoDB wasn’t supported as a SaaS offering on Microsoft Azure — where CS Blogs is currently hosted — so the database was hosted on MLab, making database calls more expensive in terms of networking time than necessary.

The `feed-aggregator` is a small Node.js application run as an Azure WebJob. It was hacked together in a few days and really only supports certain RSS and ATOM feeds. For example, it works great for ATOM feeds using <description> tags, but not ones which use <content> tags. These oversights were made because the software wasn’t developed against much real data — essentially only my own feed — and because of the homogeneous nature of our users’ blogs: they’re mainly all Blogger or WordPress.com.

Despite the obvious and numerous flaws of the system, it has worked well for the past year or so. However, when I wanted to add the concept of organisations to the system — a way of seeing blogs only written by people at a certain company or university — I found the system to be a hodge-podge of technical debt, to the point where adding new features was going to take longer than developing a good, modular, expandable system. It was time to pay down the technical debt.

Requirements

The first thing to do was to determine which parts of the old system were good — and try to ensure that these positive things didn’t regress in the new system — which things were in need of improvement, and what new features we should add at the same time.

Fortunately CS Blogs does do a number of things well:

  • Short lead time — New posts appear in the system within 15 minutes
  • Good Web App — The front end works well both on desktop and on mobile and is very performant due to its lack of scripts. The work Rob did on the styling makes it a joy to use
  • Good Authentication — Users enjoy being able to use GitHub, Stack Exchange or WordPress to sign in and I enjoy not having to look after their passwords

A few things it could improve on are:

  • Support for a larger range of RSS and ATOM feeds — ATOM support in particular isn’t great in the current system
  • A lot of functionality only works in the web app — Any method which requires authentication, such as signing up to the system, isn’t available through the API
  • Feed aggregation downloads every author’s feed every 15 minutes; this is a lot of data and wouldn’t be economical to scale to hundreds of users
  • Code maintainability is poor due to a complete lack of automated testing and linting

The additional user-facing features I want to implement are:

  • Notifications of new blog posts for CS Blogs applications on Android/iOS
  • Support for the aforementioned organisations feature

Designing a Distributed System

The system you can see in the diagram below was designed with the intention of fulfilling the requirements I outlined above. You’ll notice the use of Amazon Web Services icons, as I have recently switched hosting from Azure to AWS. There are enough reasons for this decision to warrant its own blog post, so I won’t go into detail here.

The new CS Blogs Architecture

In the new system all applications are treated as first class citizens, meaning there is nothing that the web application can do that any other application can’t. This is achieved by having all of the business logic, authentication and database interaction handled by the `api-server` — which is accessible to anything that can make HTTPS requests and handle JSON responses.

This means that the mobile applications will be able to perform actions such as registering a user and editing their details, which they cannot under the current system. Another benefit to the mobile applications that isn’t shown on this diagram is that the `feed-downloader` calls Amazon SNS with information about how many new blog posts it has found every time it runs; this is in turn relayed to the mobile applications in the form of notifications.
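As a rough illustration of that notification hook (the topic name and message shape below are invented; sns.publish is the real aws-sdk call), the end of a feed-downloader run might look something like this:

const AWS = require('aws-sdk');

const sns = new AWS.SNS({ region: 'eu-west-1' });

// Hypothetical helper: newPostCount is tallied earlier in the aggregation run
function notifyNewPosts(newPostCount) {
  if (newPostCount === 0) {
    return Promise.resolve();
  }

  return sns.publish({
    TopicArn: 'arn:aws:sns:eu-west-1:000000000000:new-blog-posts', // placeholder ARN
    Message: JSON.stringify({ newPostCount }),
  }).promise();
}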

Whereas in the old system we used MongoDB, I’ve opted to use PostgreSQL — via the Sequelize Node.js ORM — this time around. Some of the features I want to implement in the future, such as organisations, make more sense as relations rather than as documents in my mind, and the ecosystem of applications for interacting with SQL databases, and in particular PostgreSQL, is much more mature than MongoDB’s.
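A sketch of how the organisations relationship might be modelled in Sequelize (the model and field names here are illustrative rather than the real CS Blogs schema):

const Sequelize = require('sequelize');

const sequelize = new Sequelize(process.env.DATABASE_URL, { dialect: 'postgres' });

// Illustrative models only
const Organisation = sequelize.define('organisation', {
  name: { type: Sequelize.STRING, allowNull: false },
  kind: Sequelize.ENUM('company', 'university'),
});

const Author = sequelize.define('author', {
  displayName: Sequelize.STRING,
  feedUrl: Sequelize.STRING,
});

// Each author belongs to at most one organisation; an organisation has many authors
Organisation.hasMany(Author);
Author.belongsTo(Organisation);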

The `feed-downloader` is portable, but contains an entry point so that it can be used as an infrastructureless AWS Lambda function (and I suppose this entry point would also work for the newly released Azure Functions system). It’s a bit more clever than the old `feed-aggregator` in that it uses If-Modified-Since HTTP requests to only download and parse RSS or ATOM feeds that purport to have changed since the last time an aggregation was run.
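The conditional download needs nothing more than Node’s built-in https module. A minimal sketch, assuming the time of the last successful fetch is stored per feed:

const https = require('https');
const url = require('url');

// Resolves with the feed body, or null if the server reports nothing has changed
function fetchFeedIfModified(feedUrl, lastFetchedAt) {
  return new Promise((resolve, reject) => {
    const { hostname, path } = url.parse(feedUrl);
    const options = {
      hostname,
      path,
      headers: { 'If-Modified-Since': lastFetchedAt.toUTCString() },
    };

    https.get(options, (response) => {
      if (response.statusCode === 304) {
        resolve(null); // 304 Not Modified: skip downloading and parsing this feed
        return;
      }

      let body = '';
      response.on('data', (chunk) => { body += chunk; });
      response.on('end', () => resolve(body));
    }).on('error', reject);
  });
}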

Implementation

The implementation of the `feed-downloader`, `api-server` and `web-app` components follows my guide to writing better quality Node.js applications. Node.js was chosen due to its abundance of good quality libraries, ease of interaction with JSON objects and the author’s familiarity with it in production scenarios.

ES2015 JavaScript features including the module system, string interpolation and destructuring are used throughout to aid readability of the system — therefore Babel is required for transpilation.

Just some of the feed-downloader tests

In order to meet the requirement of good maintainability, the `feed-downloader` was built using the test-driven development methodology and currently has 99% test coverage. These tests use real data — feeds from actual CS Blogs authors — including feeds from Blogger, WordPress.com, WordPress.org, Ghost and Jekyll.

There’s still a lot to be done before the new CS Blogs can be released, so why not hit up the contribution guide and get involved?

Danny

Writing better quality Node.js applications

In February last year I started writing my first Node.js App, csblogs.com, alongside Rob Crocombe. The application has run without too many issues since then, serving around 1100 unique visitors per month. However, because it started out as a prototype we didn’t follow many of the best practices we should have done, and it’s starting to show now that we want to extend the application.

Since writing a more complicated application at Trainline — which provides an API to clients and consumes many Windows Services, RESTful APIs and Redis Caches itself — I’ve realised how important it is to be using good software engineering techniques and tools from the very beginning of development.

Whilst most of the concepts in this post are language independent, the example tools I explain are all geared towards Node.

The Basics

These first few things are obvious, and are things you should be doing in all your projects.

Source Control

Source Control everything, even prototypes. The minuscule amount of disk space a git repository will require, and the few seconds every so often to write a commit message, are nothing compared to the amount of time you will save by being able to revert a change, or check when changes occurred.

I’ve taken to using GitHub’s variant of the git flow pattern, in which branches are deployed to production and only merged into master once they have been tested “in the wild”. This means that whatever code is in master is always certified as working in production and can safely be rolled back to in the event a new branch doesn’t work as intended. I like the WordPress Calypso Branch Naming Scheme to make it easy to understand what is being developed in each branch.

Branches use the following naming conventions:

  • add/{something} — When you are adding a completely new feature
  • update/{something} — When you are iterating on an existing feature
  • fix/{something} — When you are fixing something broken in a feature
  • try/{something} — When you are trying out an idea and want feedback

If you don’t like that naming convention or it doesn’t suit your needs, that’s fine. Choose a naming convention and stick with it. As with so many stylistic choices in Software Engineering it isn’t the style that is important but the uniformity and consistency it brings.

Documentation

There are few things as annoying when developing software as opening a repository to find no information on how to build the project, how to run its tests, or what data and functions the code exposes to its consumers. When developing your code you should try to keep your documentation up to date with at least this information:

  • How to build
  • How to run tests
  • How to deploy
  • Data and functions exposed

Things like how to run your build, test and deployment scripts shouldn’t be changing so often as to be a pain to keep up to date. The data and functions exposed however may change reasonably often, especially if you are iterating whilst developing an API, so in order to make that task easier I suggest you use something like Swagger.io.
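For example, a fragment of a Swagger 2.0 document describing a single endpoint looks something like this (the path and responses are purely illustrative):

{
  "swagger": "2.0",
  "info": { "title": "Illustrative API sketch", "version": "1.0.0" },
  "paths": {
    "/authors/{id}": {
      "get": {
        "summary": "Fetch a single author profile",
        "parameters": [
          { "name": "id", "in": "path", "required": true, "type": "string" }
        ],
        "responses": {
          "200": { "description": "The author profile as JSON" },
          "404": { "description": "No author with that id exists" }
        }
      }
    }
  }
}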

Testing

Testing code is one of the things that, 5 years into learning about developing software, I’m still learning about and still eager to learn more about. Good quality testing can be the difference between a bug rearing its head in production and costing you thousands of pounds and it being caught, thought about, and fixed earlier in the development cycle. Automated testing also means that you can be confident that any changes you make won’t be causing regression bugs.

When writing a new class, I first sketch out the interface — e.g. the public constructors, functions and data that the class will expose.

class Train {
  constructor(name, britishRailCode) {

  }

  getTopSpeed() {

  }

  determineLocation() {

  }
}

Then I spend some time thinking about all the potential edge cases, as well as the ‘happy path’ through each of the functions. For example, what should happen when an invalid BR code is provided to the constructor? What happens if GPS coordinates cannot be determined due to faulty hardware — or, in the case of HTML5 geolocation, a lack of user permission — in the determineLocation() call? What data should I get back from each of these functions? Is there a timeout after which the function should return an error if it hasn’t completed?

Once I have an idea of the expected behaviour of the class I start writing tests, in a Behaviour Driven Development way, using the Chai Assertion Library and Mocha Test Framework. There are many advantages to using BDD; one of my favourites is that you don’t write terse names for your unit tests, you state the expected behaviour in a full sentence — this makes the intention of each test much clearer and means that test code can, in some sense, document the application code. Another great attribute of BDD is that it allows you to think in terms of what you want your code to behave like, rather than implementation details.
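To make that concrete, a couple of behaviour-style tests for the Train class sketched above might read like this (the expected behaviours, and the import path, are invented for illustration):

const { expect } = require('chai');

// Hypothetical import path: adjust to wherever the Train class lives
const Train = require('../src/train');

describe('Train', () => {
  it('throws a TypeError when constructed with an invalid BR code', () => {
    expect(() => new Train('Flying Scotsman', 'not-a-br-code')).to.throw(TypeError);
  });

  it('reports its top speed as a positive number', () => {
    const train = new Train('Flying Scotsman', '60103');
    expect(train.getTopSpeed()).to.be.a('number').and.to.be.above(0);
  });
});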

Test Coverage

Test coverage is a highly discredited method of determining the quality of a set of unit tests. Covering every line of code doesn’t mean you have thought of every conceivable edge case and therefore doesn’t guarantee your code is free of defects — indeed no testing can, as testing only shows the existence of bugs, not their absence. However, at minimum you should be covering every line of code — and every branch. (Yes, you can have a branch of code that doesn’t include any lines — I leave working out how as an exercise for the reader)

Linting

JavaScript allows you to make decisions on many areas of syntax. To use semi-colons, or not to use semi-colons — that is just one of the questions. When writing code in a team it’s easier if all of the code is formatted in the same way, so you don’t waste time reformatting it to the way you prefer code to be written. One way of doing this is to have everyone memorise your project’s coding conventions and hope they stick to them; a much better way is to use a linter which will warn the programmer if they break any of the project’s rules — this ensures that everybody writes in the same style.

An additional benefit of linting is that, depending on which linter you use, you get static analysis of your code provided to you too. This means the linter can point out any variables you have defined, but haven’t later used, for example.

I personally prefer to use ESLint as it allows different rules to be configured through the use of plugins. This means that you can have a set of rules for React JSX code and a different, more suitable, set of rules for your Mocha tests. For the bulk of my application code I use the official Airbnb style guide ESLint plugin — I like Airbnb’s focus on using modern JavaScript constructs and having code be as explicit as possible. They also provide lint rules for React code.
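One convenient way to get those per-directory rules is ESLint’s cascading configuration: a second .eslintrc placed in the test directory layers extra settings on top of the root one. A minimal sketch:

{
  // test/.eslintrc, picked up automatically for files under test/
  "env": {
    "mocha": true
  },
  "rules": {
    // Chai's expression-style assertions (expect(x).to.be.true) trip this rule
    "no-unused-expressions": "off"
  }
}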

Commit Hooks

Linting and testing are all well and good, but you need the members of your team to buy in to using them. And, even on a one-man team, I often forget to run tests and linting before I commit code, resulting in broken builds and ugly code being in the master branch of my repository.

 

Pre-commit hooks to the rescue!

Commit hooks can be used to ensure that your unit tests and linter are always run before a developer can commit their code. This means they can’t forget to lint or unit test, and it actually saves time in the long run. I use the pre-commit package to provide this functionality. In combination with good unit tests and linting, pre-commit hooks can help ensure that the code in your repository is always working and readable. (Note: a developer can decide to skip hooks if they’re in a rush to develop a hot fix, but this should be avoided under normal working conditions)
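With the pre-commit package this amounts to a list of npm script names in package.json. A minimal sketch (the script commands are illustrative):

{
  "scripts": {
    "lint": "eslint .",
    "test": "mocha"
  },
  "pre-commit": [
    "lint",
    "test"
  ]
}

The escape hatch mentioned above is git commit --no-verify, which bypasses the hooks entirely.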

Configuration

Configuration is an important part of any application. It can be as simple as wanting to change which database you connect to on your local machine versus in production. However, you don’t want this information to get into the public domain!

I used to use JSON files for configuration. However, these could easily be accidentally committed to git and make the secrets they contain public knowledge. Recently I’ve opted to use environment variables for all the reasons outlined by Twelve Factor. The dotenv module for Node.js makes environment variables easy to change in development. In the CS Blogs applications I’ve been writing I provide a sample.env file which lists all the environment variables developers should set to get the app working in their local environment.
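A sketch of the pattern (the variable names are illustrative, not the real CS Blogs configuration). The sample.env checked into the repository documents what needs setting:

# sample.env: copy to .env and fill in real values; .env itself stays out of git
DATABASE_URL=postgres://user:password@localhost:5432/csblogs
GITHUB_CLIENT_SECRET=change-me

and the application loads it right at the top of its entry point:

require('dotenv').config(); // effectively a no-op in production, where real environment variables are already set

const databaseUrl = process.env.DATABASE_URL;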

So, there it is. A quick run-down of some very basic steps to give yourself a nice place to work in JavaScript world. Now get developing!

Danny

Graduating from York

On January 23rd I graduated with an MSc in Advanced Computer Science from The University of York. It was a nice occasion to get together and be merry with the family, and of course for some photos in my cap and gown — this time they were grey and blue.

My parents took my degree home with them and hung it up next to my undergraduate degree and departmental award. I think they look nice together.

For some more nice pictures you can check out my dad’s blog post about it.

Danny

HackTrain 2.0

It seems like a lifetime ago, but the first HackTrain event was actually only 9 months ago — it probably feels so long ago to me because so much has changed in the interim, including me getting a job as a result of the first event.

When an opportunity came up to represent Trainline at the second HackTrain I jumped at it, as I thought it would be nice to go full circle in a sense, going back with a job in the industry, and simply because I loved the friendly competitive atmosphere of the hackathon.

HackTrain 2.0 was a much larger event than the original, with 3 trains of hackers rather than 1. I was assigned to the Big Data Train, in which participants would focus on using the APIs made available by the rail industry.

I originally thought I would be mentoring teams and helping them get set up with any RailTech APIs I knew about; however, I actually ended up participating in a team consisting of myself, two Computer Science students and an engineer from French national rail operator SNCF. We worked together to build a web application called TrainGuard.

Desktop view of TrainGuard

TrainGuard’s tagline is “Calmer, Drier Journeys”. This is because the aim of the application, which is written in Node.js, is to allow customers to make more informed choices about their journeys. When using travel booking services that are currently available you can see what time you’ll depart from your origin and what time you’re expected to get to your destination, but nothing is said about what will happen in between.

When we were conducting market research on our South West Trains service to Bournemouth, a lot of people told us they disliked being on trains which were full of football fans, or loud people coming back from a gig. The small sample we took rated those experiences as being as bad as, or worse than, being on a delayed train, and said that to avoid such circumstances they would be willing to reschedule their booking. Weather factored into people’s travel decisions less, but was still important.

Mobile View of TrainGuard

On match days, or other particularly busy periods of rail travel, the Train Operating Companies (TOCs) would much rather that their load of passengers was more evenly distributed throughout the day’s services. This is because each minute a train is delayed on the British Rail Network the train operating company gets fined, in some cases hundreds or thousands of pounds per minute — depending on where the tardiness occurs and how many other train services are affected by it.

The idea therefore occurred to me that we could aid both the traveller and the TOCs by allowing passengers who wish to avoid such events a chance to reschedule their journey and change their tickets. Initially we started out with the idea of making this an API that ticket distributors, such as Trainline, could tap into — however, we realised that at a hackathon client facing applications are more likely to do well.

As a team we implemented a subsystem that could connect to the SilverRail travel planning API, which had been specially opened up to the public for HackTrain 2.0. This subsystem would accept a journey plan in the form of an origin station code, a destination station code and a departure time, and return a list of stations comprising the complete route. Using this information we then checked Eventful, which is itself an aggregation of many other event websites, for events within a given distance of each station en route; using specific filters, we could find only the events that we felt could cause disruption to the end user.

In a perfect world this information would be shown to the user at or before the time of booking, or, if circumstances changed, they could be alerted via a notification on their phone — however, for our prototype they would have to visit a webpage hosted by us. The webpage was written using semantic HTML5 and styled using CSS3; it was designed to be responsive and, as you can see above, it works well on both desktop and mobile. Mobile is especially important of course, as the act of travelling means you’re most likely to be on a phone rather than a desktop when you check this page.

When designing the page I wanted to make it beautiful to look at, but also really easy to grok what the page was trying to tell you — therefore I went with a design that included Bootstrap Jumbotrons with parallax-scrolling, high-quality image backgrounds.

As with all Hackathons the most nerve-racking, but equally exciting, part of the weekend was the pitching at the end. We felt we had a good product that was mutually beneficial for both the industry and its customers but we had to convince 2 panels of judges of that. As I mentioned before, the event this time round took place on 3 different trains, so in order to get to the final we had to have one of the top 3 pitches out of the 10 teams on our train.

Excitingly we did make it to the final, in which we pitched to judges from the rail industry. There were judges from SilverRail, Trainline, Great Western Trains and The Ministry of Transport. Unfortunately we didn’t come in the top 3 teams at the end, but we had a lot of fun over the weekend and were proud that we got to the final and were given an opportunity to tell the industry how we felt they could deliver Smarter Journeys — which is something Trainline is currently focused on.

Thanks to everyone at HackPartners who made the event possible, and all of the sponsors of the event for making it so fun and giving 120 of the best hackers a chance to develop much needed improvements for the Rail industry.

Danny

Centre for Computing History

A BBC Micro at the Centre for Computing History

Since September, Charlotte and I have lived about a minute’s walk from the UK Computer Museum (which also seems to go by the name Centre for Computing History), but we’d never actually gone inside.

Today, after my flying lesson, we went to have a quick look. Whilst it made me feel old to see things I remember from childhood in a museum, such as a PlayStation, a Dreamcast and an Acorn PC, I really enjoyed the hour or so we spent there.

There were a few exhibits, including: the history of ARM, military computers and the history of Sinclair. The best thing about the museum was that you were encouraged to touch, and play with, any machine that was turned on. I hadn’t played on a BBC Micro before, but had heard a lot about them from lecturers and other techies from the era they were made. It was fun to figure out the classic GOTO 10 infinite loop, which you can see in the image above I put to good use informing the public of Charlotte’s lameness.

One of the nice things the Museum has is a classroom to get children interested in Computing. It’s the kind of thing I know I would have enjoyed as a kid.

A display at the Centre for Computing History

I recommend that anyone who reads this blog goes to the museum if they’re in the Cambridge area; you’d probably enjoy it.

Danny

Tracking Moving Trains (with the minimum possible effort)

Trains are, in many respects, wonderful machines. They’re simultaneously both the most romantic form of travel and the most gruelling way to get to work every morning. One thing trains are not, however, is up to date with modern technology.

Whilst you can follow the course, in real time, of a plane soaring at 36,000 feet above the ground all the way from San Francisco to London, you may be surprised that Train Operating Companies (TOCs) lack the ability to pinpoint the locations of their trains between stations. In other words, they may know that your train is between Cambridge and Foxton but they won’t know exactly where.

Not knowing exactly where trains are doesn’t cause any safety issues — Block Signalling has been used to safely manage the passage of trains by identifying which “block” of track they are in, without an exact location, since the 1850s — however, it does mean that the decisions that control rooms make are based on educated guesses of a train’s exact position at any given time.

Control rooms are the brains of the modern railway and make the decisions when things go wrong. If you’ve ever been told a train cannot run due to a “lack of train crew”, it’s unlikely the driver was off sick; it’s much more likely that a delay on a train they were driving earlier cascaded down to your service and they simply never arrived at your station to drive the train. Control rooms try to avoid this — and many other issues.

Our earlier examples of Cambridge and Foxton aren’t exactly a million miles apart, and on a normal service you would expect a train to travel between them in around 5 minutes — but as the railway gets more and more congested, making real-time decisions based on a time resolution of minutes will result in more and more decisions that cause delays and cancellations.

After having learnt about this situation, whilst onboard HackTrain 2.0, I decided to have a go at resolving it in my own time.

One constraint of working with technology for use on trains is that, for all intents and purposes, you cannot add hardware to trains. The Rail Industry in the UK is, thankfully, obsessed with safety and that means that anything added to a train has to go through rigorous Health & Safety checks. This means it can take months, or years, for anything to be installed trackside or in a vehicle.

Whilst staff who work for South West Trains (who explained the situation to HackTrain competitors) are all issued with BlackBerry devices, some companies have started issuing their Guards and Conductors with relatively modern Android devices, some of which even act as ticket printing machines for on-the-train ticket purchases.

Armed with this knowledge I decided to try and validate my idea that a train could be tracked using just the smartphone device a member of train crew is already carrying. As I didn’t know if the idea would work in reality I set about making a working prototype with the minimum possible effort.

My acceptance criterion was pretty simple: be able to view, in real time, the position of a train travelling from Cambridge to Kings Cross. This wasn’t exactly the most measurable criterion (what does “real time” mean, for example?) but it gave me something to work towards.

In just a few hours over two evenings I developed a solution: a server written in Node.js that hosted two pages. `map.html`, using the Google Maps JavaScript API, showed the current position and historic route of all trains being tracked; `report.html` delivered a tiny JavaScript application to a browser which allowed it to report its location — obtained through an HTML5 geolocation watch — to the server via WebSockets.
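Both halves are small enough to sketch here. The event names, port and data shape below are illustrative reconstructions rather than the prototype’s actual source:

// report.html: client-side reporting sketch
const socket = io(); // Socket.io client served by the Node.js server

const headcode = window.prompt('Enter the headcode of this service (e.g. 1A23)');

navigator.geolocation.watchPosition((position) => {
  socket.emit('location-report', {
    headcode,
    latitude: position.coords.latitude,
    longitude: position.coords.longitude,
    reportedAt: Date.now(),
  });
}, (error) => console.error('Geolocation unavailable:', error.message), {
  enableHighAccuracy: true,
});

// server.js: keep every report in memory and relay it to any open map pages
const server = require('http').createServer();
const io = require('socket.io')(server);

const routesByHeadcode = {};

io.on('connection', (socket) => {
  socket.on('location-report', (report) => {
    routesByHeadcode[report.headcode] = routesByHeadcode[report.headcode] || [];
    routesByHeadcode[report.headcode].push(report);
    io.emit('location-update', report); // map.html listens for this and moves the marker
  });
});

server.listen(3000);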

The Prototype JavaScript client on an iPhone

The client and the server were, in the nicest way possible, scrappy. In order to identify a train, the guard (in these feasibility tests that was me) would have to manually enter a headcode into a JavaScript prompt and then leave the browser window open for the entire trip — I didn’t bother developing a background task. Each train’s location data was stored in one big multidimensional array on the server side, with no persistence code. But that was all fine; these applications were never intended to be production ready, just to prove an idea was workable.

I walked down a few streets with my prototype client open in the Safari browser on my iPhone and it appeared to report its location to the server as intended. So, on one of my final commutes into work before the holiday season, I tested the system on a moving train, on a route that includes tunnels, large areas of poor phone signal and central London, with all the different challenges these environments bring.

The route of my commute from Cambridge to London Kings Cross as tracked by my system

I was pleasantly surprised with the results, shown above, despite a few anomalies.

Because I was using the `watchPosition()` function, the train didn’t report its location at even time intervals, as it would if you were polling the location, but rather only when the train’s position changed significantly. Despite this, the gap between the train’s attempts to send its location whilst on the move was never more than around 10 seconds.

I say attempts because, as expected, there were some issues with mobile coverage. I had expected to have issues in tunnels, but actually only the final tunnel before Kings Cross (which goes under the Grand Union Canal) caused issues. Lack of signal otherwise occurred in a few blackspots in rural Cambridgeshire. This issue could be mitigated by employing a system such as that provided by Nomad Digital, which gives a train an internet connection using multiple SIM cards connected to multiple network providers.

Another issue you can see in the above screenshot is that my train appears to have left the tracks a few times. I can assure you that this didn’t actually happen; it appears to have done so due to a combination of inaccurate GPS readings and failed data transfers caused by the aforementioned network connectivity problems. These errors could be mitigated using a host of techniques: comparing GPS positions to known track locations, or buffering changes in location that seem too great and only accepting them as true if subsequent readings appear to be in the same area.
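The core of that last mitigation is simple enough to sketch. The threshold and data shape below are illustrative; the idea is to flag any reading that implies an impossible speed since the previously accepted one:

// Rough great-circle distance in metres between two readings (haversine formula)
function distanceInMetres(a, b) {
  const toRadians = (degrees) => (degrees * Math.PI) / 180;
  const earthRadiusMetres = 6371000;
  const dLat = toRadians(b.latitude - a.latitude);
  const dLon = toRadians(b.longitude - a.longitude);
  const h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
    + Math.cos(toRadians(a.latitude)) * Math.cos(toRadians(b.latitude))
    * Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * earthRadiusMetres * Math.asin(Math.sqrt(h));
}

const MAX_PLAUSIBLE_SPEED_MPS = 90; // roughly 200mph, a generous ceiling for UK rail

// Append the reading to the route only if the implied speed is believable
function acceptReading(route, reading) {
  const previous = route[route.length - 1];
  if (!previous) {
    route.push(reading);
    return true;
  }
  const elapsedSeconds = Math.max((reading.reportedAt - previous.reportedAt) / 1000, 1);
  const impliedSpeed = distanceInMetres(previous, reading) / elapsedSeconds;
  if (impliedSpeed > MAX_PLAUSIBLE_SPEED_MPS) {
    return false; // probable GPS glitch; a fuller version would buffer it for corroboration
  }
  route.push(reading);
  return true;
}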

This was a pretty interesting project to work on. It gave me an opportunity to develop a simple solution to a real-life problem in a short period of time and learn about the awesome Socket.io library. I was happy to see it work so well, and to be wrong about tunnels. I look forward to making some of the improvements I’ve mentioned in this post, making the UI prettier and making the data storage a little more resilient.

Danny
