Categories
University

Languages and Their Compilers ACW Result – 80%

You may have noticed that my blog hasn’t been updated nearly as much this semester as it has been in those last year and the year before. One of the reasons for this has been the Languages and Compilers assessed coursework, which has taken up quite a bit of time.

The aim of the coursework was to build a compiler, a bit of software which converts from a “high level” human readable language to a “low level” language a computer can execute, for a completely made up language called SPL – which stands for Simple Programming Language.

Formalizing the Grammar of the Language

SPL If Then Else Syntax Diagram
SPL If Then Else Syntax Diagram

The first thing we had to do was formally specify the grammar of the language — the grammar of a language specifies what is, and isn’t, valid in said language — we did this by reading some syntax diagrams provided to us by our lecturer, such as the one above, and then writing out Backus-Naur form notation to describe it. The image above in Backus-Naur form is notated as:

<if_statement> ::= IF <conditional> THEN <statement_list> ENDIF |
                   IF <conditional> THEN <statement_list> ELSE <statement_list> ENDIF

Items in <brackets> are other rules, so there would be another rule which describes the syntax of “conditional” and “statement list”. Using multiple rules together allows us to build up a complete syntax for the language.

The Tokenizer

Once we had developed a complete BNF grammar for the language we could move on to building the compiler itself, this was especially interesting for me because my final year project uses a lot of the same concepts.

The first stage of building the compiler was building a tokenizer. A tokenizer, as I’ve spoken about before, is a system which recognizes certain keywords, or patterns, and gives them a label which can then be used later on in the compilation. For example the C# code:

public static void main()
{
     int a = 10 / 2;
}

Consists of the following tokens:

  • Public Keyword
  • Static Keyword
  • Void Keywork
  • Identifier Pattern main
  • Open Bracket (sometimes referred to as BRA)
  • Close Bracket (sometimes referred to as KET)
  • Open Curly Bracket
  • int keyword
  • Identifier Pattern a
  • Symbol =
  • Integer Pattern
  • Symbol /
  • Integer Pattern
  • Semi Colon
  • Close Curly Backet

We used an open source version of the Unix program lex, called flex, to build our tokenizer (in my FYP I’m building a tokenizer from scratch). In our flex file we recognized tokens using regular expressions. Matching something like a keyword is rather simple, you just look for that literal string, however looking for an integer is a tad more interesting. Below you can see a few examples:

/* Primitives */
digit                  [0-9]
number                 {digit}+
/* End: Primitives*/

//Public keyword
public                 TOKEN(public);

//Integers (e.g. whole numbers)
{number}               TOKEN(integer);

//Reals (e.g. Numbers with decimal point)
{number}.{number}      TOKEN(float);

On the left hand side of the token recognition is the pattern that has to be matched. For the word public it is the literal string public, for integers it is a {number} and for a real it is a {number} a full stop and then another {number}. {number} has been declared previously to be 1 or more digits, and a digit has been declared before that to be any number between 0 and 9.

The tokenizer takes a .spl file, which contains spl code written by a programmer, and spits out an array of tokens which we can then use in the next stage.

The Syntax Parser

Once we’ve finished building our tokenizer we could then go on to building a syntax parser. The job of the syntax parser is to ensure that the array of tokens provided by the tokenizer are in an order which makes sense. To produce the syntax parser we used a free program called Bison, which is based on the Unix program YACC.

This is where we go back to the Backus-Naur Form we made previously. Using that, with some alteration, we produced a .y file which, once passed through Bison, could identify correct grammars in the list of tokens passed to it — and tell the programmer there was an issue with their code if anything was detected as incorrect.

Correct grammars were added to a parse tree, in hierarchical order. As we saw earlier, an IF statement contains a conditional, so that became one of the branch nodes of the tree which represented an IF statement.

At this point we could print out the parse tree, my software could print both to the console and to an interactive animated viewer. Below you can see both:

a.spl Parse Tree as shown in the Command Prompt
a.spl Parse Tree as shown in the Command Prompt
a.spl Parse Tree as Interactive Animated Viewer (Click to enlarge)
a.spl Parse Tree as Interactive Animated Viewer (Click to enlarge)

Code Generation

Once we have determined that the input file from the programer is syntactically correct we need to generate executable code. For this coursework instead of generating an executable directly we generated ANSI C code, and passed that to the GNU C Compiler. We did this for a few reasons:

  • Generating machine code directly wouldn’t be as portable
  • Generating machine code would take a lot longer!
  • Learning ANSI C was really useful, as its the number 1 most used programming language in the world

Generating code was a simple case of walking the parse tree and generating code for each node… for example, if you reach an IF THEN ELSE node you could simply output the following:

if(/*Generate code for the CONDITIONAL here*/)
{
    /*Generate code for the IF statement list here*/
}
else
{
    /*Generate code for the ELSE statement list here*/
}

Ensuring you generate code for the conditional and statement lists in the correct sequence. Certain types of grammar throw up a few problems, such as FOR LOOPS (which way should the iteration go?) but the concept of code generation from a tree in itself is simple.

Optimization

The final part of the coursework was to add some optimization to the compiler. Some optimizations you could implement included:

  • Don’t compile the code if it doesn’t contain and input or output (if a program has no input or output, it looks the same as having no program at all!)
  • If a programmer increments a variable by 1 generate the ++ operator rather than +1
  • Constant Folding: If a variable is assigned to the outcome of a constant set of values. e.g.  a = 4/2, then generate a = 2 because this will always be the case

I chose to do both the incrementation optimization and the no input or output optimizations. I was going to add constant folding too, but didn’t think I had enough time.

Conclusion

Today I got my grade for the coursework and achieved 80%, which is a strong first class. I’m really happy with this. 🙂

I would recommend that any second year thinking about what modules he or she should take next year seriously consider Languages and Compilers, it is by far the best module I have taken. It combines knowledge from all your previous modules and really cements that information in your head.

I’d like to thank both Dr. Martin Walker and Eur Ing Brian Tompsett for setting such an interesting coursework, and providing me with the help that they did along the way.

If I’m allowed I might upload my compiler and some documentation for SPL for people to have a play with it.

Danny

Categories
University

Advanced Programming Coursework Result – 100%

Today I got back the result for my Advanced Programming Coursework.

The coursework was called “Data Structure Analysis”, and the aim was to design and develop data structures in C++, a Linked List and a C style array, and then compare them using a hardware timer to see how they compared in read from file, write to file, search and sort operations.

Once we had created both structures, making sure we kept within a coding style tested using the Parasoft Code Reviewer software and ensuring we didn’t have any memory leaks by properly implementing the gang of three, we then wrote up a 2000 word report detailing how our various algorithms worked.

I’m really happy and proud to announce I achieved a grade of 100% for my work! :D. This means that, subject to the normal exam board review, I have achieved a grade of 88.24% for the module overall! Good times.

Danny

Categories
Mobile Application Development University

An update on the Progress of Sweepy Cleaner

I posted this video to Youtube today which details the advancements made with my Sweepy Cleaner XNA game project for my Assessed coursework 2 for the Programming 2 Module. As I state in the video its mainly for the benefit of my friend Shane, who has very kindly agreed to make some banging 8 bit tunes for the Marketplace release of the game, but it does give an overview of the game as a whole for anyone who is interested (you also get to hear my very weird voice).

Danny

Categories
University

A Busy Tuesday

(My Desk — Where I Spent Most of My Day Yesterday)

Yesterday was so busy I didn’t manage to fit any blogging in, so I shall do it now!

My day started around noon when I had finished taking advantage of my one day without a 9:15 start at which point I made Pizza bread for brunch, I’ve become reasonable at cooking since coming to uni — especially after cooking each day Friday – Monday this weekend with Jess.

Pizza Bread - Its basically toast with BBQ sauce and Cheese on, I'm a Culinary genius.
Pizza Bread - Its basically toast with BBQ sauce and Cheese on, I'm a Culinary genius.

Shortly after this I paid for my accommodation online, £1180.62 well spent I think, I have enjoyed the independence of living on my own in my own place. I then filled out my student finance forms asking them to remove my Maintenance loan (which I didn’t want in the first place).

I spent much of the rest of the day doing my Quantitive Methods for Computing Coursework, writing up lecture notes and reading one of my new books – “William Stallings Computer Systems – Internals and Design Principles” – which as you can imagine is… exciting stuff. Fortunately I only have another 380 pages to read 😛 Speaking of books my book shelf is now actually starting to look full and student like.

My Bookshelf
My Bookshelf

From left to write I have

  1. A folder of University and Student Finance Documents
  2. A folder of Misc. Documents I felt I would need
  3. A reem of paper
  4. University of Hull Computer Science Department Student Handbook (full of useful stuff)
  5. Computer Systems Architecture by R. Newman, E. Gaura and D. Hibbs
  6. Computer Organization & Architecture  – Designing for Performance by William Stallings
  7. Code Complete 2 by McConnell
  8. Operating Systems Internals and Design Principles by William Stallings
  9. University of Hull Student Pages (essentially a planner)
  10. Vernon GODLittle by DCB Pierre (we got this for free on registration)
Thats everything about tuesday anyway,
Thanks for Reading
Danny.