ColourCode: Creating Language Handlers

This guide will get you started on creating language handlers for ColourCode.

Getting Started

All language handlers are stored in the data/languages directory under the root ColourCode directory. Name your file <language extension>.rb, where <language extension> is the most common file extension for the language. For example, if the language is Lua, lua is the preferred extension, so your file is named lua.rb.

Furthermore, all handlers are expected to be subclasses of the Language class (data/languages/language.rb).

You should also put a comment at the top of the file with the language name, along with the following code, which gives you access to the Language class:

require 'utilities'
require LANGUAGE_DIR + 'language'

If your language extends another language (as C++ extends C), then you should extend that language's class and change the require to point to your base language's file.
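
As a rough sketch of that arrangement, a C++ handler might reuse the C handler's lists and add to them. The class name LanguageC and the keyword lists are assumptions for illustration, and the two stub classes only stand in for the real files so that the sketch runs on its own. KEYWORDS is one of the methods described under "Methods to be defined".

```ruby
# Stand-ins for data/languages/language.rb and data/languages/c.rb.
# In a real handler file these would come from:
#   require 'utilities'
#   require LANGUAGE_DIR + 'c'
class Language
end

class LanguageC < Language          # assumed class name for the C handler
  def KEYWORDS
    %w[if else while for return]    # abbreviated, illustrative list
  end
end

# C++
class LanguageCpp < LanguageC
  def KEYWORDS
    # Reuse the base language's keywords and add some C++-only ones.
    super() + %w[class namespace template]
  end
end
```

Calling super() keeps the derived handler in step with any later changes to the base handler's list.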

The handler class

The language handler should preferably be named Language<language name>.

Loading your class

ColourCode expects a global function language_init to be defined, which returns the class (and not an instance of the class). For example, if your handler is called LanguageLua, then your function would be:

def language_init
    return LanguageLua
end
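
To make the class-versus-instance distinction concrete, here is a small self-contained sketch. The stub Language class and the no-argument constructor are assumptions made so the snippet runs on its own; how ColourCode actually instantiates the returned class is not covered in this guide.

```ruby
# Stand-in for data/languages/language.rb so the sketch runs on its own.
class Language
end

# Lua
class LanguageLua < Language
end

def language_init
  LanguageLua                       # the class itself, not LanguageLua.new
end

handler_class = language_init
puts handler_class.is_a?(Class)     # prints "true"

# The caller decides when to create the instance (assumed no-arg constructor).
handler = handler_class.new
puts handler.is_a?(Language)        # prints "true"
```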

Methods to be defined

Your class should define the following methods (best to copy the skeleton), all of which take no arguments.

class LanguageLua < Language
    def KEYWORDS
        [<a list of keywords in your language>]
    end

    def DATATYPES
        [<a list of datatypes in your language>]
    end

    def OBJECTS
        [<a list of objects (if any) in your language>]
    end

    def BUILTINS
        [<a list of builtins (if any) in your language>]
    end

    def COMMENT_SINGLE_BEGIN
        "< the single line comment character for your language >"
    end

    def STRING
        # a list of characters that delimit strings, the default should be good enough 
        ['"', "'"]
    end
end
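
For comparison, here is what a filled-in skeleton might look like for a language whose single line comment marker is one character. The shell handler below (which would live in sh.rb) is a hypothetical example, its word lists are deliberately short, and the stub Language class only stands in for the real one so that the snippet runs on its own.

```ruby
# Stand-in for data/languages/language.rb so the sketch runs on its own.
class Language
end

# Shell (hypothetical handler, abbreviated lists)
class LanguageSh < Language
  def KEYWORDS
    %w[if then else elif fi for while until do done case esac in function]
  end

  def DATATYPES
    []                              # shell has no declared datatypes
  end

  def OBJECTS
    []
  end

  def BUILTINS
    %w[echo cd export read exit]
  end

  def COMMENT_SINGLE_BEGIN
    '#'
  end

  def STRING
    ['"', "'"]
  end
end

def language_init
  LanguageSh
end
```

Because these method names begin with a capital letter, Ruby reads a bare KEYWORDS inside the class as a constant lookup. Call them with parentheses or an explicit receiver, for example KEYWORDS() or self.KEYWORDS.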

Comments

If your language has a syntax for multi-line comments, it is your responsibility to handle them. In addition, if your language's single-line comment marker (defined in COMMENT_SINGLE_BEGIN) consists of more than one character, the parser will not handle it; that is also your responsibility. The Language class calls two optional methods of your class so that you can handle such special cases and give your language more control over parsing. These are described below.

Handling tokens directly

If your language has special syntactic structures or comments, you can implement two methods which Language will call before it decides how to act on a token. But first, a look at the member variables that Language provides:

@formatter - a reference to the formatter in use
@inString - true if the parser is in a string
@inComment - true if the parser is in a comment
@escapeChar - true if the previous character was the escape character '\'
@singleComment - true if the current comment is a single-line comment

subHandleAlphaNum and subHandlePunct

The Language class will call these two methods before it handles tokens by itself. If you define them, you can use them to handle unique aspects of your language which cannot be handled by the generic handlers. subHandleAlphaNum takes one argument, the complete alphanumeric token. subHandlePunct is passed every punctuation character individually, so it's your job to keep state in the object if you need to track sequences of characters. The best example of this is the C language file (data/languages/c.rb).

The return values of both functions are very important. Each should return a list of two elements: the first a boolean, the second the data string being returned. If the first element is true, Language will take your return data, return it to the Parser and move on to the next token, performing no handling of its own. If the first element is false, Language will take your return data, append it to its own return data, and continue with its usual handling. For subHandlePunct, a true first element skips only that single character.
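
The return convention and the need to keep state can be sketched as follows. This is not taken from c.rb: the class name LanguageCSketch, the @sawHash variable and the choice to claim the word after a '#' (so that the if in #if is not treated as an ordinary keyword) are all assumptions for illustration. The stub Language class provides only the member variables the sketch reads.

```ruby
# Stand-in for data/languages/language.rb, reduced to what the sketch reads.
class Language
  def initialize
    @inString = false
    @inComment = false
  end
end

# Hypothetical handler illustrating the two hooks.
class LanguageCSketch < Language
  def initialize
    super
    @sawHash = false      # our own state: was the last punctuation a '#'?
  end

  def subHandlePunct(char)
    # Remember a '#' only when it appears outside strings and comments.
    @sawHash = (char == '#' && !@inString && !@inComment)
    # false: nothing to add, let Language handle the character as usual.
    [false, '']
  end

  def subHandleAlphaNum(token)
    if @sawHash
      @sawHash = false
      # true: Language returns this data and does no handling of its own.
      return [true, token]
    end
    [false, '']
  end
end

h = LanguageCSketch.new
p h.subHandlePunct('#')       # prints [false, ""]
p h.subHandleAlphaNum('if')   # prints [true, "if"]
p h.subHandleAlphaNum('if')   # prints [false, ""]
```

A real handler would probably mark up the claimed token through @formatter before returning it; the formatter's interface is outside the scope of this guide, so the sketch returns the token unchanged.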

Distributing handlers

If you want to contribute your handlers to ColourCode, drop me the file at nsm.nikhil @t gmail.com.