How To - CYK Algorithm

September 24, 2014

In this quick post I try to explain the basics of the Cock-Younger-Kasami (CYK) Algorithm to you.
This shouldn't take more than 15min for you to understand,
since I am terrible at logic and math related studies - if I could do it so can you :D

The CYK is a theoretical construct that can be used to find context free languages by feeding grammar and its production rules into a table and see wether a word from the language you wish to identify, matches up with these rules.
To make this process run smooth, the production rules must be formatted in such way that it matches Chomsky Normal Form (CNF) standards;

on the left hand side of a production rule stands a non-terminal symbol (a variable)
on the right hand side of a production rule there can be maximal 2 non-terminals or 1 terminal symbol (a symbol that represents a constant value) but never non-terminals and terminals mixed
For my course I worked through an example step by step, we started out with 3 production rules:

As you can see in production 2 and 3, there are non-terminals and terminals all mixed up.
We can't have that ._.

So we will make up other productions, until all the productions are conform with the CNF standards:

Now that we have our grammar, we can try wether if the word we are interested in, fits our productions.
In this example our "word" will be "( ) [ [ ] ]" - yes, these are literally 6 brackets...

The easiest way is to do this with a table.
To make sure that we both are talking about the same cell in our tables, I named rows and collumns with coordinates.

Start filling up the table by finding productions, that would produce the word.
For cell 1/1 that would be C -> ( for 2/1 it would be D -> ) and so on.

For the second row we will need help from the cells in the first row. To fill 1/2 we take a look at cells 1/1 and 2/1
We are only interested in the non-terminals here; so C -> ( and D -> ) become CD and we check in our productions what could produce CD.
We find A -> CD
The same goes as we move on in our row to the right.

For the third row cell 1/3, take information from the cells 1/1 and 2/2

Once we start with the fourth row, you will notice more clear that there is a pattern in the movement of the auxiliary cells (blue), which help us to fill our target cell (red).
As you can see, the left blue box moves down from the top, while the right blue box moves diagonally upwards.

Should you find more than one production applying, write them both into our target cell.

Our goal is to identify, if the word fits into our grammar, so after filling up all our cells the table should look like this.
The final cell 1/6 should not be blank, but hold our production rule S -> AB
The S stands for start and matches with the start of our word.

Yay. This post took me way longer than I needed to draw the table -___-
On midway my app broke, which I use to pull the pictures together, I hope it doesn't mess up the whole concept here.

If this helps you, please leave a comment^^ much appreciated to know wether I am alone here with my blog ._.

Search This Blog

theSciencyStuff

How To - CYK Algorithm

Comments

Post a Comment

Popular posts from this blog

How To - Variance & Standard Deviation

How To - Citric Acid Cycle