CYFA - Creating Your First Assembler - The Processing

So, in an earlier part we wrote all of the structures, enumerations, format declarations, etc. for our language. In a normal dev environment, we would probably use yacc (Yet Another Compiler Compiler), but I wanted to show you the guts of how it all works.

Before we begin
I realized that I made a mistake in an earlier part dealing with the structure for handling Operand2 when using a register. Rather than explain all of the changes, I'll simply paste the new contents of instruction/data.h below. I'll also be pasting this same block into part 6 (if I can still edit it).

[Content hidden in the original post]


So, in this one we're going to start writing the rules, as well as some of the processing. First, a note: we will not accept rules into our parser unless they start with an opcode. This means that we can set up category rules to catch and organize things. Let's go ahead and write 3 basic rules. These will be category rules, meaning that they group our instructions into our 3 different types at the top level. Remember, these types are as follows:
  1. Data Processing
  2. Single Data Transfer
  3. Branch
So, let's go ahead and navigate to language/rules.c. I'll walk you through the first one.
First, we will need to define (and undefine at the end) two macros. These make it possible for us to initialize our list statically. They are as follows:

[Content hidden in the original post]

We'll undefine these after our rules block, so don't worry.
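The two macros themselves are hidden in the original post. As a rough sketch of the idea, here is one way two such macros could work, assuming C99 compound literals: a compound literal at file scope has static storage duration, so a rule can point at a variable-length syntax array while the whole rules list stays a static initializer. The names SYNTAX, CHARS, struct rule, kTOKEN_END, kTOKEN_REGISTER and kTOKEN_CHAR are hypothetical stand-ins, not the thread's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the real ones live in the language headers. */
enum token_kind { kTOKEN_END = 0, kTOKEN_MNEMONIC, kTOKEN_OPCODE,
                  kTOKEN_REGISTER, kTOKEN_CHAR, kTOKEN_EXPRESSION };

/* Hypothetical rule layout: pointers to terminated arrays, so rules can differ in length. */
struct rule {
    const char *name;
    const enum token_kind *syntax; /* terminated by kTOKEN_END */
    const char *chars;             /* literal characters in order, NUL-terminated */
};

/* File-scope compound literals have static storage, so their addresses are
   valid in a static initializer. Each macro appends the terminator itself. */
#define SYNTAX(...) ((const enum token_kind[]){ __VA_ARGS__, kTOKEN_END })
#define CHARS(...)  ((const char[]){ __VA_ARGS__, '\0' })

static const struct rule rules[] = {
    { "CAT_DP_REG",
      SYNTAX(kTOKEN_MNEMONIC, kTOKEN_REGISTER, kTOKEN_CHAR,
             kTOKEN_REGISTER, kTOKEN_CHAR, kTOKEN_REGISTER),
      CHARS(',', ',') },
    { NULL, NULL, NULL } /* null rule marks the end of the list */
};

/* Undefine the helpers once the rules block is done. */
#undef SYNTAX
#undef CHARS

/* Count the tokens in a rule's syntax, not including the terminator. */
static size_t syntax_length(const struct rule *r)
{
    assert(r != NULL && r->syntax != NULL);
    size_t n = 0;
    while (r->syntax[n] != kTOKEN_END)
        n++;
    return n;
}
```

Because the terminator is added by the macro, nobody writing a rule can forget it, which is one reason this style suits a hand-maintained rules list.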

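Next, we need to modify language/constants.h.
I mistakenly told you previously that these should all be const static uint32_t when actually they need to be macros: in C, a const-qualified variable is not a constant expression, so it can't be used in our static rule initializers. Go ahead and replace the contents with this text: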

[Content hidden in the original post]
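To show why the macro version matters, here is a small sketch. It assumes one bit per opcode, with each bit position equal to the ARM data-processing opcode number; OP_BIT, the C_OP_* names, demo_rule and opcode_allowed are hypothetical, and the real contents of language/constants.h are hidden in the original post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: one bit per opcode, bit position = ARM opcode number. */
#define OP_BIT(n)    ((uint32_t)1u << (n))
#define C_OP_AND     OP_BIT(0)
#define C_OP_EOR     OP_BIT(1)
#define C_OP_SUB     OP_BIT(2)
#define C_OP_RSB     OP_BIT(3)
#define C_OP_ADD     OP_BIT(4)
#define C_OP_ADC     OP_BIT(5)
#define C_OP_SBC     OP_BIT(6)
#define C_OP_RSC     OP_BIT(7)
#define C_OP_TST     OP_BIT(8)
#define C_OP_TEQ     OP_BIT(9)
#define C_OP_CMP     OP_BIT(10)
#define C_OP_CMN     OP_BIT(11)
#define C_OP_ORR     OP_BIT(12)
#define C_OP_MOV     OP_BIT(13)
#define C_OP_BIC     OP_BIT(14)
#define C_OP_MVN     OP_BIT(15)
#define C_OPCODES_DP (OP_BIT(16) - 1u) /* bits 0-15: all 16 data-processing opcodes */

/* A static initializer like the ones in the rules list. If C_OPCODES_DP were
   a "static const uint32_t" variable, standard C would reject this line. */
static const struct { const char *name; uint32_t opcodes; } demo_rule = {
    "CAT_DP_REG", C_OPCODES_DP
};

/* Return nonzero if opcode_bit (exactly one bit set) is in the allowed mask. */
static int opcode_allowed(uint32_t allowed, uint32_t opcode_bit)
{
    assert(opcode_bit != 0 && (opcode_bit & (opcode_bit - 1u)) == 0);
    return (allowed & opcode_bit) != 0;
}
```

Some compilers accept const variables in static initializers as an extension, but macros work everywhere.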

Remember our format:

[Content hidden in the original post]

I like to paste this block (in comments) in the file I'm working with when I write this, so that I don't lose track of anything. Don't worry, I'll also comment the fields.
Let's go over the fields real quick:

[Content hidden in the original post]

These are the basic ones, so let's go ahead and set the name and type:

[Content hidden in the original post]

OK. Now, let's break down the syntax into tokens.

[Content hidden in the original post]

Now, we have two tokens here, technically 3. We have

[Content hidden in the original post]

Now, we actually wrote a shorthand for this: the token kTOKEN_MNEMONIC, which can evaluate either to a plain opcode or to an opcode plus a condition. This is the key to category rules. So that structure looks like this:

[Content hidden in the original post]

I apologize for the formatting....it's acting weird.
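To make "an opcode, or an opcode and a condition" concrete, here is a minimal sketch of how a mnemonic such as EORNE could be split apart. The tables are the standard ARM data-processing opcodes and condition codes, indexed by their 4-bit field values. The function name split_mnemonic and its return convention are hypothetical. The sketch only handles uppercase, ignores the S suffix, and skips branches, whose short opcodes (B, BL) need extra care: BLT, for example, is B with condition LT.

```c
#include <assert.h>
#include <string.h>

/* ARM data-processing opcodes, indexed by their 4-bit opcode field. */
static const char *const dp_opcodes[16] = {
    "AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
    "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"
};

/* ARM condition codes, indexed by their 4-bit cond field (HS/LO aliases omitted). */
static const char *const conditions[15] = {
    "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
    "HI", "LS", "GE", "LT", "GT", "LE", "AL"
};

#define COND_AL 14

/* Hypothetical helper: split "EOR" or "EORNE" into opcode and condition numbers.
   Returns 0 on success, -1 if the text is not a plain data-processing mnemonic. */
static int split_mnemonic(const char *text, int *opcode, int *cond)
{
    assert(text != NULL && opcode != NULL && cond != NULL);
    size_t len = strlen(text);
    if (len != 3 && len != 5)
        return -1;
    for (int op = 0; op < 16; op++) {
        if (strncmp(text, dp_opcodes[op], 3) != 0)
            continue;
        if (len == 3) {              /* no suffix: condition defaults to AL */
            *opcode = op;
            *cond = COND_AL;
            return 0;
        }
        for (int c = 0; c < 15; c++) {
            if (strcmp(text + 3, conditions[c]) == 0) {
                *opcode = op;
                *cond = c;
                return 0;
            }
        }
        return -1;                   /* known opcode, unknown suffix */
    }
    return -1;
}
```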
Ok. So next we need to fill in the characters, because we have 2 of those. We know these two are commas.
Lastly, what are our allowed opcodes? Well, it's all 16 of the data processing ones. Rather than writing them all out, we can use a handy constant that we made in the last part, called C_OPCODES_DP. Our final rule should look like this:

[Content hidden in the original post]


Cool. Now, I'm going to write the rules for DT and BR; feel free to try writing these on your own as well. The final file looks like this:

[Content hidden in the original post]


Cool. Now, in order for this to work, we need to build 2 hard-coded rules (not the same type of rule as above) into our assembler:
  1. All lines must begin by matching a rule whose Syntax[0] is kTOKEN_MNEMONIC
  2. All matched rules must have Syntax[0] = kTOKEN_OPCODE before being passed to the encoder
What this basically says is that a line must match a category rule, and then be processed from there. This will prevent us from having to write seriously complex rules when we start adding more instructions later, and allow us to have a clear debug path.
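The two hard-coded rules boil down to two checks on a rule's first syntax token. Here is a minimal sketch, with hypothetical names (struct rule, may_start_match, ready_for_encoder, kTOKEN_END, kTOKEN_REGISTER) standing in for whatever the final parser uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum token_kind { kTOKEN_END = 0, kTOKEN_MNEMONIC, kTOKEN_OPCODE, kTOKEN_REGISTER };

/* Hypothetical rule with a fixed-size, kTOKEN_END-terminated syntax. */
struct rule {
    const char *name;
    enum token_kind syntax[8];
};

/* Hard-coded rule 1: matching a line may only start from a category rule. */
static bool may_start_match(const struct rule *r)
{
    assert(r != NULL);
    return r->syntax[0] == kTOKEN_MNEMONIC;
}

/* Hard-coded rule 2: only rules that resolved to a plain opcode reach the encoder. */
static bool ready_for_encoder(const struct rule *r)
{
    assert(r != NULL);
    return r->syntax[0] == kTOKEN_OPCODE;
}
```

Keeping these as two tiny predicates gives a clear place to put a breakpoint when a line is rejected at the wrong stage.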

Ok, now I want to make a few changes to our rules list. These categories will match the most basic form, and operate only on registers. We actually need 4 rules to match it all. Data processing has two forms: one that takes a register as the third argument, and one that takes an immediate value instead. We need two different rules for this. Secondly, since this is a category rule, we should avoid using kTOKEN_CONSTANT, since that requires the user to know the constant at the time, and disallows the use of simple math operators, which are handy. Let's go ahead and change that to kTOKEN_EXPRESSION, which will allow us to use a constant, a simple equation, or even the name of a label (which is really useful for branching into subroutines). My updated file looks like this:

[Content hidden in the original post]
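kTOKEN_EXPRESSION is what lets an operand be a constant, a small sum, or a label. As a sketch of how little the "+"/"-" case needs, here is a left-to-right evaluator over numbers and label names. The label table, eval_expression, and the restriction to "+" and "-" are assumptions for illustration, not the thread's expression handling.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry: a label and the address it marks. */
struct label { const char *name; int32_t address; };

/* Look up a label of length len; returns false if it is unknown. */
static bool find_label(const struct label *labels, size_t count,
                       const char *name, size_t len, int32_t *out)
{
    for (size_t i = 0; i < count; i++) {
        if (strlen(labels[i].name) == len && strncmp(labels[i].name, name, len) == 0) {
            *out = labels[i].address;
            return true;
        }
    }
    return false;
}

/* Evaluate terms joined by '+' or '-', left to right. A term is a number
   (optional '#'; strtol base 0, so 0x.. is hex and a leading 0 is octal)
   or a label name. Returns false on a syntax error or unknown label. */
static bool eval_expression(const char *s, const struct label *labels,
                            size_t count, int32_t *result)
{
    assert(s != NULL && result != NULL);
    int32_t total = 0;
    int sign = 1;
    for (;;) {
        while (isspace((unsigned char)*s)) s++;
        if (*s == '#') s++;
        int32_t value;
        if (isdigit((unsigned char)*s)) {
            char *end;
            value = (int32_t)strtol(s, &end, 0);
            s = end;
        } else if (isalpha((unsigned char)*s) || *s == '_') {
            const char *start = s;
            while (isalnum((unsigned char)*s) || *s == '_') s++;
            if (!find_label(labels, count, start, (size_t)(s - start), &value))
                return false;
        } else {
            return false;
        }
        total += sign * value;
        while (isspace((unsigned char)*s)) s++;
        if (*s == '\0') break;
        if (*s == '+') sign = 1;
        else if (*s == '-') sign = -1;
        else return false;            /* operators other than + and - */
        s++;
    }
    *result = total;
    return true;
}
```

For example, with a label loop at 0x20, "loop + 4" evaluates to 0x24, which is exactly the sort of operand kTOKEN_CONSTANT could not express.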


Just for the sake of being organized, I'm going to add some comments to separate category rules from parsing rules and the null rule. I won't repaste the code, you'll see it in the next paste.



Ok, I want to take a quick break to apologize for the number of errors I found and corrected without much explanation in this and the last part. Like I've said before, I only have a general plan about how this should all fit together, so occasionally I'm off by a little bit. Ok, now back to the tutorial.



So, now it's time to write some rules. I want to segment them so that we get more valuable debug information, even though we don't have to. I'll separate the data processing instructions into three groups, as follows:

[Content hidden in the original post]


So, let's go into language/constants.h and create these three groups:

[Content hidden in the original post]
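The actual group definitions are hidden in the original post. As a sketch, assuming one bit per opcode (bit position = ARM opcode number) and the usual ARM split into logic (AND, EOR, ORR, BIC), compare (TST, TEQ, CMP, CMN) and arithmetic (SUB, RSB, ADD, ADC, SBC, RSC), the three groups could be written as masks like this, leaving MOV and MVN outside all three. Every macro name except C_OPCODES_DP is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OP_BIT(n) ((uint32_t)1u << (n))

/* Assumed bit positions: the ARM data-processing opcode numbers. */
enum { OP_AND, OP_EOR, OP_SUB, OP_RSB, OP_ADD, OP_ADC, OP_SBC, OP_RSC,
       OP_TST, OP_TEQ, OP_CMP, OP_CMN, OP_ORR, OP_MOV, OP_BIC, OP_MVN };

/* Logic (LGC): bitwise operations that write a result register. */
#define C_OPCODES_DP_LGC (OP_BIT(OP_AND) | OP_BIT(OP_EOR) | OP_BIT(OP_ORR) | OP_BIT(OP_BIC))
/* Compare (CMP): only set flags, no destination register. */
#define C_OPCODES_DP_CMP (OP_BIT(OP_TST) | OP_BIT(OP_TEQ) | OP_BIT(OP_CMP) | OP_BIT(OP_CMN))
/* Arithmetic (ARR): the add/subtract family. */
#define C_OPCODES_DP_ARR (OP_BIT(OP_SUB) | OP_BIT(OP_RSB) | OP_BIT(OP_ADD) | \
                          OP_BIT(OP_ADC) | OP_BIT(OP_SBC) | OP_BIT(OP_RSC))
/* Leftovers: the two move instructions. */
#define C_OPCODES_DP_MOVES (OP_BIT(OP_MOV) | OP_BIT(OP_MVN))
#define C_OPCODES_DP 0x0000FFFFu

/* Check that the groups are disjoint and together cover all 16 opcodes. */
static int groups_partition_dp(void)
{
    const uint32_t groups[] = { C_OPCODES_DP_LGC, C_OPCODES_DP_CMP,
                                C_OPCODES_DP_ARR, C_OPCODES_DP_MOVES };
    uint32_t seen = 0;
    for (unsigned i = 0; i < sizeof groups / sizeof groups[0]; i++) {
        if (seen & groups[i])
            return 0;                /* two groups share an opcode */
        seen |= groups[i];
    }
    return seen == C_OPCODES_DP;
}
```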


Now, the only instructions we don't yet have more specific groups for are

[Content hidden in the original post]

That's OK, because there are only two instructions in each of these groups already, so we can just use their opcode specifiers when we write their rules. Let's go ahead and write the basic rule for the logic (LGC) group. While we're at it, let's also remove that cast from the rule "CAT_DP_REG".
Ok, so our new rule is not a category rule. This means it will not start with a mnemonic, but rather with an opcode. It can still end in an expression if we want it to, though. Since this is our basic rule, it will take the most basic form of the instruction: an opcode with no condition (AKA condition=AL), 3 registers as arguments, and no expressions. The three-register form of an instruction takes an optional shift that is applied to the value in the last register (Operand2); examples are:

[Content hidden in the original post]
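For reference, this is how the classic ARM data-processing encoding packs a register Operand2 with an immediate shift: bits 3-0 hold Rm, bit 4 is 0, bits 6-5 hold the shift type, and bits 11-7 hold the shift amount. The helper name encode_op2_reg is hypothetical, and register-specified shifts (bit 4 = 1) are left out of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Shift type field (bits 6-5) of a register Operand2. */
enum shift_type { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 };

/* Hypothetical helper: build the 12-bit Operand2 for "Rm, <shift> #amount".
   A bare "Rm" is simply LSL #0, which is why the base rule needs no shift. */
static uint32_t encode_op2_reg(unsigned rm, enum shift_type type, unsigned amount)
{
    assert(rm < 16);      /* r0-r15 */
    assert(amount < 32);  /* 5-bit shift amount field */
    return ((uint32_t)amount << 7) | ((uint32_t)type << 5) | rm; /* bit 4 stays 0 */
}
```

One catch for later rules: in the real encoding, LSR #32 and ASR #32 are written with an amount field of 0, and ROR #0 means RRX, so an assembler has to special-case those.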

So, for our absolute base rule, we will use only the default form, which is three registers, no shift or rotation applied. Here is the code I wrote for the logic group:

[Content hidden in the original post]

As you can see, I gave the names a new prefix, AB. This stands for "Absolute Base", and it will help us when we are reading the match output, and with organization in general. Now, let's go ahead and write the code for the other two groups (CMP and ARR). I advise that you try to write these rules yourself; there will only be a couple of changes to my code above, and it will be good practice.

[Content hidden in the original post]
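Before the parser exists, it helps to see what "matching" an AB rule could mean: walk the line's classified tokens alongside the rule's syntax, check each kind, check literal commas against the rule's characters, and check the opcode against an allowed-opcode mask (assuming one bit per opcode, bit position = ARM opcode number). struct token, struct rule, MAX_SYNTAX and rule_matches are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SYNTAX 8

enum token_kind { kTOKEN_END = 0, kTOKEN_OPCODE, kTOKEN_REGISTER, kTOKEN_CHAR };

/* Hypothetical classified token: a kind plus a value (opcode number,
   register number, or the character itself). */
struct token { enum token_kind kind; int value; };

/* Hypothetical rule: kTOKEN_END-terminated syntax, literal chars in order,
   and a mask of allowed opcodes. */
struct rule {
    const char *name;
    enum token_kind syntax[MAX_SYNTAX];
    const char *chars;
    uint32_t opcodes;
};

static bool rule_matches(const struct rule *r, const struct token *toks, size_t n)
{
    assert(r != NULL && toks != NULL && r->chars != NULL);
    size_t next_char = 0;
    size_t i;
    for (i = 0; i < MAX_SYNTAX && r->syntax[i] != kTOKEN_END; i++) {
        if (i >= n || toks[i].kind != r->syntax[i])
            return false;                 /* too few tokens, or wrong kind */
        if (toks[i].kind == kTOKEN_OPCODE &&
            (toks[i].value < 0 || toks[i].value > 31 ||
             !(r->opcodes & ((uint32_t)1u << toks[i].value))))
            return false;                 /* opcode not allowed by this rule */
        if (toks[i].kind == kTOKEN_CHAR) {
            char expected = r->chars[next_char];
            if (expected == '\0' || toks[i].value != expected)
                return false;             /* wrong or unexpected punctuation */
            next_char++;
        }
    }
    return i == n;                        /* reject leftover tokens */
}
```

With a logic-group mask, EOR r0, r1, r2 matches while ADD r0, r1, r2 falls through to the next rule, which is the kind of clear debug path the AB naming is meant to support.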

Perfect, now our (unwritten) parser should be able to match this form. At this point, it can match every instruction we need to write a basic program. While we're at it, let's write the absolute base rules for DT and BR:

[Content hidden in the original post]

At this point, it should look (organizationally speaking) like this:
[Image: RRuThHD.png]



At this point, I want to take a break from writing rules and start working on the parser. I'm going to start in language/parsing.h
[Image: zhoeVPa.png]

Here we have the three structures that will help us decode and match our instruction lines, but we aren't actually doing anything with them yet. Let's change that. Right now, we don't really have any place to hold a mnemonic, but we should add one just to stay consistent.

[Content hidden in the original post]

Now, I want to make a couple changes to this file at this point.
1. We don't need to call things reg_name, opcode_value, etc. We can just use name and value. This makes them a lot easier to remember later on.
2. We don't actually need the whole language_parsing structures; having them around would tempt us to modify them, which we shouldn't.
Here is the new file:

[Content hidden in the original post]

Now, why did I use bit fields there? Well, a couple of reasons. By using bit fields, I have made the structure exactly 32 bits long, so it fits in a single machine word: that makes it cheap to copy and pass around, and it saves space in memory. We are currently using 20 bits to hold our instructions, so I gave it an extra 7 bits (room for 7 more instructions) just so we don't have to adjust it later. If we ever need more than that, we will lose this benefit, but for now it's worth it.
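The structure itself is hidden in the original post, but the numbers in that paragraph fit a 32-bit word if the remaining 5 bits go to a 4-bit condition and the 1-bit setflags field: 27 + 4 + 1 = 32. That split and the field names below are assumptions. Bit-field layout is implementation-defined in C, so the compile-time size check is worth keeping: it holds on common compilers such as GCC and Clang, but the standard does not promise it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 27 + 4 + 1 = 32 bits. */
struct instruction_bits {
    unsigned int opcodes  : 27; /* one bit per opcode: 20 in use, 7 spare */
    unsigned int cond     : 4;  /* ARM condition field, 0xE = AL */
    unsigned int setflags : 1;  /* the S ("set flags") bit */
};

/* C11 compile-time check: fail the build if the compiler pads the struct. */
_Static_assert(sizeof(struct instruction_bits) == sizeof(uint32_t),
               "instruction_bits should fit in one 32-bit word");

/* Return 1 if the given opcode bit is set, else 0. */
static int has_opcode(struct instruction_bits ib, unsigned bit)
{
    assert(bit < 27);
    return (int)((ib.opcodes >> bit) & 1u);
}
```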
Secondly, what is this setflags bit?
Well, every data processing instruction can also act as a comparison at the end, by updating the condition flags from its result. For example, take the following C code:

[Content hidden in the original post]

Without the set bit, it would normally look like this:

[Content hidden in the original post]


But if we use the set bit, we can cut a few lines from that. The set bit updates the status register (the condition flags) AFTER the operation is completed. TST is the generic "test" instruction: it performs a bitwise AND, sets the flags from the result, and throws the result away. TEQ works the same way but uses exclusive OR, which makes it the natural test for equality (the Z flag is set when both values are equal). Here's the code with the set bit:

[Content hidden in the original post]

Not only is that sequence shorter (and thus denser and easier to read), but it also saves 2 whole clock cycles! That may not seem like a lot, but imagine if this were in a loop running 5000 times: that saves 10,000 clock cycles. At a clock rate of 700MHz (pretty common for ARM), that's about 14 microseconds. Small savings like this add up in hot code.
So, that's what the set bit does, and why you need it.
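In the machine word, the set bit is bit 20 (the S bit) of an ARM data-processing instruction. The sketch below packs the standard fields for the register form (I bit = 0), so EOR r0, r1, r2 and EORS r0, r1, r2 differ only in that one bit. The function name encode_dp_reg and its parameter order are hypothetical; op2 is the 12-bit Operand2 field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a data-processing instruction with a register Operand2:
   cond[31:28] 00 I[25]=0 opcode[24:21] S[20] Rn[19:16] Rd[15:12] op2[11:0] */
static uint32_t encode_dp_reg(unsigned cond, unsigned opcode, bool setflags,
                              unsigned rn, unsigned rd, unsigned op2)
{
    assert(cond < 16 && opcode < 16 && rn < 16 && rd < 16 && op2 < 4096);
    return ((uint32_t)cond << 28) | ((uint32_t)opcode << 21) |
           ((uint32_t)setflags << 20) | ((uint32_t)rn << 16) |
           ((uint32_t)rd << 12) | (uint32_t)op2;
}
```

For TST, TEQ, CMP and CMN the S bit is always 1 in the encoding; the assembler sets it automatically, because those opcodes with S = 0 encode different instructions.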

Ok, this part might get a bit spotty. I was writing it and had a system crash, and when I tried to recover it I found out that I was over the MyBB post size limit, so I'm having to rewrite it.
Let's go ahead and write our first parsing function. This function is going to take in a line of input and spit out a linked list of tokens that other functions will process. The point of this function is to strip out the white space and other unnecessary bits. Let's start by making our linked list.
Create a folder called data_structures and inside of it create two files:

data_structures/linked_list.h

[Content hidden in the original post]


data_structures/linked_list.c

[Content hidden in the original post]


Now, I just built a generic linked list; it will work fine for our project.
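The header and source are hidden in the original post. As a rough idea of what a generic (void *) singly linked list for this job needs, here is a small sketch; the names list_new, list_push_back and list_free are hypothetical and may not match the thread's API.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    void *data;          /* caller-owned payload */
    struct node *next;
};

struct list {
    struct node *head;
    struct node *tail;   /* kept so appends are O(1) */
    size_t size;
};

/* Create an empty list; returns NULL if allocation fails. */
struct list *list_new(void)
{
    struct list *l = malloc(sizeof *l);
    if (l != NULL) {
        l->head = l->tail = NULL;
        l->size = 0;
    }
    return l;
}

/* Append data at the end; returns 0 on success, -1 if allocation fails. */
int list_push_back(struct list *l, void *data)
{
    assert(l != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = data;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    l->size++;
    return 0;
}

/* Free every node, calling free_data on each payload if it is non-NULL. */
void list_free(struct list *l, void (*free_data)(void *))
{
    if (l == NULL)
        return;
    struct node *n = l->head;
    while (n != NULL) {
        struct node *next = n->next;
        if (free_data != NULL)
            free_data(n->data);
        free(n);
        n = next;
    }
    free(l);
}
```

Keeping a tail pointer matters here because a tokenizer appends tokens in order, one at a time.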

Next, create a folder called parser and create two files: parser/rules.c and parser/rules.h.
At this point, your project tree should look like this:
[Image: oeZFQgP.png]

Inside parser/rules.c we need to start off with our includes. It's pretty obvious that we want to include our own header, and since our function will return a linked list, we need to include that too:

[Content hidden in the original post]


Ok, so we're going to call this function tokenize and it will take in a single line, strip the white space and other stuff out of it, and then spit out a linked list. Let's go ahead and put the function signature in the header file:

[Content hidden in the original post]


Ok, now I'm just going to write this function myself, to keep it simple. I'm using a basic loop. Here's my code:

[Content hidden in the original post]
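Since the actual tokenize code is hidden in the original post, here is an independent sketch of the same idea: walk the line once, split on whitespace, emit ',', '[' and ']' as tokens of their own, and always start a new token at ';' so a comment can be found later. Treating brackets as separators is an assumption. To stay self-contained it uses its own minimal token list instead of data_structures/linked_list.h; struct tok, append and free_tokens are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Minimal token list node (a stand-in for the generic linked list). */
struct tok {
    char *text;
    struct tok *next;
};

/* Append a copy of s[0..len) at *tail and advance the tail.
   Returns 0 on success, -1 on allocation failure. */
static int append(struct tok ***tail, const char *s, size_t len)
{
    struct tok *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->text = malloc(len + 1);
    if (t->text == NULL) {
        free(t);
        return -1;
    }
    memcpy(t->text, s, len);
    t->text[len] = '\0';
    t->next = NULL;
    **tail = t;
    *tail = &t->next;
    return 0;
}

void free_tokens(struct tok *t)
{
    while (t != NULL) {
        struct tok *next = t->next;
        free(t->text);
        free(t);
        t = next;
    }
}

/* Split a line into tokens. Returns NULL for a blank line or on allocation failure. */
struct tok *tokenize(const char *line)
{
    assert(line != NULL);
    struct tok *head = NULL;
    struct tok **tail = &head;
    const char *p = line;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;                              /* skip whitespace */
        } else if (strchr(",[]", *p) != NULL) {
            if (append(&tail, p, 1) != 0)     /* punctuation is its own token */
                goto fail;
            p++;
        } else {
            const char *start = p++;          /* a token may begin with ';' */
            while (*p != '\0' && !isspace((unsigned char)*p) &&
                   strchr(",[];", *p) == NULL)
                p++;
            if (append(&tail, start, (size_t)(p - start)) != 0)
                goto fail;
        }
    }
    return head;
fail:
    free_tokens(head);
    return NULL;
}
```

For "EOR r0, r1, r2 ;xor it" this yields EOR, r0, ",", r1, ",", r2, ";xor", it: the instruction tokens are right, but the comment is still there, which is exactly the leftover the next step has to deal with.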


Let's do a quick build check to make sure we are on track:
[Image: 3r0HCpq.png]

Perfect! Ok, now let's run a test of it. Let's add some includes to main and write a very basic function to test it.
[Image: Mc2CJWV.png]

Cool, now that it's in place and we know it builds, let's set our breakpoint, build it, and start a debugger on it:
[Image: bh2rInC.png]

Awesome. At this point, our tokenize function has run, and we should have a first token. This token should be "EOR", so let's check it:
[Image: IqWDJ0K.png]
Perfect! Now, I'm just going to run through and check all of the tokens:
[Image: PsuQuT9.png]

We got:

[Content hidden in the original post]


This should look strange to you. Our tokenizer did work exactly how we wrote it to, but it's not quite finished. For final processing, we should only have:

[Content hidden in the original post]


In our test string, I added a comment. We need to filter those out, and we'll do that in two places. The easiest one is inside tokenize.
Go ahead and find the line that says

[Content hidden in the original post]

and change it to be

[Content hidden in the original post]


Cool. Now, we need a second function. This one will be internal to our parser, so there's no function signature in the header. Its job is to search the list, find the first token that starts with ';', and delete both it and everything after it. We already know that token 0 will not be a comment, so we don't have to worry about emptying the whole list. The function itself is pretty simple:

[Content hidden in the original post]
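The comment-stripping function is hidden in the original post as well. Here is a self-contained sketch of what the paragraph describes: find the first token starting with ';', cut the list just before it, and free that token and everything after it. It uses its own minimal struct tok node, and the names strip_comment, free_chain and tok_new are hypothetical. Like the post, it assumes token 0 is never a comment, so the head pointer never changes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tok {
    char *text;
    struct tok *next;
};

/* Allocate a token holding a copy of s (used to build lists in tests). */
static struct tok *tok_new(const char *s)
{
    struct tok *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    size_t len = strlen(s);
    t->text = malloc(len + 1);
    if (t->text == NULL) {
        free(t);
        return NULL;
    }
    memcpy(t->text, s, len + 1);
    t->next = NULL;
    return t;
}

/* Free a chain of tokens starting at t. */
static void free_chain(struct tok *t)
{
    while (t != NULL) {
        struct tok *next = t->next;
        free(t->text);
        free(t);
        t = next;
    }
}

/* Remove the first token beginning with ';' and everything after it.
   Assumes head itself is not a comment (token 0 is the mnemonic). */
static void strip_comment(struct tok *head)
{
    assert(head != NULL && head->text[0] != ';');
    for (struct tok *prev = head; prev->next != NULL; prev = prev->next) {
        if (prev->next->text[0] == ';') {
            free_chain(prev->next);
            prev->next = NULL;       /* the list now ends before the comment */
            return;
        }
    }
}
```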

Ok, let's give it a test: I just added this into my main
[Image: jw2kX4t.png]
Perfect, now let's add the signature in parser/rules.c, add it to tokenize (right above the return) and wrap it up!



Ok, well due to limits on how long this post can be (the text alone is 26 KB), I have to wrap this one up. Next time we'll work more on our parser, sorry. Please discuss this thread below!
©0Day 2016 - 2023 | All Rights Reserved.