CYFA - Creating Your First Assembler - Instruction Set Design

#1
So, the introduction to this series seemed to get quite a lot of buzz, so I'm going to write these a little quicker. Hopefully I'll be able to get them all out in a couple of weeks.

So, part 2. This is where we get to go into the cool details without getting into the complicated shit. This one is about the instruction set, sort of: we'll be talking about instruction set design, that is, what makes up an instruction set (we won't look at more than a few actual instructions).

OK. For most processors, each assembly mnemonic corresponds directly to an operation code, or opcode for short. The opcode is what tells the CPU what to do, and it is usually followed by a few bits or bytes of encoded parameters (operands). As a quick example, that might look like this:

opcodes:

[Example not preserved: table of opcodes]

and then maybe some register encodings like this:

[Example not preserved: table of register encodings]

So, if we have the following program:

[Example not preserved: the program]

it would become the following in assembly:

[Example not preserved: the assembly listing]

which our assembler would then translate into the following machine code (binary, shown in hex):

[Example not preserved: the machine code in hex]


In operation it's pretty simple: the assembler knows what each token (mnemonic or register name) resolves to, and memory addresses and constants always resolve to themselves.
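To make the token-resolving idea concrete, here's a minimal Python sketch of a toy assembler. Everything about this instruction set is hypothetical and made up for illustration (it is not a reconstruction of the examples above): 16-bit instructions with a 4-bit opcode, a 4-bit register field and an 8-bit operand, the mnemonics LOAD, STORE, ADD and MOV, and registers A to D. Mnemonics and register names are looked up in tables, and numbers resolve to themselves.

```python
# Toy assembler for a HYPOTHETICAL 16-bit instruction set (not ARM):
#   bits 15-12: opcode | bits 11-8: register | bits 7-0: register, address or constant
OPCODES = {"LOAD": 0x1, "STORE": 0x2, "ADD": 0x3, "MOV": 0x4}  # made-up opcodes
REGISTERS = {"A": 0x0, "B": 0x1, "C": 0x2, "D": 0x3}             # made-up registers


def resolve(token: str) -> int:
    """Register names come from the table; numbers resolve to themselves."""
    if token in REGISTERS:
        return REGISTERS[token]
    return int(token, 0)  # accepts "18", "0x12", "0b10010", ...


def assemble_line(line: str) -> int:
    """Encode one 'MNEMONIC reg, operand' line into a 16-bit word."""
    mnemonic, rest = line.split(maxsplit=1)
    operands = [tok.strip() for tok in rest.split(",")]
    if mnemonic not in OPCODES or len(operands) != 2 or operands[0] not in REGISTERS:
        raise ValueError(f"cannot assemble {line!r}")
    value = resolve(operands[1])
    if not 0 <= value <= 0xFF:
        raise ValueError(f"operand does not fit in 8 bits: {line!r}")
    # The assembler only turns tokens into numbers; what the CPU does with the
    # low byte (register, address or constant) is decided by the opcode.
    return (OPCODES[mnemonic] << 12) | (REGISTERS[operands[0]] << 8) | value


def assemble(source: str) -> str:
    """Assemble a whole program and return the machine code as hex words."""
    words = [assemble_line(line) for line in source.splitlines() if line.strip()]
    return " ".join(f"{word:04X}" for word in words)


program = """
LOAD A, 0x10
LOAD B, 0x11
ADD A, B
STORE A, 0x12
"""
print(assemble(program))  # 1010 1111 3001 2012
```

The four lines load two values from (made-up) addresses 0x10 and 0x11, add them, and store the result at 0x12. The assembler never needs to know any of that; it only translates tokens.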

But how do we know what the programmer will write?
We don't. The programmer has to know what we expect, and that's what makes assembly a language that's not for the faint of heart.



The ARM instruction set:
In our little example there, we used what you could call a fixed 16-bit instruction set: every instruction is encoded in the same way, and every instruction is 16 bits (2 bytes) wide. ARM also uses fixed-width instructions, 32 bits (4 bytes) each, but it has several encoding formats. Most instructions get encoded in the same way, but some categories lay out their bits differently (for example, a branch needs lots of bits for its target offset, so it has no room for register fields).

Here's a graphic that explains how they are put together:
[Image: instr.gif]
For this tutorial series, we will only be working with the data processing, single data transfer, and branch categories. Using these three, we can build usable software, and for the most part you will be able to test that software on your own.
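Since every instruction is exactly 32 bits, the assembler (or a disassembler) can tell these categories apart just by looking at a few fixed bit positions. Here's a rough Python sketch based on the classic 32-bit ARM encoding: the condition always sits in bits 31-28, bits 27-25 = 101 means branch, bits 27-26 = 00 means data processing, and 01 means single data transfer. It deliberately ignores the other categories in the diagram (multiply, swap, block transfer, coprocessor, software interrupt), some of which share these bit patterns, so treat it as a simplification; the function names are just for this example.

```python
def category(word: int) -> str:
    """Roughly classify a 32-bit ARM instruction word by its fixed bits.

    Simplified: multiplies and swaps are lumped in with data processing, the
    undefined-instruction space with single data transfer, and block transfers,
    coprocessor instructions and software interrupts all come back as "other".
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit instruction word")
    if ((word >> 25) & 0b111) == 0b101:
        return "branch"                # B / BL
    bits_27_26 = (word >> 26) & 0b11
    if bits_27_26 == 0b00:
        return "data processing"       # ADD, SUB, MOV, CMP, ...
    if bits_27_26 == 0b01:
        return "single data transfer"  # LDR, STR
    return "other"


def condition_field(word: int) -> int:
    """Every category keeps its condition in the top four bits."""
    return (word >> 28) & 0xF


for word in (0xE3A00005, 0xE5910000, 0xEAFFFFFE):
    print(f"{word:08X}: cond={condition_field(word):04b}, {category(word)}")
```

The three sample words are MOV r0, #5, LDR r0, [r1] and a branch to itself; all three use condition 1110 (AL, always).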

Let's get into what each of these means. First of all, you'll notice that the data processing category is the only one with an explicit opcode field. That's because there are more data processing instructions than anything else. The other categories are identified by fixed bit patterns and controlled by single-bit flags (for example, the L bit in a single data transfer selects load or store); we'll get into those as we go along. The DP instructions are things like add, subtract, add with carry, compare and move. They get used the most. We'll go into depth on these in part 3 of this series.
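To give a feel for what part 3 will involve, here's a Python sketch that encodes the immediate form of a data processing instruction (like ADD r0, r1, #1) using the classic 32-bit ARM layout: condition, 00, the I bit, the 4-bit opcode, the S (set flags) bit, Rn, Rd, and a 12-bit operand. The immediate is an 8-bit value rotated right by an even amount, which is why not every constant can be used directly. The register-operand and shifted forms are left out, and the function names are only for this example.

```python
# Data processing opcodes (bits 24-21) in the 32-bit ARM instruction set.
DP_OPCODES = {
    "AND": 0x0, "EOR": 0x1, "SUB": 0x2, "RSB": 0x3,
    "ADD": 0x4, "ADC": 0x5, "SBC": 0x6, "RSC": 0x7,
    "TST": 0x8, "TEQ": 0x9, "CMP": 0xA, "CMN": 0xB,
    "ORR": 0xC, "MOV": 0xD, "BIC": 0xE, "MVN": 0xF,
}
COMPARES = {"TST", "TEQ", "CMP", "CMN"}  # only set flags, no destination
ALWAYS = 0b1110                          # the "AL" condition


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def encode_immediate(value: int) -> int:
    """Return the 12-bit operand2 field: an 8-bit value rotated right by 2*rot."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("immediate must be a 32-bit unsigned value")
    for rot in range(16):
        imm8 = rol32(value, 2 * rot)  # undo a right-rotation by 2*rot
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    raise ValueError(f"{value:#x} cannot be encoded as an ARM immediate")


def encode_dp_immediate(mnemonic: str, rd: int, rn: int, value: int,
                        set_flags: bool = False, cond: int = ALWAYS) -> int:
    """Encode e.g. ADD rd, rn, #value (immediate form only)."""
    opcode = DP_OPCODES[mnemonic]
    if mnemonic in COMPARES:
        rd, set_flags = 0, True  # compares always set the flags
    if mnemonic in ("MOV", "MVN"):
        rn = 0                   # MOV/MVN ignore the first operand register
    if not (0 <= rd <= 15 and 0 <= rn <= 15):
        raise ValueError("registers are r0-r15")
    return ((cond << 28) | (1 << 25)  # bits 27-26 = 00, I = 1 (immediate)
            | (opcode << 21) | (int(set_flags) << 20)
            | (rn << 16) | (rd << 12) | encode_immediate(value))


print(f"{encode_dp_immediate('ADD', 0, 1, 1):08X}")                  # E2810001
print(f"{encode_dp_immediate('SUB', 0, 0, 1, set_flags=True):08X}")  # E2500001
print(f"{encode_dp_immediate('CMP', 0, 0, 0):08X}")                  # E3500000
```

The three lines encode ADD r0, r1, #1, SUBS r0, r0, #1 and CMP r0, #0. A constant like 0x100 works (it is 1 rotated right by 24), but 0x101 is rejected because its set bits don't fit in one rotated byte.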

Secondly, the single data transfer category. These instructions move data between the CPU's registers and memory. Don't be fooled into thinking that they talk to devices over some special bus: ARM has no separate I/O instructions, and these only read and write addresses in the memory map. You can't read data from a hard drive with them in a normal program (you would use a software interrupt or an undefined-instruction trap for that; we won't talk about those). For our purposes, these make up the LDR and STR instructions, and we'll talk a lot about these and their uses in part 4.
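Single data transfers have no opcode field; instead, individual flag bits pick the variant. Here's a Python sketch, assuming the classic 32-bit ARM layout, that encodes only the simplest case: LDR/STR rd, [rn, #offset] with a word-sized access, pre-indexing and no write-back. The other flag combinations (byte transfers, post-indexing, register offsets) are left out, and the function name is only for this example.

```python
ALWAYS = 0b1110  # the "AL" condition


def encode_single_transfer(load: bool, rd: int, rn: int, offset: int = 0,
                           cond: int = ALWAYS) -> int:
    """Encode LDR/STR rd, [rn, #offset]: word-sized, pre-indexed, no write-back."""
    if not (0 <= rd <= 15 and 0 <= rn <= 15):
        raise ValueError("registers are r0-r15")
    if not -4095 <= offset <= 4095:
        raise ValueError("immediate offset must fit in 12 bits")
    word = (cond << 28) | (0b01 << 26)  # bits 27-26 = 01; I (bit 25) = 0: immediate offset
    word |= 1 << 24                     # P = 1: apply the offset before the access
    word |= int(offset >= 0) << 23      # U = 1: add the offset, U = 0: subtract it
    # B (bit 22) = 0 means a full word; W (bit 21) = 0 means no write-back to rn
    word |= int(load) << 20             # L = 1: LDR (load), L = 0: STR (store)
    return word | (rn << 16) | (rd << 12) | abs(offset)


print(f"{encode_single_transfer(True, 0, 1):08X}")      # LDR r0, [r1]
print(f"{encode_single_transfer(False, 0, 1, 4):08X}")  # STR r0, [r1, #4]
```

Notice that LDR and STR differ in exactly one bit (L), which is the "controlled by flags" idea in action.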

Finally, the branch category. Technically, a branch can be done with a data processing instruction (by writing a new value into the program counter, PC), but with a dedicated branch instruction the CPU knows beforehand that it's a jump. Branches are jumps in CPU-level terms; you can relate them to goto, break, continue, return, etc. in C. This will be part 5 of the series, and will likely be a short one, as branches can get really deep and I want to avoid spending a week on them. We will be talking about branches a lot throughout the series, though, so pay attention.
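A branch stores its target as a signed 24-bit offset, counted in words, relative to the program counter. On classic ARM the PC reads as the branch's own address plus 8 (a leftover of the pipeline), so the assembler has to subtract that before dividing by 4. Here's a Python sketch of that calculation for B and BL; the helper name and the addresses in the usage lines are only for illustration.

```python
ALWAYS = 0b1110  # the "AL" condition


def encode_branch(branch_addr: int, target: int, link: bool = False,
                  cond: int = ALWAYS) -> int:
    """Encode B/BL (bits 27-25 = 101) from its own address to a target address."""
    if branch_addr % 4 or target % 4:
        raise ValueError("ARM instructions are word-aligned")
    # When the branch executes, the PC already reads 8 bytes past it, and the
    # 24-bit field counts words (4 bytes), not bytes.
    offset = (target - (branch_addr + 8)) >> 2
    if not -(1 << 23) <= offset < (1 << 23):
        raise ValueError("target out of range for a single branch (about +/-32 MB)")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (offset & 0xFFFFFF)


print(f"{encode_branch(0x8000, 0x8000):08X}")               # B to itself
print(f"{encode_branch(0x8010, 0x8000, cond=0b0001):08X}")  # BNE back 16 bytes
```

A branch to itself ends up with offset -2, which is why the familiar infinite loop encodes as EAFFFFFE.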



Now, let's talk about the first field in every one of those categories: the condition. This is a signature feature of the 32-bit ARM instruction set (most other RISC processors don't have it), and it's amazing to have. On an Intel (x86) processor, there are different jump instructions for greater than, less than, equal to, and so on. ARM has one plain branch instruction, B, and adds a condition to it. On x86, conditions are mostly limited to jumps (apart from a few conditional moves and sets, CMOVcc and SETcc), while on ARM almost every instruction can be conditional. Here's an example of an Intel increment/decrement-until-0 loop:

[Example not preserved: x86 decrement-until-0 loop]

Now, let's do the same with ARM assembly:

[Example not preserved: the same loop in ARM assembly]

Not only is that two instructions shorter, it's also five clock cycles shorter (on a 1 Hz processor, the ARM version would finish 5 seconds sooner).
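Here's a small Python sketch of how those conditions are evaluated. In the 32-bit ARM instruction set the condition lives in the top four bits of every instruction and is checked against the N (negative), Z (zero), C (carry) and V (overflow) flags, which instructions such as CMP or SUBS update. A decrement-until-0 loop relies on this: SUBS sets Z when the counter reaches 0, and BNE only branches while Z is clear. The value 0b1111 (which means different things on different ARM versions) is left out, and the names here are only for this example.

```python
# The 4-bit condition field shared by every 32-bit ARM instruction.
CONDITIONS = {
    "EQ": 0x0, "NE": 0x1, "CS": 0x2, "HS": 0x2, "CC": 0x3, "LO": 0x3,
    "MI": 0x4, "PL": 0x5, "VS": 0x6, "VC": 0x7, "HI": 0x8, "LS": 0x9,
    "GE": 0xA, "LT": 0xB, "GT": 0xC, "LE": 0xD, "AL": 0xE,
}


def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Would an instruction with this condition execute, given the N, Z, C, V flags?"""
    results = {
        0x0: z,                 0x1: not z,       # equal / not equal
        0x2: c,                 0x3: not c,       # unsigned >= / unsigned <
        0x4: n,                 0x5: not n,       # negative / positive or zero
        0x6: v,                 0x7: not v,       # overflow / no overflow
        0x8: c and not z,       0x9: not c or z,  # unsigned > / unsigned <=
        0xA: n == v,            0xB: n != v,      # signed >= / signed <
        0xC: not z and n == v,  0xD: z or n != v, # signed > / signed <=
        0xE: True,                                # always
    }
    if cond not in results:
        raise ValueError("condition 0b1111 is not handled here")
    return bool(results[cond])


# After SUBS r0, r0, #1 takes r0 from 1 to 0: Z = 1, C = 1 (no borrow), N = V = 0,
# so BNE no longer branches and the loop ends.
print(condition_passed(CONDITIONS["NE"], n=False, z=True, c=True, v=False))  # False
```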

These conditionals are something that we will have to program our assembler to understand, but since we're using ARM, it means that we have to program it to understand fewer instructions and opcodes. This is kick ass!
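For the assembler, handling conditions mostly comes down to splitting a mnemonic like ADDEQS into its base instruction, an optional condition and an optional S (set flags) suffix. Here's a Python sketch of that. It assumes the older (pre-UAL) ARM syntax, where the condition comes before the S; the newer UAL syntax writes ADDSEQ instead, which this sketch rejects, as it does byte and halfword forms such as LDRB. The table of base mnemonics only covers the instructions in this series, and the function name is only for this example.

```python
CONDITIONS = {"EQ", "NE", "CS", "HS", "CC", "LO", "MI", "PL",
              "VS", "VC", "HI", "LS", "GE", "LT", "GT", "LE", "AL"}
DATA_PROCESSING = {"AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
                   "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"}
COMPARES = {"TST", "TEQ", "CMP", "CMN"}  # always set flags, no S suffix
BASES = DATA_PROCESSING | {"LDR", "STR", "B", "BL"}


def split_mnemonic(text: str) -> tuple[str, str, bool]:
    """Split a pre-UAL mnemonic such as 'ADDEQS' into ('ADD', 'EQ', True)."""
    text = text.upper()
    # Try longer bases first, but keep going if the leftover suffix is invalid:
    # 'BLS' is not BL + S, it is B + LS (branch if lower or same).
    for base in sorted(BASES, key=len, reverse=True):
        if not text.startswith(base):
            continue
        rest = text[len(base):]
        cond = "AL"  # no condition suffix means "always"
        if rest[:2] in CONDITIONS:
            cond, rest = rest[:2], rest[2:]
        set_flags = base in COMPARES
        if rest == "S" and base in DATA_PROCESSING and base not in COMPARES:
            set_flags, rest = True, ""
        if rest == "":
            return base, cond, set_flags
    raise ValueError(f"unknown mnemonic {text!r}")


for m in ("ADDEQS", "BLS", "BLEQ", "CMP", "movne"):
    print(m, split_mnemonic(m))
```

The result feeds straight into the encoders: the condition name picks the top four bits, and the flag decides the S bit.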



That's the end of this one. I hope you have a little more understanding of what we're about to go through. I'm taking this slow for a reason: I want everybody to understand this. I'm basically teaching you assembly language and compiler theory at the same time.

Please let me know if I need to slow down, speed up, or go back and touch on something! YOUR FEEDBACK IS CRITICAL!

#2
That was a pretty good post.

ARM is a lot more efficient than x86, as x86 is CISC and ARM is RISC. (You only mentioned that ARM is RISC, not that x86 is CISC.)

x86 is a lousy design, and was made pretty quickly. However, it seems that since the Pentium Pro, Intel CPUs have had a RISC-like core, with the CISC instructions broken down into smaller internal instructions (still not great). A well-designed CISC CPU could have some advantages over RISC ones (but RISC is still better in the long run), but x86 is not well designed. Intel did try to make a CPU line that uses neither RISC nor CISC (it uses an EPIC design), called Itanium (IA-64), but that is a complete failure (yeah, it's still around), as everyone already targets ARM and x86, and nobody is targeting IA-64.

#3
Quote:(10-12-2017, 12:06 AM)Ender Wrote:

That was a pretty good post.

ARM is a lot more efficient than x86, as x86 is CISC and ARM is RISC. (You only mentioned that ARM is RISC, not that x86 is CISC.)

x86 is a lousy design, and was made pretty quickly. However, it seems that since the Pentium Pro, Intel CPUs have had a RISC-like core, with the CISC instructions broken down into smaller internal instructions (still not great). A well-designed CISC CPU could have some advantages over RISC ones (but RISC is still better in the long run), but x86 is not well designed. Intel did try to make a CPU line that uses neither RISC nor CISC (it uses an EPIC design), called Itanium (IA-64), but that is a complete failure (yeah, it's still around), as everyone already targets ARM and x86, and nobody is targeting IA-64.

Very interesting that you know how all that works. That process is tied to the CPU's microcode: instructions are split into smaller internal micro-operations (micro-ops). You can see hints of this in the Linux kernel; it contains a sort of map of the individual things it can do.

#4
Quote:(10-12-2017, 04:55 AM)Inori Wrote:

I'm trying to get up to speed with all the threads I missed and I'm really tired so a lot of this went over my head. As is pretty standard from your tutorials, I actually learned a bunch from what I grasped. Definitely going to read this again when I've had my coffee tomorrow. Nice job!

I don't know if you saw the thread, but I put up a page with a list of links to all of them. It helps a lot to have them in one place. Link in my signature.

I'm going slow with this series because, to make an assembler, you first have to know the assembly language in question. Let me know if I go too far over your head.

#5
I'm trying to get up to speed with all the threads I missed and I'm really tired so a lot of this went over my head. As is pretty standard from your tutorials, I actually learned a bunch from what I grasped. Definitely going to read this again when I've had my coffee tomorrow. Nice job!



©0Day 2016 - 2023 | All Rights Reserved.