You touched on quite a few things in your question, so I'll put in my 2 cents to try and make it all a bit clearer. Let's look at an in-order MIPS architecture as an example - it features all of the things you mention except the variable-length instructions.
Many MIPS CPUs have 5-stage pipelines with the stages `IF -> ID -> EX -> MEM -> WB`. Let's first look at instructions for which each of these stages generally takes a single clock-cycle (this might not be the case on cache misses, for example): SW (store word to memory), BNEZ (branch on not zero) and ADD (add two registers and store the result to a register). Not all of these instructions have useful work to do in every pipe stage. For example, SW has nothing to do in the WB stage, BNEZ can finish as early as the ID stage (that's the earliest point at which the target address can be computed), and ADD has no work in the MEM stage.
Regardless of that, each of these instructions goes through every stage of the pipeline even when it has no work in some of them. The instruction occupies the stage, but no actual work is done there (e.g. no result is written to a register in the WB stage for SW). In other words, there are no stalls in this case.
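To make the timing concrete, here is a minimal sketch (assuming an ideal single-issue pipeline with no stalls and no cache misses; the `pipeline_diagram` helper is mine, not anything standard) of how each instruction enters IF one cycle after the previous one and marches through every stage:

```python
# Idealized 5-stage in-order pipeline: one instruction fetched per cycle,
# each instruction advancing exactly one stage per cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instructions):
    """Return {instr_index: {stage: cycle}} for an ideal 5-stage pipeline."""
    schedule = {}
    for i, _ in enumerate(instructions):
        # Instruction i is fetched at cycle i (0-based); stage s happens at cycle i + s.
        schedule[i] = {stage: i + s for s, stage in enumerate(STAGES)}
    return schedule

sched = pipeline_diagram(["SW", "BNEZ", "ADD"])
# SW still occupies WB (cycle 4 here) even though it writes nothing back.
```

Note how every instruction occupies all five stages; "no work to do" just means the stage is a bubble for that instruction, not that it is skipped.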
Moving on to more complex instructions, such as MUL or DIV, whose EX stage can take tens of cycles, things get much trickier. Instructions can now complete out of order even though they are always fetched in order, which means WAW (write-after-write) hazards become possible. Take the following example:
    MUL R1, R10, R11
    ADD R2, R5, R6
MUL is fetched first and reaches the EX stage before ADD, but ADD completes long before MUL does, since MUL's EX stage runs for more than 10 clock-cycles. Still, the pipeline is never stalled in this sequence: the two instructions touch disjoint registers, so neither RAW nor WAW hazards are possible. Take another example:
    MUL R1, R10, R11
    ADD R1, R5, R6
Now both MUL and ADD write the same register. Since ADD completes much earlier than MUL, it reaches WB first and writes its result; later, MUL does the same and R1 ends up with the wrong (stale) value. This is where a pipeline ***stall*** is needed. One way to solve it is to prevent ADD from issuing (moving from ID to EX) until MUL enters the MEM stage; that's done by freezing, or stalling, the pipeline. Introducing floating-point operations leads to similar problems in the pipeline.
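The stall above can be sketched as a timing calculation (assuming an illustrative 10-cycle EX latency for MUL and single-issue, in-order issue from ID; these numbers are for illustration, not real MIPS latencies):

```python
# Illustrative latency: MUL spends 10 cycles in EX, simple ops spend 1.
MUL_EX_CYCLES = 10

def schedule_waw_pair():
    """Cycle numbers for MUL R1,... followed by ADD R1,... with the WAW stall applied."""
    mul = {"IF": 0, "ID": 1}
    mul["EX_start"] = 2
    mul["EX_end"] = mul["EX_start"] + MUL_EX_CYCLES - 1   # cycle 11
    mul["MEM"] = mul["EX_end"] + 1                        # cycle 12
    mul["WB"] = mul["MEM"] + 1                            # cycle 13

    add = {"IF": 1, "ID": 2}
    # ADD is frozen in ID until the cycle MUL enters MEM, so ADD's write of R1
    # in WB is guaranteed to land after MUL's.
    add["EX"] = mul["MEM"]                                # issues at cycle 12
    add["MEM"] = add["EX"] + 1
    add["WB"] = add["MEM"] + 1                            # cycle 14 > MUL's WB
    return mul, add
```

Without the stall, ADD's WB would land around cycle 5, MUL would overwrite R1 at cycle 13, and R1 would hold MUL's result instead of ADD's.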
I'll complete my answer by touching on fixed-length vs. variable-length instruction formats (even though you didn't explicitly ask about them). MIPS (and most RISC) CPUs use a fixed-length encoding. This tremendously simplifies the implementation of a CPU pipeline, as instructions can be decoded and their input registers read within a single cycle (assuming the register fields sit at fixed positions in the instruction word, which is true for MIPS). Fetching is simplified as well: since instructions are always the same length, there's no need to start decoding an instruction just to find out where it ends.
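Here's what "register fields at fixed positions" buys you in practice. The bit layout below is the real MIPS R-type format (opcode in bits 31:26, rs in 25:21, rt in 20:16); the encoded word is hand-built for illustration:

```python
def source_regs(word):
    """Extract the two source registers from a 32-bit MIPS instruction word.

    rs and rt live at fixed bit positions, so the register file can be read
    in the same cycle as decode, before the opcode is even fully examined.
    """
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    return rs, rt

# ADD R1, R10, R11 as an R-type word: opcode=0, rs=10, rt=11, rd=1, funct=0x20
add_word = (10 << 21) | (11 << 16) | (1 << 11) | 0x20
```

On a variable-length ISA like x86, by contrast, you can't even tell where the operand fields are until you've partially decoded the prefixes and opcode.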
There are of course disadvantages: it's harder to generate compact binaries, which leads to larger programs, which in turn leads to poorer instruction-cache performance. Memory traffic increases as well, since more bytes have to be moved to and from memory, which can matter on energy-efficient platforms.
These disadvantages have led some RISC architectures to define a 16-bit instruction-length mode (MIPS16 or ARM Thumb), or even a variable-length instruction set (ARM Thumb2 has 16-bit and 32-bit instructions). Unlike x86, Thumb2 was designed so that an instruction's length can be determined quickly, so it's still easy for CPUs to decode.
These compacted ISAs often require more instructions to implement the same program, but take less total space and run faster if code-fetch is more of a bottleneck than instruction throughput in the pipeline. (Small / nonexistent instruction cache, and/or reading from a ROM in an embedded CPU).
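To show just how cheap Thumb2 length determination is compared to x86: only the top five bits of the first halfword need to be examined (per the ARM architecture manual, the values `0b11101`, `0b11110` and `0b11111` mark a 32-bit encoding; everything else is a 16-bit instruction):

```python
def thumb2_length(first_halfword):
    """Return the byte length (2 or 4) of a Thumb2 instruction from its first halfword.

    Bits [15:11] of the first halfword fully determine the length: the three
    values 0b11101, 0b11110, 0b11111 introduce a 32-bit encoding.
    """
    return 4 if (first_halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2

# 0xF000... prefixes (e.g. BL) are 32-bit; 0x2001 (MOVS r0, #1) is 16-bit.
```

Five bits of one halfword vs. x86's walk through optional prefixes, opcode bytes, ModRM, SIB and displacement fields: that's the difference between trivially parallel fetch/decode and the decoder complexity x86 CPUs have to carry.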