A string is a data type that is traditionally defined as a sequence of characters, and it is often implemented as an array of bytes (or words). In assembly language, strings are likewise used to store groups of characters and words, as well as large numbers.
String Functions and Uses in Assembly Language
A string can contain as many characters as required. Its length is determined by the data stored in the string variable. Generally, we specify the length of the string by either of two methods −
- explicitly storing string length
- using a sentinel character
We can store the string length explicitly using the $ location counter symbol, which represents the current value of the location counter. In the example below, $ points to the byte after the last byte of the string variable msg, so $ - msg gives the length of the string. You can also write the length as a literal in place of $ - msg. Here that is len equ 14, because the newline byte 0xa is counted along with the 13 characters of the text.
msg db 'Hello, world!',0xa ;our dear string
len equ $ - msg ;length of our dear string
;Or
len equ 14 ;length of our dear string (13 characters + newline)
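Alternatively, you can store a string with a trailing sentinel character that marks its end, instead of storing the length explicitly. The sentinel must be a value that does not appear inside the string itself. For example −
message DB 'I am loving it!', 0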
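When a string ends in a sentinel, its length has to be found at run time by scanning for that byte. The following is a minimal sketch of that idea, assuming 32-bit Linux and NASM syntax as in the rest of this chapter. The labels count and print are illustrative names, not part of the original text.

```nasm
; Sketch: find the length of a zero-terminated string, then print it.
; Assumes 32-bit Linux (int 0x80 system calls) and NASM syntax.
section .text
    global _start

_start:
    mov  ecx, message         ; ECX = address of the string
    xor  edx, edx             ; EDX = running length, starts at 0

count:
    cmp  byte [ecx + edx], 0  ; reached the sentinel byte?
    je   print                ; yes: EDX now holds the length
    inc  edx                  ; no: count this character
    jmp  count

print:
    mov  ebx, 1               ; file descriptor (stdout)
    mov  eax, 4               ; system call number (sys_write)
    int  0x80                 ; writes EDX bytes starting at ECX

    mov  eax, 1               ; system call number (sys_exit)
    xor  ebx, ebx             ; exit status 0
    int  0x80

section .data
message db 'I am loving it!', 0
```

The sentinel itself is not counted, so only the visible text is written. On a typical Linux setup, a file like this can be assembled with nasm -f elf32 and linked with ld -m elf_i386.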
String Instructions in Assembly
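Each string instruction may require a source operand, a destination operand, or both. In 32-bit segments, string instructions use the ESI and EDI registers to point to the source and destination operands, respectively. In 16-bit segments, the SI and DI registers are used for the source and destination instead.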
There are five basic instructions for processing strings.
- MOVS − This instruction moves a byte, word, or doubleword of data from one memory location to another.
- LODS − This instruction loads from memory. A byte operand is loaded into the AL register, a word operand into the AX register, and a doubleword operand into the EAX register.
- STOS − This instruction stores data from a register (AL, AX, or EAX) into memory.
- CMPS − This instruction compares two data items in memory. The data can be a byte, word, or doubleword.
- SCAS − This instruction compares the contents of a register (AL, AX, or EAX) with the contents of an item in memory.
These instructions use the ES:DI and DS:SI register pairs, where DI and SI contain valid offset addresses that refer to bytes stored in memory. SI is normally associated with DS (data segment), and DI is always associated with ES (extra segment).
Each of these instructions has a byte, word, and doubleword version, formed by adding the suffix B, W, or D to the instruction: for example, MOVSB for a byte operation, MOVSW for a word operation, and MOVSD for a doubleword operation.
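The byte versions of LODS and STOS can be combined in a simple loop that reads each character, changes it, and writes it to a second buffer. The sketch below assumes 32-bit Linux and NASM syntax. The names src, dst, len, next_char, and store are hypothetical, and the uppercase conversion is only an illustration.

```nasm
; Sketch: copy a string with LODSB/STOSB, converting a-z to A-Z on the way.
; Assumes 32-bit Linux, where DS and ES refer to the same flat address space.
section .text
    global _start

_start:
    cld                      ; DF = 0: ESI and EDI increase after each byte
    mov  esi, src            ; source offset (DS:ESI)
    mov  edi, dst            ; destination offset (ES:EDI)
    mov  ecx, len            ; number of bytes to process

next_char:
    lodsb                    ; AL = byte at [ESI], then ESI = ESI + 1
    cmp  al, 'a'
    jb   store               ; below 'a': leave unchanged
    cmp  al, 'z'
    ja   store               ; above 'z': leave unchanged
    sub  al, 32              ; 'a'..'z' becomes 'A'..'Z'
store:
    stosb                    ; byte at [EDI] = AL, then EDI = EDI + 1
    loop next_char           ; ECX = ECX - 1, repeat while ECX is not 0

    mov  edx, len            ; number of bytes to write
    mov  ecx, dst            ; converted copy
    mov  ebx, 1              ; file descriptor (stdout)
    mov  eax, 4              ; system call number (sys_write)
    int  0x80

    mov  eax, 1              ; system call number (sys_exit)
    xor  ebx, ebx            ; exit status 0
    int  0x80

section .data
src  db 'hello, world!', 0xa
len  equ $ - src

section .bss
dst  resb len                ; uninitialized destination buffer
```

Replacing lodsb and stosb with the W or D forms would process two or four bytes per step, in which case the count in ECX would need to be given in words or doublewords rather than bytes.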
Repetition Prefixes
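The REP prefix, when placed before a string instruction (for example, REP MOVSB), repeats that instruction based on a counter held in the CX register (ECX in 32-bit code). After each execution, CX is decremented by one, and the repetition stops when CX becomes zero.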
The Direction Flag (DF) sets the direction of the operation.
Use CLD (clear direction flag, DF = 0) to perform the operation from left to right, that is, toward higher addresses.
Use STD (set direction flag, DF = 1) to perform the operation from right to left, that is, toward lower addresses.
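The REP prefix and the direction flag can be seen working together in a block copy. The following is a minimal sketch, assuming 32-bit Linux and NASM syntax. The names src, dst, and len are illustrative.

```nasm
; Sketch: copy len bytes from src to dst with a single REP MOVSB.
; Assumes 32-bit Linux, where DS and ES refer to the same flat address space.
section .text
    global _start

_start:
    cld                      ; DF = 0: ESI and EDI increase after each byte
    mov  esi, src            ; source offset
    mov  edi, dst            ; destination offset
    mov  ecx, len            ; repeat count for REP
    rep  movsb               ; copy ECX bytes from [ESI] to [EDI]

    mov  edx, len            ; number of bytes to write
    mov  ecx, dst            ; buffer holding the copy
    mov  ebx, 1              ; file descriptor (stdout)
    mov  eax, 4              ; system call number (sys_write)
    int  0x80

    mov  eax, 1              ; system call number (sys_exit)
    xor  ebx, ebx            ; exit status 0
    int  0x80

section .data
src  db 'Hello, world!', 0xa
len  equ $ - src

section .bss
dst  resb len                ; uninitialized destination buffer
```

With STD instead of CLD, ESI and EDI would have to start at the last byte of each buffer, because both registers are then decremented after every byte.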
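The REP prefix also has the following conditional variations:
- REPE or REPZ − This is a conditional repeat. It repeats the operation while the zero flag indicates equal/zero, and stops when ZF indicates not equal/not zero or when CX reaches zero.
- REPNE or REPNZ − This is also a conditional repeat. It repeats the operation while the zero flag indicates not equal/not zero, and stops when ZF indicates equal/zero or when CX reaches zero.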
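A conditional repeat is typically paired with CMPS or SCAS. The sketch below uses REPE CMPSB to compare two strings of equal length, assuming 32-bit Linux and NASM syntax. The labels and the two sample strings are hypothetical.

```nasm
; Sketch: compare two equal-length strings with REPE CMPSB.
; Assumes 32-bit Linux, where DS and ES refer to the same flat address space.
section .text
    global _start

_start:
    cld                      ; DF = 0: compare toward higher addresses
    mov  esi, s1             ; first string
    mov  edi, s2             ; second string
    mov  ecx, len            ; compare at most len bytes
    repe cmpsb               ; stop at the first mismatch or when ECX = 0
    jne  differ              ; ZF = 0: the last pair compared was different

    mov  ecx, msg_eq         ; all bytes matched
    mov  edx, len_eq
    jmp  print

differ:
    mov  ecx, msg_ne
    mov  edx, len_ne

print:
    mov  ebx, 1              ; file descriptor (stdout)
    mov  eax, 4              ; system call number (sys_write)
    int  0x80

    mov  eax, 1              ; system call number (sys_exit)
    xor  ebx, ebx            ; exit status 0
    int  0x80

section .data
s1      db 'Hello, world!'
s2      db 'Hello, there!'
len     equ $ - s2           ; both strings have this many bytes
msg_eq  db 'Strings are equal', 0xa
len_eq  equ $ - msg_eq
msg_ne  db 'Strings differ', 0xa
len_ne  equ $ - msg_ne
```

The branch after REPE CMPSB tests ZF rather than ECX, because ECX also reaches zero when the only mismatch is in the final byte.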
Example
The following program prints the string Hello, world! to the screen −
section .text
global _start ;must be declared for the linker (ld)
_start: ;tell linker entry point
mov edx, len ;message length
mov ecx, msg ;message to write
mov ebx, 1 ;file descriptor (stdout)
mov eax, 4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax, 1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!',0xa ;our dear string
len equ $ - msg ;length of our dear string
When the above code is assembled, linked, and executed, it produces the following result −
Hello, world!