Assembly Strings

Prahlad Godara ------ From DOOSEEP


A string is a data type that is traditionally a sequence of characters, and it is often implemented as an array of bytes (or words). In assembly language, strings are likewise used to store groups of characters, such as words and messages, or other sequences of bytes.

String Functions and Uses in Assembly Language

A string can contain any number of characters, so a program must know where it ends. Generally, the length of a string is specified by one of two methods −

  1. explicitly storing the string length
  2. using a trailing sentinel character

We can store the string length explicitly by using the $ location counter symbol, which represents the current value of the location counter. In the example below, $ points to the byte after the last character of the string variable msg, so $-msg gives the length of the string. You can also write the length as a constant in place of $-msg, for example len equ 14 (the 13 characters of Hello, world! plus the newline 0xa).

      msg db 'Hello, world!',0xa   ;our dear string
      len equ $ - msg              ;length of our dear string (14 bytes)
      ;Or, instead of the line above:
      ;len equ 14                  ;length of our dear string

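Alternatively, a trailing sentinel character can mark the end of the string. The sentinel is written after the last character of the string and must be a value that does not occur inside the string, commonly 0, for example −

      message DB 'I am loving it!', 0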
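When a string ends with a sentinel, its length has to be found at run time. The program below is a minimal sketch for 32-bit Linux, assuming the same int 0x80 system calls as the example program at the end of this section. It uses SCASB (a string instruction covered in the next section) with the REPNE prefix, which keeps scanning while the byte at EDI differs from AL, to locate the 0 byte, and then prints the string with the computed length. The labels are illustrative.

```nasm
section .data
message db 'I am loving it!', 0   ;zero-terminated string (sentinel = 0)

section .text
global _start
_start:
    cld                     ;DF = 0: scan from left to right
    mov     edi, message    ;SCASB compares AL with the byte at ES:EDI
    xor     eax, eax        ;AL = 0, the sentinel we are looking for
    mov     ecx, -1         ;ECX = 0xFFFFFFFF, effectively no limit
    repne   scasb           ;repeat while [EDI] != AL; EDI++ and ECX-- each step
    ;EDI now points one byte past the sentinel
    mov     edx, edi
    sub     edx, message    ;EDX = bytes scanned, including the 0
    dec     edx             ;exclude the sentinel: EDX = string length (15)

    mov     ecx, message    ;print the string using the computed length
    mov     ebx, 1          ;file descriptor (stdout)
    mov     eax, 4          ;system call number (sys_write)
    int     0x80

    mov     eax, 1          ;system call number (sys_exit)
    xor     ebx, ebx        ;exit status 0
    int     0x80
```

Assembled and linked with, for example, nasm -f elf32 strlen.asm -o strlen.o and ld -m elf_i386 strlen.o -o strlen (the file name is arbitrary), it prints I am loving it!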

String Instructions in Assembly

Each string instruction may require a source operand, a destination operand, or both. In 32-bit code, string instructions use the ESI and EDI registers to point to the source and destination operands respectively; in 16-bit code, the SI and DI registers are used instead.

There are five basic instructions for processing strings.

  1. MOVS − This instruction moves a byte, word or doubleword of data from one memory location to another.
  2. LODS − This instruction loads data from memory. If the operand is a byte, it is loaded into the AL register; if the operand is a word, it is loaded into the AX register; and a doubleword is loaded into the EAX register.
  3. STOS − This instruction stores data from a register (AL, AX or EAX) into memory.
  4. CMPS − This instruction compares two data items in memory. The data can be of byte, word or doubleword size.
  5. SCAS − This instruction compares the contents of a register (AL, AX or EAX) with the contents of an item in memory.
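LODS and STOS are often paired to transform a string one element at a time. The sketch below assumes 32-bit Linux (built, for example, with nasm -f elf32 and ld -m elf_i386) and uses the byte forms LODSB and STOSB to copy a string from src to dst while turning lowercase ASCII letters into uppercase. The names src and dst and the uppercase task are illustrative, not part of the original lesson.

```nasm
section .data
src     db 'hello, world!', 0xa
len     equ $ - src

section .bss
dst     resb len            ;output buffer of the same size

section .text
global _start
_start:
    cld                     ;DF = 0: ESI and EDI move forward
    mov     esi, src        ;LODSB reads from DS:ESI
    mov     edi, dst        ;STOSB writes to ES:EDI
    mov     ecx, len        ;number of bytes to process
next:
    lodsb                   ;AL = [ESI], ESI = ESI + 1
    cmp     al, 'a'
    jb      store           ;below 'a': not a lowercase letter
    cmp     al, 'z'
    ja      store           ;above 'z': not a lowercase letter
    sub     al, 0x20        ;'a'..'z' -> 'A'..'Z'
store:
    stosb                   ;[EDI] = AL, EDI = EDI + 1
    loop    next            ;ECX = ECX - 1, repeat while ECX != 0

    mov     edx, len        ;print the converted copy
    mov     ecx, dst
    mov     ebx, 1          ;stdout
    mov     eax, 4          ;sys_write
    int     0x80

    mov     eax, 1          ;sys_exit
    xor     ebx, ebx        ;exit status 0
    int     0x80
```

It prints HELLO, WORLD! followed by a newline; punctuation and the newline pass through unchanged because they fall outside the range 'a' to 'z'.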

These instructions use the ES:DI and DS:SI register pairs, where the DI and SI registers (EDI and ESI in 32-bit code) contain valid offset addresses that refer to bytes stored in memory. SI is normally associated with DS (Data Segment), although a segment override can change that, while DI is always associated with ES (Extra Segment).

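Depending on the operand size, a suffix is added to each of these instructions: B for a byte operation, W for a word operation and D for a doubleword operation. For example, MOVSB, MOVSW and MOVSD move a byte, a word and a doubleword respectively.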
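To see how the suffix changes the amount of data moved per instruction, the sketch below (32-bit Linux, assumed build with nasm -f elf32 and ld -m elf_i386) copies a 9-byte string with two MOVSD instructions followed by one MOVSB. The labels src and dst are hypothetical.

```nasm
section .data
src     db 'ABCDEFGH', 0xa  ;9 bytes: two doublewords plus one byte
len     equ $ - src

section .bss
dst     resb len

section .text
global _start
_start:
    cld                     ;DF = 0: addresses increase
    mov     esi, src        ;source: DS:ESI
    mov     edi, dst        ;destination: ES:EDI
    movsd                   ;D: copies bytes 0-3, ESI and EDI advance by 4
    movsd                   ;D: copies bytes 4-7, ESI and EDI advance by 4
    movsb                   ;B: copies byte 8 (the newline), ESI and EDI advance by 1

    mov     edx, len        ;print the copy
    mov     ecx, dst
    mov     ebx, 1          ;stdout
    mov     eax, 4          ;sys_write
    int     0x80

    mov     eax, 1          ;sys_exit
    xor     ebx, ebx        ;exit status 0
    int     0x80
```

It prints ABCDEFGH. In NASM, MOVSD written without operands is the string instruction; with operands it would instead be the SSE2 scalar double move.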

Repetition Prefix

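When the REP prefix is placed before a string instruction (for example REP MOVSB), the instruction is repeated, with CX (ECX in 32-bit code) decremented after each repetition, until CX becomes zero.
The Direction Flag (DF) determines the direction of the operation, that is, whether SI and DI are incremented or decremented after each step.

Use CLD (clear direction flag, DF = 0) to perform the operation from left to right (increasing addresses).
Use STD (set direction flag, DF = 1) to perform the operation from right to left (decreasing addresses).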
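The direction flag matters when the source and destination overlap. The sketch below (32-bit Linux, assumed build with nasm -f elf32 and ld -m elf_i386) shifts a string one byte to the right inside the same buffer. Because the destination starts after the source, it sets DF = 1 with STD and uses REP MOVSB to copy from the last byte backwards; copying forwards would overwrite bytes before they are read. The buffer name buf and its contents are illustrative.

```nasm
section .data
buf     db 'abcde_', 0xa    ;'_' is a free slot at the end
len     equ $ - buf

section .text
global _start
_start:
    ;shift 'abcde' one byte to the right; the ranges overlap,
    ;so copy from the last byte backwards
    std                     ;DF = 1: ESI and EDI are decremented
    mov     esi, buf + 4    ;last source byte ('e')
    mov     edi, buf + 5    ;last destination byte ('_')
    mov     ecx, 5          ;REP repeats MOVSB until ECX = 0
    rep     movsb
    cld                     ;restore DF = 0, which most code expects
    mov     byte [buf], '>' ;fill the freed first slot

    mov     edx, len        ;print the buffer
    mov     ecx, buf
    mov     ebx, 1          ;stdout
    mov     eax, 4          ;sys_write
    int     0x80

    mov     eax, 1          ;sys_exit
    xor     ebx, ebx        ;exit status 0
    int     0x80
```

It prints >abcde. With CLD and the pointers set to the first bytes instead (ESI = buf, EDI = buf + 1), every copied byte would be an 'a' and the output would be >aaaaa.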

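Conditional REP prefixes - REPE or REPZ: This is conditional repetition. It repeats the operation while the zero flag (ZF) indicates equal/zero, and stops when ZF indicates not equal/nonzero or when CX becomes zero. REPNE or REPNZ works the other way round: it repeats while ZF indicates not equal/nonzero, and stops when ZF indicates equal/zero or when CX becomes zero. These conditional prefixes are intended for CMPS and SCAS, the string instructions that set the flags.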
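REPE CMPSB is a common way to compare two strings of known length. The sketch below targets 32-bit Linux (assumed build with nasm -f elf32 and ld -m elf_i386) and assumes both strings have the same length; the labels are hypothetical. After REPE CMPSB stops, ZF = 1 means every compared byte matched.

```nasm
section .data
s1      db 'assembly'
s1len   equ $ - s1
s2      db 'assembly'       ;assumed to be as long as s1
same    db 'equal', 0xa
samelen equ $ - same
diff    db 'different', 0xa
difflen equ $ - diff

section .text
global _start
_start:
    cld                     ;DF = 0: compare from left to right
    mov     esi, s1         ;CMPSB compares [DS:ESI] with [ES:EDI]
    mov     edi, s2
    mov     ecx, s1len      ;maximum number of bytes to compare
    repe    cmpsb           ;repeat while bytes are equal and ECX != 0
    jne     not_equal       ;ZF = 0: stopped on a mismatch
    mov     ecx, same
    mov     edx, samelen
    jmp     print
not_equal:
    mov     ecx, diff
    mov     edx, difflen
print:
    mov     ebx, 1          ;stdout
    mov     eax, 4          ;sys_write
    int     0x80

    mov     eax, 1          ;sys_exit
    xor     ebx, ebx        ;exit status 0
    int     0x80
```

As written it prints equal; changing s2 to, say, 'assemblx' makes it print different, because the last comparison clears ZF.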

Example

The following program prints the string Hello, world! to the screen −


section	.text
	global _start       ;must be declared for the linker (ld)
_start:                     ;tell linker entry point
	mov	edx, len    ;message length
	mov	ecx, msg    ;message to write
	mov	ebx, 1	    ;file descriptor (stdout)
	mov	eax, 4	    ;system call number (sys_write)
	int	0x80        ;call kernel
	mov	ebx, 0	    ;exit status 0
	mov	eax, 1	    ;system call number (sys_exit)
	int	0x80        ;call kernel

section	.data

msg	db	'Hello, world!',0xa	;our dear string
len	equ	$ - msg			;length of our dear string

When the above code is assembled, linked and executed, it produces the following result −

 Hello, world!

