x86_64 Assembler + C = One Love

In this article I will describe the process of calling C functions from assembler.
Let’s try to call printf("Hello, world!\n"); and exit(0);

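; build: nasm -f elf64 hello.asm && gcc -no-pie hello.o -o hello
; (-no-pie lets the absolute "mov rdi, message" and the plain "call printf" link)
section .rodata
    message: db "Hello, world!", 10, 0

section .text
    extern printf
    extern exit
    global main

main:
    push rbp            ; align the stack to 16 bytes before calling C functions
    xor rax, rax        ; al = 0: printf gets no arguments in vector registers
    mov rdi, message    ; first argument: pointer to the string
    call printf
    xor rdi, rdi        ; first argument: exit code 0
    call exit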

It is all much simpler than it seems. In the .rodata section we describe the static data, in this case the string “Hello, world!”; 10 is the newline character, and we must not forget to terminate the string with a zero byte (the final 0), as C functions expect.

In the code section we declare the external functions printf and exit from the C standard library (stdio and stdlib), and also declare main as the global entry function:

section .text
    extern printf
    extern exit
    global main

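Before calling a variadic function such as printf, al (the low byte of rax) must hold the number of vector registers used to pass arguments; here there are none, so it is 0. mov rax, 0 would also work, but xor rax, rax is shorter and is the usual way to zero a register. Next, the first argument, a pointer to the string, goes into rdi: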

    mov rdi, message

Then we call the external C function printf:

    call printf

By analogy, we pass 0 as the first argument (rdi) and call exit:

    xor rdi, rdi
    call exit
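For comparison, a minimal C++ sketch of the same printf call through the C library. The comments show where each value goes under the System V AMD64 calling convention used on Linux (first integer/pointer arguments in rdi, rsi, rdx, rcx, r8, r9; integer result in eax). The function name `greet` is hypothetical, exit is left out so the function can return, and the `"%s"` format string is an addition that moves the message into the second argument (rsi).

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper doing what the assembly does before `call exit`.
int greet() {
    static const char message[] = "Hello, world!\n";
    // On x86_64 Linux: rdi = address of "%s", rsi = message,
    // al = 0 (no vector registers used by the variadic arguments).
    int written = std::printf("%s", message);
    assert(written >= 0);  // printf returns a negative value on error
    return written;        // an int result comes back in eax
}
```

Compiling such a function with `g++ -O1 -S` and reading the generated assembly is a simple way to compare the compiler's register assignment with the hand-written version.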

As the Elves say:
Who does not listen
He eats plov @Alexander Pelevin



Source Code



Hello World x86_64 Assembly

In this article I will describe setting up an IDE and writing a first Hello World in x86_64 assembly for the Ubuntu Linux operating system.
Let’s start by installing the SASM IDE and the nasm assembler:

sudo apt install sasm nasm

Next, launch SASM and write Hello World:

global main

section .text

main:
    mov rbp, rsp      ; for correct debugging
    mov rax, 1        ; write(
    mov rdi, 1        ;   STDOUT_FILENO,
    mov rsi, msg      ;   "Hello, world!\n",
    mov rdx, msglen   ;   sizeof("Hello, world!\n")
    syscall           ; );

    mov rax, 60       ; exit(
    mov rdi, 0        ;   EXIT_SUCCESS
    syscall           ; );

section .rodata
    msg: db "Hello, world!", 10
    msglen: equ $-msg

The Hello World code is taken from James Fisher’s blog and adapted for assembling and debugging in SASM. The SASM documentation states that the entry point must be a function named main; otherwise debugging and compiling do not work correctly.
What does this code do? It makes syscall calls, which are requests to the Linux kernel, passing the arguments in registers and a pointer to a string in the data section.

Zoom Enhance

Let’s look at the code in detail:

global main

global is an assembler directive that exports symbols by name, so that other object files and the linker can see them. A good analogy is a declaration in a C/C++ header file. In this case we export the main symbol for the entry function.

section .text

section is an assembler directive that defines sections (segments) of the program; the section and segment directives are equivalent. Code is placed in the .text section.

main:

This label marks the beginning of the main function. In assembly, functions are called subroutines.

mov rbp, rsp

The first machine instruction, mov, copies the value of its second operand into its first operand. In this case we copy the value of the rsp register into rbp. From the comment you can tell that SASM adds this line to simplify debugging; apparently it is a private arrangement between SASM and the gdb debugger.

Next, the code up to the .rodata data section contains two syscall calls: the first prints the Hello World string, the second exits the application with the exit code 0.

Imagine that the registers are variables named rax, rdi, rsi, rdx, r10, r8, r9. By analogy with a high-level language, if we turn the vertical assembly listing into a horizontal call, syscall looks like this:

syscall(rax, rdi, rsi, rdx, r10, r8, r9)

Then the call that prints the text is:

syscall(1, 1, msg, msglen)

And the call to exit with the code 0:

syscall(60, 0)
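The same horizontal view can be written in C++ with glibc’s `syscall()` wrapper, which takes the syscall number first and then the arguments in order. A minimal sketch, assuming Linux and glibc; `write_hello` is a hypothetical name, and the exit call (`syscall(SYS_exit, 0)`) is omitted because it would end the process.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/syscall.h>  // SYS_write (Linux)
#include <unistd.h>       // syscall() wrapper (glibc)

static const char msg[] = "Hello, world!\n";
// Like `msglen: equ $-msg`, but minus the '\0' that C appends to literals.
static const std::size_t msglen = sizeof msg - 1;

// syscall(rax = SYS_write, rdi = fd, rsi = msg, rdx = msglen)
long write_hello(int fd) {
    assert(fd >= 0);
    return syscall(SYS_write, fd, msg, msglen);  // bytes written, or -1 on error
}
```

Calling `write_hello(1)` corresponds to syscall(1, 1, msg, msglen) above, since SYS_write is 1 on x86_64 and 1 is the standard output descriptor.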

Let’s look at the arguments in more detail. In the header file asm/unistd_64.h we find __NR_write, which is 1, and then look up the arguments of write in the documentation:
ssize_t write(int fd, const void *buf, size_t count);

The first argument is the file descriptor, the second is the buffer with the data, and the third is the number of bytes to write to the descriptor. We look up the file descriptor number for standard output: the manual on stdout gives 1. The rest is a small matter: pass a pointer to the Hello World string buffer in the .rodata section (msg) and the byte count (msglen), put the arguments into rax, rdi, rsi, rdx in the correct order, and call syscall.

Defining a constant for the string length is described in the nasm manual:

message db 'hello, world'
msglen equ $-message

Simple enough, right?



Source Code



Hash Table

A hash table is a data structure that implements an associative array (dictionary) with an average time of O(1) for insertion, deletion, and lookup.

Below is an example of a simple hash map implementation in Node.js:

How does it work? Watch closely:

  • Inside the hash map is an array
  • Each element of the array holds a pointer to the first node of a linked list
  • Memory is allocated for the array of pointers (e.g. 65,535 cells)
  • We implement a hash function that takes the dictionary key as input and, whatever it does internally, finally returns an array index

How writing works:

  • The input is a key–value pair
  • The hash function returns an index
  • Get the linked list node from the array at that index
  • Check whether its key matches
  • If it matches, replace the value
  • If not, move on to the next node until we either find the node with the matching key or reach the end of the list
  • If no such node is found, create one at the end of the linked list

How key lookup works:

  • The input is a key
  • The hash function returns an index
  • Get the linked list node from the array at that index
  • Check whether its key matches
  • If it matches, return the value
  • If not, move on to the next node until we either find the node with the matching key or reach the end of the list

Why do we need a linked list inside the array? Because the hash function can produce collisions: several different key–value pairs may end up at the same array index. In that case the linked list is extended, and lookups walk it to find the required key.
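The original Node.js example is not reproduced here, so below is a minimal C++ sketch of the same scheme (separate chaining). The djb2 hash function and the `HashMap` class are illustrative assumptions, not the author’s code; deletion is left out, as in the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Sketch of a string -> int hash map with separate chaining.
class HashMap {
    struct Node {
        std::string key;
        int value;
        std::unique_ptr<Node> next;  // next node of the bucket's linked list
    };

    std::vector<std::unique_ptr<Node>> buckets;  // each cell: first node or null

    // djb2 string hash (an illustrative choice), reduced to an array index
    std::size_t index(const std::string &key) const {
        unsigned long hash = 5381;
        for (unsigned char c : key) hash = hash * 33 + c;
        return hash % buckets.size();
    }

public:
    explicit HashMap(std::size_t cells = 65535) : buckets(cells) {
        assert(cells > 0);
    }

    // Write: replace the value if the key is found, else append a new node.
    void put(const std::string &key, int value) {
        std::unique_ptr<Node> *link = &buckets[index(key)];
        while (*link) {
            if ((*link)->key == key) {
                (*link)->value = value;
                return;
            }
            link = &(*link)->next;
        }
        link->reset(new Node{key, value, nullptr});  // end of the list
    }

    // Lookup: walk the list at the key's index; null if the key is absent.
    const int *get(const std::string &key) const {
        for (const Node *node = buckets[index(key)].get(); node; node = node->next.get())
            if (node->key == key) return &node->value;
        return nullptr;
    }
};
```

With a single cell every key collides into the same list, which is a quick way to check that chaining works.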



Source Code



Resources access through NDK C++ Android

There are several options for working with resources on Android through the NDK in C++:

  1. Access the resources inside the apk file using AssetManager
  2. Download resources from the Internet, extract them into the application directory, and use them with standard C++ methods
  3. A combined method: access the resource archive in the apk through AssetManager, unpack it into the application directory, then use it with standard C++ techniques

Next, I will describe the combined access method, using the Flame Steel Engine game engine as an example.
When using SDL, access to the resources in the apk becomes easier: the library wraps the calls to AssetManager and offers an interface similar to stdio (fopen, fread, fclose, etc.)

SDL_RWops *io = SDL_RWFromFile("files.fschest", "r");

After loading the file from the apk into a buffer, you need to change the current working directory to the application directory, which the application can access without additional permissions. For this we use a wrapper over SDL:


Next, write the file from the buffer to the current working directory using fopen, fwrite, fclose. Once the archive is available in a directory accessible to C++, unpack it. Zip archives can be extracted with a combination of two libraries, minizip and zlib: the first handles the archive structure with its files, the second decompresses the data.
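A minimal C++ sketch of the step that writes the buffer to a file in the current working directory with fopen, fwrite and fclose. The function name `saveBufferToFile` is hypothetical; the buffer would hold the archive bytes read from the apk (for example with SDL_RWread), which this sketch does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes `size` bytes from `data` to `path`.
// Returns true only if every byte was written and the file closed cleanly.
bool saveBufferToFile(const char *path, const void *data, std::size_t size) {
    assert(path != nullptr && data != nullptr);
    std::FILE *file = std::fopen(path, "wb");  // binary mode: bytes are written as-is
    if (!file) return false;
    const std::size_t written = std::fwrite(data, 1, size, file);
    const bool closed = std::fclose(file) == 0;  // fclose also flushes the data
    return written == size && closed;
}
```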
For more control, simplicity, and portability, I implemented my own archive format without compression, called FSChest (Flame Steel Chest). The format supports archiving a directory of files and unpacking it; folder hierarchy is not supported, so it works only with files.
Include the FSChest library header and unpack the archive:

#include "fschest.h" 
FSCHEST_extractChestToDirectory(archivePath, SDL_AndroidGetInternalStoragePath()); 

After unpacking, the files from the archive are available through the standard C/C++ interfaces. This way I did not have to rewrite all the file handling in the engine; I only added unpacking of the files at startup.



Source Code



Stack Machine and RPN

Let’s say we need to implement a simple bytecode interpreter. Which approach should we choose?

The stack data structure makes it possible to implement a simple bytecode machine. The features and implementation of stack machines are described in numerous articles, both in English and in Russian, so I will just mention that the Java virtual machine is an example of a stack machine.

The principle of operation is simple: the input is a program containing data and opcodes, and the necessary operations are implemented through stack manipulations. Consider an example bytecode program for my stack machine:

пMVkcatS olleHП

At the output we get the string “Hello StackVM”. The stack machine reads the program from left to right, character by character, pushing data onto the stack; when an opcode character appears, it executes the corresponding command using the stack.
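The Node.js implementation is not shown here, so this C++ sketch is one possible interpretation of the example program: it assumes `п` is pushed like ordinary data and marks where printing stops, and that `П` pops and prints characters until it reaches that marker. Under that assumption popping reverses the pushed text, which is why the program is written backwards. The function name `runStackVM` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed opcode meanings (see the lead-in above).
constexpr char32_t OP_PRINT_END = U'п';    // pushed as a marker
constexpr char32_t OP_PRINT_START = U'П';  // pop and print down to the marker

std::u32string runStackVM(const std::u32string &program) {
    std::vector<char32_t> stack;
    std::u32string output;
    for (char32_t symbol : program) {  // read the program left to right
        if (symbol == OP_PRINT_START) {
            while (!stack.empty() && stack.back() != OP_PRINT_END) {
                output += stack.back();  // popping yields the reverse order
                stack.pop_back();
            }
            assert(!stack.empty() && "missing print-end marker");
            if (!stack.empty()) stack.pop_back();  // drop the marker itself
        } else {
            stack.push_back(symbol);  // data: push onto the stack
        }
    }
    return output;
}
```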

An example implementation of the stack machine in Node.js:

Reverse Polish Notation (RPN)

A stack machine is also easy to use for implementing calculators; this is done with RPN (postfix notation).
An example of an expression in conventional infix notation:

2 * 2 + 3 * 4

Converted into RPN:

2 2 * 3 4 * +
To evaluate the postfix expression, we use the stack machine:
2 – push onto the stack (stack: 2)
2 – push onto the stack (stack: 2, 2)
* – pop the top of the stack twice, multiply, push the result onto the stack (stack: 4)
3 – push onto the stack (stack: 4, 3)
4 – push onto the stack (stack: 4, 3, 4)
* – pop the top of the stack twice, multiply, push the result onto the stack (stack: 4, 12)
+ – pop the top of the stack twice, add, push the result onto the stack (stack: 16)
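The evaluation steps above as a minimal C++ sketch, assuming space-separated tokens, integer operands and the four binary operators; `evaluateRPN` is a hypothetical name, not the API of the author’s project.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates space-separated RPN with integer operands and + - * /.
long evaluateRPN(const std::string &expression) {
    std::vector<long> stack;
    std::istringstream tokens(expression);
    std::string token;
    while (tokens >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            if (stack.size() < 2) throw std::invalid_argument("stack underflow");
            const long b = stack.back();  // take the top of the stack twice
            stack.pop_back();
            const long a = stack.back();
            stack.pop_back();
            if (token == "+") stack.push_back(a + b);
            else if (token == "-") stack.push_back(a - b);
            else if (token == "*") stack.push_back(a * b);
            else {
                if (b == 0) throw std::domain_error("division by zero");
                stack.push_back(a / b);
            }
        } else {
            stack.push_back(std::stol(token));  // operand: push onto the stack
        }
    }
    if (stack.size() != 1) throw std::invalid_argument("malformed expression");
    return stack.back();  // the result stays on the stack
}
```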

As you can see, the result of the operations, 16, remains on the stack; it can be output by implementing stack-printing opcodes, for example:

П is the opcode that starts printing the stack, п is the opcode that closes printing and sends the final string to rendering.
To convert arithmetic expressions from infix to postfix notation, Edsger Dijkstra’s algorithm called the “shunting-yard algorithm” is used. An example implementation can be found above or in the Node.js stack machine project repository below.
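A minimal C++ sketch of the shunting-yard algorithm, assuming non-negative integer operands, left-associative `+ - * /` and parentheses; it is not the author’s Node.js implementation, and `toRPN` is a hypothetical name. Operands go straight to the output, while operators wait on a stack until an operator of lower precedence or a closing parenthesis forces them out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Precedence of the supported left-associative operators (0 = not an operator).
int precedence(char op) {
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;
}

std::string toRPN(const std::string &infix) {
    std::vector<std::string> output;
    std::vector<char> operators;  // the operator stack
    auto popOperator = [&] {
        output.push_back(std::string(1, operators.back()));
        operators.pop_back();
    };
    for (std::size_t i = 0; i < infix.size(); ++i) {
        const char c = infix[i];
        if (std::isspace(static_cast<unsigned char>(c))) continue;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            std::size_t end = i;
            while (end < infix.size() && std::isdigit(static_cast<unsigned char>(infix[end]))) ++end;
            output.push_back(infix.substr(i, end - i));  // numbers go straight out
            i = end - 1;
        } else if (c == '(') {
            operators.push_back(c);
        } else if (c == ')') {
            while (!operators.empty() && operators.back() != '(') popOperator();
            if (operators.empty()) throw std::invalid_argument("unbalanced ')'");
            operators.pop_back();  // discard the '('
        } else if (precedence(c) > 0) {
            // pop operators of greater or equal precedence (left associativity)
            while (!operators.empty() && precedence(operators.back()) >= precedence(c)) popOperator();
            operators.push_back(c);
        } else {
            throw std::invalid_argument("unexpected character");
        }
    }
    while (!operators.empty()) {
        if (operators.back() == '(') throw std::invalid_argument("unbalanced '('");
        popOperator();
    }
    std::string result;
    for (const std::string &token : output) result += (result.empty() ? "" : " ") + token;
    return result;
}
```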



Source Code



Skeletal Animation (Part 2 – Node Hierarchy, Interpolation)

This article continues the description of the skeletal animation algorithm and its implementation in the Flame Steel Engine game engine.

Because this algorithm is the most complex of everything I have implemented, errors may creep into these development notes. In the previous article about this algorithm I made a mistake: the bone array is passed to the shader for each mesh separately, not for the entire model.

Node Hierarchy

For the algorithm to work correctly, the model must contain the links between its bones (a graph). Imagine a situation in which two animations play at once: a jump and raising the right hand. The jump animation should raise the model along the Y axis, and the hand-raising animation should take this into account and rise together with the model during the jump; otherwise the hand will stay in place on its own.

In this case the relationship of the nodes is: the body contains the hand. The algorithm reads the bone graph, so all animations are applied with the correct connections. In memory, the model graph is stored separately from all animations, solely to reflect how the model’s bones are connected.

Interpolation on CPU

In the previous article I described the principle of rendering skeletal animation: “transformation matrices are transferred from the CPU to the shader when rendering each frame.”

Each rendered frame is processed on the CPU: for each bone of a mesh, the engine obtains a final transformation matrix by interpolating position, rotation, and scale. During the final interpolation, the bone matrix is produced by traversing the node tree for all active animation nodes; the final matrix is multiplied by the parent’s matrix and then sent for rendering in the vertex shader.
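A minimal C++ sketch of the node hierarchy traversal described above: each node’s final matrix is its parent’s final matrix multiplied by its own (already interpolated) local matrix. The row-major `Mat4` type, the `Node` layout with child indices and the function names are assumptions for illustration, not Flame Steel Engine code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // row-major 4x4 matrix

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[3] = x;  // row-major: the translation lives in the last column
    m[7] = y;
    m[11] = z;
    return m;
}

Mat4 multiply(const Mat4 &a, const Mat4 &b) {
    Mat4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

// Hypothetical node: interpolated local transform plus child indices.
struct Node {
    Mat4 local{};
    std::vector<std::size_t> children;
};

// Depth-first walk: final = parent final * local, then recurse into children.
void computeFinalMatrices(const std::vector<Node> &nodes, std::size_t index,
                          const Mat4 &parentFinal, std::vector<Mat4> &finals) {
    assert(index < nodes.size() && finals.size() == nodes.size());
    finals[index] = multiply(parentFinal, nodes[index].local);
    for (std::size_t child : nodes[index].children)
        computeFinalMatrices(nodes, child, finals[index], finals);
}
```

With a body node translated up by 1 (the jump) and a hand node translated by 1 along X relative to the body, the hand’s final translation is (1, 1, 0), so the hand rises together with the body.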

Vectors are used to interpolate position and scale, while quaternions are used for rotation, because, unlike Euler angles, they are very easy to interpolate (SLERP) and very easy to convert into a transformation matrix.
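A minimal C++ sketch of the two kinds of interpolation: linear interpolation for position and scale vectors, and SLERP for unit rotation quaternions. The `Vec3` and `Quat` types are assumptions. The sketch negates one quaternion when their dot product is negative to take the shorter arc, and falls back to linear interpolation when the two rotations are almost equal, to avoid dividing by a value close to zero.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };

// Linear interpolation, used for position and scale keyframes.
Vec3 lerp(const Vec3 &a, const Vec3 &b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// Spherical linear interpolation between two unit quaternions.
Quat slerp(const Quat &a, Quat b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    float d = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
    if (d < 0.0f) {  // q and -q are the same rotation: take the shorter arc
        b = { -b.w, -b.x, -b.y, -b.z };
        d = -d;
    }
    float wa, wb;
    if (d > 0.9995f) {  // nearly equal: plain lerp avoids dividing by ~0
        wa = 1.0f - t;
        wb = t;
    } else {
        const float theta = std::acos(d);  // angle between the quaternions
        const float s = std::sin(theta);
        wa = std::sin((1.0f - t) * theta) / s;
        wb = std::sin(t * theta) / s;
    }
    Quat r{ wa * a.w + wb * b.w, wa * a.x + wb * b.x, wa * a.y + wb * b.y, wa * a.z + wb * b.z };
    const float len = std::sqrt(r.w * r.w + r.x * r.x + r.y * r.y + r.z * r.z);
    return { r.w / len, r.x / len, r.y / len, r.z / len };  // keep it unit length
}
```

Halfway between the identity rotation and a 90° rotation about Z, the sketch returns the 45° rotation about Z, approximately (0.9239, 0, 0, 0.3827).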

How to simplify the implementation

To simplify debugging the vertex shader, I added a CPU simulation of the vertex shader, enabled with the FSGLOGLNEWAGERENDERER_CPU_BASED_VERTEX_MODS_ENABLED macro. NVIDIA offers Nsight, a tool for debugging shader code on its graphics cards; perhaps it can also simplify the development of complex vertex/pixel shader algorithms, but I have not had the opportunity to test it, as the CPU simulation was enough.

In the next article I plan to describe blending multiple animations and to fill in the remaining gaps.




Add JavaScript Support For C++

In this article I will describe how to add support for JavaScript scripts to a C++ application using the Tiny-JS library.

Tiny-JS is a library for embedding in C++ that executes JavaScript code and supports bindings (the ability to call C++ code from a script).

At first I wanted to use the popular ChaiScript or Duktape libraries, or to include Lua, but because of dependencies and possible difficulties porting to different platforms, I decided to look for a simple, minimal but powerful JS library under the MIT license, and Tiny-JS meets these criteria. The only drawback of this library is the lack of support and development from its author, but its code is simple enough that you can take over support yourself if required.

Download Tiny-JS from the repository:

Next, include the Tiny-JS headers:

#include "tiny-js/TinyJS.h" 
#include "tiny-js/TinyJS_Functions.h" 

Add the TinyJS .cpp files to the build, and then you can start writing the code that loads and runs scripts.

Library usage examples are available in the repository:

An example of a class handler implementation can be found in the SpaceJaguar project:

An example of a game script integrated into the application:




Linux to iOS C++ Cross Compile

In this article I will describe building a C++ SDL iOS app on Linux, re-signing the ipa file without a paid Apple Developer Program subscription, and installing it on a clean device (iPad) via macOS without a jailbreak.

First, install the build toolchain on Linux:

The toolchain needs to be downloaded from the repository; follow the instructions on the Godot Engine site to complete the installation:

At this point you need to download the Xcode dmg and copy the iOS SDK from it to build cctools-port. This stage is easier on macOS: simply copy the necessary SDK files from the installed Xcode. After a successful build, the terminal will show the path to the cross-compiler bin directory.

You can then proceed to building the SDL application for iOS. Open the CMake file and add the changes needed to build the C++ code:

SET(CMAKE_C_COMPILER arm-apple-darwin11-clang)
SET(CMAKE_CXX_COMPILER arm-apple-darwin11-clang++)
SET(CMAKE_LINKER arm-apple-darwin11-ld)

Now you can build with cmake and make, but do not forget to add the cross-compiler bin directory to $PATH:


For correct linking with the frameworks and SDL, add them to CMake; for example, the dependencies of the Space Jaguar game:


In my case, the SDL, SDL_image, and SDL_mixer libraries were compiled in advance in Xcode on macOS for static linking, and the frameworks were copied from Xcode. I also added the libclang_rt.ios.a library, which contains iOS-specific runtime calls, e.g. isOSVersionAtLeast. I enabled the macros for working with OpenGL ES and disabled features unsupported on mobile platforms, similar to Android.

After resolving the build problems, you should get a binary compiled for ARM. Next, I will describe installing and running the binary on the device without a jailbreak.

On macOS, install Xcode and register on Apple’s website, without paying for the developer program. Add the account in Xcode -> Preferences -> Accounts, create an empty application, and build it for a real device. During the build, the device will be added to the free developer account. After building and running, you need to archive the build: select Generic iOS Device, then Product -> Archive. When the archive build finishes, copy the files embedded.mobileprovision and PkgInfo from it. In the build log for the device, find the codesign line with the correct signing key and the path to the entitlements file ending in app.xcent, and copy them.

Copy the .app from the archive and replace its binary with the one produced by the cross-compiler on Linux (e.g. SpaceJaguar.app/SpaceJaguar), then add the necessary resources to the .app. Check that PkgInfo and embedded.mobileprovision are still present in the .app from the archive, and copy them again if needed. Re-sign the .app with the codesign command; codesign takes the signing key and the path to the entitlements file (which can be renamed with the .plist extension) as input arguments.

After re-signing, create a Payload folder, move the .app into it, create a zip archive of the Payload directory, and rename the file with the .ipa extension. Then, in Xcode, open the list of devices and drag and drop the ipa into the device’s application list; installation via Apple Configurator 2 does not work for this process. If the re-signing was done correctly, the new application with the correct binary is installed on the iOS device (e.g. iPad) with a 7-day certificate, which is sufficient for testing.